Pandas dataframe of a distribution of particles: group by ID and find the half flux and the half-flux radius
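I am working with a pandas dataframe. I have distributions of particles, with each particle's distance from the center of its distribution and its associated flux. I want to find the flux enclosed within the "half-flux radius" (or "half-light radius"), which by definition is the radius that encloses half of the total flux. I will give an example and then ask whether you know how to do this.
Here I list 2 particle distributions, identified by dist_ID, with the distance R of each particle from the center of its distribution and the flux of each particle.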
      dist_ID           R      flux
0    702641.0    5.791781  0.097505
1    702641.0    2.806051  0.015750
2    702641.0    3.254907  0.086941
3    702641.0    8.291544  0.081764
4    702641.0    4.901959  0.053561
5    702641.0    8.630691  0.144661
...
228  802663.0   95.685763  0.025735
229  802663.0  116.070396  0.026012
230  802663.0  112.806001  0.022163
231  802663.0  229.388117  0.026154
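For example, consider the particle distribution with dist_ID=702641.0. Its total flux is the sum of the "flux" column: total_flux=0.48; the half flux is half_flux=total_flux/2.=0.24; the radius enclosing half of the flux satisfies R_2<R_hf<R_3 (where R_2=3.25 for particle 2 and R_3=8.29 for particle 3), so I take R_hf to be the upper end of that interval, i.e. R_hf=R_3.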
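The numbers quoted in this example can be checked with a few lines of pandas. This is only a sketch on the six sample rows of dist_ID=702641.0; the variable names (`sample`, `in_row_order`, `by_radius`) are made up for illustration. It also suggests that the bracket R_2<R_hf<R_3 comes from accumulating the flux in the order the rows are listed; if the particles are first sorted by R, the running total reaches the half flux at a different particle.

```python
import pandas as pd

# The six sample rows of dist_ID=702641.0 from the table above
sample = pd.DataFrame({
    "R":    [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661],
})

total_flux = sample["flux"].sum()
half_flux = total_flux / 2

# Accumulate the flux in the order the rows are listed
in_row_order = sample["flux"].cumsum()
first_row = in_row_order[in_row_order >= half_flux].index[0]

# Accumulate outward from the center, i.e. after sorting by R
by_radius = sample.sort_values("R")
cum_by_radius = by_radius["flux"].cumsum()
first_sorted = cum_by_radius[cum_by_radius >= half_flux].index[0]

print(round(total_flux, 2), round(half_flux, 2))
print("row order:  ", first_row, sample.loc[first_row, "R"])
print("sorted by R:", first_sorted, sample.loc[first_sorted, "R"])
```

On these rows the row-order sum first reaches the half flux at particle 3 (R=8.29), whereas after sorting by R it does so at particle 0 (R=5.79).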
I would like a way to group the pandas dataframe by dist_ID and get half_flux and R_hf for each distribution. Thanks.
If you want the half flux, you can get it with
df.groupby("dist_ID")["flux"].sum() / 2
Output (for the ten sample rows shown in the question):
dist_ID
702641.0    0.240091
802663.0    0.050032
Name: flux, dtype: float64
I am not sure how you want to compute the radius, but hopefully this helps you solve the problem.
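For the radius, one possibility is to sort each distribution by R, accumulate the flux, and take the first radius at which the running total reaches the half flux, which is the "upper end of the interval" convention described in the question. The sketch below assumes non-negative fluxes (so the cumulative sum never decreases); the helper name `half_flux_radius` and the output column names are made up for illustration, and the data are the ten sample rows from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
})

def half_flux_radius(group):
    """Hypothetical helper: half flux and R_hf for one distribution."""
    g = group.sort_values("R")
    cum = g["flux"].cumsum().to_numpy()
    half = cum[-1] / 2
    # Position of the first particle whose cumulative flux is >= half
    k = np.searchsorted(cum, half, side="left")
    return pd.Series({"half_flux": half, "R_hf": g["R"].iloc[k]})

# Selecting the columns explicitly keeps the grouping key out of the helper
result = df.groupby("dist_ID")[["R", "flux"]].apply(half_flux_radius)
print(result)
```

The result is a dataframe indexed by dist_ID with one row per distribution and the columns half_flux and R_hf.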
This can be done as follows:
import pandas as pd

data = {'dist_ID': [702641.0,702641.0,702641.0,702641.0,702641.0,702641.0,802663.0,802663.0,802663.0,802663.0],
        'R': [5.791781,2.806051,3.254907,8.291544,4.901959,8.630691,95.685763,116.070396,112.806001,229.388117],
        'flux': [0.097505,0.015750,0.086941,0.081764,0.053561,0.144661,0.025735,0.026012,0.022163,0.026154]}
df = pd.DataFrame(data)

# Sort by distribution and radius so the cumulative sum runs outward from the center
df = df.sort_values(['dist_ID', 'R'])

# Cumulative flux within each distribution
df['flux_cumsum'] = df.groupby('dist_ID')['flux'].cumsum()

# Half of the total flux of each distribution, merged back onto every row
df_halfflux = (df.groupby('dist_ID')['flux'].sum() / 2).to_frame('half_flux')
df = pd.merge(df, df_halfflux, how="left", on=['dist_ID'])

# Absolute difference between the cumulative flux and the half flux
df['flux_diff'] = (df['half_flux'] - df['flux_cumsum']).abs()
print(df)

# Row whose cumulative flux is closest to the half flux; its R is reported as R_hf
idx = df.groupby('dist_ID')['flux_diff'].idxmin()
result = (df.loc[idx, ['dist_ID', 'half_flux', 'R']]
            .rename(columns={'R': 'R_hf'})
            .set_index(['dist_ID', 'half_flux']))
print(result)
The code above prints:
    dist_ID           R      flux  flux_cumsum  half_flux  flux_diff
0  702641.0    2.806051  0.015750     0.015750   0.240091   0.224341
1  702641.0    3.254907  0.086941     0.102691   0.240091   0.137400
2  702641.0    4.901959  0.053561     0.156252   0.240091   0.083839
3  702641.0    5.791781  0.097505     0.253757   0.240091   0.013666
4  702641.0    8.291544  0.081764     0.335521   0.240091   0.095430
5  702641.0    8.630691  0.144661     0.480182   0.240091   0.240091
6  802663.0   95.685763  0.025735     0.025735   0.050032   0.024297
7  802663.0  112.806001  0.022163     0.047898   0.050032   0.002134
8  802663.0  116.070396  0.026012     0.073910   0.050032   0.023878
9  802663.0  229.388117  0.026154     0.100064   0.050032   0.050032
                          R_hf
dist_ID  half_flux
702641.0 0.240091     5.791781
802663.0 0.050032   112.806001
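Note that the code above selects, for each distribution, the row whose cumulative flux is closest to the half flux, which is not necessarily the first row that reaches it. The following sketch puts the two conventions side by side on the same ten sample rows; the names `closest`, `upper`, `R_closest` and `R_upper` are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "dist_ID": [702641.0] * 6 + [802663.0] * 4,
    "R": [5.791781, 2.806051, 3.254907, 8.291544, 4.901959, 8.630691,
          95.685763, 116.070396, 112.806001, 229.388117],
    "flux": [0.097505, 0.015750, 0.086941, 0.081764, 0.053561, 0.144661,
             0.025735, 0.026012, 0.022163, 0.026154],
}).sort_values(["dist_ID", "R"], ignore_index=True)

df["flux_cumsum"] = df.groupby("dist_ID")["flux"].cumsum()
df["half_flux"] = df.groupby("dist_ID")["flux"].transform("sum") / 2

# Convention 1: radius whose cumulative flux is closest to the half flux
diff = (df["flux_cumsum"] - df["half_flux"]).abs()
closest = df.loc[diff.groupby(df["dist_ID"]).idxmin()].set_index("dist_ID")["R"]

# Convention 2: first radius whose cumulative flux reaches the half flux
reached = df[df["flux_cumsum"] >= df["half_flux"]]
upper = reached.groupby("dist_ID")["R"].first()

comparison = pd.DataFrame({"R_closest": closest, "R_upper": upper})
print(comparison)
```

For dist_ID=702641.0 both conventions give R=5.791781. For dist_ID=802663.0 the closest row is R=112.806001, which encloses slightly less than half of the flux, while the first row that reaches the half flux is R=116.070396.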