pandas: normalization of groupby.sum
I have a pandas DataFrame that looks like this:
**I SI weights**
1 3 0.3
2 4 0.2
1 3 0.5
1 5 0.5
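What I need is this: for a given value of I, take each value of SI and add up its total weight. In the end, for each realization, I should have something like this: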
I = 1   SI = 3   weight = 0.8
        SI = 5   weight = 0.5
I = 2   SI = 4   weight = 0.2
This is easy to do by calling groupby and sum:
import pandas as pd

name = ['I', 'SI', 'weight']
Location = 'Simulationsdata/prova.csv'
df = pd.read_csv(Location, names=name, sep='\t', encoding='latin1')
# Sum the weights for each (I, SI) pair
results = df.groupby(['I', 'SI']).weight.sum()
Now I want the weights to be normalized to 1 within each value of I, so the result should look like this:
I = 1   SI = 3   weight = 0.615
        SI = 5   weight = 0.385
I = 2   SI = 4   weight = 1
I tried this:
for idx2, j in enumerate(results.index.get_level_values(1).unique()):
    norm = [float(i)/sum(results.loc[j]) for i in results.loc[j]]
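But when I try to plot the distribution of SI for each I, I find that SI has been normalized as well, which I do not want.
P.S. This question is related to another one, but it concerns a different aspect of the problem, so I thought it was better to ask it separately.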
You should be able to divide the weight column by its own sum within each value of I:
# example data
df
   I  SI  weight
0  1   3     0.3
1  2   4     0.2
2  1   3     0.5
3  1   5     0.5

# two-level groupby, with the result as a DataFrame instead of Series:
# df['col'] gives a Series, df[['col']] gives a DF
res = df.groupby(['I', 'SI'])[['weight']].sum()
res
      weight
I SI
1 3      0.8
  5      0.5
2 4      0.2

# Get the sum of weights for each value of I,
# which will serve as denominators in normalization
denom = res.groupby('I')['weight'].sum()
denom
I
1    1.3
2    0.2
Name: weight, dtype: float64

# Divide each result value by its index-matched
# denominator value
res.weight = res.weight / denom
res
        weight
I SI
1 3   0.615385
  5   0.384615
2 4   1.000000
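As a self-contained variant of the steps above, here is a minimal sketch that computes the per-I denominators with `transform("sum")`, so they already share the (I, SI) index of the summed weights and no index alignment is needed. The names `summed`, `totals`, `normalized` and `flat` are only illustrative, and the data is the four-row example from the question rather than the CSV file.

```python
import pandas as pd

# Same example data as in the question (built inline instead of read from CSV)
df = pd.DataFrame({
    "I": [1, 2, 1, 1],
    "SI": [3, 4, 3, 5],
    "weight": [0.3, 0.2, 0.5, 0.5],
})

# Sum the weights for each (I, SI) pair
summed = df.groupby(["I", "SI"])["weight"].sum()

# Total weight per I, repeated for every (I, SI) row of that I
totals = summed.groupby(level="I").transform("sum")

# Normalize so that the weights for each I add up to 1
normalized = summed / totals

# Flatten the index back into columns; only weight has changed, SI has not
flat = normalized.reset_index()
print(flat)
```

Because only the weight column is divided, the SI values in `flat` stay as they were, which should make it straightforward to plot the SI distribution for each I afterwards.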