Converting values into %

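I have a dataframe below that contains the sales of meat, vegetables and bread for each store. I want to convert the values into %, so that the values for Store N become 74%, 7% and 19%. In other words, 74% is meat sales as a percentage of Store N's total sales. What is the simplest way to do this?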

import pandas as pd

df=pd.DataFrame({'Store':['N','S','E','W']
                    ,'Meat':[200,250,100,400]
                    ,'Veg':[20,100,30,80]
                    ,'Bread':[50,230,150,100]})
df=df[['Store','Meat','Veg','Bread']]    

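You can calculate a percentage manually. Note that this example divides by the column total, so it gives each store's share of all meat sales rather than meat's share of the store's own sales: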

df['MeatPerc'] = df['Meat']/df['Meat'].sum()
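
The snippet above divides by the column total. For the per-store share the question asks for, a minimal sketch divides each product column by the row total instead. The names `products` and `row_total` and the `Perc` column suffix are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Store': ['N', 'S', 'E', 'W'],
                   'Meat': [200, 250, 100, 400],
                   'Veg': [20, 100, 30, 80],
                   'Bread': [50, 230, 150, 100]})

products = ['Meat', 'Veg', 'Bread']

# Total sales per store: row-wise sum over the product columns.
row_total = df[products].sum(axis=1)

# One new percentage column per product, e.g. MeatPerc = 100 * Meat / row total.
for col in products:
    df[col + 'Perc'] = 100 * df[col] / row_total

print(df[['Store', 'MeatPerc', 'VegPerc', 'BreadPerc']].round(1))
```

For Store N this works out to 200 / (200 + 20 + 50), which is roughly 74%.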

A pure pandas solution without loops is:

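# .ix was removed in pandas 1.0; .iloc selects the same columns by position.
# If a recent pandas version warns about writing floats into integer columns,
# cast those columns to float first.
df.iloc[:, 1:] = (df.iloc[:, 1:].T / df.iloc[:, 1:].sum(1)).T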
print(df)

Result:

  Store      Meat       Veg     Bread
0     N  0.740741  0.074074  0.185185
1     S  0.431034  0.172414  0.396552
2     E  0.357143  0.107143  0.535714
3     W  0.689655  0.137931  0.172414

You can also use pandas apply with a lambda function:

# .ix was removed in pandas 1.0; .iloc is the positional replacement.
df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: x*100/x.sum(), axis=1)

This gives you:

  Store       Meat        Veg      Bread
0     N  74.074074   7.407407  18.518519
1     S  43.103448  17.241379  39.655172
2     E  35.714286  10.714286  53.571429
3     W  68.965517  13.793103  17.241379

You can first set_index with the column Store, then divide by the row sums with div, and finally reset_index:

df.set_index('Store', inplace=True)
df = df.div(df.sum(1), axis=0)
print (df.reset_index())
  Store      Meat       Veg     Bread
0     N  0.740741  0.074074  0.185185
1     S  0.431034  0.172414  0.396552
2     E  0.357143  0.107143  0.535714
3     W  0.689655  0.137931  0.172414

Another solution, selecting with loc or iloc (ix was removed in pandas 1.0, and loc is its label-based replacement):

df.loc[:,'Meat':] = df.loc[:,'Meat':].div(df.loc[:,'Meat':].sum(1), axis=0)
print (df)
  Store      Meat       Veg     Bread
0     N  0.740741  0.074074  0.185185
1     S  0.431034  0.172414  0.396552
2     E  0.357143  0.107143  0.535714
3     W  0.689655  0.137931  0.172414

df.iloc[:,1:] = df.iloc[:,1:].div(df.iloc[:,1:].sum(1), axis=0)
print (df)
  Store      Meat       Veg     Bread
0     N  0.740741  0.074074  0.185185
1     S  0.431034  0.172414  0.396552
2     E  0.357143  0.107143  0.535714
3     W  0.689655  0.137931  0.172414
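
The question asks for values such as 74%, 7% and 19%, while the answers so far return fractions. A small sketch, assuming whole-number rounding is acceptable and using the hypothetical names `shares` and `pct`, turns the fractions into display strings:

```python
import pandas as pd

df = pd.DataFrame({'Store': ['N', 'S', 'E', 'W'],
                   'Meat': [200, 250, 100, 400],
                   'Veg': [20, 100, 30, 80],
                   'Bread': [50, 230, 150, 100]})

# Row-wise fractions, as in the set_index / div answer.
shares = df.set_index('Store')
shares = shares.div(shares.sum(axis=1), axis=0)

# Format each fraction as a whole-number percentage string (display only).
pct = shares.apply(lambda col: col.map('{:.0%}'.format)).reset_index()
print(pct)
```

The formatted frame holds strings, so it is probably best to keep the numeric `shares` frame around for any further calculation.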

Timings:

In [187]: %timeit (jez1(df))
100 loops, best of 3: 4.07 ms per loop

In [188]: %timeit (jez2(df1))
100 loops, best of 3: 5.61 ms per loop

In [189]: %timeit (jez3(df2))
100 loops, best of 3: 5.44 ms per loop

In [190]: %timeit (ric(df3))
100 loops, best of 3: 6.18 ms per loop

In [191]: %timeit (ogi(df4))
1 loop, best of 3: 2.2 s per loop

Code for timings:

import numpy as np
import pandas as pd

#random dataframe
np.random.seed(100)

#10 data columns + first Store col, 10k rows
df = pd.DataFrame(np.random.randint(10, size=(10000,10)), columns=list('ABCDEFGHIJ'))
df.index = 'a' + df.index.astype(str)
df = df.reset_index().rename(columns={'index':'Store'})
print (df)
df1, df2, df3, df4 = df.copy(), df.copy(), df.copy(), df.copy()

def jez1(df):
    df = df.set_index('Store')
    df = 100 * df.div(df.sum(1), axis=0)
    return (df.reset_index())


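# .ix was removed in pandas 1.0: .loc replaces label slicing, .iloc positional slicing.
# If a recent pandas version warns about writing floats into integer columns,
# cast the data columns to float before calling jez2, jez3, ric or ogi.
def jez2(df):
    df.loc[:,'A':] = df.loc[:,'A':].div(df.loc[:,'A':].sum(1), axis=0)
    return df

def jez3(df):
    df.iloc[:,1:] = df.iloc[:,1:].div(df.iloc[:,1:].sum(1), axis=0)
    return df

def ric(df):
    df.iloc[:, 1:] = (df.iloc[:, 1:].T / df.iloc[:, 1:].sum(1)).T
    return df

def ogi(df):
    df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: x/x.sum(), axis=1)
    return df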

print (jez1(df))
print (jez2(df1)) 
print (jez3(df2)) 
print (ric(df3))
print (ogi(df4))
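
`%timeit` is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The function names `by_div` and `by_apply`, the 1,000-row size and the repeat counts are assumptions chosen to keep the run short; they are not the original benchmark settings, so the absolute numbers will differ from those above.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(100)

# 1,000 rows x 10 data columns; values 1..9 so that no row sums to zero.
data = pd.DataFrame(np.random.randint(1, 10, size=(1000, 10)),
                    columns=list('ABCDEFGHIJ'))
data.insert(0, 'Store', ['a' + str(i) for i in range(len(data))])


def by_div(frame):
    # Vectorised: divide every row by its own total in one call.
    values = frame.set_index('Store')
    return values.div(values.sum(axis=1), axis=0).reset_index()


def by_apply(frame):
    # Row-wise apply: the lambda is called once per row.
    values = frame.set_index('Store')
    return values.apply(lambda row: row / row.sum(), axis=1).reset_index()


# Best of 3 repeats, reported as seconds per single call.
t_div = min(timeit.repeat(lambda: by_div(data), number=10, repeat=3)) / 10
t_apply = min(timeit.repeat(lambda: by_apply(data), number=1, repeat=3))

print(f'div:   {t_div * 1e3:.2f} ms per call')
print(f'apply: {t_apply * 1e3:.2f} ms per call')
```

Neither function modifies its input, so the same frame can be reused across repeats without making copies.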