Adding values to all rows of a dataframe

Question

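I have two pandas DataFrames, df1 (2 rows) and df2 (about 30 rows). The index values of df1 are always different and never appear in df2. I want to add the mean of each column of df1 to the corresponding column of df2. For example: add 0.6 to every row of c1, 0.9 to every row of c2, and so on.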

df1:

      Date   c1   c2   c3   c4   c5   c6  ...  c10
2017-09-10  0.5  0.6  1.2  0.7  1.3  1.8  ...  1.3
2017-09-11  0.7  1.2  1.3  0.4  0.7  0.4  ...  1.5

df2:

      Date   c1   c2   c3   c4   c5   c6  ...  c10
2017-09-12  0.9  0.1  1.4  0.9  1.5  1.9  ...  1.9
2017-09-13  0.2  1.8  1.2  1.4  2.7  0.8  ...  1.1
     :        :    :    :    :
2017-10-10  1.5  0.9  1.5  0.9  1.6  1.8  ...  1.7
2017-10-11  2.7  1.1  1.9  0.4  0.8  0.8  ...  1.3

How can I do this?
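To make the question reproducible, here is a minimal sketch that rebuilds the two frames from the tables above. It assumes Date is an ordinary string column, keeps only the columns that are actually shown (the elided c7 to c9 are left out) and only the four visible rows of df2.

```python
import pandas as pd

# Only the columns and rows visible in the question; the elided ones are omitted.
cols = ["c1", "c2", "c3", "c4", "c5", "c6", "c10"]

df1 = pd.DataFrame(
    [
        ["2017-09-10", 0.5, 0.6, 1.2, 0.7, 1.3, 1.8, 1.3],
        ["2017-09-11", 0.7, 1.2, 1.3, 0.4, 0.7, 0.4, 1.5],
    ],
    columns=["Date"] + cols,
)

df2 = pd.DataFrame(
    [
        ["2017-09-12", 0.9, 0.1, 1.4, 0.9, 1.5, 1.9, 1.9],
        ["2017-09-13", 0.2, 1.8, 1.2, 1.4, 2.7, 0.8, 1.1],
        ["2017-10-10", 1.5, 0.9, 1.5, 0.9, 1.6, 1.8, 1.7],
        ["2017-10-11", 2.7, 1.1, 1.9, 0.4, 0.8, 0.8, 1.3],
    ],
    columns=["Date"] + cols,
)

# Per-column means of df1; numeric_only=True skips the string Date column.
means = df1.mean(numeric_only=True)
print(means)
```

The means for c1 and c2 come out as 0.6 and 0.9, matching the amounts mentioned in the question.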

Answer 1

If every column exists in both dataframes, simply:

for col in df2.columns:
    df2[col] = df2[col] + df1[col].mean()

If the columns are not necessarily in both, then:

for col in df2.columns:
    if col in df1.columns:
        df2[col] = df2[col] + df1[col].mean()
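The same loop can be written as a single vectorized assignment. This is a sketch, not part of the original answer, assuming the columns to shift are the numeric columns that both frames share, so a string Date column is skipped automatically; `shared` is just an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["2017-09-10", "2017-09-11"],
                    "c1": [0.5, 0.7], "c2": [0.6, 1.2]})
df2 = pd.DataFrame({"Date": ["2017-09-12", "2017-09-13", "2017-10-10"],
                    "c1": [0.9, 0.2, 1.5], "c2": [0.1, 1.8, 0.9],
                    "c3": [1.4, 1.2, 1.5]})

# Numeric columns of df1 that also exist in df2 (Date is not numeric, c3 is only in df2).
shared = df1.select_dtypes("number").columns.intersection(df2.columns)

# Add each column mean to every row of the matching df2 column in one step.
df2[shared] = df2[shared] + df1[shared].mean()
print(df2)
```

Columns that appear only in df2 (c3 here) and the Date column are left untouched.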

Answer 2

There may be a more efficient way, but here is a quick and dirty solution. Hope it helps!

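import pandas as pd

# For brevity, df2 reuses the same values as df1, only with different dates
d = {'c1': [0.5, 0.7], 'c2': [0.6, 1.2], 'c3': [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=['2017-09-10', '2017-09-11'])
df2 = pd.DataFrame(data=d, index=['2017-09-12', '2017-09-13'])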

df1

      Date   c1   c2   c3
2017-09-10  0.5  0.6  1.2
2017-09-11  0.7  1.2  1.3

df2

      Date   c1   c2   c3
2017-09-12  0.5  0.6  1.2
2017-09-13  0.7  1.2  1.3

The mean of each column of df1 can be obtained from the describe() function:

df1.describe().loc['mean']

c1    0.60
c2    0.90
c3    1.25
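`.ix` was removed in pandas 1.0, so the row is selected with `.loc` here. A small check, assuming the same three-column df1, that `describe().loc['mean']` gives the same numbers as calling `mean()` directly; the direct call avoids computing the other summary statistics (count, std, quartiles and so on).

```python
import pandas as pd

d = {"c1": [0.5, 0.7], "c2": [0.6, 1.2], "c3": [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=["2017-09-10", "2017-09-11"])

# The 'mean' row of describe() is a Series indexed by column name.
from_describe = df1.describe().loc["mean"]

# mean() computes the same Series directly.
direct = df1.mean()

print(from_describe)
print(direct)
```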

Now just add that Series to df2:

df2 + df1.describe().loc['mean']

      Date   c1   c2    c3
2017-09-12  1.1  1.5  2.45
2017-09-13  1.3  2.1  2.55

Answer 3

Calling mean on df1 computes the mean of each column by default and returns a pd.Series.

When you add a pd.Series to a pd.DataFrame, pandas by default aligns the Series index with the DataFrame's columns and broadcasts the values along the DataFrame's index, i.e. to every row.
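A small sketch of that alignment rule, using made-up values: the Series labels are matched against the DataFrame's column labels, each value is broadcast down every row, and any label present on only one side produces an all-NaN column in the result.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1.0, 2.0, 3.0], "c2": [10.0, 20.0, 30.0]})

# Index labels of the Series line up with the DataFrame's columns.
s = pd.Series({"c1": 0.5, "c2": 100.0})
r = df + s  # 0.5 is added to every row of c1, 100.0 to every row of c2
print(r)

# Labels that do not match on both sides produce all-NaN columns.
partial = pd.Series({"c1": 0.5, "c9": 1.0})
p = df + partial  # c2 (missing from the Series) and c9 (missing from df) are NaN
print(p)
```

This is why a stray non-numeric column such as Date matters: whatever labels the Series and the DataFrame do not share end up as NaN or raise, rather than being ignored.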

The only tricky part is handling the Date column.

Option 1

# numeric_only=True skips the non-numeric Date column (required in pandas >= 2.0)
m = df1.mean(numeric_only=True)
df2.loc[:, m.index] += m

df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

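If I know that 'Date' is always the first column, I can do this instead (starting again from the original df2):

df2.iloc[:, 1:] += df1.mean(numeric_only=True)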
df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7
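If the position of Date is not guaranteed, one alternative (not from the original answer) is to exclude it by name rather than by position. A sketch, assuming df1 has the same columns as df2 and every column other than Date is numeric, on a shortened df2 where Date deliberately sits in the middle:

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["2017-09-10", "2017-09-11"],
                    "c1": [0.5, 0.7], "c2": [0.6, 1.2]})
# Date is deliberately not the first column here.
df2 = pd.DataFrame({"c1": [0.9, 0.2], "Date": ["2017-09-12", "2017-09-13"],
                    "c2": [0.1, 1.8]})

# Every column except Date, wherever it sits.
cols = df2.columns.drop("Date")

df2[cols] = df2[cols] + df1[cols].mean()
print(df2)
```

The column order of df2 is preserved because only existing columns are reassigned.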

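Option 2
Note that I pass append=True to set_index in case the existing index holds something you don't want to disturb. Unlike Option 1, this returns a new DataFrame instead of modifying df2 in place.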

df2.set_index('Date', append=True).add(df1.mean(numeric_only=True)).reset_index('Date')

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If you don't care about the index, you can shorten it to:

df2.set_index('Date').add(df1.mean(numeric_only=True)).reset_index()

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7
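A runnable version of the shorter Option 2 chain on two rows of df2, assuming Date is a string column. `numeric_only=True` is passed to `mean()` because df1 still contains Date; once Date has been moved into df2's index, `add` only sees the numeric columns.

```python
import pandas as pd

df1 = pd.DataFrame({"Date": ["2017-09-10", "2017-09-11"],
                    "c1": [0.5, 0.7], "c2": [0.6, 1.2]})
df2 = pd.DataFrame({"Date": ["2017-09-12", "2017-09-13"],
                    "c1": [0.9, 0.2], "c2": [0.1, 1.8]})

# Move Date out of the way, add the column means to every row, then restore it.
out = df2.set_index("Date").add(df1.mean(numeric_only=True)).reset_index()
print(out)
```

Because every step returns a new object, df2 itself is unchanged; assign the result back (`df2 = ...`) if you want to keep it.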

Answer 4

Here is another, slightly simpler approach:

import pandas as pd
from datetime import datetime

date_today = datetime.now()

# Creating df1 & df2
df1 = pd.DataFrame(
    {
        'Date': [date_today, date_today],
        'c1': [0.5, 0.4],
        'c2': [0.6, 0.3]
    }
)
df2 = pd.DataFrame(
    {
        'Date': [date_today, date_today, date_today],
        'c1': [0.9, 0.7, 0.6],
        'c2': [0.8, 0.4, 0.3]
    }
)

# Average of column c1 in df1
avg = df1["c1"].mean()

# Add the average to every row of c1 in df2; assign back so df2 is actually updated
df2['c1'] = df2['c1'] + avg
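The same idea extends to several columns at once. A sketch, assuming the list of value columns is known up front (`value_cols` is an illustrative name); `assign` returns a new DataFrame and leaves df2 as it was.

```python
import pandas as pd

df1 = pd.DataFrame({"c1": [0.5, 0.4], "c2": [0.6, 0.3]})
df2 = pd.DataFrame({"c1": [0.9, 0.7, 0.6], "c2": [0.8, 0.4, 0.3]})

# Build one shifted column per name; assign() returns a new DataFrame.
value_cols = ["c1", "c2"]
result = df2.assign(**{c: df2[c] + df1[c].mean() for c in value_cols})
print(result)
```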


Tags: python, addition, dataframe, pandas