Pandas

Question

I have a DataFrame (df1):

index abc bcd def 
20150101 0.5 0.3 0.2
20150102 0.7 0.9 1.6
20150103 1.7 2.9 4.6
.................

and a second DataFrame (df2):

index a b c ...(about 100 columns)
0 0 1 8 ...
1 9 5 3 ...
2 2 3 7 ..

I want to loop over each column of the second DataFrame and, on each iteration, build a DataFrame like this:

 index abc bcd def col
 20150101 0.5 0.3 0.2 0
 20150102 0.7 0.9 1.6 9
 20150103 1.7 2.9 4.6 2

I then need to process this new DataFrame for further calculations.

I ran this:

    for col in df2.iteritems():
        df1['new_col'] = col

which raises: ValueError: Length of values does not match length of index

If I build a Series from col like this:

    for col in df2.iteritems():
        c = col[1].astype(float)
        s = pd.Series(c)
        dfb['col'] = s

it gives:

index abc bcd def col
20150101 0.5 0.3 0.2 NaN
20150102 0.7 0.9 1.6 NaN
20150103 1.7 2.9 4.6 NaN

Please suggest a solution. Thanks in advance!

Answer 1

I think you can try concat:

for col in df2.columns:
    # reset_index() gives df1 the same 0..n-1 index as df2, so the rows line up
    print(pd.concat([df1.reset_index(), df2[col]], axis=1))

      index  abc  bcd  def  a
0  20150101  0.5  0.3  0.2  0
1  20150102  0.7  0.9  1.6  9
2  20150103  1.7  2.9  4.6  2
      index  abc  bcd  def  b
0  20150101  0.5  0.3  0.2  1
1  20150102  0.7  0.9  1.6  5
2  20150103  1.7  2.9  4.6  3
      index  abc  bcd  def  c
0  20150101  0.5  0.3  0.2  8
1  20150102  0.7  0.9  1.6  3
2  20150103  1.7  2.9  4.6  7
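
To illustrate why reset_index() matters here, the following is a minimal sketch rather than part of the original answer. The sample frames are reconstructed from the three rows and three columns shown in the question, and the names `misaligned` and `aligned` are made up for the illustration. It shows that assigning a Series to a column aligns on index labels, which is where the NaN column comes from, and that iterating a DataFrame's columns yields (name, Series) tuples, which would explain the length mismatch in the first attempt.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: only the
# three rows and three df2 columns shown there).
df1 = pd.DataFrame(
    {"abc": [0.5, 0.7, 1.7], "bcd": [0.3, 0.9, 2.9], "def": [0.2, 1.6, 4.6]},
    index=[20150101, 20150102, 20150103],
)
df2 = pd.DataFrame({"a": [0, 9, 2], "b": [1, 5, 3], "c": [8, 3, 7]})

# Column iteration yields (column name, Series) tuples, so assigning the
# whole tuple to a column tries to fit 2 values into a 3-row frame.
first_pair = next(iter(df2.items()))

# Assigning a Series aligns on index labels: df2's labels 0, 1, 2 do not
# occur in df1's index, so every value in the new column is NaN.
misaligned = df1.copy()
misaligned["col"] = df2["a"]

# reset_index() moves the dates into a regular column and gives df1 a
# 0..n-1 index that matches df2's, so concat pairs the rows up.
aligned = pd.concat([df1.reset_index(), df2["a"]], axis=1)
```

Note that `DataFrame.iteritems()`, used in the question, was removed in pandas 2.0; `items()` is the equivalent call.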

Edit:

IIUC, you only need to select the DataFrames whose last column contains at least one value equal to 1; use any:

dfs = {}

for col in df2.columns:
    df = pd.concat([df1.reset_index(), df2[col]], axis=1)
    # keep only the frames whose appended column contains at least one 1
    if (df2[col] == 1).any():
        print(df)
        # store in a dictionary of DataFrames, keyed by column name
        dfs[col] = df

      index  abc  bcd  def  b
0  20150101  0.5  0.3  0.2  1
1  20150102  0.7  0.9  1.6  5
2  20150103  1.7  2.9  4.6  3

print(dfs['b'])
      index  abc  bcd  def  b
0  20150101  0.5  0.3  0.2  1
1  20150102  0.7  0.9  1.6  5
2  20150103  1.7  2.9  4.6  3
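
If df1's date index should be kept instead of being reset, one possible variant (a sketch, not the method from the answer) is to assign each column by position. It assumes df2 has the same number of rows as df1, in the same order; the sample frames are reconstructed from the question, and the column name `col` follows the desired output shown there.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: three rows,
# three df2 columns).
df1 = pd.DataFrame(
    {"abc": [0.5, 0.7, 1.7], "bcd": [0.3, 0.9, 2.9], "def": [0.2, 1.6, 4.6]},
    index=[20150101, 20150102, 20150103],
)
df2 = pd.DataFrame({"a": [0, 9, 2], "b": [1, 5, 3], "c": [8, 3, 7]})

dfs = {}
for name, column in df2.items():  # items() yields (column name, Series) pairs
    # keep only columns that contain at least one value equal to 1
    if (column == 1).any():
        # to_numpy() drops df2's index, so the values are assigned by
        # position; assign() returns a new frame and leaves df1 untouched
        dfs[name] = df1.assign(col=column.to_numpy())

print(dfs["b"])
```

Because `assign` returns a copy, each entry in `dfs` can be processed further without modifying df1.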

Pandas - looping through columns of a dataframe and join this column with other dataframe

python

loops

multiple-columns

dataframe