Dataframe and updating a new column value in a for loop

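I am trying to update values in a DataFrame using a function and a for loop. I pass the DataFrame to the function and use a for loop to calculate the value I want to put in the last column.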

Here is the function:

def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00

    for i, row in df.iterrows():
        #Get all values from each row

        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        tpv = tp * row['VOLUME']
        sumTpv = sumTpv + tpv
        sumVolume = sumVolume + row['VOLUME']
        vwap = sumTpv / sumVolume
        #Find VWAP
        #df.assign(VWAP = vwap)
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        df.set_value(row, 'VWAP', vwap)
        df = df.reindex(row = row)
        df[row] = df[row].astype(float)
        dayVwap = dayVwap + vwap

    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df

Before passing df to the function, I add the new column to the DataFrame, like this:

df["VWAP"] = ""
#do vwap calculation
df = vwap2(df)

But the values are either all the same or not written at all. I have tried several things without success.

Update

This is the data I am working with; I pull it from Google each time:

                       CLOSE   HIGH      LOW    OPEN  VOLUME        TP  \
 2018-05-10 22:30:00  97.3600  97.48  97.3000  97.460  371766  97.86375   
 1525991460000000000  97.2900  97.38  97.1800  97.350  116164  97.86375   
 1525991520000000000  97.3100  97.38  97.2700  97.270   68937  97.86375   
 1525991580000000000  97.3799  97.40  97.3101  97.330   46729  97.86375   
 1525991640000000000  97.2200  97.39  97.2200  97.365   64823  97.86375   

                          TPV        SumTPV    SumVol       VWAP  
 2018-05-10 22:30:00  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991460000000000  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991520000000000  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991580000000000  3.722224e+08  1.785290e+09  18291710  97.601027  
 1525991640000000000  3.722224e+08  1.785290e+09  18291710  97.601027  

As you can see, every calculated column holds the same value in all rows.

This is what I am using now:

def vwap2(df):
    sumTpv = 0.00
    sumVolume = 0
    dayVwap = 0.00

    for i, row in df.iterrows():
        #Get all values from each row

        #Find typical price
        tp = (row['HIGH'] + row['LOW'] + row['CLOSE'] + row['OPEN']) / 4
        df['TP'] = tp

        tpv = tp * row['VOLUME']
        df['TPV'] = tpv

        sumTpv = sumTpv + tpv
        df['SumTPV'] = sumTpv

        sumVolume = sumVolume + row['VOLUME']
        df['SumVol'] = sumVolume

        vwap = sumTpv / sumVolume
        #Find VWAP
        #row.assign(VWAP = vwap)
        #row["VWAP"] = vwap
        #df.set_value(row, 'VWAP', vwap)
        df["VWAP"] = vwap
        dayVwap = dayVwap + vwap

    print('Day VWAP = ', dayVwap)
    print('TPV sum = ', sumTpv)
    print('Day Volume = ', sumVolume)
    return df

IIUC, you don't need a loop, or even apply. You can use direct column assignment and cumsum() to get what you're looking for.
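As for why the loop in the question leaves the same number in every row: assigning a scalar to a whole column, as in df['TP'] = tp, sets every row of that column, so each pass overwrites the previous one and only the last row's result survives. A minimal sketch of the difference, using a made-up three-row frame and hypothetical column names (MID_ALL_ROWS, MID_ONE_ROW) that are not part of the question:

```python
import pandas as pd

# Hypothetical three-row frame, only to illustrate the behaviour.
df = pd.DataFrame({"HIGH": [2.0, 4.0, 6.0], "LOW": [1.0, 3.0, 5.0]})
df["MID_ONE_ROW"] = float("nan")  # pre-create the column written row by row

for i, row in df.iterrows():
    mid = (row["HIGH"] + row["LOW"]) / 2
    # Scalar assignment broadcasts to every row of the column,
    # so each pass overwrites what the previous pass wrote.
    df["MID_ALL_ROWS"] = mid
    # Writing through the index label touches only the current row.
    df.at[i, "MID_ONE_ROW"] = mid

print(df)
```

Writing with df.at[i, ...] would repair the loop, but the vectorized approach avoids the loop altogether.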

Some example data:

import numpy as np
import pandas as pd

N = 20
high = np.random.random(N)
low = np.random.random(N)
close = np.random.random(N)
opening = np.random.random(N)
volume = np.random.random(N)
data = {"HIGH":high, "LOW":low, "CLOSE":close, "OPEN":opening, "VOLUME":volume}
df = pd.DataFrame(data)

df.head()
      CLOSE      HIGH       LOW      OPEN    VOLUME
0  0.848676  0.260967  0.004188  0.139342  0.931406
1  0.771065  0.356639  0.495715  0.652106  0.988217
2  0.288206  0.567776  0.023687  0.809410  0.134134
3  0.832711  0.508586  0.031569  0.120774  0.891948
4  0.857051  0.391618  0.155635  0.069054  0.628036

Assign the tp and tpv columns directly, then apply cumsum to get sumTpv and sumVolume:

df["tp"] = (df['HIGH'] + df['LOW'] + df['CLOSE'] + df['OPEN']) / 4
df["tpv"] = df.tp * df['VOLUME']
df["sumTpv"] = df.tpv.cumsum()
df["sumVolume"] = df.VOLUME.cumsum()
df["vwap"] = df.sumTpv.div(df.sumVolume)

df.head()
      CLOSE      HIGH       LOW      OPEN    VOLUME        tp       tpv  \
0  0.848676  0.260967  0.004188  0.139342  0.931406  0.313293  0.291803   
1  0.771065  0.356639  0.495715  0.652106  0.988217  0.568881  0.562178   
2  0.288206  0.567776  0.023687  0.809410  0.134134  0.422270  0.056641   
3  0.832711  0.508586  0.031569  0.120774  0.891948  0.373410  0.333063   
4  0.857051  0.391618  0.155635  0.069054  0.628036  0.368340  0.231331   

     sumTpv  sumVolume      vwap  
0  0.291803   0.931406  0.313293  
1  0.853982   1.919624  0.444869  
2  0.910622   2.053758  0.443393  
3  1.243685   2.945706  0.422203  
4  1.475016   3.573742  0.412737  
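The same steps can be wrapped in a function shaped like the question's vwap2. This is only a sketch: the name vwap_cumulative, the use of assign to return a copy, and the two-bar sample frame are assumptions for illustration, while the output column names follow the question's data.

```python
import pandas as pd


def vwap_cumulative(df):
    """Return a copy of df with running typical-price and VWAP columns."""
    tp = (df["HIGH"] + df["LOW"] + df["CLOSE"] + df["OPEN"]) / 4
    tpv = tp * df["VOLUME"]
    sum_tpv = tpv.cumsum()              # running total of price * volume
    sum_volume = df["VOLUME"].cumsum()  # running total of volume
    return df.assign(
        TP=tp, TPV=tpv, SumTPV=sum_tpv, SumVol=sum_volume,
        VWAP=sum_tpv / sum_volume,
    )


# Hypothetical two-bar sample, chosen so the arithmetic is easy to check by hand.
bars = pd.DataFrame({
    "OPEN": [10.0, 20.0], "HIGH": [10.0, 20.0],
    "LOW": [10.0, 20.0], "CLOSE": [10.0, 20.0],
    "VOLUME": [100, 300],
})
result = vwap_cumulative(bars)
print(result[["TP", "SumTPV", "SumVol", "VWAP"]])
```

Because assign returns a new frame, the input is left untouched, so the empty df["VWAP"] = "" placeholder from the question is not needed.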

Update (per OP's comment):
To get dayVwap as the sum of all vwap values, use dayVwap = df.vwap.sum().
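A small sketch of that line in use, on made-up running VWAP values. The sum mirrors what the question's loop accumulates in dayVwap. If the VWAP over the whole period is wanted instead (an assumption about intent, not something the question states), the last running value is that figure.

```python
import pandas as pd

# Hypothetical running VWAP values standing in for the df.vwap column above.
df = pd.DataFrame({"vwap": [10.0, 17.5, 16.0]})

day_vwap_sum = df.vwap.sum()          # what the question's loop accumulates
last_running_vwap = df.vwap.iloc[-1]  # VWAP over all rows up to the last one

print(day_vwap_sum, last_running_vwap)
```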