Reading values from Pandas dataframe rows into equations and entering result back into dataframe

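I have a dataframe. For each row of the dataframe, I need to read values from two column indices, pass those values to a set of equations, enter the result of each equation into its own column index in the same row, then move to the next row and repeat.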

After reading the responses to similar questions, I tried:

import pandas as pd

DF = pd.read_csv("...")

Equation_1 = f(x, y)
Equation_2 = g(x, y)

for index, row in DF.iterrows():
    a = DF[m]
    b = DF[n]

    DF[p] = Equation_1(a, b)
    DF[q] = Equation_2(a, b)

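Instead of iterating over DF and reading and entering new values for each row, this code iterates over DF and enters the same values for every row. I am not sure what I am doing wrong here.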

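Also, from what I have read, it is actually faster to treat the DF as a NumPy array and perform the calculation on the entire array at once rather than iterating. I am not sure how I would go about this.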

Thanks.

# If your equations are simple enough, do operations column-wise in Pandas:

import pandas as pd

test = pd.DataFrame([[1,2],[3,4],[5,6]])
test # Default column names are 0, 1
test[0] # This is column 0 
test.iloc[:, 0] # This is column 0 selected by position, returned as a Series (icol has been removed from pandas)
test.columns=(['S','Q']) # Column names are easier to use
test #Column names! Use them column-wise:
test['result'] = test.S**2 + test.Q
test # results stored in DataFrame
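
Applied to the situation in the question (two input columns, two equations, two output columns), the column-wise approach might look like the sketch below. The column names m, n, p, q mirror the placeholders in the question, and the two formulas are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data; "m" and "n" stand in for the two input columns
df = pd.DataFrame({"m": [1.0, 2.0, 3.0], "n": [4.0, 5.0, 6.0]})

def equation_1(x, y):
    # Illustrative formula only, not taken from the question
    return x + 2 * y

def equation_2(x, y):
    # Illustrative formula only, not taken from the question
    return x * y

# Each call receives whole columns (Series) and returns a Series
# aligned on the same index, so no explicit loop over rows is needed.
df["p"] = equation_1(df["m"], df["n"])
df["q"] = equation_2(df["m"], df["n"])
```

Because the arithmetic operators work element-wise on a Series, each row of p and q is computed from the m and n values of that same row.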

# For more complicated stuff, try apply, as in the question "Python pandas apply on more columns":

def toyfun(row):
    # With axis=1, each row is passed in as a Series indexed by column name
    return row['S'] - row['Q']**2


test['out2']=test[['S','Q']].apply(toyfun, axis=1)
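
If one row-wise function needs to produce several results at once, apply can expand a returned tuple into separate columns. The following is only a sketch: the function two_results and the output column names out_a and out_b are hypothetical, and it assumes a pandas version that supports the result_type argument of DataFrame.apply.

```python
import pandas as pd

test = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["S", "Q"])

def two_results(row):
    # Hypothetical pair of equations evaluated on one row
    return row["S"] - row["Q"] ** 2, row["S"] * row["Q"]

# result_type="expand" turns each returned tuple into its own columns
results = test.apply(two_results, axis=1, result_type="expand")
results.columns = ["out_a", "out_b"]

# join aligns on the row index, so each result lands in its original row
test = test.join(results)
```

Row-wise apply calls the Python function once per row, so it is usually slower than the column-wise form and is best kept for logic that cannot be written as column arithmetic.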

# You can also define the column names when you generate the DataFrame:
test2 = pd.DataFrame([[1,2],[3,4],[5,6]],columns = (list('AB')))

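It turns out that this is extremely easy. All that has to be done is to define two variables and assign the desired columns to them, then set the "row to be replaced" (in practice, the output column) equal to the equation containing those variables.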

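Pandas already knows that it must apply the equation to every row and return each value to its proper index. I did not realize it would be this easy and was looking for more explicit code.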

For example,

import pandas as pd

df = pd.read_csv("...") # df is a large 2D array

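A = df[0] # Integer labels 0 and 1 assume the file was read without a header row
B = df[1]

# Pseudocode: f stands for any expression built from A and B
# f(A,B) = ....

df[3] = f(A,B)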
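
A runnable version of the same pattern, which also touches on the NumPy part of the question, could look like this. The data and the formula in f are invented for illustration, and to_numpy() assumes a reasonably recent pandas; passing the Series themselves to f would work as well.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data, with integer column labels 0 and 1
df = pd.DataFrame({0: [1.0, 4.0, 9.0], 1: [2.0, 3.0, 4.0]})

# Pull the two input columns out as plain NumPy arrays
A = df[0].to_numpy()
B = df[1].to_numpy()

def f(a, b):
    # Illustrative equation only; NumPy evaluates it element-wise
    return np.sqrt(a) + b

# The result has one value per row and is written back in row order
df[3] = f(A, B)
```

Working on the arrays skips index alignment, which is safe here because A and B come from the same dataframe and keep its row order.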