Reading values from Pandas dataframe rows into equations and entering result back into dataframe
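I have a dataframe. For each row of the dataframe, I need to read the values from two column indexes, pass those values to a set of equations, enter the result of each equation into its own column index in the same row, then move on to the next row and repeat.
After reading the answers to similar questions, I tried: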
import pandas as pd
DF = pd.read_csv("...")
Equation_1 = f(x, y)
Equation_2 = g(x, y)
for index, row in DF.iterrows():
    a = DF[m]
    b = DF[n]
    DF[p] = Equation_1(a, b)
    DF[q] = Equation_2(a, b)
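Instead of iterating over DF and reading and entering new values for each row, this code iterates over DF and enters the same values for every row. I am not sure what I am doing wrong here.
Also, from what I have read, it is actually faster to treat DF as a NumPy array and perform the calculation on the whole array at once rather than iterating. I don't know how I would go about that.
Thanks.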
# If your equations are simple enough, do operations column-wise in Pandas:
import pandas as pd
test = pd.DataFrame([[1,2],[3,4],[5,6]])
test # Default column names are 0, 1
test[0] # This is column 0
test.iloc[:, 0] # Column 0 by position, returned as a Series (icol has been removed from pandas)
test.columns=(['S','Q']) # Column names are easier to use
test #Column names! Use them column-wise:
test['result'] = test.S**2 + test.Q
test # results stored in DataFrame
# For more complicated stuff, try apply, as in "Python pandas apply on more columns":
def toyfun(row):
    # With axis=1 each call receives one row as a Series labelled 'S' and 'Q'
    return row['S'] - row['Q']**2
test['out2'] = test[['S','Q']].apply(toyfun, axis=1)
# You can also define the column names when you generate the DataFrame:
test2 = pd.DataFrame([[1,2],[3,4],[5,6]],columns = (list('AB')))
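To tie this back to the loop in the question: inside the loop, DF[m] selects the whole column rather than the current row's value, so every pass recomputes and overwrites the entire p and q columns. Below is a minimal sketch of both the column-wise fix and a corrected loop. The column names "m", "n", "p", "q", the extra "p_loop" column, the sample data, and the two equations are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the question's Equation_1 and Equation_2.
def equation_1(x, y):
    return x + 2 * y

def equation_2(x, y):
    return x * y

# Small made-up frame; the column names mirror the question's placeholders.
DF = pd.DataFrame({"m": [1.0, 2.0, 3.0], "n": [10.0, 20.0, 30.0]})

# Column-wise: each function receives whole Series and returns a Series,
# which pandas aligns back to the rows by index. No loop needed.
DF["p"] = equation_1(DF["m"], DF["n"])
DF["q"] = equation_2(DF["m"], DF["n"])

# If a loop is really required, read from `row` and write a single cell.
DF["p_loop"] = 0.0
for index, row in DF.iterrows():
    DF.at[index, "p_loop"] = equation_1(row["m"], row["n"])

print(DF)
```

The loop and the column-wise version produce the same numbers here; the column-wise form is usually much faster on large frames.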
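It turns out that this is extremely easy. All one has to do is define two variables and assign the desired columns to them, then set the "row to be replaced" (in practice, the target column) equal to the equation containing those variables.
Pandas already knows that it must apply the equation to every row and return each value to its proper index. I didn't realize it would be this simple and was looking for more explicit code.
For example: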
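import pandas as pd
df = pd.read_csv("...") # df is a large 2D table
A = df[0] # assumes integer column labels, e.g. a file read with header=None
B = df[1]
def f(A, B):
    return ... # placeholder: the equation, written in terms of A and B
df[3] = f(A, B) # evaluated for every row at once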
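A runnable sketch of the same idea, which also covers the NumPy route mentioned in the question. The sample data and the equation f are made up for illustration; only the pattern (select columns, compute, assign to a new column) comes from the answer above.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the CSV; integer column labels as in the answer.
df = pd.DataFrame([[3.0, 4.0], [6.0, 8.0], [5.0, 12.0]])

def f(a, b):
    # Hypothetical equation; works elementwise on Series or NumPy arrays.
    return np.sqrt(a**2 + b**2)

# Pandas route: pass the columns directly and assign the resulting Series.
df[2] = f(df[0], df[1])

# NumPy route: pull out plain arrays, compute on them, assign the array back.
a = df[0].to_numpy()
b = df[1].to_numpy()
df[3] = f(a, b)

print(df)
```

Both routes give the same column. The plain-array version skips index alignment, which may matter for speed on very large frames but also means the row order must already match.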