Different operations on different columns of a pandas DataFrame

Question

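I want to perform a specific operation on each column of my DataFrame: in particular, apply a given operation to every column except the last one.

I got this working with the help of Google. It works, but it looks rather hacky to me.

Can you help me improve it?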

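import pandas as pd

d = {
    'col1': [1, 2, 4, 7],
    'col2': [3, 4, 9, 1],
    'col3': [5, 2, 11, 4],
    'col4': [True, True, False, True]
}
df = pd.DataFrame(data=d)

def do_nothing(x):
    # Identity: return the column unchanged.
    return x

def minor(x):
    # Element-wise comparison: True where the value is below 2.
    return x < 2

def multi_func(functions):
    # Build a dispatcher that looks up the function registered for each column name.
    def f(col):
        return functions[col.name](col)
    return f

result = df.apply(multi_func({'col1': minor, 'col2': minor,
                              'col3': minor, 'col4': do_nothing}))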

Thanks, everyone.

Answer 1

Use the aggregate function instead; its func argument accepts more options, including a dict that maps column names to functions:

res = df.aggregate({'col1': minor, 'col2': minor, 'col3': minor, 'col4': do_nothing})

print(res)

Output (in the context of the script from the question):


    col1   col2   col3   col4
0   True  False  False   True
1  False  False  False   True
2  False  False  False  False
3  False   True  False   True
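
Since the question is about treating every column except the last one the same way, the mapping does not have to be typed out by hand. The following is only a sketch under a few assumptions: the name `ops` is illustrative, and it uses `DataFrame.transform` rather than `aggregate`, because both functions return a value per row instead of reducing the column (recent pandas releases may warn when such functions are passed to `aggregate`).

```python
import pandas as pd

d = {
    'col1': [1, 2, 4, 7],
    'col2': [3, 4, 9, 1],
    'col3': [5, 2, 11, 4],
    'col4': [True, True, False, True]
}
df = pd.DataFrame(data=d)

def do_nothing(x):
    # Identity: leave the values unchanged.
    return x

def minor(x):
    # True where the value is below 2.
    return x < 2

# Hypothetical helper mapping: every column except the last gets `minor`,
# the last column gets `do_nothing`.
ops = {name: minor for name in df.columns[:-1]}
ops[df.columns[-1]] = do_nothing

# transform accepts the same column -> function dict and keeps the row shape.
res = df.transform(ops)
print(res)
```

With the sample data this should give the same table as above, and it keeps working if more columns are added in front of the last one.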

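One option for writing all of this a bit "smarter" is to turn the literal 2 into a parameter and to replace do_nothing with a name that better reflects how the input is handled: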

import pandas as pd
 
d = {
    'col1': [1, 2, 4, 7], 
    'col2': [3, 4, 9, 1], 
    'col3': [5, 2, 11, 4], 
    'col4': [True, True, False, True]
}
df = pd.DataFrame(data=d)

# identity function:
copy = lambda x: x

# lt (less than arg). returns a function that compares to the bound argument:
def lt(arg):
    return lambda x: x < arg

res = df.aggregate({'col1': lt(2), 'col2': lt(2), 'col3': lt(2), 'col4': copy})

print(res)

Same output as above.
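
When the operation is a plain comparison, as with `lt(2)` here, the per-column functions can arguably be skipped altogether. The sketch below is an alternative, not part of the answer's approach: it assumes the same threshold applies to every column except the last, and the names `threshold`, `leading` and `last` are illustrative. It relies on the built-in `DataFrame.lt` and `pd.concat`.

```python
import pandas as pd

d = {
    'col1': [1, 2, 4, 7],
    'col2': [3, 4, 9, 1],
    'col3': [5, 2, 11, 4],
    'col4': [True, True, False, True]
}
df = pd.DataFrame(data=d)

threshold = 2  # assumed to be shared by all compared columns

# Compare all columns except the last in one vectorized call.
leading = df.iloc[:, :-1].lt(threshold)

# Keep the last column as it is (sliced as a one-column DataFrame).
last = df.iloc[:, -1:]

# Put the two pieces back together side by side, preserving column order.
res = pd.concat([leading, last], axis=1)
print(res)
```

This avoids calling a Python function per element, at the cost of being less flexible than a column-to-function mapping when each column needs its own operation.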
