How to execute a custom function in Python for a data frame imported from Excel?

Question

I have an Excel file of payments with about 50,000 rows and the following structure:

I import the data from Excel into Python with this code:

import pandas as pd

test = pd.read_excel(r'D:\Users\Desktop\test_stack.xlsx')

However, when I try to run the custom functions shown below, the following error is raised:

TypeError: unsupported operand type(s) for /: 'str' and 'str'

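Apparently column E is read as a string (str), so the functions cannot be executed. Note that the functions are iterative: they go through each element of payments, which is enclosed in {} and separated by commas, perform the operations, and then create the corresponding columns.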
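Because the cells arrive as text, each one has to become a list of numbers before the functions can divide its elements. A minimal sketch of that conversion, assuming every cell looks like "{1030.97,1030.97,584.91}" (the format Answer 2 uses to reproduce the error); the helper name `parse_payments` is hypothetical:

```python
def parse_payments(cell):
    """Convert a string like '{1030.97,1030.97,584.91}' into a list of floats.

    Assumes the brace-and-comma format described in the question;
    values that are already lists are returned unchanged.
    """
    if isinstance(cell, list):
        return cell
    # Drop surrounding whitespace and the braces, then split on commas.
    inner = cell.strip().strip("{}")
    # Skip empty pieces so that "{}" becomes an empty list.
    return [float(value) for value in inner.split(",") if value.strip()]


print(parse_payments("{1030.97,1030.97,584.91}"))
```

If the Excel column header is payments, the same helper could also be passed to pd.read_excel through its converters argument, e.g. converters={'payments': parse_payments}, so that the column is built as lists at import time. This assumes every cell in that column is text in the format above.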

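Likewise, when I create the data frame manually, as below, the functions run without problems, but I need to run them on data in the format of the Excel file mentioned above.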

import pandas as pd
from itertools import permutations

test = pd.DataFrame({'id': ['F8510004123', 'A3100002543', 'Z3510002123'],
                     'product': ['retail', 'retail', 'others'],
                     'type': ['E', 'E', 'D'],
                     'quantity': [25, 34, 150],
                     'nro_ope': [2, 3, 26],
                     'payments': [[1030.97, 1030.97, 584.91],
                                  [1610.74, 1610.74, 1610.74, 1610.74, 1611.14],
                                  [1007.52, 1007.52, 1007.52, 1007.52, 500, 500, 852.95]]
                     })


def var_payments(x):
    # Relative variation |a/b - 1| for every ordered pair of payments.
    variation = [round(abs(a / b - 1), 3) for a, b in permutations(x, 2)]
    return variation


def count_var_pay(x):
    # Count the variations of at least 0.05 (5 %).
    count = 0
    for element in x:
        if element >= 0.05:
            count += 1
    return count


def flag_var_payments(x):
    # 'Yes' when at least two variations reach the threshold.
    if x >= 2:
        return 'Yes'
    else:
        return 'No'


test['var_payments'] = test.payments.apply(var_payments)
test['count_p'] = test.var_payments.apply(count_var_pay)
test['flag'] = test.count_p.apply(flag_var_payments)
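To make the logic of the three functions concrete, here is a sketch that traces them on the first row's payments, [1030.97, 1030.97, 584.91]. The function bodies are compact restatements of the question's functions (same behaviour, with the explicit counter replaced by a sum over a generator) so that the snippet runs on its own:

```python
from itertools import permutations


def var_payments(x):
    # Relative difference |a/b - 1| for every ordered pair of distinct positions.
    return [round(abs(a / b - 1), 3) for a, b in permutations(x, 2)]


def count_var_pay(x):
    # Number of pairwise variations of at least 0.05.
    return sum(1 for element in x if element >= 0.05)


def flag_var_payments(x):
    # 'Yes' when at least two variations reach the threshold.
    return "Yes" if x >= 2 else "No"


payments = [1030.97, 1030.97, 584.91]
variation = var_payments(payments)  # 3 values give 3 * 2 = 6 ordered pairs
count = count_var_pay(variation)
print(variation, count, flag_var_payments(count))
```

The two equal payments produce variations of 0.0, while every pair involving 584.91 differs by far more than 5 %, so four of the six variations pass the threshold and the row is flagged 'Yes'. None of this works while x is a string, because permutations then yields single characters and a / b divides two str objects.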

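What should I do? Change the format of the payments column in the Excel file? Convert the column data?

I look forward to your comments.

Thanks for your support.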

Answer 1

Convert the {} to [] and then use pd.eval to turn the strings into lists:

df['payments'] = pd.eval(df['payments'].replace({r'{': '[', r'}': ']'}, regex=True))

Output:

>>> df
            id product type  quantity  nro_ope                                           payments
0  F8510004123  retail    E        25        2                         [1030.97, 1030.97, 584.91]
1  A3100002543  retail    E        34        3      [1610.74, 1610.74, 1610.74, 1610.74, 1611.14]
2  Z3510002123  others    D       150       26  [1007.52, 1007.52, 1007.52, 1007.52, 500, 500,...

>>> df.iloc[0, 5]
[1030.97, 1030.97, 584.91]

>>> type(df.iloc[0, 5])
list
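A sketch of the whole flow on a frame whose payments column holds brace-delimited strings, which is what the Excel import is assumed to produce here. It swaps Answer 1's pd.eval for the standard library's ast.literal_eval, a different technique that parses one cell at a time and accepts only Python literals. The sample rows come from the question; the compact function bodies are restatements of the question's functions:

```python
import ast
from itertools import permutations

import pandas as pd

# Stand-in for pd.read_excel(...): the payments cells arrive as strings.
df = pd.DataFrame({
    "id": ["F8510004123", "A3100002543", "Z3510002123"],
    "payments": ["{1030.97,1030.97,584.91}",
                 "{1610.74,1610.74,1610.74,1610.74,1611.14}",
                 "{1007.52,1007.52,1007.52,1007.52,500,500,852.95}"],
})

# '{...}' -> '[...]', then parse each cell as a Python list literal.
df["payments"] = (df["payments"]
                  .str.replace("{", "[", regex=False)
                  .str.replace("}", "]", regex=False)
                  .apply(ast.literal_eval))


def var_payments(x):
    return [round(abs(a / b - 1), 3) for a, b in permutations(x, 2)]


def count_var_pay(x):
    return sum(1 for element in x if element >= 0.05)


def flag_var_payments(x):
    return "Yes" if x >= 2 else "No"


# With real lists in the column, the question's pipeline runs unchanged.
df["var_payments"] = df["payments"].apply(var_payments)
df["count_p"] = df["var_payments"].apply(count_var_pay)
df["flag"] = df["count_p"].apply(flag_var_payments)
print(df[["id", "count_p", "flag"]])
```

ast.literal_eval raises an error on any cell that is not a valid literal after the brace replacement, which makes malformed rows in a 50,000-row file easy to spot.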

Answer 2

I don't know how you handle the "{}" when reading the Excel file, but if you change payments in the test DataFrame to strings like the following, you get the same error:

'payments':["{1030.97,1030.97,584.91}","{1610.74,1610.74,1610.74,1610.74,1611.14}","{1007.52,1007.52,1007.52,1007.52,500,500,852.95}"]

TypeError: unsupported operand type(s) for /: 'str' and 'str'

Hope this helps.
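The situation Answer 2 describes can be confirmed by inspecting the Python type stored in each cell. A minimal sketch, assuming the imported column looks like Answer 2's string version (two of its rows are reused here):

```python
import pandas as pd

# Answer 2's setup: payments stored as brace-delimited strings.
df = pd.DataFrame({"payments": ["{1030.97,1030.97,584.91}",
                                "{1610.74,1610.74,1610.74,1610.74,1611.14}"]})

# Show which Python type each cell holds; str explains the TypeError.
print(df["payments"].map(type).value_counts())

# Dividing two of these cells fails exactly as in the question.
error_message = None
try:
    df["payments"].iloc[0] / df["payments"].iloc[1]
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Running the same type check on the column returned by pd.read_excel shows whether it needs converting before the custom functions are applied.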

