Create a new DataFrame column based on values from other columns by applying a function to multiple columns

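I am using the apply function to create a new column, ERROR_TV_TIC, in a DataFrame based on the values of the existing columns TV_TIC and ERRORS. I am not sure what I am doing wrong: it works in some cases, while in others it fails and throws an error.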

DataFrame:

ERRORS|TV_TIC
|2.02101E+41
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']|nan
['Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan

Code that works:

def validate_tv_tic(trades):
    tv_tiv_errors = list() 
    if pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.notnull(trades['TV_TIC']) and len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)

Code that does not work: the conditions now involve 2 columns of the row Series, and I made sure to use "&" instead of "and".

def validate_tv_tic(trades):
    tv_tiv_errors = list()
    if pd.isnull(trades['ERRORS']) & pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.isnull(trades['ERRORS']) & pd.notnull(trades['TV_TIC']) & len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)

I get the error: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 3')

[Image: error description when using "and"]

[Image: error description when using "&"]

My hunch is that pd.isnull is causing the problem somewhere, but I am not sure.

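The logic of the code was not the problem. The problem was the data inside the DataFrame.

The ERRORS column holds lists of strings, and the error is thrown whenever a cell's list contains more than 1 item. pd.isnull applied to a list returns an element-wise boolean array rather than a single boolean, and an if statement cannot decide the truth value of an array with more than one element. So rows 3 and 4 raise the error.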

ERRORS

['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']
['Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
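The behavior described above can be reproduced outside the DataFrame. The following is a minimal sketch, not the original poster's code: the helper truthiness is a hypothetical name introduced only for illustration, and the strings are taken from the ERRORS values shown above.

```python
import numpy as np
import pandas as pd

# pd.isnull on a scalar gives one boolean; on a list it gives one boolean per item.
scalar_check = pd.isnull(np.nan)
one_item_check = pd.isnull(["Future Option Indicator is missing"])
two_item_check = pd.isnull(["Trade Id is missing", "Future Option Indicator is missing"])

print(scalar_check)    # True
print(one_item_check)  # [False]
print(two_item_check)  # [False False]


def truthiness(value):
    """Hypothetical helper: return bool(value), or the error text if bool() is ambiguous."""
    try:
        return bool(value)
    except ValueError as exc:
        return str(exc)


# A one-element array still converts to a single boolean, which is why
# single-item lists did not fail; a two-element array raises ValueError.
print(truthiness(one_item_check))
print(truthiness(two_item_check))
```

This is consistent with the observation that only the rows whose ERRORS list has more than 1 item fail, whether the conditions are combined with "and" or with "&".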

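After finding the root cause, I changed each list into a single string whose elements are joined by a non-comma separator, and that worked for me.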

I changed the return statement of my function validate_tv_tiv from

return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

to

return ' & '.join(errors) if len(errors) > 0 else np.nan

This made the ERRORS column of my DataFrame look like this:

ERRORS

Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)
Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
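To show how the joined strings interact with the row-wise check, here is a rough end-to-end sketch. It rests on several assumptions that are not in the original post: the sample rows and TV_TIC values are made up, join_errors is a hypothetical helper, TV_TIC is treated as a string (hence the str() call), and the check returns a joined string rather than a list.

```python
import numpy as np
import pandas as pd


def join_errors(errors):
    """Hypothetical helper: join a non-empty list with ' & '; other values become NaN."""
    if isinstance(errors, list) and errors:
        return " & ".join(errors)
    return np.nan


def validate_tv_tic(row):
    """Row-wise check; both cells are scalars here, so plain and/not is safe."""
    tv_tic_errors = []
    errors_missing = pd.isnull(row["ERRORS"])
    tv_tic_missing = pd.isnull(row["TV_TIC"])
    if errors_missing and tv_tic_missing:
        tv_tic_errors.append("Initial validations passed still TV_TIC missing")
    # Assumption: TV_TIC is meant to be a 42-character string.
    if errors_missing and not tv_tic_missing and len(str(row["TV_TIC"])) != 42:
        tv_tic_errors.append(
            "Initial validations passed and TV_TIC is also generated but length is != 42 chars"
        )
    return " & ".join(tv_tic_errors) if tv_tic_errors else np.nan


# Made-up sample data; only the ERRORS messages come from the post.
trades = pd.DataFrame({
    "ERRORS": [
        np.nan,
        np.nan,
        np.nan,
        ["Future Option Indicator is missing"],
        ["Trade Id is missing", "Future Option Indicator is missing"],
    ],
    "TV_TIC": ["A" * 42, np.nan, "TOO_SHORT", np.nan, np.nan],
})

# Turn each list into one string so pd.isnull sees a scalar per cell.
trades["ERRORS"] = trades["ERRORS"].apply(join_errors)
trades["ERROR_TV_TIC"] = trades.apply(validate_tv_tic, axis=1)
print(trades[["ERRORS", "ERROR_TV_TIC"]])
```

The sketch combines the conditions with and/not rather than "&". In Python, "&" binds more tightly than "!=", so an expression of the form a & b & len(x) != 42 is evaluated as (a & b & len(x)) != 42. "&" also does not short-circuit, so len() would run even for a missing TV_TIC.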