Create a new DataFrame column from the values of other columns using apply over multiple columns
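I'm using the apply function to create a new column, ERROR_TV_TIC, in a DataFrame based on the values of the existing columns TV_TIC and ERRORS. I'm not sure what I'm doing wrong: in some cases it works, and in others it throws an error.
DataFrame: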
ERRORS|TV_TIC
|2.02101E+41
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']|nan
['Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
Code that works:
import numpy as np
import pandas as pd

def validate_tv_tic(trades):
    tv_tiv_errors = list()
    if pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.notnull(trades['TV_TIC']) and len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
Code that does not work:
Here the conditions involve two columns of the row Series, and I made sure to use "&" instead of "and".
def validate_tv_tic(trades):
    tv_tiv_errors = list()
    if pd.isnull(trades['ERRORS']) & pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.isnull(trades['ERRORS']) & pd.notnull(trades['TV_TIC']) & len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
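I get the error: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 3')
[Screenshot: error when using "and"]
[Screenshot: error when using "&"]
My hunch is that pd.isnull is causing the problem somewhere, but I'm not sure.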
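Separately from the error itself, the "&" version of the second condition does not do what it looks like. In Python, "&" binds more tightly than "!=", and unlike "and" it evaluates both operands. A minimal sketch with plain Python values (the variable names and numbers are made up for illustration, not taken from the data) shows both effects:

```python
import math

# Made-up per-row values: ERRORS is missing and TV_TIC is present
# with the correct length, so the length check should NOT fire
errors_missing = True
tic_present = True
tic_length = 42

# "&" binds tighter than "!=", so this parses as
# (errors_missing & tic_present & tic_length) != 42, i.e. (1 & 42) != 42, i.e. 0 != 42
with_amp = errors_missing & tic_present & tic_length != 42
# "and" binds looser than "!=", so the comparison is done first
with_and = errors_missing and tic_present and tic_length != 42

print(with_amp)  # True, although the intended condition is False
print(with_and)  # False

# "&" evaluates both operands, so len() runs even when TV_TIC is NaN
tv_tic = math.nan
try:
    _ = False & (len(tv_tic) != 42)
    len_was_safe = True
except TypeError as exc:
    len_was_safe = False
    print(exc)  # object of type 'float' has no len()

# With "and", len() is never called once the left side is False
short_circuit = False and len(tv_tic) != 42
print(short_circuit)  # False
```

Inside apply(axis=1) each cell is a single value, so "and" is the usual operator there; "&" is meant for element-wise operations on whole Series or arrays.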
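The code itself was not the cause of this error; the problem was the data inside the DataFrame.
The ERRORS column holds lists of strings, and the error is thrown whenever a cell's list has more than one item. That is why rows 3 and 4 fail: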
ERRORS
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']
['Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
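The reason the number of items matters: with apply(axis=1), trades['ERRORS'] is the Python list stored in the cell, and pd.isnull on a list checks each element and returns a NumPy boolean array rather than a single bool. The sketch below reproduces this outside apply, using two of the lists above; the variable names are just for illustration.

```python
import pandas as pd

one_item = ['Future Option Indicator is missing']
two_items = ['Trade Id is missing', 'Future Option Indicator is missing']

# On a list, pd.isnull checks every element and returns a NumPy bool array
print(pd.isnull(one_item))   # [False]
print(pd.isnull(two_items))  # [False False]

# "&" with a scalar bool keeps it an array, as in the failing condition
combined = pd.isnull(two_items) & pd.isnull(float('nan'))

# A one-element array has a single truth value, so this "if" is fine
one_item_ok = True
if pd.isnull(one_item):
    one_item_ok = False  # not reached: the only element is False

# A two-element array raises, which is the error seen inside apply
ambiguous = False
try:
    if combined:
        pass
except ValueError as exc:
    ambiguous = True
    print(exc)
```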
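Once I found the root cause, I changed the lists to strings, joining the elements with a separator other than a comma, and that worked for me.
I changed the return statement of my validation function validate_tv_tic from
return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan
to
return ' & '.join(tv_tiv_errors) if len(tv_tiv_errors) > 0 else np.nan
This produced an ERRORS column like the following: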
ERRORS
Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)
Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
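If you would rather keep ERRORS as lists, another option is to make the null check itself safe for list values. This is a minimal sketch under stated assumptions, not the code from the post: is_missing is a hypothetical helper that treats a non-empty list as present, the sample rows are made up, and TV_TIC is assumed to be stored as text.

```python
import numpy as np
import pandas as pd

TIC_LENGTH = 42  # expected TV_TIC length, taken from the question


def is_missing(value):
    """Hypothetical scalar-safe null check.

    A list or tuple counts as missing only when it is empty; anything
    else goes to pd.isnull, which returns a single bool for scalars.
    """
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return bool(pd.isnull(value))


def validate_tv_tic(row):
    tv_tic_errors = []
    errors_missing = is_missing(row['ERRORS'])
    tic_missing = is_missing(row['TV_TIC'])
    # Plain "and" short-circuits, so len() only runs when TV_TIC is present
    if errors_missing and tic_missing:
        tv_tic_errors.append("Initial validations passed still TV_TIC missing")
    # str() is an assumed guard in case TV_TIC arrives as a non-string value
    if errors_missing and not tic_missing and len(str(row['TV_TIC'])) != TIC_LENGTH:
        tv_tic_errors.append(
            "Initial validations passed and TV_TIC is also generated "
            "but length is != 42 chars"
        )
    return ' & '.join(tv_tic_errors) if tv_tic_errors else np.nan


# Made-up sample rows; TV_TIC is assumed to be text
trades = pd.DataFrame({
    'ERRORS': [np.nan, np.nan, np.nan,
               ['Future Option Indicator is missing'],
               ['Trade Id is missing', 'Future Option Indicator is missing']],
    'TV_TIC': ['A' * 42, 'ABC', np.nan, np.nan, np.nan],
})
trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
print(trades['ERROR_TV_TIC'].tolist())
```

The final ' & '.join keeps the same idea as the fix above: the new column holds one plain string per row instead of a list, while the two-item rows no longer raise.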