Using a string function arg to name a new feature in a pandas DF
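I'm trying to write a Python function that lets me add features to a pandas df for machine learning. I think I'm misunderstanding how to use strings in Python functions.
The function looks at a row of the df and checks whether the row so many months in the future (that many rows below) has the same identifier. If it does, it adds the value of the future row's 'start' feature to the new feature column; otherwise it adds the initial row's 'end' to the new feature column. It's a customized shift function.
After adding this feature, I want to add another column of 1s or 0s to the df as a new feature with an appropriate column label. This will be labelled something like 'feat_so_many_months_in_future_is_higher_or_lower'.
The problem is that I can't even get near the second, binary feature with the threshold part. I'm having trouble adding the first new feature with the appropriate name.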
def binary_up_down(name_of_new_feature, months_in_future, percent_threshold):
    name_of_new_feature = []
    for i in range(0, df.shape[0], 1):
        try:
            if df['identifier'][i]==df['identifier'][i + months_in_future]:
                name_of_new_feature.append(df['start'][i + months_in_future])
            else:
                name_of_new_feature.append(df['end'][i])
        except KeyError:
            name_of_new_feature.append(df['end'][i])
    df[str(name_of_new_feature)]=name_of_new_feature
    ### Add test to check if shifted value is above or below threshold and name new feature appropriately ###
    return df
My thinking is to call the function as follows:
binary_up_down('feat_value_in_1m', 1, 5)
#Then
binary_up_down('feat_value_in_3m', 3, 5) # and so on...
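When I run the code, this line seems to be the problem:
df[str(name_of_new_feature)] = name_of_new_feature
...because it adds all of the new feature column's values as the column name!
Any pointers are greatly appreciated!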
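You are replacing name_of_new_feature with a list in the first line of the function, so the string you passed in is lost. I suggest renaming the list to value_of_new_feature and using name_of_new_feature as the column label: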
def binary_up_down(name_of_new_feature, months_in_future, percent_threshold):
    value_of_new_feature = []
    for i in range(0, df.shape[0], 1):
        try:
            if df['identifier'][i]==df['identifier'][i + months_in_future]:
                value_of_new_feature.append(df['start'][i + months_in_future])
            else:
                value_of_new_feature.append(df['end'][i])
        except KeyError:
            value_of_new_feature.append(df['end'][i])
    df[name_of_new_feature]=value_of_new_feature
    ### Add test to check if shifted value is above or below threshold and name new feature appropriately ###
    return df
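As a side note, the same shifted column can probably be built without the explicit loop. The following is only a minimal sketch, assuming the df has a default 0..n-1 index and the 'identifier', 'start' and 'end' columns described in the question. The helper name add_future_value, the choice to pass df in as an argument, and the small example frame are illustrative and not part of the original code. The threshold test is left out because the question does not define it.

```python
import pandas as pd


def add_future_value(df, name_of_new_feature, months_in_future):
    """Return a copy of df with the shifted-value column added.

    Hypothetical helper: assumes 'identifier', 'start' and 'end' columns.
    """
    # Align each row with the row `months_in_future` positions below it.
    future_id = df["identifier"].shift(-months_in_future)
    future_start = df["start"].shift(-months_in_future)

    # Keep the future 'start' where the identifier matches; otherwise
    # (including rows that run past the end of the frame) use this row's 'end'.
    same_id = future_id == df["identifier"]
    out = df.copy()
    out[name_of_new_feature] = future_start.where(same_id, df["end"])
    return out


# Made-up data, only to exercise the function.
example = pd.DataFrame(
    {
        "identifier": ["a", "a", "a", "b", "b"],
        "start": [10, 11, 12, 20, 21],
        "end": [15, 16, 17, 25, 26],
    }
)
result = add_future_value(example, "feat_value_in_1m", 1)
print(result)
```

Because the column label is taken straight from the string argument and never reassigned, the naming problem from the question does not come up here. Note that shift introduces NaN values internally, so the new column may come back as floats even when 'start' and 'end' are integers.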