Featuretools with a single table and the Min primitive gives an error
My environment is:
Operating system version.... Windows-10-10.0.17134-SP0
Python version is........... 3.6.5
pandas version is........... 0.23.0
numpy version is............ 1.14.3
Featuretools................ 0.3.0
My pandas DataFrame looks like:
df
   index  BoxRatio    Thrust  Velocity  OnBalRun  vwapGain
0      1  0.324000  0.615000  1.525000  3.618000  0.416000
1      2  0.938249  0.366377  2.402230  6.393223  2.667106
2      3  0.317000 -0.281000  0.979000  1.489000  0.506000
3      4  0.289000 -0.433000  0.796000  2.081000  0.536000
4      5  1.551115 -0.103734  0.731682  1.752156  0.667016
I tried the following:
import numpy as np
import featuretools as ft
from featuretools.primitives import (make_trans_primitive, Count, Sum, Mean,
                                     Median, Std, Min, Max, Multiply)
from featuretools.variable_types import Numeric

# Single-entity EntitySet built from the DataFrame shown above
es = ft.EntitySet('Pattern')
es.entity_from_dataframe(dataframe=df,
                         entity_id='my_id',
                         index='index')

def log10(column):
    return np.log10(column)

# Custom transform primitive: element-wise base-10 logarithm
Log10 = make_trans_primitive(function=log10,
                             input_types=[Numeric],
                             return_type=Numeric)

feature_matrix, feature_names = ft.dfs(entityset=es,
                                       target_entity='my_id',
                                       trans_primitives=[Log10])

print('feature_names:\n')
for item in feature_names:
    print(' ' + item)
This gives the following:
feature_names:
<Feature: + BoxRatio>
<Feature: + Thrust>
<Feature: + Velocity>
<Feature: + OnBalRun>
<Feature: + vwapGain>
<Feature: + LOG10(BoxRatio)>
<Feature: + LOG10(Thrust)>
<Feature: + LOG10(Velocity)>
<Feature: + LOG10(OnBalRun)>
<Feature: + LOG10(vwapGain)>
So far so good... Now if I add the "Min" primitive, I get:
Traceback (most recent call last):
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 112, in <module>
    Main()
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 95, in Main
    trans_primitives=[Log10, Min])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\dfs.py", line 184, in dfs
    features = dfs_object.build_features(verbose=verbose)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 218, in build_features
    all_features, max_depth=self.max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 365, in _run_dfs
    all_features, entity, max_depth=max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 514, in _build_transform_features
    new_f = trans_prim(*matching_input)
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'
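I expected to see the minimum value of each feature column (just as with the Log10 primitive). Of course I could define my own Min primitive, but I was hoping for a simple solution.
Charles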
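The issue here is that Min is an aggregation primitive, while Log is a transform primitive. Passing Min in trans_primitives makes DFS construct it as if it were a transform, which is why the call fails with the missing 'parent_entity' argument.
Aggregation primitives take related instances as input and output a single value. They are applied across a parent-child relationship in an entity set. For example, Min takes a list of values and returns the minimum value of the list.
Transform primitives take one or more variables from an entity as input and output a new variable for that entity. They are applied to a single entity. For example, Log takes a list of values and returns a list of the same length containing the log of each item in the input.
You can read more in the documentation about primitives: https://docs.featuretools.com/automated_feature_engineering/primitives.html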
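To make the difference in shape concrete, here is a minimal sketch in plain Python, without Featuretools, using the BoxRatio values from the question. The function names are hypothetical and only illustrate the two contracts described above: an aggregation collapses many values into one, while a transform returns one value per input row. The last helper shows what a "minimum" would have to look like to fit the transform shape; whether it is suitable to pass to make_trans_primitive in your Featuretools version is an assumption that has not been tested here.

```python
import math

# BoxRatio values copied from the question's DataFrame.
box_ratio = [0.324, 0.938249, 0.317, 0.289, 1.551115]


def aggregate_min(values):
    """Aggregation shape: many values in, a single value out."""
    return min(values)


def transform_log10(values):
    """Transform shape: one output per input, so the length is preserved."""
    return [math.log10(v) for v in values]


def transform_column_min(values):
    """Hypothetical helper: repeat the column minimum once per input row,
    so the result has the same length as the input."""
    lowest = min(values)
    return [lowest for _ in values]


print(aggregate_min(box_ratio))          # a single number
print(len(transform_log10(box_ratio)))   # same length as the input
print(transform_column_min(box_ratio))   # the minimum, repeated per row
```

With only one entity there is no parent-child relationship to aggregate over, so the aggregation shape has nowhere to apply; only functions shaped like the last two fit a single table.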