Using featuretools, how do I use DFS primitives individually?

Question

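Could you help me? I am using featuretools on the iris dataset, which has 4 features: f1, f2, f3 and f4. When I call ft.dfs I run into 3 problems.

1. The feature_matrix has too many features. 'divide_by_feature' and 'modulo_numeric' are not applied to the original features independently: 'divide_by_feature' is applied first, which produces 4 new features, and then 'modulo_numeric' is applied to both the original and the new features. I want each of the two primitives to act on the original features only. How should I do that?

2. I use transform primitives like trans_primitives = ['subtract_numeric_scalar', 'modulo_numeric']. I found that subtract_numeric_scalar accepts a value, but I do not know how to pass it.

3. I would like to know how to use all of the transform primitives. By default trans_primitives=None. For now I work around it with trans_primitives = ['is_null', 'diff', ...], but I find that tedious.

Could you give me some advice? Thanks!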

Answer 1

You can use max_depth to control the complexity of the features. With max_depth=1, the primitives are applied only to the original features.

features = ft.dfs(
    entityset=es,
    target_entity='data',
    trans_primitives=['divide_by_feature', 'modulo_numeric'],
    features_only=True,
    max_depth=1,
)

[<Feature: f1>,
<Feature: f2>,
<Feature: f3>,
<Feature: f4>,
<Feature: 1 / f3>,
<Feature: 1 / f1>,
<Feature: 1 / f2>,
<Feature: 1 / f4>,
<Feature: f1 % f2>,
<Feature: f4 % f3>,
<Feature: f4 % f2>,
<Feature: f1 % f3>,
<Feature: f2 % f4>,
<Feature: f4 % f1>,
<Feature: f3 % f2>,
<Feature: f3 % f1>,
<Feature: f2 % f1>,
<Feature: f3 % f4>,
<Feature: f2 % f3>,
<Feature: f1 % f4>]
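To see where the 20 features above come from, here is a minimal sketch in plain Python. It does not call featuretools; it only enumerates the feature names under the assumption that, at max_depth=1, divide_by_feature yields one reciprocal per original column and modulo_numeric yields one feature per ordered pair of distinct original columns, which is what the output above shows. The variable names are illustrative.

```python
from itertools import permutations

# Column names from the question's iris data.
originals = ["f1", "f2", "f3", "f4"]

# divide_by_feature takes one numeric input and yields 1 / x.
reciprocals = [f"1 / {name}" for name in originals]

# modulo_numeric takes two numeric inputs and is order-sensitive,
# so every ordered pair of distinct columns gives its own feature.
modulos = [f"{a} % {b}" for a, b in permutations(originals, 2)]

feature_names = originals + reciprocals + modulos
print(len(feature_names))  # 4 + 4 + 12 = 20
```

Neither primitive is fed the output of the other here, which is the behavior the question asks for.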

You can create an instance of a primitive with parameters. This is how to pass a value to subtract_numeric_scalar:

from featuretools.primitives import SubtractNumericScalar

ft.dfs(
    ...
    trans_primitives=[SubtractNumericScalar(value=2)]
)

You can use all of the transform primitives by extracting their names from the list of primitives.

primitives = ft.list_primitives()
primitives = primitives.groupby('type')
transforms = primitives.get_group('transform')
transforms = transforms.name.values.tolist()

['less_than_scalar',
'divide_numeric',
'latitude',
'add_numeric',
'week',
'greater_than_equal_to_scalar',
'and',
'multiply_numeric_scalar',
'not',
'second',
'greater_than_scalar',
'modulo_numeric_scalar',
'scalar_subtract_numeric_feature',
'diff',
'day',
'cum_min',
'divide_by_feature',
'less_than_equal_to',
'time_since',
'time_since_previous',
'cum_count',
'year',
'is_null',
'num_characters',
'equal_scalar',
'is_weekend',
'less_than_equal_to_scalar',
'longitude',
'add_numeric_scalar',
'month',
'less_than',
'or',
'multiply_boolean',
'percentile',
'minute',
'not_equal_scalar',
'greater_than_equal_to',
'modulo_by_feature',
'multiply_numeric',
'negate',
'hour',
'cum_max',
'greater_than',
'modulo_numeric',
'subtract_numeric_scalar',
'isin',
'cum_mean',
'divide_numeric_scalar',
'num_words',
'absolute',
'cum_sum',
'not_equal',
'weekday',
'equal',
'haversine',
'subtract_numeric']

Let me know if this helps.
