How do I use MultiplyNumeric to combine a weight derived from a date with a value from the same table?
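My main goal is to build features in which more recent information counts for more.
So the idea is to compute a weight factor with a new transform primitive, "WeightTimeUntil", which the transform primitive "MultiplyNumeric" can then use to produce a weighted value.
I used Will Koehrsen's walkthrough as the starting point for the data and entity setup.
In doing so I ran into the following problems:
- featuretools did not select the combination I wanted (see below)
- featuretools seems to skip the combination because of a type mismatch?!
- By changing the type of the value I want to multiply by the weight factor, I managed to get the right combination, but not for the right target
- With clients as the target, featuretools did not select the combination I wanted at all. Only when I use loans as the target, where the date and the value are columns, does featuretools produce the right combination
Here is the code for the "WeightTimeUntil" primitive: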
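import math
import numpy as np
import pandas as pd
import featuretools as ft
from featuretools.primitives import make_trans_primitive, MultiplyNumeric
from featuretools.variable_types import Datetime, Numeric

def weight_time_until(array, time):
    # Difference between each date and the cutoff time (negative for dates before the cutoff)
    diff = pd.DatetimeIndex(array) - time
    # Number of half-year periods, rounded down
    s = np.floor(diff.days/365/0.5)
    aWidth = 9
    # Decay rate chosen so that the weight changes by a factor of 10 over (aWidth - 1) periods
    a = math.log(0.1) / ( -(aWidth -1) )
    w = np.exp(-a*s)
    return w

WeightTimeUntil = make_trans_primitive(function=weight_time_until,
                                       input_types=[Datetime],
                                       return_type=Numeric,
                                       uses_calc_time=True,
                                       description="Calculates weight time until the cutoff time",
                                       name="weight_time_until")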
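To make the weighting formula concrete, here is a minimal standard-library sketch of the same exponential decay. The function name `recency_weight` and the example dates are made up for illustration. It also assumes elapsed time is measured as cutoff minus date, so the weight is 1.0 for the most recent half-year and falls to about 0.1 after eight half-year periods. With the sign used in the primitive above (date minus cutoff), dates before the cutoff give a negative period count and therefore weights above 1.

```python
import math
from datetime import date

A_WIDTH = 9  # same constant as aWidth in the primitive
DECAY = math.log(0.1) / -(A_WIDTH - 1)  # roughly 0.288 per half-year period

def recency_weight(event_date, cutoff):
    """Weight that shrinks the further event_date lies before cutoff."""
    # Assumption: elapsed time is cutoff minus date, so past dates give positive values
    elapsed_days = (cutoff - event_date).days
    periods = math.floor(elapsed_days / 365 / 0.5)  # completed half-year periods
    return math.exp(-DECAY * periods)

cutoff = date(2020, 1, 1)  # hypothetical cutoff time
recent = recency_weight(date(2019, 12, 1), cutoff)      # within the last half year
four_years = recency_weight(date(2016, 1, 2), cutoff)   # eight half-year periods earlier
```

Here `recent` comes out as 1.0 and `four_years` as roughly 0.1, which matches the intent of weighting newer records more heavily.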
Here is the DFS call:
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['sum'],
                                 trans_primitives=[WeightTimeUntil, MultiplyNumeric])
Here is the resulting feature list:
<Feature: income>,
<Feature: credit_score>,
<Feature: join_month>,
<Feature: log_income>,
<Feature: SUM(loans.loan_amount)>,
<Feature: SUM(loans.rate)>,
<Feature: SUM(payments.payment_amount)>,
<Feature: WEIGHT_TIME_UNTIL(joined)>,
<Feature: join_month * log_income>,
<Feature: income * log_income>,
<Feature: income * join_month>,
<Feature: credit_score * join_month>,
<Feature: credit_score * log_income>,
<Feature: credit_score * income>,
<Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_start))>,
<Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_end))>,
<Feature: SUM(loans.loan_amount * rate)>,
<Feature: income * SUM(loans.loan_amount)>,
<Feature: credit_score * SUM(loans.loan_amount)>,
<Feature: log_income * SUM(payments.payment_amount)>,
<Feature: log_income * WEIGHT_TIME_UNTIL(joined)>,
<Feature: income * SUM(payments.payment_amount)>,
<Feature: join_month * SUM(loans.rate)>,
<Feature: income * SUM(loans.rate)>,
<Feature: join_month * SUM(loans.loan_amount)>,
<Feature: SUM(loans.rate) * SUM(payments.payment_amount)>,
<Feature: credit_score * WEIGHT_TIME_UNTIL(joined)>,
<Feature: SUM(loans.rate) * WEIGHT_TIME_UNTIL(joined)>,
<Feature: income * WEIGHT_TIME_UNTIL(joined)>,
<Feature: log_income * SUM(loans.loan_amount)>,
<Feature: SUM(loans.loan_amount) * WEIGHT_TIME_UNTIL(joined)>,
<Feature: SUM(loans.loan_amount) * SUM(payments.payment_amount)>,
<Feature: credit_score * SUM(loans.rate)>,
<Feature: log_income * SUM(loans.rate)>,
<Feature: credit_score * SUM(payments.payment_amount)>,
<Feature: SUM(payments.payment_amount) * WEIGHT_TIME_UNTIL(joined)>,
<Feature: join_month * WEIGHT_TIME_UNTIL(joined)>,
<Feature: SUM(loans.loan_amount) * SUM(loans.rate)>,
<Feature: join_month * SUM(payments.payment_amount)>
I was expecting something like this:
<Feature: SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))>
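The problem here is that SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start)) is a depth 3 feature, because you are stacking Sum, MultiplyNumeric, and WeightTimeUntil. That exceeds the default max_depth of 2, so dfs never builds it. You can read more about depth in the featuretools documentation.
You can solve this by increasing the allowed depth in the call to dfs, like this: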
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['sum'],
                                 max_depth=3,
                                 trans_primitives=[WeightTimeUntil, MultiplyNumeric])
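Another approach is to provide your feature as a seed feature, which does not count against the maximum depth. You can do that like this: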
seed_features = [ft.Feature(es["loans"]["loan_start"], primitive=WeightTimeUntil)]
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['sum'],
                                 seed_features=seed_features,
                                 trans_primitives=[MultiplyNumeric])
I prefer the second approach because it creates the feature you want while generating fewer features overall.
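As a way to sanity-check whatever dfs returns, the target feature can be computed by hand. The sketch below is plain Python rather than featuretools, and the loan rows and cutoff date are hypothetical, not taken from the walkthrough data. It assumes the weight is meant to decay with the time elapsed before the cutoff (cutoff minus loan_start), then sums loan_amount times that weight per client, which is what Sum stacked on MultiplyNumeric stacked on the weight primitive amounts to.

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical rows: (client_id, loan_amount, loan_start)
loans = [
    (1, 1000.0, date(2019, 12, 1)),
    (1, 2000.0, date(2016, 1, 2)),
    (2, 500.0, date(2019, 6, 1)),
]
cutoff = date(2020, 1, 1)             # hypothetical cutoff time
decay = math.log(0.1) / -(9 - 1)      # same constants as the primitive

def weight(loan_start):
    # Assumption: elapsed time is cutoff minus loan_start (positive for past loans)
    periods = math.floor((cutoff - loan_start).days / 365 / 0.5)
    return math.exp(-decay * periods)

# Depth 1: weight(loan_start); depth 2: loan_amount * weight; depth 3: sum per client
weighted_sum = defaultdict(float)
for client_id, amount, loan_start in loans:
    weighted_sum[client_id] += amount * weight(loan_start)
```

For client 1 this gives about 1200: the recent loan counts in full and the four-year-old loan counts at roughly a tenth of its amount.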