为什么实体索引显示为 id 而不是 index
Why is an entity index displayed as id and not index
下面定义了一个EntitySet。我在交易 table tx
中将 did
声明为 Index
,但它注册为 Id
,而不是 Index
。这是为什么?
objective是去掉下面的警告。
在什么情况下 Index
赋值会被覆盖为 Id
(主键还是外键?),并且 did
注册为 Id
与警告有关?
一个uid
可以在tx
table中有多个did
。
es = ft.EntitySet(id="the_entity_set")
# hse
es = es.entity_from_dataframe(entity_id="hse",
dataframe=hse,
index="uid",
variable_types={"Gender": ft.variable_types.Categorical,
"Income": ft.variable_types.Numeric,
"dob" : ft.variable_types.Datetime})
# types
es = es.entity_from_dataframe(entity_id="types",
dataframe=types,
index="type_id",
variable_types={"type": ft.variable_types.Categorical})
# files
es = es.entity_from_dataframe(entity_id="files",
dataframe=files,
index="file_id",
variable_types={"file": ft.variable_types.Categorical})
# uid_donations
es = es.entity_from_dataframe(entity_id="uid_txlup",
dataframe=uid_txlup,
index="did",
variable_types={"uid": ft.variable_types.Categorical})
# transactions
es = es.entity_from_dataframe(entity_id="tx",
dataframe=tx,
index="did",
time_index="dt",
variable_types={"file_id": ft.variable_types.Categorical,
"type_id": ft.variable_types.Categorical,
"amt": ft.variable_types.Numeric})
rels = [
ft.Relationship(es["files"]["file_id"],es["tx"]["file_id"]),
ft.Relationship(es["types"]["type_id"],es["tx"]["type_id"]),
ft.Relationship(es["hse"]["uid"], es["uid_txlup"]["uid"]),
ft.Relationship(es["uid_txlup"]["did"],es["tx"]["did"])
]
es.add_relationships( rels )
这就是 EntitySet 的样子
Entityset: the_entity_set
Entities:
hse [Rows: 100, Columns: 4]
types [Rows: 8, Columns: 2]
files [Rows: 2, Columns: 2]
uid_txlup [Rows: 336, Columns: 2]
tx [Rows: 336, Columns: 5]
Relationships:
tx.file_id -> files.file_id
tx.type_id -> types.type_id
uid_txlup.uid -> hse.uid
tx.did -> uid_txlup.did
es.entities
[Entity: hse
Variables:
uid (dtype: index)
Gender (dtype: categorical)
Income (dtype: numeric)
dob (dtype: datetime)
Shape:
(Rows: 100, Columns: 4), Entity: types
Variables:
type_id (dtype: index)
type (dtype: categorical)
Shape:
(Rows: 8, Columns: 2), Entity: files
Variables:
file_id (dtype: index)
file (dtype: categorical)
Shape:
(Rows: 2, Columns: 2), Entity: uid_txlup
Variables:
did (dtype: index)
uid (dtype: categorical)
Shape:
(Rows: 336, Columns: 2), Entity: tx
Variables:
did (dtype: id) ### <<< external key ???
dt (dtype: datetime)
file_id (dtype: categorical)
type_id (dtype: categorical)
amt (dtype: numeric)
Shape:
(Rows: 336, Columns: 5)]
为什么当我调用 fts
时 did
显示为 Id
而不是 Index
?
这是警告:
feature_matrix, feature_defs = ft.dfs(entityset=es,
target_entity="hse",
agg_primitives=["sum","mode","percent_true"],
where_primitives=["count", "avg_time_between"],
max_depth=2)
feature_defs
.../anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/entityset/entityset.py:432: FutureWarning: 'did' is both an index level and a column label.
Defaulting to column, but this will raise an ambiguity error in a future version
end_entity_id=child_eid)
实体集中的关系将始终介于父实体中的 Id
变量和子实体中的 Index
变量之间。因此,当您添加关系时,无论您指定什么,featuretools 都会自动将变量从子实体转换为 Index
类型。
如果实体之间存在一对一关系,则变量有可能既是 Index
又是 Id
。在这种情况下,您应该将两个实体合二为一。
下面定义了一个EntitySet。我在交易 table tx
中将 did
声明为 Index
,但它注册为 Id
,而不是 Index
。这是为什么?
objective是去掉下面的警告。
在什么情况下 Index
赋值会被覆盖为 Id
(主键还是外键?),并且 did
注册为 Id
与警告有关?
一个uid
可以在tx
table中有多个did
。
es = ft.EntitySet(id="the_entity_set")
# hse
es = es.entity_from_dataframe(entity_id="hse",
dataframe=hse,
index="uid",
variable_types={"Gender": ft.variable_types.Categorical,
"Income": ft.variable_types.Numeric,
"dob" : ft.variable_types.Datetime})
# types
es = es.entity_from_dataframe(entity_id="types",
dataframe=types,
index="type_id",
variable_types={"type": ft.variable_types.Categorical})
# files
es = es.entity_from_dataframe(entity_id="files",
dataframe=files,
index="file_id",
variable_types={"file": ft.variable_types.Categorical})
# uid_donations
es = es.entity_from_dataframe(entity_id="uid_txlup",
dataframe=uid_txlup,
index="did",
variable_types={"uid": ft.variable_types.Categorical})
# transactions
es = es.entity_from_dataframe(entity_id="tx",
dataframe=tx,
index="did",
time_index="dt",
variable_types={"file_id": ft.variable_types.Categorical,
"type_id": ft.variable_types.Categorical,
"amt": ft.variable_types.Numeric})
rels = [
ft.Relationship(es["files"]["file_id"],es["tx"]["file_id"]),
ft.Relationship(es["types"]["type_id"],es["tx"]["type_id"]),
ft.Relationship(es["hse"]["uid"], es["uid_txlup"]["uid"]),
ft.Relationship(es["uid_txlup"]["did"],es["tx"]["did"])
]
es.add_relationships( rels )
这就是 EntitySet 的样子
Entityset: the_entity_set
Entities:
hse [Rows: 100, Columns: 4]
types [Rows: 8, Columns: 2]
files [Rows: 2, Columns: 2]
uid_txlup [Rows: 336, Columns: 2]
tx [Rows: 336, Columns: 5]
Relationships:
tx.file_id -> files.file_id
tx.type_id -> types.type_id
uid_txlup.uid -> hse.uid
tx.did -> uid_txlup.did
es.entities
[Entity: hse
Variables:
uid (dtype: index)
Gender (dtype: categorical)
Income (dtype: numeric)
dob (dtype: datetime)
Shape:
(Rows: 100, Columns: 4), Entity: types
Variables:
type_id (dtype: index)
type (dtype: categorical)
Shape:
(Rows: 8, Columns: 2), Entity: files
Variables:
file_id (dtype: index)
file (dtype: categorical)
Shape:
(Rows: 2, Columns: 2), Entity: uid_txlup
Variables:
did (dtype: index)
uid (dtype: categorical)
Shape:
(Rows: 336, Columns: 2), Entity: tx
Variables:
did (dtype: id) ### <<< external key ???
dt (dtype: datetime)
file_id (dtype: categorical)
type_id (dtype: categorical)
amt (dtype: numeric)
Shape:
(Rows: 336, Columns: 5)]
为什么当我调用 fts
时 did
显示为 Id
而不是 Index
?
这是警告:
feature_matrix, feature_defs = ft.dfs(entityset=es,
target_entity="hse",
agg_primitives=["sum","mode","percent_true"],
where_primitives=["count", "avg_time_between"],
max_depth=2)
feature_defs
.../anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/entityset/entityset.py:432: FutureWarning: 'did' is both an index level and a column label.
Defaulting to column, but this will raise an ambiguity error in a future version
end_entity_id=child_eid)
实体集中的关系将始终介于父实体中的 Id
变量和子实体中的 Index
变量之间。因此,当您添加关系时,无论您指定什么,featuretools 都会自动将变量从子实体转换为 Index
类型。
如果实体之间存在一对一关系,则变量有可能既是 Index
又是 Id
。在这种情况下,您应该将两个实体合二为一。