Why is an entity index displayed as Id and not Index?

An EntitySet is defined below. In the transaction table tx I declared did as the Index, but it is registered as Id rather than Index. Why is that?

The objective is to get rid of the warning shown further down.

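Under what circumstances is an Index assignment overridden to Id (primary key vs. foreign key?), and is did being registered as Id related to the warning?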

A single uid can have multiple dids in the tx table.

import featuretools as ft

es = ft.EntitySet(id="the_entity_set")

# hse
es = es.entity_from_dataframe(entity_id="hse",
                              dataframe=hse,
                              index="uid",
                              variable_types={"Gender": ft.variable_types.Categorical,
                                              "Income": ft.variable_types.Numeric,
                                              "dob"   : ft.variable_types.Datetime})

# types
es = es.entity_from_dataframe(entity_id="types",
                              dataframe=types,
                              index="type_id",
                              variable_types={"type": ft.variable_types.Categorical})

# files
es = es.entity_from_dataframe(entity_id="files",
                              dataframe=files,
                              index="file_id",
                              variable_types={"file": ft.variable_types.Categorical})

# uid_donations
es = es.entity_from_dataframe(entity_id="uid_txlup",
                              dataframe=uid_txlup,
                              index="did",
                              variable_types={"uid": ft.variable_types.Categorical})

# transactions
es = es.entity_from_dataframe(entity_id="tx",
                              dataframe=tx,
                              index="did",
                              time_index="dt",
                              variable_types={"file_id": ft.variable_types.Categorical,
                                              "type_id": ft.variable_types.Categorical,
                                              "amt":     ft.variable_types.Numeric})

rels = [
    ft.Relationship(es["files"]["file_id"],es["tx"]["file_id"]),
    ft.Relationship(es["types"]["type_id"],es["tx"]["type_id"]),
    ft.Relationship(es["hse"]["uid"],      es["uid_txlup"]["uid"]),
    ft.Relationship(es["uid_txlup"]["did"],es["tx"]["did"])
]

es.add_relationships( rels )

This is what the EntitySet looks like:

Entityset: the_entity_set
  Entities:
    hse [Rows: 100, Columns: 4]
    types [Rows: 8, Columns: 2]
    files [Rows: 2, Columns: 2]
    uid_txlup [Rows: 336, Columns: 2]
    tx [Rows: 336, Columns: 5]
  Relationships:
    tx.file_id -> files.file_id
    tx.type_id -> types.type_id
    uid_txlup.uid -> hse.uid
    tx.did -> uid_txlup.did


es.entities

[Entity: hse
   Variables:
     uid (dtype: index)
     Gender (dtype: categorical)
     Income (dtype: numeric)
     dob (dtype: datetime)
   Shape:
     (Rows: 100, Columns: 4), Entity: types
   Variables:
     type_id (dtype: index)
     type (dtype: categorical)
   Shape:
     (Rows: 8, Columns: 2), Entity: files
   Variables:
     file_id (dtype: index)
     file (dtype: categorical)
   Shape:
     (Rows: 2, Columns: 2), Entity: uid_txlup
   Variables:
     did (dtype: index)
     uid (dtype: categorical)
   Shape:
     (Rows: 336, Columns: 2), Entity: tx
   Variables:
     did (dtype: id)            ### <<< external key ???
     dt (dtype: datetime)
     file_id (dtype: categorical)
     type_id (dtype: categorical)
     amt (dtype: numeric)
   Shape:
     (Rows: 336, Columns: 5)]

Why does did show up as Id rather than Index when I call ft?

Here is the warning:

feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="hse",
                                      agg_primitives=["sum","mode","percent_true"],
                                      where_primitives=["count", "avg_time_between"],
                                      max_depth=2)

feature_defs


.../anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/entityset/entityset.py:432: FutureWarning: 'did' is both an index level and a column label.
Defaulting to column, but this will raise an ambiguity error in a future version
  end_entity_id=child_eid)

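A relationship in an EntitySet is always between an Index variable in the parent entity and an Id variable in the child entity. So when you add a relationship, featuretools automatically converts the variable in the child entity to the Id type, regardless of what you specified. Here tx is the child of uid_txlup (tx.did -> uid_txlup.did), which is why tx.did is reported as Id.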

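If two entities have a one-to-one relationship, a variable can end up being both an Index and an Id. In that case you should merge the two entities into one.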
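As a rough sketch of that merge, the lookup table can be folded into the transaction table with pandas before the EntitySet is built. The small frames below are made-up stand-ins for the question's tx and uid_txlup (the values are assumptions for illustration only), and tx_merged is a hypothetical name:

```python
import pandas as pd

# Hypothetical stand-ins for the question's tx and uid_txlup frames.
tx = pd.DataFrame({
    "did": [1, 2, 3],
    "dt": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "file_id": [10, 10, 11],
    "type_id": [5, 6, 5],
    "amt": [20.0, 35.5, 12.0],
})
uid_txlup = pd.DataFrame({"did": [1, 2, 3], "uid": [100, 100, 101]})

# did is unique in both frames, so the two tables are one-to-one.
one_to_one = tx["did"].is_unique and uid_txlup["did"].is_unique

# Fold the lookup table into tx. validate="one_to_one" makes pandas raise
# a MergeError if did turns out not to be unique on either side.
tx_merged = tx.merge(uid_txlup, on="did", how="left", validate="one_to_one")
```

With a frame like tx_merged, tx could presumably be registered with did as its index and related to hse directly through uid (hse.uid as the parent variable, tx.uid as the child), so the separate uid_txlup entity and the tx.did -> uid_txlup.did relationship would no longer be needed.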