Split column based on delimiter and considering the content

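I need to split the contents of a pandas column that holds some 'structured' data into several other columns, based on the content.

The structure is "column_name1"/"value1"/"column_name2"/"value2"/...

For example, the word "subscriptions" would become the name of a column, and "sub-id" and "sub-id2" would be its values.

Convert this: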

ResourceID
/subscriptions/sub-id/resourceGroups/rg-name/providers/Microsoft.MachineLearningServices/workspaces/work-ml/providers/Microsoft.EventGrid/extensionTopics/default
/subscriptions/sub-id2/resourceGroups/rg-name2/providers/Microsoft.Sql/servers/name-sqlserver/databases/name-BD

Into this:

subscriptions  resourceGroups  providers                          workspaces  servers         providers            extensionTopics  databases
sub-id         rg-name         Microsoft.MachineLearningServices  work-ml     NaN             Microsoft.EventGrid  default          NaN
sub-id2        rg-name2        Microsoft.Sql                      NaN         name-sqlserver  NaN                  NaN              name-BD

Any help would be greatly appreciated.

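Each row looks like /key1/val1/key2/val2/..., so split each row into its parts, zip the keys with the values, and build a dict. Finally, use pd.DataFrame.from_records to create the dataframe you expect. Note that a key repeated within one row (such as providers in the first row) keeps only its last value, because dict keys are unique: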

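import pandas as pd

# Strip the leading '/', split on '/', then pair even-indexed keys with odd-indexed values
data = df['ResourceID'].str.strip('/').str.split('/') \
                       .apply(lambda x: dict(zip(x[::2], x[1::2])))
out = pd.DataFrame.from_records(data)
print(out)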

# Output
  subscriptions resourceGroups            providers workspaces extensionTopics         servers databases
0        sub-id        rg-name  Microsoft.EventGrid    work-ml         default             NaN       NaN
1       sub-id2       rg-name2        Microsoft.Sql        NaN             NaN  name-sqlserver   name-BD
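
If both providers values matter, as in the expected output of the question, one possible variation is to rename repeated keys before building the dict. The following is only a sketch: the helper name to_record and the "_2" suffix convention are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"ResourceID": [
    "/subscriptions/sub-id/resourceGroups/rg-name/providers/Microsoft.MachineLearningServices"
    "/workspaces/work-ml/providers/Microsoft.EventGrid/extensionTopics/default",
    "/subscriptions/sub-id2/resourceGroups/rg-name2/providers/Microsoft.Sql"
    "/servers/name-sqlserver/databases/name-BD",
]})

def to_record(path):
    """Hypothetical helper: pair keys with values, suffixing repeated keys."""
    parts = path.strip("/").split("/")
    record, seen = {}, {}
    for key, value in zip(parts[::2], parts[1::2]):
        seen[key] = seen.get(key, 0) + 1
        # First occurrence keeps its name; later ones become key_2, key_3, ...
        name = key if seen[key] == 1 else f"{key}_{seen[key]}"
        record[name] = value
    return record

out = pd.DataFrame.from_records(df["ResourceID"].map(to_record).tolist())
print(out)
```

With this, the first row keeps Microsoft.MachineLearningServices under providers and Microsoft.EventGrid under providers_2, while keys missing from a row are left as NaN.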

As an alternative to @Corralien's answer, you can do this with the expand parameter of str.split:

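parts = df["ResourceID"].str.split("/", expand=True)
# Column 0 is the empty string before the leading '/'; keys sit in odd columns, values in even ones
out = parts[parts.columns[2::2]].copy()
# Headers are taken from the first row only, so every row must list the same keys in the same order
out.columns = parts.loc[0, parts.columns[1::2]].to_list()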
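
Because the headers come from a single row, this approach fits data where every ID has the same layout. A minimal sketch of a guarded version follows; the helper name split_with_expand and the shortened sample IDs are assumptions for illustration, and well-formed IDs (an even number of parts) are assumed.

```python
import pandas as pd

def split_with_expand(ids):
    """Hypothetical helper: expand-based split that refuses mixed layouts."""
    parts = ids.str.strip("/").str.split("/", expand=True)
    keys = parts.iloc[:, 0::2]    # key columns: positions 0, 2, 4, ...
    values = parts.iloc[:, 1::2]  # value columns: positions 1, 3, 5, ...
    # Compare every row's keys with the first row's keys
    if not keys.eq(keys.iloc[0]).all().all():
        raise ValueError("rows do not share the same key layout")
    out = values.copy()
    out.columns = keys.iloc[0].to_list()
    return out

# Illustrative IDs that all follow the same key sequence
uniform = pd.Series([
    "/subscriptions/sub-id/resourceGroups/rg-name/providers/Microsoft.Sql/servers/srv-a",
    "/subscriptions/sub-id2/resourceGroups/rg-name2/providers/Microsoft.Sql/servers/srv-b",
])
print(split_with_expand(uniform))
```

For IDs whose keys differ from row to row, as in the question's sample, the check raises instead of silently labelling values with the wrong header; the dict-based answer handles that case.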