Map a Pandas Series containing key/value pairs to new columns with data
I have a DataFrame that contains a pandas Series (column 2), as shown below:
column 1 | column 2 | column 3 |
---|---|---|
1123 | Requested By = John Doe 1\n Requested On = 12 October 2021\n Comments = This is a generic request | INC29192 |
1251 | NaN | INC18217 |
1918 | Requested By = John Doe 2\n Requested On = 2 September 2021\n Comments = This is another generic request | INC19281 |
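For anyone wanting to reproduce the problem, the table above could be built roughly as follows. This is only a sketch: it assumes the `\n` separators are stored as two literal characters (a backslash followed by `n`) rather than as real line breaks, which is how they appear in the table, and it assumes column 1 holds integers.

```python
import numpy as np
import pandas as pd

# Raw strings keep "\n" as a literal backslash + n (an assumption about
# how the real data is stored, based on how the table displays it).
df = pd.DataFrame({
    "column 1": [1123, 1251, 1918],
    "column 2": [
        r"Requested By = John Doe 1\n Requested On = 12 October 2021\n Comments = This is a generic request",
        np.nan,  # the record with no request details
        r"Requested By = John Doe 2\n Requested On = 2 September 2021\n Comments = This is another generic request",
    ],
    "column 3": ["INC29192", "INC18217", "INC19281"],
})
print(df)
```

If the real data uses actual newline characters instead, the separator used when splitting would need to change accordingly.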
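I am struggling to extract, split, and map the column 2 data to a set of new columns, each filled with the appropriate value for that record (where data is available, since some rows are NaN).
The desired output looks something like this (I have removed the column 3 data for readability):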
column 1 | column 3 | Requested By | Requested On | Comments |
---|---|---|---|---|
1123 | INC29192 | John Doe 1 | 12 October 2021 | This is a generic request |
1251 | INC18217 | NaN | NaN | NaN |
1918 | INC19281 | John Doe 2 | 2 September 2021 | This is another generic request |
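I have spent a lot of time trying various approaches, from lambda functions to comprehensions to explode methods, but have not quite found a solution that produces the desired output. Any ideas would be greatly appreciated!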
First, I would convert the column 2 values to dictionaries, then turn them into a DataFrame and join it to your df:
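import pandas as pd

# Parse each "key = value" entry into a dict; NaN rows become an empty dict.
# r'\n ' matches the literal backslash-n separator shown in the question.
df['column 2'] = df['column 2'].apply(
    lambda x: {y.split(' = ', 1)[0]: y.split(' = ', 1)[1]
               for y in x.split(r'\n ')}
    if not pd.isna(x) else {})
# Expand the dicts into columns and join them back (assumes a default 0..n-1 index).
df = df.join(pd.DataFrame(df['column 2'].values.tolist())).drop('column 2', axis=1)
print(df)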
Output:
   column 1  column 3  Requested By      Requested On                         Comments
0      1123  INC29192    John Doe 1   12 October 2021        This is a generic request
1      1251  INC18217           NaN               NaN                              NaN
2      1918  INC19281    John Doe 2  2 September 2021  This is another generic request
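The approach above joins on position, so it relies on df having a default 0..n-1 index, and it only splits on a literal backslash-n. A slightly more defensive variant might look like the sketch below. The helper name `parse_pairs` is hypothetical, the sample frame is a trimmed version of the question's data, and the non-default index and the real newline in the last row are assumptions added purely to illustrate the two differences.

```python
import re

import numpy as np
import pandas as pd

# Trimmed sample: one row with a literal backslash-n, one NaN row, and one
# row with a real newline. The index is deliberately not 0..n-1.
df = pd.DataFrame({
    "column 1": [1123, 1251, 1918],
    "column 2": [
        r"Requested By = John Doe 1\n Requested On = 12 October 2021",
        np.nan,
        "Requested By = John Doe 2\n Requested On = 2 September 2021",
    ],
}, index=[10, 20, 30])


def parse_pairs(text):
    """Hypothetical helper: turn 'key = value' entries into a dict."""
    if pd.isna(text):
        return {}
    # Split on either a literal backslash-n or a real newline character.
    entries = re.split(r"\\n|\n", text)
    pairs = (entry.split("=", 1) for entry in entries if "=" in entry)
    return {key.strip(): value.strip() for key, value in pairs}


# Build the new columns on df's own index so the join aligns row by row.
parsed = pd.DataFrame(df["column 2"].map(parse_pairs).tolist(), index=df.index)
result = df.drop(columns="column 2").join(parsed)
print(result)
```

Passing `index=df.index` when building `parsed` is the key design choice here: the join then matches rows by label, so filtered or re-indexed frames should not end up with misaligned or all-NaN columns.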