Splitting a string in a dataframe

I have a dataframe like this:

col1|col2
{"test":"23","test1":"12"}|1992
{"test":"24","test1":"19","test3":"24"}|1993
{"test":"27","test1":"20","test3":"21","test4":"40"}|1994

I want a dataframe like this:

col1_a|col1_b|col2
test|23|1992
test1|12|1992
test|24|1993
test1|19|1993
...

How can I achieve this? Although the data looks like a dictionary, it is stored as a string in the dataframe.

Consider the following df as an example:

In [2063]: df = pd.DataFrame({'col1':[{"test":"23","test1":"12"}, {"test":"24","test1":"19","test3":"24"}, {"test":"27","test1":"20","test3":"21","test4":"40"}], 'col2':[1992, 1993, 1994]})

In [2064]: df
Out[2064]: 
                                                col1  col2
0                      {'test': '23', 'test1': '12'}  1992
1       {'test': '24', 'test1': '19', 'test3': '24'}  1993
2  {'test': '27', 'test1': '20', 'test3': '21', '...  1994
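
Note that the example df above holds real dicts in col1, whereas the question says the values are stored as strings. If that is the case, they need to be parsed first. A minimal sketch, assuming every string is valid JSON as in the question's sample:

```python
import json

import pandas as pd

# Same data as in the question, but with col1 held as JSON strings.
df = pd.DataFrame({
    "col1": [
        '{"test":"23","test1":"12"}',
        '{"test":"24","test1":"19","test3":"24"}',
        '{"test":"27","test1":"20","test3":"21","test4":"40"}',
    ],
    "col2": [1992, 1993, 1994],
})

# Parse each JSON string into a Python dict.
df["col1"] = df["col1"].apply(json.loads)
```

If the strings are Python-style reprs with single quotes rather than JSON, ast.literal_eval may be a better fit than json.loads.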

You can use apply on the column together with df.explode():

In [2085]: df.col1 = df.col1.apply(lambda x: list(x.items()))

In [2086]: df = df.explode('col1')

In [2091]: df[['col1_a', 'col1_b']] = pd.DataFrame(df.col1.tolist(), index=df.index)

In [2093]: df = df[['col1_a', 'col1_b', 'col2']]

In [2094]: df
Out[2094]: 
  col1_a col1_b  col2
0   test     23  1992
0  test1     12  1992
1   test     24  1993
1  test1     19  1993
1  test3     24  1993
2   test     27  1994
2  test1     20  1994
2  test3     21  1994
2  test4     40  1994
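
The steps above can also be wrapped in a single function. This is only a sketch: the name explode_dict_column and its parameters are made up for illustration, and it assumes that col1 already holds dicts and that none of them is empty. It also resets the index so the repeated 0, 0, 1, 1, ... labels are replaced by a fresh range.

```python
import pandas as pd


def explode_dict_column(df, col, key_name, value_name):
    """Hypothetical helper: one output row per (key, value) pair in df[col]."""
    out = df.copy()
    # Turn each dict into a list of (key, value) tuples.
    out[col] = out[col].apply(lambda d: list(d.items()))
    # One row per tuple; drop the repeated index labels.
    out = out.explode(col).reset_index(drop=True)
    # Split the tuples into two columns. Assumes no dict is empty,
    # since explode would produce NaN for an empty list.
    pairs = pd.DataFrame(
        out[col].tolist(), columns=[key_name, value_name], index=out.index
    )
    return pd.concat([pairs, out.drop(columns=col)], axis=1)


df = pd.DataFrame({
    "col1": [
        {"test": "23", "test1": "12"},
        {"test": "24", "test1": "19", "test3": "24"},
        {"test": "27", "test1": "20", "test3": "21", "test4": "40"},
    ],
    "col2": [1992, 1993, 1994],
})

result = explode_dict_column(df, "col1", "col1_a", "col1_b")
```

Working on a copy leaves the caller's df untouched, unlike the interactive session above, which overwrites df.col1 in place.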

Expand the dictionary values into columns, then melt the table into long format.

df = pd.DataFrame([[{"test":"23","test1":"12"},1992],
[{"test":"24","test1":"19","test3":"24"},1993],
[{"test":"27","test1":"20","test3":"21","test4":"40"},1994]],columns=['c1','c2'])

pd.DataFrame(df['c1'].values.tolist(), index=df.c2) \
    .reset_index() \
    .melt(id_vars='c2',var_name='col1_a',value_name='col1_b') \
    .dropna()

Output:

    c2  col1_a  col1_b
0   1992    test    23
1   1993    test    24
2   1994    test    27
3   1992    test1   12
4   1993    test1   19
5   1994    test1   20
7   1993    test3   24
8   1994    test3   21
11  1994    test4   40
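
The melted result is grouped by key (all test rows, then all test1 rows, and so on), while the question asks for rows grouped by year. One possible way to get that order, sketched here under the assumption that c2 identifies the original row, is a stable sort on c2 followed by reordering the columns:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [{"test": "23", "test1": "12"}, 1992],
        [{"test": "24", "test1": "19", "test3": "24"}, 1993],
        [{"test": "27", "test1": "20", "test3": "21", "test4": "40"}, 1994],
    ],
    columns=["c1", "c2"],
)

long_df = (
    pd.DataFrame(df["c1"].tolist(), index=df["c2"])
    .reset_index()
    .melt(id_vars="c2", var_name="col1_a", value_name="col1_b")
    .dropna()
    # mergesort is stable, so keys keep their column order within each year.
    .sort_values("c2", kind="mergesort")
    .reset_index(drop=True)
    [["col1_a", "col1_b", "c2"]]
)
```

Keep in mind that dropna() removes every row containing a missing value, so a key whose value is genuinely null in the source data would be dropped as well.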