Replacing multiple strings in a Pandas Series with values from a lookup table df

Question

I have a DataFrame like this, where the types column holds strings separated by ~:

id | types    |
---------------
1  | A1~B1    |
2  | B1       |
3  | A1~A2~B2 |

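I need to replace the strings in the types column using a lookup table, in which both columns are strings. The final output must have commas between the types. The lookup table looks like this: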

type | description      |
------------------------
A1   | This is good     |
A2   | This is OK       |
B1   | This is not good |
B2   | This is bad      |

So the final output would look like this:

id | types                                 |
--------------------------------------------
1  | This is good, This is not good        |
2  | This is not good                      |
3  | This is good, This is OK, This is bad |

I've read that .map() is a good function to use, but I haven't been able to figure out how to apply it in this case. Thanks in advance.

Answer 1

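map is indeed one way to go, but it takes a few steps to get the output you want. You can map against lookup_table once it is turned into a Series indexed by type. First, though, you need to split on the ~ delimiter: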

# Build a Series indexed by type so it can be passed to map
mapping = lookup_table.set_index('type')['description']

df['types'] = (df.types.str.split('~', expand=True)
               .apply(lambda x: ', '.join(x.map(mapping).fillna('')), axis=1)
               .str.strip(', '))

>>> df
   id                                  types
0   1         This is good, This is not good
1   2                       This is not good
2   3  This is good, This is OK, This is bad
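To see what each stage of that chain does, here is a minimal self-contained sketch that rebuilds the two tables from the question and runs the same steps one at a time. The names mapping, parts and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'id': [1, 2, 3],
                   'types': ['A1~B1', 'B1', 'A1~A2~B2']})
lookup_table = pd.DataFrame({
    'type': ['A1', 'A2', 'B1', 'B2'],
    'description': ['This is good', 'This is OK',
                    'This is not good', 'This is bad'],
})

# Step 1: a Series indexed by type, usable as the argument to map.
mapping = lookup_table.set_index('type')['description']

# Step 2: one column per code; shorter rows are padded with missing values.
parts = df['types'].str.split('~', expand=True)

# Step 3: translate each row, blank out the padding, join, trim separators.
result = (parts
          .apply(lambda row: ', '.join(row.map(mapping).fillna('')), axis=1)
          .str.strip(', '))
print(result.tolist())
```

The final str.strip(', ') is needed because the blanked-out padding leaves trailing separators such as 'This is not good, , ' before trimming.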

Answer 2

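Let your first table be df1 and the second df2.

I assume that type is the index of the second DataFrame (for example after df2 = df2.set_index('type')).

df1['types'].map(lambda x: ', '.join(df2.loc[i, 'description'] for i in x.split('~')))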
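A self-contained sketch of this idea follows. It assumes the lookup is held as a Series (here still called df2) indexed by type, so that df2[i] returns a single description. That layout is an assumption made for the example, not something stated in the question.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'types': ['A1~B1', 'B1', 'A1~A2~B2']})

# Assumption: df2 is a Series of descriptions indexed by type.
df2 = pd.Series({'A1': 'This is good', 'A2': 'This is OK',
                 'B1': 'This is not good', 'B2': 'This is bad'})

# Look up each code in the Series and join the descriptions with ', '.
mapped = df1['types'].map(lambda x: ', '.join(df2[i] for i in x.split('~')))
print(mapped.tolist())
```

A code that is missing from the lookup raises a KeyError here, which may or may not be what you want.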

Answer 3

Most of the answers above use apply, which is not vectorized. I suggest using str.replace:

import pandas as pd

string_map = {
    'A1': 'This is good',
    'A2': 'This is OK',
    'B1': 'This is not good',
    'B2': 'This is bad',
    '~': ', '
}
df = pd.DataFrame([{'type': 'A1~B1'}, {'type': 'B1'}, {'type': 'A1~A2~B2'}])
df_desc = df.copy()
# Apply each literal (non-regex) replacement to the whole column in turn
for key, value in string_map.items():
    df_desc['type'] = df_desc['type'].str.replace(key, value, regex=False)

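Here I assume that the number of mappings in the dictionary is much smaller than the number of rows in the DataFrame.

If you have string_map in a DataFrame (call it df_map), you can build the dictionary from it by running: string_map = df_map.set_index('type')['description'].to_dict(). Make sure df_map contains a row {'type': '~', 'description': ', '}.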
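One thing to watch with the loop is that replacements are applied one after another, so text produced by an earlier replacement can be changed by a later one (for example if a description happened to contain ~ or another code). A possible variant, sketched here as an alternative rather than as part of the original answer, does everything in a single pass by giving str.replace one combined regex and a callable. The names keys, pattern and replaced are illustrative.

```python
import re
import pandas as pd

string_map = {
    'A1': 'This is good',
    'A2': 'This is OK',
    'B1': 'This is not good',
    'B2': 'This is bad',
    '~': ', ',
}
df = pd.DataFrame({'type': ['A1~B1', 'B1', 'A1~A2~B2']})

# One alternation of all keys, longest first so a longer key is tried
# before any shorter key it starts with; re.escape guards special characters.
keys = sorted(string_map, key=len, reverse=True)
pattern = '|'.join(re.escape(k) for k in keys)

# A callable replacement requires regex=True; it receives each match object.
replaced = df['type'].str.replace(
    pattern, lambda m: string_map[m.group(0)], regex=True)
print(replaced.tolist())
```

Because each position in the input is matched only once, the inserted descriptions are never scanned again.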

Answer 4

A one-liner:

df.types.str.replace('~', '|').agg(lambda k: df2.loc[df2.type.str.contains(k)].description.str.cat(sep=', '))

Explanation:

You can use replace to turn ~ into |. That gives you strings such as

A1|B1

which can be searched easily with str.contains, e.g.

df2.loc[df2.type.str.contains('A1|B1')]

returns

    type    description
0   A1  This is good
2   B1  This is not good

To join these description values into the form {}, {}, simply use str.cat. So the above gives

...description.str.cat(sep=', ')

'This is good, This is not good'
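Putting the pieces together, a runnable sketch on the question's data might look like the following. It uses apply for the per-row step and passes regex=False to the first replace so that ~ and | are treated literally; patterns and joined are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'types': ['A1~B1', 'B1', 'A1~A2~B2']})
df2 = pd.DataFrame({
    'type': ['A1', 'A2', 'B1', 'B2'],
    'description': ['This is good', 'This is OK',
                    'This is not good', 'This is bad'],
})

# Turn 'A1~B1' into the regex alternation 'A1|B1'.
patterns = df['types'].str.replace('~', '|', regex=False)

# For each pattern, keep the lookup rows whose type matches and join them.
joined = patterns.apply(
    lambda k: df2.loc[df2['type'].str.contains(k), 'description'].str.cat(sep=', '))
print(joined.tolist())
```

Two caveats apply. The descriptions come out in the row order of df2, not in the order of the codes in the original string, and str.contains matches substrings, so a code such as A1 would also match a hypothetical type A10.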

Answer 5

You can create a Series mapping type to description:

s = df_types.set_index('type')['description']

Then map your values with a list comprehension:

df['types'] = [', '.join(map(s.get, x.split('~'))) for x in df['types'].values]

Similar logic can also be used with pd.Series.map, but it may be less efficient.
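For comparison, here is a small self-contained sketch that runs both variants on the question's data. The names via_list and via_map are illustrative, and df_types is assumed to be the lookup table from the question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'types': ['A1~B1', 'B1', 'A1~A2~B2']})
df_types = pd.DataFrame({
    'type': ['A1', 'A2', 'B1', 'B2'],
    'description': ['This is good', 'This is OK',
                    'This is not good', 'This is bad'],
})

# Series mapping each type to its description.
s = df_types.set_index('type')['description']

# List comprehension: look up each code with s.get and join the results.
via_list = [', '.join(map(s.get, x.split('~'))) for x in df['types'].values]

# The same logic expressed through pd.Series.map.
via_map = df['types'].map(lambda x: ', '.join(s[c] for c in x.split('~')))
print(via_list)
```

Note that s.get returns None for a code that is not in the lookup, in which case the join raises a TypeError.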

Answer 6

Use get_dummies, then rename the columns, then dot:

newdf = df1['types'].str.get_dummies(sep='~').rename(columns=dict(zip(df2.type, df2.description + ',')))
newdf.dot(newdf.columns)
Out[232]: 
id
1          This is good,This is not good,
2                       This is not good,
3    This is good,This is OK,This is bad,
dtype: object
newdf.dot(newdf.columns).str[:-1]
Out[233]: 
id
1          This is good,This is not good
2                       This is not good
3    This is good,This is OK,This is bad
dtype: object
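A self-contained sketch of the same approach is below, assuming df1 is indexed by id (as the output above suggests) and using ', ' as the separator to match the format asked for in the question. The names dummies and combined are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'types': ['A1~B1', 'B1', 'A1~A2~B2']}).set_index('id')
df2 = pd.DataFrame({
    'type': ['A1', 'A2', 'B1', 'B2'],
    'description': ['This is good', 'This is OK',
                    'This is not good', 'This is bad'],
})

# One 0/1 indicator column per code, then rename each column to its
# description followed by the ', ' separator.
dummies = df1['types'].str.get_dummies(sep='~')
dummies = dummies.rename(
    columns=dict(zip(df2['type'], df2['description'] + ', ')))

# Multiplying a 0/1 flag by a string keeps or drops that string, and the
# dot product concatenates what is kept; [:-2] trims the last separator.
combined = dummies.dot(dummies.columns).str[:-2]
print(combined.tolist())
```

The descriptions follow the column order produced by get_dummies rather than the order of the codes in each string, and a code repeated within one string would appear only once.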

Tags: python, lookup, data-manipulation, pandas