Partition if applicable

Question

I have a column in a DataFrame that holds about 1.4 million rows of chat conversations. The general format of each cell is (1):

'name agent : conversation'

However, not every cell in the column follows this format. Some cells are just (2):

'conversation'

I use the following code to extract only the conversation from cells structured like (1):

only_transcripts['msgText'] = only_transcripts['msgText'].str.partition(':', expand=True)[2]

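However, for a cell that does not contain a colon (:), this code returns nothing useful: the third part of the partition is an empty string, so the conversation text is lost.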
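A minimal sketch of the problem, using two made-up cells (one in format (1), one in format (2)): when the separator is missing, `str.partition` puts the whole string in column 0 and leaves columns 1 and 2 empty.

```python
import pandas as pd

# Hypothetical sample data: format (1) and format (2)
s = pd.Series(['name agent : hello there', 'just a conversation'])

# expand=True (the default) returns a DataFrame with columns 0, 1, 2
parts = s.str.partition(':')

# Column 2 holds the text after the colon, or '' when there is no colon
print(parts[2].tolist())  # [' hello there', '']
```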

Is there a fast way, without a for loop over every row, to apply the code above only to the cells that contain a colon?
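One vectorized way to do exactly what the question asks is a sketch like the following (not taken from the answer below): build a boolean mask of cells that contain a colon and assign the partitioned text back only through `.loc` on those rows. The sample data is made up, `na=False` treats missing values as "no colon", and the trailing `.str.strip()` is an optional assumption that removes the space left after the colon.

```python
import pandas as pd

# Hypothetical sample data standing in for the real 1.4M-row column
only_transcripts = pd.DataFrame(
    {'msgText': ['name agent : hello there', 'just a conversation', 'bob : a: b']}
)

# True only where the cell contains a literal colon; NaN cells count as False
has_colon = only_transcripts['msgText'].str.contains(':', regex=False, na=False)

# Partition only the matching rows; all other rows keep their original text.
# partition splits at the first colon, so later colons stay in the message.
only_transcripts.loc[has_colon, 'msgText'] = (
    only_transcripts.loc[has_colon, 'msgText'].str.partition(':')[2].str.strip()
)

print(only_transcripts['msgText'].tolist())
# ['hello there', 'just a conversation', 'a: b']
```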

Answer 1

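Another solution: use Series.str.split and select the second element of each resulting list by indexing with .str[1]. Cells without a colon produce NaN at that position, so add Series.fillna with the original column to replace those NaNs: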

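# Split at the first colon only (n=1), take the part after it (NaN if there is
# no colon), and fall back to the original text for cells without a colon
only_transcripts['msgText'] = (only_transcripts['msgText'].str.split(':', n=1)
                                                         .str[1]
                                                         .fillna(only_transcripts['msgText']))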
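A small runnable check of the split-and-fillna approach on made-up sample data. It passes `n=1` so that only the first colon acts as the separator; without it, a message that itself contains a colon (such as a time like 10:30) would be cut at the second colon by `.str[1]`.

```python
import pandas as pd

# Hypothetical sample data, including a message that contains its own colon
only_transcripts = pd.DataFrame(
    {'msgText': ['name agent : hello there', 'just a conversation', 'bob : see you at 10:30']}
)

s = only_transcripts['msgText']

# Split at the first colon only; .str[1] is NaN when the list has one element
after_colon = s.str.split(':', n=1).str[1]

# Replace those NaNs with the original text (format (2) cells)
only_transcripts['msgText'] = after_colon.fillna(s)

print(only_transcripts['msgText'].tolist())
# [' hello there', 'just a conversation', ' see you at 10:30']
```

Add `.str.strip()` at the end if the leading space after the colon is unwanted.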


python

dataframe

pandas

partition