Replace ending of words with a new ending in python dataframe

Question

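I have a DataFrame full of French words, their endings, and new endings. I want to create a 4th column containing the word with its ending replaced: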

word   |ending|new ending|what i want|
--------------------------------------
placer |cer   |ceras     |placeras   |
placer |cer   |cerait    |placerait  |
placer |cer   |ceront    |placeront  |
finir  |ir    |iras      |finiras    |

So basically, replace the part of column 1 that matches column 2 with what is in column 3.

Any ideas?

Answer 1

Here is one way using the .loc accessor:

import pandas as pd

df = pd.DataFrame({'word': ['placer', 'placer', 'placer'],
                   'ending': ['cer', 'cer', 'cer'],
                   'new_ending': ['ceras', 'cerait', 'ceront']})

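# Start from the original word; rows whose ending does not match stay unchanged.
df['result'] = df['word']
# Length of each ending, used to slice the suffix off the word.
df['lens'] = df['ending'].map(len)

# Where the word really ends with `ending`, strip it and append `new_ending`.
df.loc[pd.Series([i[-j:] for i, j in zip(df['word'], df['lens'])]) == df['ending'], 'result'] = \
pd.Series([i[:-j] for i, j in zip(df['word'], df['lens'])]) + df['new_ending']

# Drop the helper column.
df = df[['word', 'ending', 'new_ending', 'result']]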

#      word ending new_ending     result
# 0  placer    cer      ceras   placeras
# 1  placer    cer     cerait  placerait
# 2  placer    cer     ceront  placeront

Answer 2

Using apply():

df['new_word'] = df.apply(
    lambda row: row['word'].replace(row['ending'], row['new ending']),
    axis=1
)
#     word ending new ending   new_word
#0  placer    cer      ceras   placeras
#1  placer    cer     cerait  placerait
#2  placer    cer     ceront  placeront
#3   finir     ir       iras    finiras

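As @jpp pointed out, one caveat of this approach is that it will not work correctly if the ending also appears elsewhere in the string, because str.replace() replaces every occurrence and not just the one at the end.

In that case, refer to this post on how to replace the end of a string.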
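To illustrate the caveat, here is a minimal sketch of a suffix-only variant of the apply() approach. The helper name `replace_ending` and the extra row for `errer` are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd


def replace_ending(word: str, ending: str, new_ending: str) -> str:
    """Swap `ending` for `new_ending` only if `word` actually ends with it."""
    # Hypothetical helper, not from the original answer.
    if ending and word.endswith(ending):
        return word[:len(word) - len(ending)] + new_ending
    return word


# Sample data; 'errer' is an added row whose ending also occurs at the start.
df = pd.DataFrame({
    'word': ['placer', 'placer', 'finir', 'errer'],
    'ending': ['cer', 'cer', 'ir', 'er'],
    'new ending': ['ceras', 'cerait', 'iras', 'eras'],
})

df['new_word'] = df.apply(
    lambda row: replace_ending(row['word'], row['ending'], row['new ending']),
    axis=1,
)

# Plain str.replace() rewrites both occurrences of 'er' in 'errer'.
naive = 'errer'.replace('er', 'eras')
```

With this data, `naive` comes out as `erasreras`, while the suffix-only helper leaves the first `er` alone and yields `erreras`.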

Answer 3

Here is another solution:

df.word.replace(df.ending, '', regex=True).str.cat(df["new ending"].astype(str))

And the output:

0     placeras
1    placerait
2    placeront
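Because regex=True treats each ending as a regular-expression pattern that can match anywhere in the word, it may help to anchor the pattern to the end of the string and escape any special characters. The following is a sketch of that idea using the standard re module row by row; it is an alternative to the one-liner above, not a restatement of it, and the `errer` row is an added example.

```python
import re

import pandas as pd

# Sample data; the last row is an added example where 'er' occurs twice.
df = pd.DataFrame({
    'word': ['placer', 'placer', 'placer', 'errer'],
    'ending': ['cer', 'cer', 'cer', 'er'],
    'new ending': ['ceras', 'cerait', 'ceront', 'eras'],
})

# re.escape() makes the ending a literal pattern and '$' anchors it to the
# end of the word. A function is used as the replacement so that any
# backslashes in the new ending are inserted literally.
df['new_word'] = [
    re.sub(re.escape(ending) + r'$', lambda _match: new, word)
    for word, ending, new in zip(df['word'], df['ending'], df['new ending'])
]
```

Only the final occurrence of the ending is rewritten here, so `errer` becomes `erreras`, and a word that does not end with its listed ending is left unchanged.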
