Pandas - Combine rows with similar values (name spelling variations)

I have the following Python pandas DataFrame:


   Name           Sales Qty
0  JOHN BARNES    10
1  John Barnes    5
2  John barnes    4
3  Peter K.       4
4  Peter K        6
5  Peter Krammer  5
6  Charles        3
7  CHARLES        2
8  Julie Moore    3
9  Julie moore    7
10

And many more rows, with the same kinds of name spelling variations.

I want to merge the rows with similar values so that I end up with the following DataFrame:

   Name           Sales Qty
0  John Barnes    19
1  Peter Krammer  15
2  Charles        5
3  Julie Moore    10

and many more

How can I do this?

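As you can see in the comments, the requirements are ambiguous, but I have tabulated the totals as far as I could determine them. I computed the totals by lowercasing the names and removing periods, then used str.title() to convert the result back to title case. Note that this leaves "Peter K" and "Peter Krammer" as separate groups, because normalizing case and punctuation alone cannot tell that an abbreviation refers to the full name.
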
import pandas as pd
import io

data = '''
 Name Sales
0 "JOHN BARNES" 10
1 "John Barnes" 5
2 "John barnes" 4
3 "Peter K." 4
4 "Peter K" 6
5 "Peter Krammer" 5
6 "Charles"  3
7 "CHARLES"  2
8 "Julie Moore" 3
9 "Julie moore" 7
'''

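# Parse the whitespace-separated sample; the quoted names stay in one field.
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Build a normalized key: lowercase, with literal periods removed.
df['lower'] = df['Name'].str.lower()
df['lower'] = df['lower'].str.replace('.', '', regex=False)
# Sum sales per normalized key, then restore title case for display.
new = df.groupby('lower')['Sales'].sum().reset_index()
new['lower'] = new['lower'].str.title()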

new
           lower  Sales
0        Charles      5
1    John Barnes     19
2    Julie Moore     10
3        Peter K     10
4  Peter Krammer      5
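
The question's expected output folds "Peter K" into "Peter Krammer" (15 in total), which the normalization above does not do. One possible way to get there, sketched below, is to treat a normalized name as an abbreviation whenever it is a prefix of a longer normalized name. This prefix rule and the helper canonical_name are assumptions made for illustration, not something stated in the question; the rule would mis-merge if, say, both "Peter Krammer" and "Peter King" were present, so real data may need a manual mapping or a fuzzy-matching step instead.

```python
import pandas as pd

# Totals as produced by the normalization step (lowercase keys, periods removed).
totals = pd.DataFrame({
    "lower": ["charles", "john barnes", "julie moore", "peter k", "peter krammer"],
    "Sales": [5, 19, 10, 10, 5],
})


def canonical_name(name, candidates):
    """Hypothetical rule: map `name` to the longest candidate it is a prefix of."""
    matches = [c for c in candidates if c.startswith(name)]
    # `name` is itself a candidate, so `matches` is never empty.
    return max(matches, key=len)


keys = totals["lower"].tolist()
totals["canonical"] = [canonical_name(k, keys) for k in keys]

# Re-aggregate on the canonical key and restore title case for display.
merged = totals.groupby("canonical", as_index=False)["Sales"].sum()
merged["canonical"] = merged["canonical"].str.title()
print(merged)
```

With the sample data this yields four rows, with "Peter K" and "Peter Krammer" combined into a single "Peter Krammer" row.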