How to extract the status (title) from a full name in a pd.DataFrame column?
I have a dataset. Here is its 'Name' column:
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
...
151 Pears, Mrs. Thomas (Edith Wearne)
152 Meo, Mr. Alfonzo
153 van Billiard, Mr. Austin Blyler
154 Olsen, Mr. Ole Martin
155 Williams, Mr. Charles Duane
I need to extract the first name, the status (title) and the second name. When I try this on a simple string, it works:
full_name = "Braund, Mr. Owen Harris"
first_name = full_name.split(',')[0]
# strip() removes the space left after the '.' so the printed value has no leading blank
second_name = full_name.split('.')[1].strip()
print('First name:', first_name)
print('Second name:', second_name)
status = full_name.replace(first_name, '').replace(',', '').split('.')[0].strip()
print('Status:', status)
>First name: Braund
>Second name: Owen Harris
>Status: Mr
But when I try to do the same thing with pandas, the status part fails:
df['first_Name'] = df['Name'].str.split(',').str.get(0)  # this works fine
But after this:
status= df['Name'].str.replace(df['first_Name'], '').replace(',','').split('.').str.get(0)
I get an error:
>>TypeError: 'Series' objects are mutable, thus they cannot be hashed
What are the possible solutions?
Edit: Thanks for the answers. To extract the columns, I do:
def extract_name_data(row):
row.str.extract('(?P<first_name>[^,]+), (?P<status>\w+.) (?P<second_name>[^(]+\w) ?')
last_name = row['second_name']
title = row['status']
first_name = row['first_name']
return first_name, second_name, status
and get:
AttributeError: 'str' object has no attribute 'str'
What can I do? Here, row is df['Name'].
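The AttributeError suggests that row is a single Python str, as it is when the function is passed to df['Name'].apply(...); a plain str has no .str accessor (that accessor exists only on pandas Series and Index objects). The function also returns second_name and status, which it never assigns (it assigns last_name and title), so it would fail with a NameError next. Below is a minimal per-value sketch that uses the standard re module with the same named groups instead. The function name is kept from the edit; NAME_RE and the three-row sample are hypothetical, and the pattern keeps the title's period outside the status group.

```python
import re

import pandas as pd

# Hypothetical helper: same named groups as the str.extract pattern,
# with the title's trailing '.' matched outside the 'status' group.
NAME_RE = re.compile(r"(?P<first_name>[^,]+), (?P<status>\w+)\. (?P<second_name>[^(]+\w)")


def extract_name_data(name):
    """Return (first_name, status, second_name) for one name string."""
    m = NAME_RE.search(name)  # `name` is a plain str here, so use re, not .str
    if m is None:
        return (None, None, None)  # name does not fit "Surname, Title. Given names"
    return m.group("first_name"), m.group("status"), m.group("second_name")


df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris", "Meo, Mr. Alfonzo"]})

# Apply per value, then turn the list of tuples into three aligned columns.
extracted = pd.DataFrame(
    df["Name"].apply(extract_name_data).tolist(),
    index=df.index,
    columns=["first_name", "status", "second_name"],
)
df = df.join(extracted)
print(df)
```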
You can use str.extract with named capturing groups:
df['Name'].str.extract(r'(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?')
Output:
first_name status second_name
0 Braund Mr. Owen Harris
1 Cumings Mrs. John Bradley
2 Heikkinen Miss. Laina
3 Futrelle Mrs. Jacques Heath
4 Allen Mr. William Henry
5 Pears Mrs. Thomas
6 Meo Mr. Alfonzo
7 van Billiard Mr. Austin Blyler
8 Olsen Mr. Ole Martin
9 Williams Mr. Charles Duane
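Because str.extract returns a new DataFrame with one column per named group, aligned on the original index, the parsed columns can be joined straight back onto df. A minimal sketch on a hypothetical three-row sample, using a slight variation of the pattern above that matches the title's period outside the status group (so status holds "Mr" rather than "Mr."); rows that do not match the pattern get NaN in all three columns:

```python
import pandas as pd

# Hypothetical sample of the 'Name' column from the question
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "van Billiard, Mr. Austin Blyler",
]})

# Variation of the pattern above: the '\.' sits outside the 'status' group,
# and 'second_name' stops before any parenthesized maiden name.
pattern = r"(?P<first_name>[^,]+), (?P<status>\w+)\. (?P<second_name>[^(]+\w)"

# str.extract yields columns named after the groups, indexed like df,
# so join attaches them row by row without any apply.
df = df.join(df["Name"].str.extract(pattern))
print(df[["first_name", "status", "second_name"]])
```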
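You can also take your original code, slightly modified, and put it inside a pandas .apply() call. Just replace the plain-Python variable names with the corresponding column names: in the lambda passed to .apply(), replace full_name with x['Name'] and first_name with x['first_Name']: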
df['status'] = df.apply(lambda x: x['Name'].replace(x['first_Name'], '').replace(',', '').split('.')[0].strip(), axis=1)
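This may not be the most efficient approach, since apply with axis=1 runs the Python code once per row instead of using vectorized string operations, but it is an easy way to turn existing plain-Python code into a version that works in pandas.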
Result:
print(df)
Name first_Name status
0 Braund, Mr. Owen Harris Braund Mr
1 Cumings, Mrs. John Bradley (Florence Briggs Th... Cumings Mrs
2 Heikkinen, Miss. Laina Heikkinen Miss
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) Futrelle Mrs
4 Allen, Mr. William Henry Allen Mr
151 Pears, Mrs. Thomas (Edith Wearne) Pears Mrs
152 Meo, Mr. Alfonzo Meo Mr
153 van Billiard, Mr. Austin Blyler van Billiard Mr
154 Olsen, Mr. Ole Martin Olsen Mr
155 Williams, Mr. Charles Duane Williams Mr
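The TypeError in the question most likely comes from passing a Series (df['first_Name']) as the pattern argument of Series.str.replace, which expects a single string or compiled regex rather than one pattern per row. A vectorized alternative to the row-wise apply keeps the same split logic but never needs a per-row pattern: split once on the first comma, then take what comes before the first period. A minimal sketch on a hypothetical sample, assuming every name has the "Surname, Title. Given names" shape; rows without a comma propagate NaN. Like the question's plain-string code, second_name keeps any parenthesized part.

```python
import pandas as pd

# Hypothetical sample of the 'Name' column from the question
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
]})

# Split each name into [surname, rest] on the first comma only.
parts = df["Name"].str.split(",", n=1)
df["first_Name"] = parts.str[0]

# The title is everything before the first '.' in the remainder,
# e.g. " Mr. Owen Harris" -> " Mr" -> "Mr".
rest = parts.str[1]
title_and_names = rest.str.split(".", n=1)
df["status"] = title_and_names.str[0].str.strip()

# The given names are whatever follows that first '.'.
df["second_name"] = title_and_names.str[1].str.strip()
print(df)
```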