Retrieving matching word count on a DataFrame column using pandas in Python
I have a df:
Name   Description
Ram    Ram is one of the good cricketer
Sri    Sri is one of the member
Kumar  Kumar is a keeper
and a list:
my_list=["one","good","ravi","ball"]
I am trying to get the rows that contain at least one keyword from my_list.
I tried:
mask = df["Description"].str.contains("|".join(my_list), na=False)
output_df = df[mask]
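A side note on this attempt: str.contains is case-sensitive by default, so a pattern built from the lowercase my_list does not match spellings such as "BALL". A minimal sketch, using two strings taken from the rows in the question, showing the case parameter:

```python
import pandas as pd

my_list = ["one", "good", "ravi", "ball"]
descriptions = pd.Series(["there is a BALL", "Kumar is a keeper"])
pattern = "|".join(my_list)

# Default: case-sensitive, so "BALL" does not match "ball".
case_sensitive = descriptions.str.contains(pattern, na=False)

# case=False makes the regex match regardless of letter case.
case_insensitive = descriptions.str.contains(pattern, case=False, na=False)

print(case_sensitive.tolist())    # [False, False]
print(case_insensitive.tolist())  # [True, False]
```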
I get output_df:
Name   Description
Ram    Ram is one of ONe crickete
Sri    Sri is one of the member
Ravi   Ravi is a player, ravi is playing
Kumar  there is a BALL
I also want to add the keywords found in "Description", and their count, as separate columns.
My desired output is:
Name   Description                        pre-keys      keys      count
Ram    Ram is one of ONe crickete         one,good,ONe  one,good  2
Sri    Sri is one of the member           one           one       1
Ravi   Ravi is a player, ravi is playing  Ravi,ravi     ravi      1
Kumar  there is a BALL                    ball          ball      1
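One possible reading of the desired table: pre-keys lists every match as it appears in the text, keys lists the distinct matches lowercased, and count is the number of distinct keys. That interpretation is an assumption, and the sample rows are not fully consistent with the table (the Ram row lists good, which does not appear in that row's text, and the Kumar row shows ball although the text says BALL), so this sketch will not reproduce it character for character. It uses word boundaries (\b) so a keyword is not matched inside a longer word, and re.IGNORECASE so that ONe and BALL are found; unique_lower is a hypothetical helper name:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Ravi", "Kumar"],
    "Description": [
        "Ram is one of ONe crickete",
        "Sri is one of the member",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# \b...\b keeps "one" from matching inside words such as "someone";
# re.escape guards against keywords containing regex metacharacters.
pattern = r"\b(?:" + "|".join(map(re.escape, my_list)) + r")\b"

# Every match, exactly as it is spelled in the text (case preserved).
matches = df["Description"].str.findall(pattern, flags=re.IGNORECASE)

def unique_lower(words):
    # Lowercase, then drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(w.lower() for w in words))

unique_keys = matches.apply(unique_lower)

df["pre-keys"] = matches.str.join(",")
df["keys"] = unique_keys.str.join(",")
df["count"] = unique_keys.str.len()
print(df)
```

If count should instead be the total number of matches, use matches.str.len().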
Use str.findall + str.join + str.len:
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
   Name  Description                       keys      count
0  Ram   Ram is one of the good cricketer  one,good  2
1  Sri   Sri is one of the member          one       1
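One caveat with this pattern: without word boundaries, a keyword also matches inside longer words, which inflates count. The strings below are made up purely to show the difference:

```python
import pandas as pd

my_list = ["one", "good", "ravi", "ball"]
s = pd.Series(["someone is gone", "one good ball"])

# The pattern from the answer above: no word boundaries.
loose = s.str.findall("(" + "|".join(my_list) + ")")

# Same alternation wrapped in \b...\b so only whole words match.
strict = s.str.findall(r"\b(" + "|".join(my_list) + r")\b")

print(loose.str.len().tolist())   # [2, 3]
print(strict.str.len().tolist())  # [0, 3]
```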
Edit:
import re
my_list=["ONE","good"]
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
   Name  Description                       keys      count
0  Ram   Ram is one of the good cricketer  one,good  2
1  Sri   Sri is one of the member          one       1
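With re.IGNORECASE, findall returns each match as it is spelled in Description, so the keys column shows one even though my_list contains ONE. If only the count is needed, Series.str.count accepts the same pattern and flags and skips building the lists. A minimal sketch on the question's sample df:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["ONE", "good"]
pattern = "(" + "|".join(my_list) + ")"

# findall returns the text's own spelling ("one"), not the list's ("ONE").
extracted = df["Description"].str.findall(pattern, flags=re.IGNORECASE)

# str.count returns the number of matches per row directly.
counts = df["Description"].str.count(pattern, flags=re.IGNORECASE)

print(extracted.tolist())  # [['one', 'good'], ['one'], []]
print(counts.tolist())     # [2, 1, 0]
```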
I tried it with str.findall:
c = df.Description.str.findall('({})'.format('|'.join(my_list)))
df['keys'] = c.apply(','.join) # or c.str.join(',')
df['count'] = c.str.len()
df[df['count'] > 0]
   Name  Description                       keys      count
0  Ram   Ram is one of the good cricketer  one,good  2
1  Sri   Sri is one of the member          one       1
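The count > 0 filter keeps the same rows as the str.contains mask from the question when both use the same pattern and flags, because findall returns at least one match exactly when a regex search succeeds. So the filtering can be done after the new columns are added. A quick check on the question's sample df:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]
pattern = "|".join(my_list)

c = df["Description"].str.findall("({})".format(pattern))
df["keys"] = c.str.join(",")
df["count"] = c.str.len()

# Rows with at least one match vs. the question's str.contains mask.
by_count = df[df["count"] > 0]
by_mask = df[df["Description"].str.contains(pattern, na=False)]

print(by_count["Name"].tolist())             # ['Ram', 'Sri']
print(by_count.index.equals(by_mask.index))  # True
```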