Count the number of times each item in a list occurs in a pandas DataFrame column with comma-separated values
I have a list:
citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']
and a pandas DataFrame df1 with these values:
first    last       city                            email
John     Travis     New York                        a@email.com
Jim      Perterson  San Franciso, Los Angeles       b@email.com
Nancy    Travis     Chicago                         b1@email.com
Jake     Templeton  Los Angeles                     b3@email.com
John     Myers      New York                        b4@email.com
Peter    Johnson    San Franciso, Chicago           b5@email.com
Aby      Peters     Los Angeles                     b6@email.com
Amy      Thomas     San Franciso                    b7@email.com
Jessica  Thompson   Los Angeles, Chicago, New York  b8@email.com
I want to count how many times each city in citylist occurs in the DataFrame column 'city':
New York 3
San Francisco 3
Los Angeles 4
Chicago 3
Miami 0
Currently I have:
dftest = df1.groupby(by='city', as_index=False).agg({'id': pd.Series.nunique})
It ends up counting "Los Angeles, Chicago, New York" as 1 unique value.
Is there any way to get the counts as I have shown above?
Thanks
Try this.
Fix the data first (the column spells the city as 'San Franciso'):
df1['city'] = df1['city'].str.replace('Franciso', 'Francisco')
Then use this:
(df1['city'].str.split(', ')
            .explode()
            .value_counts(sort=False)
            .reindex(citylist, fill_value=0))
Output:
New York 3
San Francisco 3
Los Angeles 4
Chicago 3
Miami 0
Name: city, dtype: int64
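For anyone who wants to run the approach above end to end, here is a minimal sketch. It assumes only the 'city' column matters, so the other columns from the question are left out, and it passes regex=False to str.replace as an extra precaution that the original answer does not use. The variable name counts is just for illustration.

```python
import pandas as pd

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

# Only the 'city' column from the question, with the misspelling as posted.
df1 = pd.DataFrame({'city': [
    'New York',
    'San Franciso, Los Angeles',
    'Chicago',
    'Los Angeles',
    'New York',
    'San Franciso, Chicago',
    'Los Angeles',
    'San Franciso',
    'Los Angeles, Chicago, New York',
]})

# Fix the misspelling as a literal (non-regex) replacement.
df1['city'] = df1['city'].str.replace('Franciso', 'Francisco', regex=False)

counts = (df1['city'].str.split(', ')   # each row becomes a list of cities
          .explode()                    # one row per city
          .value_counts(sort=False)     # occurrences of each city
          .reindex(citylist, fill_value=0))  # citylist order, 0 if absent

print(counts)
```

The reindex step is what brings in Miami with a count of 0, since value_counts only reports values that actually occur in the column.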
You can use Series.str.count:
pd.Series([df1['city'].str.count(c).sum() for c in citylist], index=citylist)
Another, more efficient approach suggested by @ScottBoston:
pd.Series({c: sum(c in i for i in df1['city']) for c in citylist})
Output (with the 'San Franciso' misspelling left unfixed, so San Francisco is counted as 0):
New York 3
San Francisco 0
Los Angeles 4
Chicago 3
Miami 0
dtype: int64
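Both variants can be checked side by side with the sketch below. It assumes the misspelling is fixed first, and it wraps each city name in re.escape because Series.str.count treats its argument as a regular expression. That escaping is an addition for safety, not part of the original answer. The names city, by_count and by_contains are illustrative.

```python
import re
import pandas as pd

citylist = ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Miami']

# The 'city' column from the question, with the misspelling fixed up front.
city = pd.Series([
    'New York',
    'San Franciso, Los Angeles',
    'Chicago',
    'Los Angeles',
    'New York',
    'San Franciso, Chicago',
    'Los Angeles',
    'San Franciso',
    'Los Angeles, Chicago, New York',
]).str.replace('Franciso', 'Francisco', regex=False)

# Variant 1: count regex matches per row, then sum over all rows.
by_count = pd.Series(
    [city.str.count(re.escape(c)).sum() for c in citylist],
    index=citylist,
)

# Variant 2: plain substring test per row (counts a row at most once).
by_contains = pd.Series({c: sum(c in row for row in city) for c in citylist})

print(by_count)
print(by_contains)
```

Note that both variants match substrings rather than whole comma-separated entries, so they would overcount if one city name were contained in another (for example a hypothetical 'York' alongside 'New York'). The split and explode approach does not have that problem.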