Jupyter Notebook Ipython: Groupby based on the values alphabetically
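I am using Jupyter Notebook for the first time. I am trying to group one column of a CSV file and get the count of each value. With this code I got the following result.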
import pandas
# assign the result, otherwise df is undefined on the next line
df = pandas.read_csv('a.csv', sep=',')
df.groupby('name').name.count()
name
>Aa</TOPONYM> 4
>Aachen</TOPONYM> 5
>Aartselaar</TOPONYM> 1
>Abadan</TOPONYM> 1
>Abaya</TOPONYM> 1
>Abba</TOPONYM> 12
>Abbey 2
>Abbeydale</TOPONYM> 1
>Abbot</TOPONYM> 2
>Abbots 3
>Abbotsford</TOPONYM> 22
>Abbotsinch</TOPONYM> 5
>Abbotts 1
>Abel</TOPONYM> 1
>Aberchirder</TOPONYM> 2
>Aberdare</TOPONYM> 3
>Aberdeen 1
>Aberdeen</TOPONYM> 163
>Aberdeenshire</TOPONYM> 286
>Aberdour</TOPONYM> 9
>Aberfan</TOPONYM> 1
>Aberfeldy</TOPONYM> 16
>Abergavenny</TOPONYM> 4
>Aberlady 1
>Aberlady</TOPONYM> 3
>Abernethy</TOPONYM> 1
>Abertay 1
>Abertillery</TOPONYM> 6
>Abha</TOPONYM> 2
>Abidjan</TOPONYM> 10
...
>Zakho</TOPONYM> 20
>Zakopane</TOPONYM> 1
>Zambezi 2
>Zambezi</TOPONYM> 8
>Zambia</TOPONYM> 19
>Zamboanga</TOPONYM> 4
>Zandak</TOPONYM> 3
>Zanzibar</TOPONYM> 11
>Zaragosa</TOPONYM> 1
>Zaragoza</TOPONYM> 4
>Zeebrugge</TOPONYM> 28
>Zeeland</TOPONYM> 2
>Zemun</TOPONYM> 1
>Zenica</TOPONYM> 12
>Zermatt</TOPONYM> 5
>Zetland</TOPONYM> 1
>Zhizhong</TOPONYM> 1
>Zhongshan</TOPONYM> 2
>Zhuhai</TOPONYM> 1
>Zimbabwe</TOPONYM> 377
>Znamenskoye</TOPONYM> 1
>Zoetermeer</TOPONYM> 1
>Zola</TOPONYM> 1
>Zomba</TOPONYM> 3
>Zulu</TOPONYM> 1
>Zululand</TOPONYM> 2
>Zuni</TOPONYM> 2
>Zurich</TOPONYM> 86
>Zvornik</TOPONYM> 3
>Zwolle</TOPONYM> 1
Name: name, Length: 8585, dtype: int64
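Is it possible to get the counts letter by letter? First I would run the command for the letter a and it should give all values starting with a, then b next, and so on. Or is it possible to get the values while skipping the first 100?
My real data looks like this: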
<TOPONYM geonameid="2657540" lat="51.24827" lon="-0.76389" >Aldershot</TOPONYM>
<TOPONYM geonameid="3037854" lat="49.9" lon="2.3" >Amiens</TOPONYM>
<TOPONYM geonameid="6216857" lat="-43.59832" lon="171.55011" >Alaska</TOPONYM>
<TOPONYM geonameid="3037854" lat="49.9" lon="2.3" >Amiens</TOPONYM>
<TOPONYM geonameid="2759794" lat="52.37403" lon="4.88969" >Amsterdam</TOPONYM>
<TOPONYM geonameid="7216668" lat="28.0106" lon="-82.1184" >Alabama</TOPONYM>
<TOPONYM geonameid="5884078" lat="48.98339" lon="-73.34907" >Ally</TOPONYM>
<TOPONYM geonameid="2507480" lat="36.7525" lon="3.04197" >Algiers</TOPONYM>
<TOPONYM geonameid="2759794" lat="52.37403" lon="4.88969" >Amsterdam</TOPONYM>
<TOPONYM geonameid="2759794" lat="52.37403" lon="4.88969" >Amsterdam</TOPONYM>
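You can select the first letter with str[1] (every value in the output above starts with ">", so index 0 is that character and index 1 is the letter), then use value_counts:
df = pandas.read_csv('a.csv')
# str[1] skips the leading '>' seen in the name values
a = df['name'].str[1].value_counts().rename_axis('alph').reset_index(name='count')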
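To make the value_counts approach concrete, here is a minimal sketch on a small in-memory sample. The sample rows are made up for illustration and only mimic the shape of the name column in the question; it assumes, as in the output shown there, that every value starts with ">", so the letter sits at index 1.

```python
import pandas as pd

# Hypothetical sample mimicking the 'name' column from the question,
# where every value starts with '>'.
df = pd.DataFrame({"name": [
    ">Aldershot</TOPONYM>",
    ">Amiens</TOPONYM>",
    ">Zurich</TOPONYM>",
    ">Amiens</TOPONYM>",
    ">Bath</TOPONYM>",
]})

# Character at index 1 is the first letter of the place name.
first_letter = df["name"].str[1]

# Count rows per letter and turn the result into a two-column DataFrame.
counts = (
    first_letter.value_counts()
    .rename_axis("alph")
    .reset_index(name="count")
)
print(counts)
```

value_counts orders the result by count, largest first; appending .sort_values('alph') to the chain would give alphabetical order instead.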
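Another solution, using groupby on the second character:
# count() ignores missing values in 'name'
a = df['name'].groupby(df['name'].str[1]).count().reset_index(name='count')
# size() counts every row in the group, including missing values
a = df['name'].groupby(df['name'].str[1]).size().reset_index(name='count')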
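The question also asks how to view the per-name counts one letter at a time, or how to skip the first 100 rows. A possible sketch follows; the helper names counts_for_letter and page are hypothetical, the sample rows are made up, and the leading ">" in each value is assumed from the output in the question.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's 'name' column.
df = pd.DataFrame({"name": [
    ">Aa</TOPONYM>",
    ">Aachen</TOPONYM>",
    ">Aa</TOPONYM>",
    ">Bath</TOPONYM>",
    ">Zurich</TOPONYM>",
    ">Bath</TOPONYM>",
]})

# Per-name counts, as in the question; groupby sorts the index by name.
counts = df.groupby("name")["name"].count()


def counts_for_letter(counts, letter):
    """Return only the counts whose name starts with the given letter."""
    # Names are assumed to carry a leading '>' before the actual letter.
    mask = counts.index.str.startswith(">" + letter.upper())
    return counts[mask]


def page(counts, start, size):
    """Return `size` rows of the counts, skipping the first `start` rows."""
    return counts.iloc[start:start + size]


print(counts_for_letter(counts, "a"))  # all names starting with A
print(page(counts, 2, 2))              # skip 2 rows, show the next 2
```

On the real data, counts.iloc[100:] would show everything after the first 100 rows, and looping over string.ascii_uppercase with counts_for_letter would walk through the letters in order.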