Simple way to convert a Pandas Series for integer comparison

Question

My problem is simple. I have the following code and want to select all teams whose highest_rank is 1.

import pandas as pd
table = pd.read_table('team_rankings.dat')
table.head()

   rank       team  rating  highest_rank  highest_rating
0     1    Germany    2097             1            2205
1     2     Brazil    2086             1            2161
2     3      Spain    2011             1            2147
3     4   Portugal    1968             2            1991
4     5  Argentina    1967             1            2128

type((table['highest_rank'])) 
pandas.core.series.Series

table.loc[(table['highest_rank']) < 2]

This gives me a

TypeError: unorderable types: str() < int()

because some highest_rank entries are "-". Ugh. What is a simple way to perform this (integer) selection?

Answer 1

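Use pd.to_numeric with errors='coerce'. Entries that cannot be parsed as numbers, such as "-", become NaN, and NaN never satisfies the comparison, i.e.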

table.loc[pd.to_numeric(table['highest_rank'], errors='coerce') < 2]

Output:

   rank       team  rating  highest_rank  highest_rating
0     1    Germany    2097             1            2205
1     2     Brazil    2086             1            2161
2     3      Spain    2011             1            2147
4     5  Argentina    1967             1            2128
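
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The DataFrame is a stand-in built by hand: the first five rows come from the question, while the "Exampleland" row with "-" is made up purely to reproduce the problem.

```python
import pandas as pd

# Hand-built stand-in for team_rankings.dat (an assumption, not the real file).
# highest_rank holds strings, as it would when the column contains "-".
table = pd.DataFrame({
    "team": ["Germany", "Brazil", "Spain", "Portugal", "Argentina", "Exampleland"],
    "highest_rank": ["1", "1", "1", "2", "1", "-"],  # last row is hypothetical
})

# errors="coerce" converts anything unparseable ("-") to NaN instead of raising.
ranks = pd.to_numeric(table["highest_rank"], errors="coerce")

# NaN < 2 evaluates to False, so the "-" row is simply left out.
selected = table.loc[ranks < 2]
print(selected)
```

Note that this leaves the original column untouched; assign `ranks` back to `table["highest_rank"]` if you want the conversion to stick.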

Answer 2

You can have "-" parsed as NaN when the file is read. This may also help with other tasks later on.

table = pd.read_table('team_rankings.dat', na_values="-")

See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
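
A small sketch of how this plays out, using an in-memory tab-separated string in place of team_rankings.dat. The file layout and the "Exampleland" row are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for team_rankings.dat.
raw = (
    "rank\tteam\thighest_rank\n"
    "1\tGermany\t1\n"
    "4\tPortugal\t2\n"
    "9\tExampleland\t-\n"  # made-up row with a missing rank
)

# na_values="-" tells the parser to treat "-" as missing (NaN).
table = pd.read_table(io.StringIO(raw), na_values="-")

# The column is now numeric (float, because of the NaN), so the
# comparison works directly and the NaN row is excluded.
selected = table.loc[table["highest_rank"] < 2]
print(selected)
```

Compared with converting at comparison time, this fixes the column once at load, so every later filter or aggregation on it sees numbers.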
