Python 3 Pandas Filter/Extract by multiple column values, including <> 0

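I am using a publicly available CSV file from USASPENDING.gov. I can extract the Navy records, but I don't know the correct syntax for adding a second filter that excludes all records where Dollarsobligated = 0.

The code is: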

import pandas as pd

df = pd.read_csv("2016_DOD_Contracts_Full_20160915.csv")
df.columns = [c.replace(' ','_') for c in df.columns]
new_df = df[(df.mod_agency == '1700: DEPT OF THE NAVY') & (df.dollarsobligated <> 0)]

# Export result to CSV
new_df.to_csv('example15.csv')

I get an error message saying that the <> syntax is invalid. I have not found any 'does not equal 0' examples on the web.

I think you need to change <> to != in the boolean indexing, because <> was removed in Python 3.

You can also use str.replace to rename the columns:

df.columns = df.columns.str.replace(' ','_')
new_df = df[(df.mod_agency == '1700: DEPT OF THE NAVY') & (df.Dollarsobligated != 0)]

Sample:

df = pd.DataFrame({'mod agency':['1700: DEPT OF THE NAVY',
                                 '1700: DEPT OF THE NAVY',
                                 '1800: DEPT OF THE NAVY'],
                   'Dollarsobligated':[1,0,0],
                   'C':[7,8,9]})

print (df)
   C  Dollarsobligated              mod agency
0  7                 1  1700: DEPT OF THE NAVY
1  8                 0  1700: DEPT OF THE NAVY
2  9                 0  1800: DEPT OF THE NAVY

df.columns = df.columns.str.replace(' ','_')
new_df = df[(df.mod_agency == '1700: DEPT OF THE NAVY') & (df.Dollarsobligated != 0)]

print (new_df)
   C  Dollarsobligated              mod_agency
0  7                 1  1700: DEPT OF THE NAVY
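
To see why each condition sits in its own parentheses, it may help to build the two masks separately. This is only a minimal sketch on the sample frame, with the column already renamed to mod_agency; the names navy_mask and nonzero_mask are illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'mod_agency': ['1700: DEPT OF THE NAVY',
                                  '1700: DEPT OF THE NAVY',
                                  '1800: DEPT OF THE NAVY'],
                   'Dollarsobligated': [1, 0, 0],
                   'C': [7, 8, 9]})

# Each comparison returns a boolean Series aligned with df's index
navy_mask = df['mod_agency'] == '1700: DEPT OF THE NAVY'
nonzero_mask = df['Dollarsobligated'] != 0

# & combines the masks element-wise. It binds tighter than == and !=,
# which is why inline comparisons need their own parentheses.
new_df = df[navy_mask & nonzero_mask]
print(new_df)
```

Only rows where both masks are True are kept, so the result should match the one-line version above.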

You have to use "!=" instead of "<>".
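
If you would rather not rename the columns, the same filter can probably be written against the original names. A rough sketch on the sample data, assuming a reasonably recent pandas (backtick quoting in query() needs 0.25 or later); the variable names by_brackets and by_query are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'mod agency': ['1700: DEPT OF THE NAVY',
                                  '1700: DEPT OF THE NAVY',
                                  '1800: DEPT OF THE NAVY'],
                   'Dollarsobligated': [1, 0, 0],
                   'C': [7, 8, 9]})

# Bracket access works with the original column name, space included
by_brackets = df[(df['mod agency'] == '1700: DEPT OF THE NAVY')
                 & (df['Dollarsobligated'] != 0)]

# query() takes the filter as a string; backticks quote names with spaces
by_query = df.query(
    "`mod agency` == '1700: DEPT OF THE NAVY' and Dollarsobligated != 0"
)

print(by_brackets.equals(by_query))
```

Either form uses != for "not equal", just like the attribute-style version.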