过滤 Pandas 数据框或系列中的值
Filtering Values in Pandas Dataframe or Series
我正在尝试从 pandas 数据框中的列中过滤值,但我似乎收到的是布尔值而不是实际值。
我正在尝试按月和年过滤我们的数据。
在下面的代码中,您会看到我只按年份过滤,但我以不同的方式多次尝试了月份和年份:
In [1]: import requests
In [2]: import pandas as pd # pandas
In [3]: import datetime as dt # module for manipulating dates and times
In [4]: url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"
In [5]: source = requests.get(url).text
In [6]: from io import StringIO, BytesIO
In [7]: s = StringIO(source)
In [8]: election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(convert_dates="coerce", convert_numeric=True)
In [9]: election_data.head(n=3)
Out[9]:
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
In [10]: start_date = pd.Series(election_data["Start Date"])
...: start_date.head(n=3)
...:
Out[10]:
0 2012-11-04
1 2012-11-03
2 2012-11-03
Name: Start Date, dtype: datetime64[ns]
In [11]: filtered = start_date.map(lambda x: x.year == 2012)
In [12]: filtered
Out[12]:
0 True
1 True
2 True
...
587 False
588 False
589 False
Name: Start Date, dtype: bool
我认为你需要 read_csv
with url
address first and then boolean indexing
with mask created by year
and month
:
election_data = pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv', parse_dates=[1,2,3])
print (election_data.head(3))
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
print (election_data.dtypes)
Pollster object
Start Date datetime64[ns]
End Date datetime64[ns]
Entry Date/Time (ET) datetime64[ns]
Number of Observations float64
Population object
Mode object
Obama float64
Romney float64
Undecided float64
Other float64
Pollster URL object
Source URL object
Partisan object
Affiliation object
Question Text float64
Question Iteration int64
dtype: object
election_data[election_data["Start Date"].dt.year == 2012]
election_data[(election_data["Start Date"].dt.year == 2012) & (election_data["Start Date"].dt.month== 10)]
如果将 Start Date
作为索引
,则可以使用 pandas 日期过滤
获取全部2012
election_data.set_index('Start Date')['2012']
获取全部Jan, 2012
election_data.set_index('Start Date')['2012-01']
获取 Jan 1, 2012
和 Jan 13, 2012
之间的所有内容
election_data.set_index('Start Date')['2012-01-01':'2012-01-13]
我正在尝试从 pandas 数据框中的列中过滤值,但我似乎收到的是布尔值而不是实际值。 我正在尝试按月和年过滤我们的数据。 在下面的代码中,您会看到我只按年份过滤,但我以不同的方式多次尝试了月份和年份:
In [1]: import requests
In [2]: import pandas as pd # pandas
In [3]: import datetime as dt # module for manipulating dates and times
In [4]: url = "http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv"
In [5]: source = requests.get(url).text
In [6]: from io import StringIO, BytesIO
In [7]: s = StringIO(source)
In [8]: election_data = pd.DataFrame.from_csv(s, index_col=None).convert_objects(convert_dates="coerce", convert_numeric=True)
In [9]: election_data.head(n=3)
Out[9]:
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
In [10]: start_date = pd.Series(election_data["Start Date"])
...: start_date.head(n=3)
...:
Out[10]:
0 2012-11-04
1 2012-11-03
2 2012-11-03
Name: Start Date, dtype: datetime64[ns]
In [11]: filtered = start_date.map(lambda x: x.year == 2012)
In [12]: filtered
Out[12]:
0 True
1 True
2 True
...
587 False
588 False
589 False
Name: Start Date, dtype: bool
我认为你需要 read_csv
with url
address first and then boolean indexing
with mask created by year
and month
:
election_data = pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv', parse_dates=[1,2,3])
print (election_data.head(3))
Pollster Start Date End Date Entry Date/Time (ET) \
0 Politico/GWU/Battleground 2012-11-04 2012-11-05 2012-11-06 08:40:26
1 YouGov/Economist 2012-11-03 2012-11-05 2012-11-26 15:31:23
2 Gravis Marketing 2012-11-03 2012-11-05 2012-11-06 09:22:02
Number of Observations Population Mode Obama Romney \
0 1000.0 Likely Voters Live Phone 47.0 47.0
1 740.0 Likely Voters Internet 49.0 47.0
2 872.0 Likely Voters Automated Phone 48.0 48.0
Undecided Other Pollster URL \
0 6.0 NaN http://elections.huffingtonpost.com/pollster/p...
1 3.0 NaN http://elections.huffingtonpost.com/pollster/p...
2 4.0 NaN http://elections.huffingtonpost.com/pollster/p...
Source URL Partisan Affiliation \
0 http://www.politico.com/news/stories/1112/8338... Nonpartisan None
1 http://cdn.yougov.com/cumulus_uploads/document... Nonpartisan None
2 http://www.gravispolls.com/2012/11/gravis-mark... Nonpartisan None
Question Text Question Iteration
0 NaN 1
1 NaN 1
2 NaN 1
print (election_data.dtypes)
Pollster object
Start Date datetime64[ns]
End Date datetime64[ns]
Entry Date/Time (ET) datetime64[ns]
Number of Observations float64
Population object
Mode object
Obama float64
Romney float64
Undecided float64
Other float64
Pollster URL object
Source URL object
Partisan object
Affiliation object
Question Text float64
Question Iteration int64
dtype: object
election_data[election_data["Start Date"].dt.year == 2012]
election_data[(election_data["Start Date"].dt.year == 2012) & (election_data["Start Date"].dt.month== 10)]
如果将 Start Date
作为索引
获取全部2012
election_data.set_index('Start Date')['2012']
获取全部Jan, 2012
election_data.set_index('Start Date')['2012-01']
获取 Jan 1, 2012
和 Jan 13, 2012
之间的所有内容
election_data.set_index('Start Date')['2012-01-01':'2012-01-13]