Grabbing text data from Baseball-reference with Python
http://www.baseball-reference.com/players/split.cgi?id=aardsda01&year=2015&t=p
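I want to get the data on which arm this pitcher throws with. If it were in a table I would be able to grab the data, but I don't know how to get the text.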
David Aardsma \ARDS-mah\
David Allan Aardsma (twitter: @TheDA53)
Position: Pitcher
Bats: Right, Throws: Right
Height: 6' 3", Weight: 220 lb.
The text looks like this. I want to get whatever comes after "Throws:".
If you solve it with BeautifulSoup, you would find the <b> tag by its text, "Throws:", and get its following sibling:
>>> from urllib.request import urlopen
>>> from bs4 import BeautifulSoup
>>>
>>> url = "http://www.baseball-reference.com/players/split.cgi?id=aardsda01&year=2015&t=p"
>>> soup = BeautifulSoup(urlopen(url), "html.parser")
>>> # find the <b> element whose text is exactly "Throws:", then take the text node right after it
>>> soup.find("b", string="Throws:").next_sibling.strip()
'Right'
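The same idea (locate the bold label, then read the text node that follows it) can be tried offline without fetching the page. The sketch below is only an illustration: it swaps BeautifulSoup for the standard library's html.parser, and both the LabelValueParser class and the SAMPLE_HTML fragment are hypothetical, modelled on the rendered text in the question rather than on the site's actual markup.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect the text that follows each <b>Label:</b> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}        # e.g. {"Throws": "Right"}
        self._in_bold = False   # True while inside a <b> element
        self._label = None      # most recent label still waiting for a value

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_bold:
            # Text inside <b>...</b>: treat it as a label only if it ends with ":".
            self._label = text[:-1] if text.endswith(":") else None
        elif self._label and text:
            # The first non-empty text after a label is taken as its value.
            self.fields[self._label] = text.strip(" ,")
            self._label = None


# Assumed markup: the real page may structure this line differently.
SAMPLE_HTML = (
    "<p><b>Bats:</b> Right, <b>Throws:</b> Right<br>"
    "<b>Height:</b> 6' 3\", <b>Weight:</b> 220 lb.</p>"
)

parser = LabelValueParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.fields["Throws"])  # prints: Right
```

If the markup really does look like this, the other labels come along for free, so parser.fields["Bats"] and parser.fields["Weight"] are available from the same pass.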