HTML Scraping with Python, document_fromstring is empty
I'm trying to extract some data from a website using Python. I found a document that matches my problem exactly.
But when I run the code it provides:
from lxml import html
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')
print 'Buyers: ', buyers
print 'Prices: ', prices
I get an error:
File "C:\Python27\lib\site-packages\lxml\html\__init__.py", line 617, in document_fromstring
"Document is empty")
ParserError: Document is empty
Does anyone know what the problem is?
Your script works fine for me. I get this output:
Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
Prices: ['.95', '.37', '.26', '.25', '.25', '.99', '.57', '.49', '.47', '.86', '.11', '.98', '.27', '.50', '.85', '.26', '.68', '.00', '4.07', '.09']
I suggest you try the latest lxml package, and check that the desired webpage is actually reachable from your machine right now.
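To check the second point, here is a rough sketch that inspects the response before handing it to lxml. As far as I know, html.fromstring raises "Document is empty" when it is given an empty body, so a failed or blank response is a likely cause. The names explain_empty_document and fetch_tree are made up for this example, and the 10 second timeout is an arbitrary choice.

```python
def explain_empty_document(status_code, content):
    """Return a short reason why the body cannot be parsed, or None if it looks usable."""
    if status_code != 200:
        return "server returned HTTP %d" % status_code
    if not content or not content.strip():
        return "response body is empty"
    return None


def fetch_tree(url):
    """Download url and parse it, failing with a clear message instead of ParserError."""
    # Third-party imports are kept local so the check above has no dependencies.
    import requests
    from lxml import html

    page = requests.get(url, timeout=10)  # timeout value is an assumption
    problem = explain_empty_document(page.status_code, page.content)
    if problem is not None:
        raise RuntimeError("cannot parse %s: %s" % (url, problem))
    return html.fromstring(page.content)
```

Calling fetch_tree('http://econpy.pythonanywhere.com/ex/001.html') should then either return the tree your XPath queries need, or tell you whether the server refused the request or sent back nothing (a proxy or firewall on your side could cause either).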