Create list object from "_ElementUnicodeResult object of lxml.etree module"

Question

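I'm new to Python and want to scrape real estate data from a listings website. I have successfully extracted text from the page, but the returned objects are not what I expected.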


# import modules
from lxml import html
import requests

# specify webpage to scrape
url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)

# scrape desired information
address_raw = tree.xpath('//a[@class="search-nav-link"]//text()')
price_raw = tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')

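As expected, address_raw and price_raw are lists. However, the values in these lists do not appear as plain strings with the address and price immediately visible. Instead, each one is reported as [_ElementUnicodeResult object of lxml.etree module]. Typing the object name in the interpreter (for example, address_raw) displays the addresses in the list, as does print(address_raw). How can I create simple lists of the addresses and prices as strings, without the values showing up as [_ElementUnicodeResult object of lxml.etree module]?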

Answer 1

You can use str() to convert each object to a string, and map() to apply that function to every element of the list:

from lxml import html
import requests

url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)

address_raw = list(map(str, tree.xpath('//a[@class="search-nav-link"]//text()')))
price_raw = list(map(str, tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')))
print(type(address_raw[0])) # => <class 'str'>
print(type(price_raw[0]))   # => <class 'str'>
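
For context, here is a minimal offline sketch of the same conversion, assuming lxml is installed. SAMPLE_HTML is hypothetical stand-in markup rather than the real listing page, and the whitespace stripping is an extra step the answer above does not perform. It also shows that each XPath text result is already a str subclass (a "smart string" that remembers its parent element), which is why print() displays the values normally.

```python
from lxml import html

# Hypothetical stand-in markup; the real listing page is not reproduced here.
SAMPLE_HTML = """
<div>
  <a class="search-nav-link" href="#"> 123 Main St </a>
  <a class="search-nav-link" href="#"> 456 Oak Ave </a>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)
raw = tree.xpath('//a[@class="search-nav-link"]//text()')

# Each result is a str subclass that also knows which element it came from.
first = raw[0]
is_str_subclass = isinstance(first, str)
parent_tag = first.getparent().tag

# Convert to plain str, trimming whitespace and skipping whitespace-only text nodes.
addresses = [str(text).strip() for text in raw if text.strip()]
print(addresses)
```

Because the results already behave like strings, the str() call mainly matters when an exact str type is wanted, for example before serializing or comparing types.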


python

xpath

lxml

web-scraping