Python Web Scraping title in a special div & Page 1 + 15

Question

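Hello everyone, here is my question. I want to scrape data from a website, but I have two problems:

I have set up scraping of the prices. This works well, but only for pages 1 and 15. I want everything from pages 1 to 15, i.e. 1, 2, 3, 4, 5 and so on.
My second problem is that the product title sits in a div with class title. How can I grab that data? There are many other titles on the page, and I only want the whisky names.

Some code: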

from lxml import html
import requests

urls = ['http://whiskey.de/shop/Aktuell/']

for url in urls:
    for number in range(1,15):
        page = requests.get(url+str(number))

tree = html.fromstring(page.text)

prices = tree.xpath('//div[@class="price "]/text()')
names = tree.xpath('//div[@class="column-inner infos"]/text()')

print 'Whiskey Preis: ', prices
print 'Whiskey Names: ', names

The website I want to scrape is this.

Answer 1

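Here is what I would fix/improve:

The code indentation is incorrect: the HTML-parsing code needs to be moved into the loop body, otherwise only the last page fetched is parsed.
The URL whisky.de/shop/Aktuell/1 does not work for page 1; for that page, leave the page number out: whisky.de/shop/Aktuell/
To get the prices and titles I would use CSS selectors (you can keep using XPath expressions, there is nothing wrong with that; this is just for the sake of example and learning something new).

Improved code: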
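The page-number handling described in the second point can be sketched on its own, without any network access. The helper name build_page_urls and its parameters are hypothetical and only serve the illustration; the base URL is the one from the question's code.

```python
def build_page_urls(base_url, last_page):
    """Return the URLs for pages 1..last_page.

    Page 1 is the bare base URL; later pages get the number appended.
    """
    urls = []
    # range() excludes its upper bound, hence last_page + 1
    for number in range(1, last_page + 1):
        urls.append(base_url if number == 1 else base_url + str(number))
    return urls


page_urls = build_page_urls('http://whiskey.de/shop/Aktuell/', 15)
print(page_urls[0])    # first page, no number appended
print(page_urls[-1])   # last page
print(len(page_urls))  # number of pages to fetch
```

Keeping the URL construction separate from the requests.get() call makes it easy to check which pages will be fetched before hitting the site.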

from lxml import html
import requests


urls = ['http://whiskey.de/shop/Aktuell/']

for url in urls:
    for number in range(1, 16):  # pages 1 to 15; range() excludes the upper bound
        page_url = url + str(number) if number > 1 else url
        page = requests.get(page_url)

        tree = html.fromstring(page.text)

        prices = tree.cssselect('div#content div.price')
        names = tree.cssselect('div#content div.title a')

        print('Whiskey Preis: ', [price.text for price in prices])
        print('Whiskey Names: ', [name.text for name in names])
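
One reason the CSS selectors are convenient here: a class selector such as div.price matches whitespace-separated class tokens, while an XPath test like @class="price " compares the whole attribute string, trailing space included. The sketch below illustrates that difference. It deliberately uses the standard library's xml.etree.ElementTree on a made-up, well-formed snippet instead of lxml and the real page, so the markup, names and prices are assumptions for illustration only, and has_class is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the selectors used above.
SAMPLE = """
<div id="content">
  <div class="article">
    <div class="title"><a href="#">Example Whisky A</a></div>
    <div class="price ">19.90 EUR</div>
  </div>
  <div class="article">
    <div class="title big"><a href="#">Example Whisky B</a></div>
    <div class="price ">24.90 EUR</div>
  </div>
</div>
"""


def has_class(element, name):
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()


root = ET.fromstring(SAMPLE)
divs = list(root.iter("div"))

# Token-based matching, which is what a CSS class selector does.
prices = [d.text.strip() for d in divs if has_class(d, "price")]
names = [a.text for d in divs if has_class(d, "title") for a in d.findall("a")]

# Exact string comparison misses "price " because of the trailing space.
exact = root.findall(".//div[@class='price']")

print(prices)
print(names)
print(len(exact))
```

If you stay with XPath in lxml, the usual token-safe equivalent of div.price is contains(concat(" ", normalize-space(@class), " "), " price ").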
