How do I access all the specific data of the following table using scrapy?

Question

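I am trying to access all the data in the table elements at the URL https://www.jefit.com/exercises/1/. I tried to scrape the data with scrapy, but it fails with some errors and I cannot get all the data I need. Please help me correct my code so that it scrapes 'Name', 'images link', 'How to perform exercise' and all the other data available in the table. This is the code I am trying: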

from scrapy import Spider  # the scrapy.spider module is gone from current Scrapy releases
from scrapy.selector import Selector
from myproject.items import getExercise

class MySpider(Spider):
    name = "getExercise"
    allowed_domains = ["www.jefit.com"]
    start_urls = ["https://www.jefit.com/exercises/1/"]

    # parse must be a method of the spider class, so it is indented under it
    def parse(self, response):
        item = getExercise()
        item['exerciseName'] = response.xpath('//table[@class = "JefitMainTable"]/tbody/tr/td[2]/table[2]/thead/tr/th/text()').extract()
        return item

Answer 1

Try replacing /tbody/ with // in your XPath.

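This is a common problem when an XPath is checked only against the browser's DOM, because browsers automatically insert tbody elements into tables even when the HTML source sent by the server does not contain them. Scrapy works on the source, so an XPath step through tbody matches nothing.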
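
The effect can be reproduced offline. The sketch below is not Scrapy code: it uses Python's standard xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the markup is a made-up, simplified table (only the JefitMainTable class name and the exercise name are taken from this page). Under those assumptions it shows that a path through tbody finds nothing in source without a tbody element, while the // form matches in both cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup as a server might send it: no <tbody>.
RAW_HTML = """
<html><body>
  <table class="JefitMainTable">
    <tr><td>left</td><td>Band Cross Over</td></tr>
  </table>
</body></html>
"""

# The same table as a browser's DOM would show it: <tbody> has been inserted.
BROWSER_DOM = RAW_HTML.replace(
    '<table class="JefitMainTable">', '<table class="JefitMainTable"><tbody>'
).replace("</table>", "</tbody></table>")

WITH_TBODY = ".//table[@class='JefitMainTable']/tbody/tr/td[2]"
WITHOUT_TBODY = ".//table[@class='JefitMainTable']//tr/td[2]"


def cell_texts(markup, path):
    """Return the text of every element that `path` selects in `markup`."""
    root = ET.fromstring(markup.strip())
    return [element.text for element in root.findall(path)]


raw_with_tbody = cell_texts(RAW_HTML, WITH_TBODY)
raw_without_tbody = cell_texts(RAW_HTML, WITHOUT_TBODY)
dom_with_tbody = cell_texts(BROWSER_DOM, WITH_TBODY)
dom_without_tbody = cell_texts(BROWSER_DOM, WITHOUT_TBODY)

print(raw_with_tbody)
print(raw_without_tbody)
print(dom_with_tbody)
print(dom_without_tbody)
```

Only the first lookup comes back empty, which is the situation the spider in the question runs into.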

It is usually a good idea to try XPath expressions in the scrapy shell:

$ scrapy shell https://www.jefit.com/exercises/1/
>>> response.xpath('//table[@class = "JefitMainTable"]/tbody/tr/td[2]/table[2]/thead/tr/th/text()').extract()
[]
>>> response.xpath('//table[@class = "JefitMainTable"]//tr/td[2]/table[2]/thead/tr/th/text()').extract()
[u'Band Cross Over']
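
When an XPath has been copied from a browser's developer tools, the same substitution can be applied mechanically. The helper below is a hypothetical convenience (relax_tbody is an illustrative name, not part of Scrapy): it replaces each tbody step with //, which turns the first expression from the session above into the second.

```python
import re


def relax_tbody(xpath):
    """Replace every tbody step with '//' so the expression matches
    whether or not the HTML source contains <tbody> elements."""
    # Hypothetical helper, not a Scrapy API. It handles plain '/tbody/'
    # and '//tbody/' steps only, not predicates such as '/tbody[1]/'.
    return re.sub(r"/+tbody/", "//", xpath)


browser_xpath = (
    '//table[@class = "JefitMainTable"]/tbody/tr/td[2]'
    "/table[2]/thead/tr/th/text()"
)
scrapy_xpath = relax_tbody(browser_xpath)
print(scrapy_xpath)
```

The rewritten expression is slightly looser than the original, since // also matches rows in nested tables, so it is still worth checking the result in the scrapy shell.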

html

xpath

scrapy