Issues while extracting with Scrapy
I'm experimenting with Scrapy and am currently trying the following:
scrapy shell "https://github.com/search?p=1&q=React+Django&type=Users"
# FName LName
response.css(".mr-1::text").get()
# Headline
response.css(".mb-1::text").get()
# Location
response.css("#user_search_results .mr-3:nth-child(1)::text").get()
# Email
response.css(".Link--muted::attr(href)").get()
I'm now running into these two problems:
response.css(".mb-1::text").get()
Expected: Software Engineer interested in Java, Python, Ruby, Groovy, Bash, Clojure, React-Native, and Docker. Focus: Testing, CI, and Micro-Services.
Result: Software Engineer interested in Java, Python, Ruby, Groovy, Bash, Clojure,
response.css(".Link--muted::attr(href)").get()
Expected: djangofan@gmail.com
Result: None
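Any suggestions on what I'm doing wrong here?
For cases like this, use XPath instead of CSS. ::text selects only an element's direct text nodes and .get() returns just the first match, so the headline is cut off at its first child element. There are also multiple .mb-1 elements, so you need to isolate the first one and collect its text together with the text of all its child elements.
Example: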
''.join(response.xpath('(//p[contains(@class, "mb-1")])[1]//text()').extract())
This gives you:
Software Engineer interested in Java, Python, Ruby, Groovy, Bash, Clojure, React-Native, and Docker. Focus: Testing, CI, and Micro-Services.