Data scraping with Python lxml returns adblocker value

Question

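I'm currently scraping some data from a web page's HTML for a bot I've made on Discord. I've previously scraped HTML from another website successfully with lxml. However, the site I'm now trying to scrape detects ad blockers, so no matter what data I try to scrape, I get the same value: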

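My code is as follows:

```python
import sys

import requests
from lxml import html

def main(arg):
    # Fetch the raw HTML of the player's profile page (no JavaScript is run).
    page = requests.get("https://fortnitetracker.com/profile/pc/" + arg)
    tree = html.fromstring(page.content)
    # Look up the K/d stat (selector exactly as originally posted).
    killdeath = tree.xpath('//div[@class="stats">K/d]/text()')
    print(killdeath)

if __name__ == "__main__":
    main(sys.argv[1])
```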

The value I get is '\nPlease consider adding Fortnite Tracker to your adblock whitelist! Our ads support the development and hardware costs of running this site. Really hate ads? Become a

Answer 1

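What is probably happening is that the initial page you get contains only the "Please consider..." text, plus a bunch of JavaScript that actually loads the content you see in your browser. (Try printing `page.content` to see what you're actually getting.)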
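As a minimal sketch of that check, the helper below parses the raw bytes with lxml (the same way the question does) and summarizes what a non-browser client actually received: its size, how many `<script>` tags it contains, and whether the adblock notice appears outside those scripts. The function name `inspect_page` and the phrase it searches for are illustrative assumptions, not part of either post.

```python
from lxml import html

# Assumption: a fragment of the notice quoted in the question.
ADBLOCK_PHRASE = "adblock whitelist"


def inspect_page(content: bytes) -> dict:
    """Summarize raw HTML as received by a client that does not run JavaScript."""
    tree = html.fromstring(content)
    # All text nodes except the source code inside <script> tags.
    visible_text = " ".join(tree.xpath("//text()[not(ancestor::script)]"))
    return {
        "bytes": len(content),
        "script_tags": len(tree.xpath("//script")),
        # Scripts loaded from separate files, which requests never fetches or runs.
        "external_scripts": len(tree.xpath("//script[@src]")),
        "has_adblock_notice": ADBLOCK_PHRASE in visible_text.lower(),
    }
```

With the question's code, `print(inspect_page(page.content))` shows the summary. Many scripts combined with the notice is consistent with a page that JavaScript builds after loading.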

In any case, because the `requests` library isn't a full-fledged web browser, it doesn't execute JavaScript, so you only ever see the adblocker message.
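One way to confirm that diagnosis is to take a value you can see in the browser (for example the K/d figure) and check where, if anywhere, it occurs in the raw response. This is a sketch, not part of either answer: `locate_text` is a hypothetical helper that returns `"markup"` if the text is in ordinary HTML (so an XPath on `tree` can reach it), `"script"` if it only appears inside a `<script>` tag (data that JavaScript renders later), or `"absent"` if it is not in the response at all (most likely fetched by JavaScript after the page loads).

```python
from lxml import html


def locate_text(content: bytes, needle: str) -> str:
    """Report where `needle` occurs in HTML fetched without running JavaScript."""
    tree = html.fromstring(content)
    # Ordinary text nodes: reachable with a plain XPath text() query.
    if any(needle in t for t in tree.xpath("//text()[not(ancestor::script)]")):
        return "markup"
    # Inline script bodies: the value exists, but only as data for JavaScript.
    if any(needle in (s.text or "") for s in tree.xpath("//script")):
        return "script"
    # Not in the response at all: likely loaded by a later JavaScript request.
    return "absent"
```

For example, `locate_text(page.content, "K/d")` returning `"absent"` would fit the explanation that the stats are never part of what `requests` downloads.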

Answer 2

The site says:

To make use of our APIs we require you to use an API Key. To use the API key you need to pass it along as a header with your requests.

Are you adding that header to your request? Also, I'd recommend making the request in Postman or a similar app so you can actually see the whole response.
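A minimal sketch of sending such a header with `requests` follows. The endpoint URL and the header name `TRN-Api-Key` are assumptions, not taken from this thread; verify both against the site's API documentation and use your own key. `build_profile_request` and `fetch_profile` are hypothetical helpers. Building a `PreparedRequest` first lets you inspect the exact URL and headers before anything is sent, similar in spirit to checking the request in Postman.

```python
from urllib.parse import quote

import requests

# Assumptions: placeholder endpoint and header name; confirm both in the site's API docs.
API_URL = "https://api.fortnitetracker.com/v1/profile/{platform}/{nickname}"
API_KEY_HEADER = "TRN-Api-Key"


def build_profile_request(platform: str, nickname: str, api_key: str) -> requests.PreparedRequest:
    """Build (but do not send) a GET request carrying the API key as a header."""
    # Percent-encode path segments so names with spaces or slashes stay intact.
    url = API_URL.format(platform=quote(platform, safe=""), nickname=quote(nickname, safe=""))
    return requests.Request("GET", url, headers={API_KEY_HEADER: api_key}).prepare()


def fetch_profile(platform: str, nickname: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body; raises on HTTP errors."""
    prepared = build_profile_request(platform, nickname, api_key)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
        response.raise_for_status()
        return response.json()
```

If the endpoint does return JSON, reading fields from `fetch_profile(...)` avoids both XPath and the adblock page entirely.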

Tags: python, lxml, web-scraping