How to avoid being banned when using scrapy
I keep getting banned by websites. I set download_delay = 10 in scrapy, I tried the fake_user_agent package, then I tried implementing tor and polipo; according to this site the configuration is fine. But after running it once or twice I get banned again! Can anyone help me?
Note: I also wanted to try scrapy-proxie, but I couldn't get it to activate.
- Use delays between hits
- Don't use tor - all connections come from a single address - that's the problem; rotate proxies after several requests instead
Also check this post - web scraping etiquette
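The proxy-rotation advice can be sketched as a downloader-middleware-style class. This is a minimal stand-alone sketch (a plain dict stands in for a scrapy Request object, and the proxy addresses are made-up placeholders); in a real project you would implement `process_request` on a middleware registered in `DOWNLOADER_MIDDLEWARES`:

```python
import random

class RotatingProxyMiddleware:
    """Sketch of a middleware that picks a different proxy per request.

    The proxy addresses used below are placeholders, not real servers.
    """

    def __init__(self, proxies):
        self.proxies = list(proxies)

    def process_request(self, request, spider=None):
        # Scrapy routes a request through whatever proxy is set in
        # request.meta['proxy']; here "request" is just a dict sketch.
        request["meta"]["proxy"] = random.choice(self.proxies)


# Usage: each request gets one proxy from the pool at random.
middleware = RotatingProxyMiddleware(
    ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
)
req = {"url": "http://example.com", "meta": {}}
middleware.process_request(req)
```

This way consecutive requests no longer all originate from one address, which is the pattern that gets a single tor exit node banned.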
You should look at what the documentation says.
Here are some tips to keep in mind when dealing with these kinds of sites:
- rotate your user agent from a pool of well-known ones from browsers (google around to get a list of them)
- disable cookies (see COOKIES_ENABLED) as some sites may use cookies to spot bot behaviour
- use download delays (2 or higher). See DOWNLOAD_DELAY setting.
- if possible, use Google cache to fetch pages, instead of hitting the sites directly
- use a pool of rotating IPs. For example, the free Tor project or paid services like ProxyMesh
- use a highly distributed downloader that circumvents bans internally, so you can just focus on parsing clean pages. One example of such downloaders is Crawlera
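As a rough illustration, the first few tips map onto scrapy settings like this. COOKIES_ENABLED, DOWNLOAD_DELAY and USER_AGENT are real scrapy setting names; the user-agent strings are just example values, and picking one at random at startup is a simplification of true per-request rotation:

```python
import random

# Pool of well-known browser user agents (example strings only).
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7; rv:115.0) Firefox/115.0",
]

# Scrapy settings mirroring the quoted tips.
SETTINGS = {
    "COOKIES_ENABLED": False,               # sites may use cookies to spot bots
    "DOWNLOAD_DELAY": 2,                    # seconds between requests (2 or higher)
    "USER_AGENT": random.choice(USER_AGENT_POOL),
}
```

In an actual project these would live as top-level names in settings.py rather than in a dict.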