Connection problem when scraping: only 1 of 2 scrapes starts (the other is ignored and only works once every 5-6 attempts)

Question

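I am working on a small scraping script, for learning and personal use only (non-profit). The problem I have is not about the scraping itself but about the connection (I think, although I have no trouble accessing the site and I am not getting any bad request errors). I noticed that the scraping sometimes works and sometimes does not: only 1 of the 2 scrapes starts. Now, however, it only "half" works (50% yes, 50% no): out of 5 to 7 attempts, Serie B was scraped correctly once.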

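Code description: the code connects to Tor through Firefox, using a proxy. It then starts 2 different scrapes (Serie A and Serie B) with 2 for loops. The goal is simply to scrape the team names in both for loops.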

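Problem: I do not get any errors, but the Serie B scrape seems to be ignored. Only Serie A is scraped, not Serie B (yet both use the same scraping code). A few days ago both scrapes worked fine, and only occasionally was Serie B not scraped. Now, however, Serie B is scraped correctly only once in 5 to 7 attempts.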

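Intuitively, I would say the problem is the Tor connection. I also tried copying the Tor connection code and pasting it before the Serie B for loop, so that both Serie A and Serie B had their own Tor connection. At first it worked, and both Serie A and Serie B were scraped. On subsequent attempts, Serie B was not scraped.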

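What is the problem? A Python code issue? An issue with the Tor connection through the Firefox proxy? Something else? What should I change? How can I fix it? If the code I wrote is incorrect, what should I write instead? Thanks.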

    ######## TOR CONNECTION WITH FIREFOX ########
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
    import os
    
    tor_linux = os.popen('/home/james/.local/share/torbrowser/tbb/x86_64/tor-browser_en-US') 
    
    profile = FirefoxProfile('/home/james/.local/share/torbrowser/tbb/x86_64/tor-browser_en-US/Browser/TorBrowser/Data/Browser/profile.default')
    profile.set_preference('network.proxy.type', 1)
    profile.set_preference('network.proxy.socks', '127.0.0.1')
    profile.set_preference('network.proxy.socks_port', 9050)
    profile.set_preference("network.proxy.socks_remote_dns", False) 
    
    profile.update_preferences()
    
    firefox_options = webdriver.FirefoxOptions()
    firefox_options.binary_location = '/usr/bin/firefox' 
    
    driver = webdriver.Firefox(
        firefox_profile=profile, options=firefox_options, 
        executable_path='/usr/bin/geckodriver')
    ########################################################################    
    
    #I need this for subsequent insertion into the database
    Values_SerieA = []
    Values_SerieB = []
    
    
    #### SCRAPING SERIE A ####
    driver.minimize_window()
    driver.get("link")
    for SerieA in driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='tableCellParticipant__name']"):
        SerieA_text = SerieA.text
        Values_SerieA.append(tuple([SerieA_text]))  # add the team name to the initially empty Values list
        print(SerieA_text)
    driver.close
    
    #### SCRAPING SERIE B ######
    driver.minimize_window()
    driver.get("link")
    for SerieB in driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='tableCellParticipant__name']"):
        SerieB_text = SerieA.text
        Values_SerieB.append(tuple([SerieB_text]))  # add the team name to the initially empty Values list
        print(SerieB_text)
    driver.close

Answer 1

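A few things are worth mentioning:

Selenium is synchronous, so calling driver.implicitly_wait(2) after requesting the site gives the page time to load before your driver starts looking for elements that have not been loaded into the DOM yet.
You are trying to minimize the driver window even though the last step you executed was closing the driver window. Try swapping the first two lines of the Serie B section, then place time.sleep(2) or driver.implicitly_wait(2) right after the request.
I have not used a proxy with the driver, so I cannot tell you whether that would cause connection problems. If you can reach the site without getting some kind of bad request error, I would assume the connection is not the problem.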
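
To rule the proxy in or out before looking at the scraping code, one option is to check that something is actually listening on the SOCKS port the profile points at (127.0.0.1:9050 in the question). The sketch below is only a reachability check under that assumption; the function name `socks_port_is_open` is hypothetical, and a successful connection does not prove that Tor itself is working. Note that a system tor service defaults to port 9050, while the tor bundled with Tor Browser usually listens on 9150, so it may be worth checking both.

```python
import socket


def socks_port_is_open(host="127.0.0.1", port=9050, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only tells you that a process is listening on
    the port, not that the listener is a healthy Tor SOCKS proxy.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `socks_port_is_open()` once before creating the driver, and again before the Serie B request, would show whether the proxy disappears between the two scrapes.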

===Try this===

    #### SCRAPING SERIE A ####

    # request the site
    driver.get("link")

    # let element lookups wait up to 2 seconds for the page to load
    driver.implicitly_wait(2)

    # once you're sure the page is loaded, minimize the window
    driver.minimize_window()

    for SerieA in driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='tableCellParticipant__name']"):
        SerieA_text = SerieA.text
        Values_SerieA.append(tuple([SerieA_text]))  # add the team name to the Values_SerieA list
        print(SerieA_text)
    # do not close the driver here: the same window is reused for Serie B

    #### SCRAPING SERIE B ######

    # request the site
    driver.get("link")

    # let element lookups wait up to 2 seconds for everything to load
    driver.implicitly_wait(2)

    # once you're sure the window is loading correctly you can move
    # this back up to happen before the wait
    driver.minimize_window()

    for SerieB in driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='tableCellParticipant__name']"):
        SerieB_text = SerieB.text  # read from the loop variable SerieB, not SerieA
        Values_SerieB.append(tuple([SerieB_text]))  # add the team name to the Values_SerieB list
        print(SerieB_text)
    driver.quit()  # end the browser session once both scrapes are done
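
If a fixed 2-second wait turns out to be too short over Tor, a polling wait may be more robust: keep looking for the elements until at least one appears or a timeout expires. The following is a minimal sketch of that idea, not a definitive implementation. The helper names `poll_until_found` and `scrape_names` and their parameters are hypothetical; the only driver methods used are `get` and `find_elements`, as in the code above. Selenium also provides `WebDriverWait` for the same purpose; plain polling is used here so the sketch can be exercised without a browser.

```python
import time


def poll_until_found(find, timeout=10.0, interval=0.5):
    """Call find() until it returns a non-empty list or timeout seconds pass.

    Returns the list that was found, or an empty list on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)


def scrape_names(driver, url, selector, timeout=10.0, interval=0.5):
    """Load url and return the text of each matching element as a 1-tuple."""
    driver.get(url)
    elements = poll_until_found(
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        lambda: driver.find_elements("css selector", selector),
        timeout=timeout,
        interval=interval,
    )
    return [(element.text,) for element in elements]
```

With the driver from the question, both scrapes then go through the same function, for example `Values_SerieA = scrape_names(driver, "link", selector)` followed by `Values_SerieB = scrape_names(driver, "link", selector)`, with a single `driver.quit()` at the end. An empty result then clearly means the elements never appeared within the timeout, rather than that the scrape was silently skipped.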


python

proxy

selenium

python-3.x

selenium-webdriver