Python webscraper runs inconsistently in SQL Server Agent

Question

I have built a web scraper in Python. Here is the code:


import sys
from selenium import webdriver
import time
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.keys import Keys
from datetime import datetime
import pandas as pd
from bs4 import BeautifulSoup


# Chrome options: set the download folder and start with a maximized window.
# Backslashes in Windows paths are doubled so they are not read as escapes.
option = webdriver.ChromeOptions()
prefs = {"download.default_directory": "C:\\DownloadFolder\\"}
option.add_experimental_option("prefs", prefs)
option.add_argument("--start-maximized")
chromedriver = "C:\\Script\\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chromedriver, options=option)


BASE_URl = "https://www.mywebsite.com"
driver.get(BASE_URl)
time.sleep(3)

# Read the link text from the page, then click the link with that text
link2 = driver.find_element_by_xpath("mypath").text

link = driver.find_element_by_link_text(link2)
link.click()

# Wait before closing the browser (presumably so the download can finish)
time.sleep(10)

driver.quit()
sys.exit(0)

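I created a job in SQL Server Agent that runs this web scraper on a fixed schedule. The problem is that the scraper sometimes runs correctly and sometimes fails. When it fails, it first appears to hang (about ten minutes in the run below).
When it cannot finish, it produces the following error: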

Code: 0xC0029151 Source: Download Execute Process Task
Description: In Executing "C:\Python\Python392\python.exe" "myscript.py" at "C:\script ", The process exit code was "1" while the expected was "0". End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 19:45:21 Finished: 19:55:28 Elapsed: 607.188 seconds. The package execution failed. The step failed.

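I can't tell where the problem is. If I start the .py file manually, it always works. The .py file is called from an SSIS package, and that package is what SQL Server Agent runs. When I start the package manually, it also works 100% of the time.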
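
Because the Agent log only reports exit code 1, the underlying Python exception is not visible. As a minimal sketch of one way to capture it, assuming the scraping steps are moved into a function, the wrapper below writes the traceback to a log file and returns the exit code. The names `run_with_logging` and `scrape` and the log file name are hypothetical and not part of the original script.

```python
import logging


def run_with_logging(task, log_path):
    """Run task(), log any exception with its traceback, return an exit code."""
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    try:
        task()
        logger.info("finished without errors")
        return 0
    except Exception:
        # logger.exception writes the message plus the full traceback
        logger.exception("scraper failed")
        return 1
    finally:
        # Detach and close the handler so the log file is flushed and released
        logger.removeHandler(handler)
        handler.close()


# Hypothetical usage at the bottom of the script, where scrape() would hold
# the Selenium steps from the question:
#     sys.exit(run_with_logging(scrape, "scraper.log"))
```

The job still fails with a non-zero exit code, but the log file then shows which line raised, for example a missing element or a browser that never started.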

I think that after its first run the web scraper does not close all the processes it used. I tried adding driver.close() before driver.quit(), but that did not work either.
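
On the concern about leftover processes: in the script above, driver.quit() is only reached if every earlier line succeeds, so an exception during the element lookup or the click would skip it and may leave Chrome and chromedriver running. A sketch of one way to make the cleanup unconditional is a small context manager. The names `managed_driver` and `factory` are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on success and on exceptions alike, so the browser and
        # chromedriver are asked to shut down either way.
        driver.quit()


# Hypothetical usage with the objects from the question:
#     with managed_driver(lambda: webdriver.Chrome(
#             executable_path=chromedriver, options=option)) as driver:
#         driver.get(BASE_URl)
```

This does not explain why the scheduled run fails in the first place; it only keeps a failed run from leaving a browser behind for the next one.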

Can anyone help me?

Answer 1

I added headless mode to the options, which solved the problem.
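
A likely explanation (an assumption, the answer does not state it) is that SQL Server Agent jobs run in a non-interactive session without a visible desktop, so a Chrome window that expects a display can stall there, while a manual run has one. A minimal sketch of what adding headless mode to the options from the question might look like follows. `headless_arguments` and `build_driver` are hypothetical helper names, and the window size is an arbitrary choice that stands in for --start-maximized, since there is no window to maximize in headless mode.

```python
def headless_arguments(width=1920, height=1080):
    """Chrome command-line switches for running without a visible window."""
    return [
        "--headless",
        "--window-size={},{}".format(width, height),
    ]


def build_driver(chromedriver_path, download_dir):
    """Build a headless Chrome driver (Selenium 3 style, as in the question)."""
    # Imported here so headless_arguments() works without Selenium installed.
    from selenium import webdriver

    option = webdriver.ChromeOptions()
    option.add_experimental_option(
        "prefs", {"download.default_directory": download_dir}
    )
    for argument in headless_arguments():
        option.add_argument(argument)
    # Newer Selenium releases take the driver path through a Service object
    # instead of the executable_path keyword used in the question.
    return webdriver.Chrome(executable_path=chromedriver_path, options=option)
```

Depending on the Chrome version, file downloads in headless mode may need extra configuration, so it is worth checking that the download folder is actually populated after the first scheduled run.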

Tags: python, sql-server, ssis, selenium-chromedriver