Performing multiple web scrapes on the same page from a list? (Python)

Question

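Hi everyone, my web scraper is almost finished and I'm trying to work out the last step. These are the steps it performs: scrape the page, save the result to a DataFrame, and finally save it to Excel. So far it does this for a single search term.

For example, the code looks like this: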

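import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is assumed to be an already-created Selenium WebDriver
driver.get("website")
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys("BC-9700021-1")

driver.switch_to.default_content()

wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()

order_list = []
order_info = {}

soup = BeautifulSoup(driver.page_source, 'html.parser')

def correct_tag(tag):
    # Match the <span> elements that hold the field labels
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }

for t in soup.find_all(correct_tag):
    # The value is the text node that follows the label span
    order_info[t.text] = t.find_next_sibling(text=True).strip()

order_list.append(order_info)

order_df1 = pd.DataFrame(order_list)
# The context manager saves the file on exit; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter('Order_sheet.xlsx') as datatoexcel:
    order_df1.to_excel(datatoexcel)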

Output:

    Order Amount: 7000
    Item Name: Plastic Cup
    Date: 7/1/2022
    Warehouse Number: 000718

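At the very top, though, I type "BC-9700021-1" into the search box by hand. I would like to pull the search terms from a list saved in Excel instead. The Excel sheet would contain a list like this: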

BC-9700021-1
BC-9700024-1
BC-9700121-2
ETC.
ETC.

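How can I make my program perform the same steps as for the first search, but for the rest of the values, without having to change the send_keys argument by hand each time?

Any help would be greatly appreciated.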

Answer 1

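Are you not familiar with for loops? Just iterate over each search term.

Also, you can use Selenium here, but there is a good chance you could get the data through an API. We won't know unless you share the URL/site.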
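
Since the question keeps the search terms in an Excel sheet, the list to iterate over could be read with pandas rather than typed out. This is only a minimal sketch under assumptions not confirmed by the question: the file name `search_terms.xlsx` is hypothetical, the IDs are assumed to sit in the first column with no header row, and `clean_terms` and `load_search_terms` are helper names introduced here for illustration.

```python
import pandas as pd


def clean_terms(values):
    """Return the non-empty search terms as stripped strings, in order."""
    terms = []
    for value in values:
        if pd.isna(value):
            continue  # skip blank cells
        text = str(value).strip()
        if text:
            terms.append(text)
    return terms


def load_search_terms(path, sheet_name=0):
    """Read the first column of an Excel sheet as a list of search terms."""
    # header=None: assumes the sheet contains only IDs, no header row
    # dtype=str: keeps values as text so nothing is parsed as a number
    frame = pd.read_excel(path, sheet_name=sheet_name, header=None, dtype=str)
    return clean_terms(frame.iloc[:, 0])


# Hypothetical usage; the file name is an assumption:
# a_list = load_search_terms("search_terms.xlsx")
```

The returned list can stand in for a hand-written list of IDs in the loop.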

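import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

a_list = ['BC-9700021-1', 'BC-9700024-1', 'BC-9700121-2']

order_list = []

def correct_tag(tag):
    # Match the <span> elements that hold the field labels
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }

# `driver` is assumed to be an already-created Selenium WebDriver
wait = WebDriverWait(driver, 20)

for eachId in a_list:
    driver.get("website")
    wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys(eachId)

    driver.switch_to.default_content()

    wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Start a fresh dict for every search; reusing one dict would make
    # every row in order_list point at the same (last) result
    order_info = {}
    for t in soup.find_all(correct_tag):
        order_info[t.text] = t.find_next_sibling(text=True).strip()

    order_list.append(order_info)

# One row per search term, written once after the loop
order_df1 = pd.DataFrame(order_list)
# The context manager saves the file on exit; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter('Order_sheet.xlsx') as datatoexcel:
    order_df1.to_excel(datatoexcel)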
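
The parsing step in the loop could also be pulled into a small function that builds a new dictionary on every call, which makes it easy to check against static HTML before pointing it at the live site. This is a rough sketch, assuming each label `<span>` is directly followed by its value as plain text; the real page's markup is not shown in the question, so that structure is a guess. `LABELS`, `is_label`, and `parse_order` are names introduced here for illustration.

```python
from bs4 import BeautifulSoup

LABELS = {"Order Amount", "Item Name", "Date", "Warehouse Number"}


def is_label(tag):
    """True for <span> elements whose text is one of the known labels."""
    return tag.name == "span" and tag.get_text(strip=True) in LABELS


def parse_order(html):
    """Return a new dict of label -> value for one results page."""
    soup = BeautifulSoup(html, "html.parser")
    order_info = {}  # created per call, so rows never share state
    for tag in soup.find_all(is_label):
        # Assumed layout: the value is the text node right after the span
        value = tag.find_next_sibling(string=True)
        if value is not None:
            order_info[tag.get_text(strip=True)] = value.strip()
    return order_info


# Made-up sample markup, only to exercise the function
sample_html = (
    "<p><span>Order Amount</span> 7000</p>"
    "<p><span>Item Name</span> Plastic Cup</p>"
)
print(parse_order(sample_html))
# {'Order Amount': '7000', 'Item Name': 'Plastic Cup'}
```

Inside the loop, `order_list.append(parse_order(driver.page_source))` would then add one independent row per search term.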

Tags: python, selenium, loops, beautifulsoup