Performing multiple web scrapes on the same page from a list? (Python)

Hi all, my web scrape is almost finished and I'm trying to figure out the last step. These are the steps it performs: scrape the page, save to a dataframe, and finally save to Excel. I want to run it over a list.

For example, the code looks like this:

driver.get("website")
wait = WebDriverWait(driver, 20)
search_box = wait.until(EC.visibility_of_element_located((By.ID,"Search"))).send_keys("BC-9700021-1")

driver.switch_to.default_content()

submit_box = wait.until(EC.visibility_of_element_located((By.ID,"Submit"))).click()

order_list = []
order_info = {}

soup = BeautifulSoup(driver.page_source,'html.parser') 
def correct_tag(tag):
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }
for t in soup.find_all(correct_tag):  # 'soup' is the parsed page above (not 'soup1')
    # 'text=True' is deprecated in recent BeautifulSoup; use 'string=True'
    order_info[t.text] = t.find_next_sibling(string=True).strip()

order_list.append(order_info)

order_df1 = pd.DataFrame(order_list)
# ExcelWriter.save() was removed in pandas 2.0; a context manager closes the file for you
with pd.ExcelWriter('Order_sheet.xlsx') as datatoexcel:
    order_df1.to_excel(datatoexcel, index=False)

Output:

    Order Amount: 7000
    Item Name: Plastic Cup
    Date: 7/1/2022
    Warehouse Number: 000718
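The label-matching step can be tested in isolation on a small HTML snippet. This is a minimal sketch with hypothetical markup, assuming each value is the text node immediately following its label `<span>` (which matches how `find_next_sibling(string=True)` is used above):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the page's label/value layout
html = """
<div>
  <span>Order Amount</span> 7000
  <span>Item Name</span> Plastic Cup
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def correct_tag(tag):
    # Match only the <span> labels we care about
    return tag.name == "span" and tag.get_text(strip=True) in {"Order Amount", "Item Name"}

# For each label span, grab the text node right after it
info = {t.get_text(strip=True): t.find_next_sibling(string=True).strip()
        for t in soup.find_all(correct_tag)}
print(info)
```

This should print `{'Order Amount': '7000', 'Item Name': 'Plastic Cup'}` for the snippet above; on the real page the structure may differ, so inspect the actual HTML first.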

But at the very top I type "BC-9700021-1" into the search box. I'd like to be able to pull each search term from a list saved in Excel. So the Excel sheet would have a list like this:

BC-9700021-1
BC-9700024-1
BC-9700121-2
ETC.
ETC.

How can I have my program perform the same steps as the first search, but for the rest of the values, without having to manually change the send_keys each time?

Any help is greatly appreciated.

Are you not familiar with for loops? Just iterate over each search term.

Also, you can use Selenium here, but chances are good you could get the data through an API instead. There's no way to know unless you share the url/site.
a_list = ['BC-9700021-1', 'BC-9700024-1', 'BC-9700121-2']


order_list = []

# Define the label matcher once, outside the loop
def correct_tag(tag):
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }

for eachId in a_list:
    driver.get("website")
    wait = WebDriverWait(driver, 20)
    search_box = wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys(eachId)

    driver.switch_to.default_content()

    submit_box = wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Create a fresh dict per ID; a single shared dict would be
    # overwritten on every iteration and appended by reference
    order_info = {}
    for t in soup.find_all(correct_tag):
        order_info[t.text] = t.find_next_sibling(string=True).strip()

    order_list.append(order_info)

# One workbook holding every order, one row per ID.
# (Writing f'Order_sheet_{eachId}.xlsx' here would only use the
# last ID, since this runs after the loop ends.)
order_df1 = pd.DataFrame(order_list)
with pd.ExcelWriter('Order_sheet.xlsx') as datatoexcel:
    order_df1.to_excel(datatoexcel, index=False)
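Instead of hardcoding `a_list`, you can read the IDs straight from the Excel sheet the question mentions. A minimal sketch, assuming a hypothetical file `search_ids.xlsx` with the IDs in a column named `OrderID` (adjust both names to match your sheet):

```python
import pandas as pd

# Create a sample sheet so the example is self-contained;
# in practice this file already exists.
pd.DataFrame({"OrderID": ["BC-9700021-1", "BC-9700024-1", "BC-9700121-2"]}).to_excel(
    "search_ids.xlsx", index=False
)

# Read the column back into a plain Python list to drive the loop
a_list = pd.read_excel("search_ids.xlsx")["OrderID"].astype(str).tolist()
print(a_list)
```

This keeps the scrape loop unchanged; only the source of `a_list` moves from code into the spreadsheet.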