Performing multiple web scrapes on the same page from a list? (Python)
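Hi all, my web scrape is almost done and I'm trying to figure out the last step. These are the steps it performs: scrape the page, save the data to a DataFrame, and finally save it to Excel. This currently works for a single search term.
For example, here is the code: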
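import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumption: use whichever WebDriver you already have
driver.get("website")
wait = WebDriverWait(driver, 20)
wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys("BC-9700021-1")
driver.switch_to.default_content()
wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()
order_list = []
order_info = {}
soup = BeautifulSoup(driver.page_source, 'html.parser')


def correct_tag(tag):
    # Match the <span> labels of the fields to extract
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }


for t in soup.find_all(correct_tag):  # soup, not the undefined soup1
    # The value is the text node that directly follows the label
    order_info[t.get_text(strip=True)] = t.find_next_sibling(string=True).strip()
order_list.append(order_info)
order_df1 = pd.DataFrame(order_list)
# ExcelWriter.save() was removed in pandas 2.0; to_excel accepts a file path directly
order_df1.to_excel('Order_sheet.xlsx', index=False)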
Output:
Order Amount: 7000
Item Name: Plastic Cup
Date: 7/1/2022
Warehouse Number: 000718
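The parsing half of the script can be checked without a browser by feeding BeautifulSoup a static snippet. This is a minimal sketch: `parse_order` and `SAMPLE_HTML` are hypothetical names. The markup is also an assumption, namely that each value is a text node right after its label `<span>`, because the real page's HTML is not shown in the question.

```python
from bs4 import BeautifulSoup

FIELDS = {"Order Amount", "Item Name", "Date", "Warehouse Number"}


def correct_tag(tag):
    # Same filter as in the question: a <span> whose text is one of the field labels
    return tag.name == "span" and tag.get_text(strip=True) in FIELDS


def parse_order(html):
    """Return {label: value}, taking the text node right after each label <span>."""
    soup = BeautifulSoup(html, "html.parser")
    order_info = {}
    for t in soup.find_all(correct_tag):
        value = t.find_next_sibling(string=True)
        # Guard against a label with no text after it
        order_info[t.get_text(strip=True)] = value.strip() if value else ""
    return order_info


# Hypothetical markup: the real page's structure is unknown
SAMPLE_HTML = """
<div>
  <span>Order Amount</span> 7000<br>
  <span>Item Name</span> Plastic Cup<br>
  <span>Date</span> 7/1/2022<br>
  <span>Warehouse Number</span> 000718
</div>
"""
```

Calling `parse_order(SAMPLE_HTML)` should produce the same four fields as the output above. That makes it a quick way to confirm that the label/sibling logic is right before adding Selenium back in.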
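But at the very top, I type "BC-9700021-1" into the search box by hand.
I would like to pull the search terms from a list saved in Excel instead.
So the Excel sheet would have a list like this: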
BC-9700021-1
BC-9700024-1
BC-9700121-2
ETC.
ETC.
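Since the IDs live in one Excel column, they can be read into a Python list with pandas and then looped over. This is a minimal sketch. `load_search_terms` and `clean_search_terms` are hypothetical helpers, and it assumes the IDs sit in the first column of the first sheet with no header row. Empty cells come back from pandas as NaN, so they are filtered out along with blank strings.

```python
import math


def clean_search_terms(values):
    """Turn the raw cells of one Excel column into a list of search IDs."""
    terms = []
    for value in values:
        # Empty cells arrive as NaN (a float) or None; skip them
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        text = str(value).strip()  # drop stray spaces around the ID
        if text:
            terms.append(text)
    return terms


def load_search_terms(path, column=0):
    """Read IDs from the first sheet of an Excel file (assumed layout: no header row)."""
    import pandas as pd  # only needed for reading the workbook (.xlsx also needs openpyxl)

    df = pd.read_excel(path, header=None)  # header=None: IDs start in row 1
    return clean_search_terms(df[column])
```

For example, `a_list = load_search_terms("search_terms.xlsx")` (a hypothetical file name) would give the list to iterate over in place of a hard-coded one.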
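How can I make my program run the same steps for the rest of the values as it does for the first search, without having to change the send_keys value by hand each time?
Any help would be greatly appreciated.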
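Are you not familiar with for loops? Just iterate over each search term.
Also, you can use Selenium here, but there is a good chance you could get the data through an API instead. There's no way to know unless you share the URL/site.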
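The loop itself does not depend on Selenium, so it can be written as a small driver that calls a per-ID function and keeps going when one search fails, for example when an ID returns no result and the wait times out. This is a minimal sketch. `scrape_all` is hypothetical, and `scrape_one` stands for the Selenium steps for a single order moved into a function that returns the `order_info` dict. The broad `except` is a deliberate simplification; narrowing it to Selenium's `TimeoutException` would be stricter.

```python
def scrape_all(order_ids, scrape_one):
    """Run scrape_one(order_id) for every ID, collecting results and failures."""
    results = {}   # order ID -> dict of scraped fields
    failures = {}  # order ID -> description of the error
    for order_id in order_ids:
        try:
            results[order_id] = scrape_one(order_id)
        except Exception as exc:  # e.g. a timeout when the search finds nothing
            failures[order_id] = repr(exc)
    return results, failures
```

With the per-order steps wrapped in a function, the whole run becomes `results, failures = scrape_all(a_list, scrape_one)`. After it finishes, `failures` lists the IDs that need a second look.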
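import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

a_list = ['BC-9700021-1', 'BC-9700024-1', 'BC-9700121-2']


def correct_tag(tag):
    # Define the filter once, outside the loop
    return tag.name == "span" and tag.get_text(strip=True) in {
        "Order Amount",
        "Item Name",
        "Date",
        "Warehouse Number",
    }


driver = webdriver.Chrome()  # assumption: use whichever WebDriver you already have
wait = WebDriverWait(driver, 20)
order_list = []

for eachId in a_list:
    driver.get("website")
    wait.until(EC.visibility_of_element_located((By.ID, "Search"))).send_keys(eachId)
    driver.switch_to.default_content()
    wait.until(EC.visibility_of_element_located((By.ID, "Submit"))).click()

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    order_info = {}  # a fresh dict per order; one shared dict would make every row identical
    for t in soup.find_all(correct_tag):  # soup, not the undefined soup1
        order_info[t.get_text(strip=True)] = t.find_next_sibling(string=True).strip()
    order_list.append(order_info)

    # One workbook per order number, holding only that order's row
    # (ExcelWriter.save() was removed in pandas 2.0; to_excel accepts a path directly)
    pd.DataFrame([order_info]).to_excel(f'Order_sheet_{eachId}.xlsx', index=False)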
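To get a single sheet with one row per order, the way the original single-search script produced one file, the results can be collected first and written once after the loop. This is a minimal sketch. `orders_to_frame` is a hypothetical helper, and it assumes the scraped data is kept as a dict mapping each order ID to its `order_info` dict, for example by doing `results[eachId] = order_info` inside the loop. The extra "Order ID" column is an assumption added so that each row can be traced back to its search term.

```python
import pandas as pd


def orders_to_frame(results):
    """Build one DataFrame from {order_id: {field: value}}, one row per order."""
    rows = [{"Order ID": order_id, **fields} for order_id, fields in results.items()]
    # Columns follow first-seen key order; a field missing for an order becomes NaN
    return pd.DataFrame(rows)


def write_orders(results, path):
    """Write every scraped order to a single workbook."""
    orders_to_frame(results).to_excel(path, index=False)  # .xlsx needs openpyxl
```

After the loop, `write_orders(results, "Order_sheet.xlsx")` replaces the per-ID files with one workbook.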