Needing to Include Missing Values in list append
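I'm looking to create a dataframe of URLs and Marketo form IDs (if a form is on the page). The problem I'm having is that when I hit a page without a form, no missing value gets appended to the empty list. This throws off my final result for identifying which form IDs are on which URLs.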
from bs4 import BeautifulSoup
import requests
import pandas as pd
# pages to scrape for code
url_list = pd.read_csv("/Users/derekgunn/Documents/Clients/Achievers Site Map 2-23-22/test.csv")
# turn urls into list
urls = list(url_list['URLs'])
# empty lists for dataframe
id_list = []
# loop to scrape URLs
for loop in urls:
    # get list of URLs
    get = requests.get(loop)
    # turn get variable into html format
    response = get.text
    # parses response
    soup = BeautifulSoup(response, 'html.parser')
    for id in soup.find_all("form"):
        if id is None:
            id_list.append("No Form Found")
        else:
            id_list.append(id.get("id"))
print(id_list)
The URLs I've been using for testing (I have 35, but apparently I can only post 8??):
https://www.achievers.com/privacy-policy-archived/
https://www.achievers.com/news/
https://www.achievers.com/awn-book-club/
https://www.achievers.com/tofv2/
https://www.achievers.com/the-future-of-employee-experience/
https://www.achievers.com/referral/demo-18-06-2021/
https://www.achievers.com/request-a-demo/
https://www.achievers.com/demo-2021-06-18/
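Why the placeholder never shows up: `soup.find_all("form")` returns an empty list when a page has no forms, not `None`, so for such a page the inner loop body (including the `if id is None` check) never runs at all. A minimal sketch, with plain lists standing in for the `find_all()` results and hypothetical example.com URLs and form IDs, shows how the skipped page shifts every later ID if the two lists are zipped back together:

```python
# Hypothetical pages; plain lists stand in for soup.find_all("form") results.
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
forms_per_page = [["mktoForm_1"], [], ["mktoForm_2"]]  # page b has no forms

id_list = []
for forms in forms_per_page:
    for form_id in forms:          # skipped entirely when forms == []
        if form_id is None:        # so this branch never fires for page b
            id_list.append("No Form Found")
        else:
            id_list.append(form_id)

print(id_list)                     # ['mktoForm_1', 'mktoForm_2'] (only 2 entries for 3 URLs)
print(list(zip(urls, id_list)))    # page b now appears to hold mktoForm_2
```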
You need to test whether any forms exist before iterating over them. For example:
from bs4 import BeautifulSoup
import requests
import pandas as pd
# pages to scrape for code
url_list = pd.read_csv("test.csv")
# turn urls into list
urls = list(url_list['URLs'])
# empty lists for dataframe
id_list = []
# loop to scrape URLs
for url in urls:
    get = requests.get(url)
    soup = BeautifulSoup(get.content, 'html.parser')
    forms = soup.find_all("form")
    # find_all() returns an empty list (never None) when the page has no forms
    if forms:
        for form in forms:
            id_list.append([url, form.get("id")])
    else:
        id_list.append([url, "No Form Found"])
for url, form_id in id_list:
    # str() guards against forms without an id attribute, where form.get("id") is None
    print(f"{str(form_id):15} {url}")
This shows which URLs a form was found on:
No Form Found   https://www.achievers.com/privacy-policy-archived/
No Form Found   https://www.achievers.com/news/
No Form Found   https://www.achievers.com/awn-book-club/
No Form Found   https://www.achievers.com/tofv2/
mktoForm_2460   https://www.achievers.com/the-future-of-employee-experience/
mktoForm_738    https://www.achievers.com/referral/demo-18-06-2021/
No Form Found   https://www.achievers.com/request-a-demo/
mktoForm_738    https://www.achievers.com/demo-2021-06-18/
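A page with several forms produces several `[url, form_id]` rows, so the row count can exceed the number of input URLs. If exactly one row per URL is wanted (for example, to line up against the 35 URLs in test.csv), the pairs can be grouped first. A minimal sketch; the helper name `one_row_per_url`, the comma-joined format, and the example.com URLs and IDs are illustrative assumptions:

```python
# [url, form_id] pairs in the shape the loop above builds (hypothetical data;
# the demo page is assumed to hold two forms).
id_list = [
    ["https://example.com/news/", "No Form Found"],
    ["https://example.com/demo/", "mktoForm_738"],
    ["https://example.com/demo/", "mktoForm_1234"],
]


def one_row_per_url(pairs):
    """Collapse [url, form_id] pairs so that each URL appears exactly once."""
    grouped = {}  # dicts keep insertion order (Python 3.7+), so input order survives
    for url, form_id in pairs:
        # str() turns a missing id (None) into the text "None" instead of failing on join
        grouped.setdefault(url, []).append(str(form_id))
    return [[url, ", ".join(ids)] for url, ids in grouped.items()]


rows = one_row_per_url(id_list)
for url, ids in rows:
    print(f"{ids:30} {url}")
```

Either `id_list` or `rows` can then be handed to pandas to get the dataframe from the question, e.g. `pd.DataFrame(rows, columns=["URL", "Form IDs"])`; the column names are only an example.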