Need to include missing values when appending to a list

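I'm trying to build a dataframe of URLs and the Marketo form IDs found on each page (if there are any). The problem is that when I hit a page without a form, no placeholder value gets appended to the list. That throws off my final result, because I can no longer tell which form IDs belong to which URLs.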

from bs4 import BeautifulSoup
import requests
import pandas as pd

# pages to scrape for code
url_list = pd.read_csv("/Users/derekgunn/Documents/Clients/Achievers Site Map 2-23-22/test.csv")

# turn urls into list
urls = list(url_list['URLs'])

# empty lists for dataframe
id_list = []

# loop to scrape URLs
for loop in urls:
    # get list of URLs
    get = requests.get(loop)
    # turn get variable into html format
    response = get.text
    # parses response
    soup = BeautifulSoup(response, 'html.parser')

    for id in soup.find_all("form"):
        if id is None:
            id_list.append("No Form Found")
        else:
            id_list.append(id.get("id"))
print(id_list)

The URLs I've been testing with (I have 35, but apparently I can only post 8??):

https://www.achievers.com/privacy-policy-archived/
https://www.achievers.com/news/
https://www.achievers.com/awn-book-club/
https://www.achievers.com/tofv2/
https://www.achievers.com/the-future-of-employee-experience/
https://www.achievers.com/referral/demo-18-06-2021/
https://www.achievers.com/request-a-demo/
https://www.achievers.com/demo-2021-06-18/

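You need to test whether any forms exist before looping over them. soup.find_all("form") returns an empty list when a page has no forms, so the body of your inner loop never runs for that page and the "if id is None" check is never reached. Storing the URL next to each ID also keeps the two associated. For example: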

from bs4 import BeautifulSoup
import requests
import pandas as pd

# pages to scrape for code
url_list = pd.read_csv("test.csv")

# turn urls into list
urls = list(url_list['URLs'])

# empty lists for dataframe
id_list = []

# loop to scrape URLs
for url in urls:
    get = requests.get(url)
    soup = BeautifulSoup(get.content, 'html.parser')

    forms = soup.find_all("form")

    if forms:
        for form in forms:
            id_list.append([url, form.get("id")])
    else:
        id_list.append([url, "No Form Found"])
        
for url, form_id in id_list:
    print(f"{form_id:15} {url}")

The output shows which URLs have a form and which do not:

No Form Found   https://www.achievers.com/privacy-policy-archived/
No Form Found   https://www.achievers.com/news/
No Form Found   https://www.achievers.com/awn-book-club/
No Form Found   https://www.achievers.com/tofv2/
mktoForm_2460   https://www.achievers.com/the-future-of-employee-experience/
mktoForm_738    https://www.achievers.com/referral/demo-18-06-2021/
No Form Found   https://www.achievers.com/request-a-demo/
mktoForm_738    https://www.achievers.com/demo-2021-06-18/
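
Since the end goal is a dataframe, the list of [url, form_id] pairs can be passed straight to pandas. The following is only a minimal sketch: it uses three sample rows copied from the output above in place of a real scrape, and the column names "url" and "form_id" are my own choice, not something from the question.

```python
import pandas as pd

# Sample rows in the same [url, form_id] shape that id_list has above
# (copied from the printed output, not fetched live)
id_list = [
    ["https://www.achievers.com/news/", "No Form Found"],
    ["https://www.achievers.com/the-future-of-employee-experience/", "mktoForm_2460"],
    ["https://www.achievers.com/demo-2021-06-18/", "mktoForm_738"],
]

# Column names are illustrative assumptions
df = pd.DataFrame(id_list, columns=["url", "form_id"])

# Keep only the pages where a form was actually found
with_forms = df[df["form_id"] != "No Form Found"]

print(df)
print(with_forms)
```

Because every URL contributes at least one row, pages without a form stay visible in df, and filtering on the placeholder string gives just the pages that have one.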