How can I automatically parse tables spanning over multiple pages with Python

I want to parse a table (or several tables) that spans multiple pages. My approach below works, but it is too manual; I would like it to parse the tables from the different pages automatically and merge them into one. The number of pages may not always be the same.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

one = "https://rittresultater.no/nb/sb_tid/923?page=0&pv2=11027&pv1=U"
two = "https://rittresultater.no/nb/sb_tid/923?page=1&pv2=11027&pv1=U"
three = "https://rittresultater.no/nb/sb_tid/923?page=2&pv2=11027&pv1=U"

#parse the first page
html = urlopen(one)
soup = BeautifulSoup(html, "lxml")
table = soup.find_all(class_="table-condensed")
one = pd.read_html(str(table))[0]

#parse the second page
html = urlopen(two)
soup = BeautifulSoup(html, "lxml")
table = soup.find_all(class_="table-condensed")
two = pd.read_html(str(table))[0]

#parse the third page
html = urlopen(three)
soup = BeautifulSoup(html, "lxml")
table = soup.find_all(class_="table-condensed")
three = pd.read_html(str(table))[0]

df = pd.concat([one,two,three], axis = 0)
df

Note that the URLs differ only in the "page=X" part. The web page itself also contains links to, e.g., the next page.

results = {}
for page_num in range(1, 10): #change depending on max page
    address = 'https://rittresultater.no/nb/sb_tid/923?page=' + \
               str(page_num) + '&pv2=11027&pv1=U' 

    html = urlopen(address)
    soup = BeautifulSoup(html, 'lxml')
    table = soup.find_all(class_='table-condensed')
    output = pd.read_html(str(table))[0]
    results[page_num] = output

When that has finished, use a list comprehension to concatenate the relevant outputs, like the last line of your code, but scaled up to cover all pages:

df = pd.concat([v for v in results.values()], axis = 0)
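
Since the number of pages may vary, one option is to drop the hard-coded `range(1, 10)` and keep requesting pages until one comes back without a `table-condensed` table. The sketch below demonstrates that stop condition with a hypothetical `fetch_page` stand-in for `urlopen`; in your case it would return `urlopen(base.format(page_num))` instead of the canned HTML strings used here:

```python
from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

# Fake pages standing in for the live site: two result pages, then a
# page with no table, which is how we detect we ran past the last page.
pages = [
    '<table class="table-condensed"><tr><th>Rank</th></tr><tr><td>1</td></tr></table>',
    '<table class="table-condensed"><tr><th>Rank</th></tr><tr><td>2</td></tr></table>',
    '<p>no results here</p>',
]

def fetch_page(page_num):
    # Hypothetical stand-in; with the real site this would be
    # urlopen('https://rittresultater.no/nb/sb_tid/923?page={}&pv2=11027&pv1=U'.format(page_num))
    return pages[page_num]

frames = []
page_num = 0
while True:
    soup = BeautifulSoup(fetch_page(page_num), 'lxml')
    table = soup.find_all(class_='table-condensed')
    if not table:  # no table on this page -> we are past the last page
        break
    # StringIO avoids the pandas deprecation warning for literal HTML strings
    frames.append(pd.read_html(StringIO(str(table)))[0])
    page_num += 1

# ignore_index=True gives the combined frame one continuous index
df = pd.concat(frames, ignore_index=True)
```

With the real URLs you would also want to guard the `urlopen` call (e.g. catch `HTTPError`), since some sites return an error page rather than an empty one after the last page.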