Crawling data from the web, then restructuring it into a Pandas DataFrame
I have code like this:
import os
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
from datetime import datetime, timedelta
URL_TEMPLATE = 'https://freemeteo.vn/thoi-tiet/ngoc-ha/history/daily-history/?gid=1572463&station=11376&date={}-{:02d}-{:02d}&language=vietnamese&country=vietnam'
url = URL_TEMPLATE.format(2015, 1, 1)  # fill in year, month, day
html_doc = requests.get(url).text
soup = BeautifulSoup(html_doc, 'html.parser')
tables = soup.find(class_='table-list')  # the <div class="table-list"> element
tables
The result looks like this:
<div class="table-list">
<ul><li><a href="/thoi-tiet/yen-phu/current-weather/location/?gid=1560121&language=vietnamese&country=vietnam" title="Yên Phụ Thời tiết">Yên Phụ</a></li>
<li><a href="/thoi-tiet/huu-tiep/current-weather/location/?gid=1580042&language=vietnamese&country=vietnam" title="Hữu Tiệp Thời tiết">Hữu Tiệp</a></li>
Can anyone help me turn tables into a pandas DataFrame so that it is easy to work with, for example so I can select the 'Yên Phụ' string? Thanks.
You can parse the tables directly with pandas:
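import pandas as pd
# Fill in the date placeholders first; otherwise the literal {} ends up in the request.
url = 'https://freemeteo.vn/thoi-tiet/ngoc-ha/history/daily-history/?gid=1572463&station=11376&date={}-{:02d}-{:02d}&language=vietnamese&country=vietnam'.format(2015, 1, 1)
tables = pd.read_html(url)  # needs lxml, or bs4 + html5lib, installed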
This gives you a list of DataFrames: each table on the page becomes one DataFrame.
Then you can query a DataFrame like this:
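Which position in that list holds the table you want depends on the page layout, so a fixed index such as tables[0] is an assumption. A minimal sketch, assuming you want the first table that has a column named Tên; the helper find_table is hypothetical, and the two hand-built DataFrames only stand in for what pd.read_html returns:

```python
import pandas as pd


def find_table(tables, column):
    """Return the first DataFrame in `tables` that has `column`, or None."""
    for df in tables:
        if column in df.columns:
            return df
    return None


# Stand-ins for the list returned by pd.read_html (placeholder contents).
tables = [
    pd.DataFrame({"A": [1, 2]}),                          # some unrelated table
    pd.DataFrame({"Tên": ["Hà Nội", "Yên Phụ", "Hữu Tiệp"]}),
]

city_table = find_table(tables, "Tên")
print(city_table)
```

Selecting by column name keeps working if the page gains or loses a table before the one you need, which a hard-coded index would not.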
tables[0].query("Tên == 'Hà Nội'")
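query resolves Tên as a column name, so this assumes the table really has a column with that exact header. A sketch on a small hand-built DataFrame (placeholder contents) comparing query with the equivalent boolean mask:

```python
import pandas as pd

# Placeholder table; assumes the real page has a column headed "Tên" (name).
df = pd.DataFrame({"Tên": ["Hà Nội", "Yên Phụ", "Hữu Tiệp"]})

# query() parses the expression string and looks up Tên among the columns.
by_query = df.query("Tên == 'Hà Nội'")

# The same selection written as a boolean mask, with no string parsing involved.
by_mask = df[df["Tên"] == "Hà Nội"]

print(by_query)
```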
If you only want the cities inside the table-list div:
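import requests
from bs4 import BeautifulSoup

resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'html.parser')
table_list = soup.find('div', {'class': 'table-list'})
names, links = [], []
for city in table_list.find_all('a'):
    names.append(city.text)     # visible link text, e.g. 'Yên Phụ'
    links.append(city['href'])  # relative URL of the city page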
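To try the extraction without hitting the network, you can run the same loop over the HTML excerpt shown in the question. A sketch, assuming the excerpt is closed with </ul></div> (the question shows it truncated) and using Python's built-in html.parser backend:

```python
from bs4 import BeautifulSoup

# The excerpt from the question, closed off so it is well-formed.
html = """
<div class="table-list">
<ul><li><a href="/thoi-tiet/yen-phu/current-weather/location/?gid=1560121&language=vietnamese&country=vietnam" title="Yên Phụ Thời tiết">Yên Phụ</a></li>
<li><a href="/thoi-tiet/huu-tiep/current-weather/location/?gid=1580042&language=vietnamese&country=vietnam" title="Hữu Tiệp Thời tiết">Hữu Tiệp</a></li>
</ul></div>
"""

soup = BeautifulSoup(html, "html.parser")
table_list = soup.find("div", class_="table-list")

names, links = [], []
for city in table_list.find_all("a"):
    names.append(city.get_text(strip=True))  # link text with surrounding whitespace removed
    links.append(city["href"])               # relative URL of the city page

print(names)
```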
Convert the two lists above into a DataFrame:
df = pd.DataFrame(zip(names, links), columns=['City', 'Link'])
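With the cities in a DataFrame, the selection the question asks for is a boolean mask on the City column. A sketch that hard-codes the two cities from the question's excerpt in place of the scraped lists; the names row and link are illustrative:

```python
import pandas as pd

# Hard-coded copies of what the loop above collects from the question's excerpt.
names = ["Yên Phụ", "Hữu Tiệp"]
links = [
    "/thoi-tiet/yen-phu/current-weather/location/?gid=1560121&language=vietnamese&country=vietnam",
    "/thoi-tiet/huu-tiep/current-weather/location/?gid=1580042&language=vietnamese&country=vietnam",
]

df = pd.DataFrame(list(zip(names, links)), columns=["City", "Link"])

# Rows whose City is exactly "Yên Phụ".
row = df[df["City"] == "Yên Phụ"]

# The link as a plain string (assumes exactly one match).
link = row["Link"].iloc[0]
print(link)
```

If the match might be missing, check row.empty before calling iloc[0].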