Insert scraped table data directly into PostgreSQL db
I want to insert my scraped data directly into a PostgreSQL database. I'm struggling to write the query for this; any help would be appreciated.
The code I have come up with so far:
import csv
import urllib.request

import psycopg2
from bs4 import BeautifulSoup

conn = psycopg2.connect(database='--', user='--', password='--', port='--')
cursor = conn.cursor()

soup = BeautifulSoup(urllib.request.urlopen(
    "http://tis.nhai.gov.in/TollInformation?TollPlazaID=236").read(), 'lxml')
tbody = soup('table', {"class": "tollinfotbl"})[0].find_all('tr')

outfile = open('output.csv', 'w', newline='')
writer = csv.writer(outfile)
for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]
    writer.writerow(cols)
    print(cols)
The schema of my table is as follows:
Column | Type | Modifiers
---------------+---------+-----------
vehicle_type | text | not null
one_time | integer | not null
return_trip | integer |
monthly_pass | integer | not null
local_vehicle | integer | not null
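A minimal sketch of the insert flow against this schema, using Python's built-in sqlite3 so it can be run locally without a Postgres server. The table name toll_prices and the row values are made up for illustration; with psycopg2 the placeholders would be %s instead of sqlite's ?:

```python
import sqlite3

# In-memory database mirroring the Postgres schema above
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE toll_prices (
        vehicle_type  TEXT    NOT NULL,
        one_time      INTEGER NOT NULL,
        return_trip   INTEGER,           -- nullable, per the schema
        monthly_pass  INTEGER NOT NULL,
        local_vehicle INTEGER NOT NULL
    )
""")

# Example row; with psycopg2 the placeholders would be %s instead of ?
row = ("Car/Jeep/Van", 85, 125, 2840, 45)
cursor.execute(
    "INSERT INTO toll_prices "
    "(vehicle_type, one_time, return_trip, monthly_pass, local_vehicle) "
    "VALUES (?, ?, ?, ?, ?)",
    row,
)
conn.commit()

print(cursor.execute("SELECT * FROM toll_prices").fetchall())
```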
I'm assuming cols contains 5 elements, in the order shown in your table; otherwise, adjust the indices.
import urllib.request

import psycopg2
from bs4 import BeautifulSoup

conn = psycopg2.connect(database='--', user='--', password='--', port='--')
cursor = conn.cursor()

soup = BeautifulSoup(urllib.request.urlopen(
    "http://tis.nhai.gov.in/TollInformation?TollPlazaID=236").read(), 'lxml')
tbody = soup('table', {"class": "tollinfotbl"})[0].find_all('tr')

for row in tbody:
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]
    # Skip the header row and any row that does not hold numeric data
    if cols and cols[1].isdigit():
        vehicle_type = cols[0]
        one_time = int(cols[1])
        # return_trip is nullable in your schema, so map an empty cell to None
        return_trip = int(cols[2]) if cols[2] else None
        monthly_pass = int(cols[3])
        local_vehicle = int(cols[4])

        query = ("INSERT INTO table_name "
                 "(vehicle_type, one_time, return_trip, monthly_pass, local_vehicle) "
                 "VALUES (%s, %s, %s, %s, %s);")
        data = (vehicle_type, one_time, return_trip, monthly_pass, local_vehicle)
        cursor.execute(query, data)

# Commit the transaction
conn.commit()
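The per-row conversion can also be pulled out into a small helper, which makes the header-row and empty-cell handling easy to test before touching the database. parse_toll_row is a hypothetical name, not part of the original code, and the header-row strings below are illustrative:

```python
def parse_toll_row(cols):
    """Convert a list of scraped cell strings into a typed tuple,
    or return None for header rows / rows without numeric data.

    Assumed column order: vehicle_type, one_time, return_trip,
    monthly_pass, local_vehicle (an empty return_trip maps to None).
    """
    if len(cols) != 5 or not cols[1].isdigit():
        return None  # header row or malformed row
    return (
        cols[0],
        int(cols[1]),
        int(cols[2]) if cols[2] else None,
        int(cols[3]),
        int(cols[4]),
    )

# A valid data row converts cleanly
print(parse_toll_row(["Car/Jeep/Van", "85", "125", "2840", "45"]))
# → ('Car/Jeep/Van', 85, 125, 2840, 45)

# A header row is rejected
print(parse_toll_row(["Type of vehicle", "Single Journey", "Return Journey",
                      "Monthly Pass", "Local Vehicle"]))
# → None
```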