How to ignore a CSV delimiter in specific scenarios in Python?
I am trying to insert data into a database from a CSV file.
import psycopg2  # import the postgres library

# connect to the database
conn = psycopg2.connect(host='1.11.11.111',
                        dbname='postgres',
                        user='postgres',
                        password='myPassword',
                        port='1234')
# create a cursor object
# cursor object is used to interact with the database
cur = conn.cursor()

# open the csv file using python standard file I/O
# copy file into the table just created
with open("C:/Users/Harshal/Desktop/tar.csv", 'r') as f:
    next(f)
    cur.copy_from(f, 'geotargets_india', sep=',')

conn.commit()
conn.close()
My table is as follows:
create table public.geotargets_india(
Criteria_ID integer not null,
Name character varying(50) COLLATE pg_catalog."default" NOT NULL,
Canonical_Name character varying(100) COLLATE pg_catalog."default" NOT NULL,
Parent_ID NUMERIC(10,2),
Country_Code character varying(10) COLLATE pg_catalog."default" NOT NULL,
Target_Type character varying(50) COLLATE pg_catalog."default" NOT NULL,
Status character varying(50) COLLATE pg_catalog."default" NOT NULL
)
My CSV looks like:
The error I get is:
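For example, look closely at this row of my CSV: 1007740,Hyderabad,"Hyderabad,Telangana,India",9061642.0,IN,City,Active
Here, Canonical_Name holds a string that itself contains commas. I think this causes the error, because the row then appears to have more columns than the table. How can I solve this?
Note: I am assuming this is the only cause of the error.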
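To illustrate the suspected cause, here is a minimal sketch using only the standard library. It compares a plain split on commas, which ignores quoting in the same way the delimiter-only text format used by copy_from does, with csv.reader, which honours the double quotes. The variable names are illustrative only.

```python
import csv
import io

line = '1007740,Hyderabad,"Hyderabad,Telangana,India",9061642.0,IN,City,Active'

# A plain split treats every comma as a separator, including the quoted ones.
naive_fields = line.split(",")

# csv.reader honours the double quotes and keeps the quoted value together.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))   # 9
print(len(parsed_fields))  # 7
print(parsed_fields[2])    # Hyderabad,Telangana,India
```

The table has seven columns, so the nine fields produced by the naive split are consistent with an "extra data" style of failure.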
You should probably read and parse the CSV file yourself in Python, then load the data into the database with INSERT statements.
import csv
import psycopg2

conn = psycopg2.connect(
    host='1.11.11.111',
    dbname='postgres',
    user='postgres',
    password='myPassword',
    port='1234'
)
cur = conn.cursor()

# csv.DictReader handles quoted fields, so embedded commas stay inside one value
with open("tar.csv", newline="") as fd:
    rdr = csv.DictReader(fd)
    cur.executemany("""
        INSERT INTO geotargets_india
        VALUES (%(Criteria_ID)s, %(Name)s, %(Canonical_Name)s, %(Parent_ID)s, %(Country_Code)s, %(Target_Type)s, %(Status)s);
        """,
        rdr
    )

conn.commit()  # persist the inserted rows
cur.close()
conn.close()
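A few comments on the above. The csv.DictReader class returns each row of your CSV as a dictionary keyed by the column names in the header line, so those names must match the placeholders in the INSERT statement. The returned DictReader object, rdr, is iterable, so it can be passed directly to psycopg2's cursor.executemany function, which is more concise than looping over the DictReader yourself. Note that the psycopg2 documentation states that executemany is not faster than calling execute in a loop, so the gain is convenience rather than speed.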
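One detail worth checking with this approach: csv.DictReader returns an empty string for an empty cell, and an empty string will not cast to the NUMERIC column Parent_ID. The sketch below converts empty strings to None (which psycopg2 sends as NULL) before the rows reach the INSERT. The helper name rows_with_nulls, the header names, and the second sample row are all hypothetical and only serve the illustration.

```python
import csv
import io

# Made-up sample data; the header names are assumed to match the
# placeholders used in the INSERT statement.
sample = io.StringIO(
    "Criteria_ID,Name,Canonical_Name,Parent_ID,Country_Code,Target_Type,Status\n"
    '1007740,Hyderabad,"Hyderabad,Telangana,India",9061642.0,IN,City,Active\n'
    "2356,India,India,,IN,Country,Active\n"
)


def rows_with_nulls(reader):
    """Yield each row dict with empty strings replaced by None."""
    for row in reader:
        yield {key: (value if value != "" else None) for key, value in row.items()}


rows = list(rows_with_nulls(csv.DictReader(sample)))
print(rows[0]["Canonical_Name"])  # Hyderabad,Telangana,India
print(rows[1]["Parent_ID"])       # None
```

Under these assumptions, rows_with_nulls(rdr) could be passed to cur.executemany in place of rdr. Whether an empty cell should really become NULL depends on the data.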
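You are right about the problem in Canonical_Name. Using your table structure, I successfully imported the row 1007740,Hyderabad,"Hyderabad",9061642.0,IN,City,Active, which has no comma inside the quoted value.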
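Unfortunately, the copy_from method does not support CSV quoting: it accepts a sep argument but has no way to declare a quote character, so commas inside quotes are still treated as separators. Here is the documentation: https://www.psycopg.org/docs/cursor.html#cursor.copy_from
So you can rewrite the CSV file with a tab delimiter and then use copy_from: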
import csv
import psycopg2  # import the postgres library

# connect to the database
conn = psycopg2.connect(host='1.11.11.111',
                        dbname='postgres',
                        user='postgres',
                        password='myPassword',
                        port='1234')
# create a cursor object, which is used to interact with the database
cur = conn.cursor()

# rewrite the comma-separated file as a tab-separated file;
# newline='' lets the csv module control line endings
with open("C:/Users/Harshal/Desktop/tar.csv", 'r', newline='') as f:
    reader = csv.reader(f, delimiter=",")
    with open("C:/Users/Harshal/Desktop/tar.tsv", 'w', newline='') as tsv:
        writer = csv.writer(tsv, delimiter='\t')
        writer.writerows(reader)

# copy the tab-separated file into the table, skipping the header line
with open("C:/Users/Harshal/Desktop/tar.tsv", 'r') as f:
    next(f)
    cur.copy_from(f, 'geotargets_india', sep='\t')

conn.commit()
conn.close()
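If writing an intermediate .tsv file is undesirable, the same conversion can be done in memory. The following is a sketch under assumptions, not a drop-in replacement: the helper name csv_to_tsv_buffer is hypothetical, and it assumes no field contains a tab, a double quote, or a backslash, since csv.writer would quote the first two and the text format of COPY treats backslashes as escapes.

```python
import csv
import io


def csv_to_tsv_buffer(src, skip_header=True):
    """Return an in-memory tab-separated copy of a comma-separated file object."""
    reader = csv.reader(src)
    if skip_header:
        next(reader, None)  # drop the header row if there is one
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


sample = io.StringIO(
    "header line\n"
    '1007740,Hyderabad,"Hyderabad,Telangana,India",9061642.0,IN,City,Active\n'
)
tsv = csv_to_tsv_buffer(sample)
print(repr(tsv.getvalue()))

# With an open psycopg2 cursor `cur`, the buffer could then be loaded with:
# cur.copy_from(tsv, 'geotargets_india', sep='\t')
```

The quotes around the Canonical_Name value disappear in the output because a comma is no longer special once the delimiter is a tab.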
foo.csv:
It is header which will be ignored------------------------------------
1007740,Hyderabad,"Hyderabad,Telangana,India",9061642.0,IN,City,Active
Python:
import psycopg2

conn = psycopg2.connect('')
cur = conn.cursor()
# the with block closes the file once the COPY has finished
with open('foo.csv', 'r') as f:
    cur.copy_expert("""copy geotargets_india from stdin with (format csv, header, delimiter ',', quote '"')""", f)
conn.commit()
psql:
table geotargets_india;
┌─────────────┬───────────┬───────────────────────────┬────────────┬──────────────┬─────────────┬────────┐
│ criteria_id │   name    │      canonical_name       │ parent_id  │ country_code │ target_type │ status │
├─────────────┼───────────┼───────────────────────────┼────────────┼──────────────┼─────────────┼────────┤
│     1007740 │ Hyderabad │ Hyderabad,Telangana,India │ 9061642.00 │ IN           │ City        │ Active │
└─────────────┴───────────┴───────────────────────────┴────────────┴──────────────┴─────────────┴────────┘
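The copy_expert call above can be wrapped in a small reusable function. This is only a sketch: the name copy_csv and its parameters are hypothetical, the cursor is passed in by the caller, and the table name is interpolated directly into the statement, so it must come from trusted code rather than user input.

```python
def copy_csv(cur, path, table="geotargets_india"):
    """Stream a CSV file that has a header row into `table` via COPY ... FROM STDIN."""
    # The table name is interpolated as-is: only pass trusted identifiers.
    sql = (
        f"COPY {table} FROM STDIN "
        "WITH (FORMAT csv, HEADER, DELIMITER ',', QUOTE '\"')"
    )
    # newline="" leaves line endings untouched, so the server sees the file as written.
    with open(path, "r", newline="") as f:
        cur.copy_expert(sql, f)
```

With an open psycopg2 connection conn, usage would look like copy_csv(conn.cursor(), 'foo.csv') followed by conn.commit(). Because the server parses the CSV itself, quoted commas need no preprocessing in Python.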