CSV to MSSQL using pymssql
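The goal is to keep checking my CSV for new records and insert them into MSSQL using the pymssql library.
The CSV initially has 244 rows. I am trying to insert one new value, and I want only the new rows to be inserted dynamically when the script runs under the scheduler.
The script runs every 15 seconds to insert values. The first run inserts the values, but the second run throws 'Cannot insert duplicate key in object' because the first column, DateID, is set as the PK. The statement is terminated at the very first record, so no new rows are inserted.
How do I get around this?
Code: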
def trial():
    try:
        for row in df.itertuples():
            datevalue = datetime.datetime.strptime(row.OrderDate, format)
            query = "INSERT INTO data (OrderDate, Region, City, Category) VALUES (%s,%s,%s,%s)"
            cursor.execute(query, (datevalue, row.Region, row.City, row.Category))
        print('"Values inserted')
        conn.commit()
        conn.close()
    except Exception as e:
        print("Handle error", e)
        pass

schedule.every(15).seconds.do(trial)
Library used: pymssql
SQL: MSSQL Server 2019
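To avoid duplicate values, consider adjusting the query to use the EXCEPT clause (part of the same family of set operators as UNION and INTERSECT) against the actual data. Also, consider using executemany and passing a nested list of all row/column data, built with DataFrame.to_numpy().tolist().
By the way, if the OrderDate column is a datetime type in both the data frame and the database table, there is no need to convert it from a string.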
def trial():
    try:
        query = (
            "INSERT INTO data (OrderDate, Region, City, Category) "
            "SELECT %s, %s, %s, %s "
            "EXCEPT "
            "SELECT OrderDate, Region, City, Category "
            "FROM data"
        )
        vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
        vals = tuple(map(tuple, vals))
        cur.executemany(query, vals)
        print('Values inserted')
        conn.commit()
    except Exception as e:
        print("Handle error", e)
    finally:
        cur.close()
        conn.close()
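To try the EXCEPT pattern without a SQL Server instance, here is a minimal sketch that swaps in Python's built-in sqlite3 for pymssql. Several things are assumptions made only for this illustration: the table definition (with OrderDate as the primary key), the sample rows, the helper name insert_new_rows, and the `?` placeholders that sqlite3 uses where pymssql uses `%s`. It only shows that re-running the same insert does not raise a duplicate-key error and that a row added later still gets inserted.

```python
import sqlite3

# Same statement shape as above, with sqlite3's "?" placeholders
# (pymssql would use "%s").
INSERT_NEW_ROWS = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT ?, ?, ?, ? "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category FROM data"
)


def insert_new_rows(conn, rows):
    """Insert only rows not already in `data`; return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_NEW_ROWS, rows)
    return conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]


# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    "OrderDate TEXT PRIMARY KEY, Region TEXT, City TEXT, Category TEXT)"
)
rows = [
    ("2020-01-01", "East", "Boston", "Bars"),
    ("2020-01-02", "West", "Los Angeles", "Cookies"),
]

first = insert_new_rows(conn, rows)   # both rows are new
second = insert_new_rows(conn, rows)  # same rows again: nothing inserted, no error
rows.append(("2020-01-03", "East", "New York", "Crackers"))
third = insert_new_rows(conn, rows)   # only the appended row is inserted
print(first, second, third)
```

Note that EXCEPT compares all four columns, so a row that reuses an existing key but differs in another column would still be attempted and would still hit the key constraint.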
For faster bulk inserts, consider using a staging temp table:
# CREATE EMPTY TEMP TABLE
query = "SELECT TOP 0 OrderDate, Region, City, Category INTO #pydata FROM data"
cur.execute(query)

# INSERT INTO TEMP TABLE
query = (
    "INSERT INTO #pydata (OrderDate, Region, City, Category) "
    "VALUES (%s, %s, %s, %s) "
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.execute("BEGIN TRAN")
cur.executemany(query, vals)

# MIGRATE TO FINAL TABLE
query = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT OrderDate, Region, City, Category "
    "FROM #pydata "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category "
    "FROM data"
)
cur.execute(query)
conn.commit()
print("Values inserted")
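The staging-table flow can also be tried locally. The sketch below again uses sqlite3 as a stand-in for pymssql, so the syntax differs from the T-SQL above: `CREATE TEMP TABLE ... AS SELECT ... WHERE 0` takes the place of `SELECT TOP 0 ... INTO #pydata`, and placeholders are `?`. The table definition, the sample rows, the temp table name staging, and the helper load_via_staging are all hypothetical.

```python
import sqlite3

COLS = "OrderDate, Region, City, Category"


def load_via_staging(conn, rows):
    """Bulk-load rows into a temp table, then copy over only unseen rows.

    Returns how many rows were added to `data`.
    """
    cur = conn.cursor()
    before = cur.execute("SELECT COUNT(*) FROM data").fetchone()[0]

    # Empty temp table with the target's columns (WHERE 0 copies no rows).
    cur.execute(
        f"CREATE TEMP TABLE IF NOT EXISTS staging AS "
        f"SELECT {COLS} FROM data WHERE 0"
    )
    cur.execute("DELETE FROM staging")  # clear rows left by a previous run

    # Plain bulk insert into the staging table.
    cur.executemany(f"INSERT INTO staging ({COLS}) VALUES (?, ?, ?, ?)", rows)

    # Single set-based statement that moves only rows missing from `data`.
    cur.execute(
        f"INSERT INTO data ({COLS}) "
        f"SELECT {COLS} FROM staging "
        f"EXCEPT "
        f"SELECT {COLS} FROM data"
    )
    conn.commit()

    after = cur.execute("SELECT COUNT(*) FROM data").fetchone()[0]
    return after - before


# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    "OrderDate TEXT PRIMARY KEY, Region TEXT, City TEXT, Category TEXT)"
)
rows = [
    ("2020-01-01", "East", "Boston", "Bars"),
    ("2020-01-02", "West", "Los Angeles", "Cookies"),
]

added_first = load_via_staging(conn, rows)    # both rows are new
added_second = load_via_staging(conn, rows)   # nothing new
rows.append(("2020-01-03", "East", "New York", "Crackers"))
added_third = load_via_staging(conn, rows)    # only the appended row
print(added_first, added_second, added_third)
```

The design point is that the duplicate check runs once for the whole batch rather than once per row, which is where the speedup over the executemany-with-EXCEPT version would be expected to come from.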