CSV to MSSQL using pymssql

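The goal is to keep checking my CSV for new records and insert them into MSSQL using the pymssql library. The CSV initially had 244 rows. I am trying to insert 1 value, and I want only the new rows to be inserted dynamically each time the script runs under the scheduler. The script runs every 15 seconds to insert values. The first run inserts the values, but the second run throws 'Cannot insert duplicate key in object' because the first column, DateID, is set as the PK. The statement terminates at the very first record, so no new rows are inserted.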

How can I get around this?

Code:

def trial():
    try:
        for row in df.itertuples():
            # Parse the CSV date string into a datetime object.
            datevalue = datetime.datetime.strptime(row.OrderDate, format)

            query = "INSERT INTO data (OrderDate, Region, City, Category) VALUES (%s,%s,%s,%s)"
            cursor.execute(query, (datevalue, row.Region, row.City, row.Category))
        print('Values inserted')
        conn.commit()
        conn.close()
    except Exception as e:
        print("Handle error", e)


schedule.every(15).seconds.do(trial)

Library used: pymssql. SQL: MSSQL Server 2019.

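To avoid duplicate values, consider adjusting the query to use the EXCEPT clause (part of the UNION and INTERSECT family of set operators) against the actual data. Also, consider using executemany, passing a nested list of all row/column data built with DataFrame.to_numpy().tolist().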

By the way, if the OrderDate column is a datetime type in both the data frame and the database table, you do not need to convert it from a string value.

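def trial():
    try:
        # Insert a row only if an identical row is not already in the table.
        query = (
            "INSERT INTO data (OrderDate, Region, City, Category) "
            "SELECT %s, %s, %s, %s "
            "EXCEPT "
            "SELECT OrderDate, Region, City, Category "
            "FROM data"
        )

        # Build one parameter tuple per data frame row.
        vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
        vals = tuple(map(tuple, vals))
        cur.executemany(query, vals)

        print('Values inserted')
        conn.commit()

    except Exception as e:
        print("Handle error", e)

    finally:
        # Closing here ends the session. A job that runs repeatedly on a
        # schedule needs to reconnect (or skip these calls) before its next run.
        cur.close()
        conn.close()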
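
To try the EXCEPT pattern without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption for illustration only: SQLite uses ? placeholders where pymssql uses %s, the sample rows are invented, and the helper name load is hypothetical. It shows that re-running the same insert adds nothing, while a batch containing one new row adds only that row.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table `data`
# (an assumption for this demo; the real target would be reached via pymssql).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (OrderDate TEXT, Region TEXT, City TEXT, Category TEXT)"
)

# Same shape as the query above, with SQLite's `?` placeholders.
INSERT_NEW = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT ?, ?, ?, ? "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category FROM data"
)


def load(rows):
    """Insert only rows not already present; return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_NEW, rows)
    return conn.execute("SELECT COUNT(*) FROM data").fetchall()[0][0]


# Invented sample data for illustration.
first_batch = [("2021-01-05", "East", "Boston", "Bars")]
second_batch = first_batch + [("2021-01-06", "West", "Seattle", "Cookies")]

count_after_first = load(first_batch)    # one row inserted
count_after_repeat = load(first_batch)   # same row again, nothing added
count_after_second = load(second_batch)  # only the new row is added
```

Note that EXCEPT compares all four selected columns, so a row counts as a duplicate only when every one of those values matches an existing row.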

For faster bulk inserts, consider using a staging temporary table:

# CREATE EMPTY TEMP TABLE 
query = "SELECT TOP 0 OrderDate, Region, City, Category INTO #pydata FROM data"
cur.execute(query)

# INSERT INTO TEMP TABLE
query = (
    "INSERT INTO #pydata (OrderDate, Region, City, Category) "
    "VALUES (%s, %s, %s, %s)"
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.execute("BEGIN TRAN")
cur.executemany(query, vals)

# MIGRATE TO FINAL TABLE
query = (
    "INSERT INTO data (OrderDate, Region, City, Category) "
    "SELECT OrderDate, Region, City, Category "
    "FROM #pydata "
    "EXCEPT "
    "SELECT OrderDate, Region, City, Category "
    "FROM data"
)
cur.execute(query)
conn.commit()
print("Values inserted")
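
When the staging approach runs repeatedly on one open connection, as it would under a scheduler, the temp table from the previous run is still there. The sketch below illustrates one way to handle that by dropping and recreating the staging table on each call. It again uses sqlite3 as a stand-in (an assumption; placeholders and temp-table syntax differ from T-SQL), with a hypothetical helper name load_via_staging and invented sample rows. On SQL Server 2016 and later, the rough counterparts would be DROP TABLE IF EXISTS #pydata followed by the SELECT TOP 0 ... INTO #pydata statement shown above.

```python
import sqlite3

COLS = "OrderDate, Region, City, Category"

# In-memory SQLite database as a stand-in for SQL Server (demo assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (OrderDate TEXT, Region TEXT, City TEXT, Category TEXT)"
)


def load_via_staging(rows):
    """Stage rows in a temp table, copy only unseen ones, return total rows."""
    # Start from a fresh, empty staging table on every call so that repeated
    # runs on the same connection do not collide with a leftover table.
    conn.execute("DROP TABLE IF EXISTS temp.pydata")
    conn.execute(f"CREATE TEMP TABLE pydata AS SELECT {COLS} FROM data WHERE 0")

    with conn:  # commits on success, rolls back on error
        # Bulk-load everything into the staging table first.
        conn.executemany(
            f"INSERT INTO temp.pydata ({COLS}) VALUES (?, ?, ?, ?)", rows
        )
        # Then migrate only the rows that the final table does not have yet.
        conn.execute(
            f"INSERT INTO data ({COLS}) "
            f"SELECT {COLS} FROM temp.pydata "
            f"EXCEPT SELECT {COLS} FROM data"
        )
    return conn.execute("SELECT COUNT(*) FROM data").fetchall()[0][0]


# Invented sample data for illustration.
batch = [
    ("2021-01-05", "East", "Boston", "Bars"),
    ("2021-01-06", "West", "Seattle", "Cookies"),
]
total_first_run = load_via_staging(batch)
total_second_run = load_via_staging(batch)  # nothing new, total unchanged
total_third_run = load_via_staging(batch + [("2021-01-07", "East", "New York", "Crackers")])
```

The migration step is a single set-based statement, so the duplicate check runs once per batch rather than once per row.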