Transfer data from an Excel worksheet (openpyxl) to a database table (dbf)

Question

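I have a simple task: read an Excel worksheet, treat each row (about 83 columns) as a unique database record, add it to a local data record, and finally append it and write it to a DBF file.

I can extract all the values from Excel and add them to a list. But the list is not in the right form, and I don't know how to prepare/convert the list into a database record. I am using openpyxl, dbf and Python 3.7.

At the moment I am only testing, trying to prepare the data for row 3 (hence min_row and max_row are both 3).

I understand the data should be in the format (('','','', ... 83 entries), ('','','', ... 83 entries))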

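But I don't know how to convert the list data into a record, or alternatively how to read the Excel data directly into a format that can be appended to the DBF table.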

tbl_tst.open(mode=dbf.READ_WRITE)  # all fields are character strings

for everyrow in ws_IntMstDBF.iter_rows(min_row=3, max_row=3, max_col=ws_IntMstDBF.max_column - 1):
    datum = []  # set([83]); will defining datum as () help solve the problem?
    for cells in everyrow:
        if cells.value is None:  # for None entries, enter an empty string
            datum.append("")
            continue
        datum.append(cells.value)  # else enter the cell value

    tbl_tst.append(datum)  # append that record to the table !!! "list is not a record" error here

tbl_tst.close()

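The error is about appending a list to the table, where a record (or similar) is expected. Please guide me on how to convert Excel rows into data that can be appended to a DBF table.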

raise TypeError("data to append must be a tuple, dict, record, or template; not a %r" % type(data))
TypeError: data to append must be a tuple, dict, record, or template; not a <class 'list'>

Answer 1

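Take a look at the Python pandas library...

To read data from Excel into a pandas DataFrame, you can use pandas.read_excel

Once the data has been read into a pandas DataFrame, you can manipulate it and then write it to a database using pandas.DataFrame.to_sql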

See also this explanation for dealing with database io
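The approach in this answer could be sketched roughly as below. Note that pandas.DataFrame.to_sql writes to an SQL database, not to a DBF file, so this sketch uses SQLite as a stand-in target. The function names, the table name, and the choice to store every column as text are assumptions made for illustration only.

```python
import sqlite3

import pandas as pd


def frame_to_sqlite(df, conn, table_name):
    """Append a DataFrame to an SQLite table, storing every value as text."""
    # Empty cells arrive as NaN/None; turn them into empty strings first.
    cleaned = df.fillna("").astype(str)
    cleaned.to_sql(table_name, conn, if_exists="append", index=False)
    return len(cleaned)


def excel_to_sqlite(xlsx_path, db_path, table_name):
    """Read the first sheet of a workbook and append its rows to SQLite."""
    df = pd.read_excel(xlsx_path, dtype=str)  # read every column as text
    conn = sqlite3.connect(db_path)
    try:
        return frame_to_sqlite(df, conn, table_name)
    finally:
        conn.close()
```

A call such as excel_to_sqlite("test.xlsx", "test.sqlite", "master") would then load the sheet in one step; both file names and the table name here are hypothetical.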

Answer 2

Change

tbl_tst.append(datum)

to

tbl_tst.append(tuple(datum))

That will get rid of the error. As long as all your cell data has the appropriate type, the append should work.
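Putting this together with the loop from the question, a minimal sketch might look like the following. It assumes `worksheet` is an openpyxl worksheet and `table` is an already opened dbf table whose fields are all character fields, as in the question. The helper names `row_to_record` and `append_rows` are made up for this example, and converting every value with `str` is an assumption that only suits all-character tables.

```python
def row_to_record(values):
    """Turn one row of cell values into a tuple of strings; None becomes ''."""
    return tuple("" if value is None else str(value) for value in values)


def append_rows(worksheet, table, min_row=1, max_row=None, max_col=None):
    """Append each worksheet row to the table as a tuple; return the row count."""
    count = 0
    # values_only=True makes openpyxl yield plain cell values instead of Cell objects.
    for values in worksheet.iter_rows(min_row=min_row, max_row=max_row,
                                      max_col=max_col, values_only=True):
        table.append(row_to_record(values))  # a tuple, not a list
        count += 1
    return count
```

For the single-row test in the question, the call would be something like append_rows(ws_IntMstDBF, tbl_tst, min_row=3, max_row=3), made between tbl_tst.open(...) and tbl_tst.close().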

Answer 3

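Thanks for the replies. Since last night I have gone off on a bit of a tangent while trying different solutions.

One solution that worked for me is as follows: I made sure the worksheet data I was using was all strings/text, converting any empty entries to string type with a blank (single-space) string in them. The following code does this: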

# housekeeping
for eachrow in ws_IntMstDBF.iter_rows(min_row=2, max_row=ws_IntMstDBF.max_row, max_col=ws_IntMstDBF.max_column):
    for cells in eachrow:
        if cells.value is None:  # change every null cell to string type and put in 0x20 (space)
            cells.data_type = 's'
            cells.value = " "

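After writing the worksheet, I reopened it as a pandas DataFrame and verified that the contents were all of string type and that no "nan" values remained in the DataFrame. Then I used the df2dbf function from "Dani Arribas-Bel", modified it to suit the data I was working with, and converted to dbf.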
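The verification step described above could be done along these lines. This is only a sketch of one possible check, not the code that was actually used; the function name `non_string_cells` is hypothetical.

```python
import pandas as pd


def non_string_cells(df):
    """Return (row label, column name) for every cell that is not a str.

    Missing values show up here too, because NaN is a float.
    """
    return [(idx, col)
            for col in df.columns
            for idx, value in df[col].items()
            if not isinstance(value, str)]
```

An empty result means every cell is a string and no "nan" values remain.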

The code that imports the DataFrame and converts it to dbf format is as follows:

from pathlib import Path

import pandas as pd

abspath = Path(__file__).resolve()  # resolve relative path to absolute
rootpath = abspath.parents[3]  # root (my source file is 3 subdirectories deep)
xlspath = rootpath / 'sub-dir1' / 'sub-dir2' / 'sub-dir3' / 'test.xlsx'
# the code above only resolves the file location; ignore it
pd_Mst_df = pd.read_excel(xlspath)
# print(pd_Mst_df)  # for debugging
print("... Writing Master DBF file ")
df2dbf(pd_Mst_df, dbfpath)  # dbfpath is defined similarly to xlspath

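The df2dbf function uses pysal to write the DataFrame in dbf format. I made some modifications to the code to detect the row length and character types, as follows: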

import pandas as pd
import pysal as ps
import numpy as np

# excerpt from function df2dbf
else:
    type2spec = {int: ('N', 20, 0),
                 np.int64: ('N', 20, 0),
                 float: ('N', 36, 15),
                 np.float64: ('N', 36, 15),
                 str: ('C', 200, 0)
                 }
    # types = [type(df[i].iloc[0]) for i in df.columns]  # original: infer each type from the first row
    types = [str] * len(df.columns)  # modified: treat every column as character data
    specs = [type2spec[t] for t in types]
db = ps.open(dbf_path, 'w')
# code continues from function df2dbf
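The excerpt above gives every character column a fixed width of 200. As an alternative, the width could be derived from the data itself. The sketch below is not part of df2dbf; the name `char_field_specs` is made up, and it only produces tuples in the same (type, width, decimals) form as `type2spec`.

```python
MAX_CHAR_WIDTH = 254  # widest character ('C') field the DBF format allows


def char_field_specs(rows, min_width=1):
    """Return one ('C', width, 0) spec per column, sized to the longest value."""
    widths = []
    for row in rows:
        for i, value in enumerate(row):
            length = len("" if value is None else str(value))
            if i == len(widths):  # first time this column is seen
                widths.append(min_width)
            widths[i] = max(widths[i], length)
    # Cap at the format limit; longer values would not fit in a 'C' field.
    return [("C", min(width, MAX_CHAR_WIDTH), 0) for width in widths]
```

With a DataFrame, the rows could come from something like df.itertuples(index=False).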

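The pandas DataFrame needed no further modification, because all the source data was correctly formatted before being committed to the Excel file.

I will provide the links to pysal and df2dbf as soon as I find them on Stack Overflow.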
