Insert data from multiple files into multiple tables

Question

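I have data stored in CSV files across multiple folders, and I want to load it into multiple SQL tables using MySQL on an Ubuntu system. Each table and file follows this schema (the files do not have the id field):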

+ ------ + -------- + -------- + --------- + ---------- +
| SPO_Id | SPO_Name | SPO_Date | SPO_Price | SPO_Amount |
+ ------ + -------- + -------- + --------- + ---------- +

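Each file contains one day's worth of pricing and sales data. Unfortunately, the files are not named after their date; they are stored in folders that are named after the date. Here is an example diagram of the directory layout: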

      ------> 20170102 ------> prices.csv
     /
    /
Exmpl ------> 20170213 ------> prices.csv
    \
     \
      ------> 20170308 ------> prices.csv
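Since the date lives only in the folder name, any automation has to turn a name such as 20170102 into a table name such as SP_2017_01_02 (the name used in the query below). The following is a minimal sketch of that mapping in Python; the helper name table_name_for and its prefix parameter are hypothetical and not part of the original question.

```python
from datetime import datetime


def table_name_for(folder_name, prefix="SP_"):
    """Turn a date-named folder such as '20170102' into 'SP_2017_01_02'.

    Raises ValueError when the folder name is not a valid YYYYMMDD date.
    """
    # strptime alone accepts some shorter inputs, so check the shape first.
    if len(folder_name) != 8 or not folder_name.isdigit():
        raise ValueError("expected YYYYMMDD, got %r" % folder_name)
    day = datetime.strptime(folder_name, "%Y%m%d")
    return prefix + day.strftime("%Y_%m_%d")


if __name__ == "__main__":
    for folder in ("20170102", "20170213", "20170308"):
        print(folder, "->", table_name_for(folder))
```

Rejecting anything that is not a real date also keeps stray folders from turning into unexpected table names.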

Here is the query I wrote, which pulls the data from a file and stores it in a table:

use pricing ; # the database I want the tables in
drop table if exists SP_2017_01_02 ;

create table SP_2017_01_02 (
    SPO_Id int not null primary key auto_increment,
    SPO_Name varchar(32),
    SPO_Date date,
    SPO_Price float,
    SPO_Amount int
);

load data local infile '/Exmpl/20170102/prices.csv'
    into table SP_2017_01_02
    fields terminated by ','
    lines terminated by '\n'
    ignore 1 lines # First line contains field name information
    (SPO_Name, SPO_Date, SPO_Price, SPO_Amount) ;

select * from SP_2017_01_02 ;

show tables ;

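This query works fine for loading one table at a time; however, because I have hundreds of tables, I need to automate the process. I have looked around, and here is what I found: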

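Here is a question similar to mine, except that it refers to SQL Server. The answer gives a suggestion without any real substance.

This question is also very similar to mine, except that it specifically uses SSIS, which I do not have access to (and the question is left unanswered).

This post suggests using control file references, but that is for SQL*Loader and Oracle.

Using python may be the way to go, but I have never used it before, and my problem seems too complicated to start with.

This one and this one also use Python, but they only update one table with data from one file.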

I have worked a lot with SQL Server, but I am fairly new to MySQL. Any help is greatly appreciated!

Update

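I tried to do this with dynamic SQL in MySQL. Unfortunately, MySQL requires a stored procedure to execute dynamic SQL, and it does not allow the load data statement inside a stored procedure. As @RandomSeed pointed out, this cannot be done with MySQL alone. I am going to take his advice and try writing a shell/Python script to handle it.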

I will leave this question open until I (or someone else) can come up with a solid answer.

Answer 1

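So once you have a SQL query/function/script that reads a single table, which it looks like you do (or you can build a simple equivalent in Python), using Python to walk the directory structure and collect the file names is fairly simple. If you can somehow pass a new CSV path to infile '/Exmpl/20170102/prices.csv' each time and call your SQL script from Python, you should be good.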
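One way to pass a new path to infile each time is to treat the load data statement from the question as a template and fill in the path and table name from Python. This is only a sketch under that assumption: the function name load_statement is hypothetical, the column list is copied from the question, and nothing here talks to MySQL; it just builds the text.

```python
import re


def load_statement(csv_path, table_name):
    """Build the LOAD DATA statement for one CSV file and one target table."""
    # Table names cannot be bound as query parameters, so restrict them to
    # letters, digits and underscores before splicing them into the SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_name):
        raise ValueError("unexpected table name: %r" % table_name)
    # Escape backslashes and single quotes so the path stays one SQL string literal.
    quoted_path = csv_path.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "load data local infile '{path}'\n"
        "    into table {table}\n"
        "    fields terminated by ','\n"
        "    lines terminated by '\\n'\n"
        "    ignore 1 lines\n"
        "    (SPO_Name, SPO_Date, SPO_Price, SPO_Amount) ;"
    ).format(path=quoted_path, table=table_name)


if __name__ == "__main__":
    print(load_statement("/Exmpl/20170102/prices.csv", "SP_2017_01_02"))
```

The resulting string can be printed for inspection or handed to whatever MySQL client library ends up executing it.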

I don't have much time right now, but I wanted to show you how to get those file name strings using Python.

import os

prices_csvs = []
for root, dirs, files in os.walk(os.path.join('insert_path_here', 'Exmpl')):
    for f in files:
        if f == 'prices.csv':
            prices_csvs.append(os.path.join(root, f))
            break # optional, use if there only is one prices.csv in each subfolder

for csv_file in prices_csvs:
    # csv_file is a string of the path for each prices.csv
    # if you can insert it as the `infile` parameter and run the sql, you are done
    # admittedly, i don't know how to do this at the moment
    print(csv_file)

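os.walk visits every subdirectory, binding root to the path of that folder, dirs to the list of directories in it, and files to the list of files in it. From there it is simple to check whether a file name matches what you are looking for and, if it does, store its path in a list. Iterating over the list then yields a string with the path to each prices.csv under Exmpl.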
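To see what os.walk yields without touching real data, the sketch below rebuilds the example layout in a temporary directory and collects the matching paths. The function name find_price_files and the single header line written into each file are assumptions made for the demonstration.

```python
import os
import tempfile


def find_price_files(base_dir, wanted="prices.csv"):
    """Return the sorted paths of every file named `wanted` under base_dir."""
    found = []
    for root, dirs, files in os.walk(base_dir):
        # `files` holds only the file names directly inside `root`.
        if wanted in files:
            found.append(os.path.join(root, wanted))
    return sorted(found)


if __name__ == "__main__":
    # Build a throwaway copy of the example layout.
    with tempfile.TemporaryDirectory() as tmp:
        for day in ("20170102", "20170213", "20170308"):
            folder = os.path.join(tmp, "Exmpl", day)
            os.makedirs(folder)
            with open(os.path.join(folder, "prices.csv"), "w") as handle:
                handle.write("SPO_Name,SPO_Date,SPO_Price,SPO_Amount\n")
        for path in find_price_files(os.path.join(tmp, "Exmpl")):
            print(os.path.relpath(path, tmp))
```

Sorting the result makes the order predictable, since os.walk does not promise any particular ordering of subdirectories.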

Hope that helps with the Python side of things.

Answer 2

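I marked Charlie's answer as the correct one because, although he did not fully answer the question, he gave me a great start. The code below is for anyone who wants to see how to load CSV files into MySQL. The basic idea is to construct a string dynamically in Python and then execute that string in MySQL.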

#!/usr/bin/python
import os
import MySQLdb # Use this module in order to interact with SQL

# Find all the file names located in this directory
prices_csvs = []
for root, dirs, files in os.walk(os.path.join('insert_path_here', 'Exmpl')):
    for f in files:
        if f == 'prices.csv':
            prices_csvs.append(os.path.join(root, f))
            break

# Connect to the MySQL database
db = MySQLdb.connect(host ="<Enter Host Here>", user = "<Enter User here>", passwd = "<Enter Password Here>", db = "<Enter Database name here>" )

# must create cursor object
cur = db.cursor()

for csv_file in prices_csvs:

    directory = "'" + csv_file + "'"

    # The name of the parent folder (e.g. 20170102) identifies the table
    table = os.path.basename(os.path.dirname(csv_file))

    sql_string1 = "drop table if exists SD" + table + " ;\n"

    sql_string2 = "create table SD" + table + " ( \n\
    <Enter your fields here> \n\
    ); \n"

    # Load the file into the table created above
    sql_string3 = "load data local infile " + directory + " \n\
    into table SD" + table + " \n\
    fields terminated by ',' \n\
    lines terminated by " + repr('\n') + " \n\
    ignore 1 lines ;\n"

    # Print out the strings for debugging
    print(sql_string1)
    print(sql_string2)
    print(sql_string3)

    # Execute your SQL statements
    cur.execute(sql_string1)
    cur.execute(sql_string2)
    cur.execute(sql_string3)
    db.commit()

db.close()

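When debugging, I found it useful to copy the printed SQL statements and paste them into MySQL to confirm that the strings were being built correctly.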
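A possible way to make that print-then-paste check repeatable is a dry-run switch: always print each statement, and only send it to the database when asked. The sketch below assumes a cursor object with the standard DB-API execute method; run_statements and RecordingCursor are hypothetical names, and RecordingCursor merely stands in for a real cursor so the example runs without a database.

```python
def run_statements(cursor, statements, dry_run=True):
    """Print every statement; send it to the cursor only when dry_run is False.

    Returns the number of statements that were actually executed.
    """
    executed = 0
    for statement in statements:
        print(statement)
        if not dry_run:
            cursor.execute(statement)
            executed += 1
    return executed


class RecordingCursor:
    """Hypothetical stand-in for a real DB-API cursor; it only records calls."""

    def __init__(self):
        self.seen = []

    def execute(self, statement):
        self.seen.append(statement)


if __name__ == "__main__":
    statements = ["drop table if exists SD20170102 ;", "show tables ;"]
    cursor = RecordingCursor()
    run_statements(cursor, statements)                 # dry run: prints only
    run_statements(cursor, statements, dry_run=False)  # prints and executes
    print(len(cursor.seen), "statements executed")
```

Defaulting to a dry run means a first pass over hundreds of files changes nothing until the printed SQL has been checked.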
