What is the best way to access and organize this list of uneven lists in a way that I could use for a relational SQL database?

Question

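I have this beer dataset that I've been trying to clean up as a personal project for quite a while, but I can't seem to get past a few small problems.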

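I have what I think you'd call an "unbalanced list of lists" that I need to tidy up. Here is a short example of what I'm looking at: [Updated to add the form of the data]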

data = [[[2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7], [0.907, 'American - White Wheat', 40.0, 2.8, 17.0], [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0], [0.227, 'Flaked Corn', 40.0, 0.5, 4.3], [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3], [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3], [0.113, 'Flaked Barley', 32.0, 2.2, 2.1], [0.34, 'Honey', 42.0, 2.0, 6.4]], [[2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]], [[2.722, 'Liquid Malt Extract - Light', 35.0, 4.0, 59.1], [1.429, 'Liquid Malt Extract - Amber', 35.0, 10.0, 31.0]]]

So index 0, which belongs to a beer called Vanilla Cream Ale, is a list of several lists, each holding 5 values:

[[2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7], [0.907, 'American - White Wheat', 40.0, 2.8, 17.0], [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0], [0.227, 'Flaked Corn', 40.0, 0.5, 4.3], [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3], [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3], [0.113, 'Flaked Barley', 32.0, 2.2, 2.1], [0.34, 'Honey', 42.0, 2.0, 6.4]]

For index 1, I have

[[2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]]

and index 2 is

[[2.722, 'Liquid Malt Extract - Light', 35.0, 4.0, 59.1], [1.429, 'Liquid Malt Extract - Amber', 35.0, 10.0, 31.0]]

Where each element is 'weight', 'grain_name', 'ppg', 'deg_litner', 'grain_bill'. So far I have managed to pull the weights, grain names, etc. out of these lists into separate lists that I can call.
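A minimal sketch of how one group can be indexed and then split into per-field lists, assuming each inner list always holds the five fields in the order above. `zip(*rows)` transposes rows into columns; the variable names are illustrative, not from the original code.

```python
# The first group (index 0, Vanilla Cream Ale) from the question
vanilla_cream_ale = [
    [2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7],
    [0.907, 'American - White Wheat', 40.0, 2.8, 17.0],
    [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0],
    [0.227, 'Flaked Corn', 40.0, 0.5, 4.3],
    [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3],
    [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3],
    [0.113, 'Flaked Barley', 32.0, 2.2, 2.1],
    [0.34, 'Honey', 42.0, 2.0, 6.4],
]

# Indexing: row 0 of the group, field 1 is the grain name
first_grain = vanilla_cream_ale[0][1]  # 'American - Pale 2-Row'

# zip(*rows) yields one tuple per field; convert each to a list
weight, grain_name, ppg, deg_litner, grain_bill = (
    list(column) for column in zip(*vanilla_cream_ale)
)
```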

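I'd like to end up with an ID, where 0 corresponds to Vanilla Cream Ale, listed against each of its elements. I imagine "0" would be repeated eight times, "1" once, and "2" twice, but I'm not sure that's the best approach.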

I did manage to build a list that counts the number of lists in each element:

fcount = [8,1,2]
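The counts don't have to be typed in by hand; `len()` of each group gives them directly. A sketch, assuming `data` is the three-group list from the question written with balanced brackets:

```python
data = [
    [[2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7],
     [0.907, 'American - White Wheat', 40.0, 2.8, 17.0],
     [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0],
     [0.227, 'Flaked Corn', 40.0, 0.5, 4.3],
     [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3],
     [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3],
     [0.113, 'Flaked Barley', 32.0, 2.2, 2.1],
     [0.34, 'Honey', 42.0, 2.0, 6.4]],
    [[2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]],
    [[2.722, 'Liquid Malt Extract - Light', 35.0, 4.0, 59.1],
     [1.429, 'Liquid Malt Extract - Amber', 35.0, 10.0, 31.0]],
]

# One count per beer: how many ingredient rows each group holds
fcount = [len(group) for group in data]
print(fcount)  # [8, 1, 2]
```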

I was thinking I could do something like

for i in range(0,3):
    [i] * fcount

thinking I would get

[0,0,0,0,0,0,0,0]
[1]
[2,2]

But obviously you can't multiply a list by a list like that.
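A list can be multiplied by an integer, not by another list (`[i] * fcount` raises a `TypeError`), so each index needs its own count. A sketch that pairs each index with its count via `enumerate`; `ids_per_beer` and `flat_ids` are illustrative names:

```python
fcount = [8, 1, 2]

# [i] * n repeats the one-element list [i] n times
ids_per_beer = [[i] * n for i, n in enumerate(fcount)]
print(ids_per_beer)  # [[0, 0, 0, 0, 0, 0, 0, 0], [1], [2, 2]]

# Flattened: one ID per ingredient row, in the same order as the rows
flat_ids = [i for i, n in enumerate(fcount) for _ in range(n)]
print(flat_ids)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2]
```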

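I'm really struggling to find the best overall approach, and I'm not sure this is the right one, but I figure it's a start. My overall goal is to finish a relational database in SQL Server. I have all the other beer data there (IBU, ABV, etc.); it's just missing this mess! Thanks for any help with this!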

Answer 1

What you are calling "uneven lists" are "tuples" or "rows". A set of tuples/rows that all have the same shape is called a "relation" or a "table".
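In Python terms, each row can be given a fixed shape with `typing.NamedTuple`, and a list of such rows is then one table. A sketch over a two-group excerpt of the question's data; `IngredientRow` and its `beer_id` field are hypothetical names chosen to mirror the columns discussed here:

```python
from typing import NamedTuple


class IngredientRow(NamedTuple):
    beer_id: int       # which beer the row belongs to (0 = Vanilla Cream Ale)
    weight: float
    grain_name: str
    ppg: float
    deg_litner: float  # field name kept as spelled in the question
    grain_bill: float


# Excerpt: one row from group 0 and the single row of group 1
groups = [
    [[0.34, 'Honey', 42.0, 2.0, 6.4]],
    [[2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]],
]

# Every row now has the same six fields, so the list is one relation/table
table = [
    IngredientRow(beer_id, *values)
    for beer_id, rows in enumerate(groups)
    for values in rows
]
```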

Where each element is 'weight', 'grain_name', 'ppg', 'deg_litner', 'grain_bill'

In SQL Server you would create tables like this:

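-- One row per beer
create table Beers
(
  BeerId int identity primary key,
  BeerName varchar(200),
  ABV decimal(4,2)          -- e.g. 5.50 if stored as a percentage; decimal(2,2) only holds values up to 0.99
);

-- One row per ingredient, linked to its beer
create table BeerIngredients
(
  BeerId int not null references Beers(BeerId),
  BeerIngredientId int identity not null,
  Weight decimal(10,3),     -- 3 decimal places so weights like 2.381 and 0.113 are not rounded
  GrainName varchar(200),
  PPG decimal(10,2),
  DegLitner decimal(10,2),
  GrainBill decimal(10,2),
  constraint pk_BeerIngredients
    primary key (BeerId, BeerIngredientId)
);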
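The same two-table design can be tried locally without a SQL Server instance. This sketch swaps in Python's built-in `sqlite3` as a stand-in (assumptions: SQLite types `integer`/`text`/`real` replace `int`/`varchar`/`decimal`, and foreign keys are only enforced after `PRAGMA foreign_keys = ON`). It shows the composite primary key and that an ingredient cannot point at a beer that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
create table Beers (
  BeerId   integer primary key,  -- auto-assigned when omitted, similar to identity
  BeerName text,
  ABV      real
);
create table BeerIngredients (
  BeerId           integer not null references Beers(BeerId),
  BeerIngredientId integer not null,
  Weight           real,
  GrainName        text,
  PPG              real,
  DegLitner        real,
  GrainBill        real,
  primary key (BeerId, BeerIngredientId)
);
""")

conn.execute("insert into Beers (BeerId, BeerName) values (0, 'Vanilla Cream Ale')")
conn.execute("insert into BeerIngredients values (0, 1, 0.34, 'Honey', 42.0, 2.0, 6.4)")

# A row for beer 99, which does not exist, violates the foreign key
try:
    conn.execute(
        "insert into BeerIngredients values (99, 1, 0.113, 'Flaked Barley', 32.0, 2.2, 2.1)"
    )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```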

You can load the rows into a pandas DataFrame to work with them in Python, and write them to SQL Server from there, like this:

import pandas
import pyodbc      # not used directly; the mssql+pyodbc dialect below needs it installed
import sqlalchemy

data = [[2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7], 
        [0.907, 'American - White Wheat', 40.0, 2.8, 17.0], 
        [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0], 
        [0.227, 'Flaked Corn', 40.0, 0.5, 4.3], 
        [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3],
        [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3], 
        [0.113, 'Flaked Barley', 32.0, 2.2, 2.1], 
        [0.34, 'Honey', 42.0, 2.0, 6.4],
        [2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]]


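# Name each position in the 5-value rows
df = pandas.DataFrame(data, columns=['Weight','GrainName','PPG','DegLitner','GrainBill'])
print(df)

# Trusted (Windows) connection to the tempdb database on a local SQL Server
engine = sqlalchemy.create_engine('mssql+pyodbc://localhost/tempdb?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server')
# if_exists='replace' drops BeerIngredients and recreates it from the DataFrame's columns,
# so the keys from the create table statement above are not kept
df.to_sql('BeerIngredients', engine, index=False, if_exists='replace')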

Answer 2

If I understand you correctly, here is one solution. Given the following input data:

data = [[
        [2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7],
        [0.907, 'American - White Wheat', 40.0, 2.8, 17.0],
        [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0],
        [0.227, 'Flaked Corn', 40.0, 0.5, 4.3],
        [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3],
        [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3],
        [0.113, 'Flaked Barley', 32.0, 2.2, 2.1],
        [0.34, 'Honey', 42.0, 2.0, 6.4]
    ],
    [
        [2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]
    ],
    [
        [2.722, 'Liquid Malt Extract - Light', 35.0, 4.0, 59.1],
        [1.429, 'Liquid Malt Extract - Amber', 35.0, 10.0, 31.0]
    ]
]

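Iterate over the outermost list and use enumerate to get each element's index. Then iterate over each inner list, put the index in front of each 5-element row, and collect all the rows in one master list, data_id. Finally, convert the master list into a DataFrame that can be written to the database as described in @David Browne - Microsoft's answer: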

import pandas as pd

data_id = []
for ind, d in enumerate(data):      # ind = beer ID (0, 1, 2), d = that beer's rows
    for a in d:
        # prepend the ID as a new list so the rows in data are left unchanged
        data_id.append([ind] + a)

df = pd.DataFrame(data_id, columns=['ID', 'Weight', 'GrainName', 'PPG', 'DegLitner', 'GrainBill'])
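The same flattening can also be written without pandas, as a nested list comprehension over `enumerate(data)`; because each output row is a new list, `data` itself is left unchanged. A sketch assuming the three-group `data` shown above; counting the ID column with `collections.Counter` gives back the per-beer counts from the question:

```python
from collections import Counter

data = [
    [[2.3810000000000002, 'American - Pale 2-Row', 37.0, 1.8, 44.7],
     [0.907, 'American - White Wheat', 40.0, 2.8, 17.0],
     [0.907, 'American - Pale 6-Row', 35.0, 1.8, 17.0],
     [0.227, 'Flaked Corn', 40.0, 0.5, 4.3],
     [0.227, 'American - Caramel / Crystal 20L', 35.0, 20.0, 4.3],
     [0.227, 'American - Carapils (Dextrine Malt)', 33.0, 1.8, 4.3],
     [0.113, 'Flaked Barley', 32.0, 2.2, 2.1],
     [0.34, 'Honey', 42.0, 2.0, 6.4]],
    [[2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]],
    [[2.722, 'Liquid Malt Extract - Light', 35.0, 4.0, 59.1],
     [1.429, 'Liquid Malt Extract - Amber', 35.0, 10.0, 31.0]],
]

# One flat row per ingredient: [ID, weight, grain_name, ppg, deg_litner, grain_bill]
data_id = [[ind, *row] for ind, group in enumerate(data) for row in group]

# The ID column repeats 0 eight times, 1 once and 2 twice
id_counts = Counter(row[0] for row in data_id)
print(sorted(id_counts.items()))  # [(0, 8), (1, 1), (2, 2)]
print(data_id[8])  # [1, 2.722, 'Dry Malt Extract - Extra Light', 42.0, 2.5, 70.6]
```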


