How to convert list of lists from a dictionary of dictionaries into a data frame?
I have a dictionary of dictionaries that contains lists of tuples like the following:
mydict = {'Id1': {'sample1': [[1,1,5],[1,2,6],[1,3,21],[2,1,0],[2,2,0]...[10,3,54]],
                  'sample2': [[1,1,21],[1,2,1],[1,3,4],[2,1,23],[2,2,43]...[10,3,]],
                  ...
                  'sample199': [[1,1,0],[1,2,13],[1,3,32],[2,1,0],[2,2,15]...[...]],
                  'sample200': [[1,1,43],[1,2,30],[1,3,6],[2,1,0],[2,2,4]...[10,3,87]]},
          'Id2': {'sample1': [[1,1,0],[1,2,0],[1,3,2],[2,1,0],[2,2,32]...[10,3,43]],
                  'sample2': [[1,1,0],[1,2,15],[1,3,43],[2,1,2],[2,2,12]...[10,3,7]],
                  ...
                  'sample199': [[1,1,0],[1,2,3],[1,3,16],[2,1,17],[2,2,11]...[]]}
          ...
         }
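I want to convert the dictionary above into a data frame in which the third item of each inner list becomes a column (basically, the number of features in the data frame should equal the number of inner lists stored under each sample). The result I want should look like this: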
mydata_frame:
IDs   Samples     f1   f2   f3   ...   fn
Id1   sample1     5    6    21   ...   53
Id1   sample2     21   1    4    ...   21
...   ...         ..   ..   ..   ...   ...
Id1   sample200   43   30   6    ...   87
Id2   sample1     0    0    2    ...   43
Id2   sample2     0    15   43   ...   7
...   ...         ...
I tried the following code, but it did not work:
mydata_frame= [[key] + i for key,value in mydict.items() for i in value]
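The approach below is not the most direct one. Melting a data frame is a way of representing column labels as row values.
Melt the ID columns so that each ID and sample pair becomes one row. Then use apply
to split the values/features into separate data frame columns.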
import pandas as pd

mydict = {'Id1': {'sample1': [(1,1,5),(1,2,6),(1,3,21),(2,1,0),(2,2,0)],
                  'sample2': [(1,1,21),(1,2,1),(1,3,4),(2,1,23),(2,2,43)],
                  'sample199': [(1,1,0),(1,2,13),(1,3,32),(2,1,0),(2,2,15)],
                  'sample200': [(1,1,43),(1,2,30),(1,3,6),(2,1,0),(2,2,4)]},
          'Id2': {'sample1': [(1,1,0),(1,2,0),(1,3,2),(2,1,0),(2,2,32)],
                  'sample2': [(1,1,0),(1,2,15),(1,3,43),(2,1,2),(2,2,12)],
                  'sample199': [(1,1,0),(1,2,3),(1,3,16),(2,1,17),(2,2,11)]}}
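# Convert the dict to a DataFrame:
# the IDs become the columns and the samples become the index
df = pd.DataFrame(mydict)
# Turn the sample index into a regular column
df = df.reset_index().rename(columns={'index': 'Samples'})
# Melt so that the ID column names become row values (one row per ID/sample pair)
df = df.melt(id_vars=['Samples'], var_name='IDs')
# Drop pairs that do not exist in the dict (here Id2 has no sample200)
df = df.dropna()
# Extract the last element of every tuple in the 'value' column.
# apply normally returns a Series, but with result_type='expand'
# a returned list is expanded into DataFrame columns
n_features = 5  # number of tuples per sample in this example
feature_cols = [f'f{i}' for i in range(1, n_features + 1)]
df[feature_cols] = df.apply(lambda x: [t[-1] for t in x['value']], axis=1, result_type='expand')
# drop returns a new DataFrame, so assign the result back
df = df.drop(['value'], axis=1)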
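If the melt step feels roundabout, a more direct route is to build one row per ID and sample pair with a nested comprehension and pass the rows to pd.DataFrame. This is only a sketch on a shortened version of the data; the names rows, columns and n_features are introduced here for illustration, and it assumes every sample holds the same number of triples.

```python
import pandas as pd

# Shortened illustrative input with the same shape as the data in the question
mydict = {
    'Id1': {'sample1': [(1, 1, 5), (1, 2, 6), (1, 3, 21)],
            'sample2': [(1, 1, 21), (1, 2, 1), (1, 3, 4)]},
    'Id2': {'sample1': [(1, 1, 0), (1, 2, 0), (1, 3, 2)]},
}

# One row per (ID, sample) pair: the two keys followed by the
# third item of every triple stored under that sample
rows = [
    [id_, sample] + [triple[2] for triple in triples]
    for id_, samples in mydict.items()
    for sample, triples in samples.items()
]

# Assumption: all samples have the same number of triples,
# so the first row determines the number of feature columns
n_features = len(rows[0]) - 2
columns = ['IDs', 'Samples'] + [f'f{i}' for i in range(1, n_features + 1)]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

The attempt in the question iterates only over the outer dictionary, so i is a sample name (a string) rather than a list of triples; looping over both levels with .items() is what makes the row construction work.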