OrderedDict containing a list of DataFrames

Question

I don't understand why method 1 works but method 2 doesn't...

Method 1

import pandas as pd
import collections
d = collections.OrderedDict([('key', []), ('key2', [])])
df = pd.DataFrame({'id': [1], 'test': ['ok']})
d['key'].append(df)
d
OrderedDict([('key', [   id test
0   1   ok]), ('key2', [])])

Method 2

l = ['key', 'key2']
dl = collections.OrderedDict(zip(l, [[]]*len(l)))
dl
OrderedDict([('key', []), ('key2', [])])
dl['key'].append(df)
dl
OrderedDict([('key', [   id test
0   1   ok]), ('key2', [   id test
0   1   ok])])

dl == d is True

Answer 1

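The problem stems from creating the empty lists like this: [[]] * len(l). This does not create separate lists; it copies the reference to a single empty list multiple times. So you end up with a list of empty lists that all point to the same underlying object. When that happens, any change you make to the underlying list through an in-place operation (such as append) shows up through every reference to that list.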

The same kind of problem arises when you assign one variable to another:

a = []
b = a

# `a` and `b` both point to the same underlying object.
b.append(1) # inplace operation changes underlying object

print(a, b)
[1] [1]
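
The same aliasing can be checked directly on a list built by multiplication. This is only a minimal sketch; the names `n` and `shared` are illustrative and do not come from the question:

```python
n = 2
shared = [[]] * n          # n references to one and the same list

# Every slot is the same object, so `is` reports True.
print(shared[0] is shared[1])   # True

shared[0].append('x')      # in-place change through the first slot
print(shared)              # [['x'], ['x']] - the second slot changed too
```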

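To get around your problem, instead of using [[]] * len(l) you can use a generator expression or a list comprehension to make sure a new empty list is created for each element of the list l: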

collections.OrderedDict(zip(l, ([] for _ in l)))

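The generator expression ([] for _ in l) creates a new empty list for each element of l, instead of copying a reference to a single empty list. The easiest way to verify this is to use the id function to look at the identity of each object. Here we compare your original approach with the new one: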

# The ids come out the same, indicating that the objects are references to the same underlying list
>>> [id(x) for x in [[]] * len(l)]
[2746221080960, 2746221080960]

# The ids come out different, indicating that they point to different underlying lists
>>> [id(x) for x in ([] for _ in l)]
[2746259049600, 2746259213760]
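
To confirm the fix end to end, here is a small sketch that rebuilds the dictionary from method 2 with a fresh list per key. It is an illustration under a few assumptions: a plain string stands in for the DataFrame so it runs without pandas, the name `fixed` is made up, and the dictionary is built from (key, value) pairs rather than with zip, which should be equivalent here.

```python
import collections

l = ['key', 'key2']

# One new list is created per key, so the values are independent objects.
fixed = collections.OrderedDict((k, []) for k in l)

fixed['key'].append('ok')   # stands in for dl['key'].append(df)

print(fixed['key'])                     # ['ok']
print(fixed['key2'])                    # []
print(fixed['key'] is fixed['key2'])    # False
```

Only 'key' receives the appended item, which is the behaviour method 1 showed.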

python-2.7

pandas