How to append .pkl files in a for loop to a pandas DataFrame created in that for loop?

Question

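I have a seemingly simple piece of code, but somehow it is not working. The goal is to find all pickle files in a folder and combine them into one pandas DataFrame inside a for loop. On the first iteration, when the target variable does not exist yet, the loop should load the first file into that variable. On every later iteration, when the variable already exists, it should load the next pickle file and append it to the DataFrame created on the first iteration.

Appending two DataFrames directly works: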

import pandas as pd
import os

# Creating the first Dataframe using dictionary 
df1  = pd.DataFrame({"a":[1, 2, 3, 4], 
                         "b":[5, 6, 7, 8]}) 
  
# Creating the Second Dataframe using dictionary 
df2 = pd.DataFrame({"a":[1, 2, 3], 
                    "b":[5, 6, 7]}) 


df1.append(df2)

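Prints nicely:

However, when I try to append the DataFrames from my stored pickle files in a for loop, no error is raised, but only the first DataFrame ends up in the result: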

df1.to_pickle("DF1.pkl")
df2.to_pickle("DF2.pkl")

files = [f for f in os.listdir('.') if os.path.isfile(f)]
#The line above should produce the line below
files=["DF1.pkl", "DF2.pkl"]

for i in files:
    if ".pkl" in i:
        if "ALL_DATA" not in globals():
            ALL_DATA=pd.read_pickle(i)
        else:
            ALL_DATA.append(pd.read_pickle(i))

Only prints:

Can anyone help me understand what is going on?

Answer 1

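DataFrame.append returns a new object; it does not modify the DataFrame in place. So although you call ALL_DATA.append(pd.read_pickle(i)), you never write the result back to ALL_DATA, and the appended rows are discarded. You need to assign the result back: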

ALL_DATA = ALL_DATA.append(pd.read_pickle(i))
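Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent installation the line above raises an AttributeError. The same "returns a new object" behaviour applies to pd.concat, which is the replacement. The following is a minimal sketch of that point, reusing the two example frames from the question; the variable names rows_without_assignment, combined, and rows_with_assignment are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7]})

# pd.concat returns a new DataFrame; df1 itself is left unchanged,
# so calling it without keeping the result has no visible effect.
pd.concat([df1, df2])
rows_without_assignment = len(df1)

# Assigning the result to a name is what keeps the combined rows.
combined = pd.concat([df1, df2], ignore_index=True)
rows_with_assignment = len(combined)
```

Here ignore_index=True renumbers the rows from 0; leave it out if the original index values matter.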

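However, appending inside a loop is inefficient because it copies all of the accumulated data on every iteration, so you should avoid it. Instead, append each DataFrame to a list, which is fast, and then call concat once after the loop.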

l = [] # Holds everything you may possibly append
for i in files:
    if ".pkl" in i:
        if "ALL_DATA" not in globals():
            ALL_DATA=pd.read_pickle(i)
        else:
            l.append(pd.read_pickle(i)) # List append which modifies `l`

# Create df from ALL_DATA and everything that you append
ALL_DATA = pd.concat([ALL_DATA, *l])
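
If every file can be treated the same way, the globals() check can be dropped entirely by collecting all frames in the list and concatenating once. This is a sketch under a few assumptions that are not in the question: the helper name load_pickles is hypothetical, files are matched with endswith(".pkl") rather than a substring test, they are read in sorted name order, and a temporary directory stands in for the real folder so the example is self-contained.

```python
import os
import tempfile

import pandas as pd


def load_pickles(folder):
    """Read every .pkl file in `folder` and return one combined DataFrame."""
    # Sort the names so the row order does not depend on os.listdir's ordering.
    names = sorted(f for f in os.listdir(folder) if f.endswith(".pkl"))
    frames = [pd.read_pickle(os.path.join(folder, name)) for name in names]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    # A single concat after the loop avoids copying the data on every iteration.
    return pd.concat(frames, ignore_index=True)


# Recreate the two pickle files from the question in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}).to_pickle(
        os.path.join(tmp, "DF1.pkl")
    )
    pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7]}).to_pickle(
        os.path.join(tmp, "DF2.pkl")
    )
    ALL_DATA = load_pickles(tmp)
```

Because nothing depends on whether ALL_DATA already exists, rerunning this in the same session gives the same result instead of silently skipping the first file.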

python

for-loop

append

pickle

pandas