How to combine a large number of dataframes?

Question

I have many .txt files in a folder. For example, each .txt file looks like one of the following.

import pandas as pd

FileA = pd.DataFrame({'Id': ["a", "b", "c"], 'Id2': ["a", "b", "z"], 'Amount': [10, 30, 50]})
FileB = pd.DataFrame({'Id': ["d", "e", "f", "z"], 'Id2': ["g", "h", "i", "j"], 'Amount': [10, 30, 50, 100]})
FileC = pd.DataFrame({'Id': ["r", "e"], 'Id2': ["o", "i"], 'Amount': [6, 33]})
FileD...

I want to extract the first row of each dataframe in the folder and then combine them all. Here is what I did.

To list the .txt files, I did the following.

import glob

# glob.glob already returns a list, so no explicit loop is needed
txtfiles = glob.glob("*.txt")

To extract the first row of each file and combine all of the rows, I did the following.

pd.read_table(txtfiles[0])[:1].append([pd.read_table(txtfiles[1])[:1],pd.read_table(txtfiles[2])[:1]],pd.read_table.......)

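If the number of .txt files is small, I can do it this way, but if there are many .txt files I need an automated approach. Does anyone know how to automate this? Thanks for your help!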

Answer 1

Based on Sid's answer to this post:

import glob
import os
import pandas as pd

input_path = r"insert/your/path"  # use the path where you stored the txt files
all_files = glob.glob(os.path.join(input_path, "*.txt"))
df_from_each_file = (pd.read_csv(f, nrows=1) for f in all_files)  # nrows=1 reads only the first data row
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)

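Update: pd.read_csv did not ingest the files correctly, presumably because the files are tab-delimited (read_csv defaults to a comma separator, while read_table defaults to a tab). Replacing read_csv with read_table should give the expected result.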

import glob
import os
import pandas as pd

input_path = r"insert/your/path"  # use the path where you stored the txt files
all_files = glob.glob(os.path.join(input_path, "*.txt"))
df_from_each_file = (pd.read_table(f, nrows=1) for f in all_files)  # nrows=1 reads only the first data row
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
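
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the files are tab-delimited with a header row, and the function name first_rows and its parameters pattern and sep are made up for illustration, not part of pandas. Sorting the file list is an addition so that the row order is reproducible.

```python
import glob
import os

import pandas as pd


def first_rows(input_path, pattern="*.txt", sep="\t"):
    """Read the first data row of every matching file and stack the rows.

    first_rows, pattern and sep are illustrative names (assumptions),
    not something pandas provides.
    """
    # glob returns files in arbitrary order; sort for a reproducible result
    all_files = sorted(glob.glob(os.path.join(input_path, pattern)))
    if not all_files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {input_path!r}")
    # nrows=1 stops parsing after the first data row of each file
    frames = [pd.read_csv(f, sep=sep, nrows=1) for f in all_files]
    # ignore_index=True renumbers the combined rows 0..n-1
    return pd.concat(frames, ignore_index=True)
```

Calling first_rows(r"insert/your/path") should then return one row per file. pd.read_csv with sep="\t" behaves like pd.read_table here; pass sep="," if your files turn out to be comma-separated. Note also that DataFrame.append, which the question uses, was removed in pandas 2.0, so pd.concat is the way to go on current versions.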
