Save the first few lines of files located in different directories to CSV

Question

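My problem is that I need to read specific lines from a large number of text files in different folders and save them in one CSV file per author. The folder structure looks like this: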

main---author1--file1
    |         --....
    |         --file1000         
    ---author2--file1
    |         --....
    |         --file1000  
    |--...
    ---author27--file1
               --....
               --file1000

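I managed to read all the files in the different directories and get the author names from the folder names, and I also read lines 1-3 from each file, but I am struggling to find a way to save these lines as CSV.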

import os
path = '/content/Data2/STN_INV/'
authors = os.listdir('/content/Data2/STN_INV/'); 
for auth in authors:  
    files = os.listdir(path+auth+'/');
    tmpD,tmpA=[],[]
    for file in files:
        f=open(path+auth+'/'+file, 'r')
        data = f.read()[0:3]
        print(path+auth+'/'+file, os.path.exists(path+auth+'/'+file),'size',len(data),auth)
        tmpD.append(data)
        tmpA.append(auth)

Is there a simple way to do this in Google Colab?

Answer 1

To iterate over the folders, you can use:

import glob
glob.glob('main/author*/file*')
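
The glob call returns one flat list of paths, so the author still has to be recovered from each path. A minimal sketch of one way to do that is below; the helper name `files_by_author` and the `*/*` pattern are assumptions for illustration, not part of the original code.

```python
import glob
import os
from collections import defaultdict


def files_by_author(root):
    """Map each author folder name under root to a sorted list of its file paths.

    Hypothetical helper: assumes the layout root/<author>/<file>.
    """
    grouped = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(root, '*', '*'))):
        if os.path.isfile(path):
            # The parent directory name is taken to be the author
            author = os.path.basename(os.path.dirname(path))
            grouped[author].append(path)
    return dict(grouped)
```

For the layout in the question, `files_by_author('/content/Data2/STN_INV/')` should give one entry per author folder.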

For saving (based on your code):

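import os
import pandas as pd

path = '/content/Data2/STN_INV/'
for auth in os.listdir(path):
    auth_dir = os.path.join(path, auth)
    if not os.path.isdir(auth_dir):
        continue  # skip stray files that sit next to the author folders
    rows = []
    for file in os.listdir(auth_dir):
        file_path = os.path.join(auth_dir, file)
        # Read the first three lines and drop their trailing newlines
        with open(file_path, 'r') as f:
            data = [line.rstrip('\n') for line in f.readlines()[:3]]
        print(file_path, 'lines', len(data), auth)
        # Pad short files so every row has exactly three line columns
        data += [''] * (3 - len(data))
        rows.append([auth] + data)
    # One CSV per author, written to the current working directory
    df = pd.DataFrame(rows, columns=["Author", "Line1", "Line2", "Line3"])
    df.to_csv(f"{auth}.csv", index=False)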
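
If pandas is not needed for anything else, the same result can probably be had with the standard library alone. The sketch below is one possible variant, not the answer's method: the function names `first_lines` and `write_author_csv`, the UTF-8 encoding, and the empty-string padding are all assumptions. It uses `itertools.islice` so that only the first `n` lines of each file are read instead of the whole file.

```python
import csv
import os
from itertools import islice


def first_lines(file_path, n=3):
    """Return the first n lines of a file without newlines, padded with ''."""
    # Assumption: files are UTF-8 text; undecodable bytes are replaced
    with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
        lines = [line.rstrip('\n') for line in islice(f, n)]
    return lines + [''] * (n - len(lines))


def write_author_csv(author_dir, out_csv, n=3):
    """Write one row per file in author_dir: author name, then n lines."""
    author = os.path.basename(os.path.normpath(author_dir))
    header = ['Author'] + [f'Line{i}' for i in range(1, n + 1)]
    with open(out_csv, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        writer.writerow(header)
        for name in sorted(os.listdir(author_dir)):
            file_path = os.path.join(author_dir, name)
            if os.path.isfile(file_path):
                writer.writerow([author] + first_lines(file_path, n))
```

Calling `write_author_csv` once per author folder, for example inside the `for auth in ...` loop, would produce one CSV per author.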

python

google-colaboratory