Looping through Jupyter directory and adding file names to a list

Question

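I have a simple setup: about 15 .xlsx files in a folder named FILE, which sits in Jupyter's home directory. I want to loop through all the files whose names start with a specific combination of letters and add those file names to a list. Here is what I have so far. I would like to know: 1. What is the correct file path? 2. How do I get the desired output?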

import os

directory = '???/'  # to find this, run pwd in a notebook cell

file_name_list = []

for filename in os.listdir(directory):
    if filename.startswith("SOME_LETTERS"):
        file_name_list.append(filename)
    else:
        continue

Example file setup:

FILE --> 
SOME_LETTERS_1.xlsx 
DIFFERENT_LETTERS_1.xlsx
ONE_NUMBER.xlsx
SOME_LETTERS_2.xlsx
DIFFERENT_LETTERS_2.xlsx
SOME_LETTERS_3.xlsx
SOME_LETTERS_4.xlsx

Desired output:

[SOME_LETTERS_1, SOME_LETTERS_2, SOME_LETTERS_3, SOME_LETTERS_4]

Answer 1

Use the glob module: https://docs.python.org/3/library/glob.html

From the documentation:

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.

Here is an example:

from glob import glob

for file in glob("path/to/some/folder/*.txt"):
    print(file)

The code above prints the path of every .txt file in the given folder.
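The wildcard rules quoted from the documentation can be tried out in a throwaway folder. The following is only a sketch: the file names and the helper `matching_names` are made up for illustration, and the results are sorted because glob returns matches in arbitrary order.

```python
import os
import tempfile
from glob import glob


def matching_names(folder, pattern):
    """Return the sorted base names in `folder` that match `pattern` (hypothetical helper)."""
    paths = glob(os.path.join(folder, pattern))
    return sorted(os.path.basename(p) for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty, made-up files to match against
    for name in ["a1.txt", "a2.txt", "b1.txt", "ab.csv"]:
        open(os.path.join(tmp, name), "w").close()

    star = matching_names(tmp, "a*")               # * matches any run of characters
    single = matching_names(tmp, "?1.txt")         # ? matches exactly one character
    char_range = matching_names(tmp, "[ab]1.txt")  # [] matches one character from the set

print(star)        # ['a1.txt', 'a2.txt', 'ab.csv']
print(single)      # ['a1.txt', 'b1.txt']
print(char_range)  # ['a1.txt', 'b1.txt']
```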

So in your case, the code would look something like this:

"""
Folder structure:
├── samples
│   ├── Asample2.txt
│   ├── Bsample4.txt
│   ├── sample3.txt
│   ├── sample5.txt
│   └── sample.txt
└── stack.py
"""

from glob import glob
import os

# os.path.join builds the pattern so it works on multiple platforms
# (named `pattern` rather than `dir` to avoid shadowing the built-in dir())
pattern = os.path.join("samples", "*.txt")

# os.path.basename extracts the file name from the full path
file_name_list = [file for file in glob(pattern) if os.path.basename(file).startswith("s")]
print(file_name_list)
# Example output (glob returns results in arbitrary order):
# ['samples/sample5.txt', 'samples/sample.txt', 'samples/sample3.txt']
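Note that the list above holds paths that still include the folder and the extension, whereas the question asks for bare names. A minimal sketch of stripping both, assuming the three paths from the sample output (the variable names `names` and `stems` are only illustrative):

```python
import os

# Paths copied from the sample output above
paths = ['samples/sample5.txt', 'samples/sample.txt', 'samples/sample3.txt']

# Drop the folder part, and sort because glob's order is arbitrary
names = sorted(os.path.basename(p) for p in paths)

# Drop the extension as well
stems = [os.path.splitext(n)[0] for n in names]

print(names)  # ['sample.txt', 'sample3.txt', 'sample5.txt']
print(stems)  # ['sample', 'sample3', 'sample5']
```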

Another way to do it is to put the prefix into the Unix-style pattern itself:

from glob import glob
import os

# The leading "s" in the pattern is the prefix the file names must start with
pattern = os.path.join("samples", "s*.txt")

# glob already returns a list, so no comprehension is needed
file_name_list = glob(pattern)
print(file_name_list)
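
Applied to the layout in the question, the same idea might look like the sketch below. It rebuilds the FILE folder inside a temporary directory so it can run anywhere; in a real notebook, `directory` would instead be the actual path to FILE, for example `os.path.join(os.getcwd(), "FILE")` if the notebook runs from Jupyter's home directory (an assumption about the setup). The helper `stems_with_prefix` is hypothetical, not part of any library.

```python
import glob
import os
import tempfile


def stems_with_prefix(directory, prefix, extension=".xlsx"):
    """Return sorted file names (without extension) in `directory` starting with `prefix`."""
    # glob.escape protects the directory part in case it contains *, ? or [
    pattern = os.path.join(glob.escape(directory), prefix + "*" + extension)
    names = (os.path.basename(path) for path in glob.glob(pattern))
    return sorted(os.path.splitext(name)[0] for name in names)


with tempfile.TemporaryDirectory() as home:
    # Stand-in for the FILE folder described in the question
    directory = os.path.join(home, "FILE")
    os.mkdir(directory)
    for name in [
        "SOME_LETTERS_1.xlsx", "DIFFERENT_LETTERS_1.xlsx", "ONE_NUMBER.xlsx",
        "SOME_LETTERS_2.xlsx", "DIFFERENT_LETTERS_2.xlsx",
        "SOME_LETTERS_3.xlsx", "SOME_LETTERS_4.xlsx",
    ]:
        open(os.path.join(directory, name), "w").close()

    file_name_list = stems_with_prefix(directory, "SOME_LETTERS")

print(file_name_list)
# ['SOME_LETTERS_1', 'SOME_LETTERS_2', 'SOME_LETTERS_3', 'SOME_LETTERS_4']
```

Sorting is a deliberate choice here: glob makes no promise about order, so the sort is what makes the result match the desired output in the question.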

python

directory

for-loop

pandas

jupyter