Looping through a Jupyter directory and adding file names to a list
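I have a simple file setup: about 15 .xlsx files inside a larger folder named FILE, which sits in Jupyter's home directory. I want to loop through all files whose names start with a specific combination of letters and add those file names to a list. Below is what I have so far. I would like to know: 1. What is the correct file path? 2. How do I get the desired output?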
import os
directory = '???/'  # <--- to find this, enter pwd into a cell
file_name_list = []
for filename in os.listdir(directory):
    if filename.startswith("SOME_LETTERS"):
        file_name_list.append(filename)
    else:
        continue
Example file setup:
FILE -->
    SOME_LETTERS_1.xlsx
    DIFFERENT_LETTERS_1.xlsx
    ONE_NUMBER.xlsx
    SOME_LETTERS_2.xlsx
    DIFFERENT_LETTERS_2.xlsx
    SOME_LETTERS_3.xlsx
    SOME_LETTERS_4.xlsx
Desired output:
[SOME_LETTERS_1, SOME_LETTERS_2, SOME_LETTERS_3, SOME_LETTERS_4]
Use the glob module: https://docs.python.org/3/library/glob.html
From the documentation:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
Here is an example:
from glob import glob
for file in glob("path/to/some/folder/*.txt"):
    print(file)
The code above prints the path of every .txt file in the given folder.
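The wildcard rules quoted from the documentation (*, ?, and [] ranges) can be tried out safely with a small sketch like the one below. It creates a throwaway folder, so nothing depends on files you actually have. The helper name matching_names and the file names a1.txt, a2.txt, b1.txt, ab.csv are made up for illustration.

```python
import os
import tempfile
from glob import escape, glob


def matching_names(folder, pattern):
    """Return the sorted base names in `folder` that match the glob `pattern`."""
    # escape() protects the folder part in case its path contains *, ? or [
    paths = glob(os.path.join(escape(folder), pattern))
    return sorted(os.path.basename(p) for p in paths)


# Build a temporary folder with a few empty, hypothetical files.
with tempfile.TemporaryDirectory() as folder:
    for name in ["a1.txt", "a2.txt", "b1.txt", "ab.csv"]:
        open(os.path.join(folder, name), "w").close()

    star = matching_names(folder, "*.txt")             # * matches any run of characters
    question = matching_names(folder, "a?.txt")        # ? matches exactly one character
    char_range = matching_names(folder, "[ab]1.txt")   # [] matches one character from the set

print(star)        # ['a1.txt', 'a2.txt', 'b1.txt']
print(question)    # ['a1.txt', 'a2.txt']
print(char_range)  # ['a1.txt', 'b1.txt']
```

The results are wrapped in sorted() because, as the documentation notes, glob returns matches in arbitrary order.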
So in your case, the code would look something like this:
"""
Folder structure:
├── samples
│ ├── Asample2.txt
│ ├── Bsample4.txt
│ ├── sample3.txt
│ ├── sample5.txt
│ └── sample.txt
└── stack.py
"""
from glob import glob
import os
# Using os.path.join so it works on multiple platforms
pattern = os.path.join("samples", "*.txt")
# os.path.basename extracts the file name from the full path
file_name_list = [file for file in glob(pattern) if os.path.basename(file).startswith("s")]
print(file_name_list)
# Output (order may vary):
# ['samples/sample5.txt', 'samples/sample.txt', 'samples/sample3.txt']
Another way to do it is to put the prefix into the Unix-style wildcard pattern itself:
from glob import glob
import os
# The leading "s" in the pattern is the prefix to match
pattern = os.path.join("samples", "s*.txt")
file_name_list = glob(pattern)
print(file_name_list)
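Applied to the FILE folder from the question, a minimal sketch might look like the following. It is only one possible approach: the helper name names_with_prefix is hypothetical, and the demo recreates the example folder in a temporary location so it runs anywhere. It also strips the directory and the .xlsx extension, since the desired output lists bare names.

```python
import os
import tempfile
from glob import escape, glob


def names_with_prefix(directory, prefix, extension=".xlsx"):
    """Return sorted file names (without extension) in `directory` starting with `prefix`."""
    pattern = os.path.join(escape(directory), escape(prefix) + "*" + extension)
    names = []
    for path in glob(pattern):
        base = os.path.basename(path)            # drop the directory part
        names.append(os.path.splitext(base)[0])  # drop the extension
    return sorted(names)                         # glob order is arbitrary, so sort


# Recreate the example FILE folder with empty files for the demo.
with tempfile.TemporaryDirectory() as home:
    directory = os.path.join(home, "FILE")
    os.mkdir(directory)
    for name in [
        "SOME_LETTERS_1.xlsx", "DIFFERENT_LETTERS_1.xlsx", "ONE_NUMBER.xlsx",
        "SOME_LETTERS_2.xlsx", "DIFFERENT_LETTERS_2.xlsx",
        "SOME_LETTERS_3.xlsx", "SOME_LETTERS_4.xlsx",
    ]:
        open(os.path.join(directory, name), "w").close()

    file_name_list = names_with_prefix(directory, "SOME_LETTERS")

print(file_name_list)
# ['SOME_LETTERS_1', 'SOME_LETTERS_2', 'SOME_LETTERS_3', 'SOME_LETTERS_4']
```

As for the path itself: in a notebook, os.getcwd() returns the current working directory, so assuming the notebook was started from the directory that contains FILE, os.path.join(os.getcwd(), "FILE") should point at the right folder.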