Extract file names from a directory and classify them by extension in Excel | Using Python
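I am trying to classify the files in a local directory into an Excel workbook based on their file extensions.
My input should be:
a directory path
My output should be:
an Excel workbook with a separate sheet for each extension.
For example, the input directory may have 5 .sh files, 8 .py files, and so on.
One sheet should be created per extension, listing the names of the files with that extension.
I was able to achieve this, but the solution is hard-coded.
I would appreciate help automating it without hard-coding the extensions.
Below is the code I tried, which works fine: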
import glob
import pandas as pd

path = r'<path_name>'  # base path
files = glob.glob(path + '/**/*.*', recursive=True)

# one hard-coded list per extension
hql, hive, ksh, sh, csv, txt, sql, py = ([] for i in range(8))

for fpath in files:
    # split the Windows path into its components (backslash must be escaped)
    chk_file = fpath.split('\\')
    for file_name in chk_file:
        if '.hql' in file_name:
            print("Hql:", file_name)
            comb = f'{file_name}'
            hql.append(comb)
        if '.hive' in file_name:
            print(file_name)
            comb = f'{file_name}'
            hive.append(comb)
        if '.ksh' in file_name:
            print(file_name)
            comb = f'{file_name}'
            ksh.append(comb)
        if '.sh' in file_name:
            print(file_name)
            comb = f'{file_name}'
            sh.append(comb)
        if '.sql' in file_name:
            print(file_name)
            comb = f'{file_name}'
            sql.append(comb)
        if '.txt' in file_name:
            print(file_name)
            comb = f'{file_name}'
            txt.append(comb)
        if '.csv' in file_name:
            print(file_name)
            comb = f'{file_name}'
            csv.append(comb)
        if '.py' in file_name:
            print(file_name)
            comb = f'{file_name}'
            py.append(comb)

writer = pd.ExcelWriter(r'C:\Users\saurabh.arun.kumar\OneDrive - Accenture\Desktop\outfile2.xlsx',
                        engine='xlsxwriter')

# one data frame per extension
new_hql = pd.DataFrame(hql, columns=['file'])
new_hive = pd.DataFrame(hive, columns=['file'])
new_sql = pd.DataFrame(sql, columns=['file'])
new_ksh = pd.DataFrame(ksh, columns=['file'])
new_txt = pd.DataFrame(txt, columns=['file'])
new_sh = pd.DataFrame(sh, columns=['file'])
new_csv = pd.DataFrame(csv, columns=['file'])
new_py = pd.DataFrame(py, columns=['file'])

# one sheet per extension
new_hql.to_excel(writer, sheet_name='hql', index=False)
new_hive.to_excel(writer, sheet_name='hive', index=False)
new_sql.to_excel(writer, sheet_name='sql', index=False)
new_ksh.to_excel(writer, sheet_name='ksh', index=False)
new_csv.to_excel(writer, sheet_name='csv', index=False)
new_txt.to_excel(writer, sheet_name='txt', index=False)
new_sh.to_excel(writer, sheet_name='sh', index=False)
new_py.to_excel(writer, sheet_name='py', index=False)

# close() also saves the file; writer.save() was removed in pandas 2.0
writer.close()
print("Executed")
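This code only works for the extensions listed in the code. I would like it to classify the files by reading the extensions itself and to create the new sheets with the file names.
I hope I have explained the scenario clearly.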
You can use os.path.splitext to split the extension from a file path:
fname, fext = os.path.splitext("/what/ever/kind/of/file/this.is.txt")
Use it to create a dictionary of "ext" -> "list of files".
Use the dictionary to create n data frames. Write them to Excel.
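The dictionary step can be sketched on its own, without touching the disk. This is a minimal sketch, not part of the original answer: the helper name `group_by_extension` and the sample paths are made up for illustration, and lower-casing the extension (so that `.SH` and `.sh` land in the same group) is an assumption you may not want.

```python
import os
from collections import defaultdict


def group_by_extension(paths):
    """Map each extension (e.g. '.py') to the base names of the files that carry it."""
    groups = defaultdict(list)
    for p in paths:
        # splitext only splits off the last suffix: 'this.is.txt' -> '.txt'
        _, ext = os.path.splitext(p)
        # assumption: treat extensions case-insensitively
        groups[ext.lower()].append(os.path.basename(p))
    return dict(groups)


# hypothetical sample paths, for illustration only
sample = [
    "/a/b/run.sh",
    "/a/job.ksh",
    "/a/b/c/load.py",
    "/a/notes.this.is.txt",
    "/a/other.SH",
]
print(group_by_extension(sample))
```

Because the real extension is compared rather than a substring, a `.ksh` file is not also counted as a `.sh` file, which a test like `'.sh' in file_name` would do.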
If you only need certain extensions, filter the dict keys down to the ones you want:
import glob
import pandas as pd
from os import path

p = r'/redacted/location'  # fix this to your path
files = glob.glob(p + '/**/*.*', recursive=True)

d = {}
i = 0  # used to redact my file names - you would simply store fn+fex
for f in files:
    fn, fex = path.splitext(f)
    # filter for extensions you want
    if (fex in (".txt", ".xlsx", ".docx")):
        # use d.setdefault(fex,[]).append(f) - I use something
        # to blank out my file names here
        # use collections.defaultdict to get a speed kick if needed
        d.setdefault(fex, []).append(f"file...{i}{fex}")
        i += 1

# create single data frames per file extension from dictionary
dfs = []
for key, value in d.items():
    df = pd.DataFrame({key: value})
    dfs.append(df)

# do your excel writing here - use column header for sheet name etc.
for df in dfs:
    print(df)
Output (file names redacted):
             .docx
0    file...0.docx
1    file...2.docx
2    file...3.docx
3    file...4.docx
4    file...5.docx
5    file...6.docx
6    file...7.docx
7   file...12.docx
8   file...13.docx
9   file...14.docx
10  file...15.docx
11  file...16.docx
            .xlsx
0   file...1.xlsx
1   file...8.xlsx
2   file...9.xlsx
3  file...10.xlsx
4  file...11.xlsx
5  file...17.xlsx
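You can then use the column header of each individual data frame as the sheet name when writing your Excel file, something like:
with pd.ExcelWriter('C:/temp/outfile2.xlsx') as writer:
    for df in dfs:
        df.to_excel(writer, sheet_name=df.columns[0])
That should do it - I can't test it right now.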
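Putting the pieces together, an end-to-end version without any hard-coded extension list might look like the sketch below. It is untested against the asker's directory and rests on a few assumptions: the function names `sheet_name_for` and `write_sheets` are hypothetical, extensions are lower-cased, and sheet names are cleaned up because Excel limits them to 31 characters and rejects the characters `[ ] : * ? / \`. pandas is imported inside `write_sheets` only so that the small helper can be tried without it; an Excel engine such as openpyxl or xlsxwriter must be installed for the write itself.

```python
import glob
import os


def sheet_name_for(ext):
    """Turn an extension such as '.py' into a name Excel accepts for a sheet."""
    name = ext.lstrip(".") or "no_ext"  # a name like 'file.' has an empty suffix
    for ch in "[]:*?/\\":               # characters Excel does not allow
        name = name.replace(ch, "_")
    return name[:31]                    # Excel's sheet-name length limit


def write_sheets(base_dir, out_file):
    """Write one sheet per extension found under base_dir, listing file names."""
    import pandas as pd  # imported lazily, see the note above

    groups = {}
    pattern = os.path.join(base_dir, "**", "*.*")
    for f in glob.glob(pattern, recursive=True):
        if os.path.isfile(f):  # skip directories whose name contains a dot
            ext = os.path.splitext(f)[1].lower()
            groups.setdefault(ext, []).append(os.path.basename(f))

    if not groups:
        # a workbook needs at least one sheet, so do not create an empty one
        raise ValueError(f"no files with an extension found under {base_dir!r}")

    with pd.ExcelWriter(out_file) as writer:
        for ext, names in sorted(groups.items()):
            frame = pd.DataFrame({"file": names})
            frame.to_excel(writer, sheet_name=sheet_name_for(ext), index=False)


# usage, with placeholder paths:
# write_sheets(r"<path_name>", "outfile2.xlsx")
```

Two extensions that differ only after the 31st character, or only in a replaced character, would map to the same sheet name; that case is not handled here.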