Multi-tab Excel sheets, unique entries in 1 column, create new file with data from another column as the name, all with headers
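As usual, I've bitten off more than I can chew.
I have a file, "list.xlsx". The file has 3 sheets: "current students", "finished" and "cancelled".
The sheets all contain data under the following headers:
[StudentId, FirstName, Lastname, DoB, Nationality, CourseID, CourseName, Startdate, Finishdate, UnitID, UnitName, UnitCompetency]
I've created the following abomination as a start on what I need.
What I want it to do is:
1) In a folder named after the sheet, create a file called FirstName + Lastname.xlsx for each (unique) StudentId
2) In that file, take all the information from the remaining columns and append it to that student's file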
#python 3.8
import pandas as pd
import os
import shutil
file = "list.xlsx"
CS = "current student"
Fin = "finished"
Can = "cancelled"
TheList = {CS, Fin, Can}
CanXlsx = pd.read_excel(file, sheet_name = Can)
FinXlsx = pd.read_excel(file, sheet_name = Fin)
CSXlsx = pd.read_excel(file, sheet_name = CS)
if os.path.exists(CS):
    shutil.rmtree(CS)
os.mkdir(CS)
CSDir = '//current student//'
if os.path.exists(Fin):
    shutil.rmtree(Fin)
os.mkdir(Fin)
FinDir = '//finished//'
if os.path.exists(Can):
    shutil.rmtree(Can)
os.mkdir(Can)
CanDir = '//cancelled//'
CancelID = CanXlsx.StudentId.unique()
FinID = FinXlsx.StudentId.unique()
CSID = CSXlsx.StudentId.unique()
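I thought I was getting better with for loops and the like, but I just can't seem to get my head around them.
I can work out the logic, I just can't turn it into code.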
https://drive.google.com/file/d/134fqWx6veF7zp_12GqFYlbmPZnK8ihaV/view?usp=sharing
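I think the approach needed here is to create 3 dataframes (it can probably be done with one, but I don't remember how). Then 1) on each dataframe, you need to extract the list of "First Name + Last Name" values, and 2) you need to create a mask on the dataframe to extract each student's rows and save them.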
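In case "mask" is an unfamiliar term, here is a minimal sketch of the idea on a made-up frame. The data and the names `df`, `mask` and `student_rows` are placeholders of mine, not taken from your file; only the `StudentId` and `UnitName` column names come from your header list.

```python
import pandas as pd

# Toy data standing in for one sheet; the values are invented for illustration
df = pd.DataFrame({
    "StudentId": [1, 1, 2],
    "UnitName": ["Maths", "Physics", "Maths"],
})

# Comparing a column to a value gives a boolean Series, one entry per row
mask = df["StudentId"] == 1

# Indexing with the mask keeps only the rows where it is True
student_rows = df[mask]

print(student_rows)
```

The loop further down does the same thing once per student id.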
import os
import shutil

import pandas as pd
file = "list.xlsx"
CS = "current student"
Fin = "finished"
Can = "cancelled"
TheList = {CS, Fin, Can}
CanXlsx = pd.read_excel(file, sheet_name = Can)
FinXlsx = pd.read_excel(file, sheet_name = Fin)
CSXlsx = pd.read_excel(file, sheet_name = CS)
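## Folder creation (each folder is wiped and recreated on every run)
# Use the plain folder names: a leading slash, as in '//cancelled//', would make
# the path absolute instead of pointing at the folder just created
if os.path.exists(CS):
    shutil.rmtree(CS)
os.mkdir(CS)
CSDir = CS
if os.path.exists(Fin):
    shutil.rmtree(Fin)
os.mkdir(Fin)
FinDir = Fin
if os.path.exists(Can):
    shutil.rmtree(Can)
os.mkdir(Can)
CanDir = Can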
# Create full names (StudentId is cast to str so numeric ids concatenate cleanly)
# Column names follow the header list in the question; adjust them if your file differs
CanXlsx["Fullname"] = CanXlsx["StudentId"].astype(str) + "_" + CanXlsx["FirstName"] + "_" + CanXlsx["Lastname"]
## Same for the other dfs
# Get the unique ids (a student can appear on several rows)
# canFullNames = list(CanXlsx["Fullname"]) Edit: Preferred approach through student Ids
canIds = CanXlsx["StudentId"].unique()
## Same for the other dfs
# Loop over the ids to create one df per student
for student_id in canIds:
    df1 = CanXlsx[CanXlsx["StudentId"] == student_id]  # This will filter the rows by the id you want
    # Retrieve the full name
    name = df1.iloc[0]["Fullname"]
    # Create the filename inside the folder named after the sheet
    filename = os.path.join(Can, name + ".xlsx")
    # I understand that these columns are not required on each file
    df1 = df1.drop(columns=["FirstName", "Lastname"])
    df1.to_excel(filename, header=True, index=False)
## Same for the other dfs
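On the "it can probably be done with one" point: a rough sketch of how the three copies of the code could collapse into one loop. It assumes the column names from the question's header list (`StudentId`, `FirstName`, `Lastname`), and the function names `split_by_student` and `write_student_files` are hypothetical helpers of mine. It leans on two pandas features: `pd.read_excel(..., sheet_name=None)` returns a dict of every sheet, and `groupby` hands back one sub-frame per unique id, so the mask is not written by hand.

```python
from pathlib import Path

import pandas as pd


def split_by_student(sheets):
    """Map '<sheet>/<StudentId>_<First>_<Last>.xlsx' to that student's rows."""
    out = {}
    for sheet_name, df in sheets.items():
        # groupby yields (id, rows) pairs, one per unique StudentId in the sheet
        for student_id, rows in df.groupby("StudentId"):
            first = rows.iloc[0]
            filename = f"{student_id}_{first['FirstName']}_{first['Lastname']}.xlsx"
            # The name columns are already in the filename, so drop them from the data
            out[Path(sheet_name) / filename] = rows.drop(columns=["FirstName", "Lastname"])
    return out


def write_student_files(sheets, root="."):
    """Write every student's rows to <root>/<sheet>/<file>, headers kept, index dropped."""
    for rel_path, rows in split_by_student(sheets).items():
        target = Path(root) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        rows.to_excel(target, header=True, index=False)


# Usage, assuming list.xlsx sits next to the script:
# sheets = pd.read_excel("list.xlsx", sheet_name=None)  # None -> dict of all sheets
# write_student_files(sheets)
```

Unlike the code above, this sketch does not delete existing folders first; it only creates the ones that are missing, so stale files from an earlier run would stay in place.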
Let me know if this helps; at least this is what I understood you want to achieve with the code. :D