Openpyxl:从列中删除重复的单元格
Openpyxl: Removing Duplicate cells from a column
我正在尝试使用 openpyxl 从列中删除重复条目并将唯一条目写入不同的工作簿。
输入文件:
Cust1
Cust1
Cust1
Cust2
Cust2
Cust3
预期输出为:
Cust1
Cust2
Cust3
wb1 = openpyxl.load_workbook('OldFile.xlsx')
ws = wb1.active
wb2 = openpyxl.Workbook()
ws2 = wb2.active
k=1
new_row1 = []
for i in range(2, ws.max_row + 1 ):
new_row1.append([]) #list for storing the unique entries
row_name = ws.cell(row=i,column=1).value #taking the 1st cell's value
new_row1[k].append(row_name) #Appending the list
ws2.append(new_row1) #writing to new workbook
k+=1
for j in range(3, ws.max_row + 1 ):
row_name2 = ws.cell(row=j, column=1).value #taking 2nd cell's value
if row_name == row_name2: #comparing both the values
i+=1
j+=1
wb2.save('NewFile.xlsx')
我得到 "IndexError: list index out of range" for line "new_row1[k].append(row_name)"
,除了提到的错误之外,还有一些必须更改才能获得所需输出的内容。
正如@CharlieClark 所说,您的代码过于复杂。改为尝试:
ws1 = wb1.active # keep naming convention consistent
values = []
for i in range(2,ws1.max_row+1):
if ws1.cell(row=i,column=1).value in values:
pass # if already in list do nothing
else:
values.append(ws1.cell(row=i,column=1).value)
for value in values:
ws2.append([value])
iter_rows可以用
ws1 = wb1.active
values = []
for row in ws.iter_rows(min_row=2):
if row[1].value in values:
pass
else:
values.append(row[1].value)
for value in values:
ws2.append([value])
测试好的解决方案:
import openpyxl
wb=openpyxl.load_workbook('old_file.xlsx')
wb2=openpyxl.load_workbook('new_file.xlsx')
sh=wb['Sheet1']
sh2=wb2['Sheet1']
values= []
for i in range(1,sh.max_row+1):
a=sh.cell(row=i,column=1).value
if a in values:
pass
else:
values.append(sh.cell(row=i,column=1).value)
for x in range(len(values)):
sh2.cell(row=x+1,column=1).value=values[x]
wb2.save('new_file.xlsx')
我正在尝试使用 openpyxl 从列中删除重复条目并将唯一条目写入不同的工作簿。
输入文件:
Cust1 Cust1 Cust1 Cust2 Cust2 Cust3
预期输出为:
Cust1 Cust2 Cust3
wb1 = openpyxl.load_workbook('OldFile.xlsx')
ws = wb1.active
wb2 = openpyxl.Workbook()
ws2 = wb2.active
k=1
new_row1 = []
for i in range(2, ws.max_row + 1 ):
new_row1.append([]) #list for storing the unique entries
row_name = ws.cell(row=i,column=1).value #taking the 1st cell's value
new_row1[k].append(row_name) #Appending the list
ws2.append(new_row1) #writing to new workbook
k+=1
for j in range(3, ws.max_row + 1 ):
row_name2 = ws.cell(row=j, column=1).value #taking 2nd cell's value
if row_name == row_name2: #comparing both the values
i+=1
j+=1
wb2.save('NewFile.xlsx')
我得到 "IndexError: list index out of range" for line "new_row1[k].append(row_name)"
,除了提到的错误之外,还有一些必须更改才能获得所需输出的内容。
正如@CharlieClark 所说,您的代码过于复杂。改为尝试:
ws1 = wb1.active # keep naming convention consistent
values = []
for i in range(2,ws1.max_row+1):
if ws1.cell(row=i,column=1).value in values:
pass # if already in list do nothing
else:
values.append(ws1.cell(row=i,column=1).value)
for value in values:
ws2.append([value])
iter_rows可以用
ws1 = wb1.active
values = []
for row in ws.iter_rows(min_row=2):
if row[1].value in values:
pass
else:
values.append(row[1].value)
for value in values:
ws2.append([value])
测试好的解决方案:
import openpyxl
wb=openpyxl.load_workbook('old_file.xlsx')
wb2=openpyxl.load_workbook('new_file.xlsx')
sh=wb['Sheet1']
sh2=wb2['Sheet1']
values= []
for i in range(1,sh.max_row+1):
a=sh.cell(row=i,column=1).value
if a in values:
pass
else:
values.append(sh.cell(row=i,column=1).value)
for x in range(len(values)):
sh2.cell(row=x+1,column=1).value=values[x]
wb2.save('new_file.xlsx')