How to fill an Excel file from Selenium scraping in a loop with Python
I am scraping a website with many pages using Selenium.
Each time, I open a page in a second tab and run my function to extract the data.
Then I close that tab, open the next one, and keep extracting until the last page.
My problem is that when I save the data to an Excel file, only the data extracted from the last page (tab) ends up in the file.
Can you help me find my mistake?
def scrape_client_infos(linksss):
    tds = []  # tds is the list that contains the scraped data
    reader = pd.read_excel(r'C:\python projects\mada\db.xlsx')
    writer = pd.ExcelWriter(r'C:\python projects\mada\db.xlsx', engine='openpyxl')
    html = urlopen(linksss)
    soup = BeautifulSoup.BeautifulSoup(html, 'html.parser')
    table = soup.find('table', attrs={'class': 'r2'})
    # scrape all the tr that contain text
    for tr in table.find_all('tr'):
        elem = tr.find('td').get_text()
        elem = elem.replace('\t', '')
        elem = elem.replace('\n', '')
        elem = elem.replace('\r', '')
        tds.append(elem)
    print(tds)
    # selecting the data that I need to save in Excel
    raw_data = {'sub_num': [tds[1]], 'id': [tds[0]], 'nationality': [tds[2]], 'country': [tds[3]],
                'city': [tds[3]], 'age': [tds[7]], 'marital_status': [tds[6]], 'wayy': [tds[5]]}
    df = pd.DataFrame(raw_data, columns=['sub_num', 'id', 'nationality', 'country', 'city', 'age', 'marital_status', 'wayy'])
    # save the data in the excel file
    df.to_excel(writer, sheet_name='Sheet1', startrow=len(reader), header=False)
    writer.save()
    return soup
P.S.: I always want to start filling the Excel file from the last row.
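For reference, the calling loop is roughly the following (a sketch only; driver and links are assumed names, and the function above fetches the page itself with urlopen):

from selenium import webdriver

driver = webdriver.Chrome()                    # assumed browser/driver setup
links = ['https://example.com/page/1']         # assumed list of page URLs to visit

for link in links:
    driver.execute_script("window.open('');")             # open a second tab
    driver.switch_to.window(driver.window_handles[-1])    # switch to the new tab
    driver.get(link)                                       # load the page in that tab
    scrape_client_infos(link)                              # extract and save this page's data
    driver.close()                                         # close the second tab
    driver.switch_to.window(driver.window_handles[0])      # switch back to the main tab

driver.quit()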
To append data to an Excel file with pandas, you need to set the worksheets on the writer object.
Update the last part of your code:
# save the data in the excel file
from openpyxl import load_workbook

book = load_workbook(path)
startrw = book['Sheet1'].max_row + 1
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)  # prevent overwrite
df.to_excel(writer, sheet_name='Sheet1', startrow=startrw, header=False)
writer.save()
return soup
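Note: on newer pandas releases the book and sheets attributes of ExcelWriter became read-only, so the assignments above may raise an error. A minimal alternative sketch, assuming pandas >= 1.4 with the openpyxl engine and the same df as in the question:

from openpyxl import load_workbook
import pandas as pd

path = r'C:\python projects\mada\db.xlsx'
next_row = load_workbook(path)['Sheet1'].max_row   # to_excel's startrow is 0-based

# mode='a' reopens the existing workbook; if_sheet_exists='overlay' writes into 'Sheet1'
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Sheet1', startrow=next_row, header=False)

The context manager saves and closes the workbook on exit, so there is no writer.save() call here.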