Cannot get code to scrape more than one page
I want to scrape multiple pages from a website. I have written code that gets one page, and I can make it loop over several pages, but I cannot store the information: only the data from the last page scraped is kept. I need the scraped content so that I can put it into a dataframe. The code is below:
for page in range(0, 100, 50):
    # This website outputs search results of 50 items per page; for testing I tried to make it
    # get two pages, i.e. it opens up the links of each page.
    driver.get(f"https://www.website.com/search/{page}...")
    firstvariable = driver.find_elements_by_css_selector("1stselector")
    secondvariable = driver.find_elements_by_css_selector("2ndselector")
    for k in range(len(firstvariable)):
        temporarydata = {"First Variable": firstvariable[k].text,
                         "Second Variable": secondvariable[k].text}
        # check that it does print correctly
        scraperesult.append(temporarydata)  # tried adding a (page) index but says not callable
    df_data = df_data.append(scraperesult)  # Tried df_data = df_data.append(scraperesult)(page) as well
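
One common way to avoid losing the earlier pages is to collect every row in a plain Python list inside the loop and build the DataFrame once, after the loop has finished. The sketch below keeps the question's placeholder URL and selectors and uses the Selenium 4 locator style (find_elements(By.CSS_SELECTOR, ...)); it is only an illustration of the accumulation pattern, not a verified fix for this particular site.

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

rows = []  # one dict per scraped item, accumulated across all pages
for page in range(0, 100, 50):
    # placeholder URL and selectors taken from the question
    driver.get(f"https://www.website.com/search/{page}...")
    firsts = driver.find_elements(By.CSS_SELECTOR, "1stselector")
    seconds = driver.find_elements(By.CSS_SELECTOR, "2ndselector")
    for first, second in zip(firsts, seconds):
        rows.append({"First Variable": first.text,
                     "Second Variable": second.text})

driver.quit()
df_data = pd.DataFrame(rows)  # build the frame once, from every page

Calling df_data.append(...) inside the loop also works on older pandas, but DataFrame.append returns a new frame each time (and was removed in pandas 2.0), so building the DataFrame once from the collected list is simpler and less error-prone.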
I was able to figure it out and realized my indexing was wrong. I had to shift the range down by 50 so that the first link would come from:
for page in range(-50, 100, 50):
    driver.get(f"https://www.website.com/search/{page}...")
and the first element is:
driver.get(f"https://www.website.com/search/{0}...") #This was what I wanted it to start with