Google Webscraper (URLS) - 包括超过第一页的结果

Google Webscraper (URLS) - including more than the first page in results

获得了一个基本的 Google 网络抓取工具,其中 returns 第一个 google 搜索页面的 url - 我希望它在更多页面上包含 URLS。对这段代码进行分页的最佳方式是什么,以便它从第 2、3、4、5、6、7 等页面获取 URLS

不想进入 space 我废弃了多少页,但绝对想要的不仅仅是第一页!

import requests
import urllib
import pandas as pd
from requests_html import HTML
from requests_html import HTMLSession


def get_source(url):
    try:
        session = HTMLSession()
        response = session.get(url)
        return response

    except requests.exceptions.RequestException as e:
        print(e)


def scrape_google(query):

    query = urllib.parse.quote_plus(query)
    response = get_source("https://www.google.co.uk/search?q=" + query)

    links = list(response.html.absolute_links)
    google_domains = ('https://www.google.',
                      'https://google.',
                      'https://webcache.googleusercontent.',
                      'http://webcache.googleusercontent.',
                      'https://policies.google.',
                      'https://support.google.',
                      'https://maps.google.')

    for url in links[:]:
        if url.startswith(google_domains):
            links.remove(url)

    return links

print(scrape_google('https://www.google.com/search?q=letting agent'))

您可以迭代特定的 range() 并通过将迭代次数乘以 10 来设置开始参数 - 将结果保存到 list 并使用 set() 删除重复项:

data = []

for i in range(3):
    data.extend(scrape_google('letting agent', i*10))

set(data)

例子

import requests

def scrape_google(query,start):

    response = get_source(f"https://www.google.co.uk/search?q={query}&start={start}")

    links = list(response.html.absolute_links)
    google_domains = ('https://www.google.',
                      'https://google.',
                      'https://webcache.googleusercontent.',
                      'http://webcache.googleusercontent.',
                      'https://policies.google.',
                      'https://support.google.',
                      'https://maps.google.')

    for url in links[:]:
        if url.startswith(google_domains):
            links.remove(url)

    return links
    
data = []

for i in range(3):
    data.extend(scrape_google('letting agent', i*10))

print(set(data))

输出

{'https://www.lettingagenttoday.co.uk/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://howsy.com/&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.propertymark.co.uk/professional-standards/consumer-guides/landlords/what-does-a-letting-agent-do.html&prev=search&pto=aue', 'https://www.citizensadvice.org.uk/housing/renting-privately/during-your-tenancy/complaining-about-your-letting-agent/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.allagents.co.uk/find-agent/&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.theonlinelettingagents.co.uk/&prev=search&pto=aue', 'https://www.which.co.uk/money/mortgages-and-property/buy-to-let/using-a-letting-agent-a16lu1w364rv', 'https://www.gov.uk/government/publications/non-resident-landord-guidance-notes-for-letting-agents-and-tenants-non-resident-landlords-scheme-guidance-notes', 'https://lettingagentregistration.gov.scot/renew', 'https://en.wikipedia.org/wiki/Letting_agent#Services_and_fees', 'https://patriciashepherd.co.uk/', 'https://dict.leo.org/englisch-deutsch/letting%20agent', 'https://www.diamonds-salesandlettings.co.uk/', 'https://www.lettingagentproperties.com/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.ukala.org.uk/&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://register.lettingagentregistration.gov.scot/search&prev=search&pto=aue', 'https://context.reverso.net/%C3%BCbersetzung/englisch-deutsch/letting+agent', 'https://www.cubittandwest.co.uk/landlord-guides/what-is-a-letting-agent/', 'https://en.wikipedia.org/wiki/Letting_agent', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://safeagents.co.uk/&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://charlesroseproperties.co.uk/news/letting-agent-vs-estate-agent-the-differences/&prev=search&pto=aue', 'https://www.tenantshop.co.uk/letting-agents/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://lettingagentregistration.gov.scot/renew&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.winkworth.co.uk/&prev=search&pto=aue', 'https://objego.de/lp-immobilienverwaltung/', 'https://www.facebook.com/agestateagents/videos/looking-to-instruct-a-letting-agent-not-sure-what-you-should-be-looking-for-or-w/688390845096579/', 'https://www.ukala.org.uk/', 'https://en.wikipedia.org/wiki/Letting_agent#Regulation', 'https://www.foxtons.co.uk/', 'https://ibizaprestige.com/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.which.co.uk/money/mortgages-and-property/buy-to-let/using-a-letting-agent-a16lu1w364rv&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.tenantshop.co.uk/letting-agents/&prev=search&pto=aue', 'https://www.dict.cc/?s=letting+agent', 'https://www.landlordaccreditationscotland.com/letting-agent-training/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.gov.uk/government/publications/non-resident-landord-guidance-notes-for-letting-agents-and-tenants-non-resident-landlords-scheme-guidance-notes&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.propertyinvestmentsuk.co.uk/what-is-a-letting-agent/&prev=search&pto=aue', 'https://www.propertyinvestmentsuk.co.uk/what-is-a-letting-agent/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.leaders.co.uk/&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://en.wikipedia.org/wiki/Letting_agent&prev=search&pto=aue', 'https://www.allagents.co.uk/find-agent/', 'https://www.leaders.co.uk/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.foxtons.co.uk/&prev=search&pto=aue', 'https://howsy.com/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://patriciashepherd.co.uk/&prev=search&pto=aue', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.lettingagenttoday.co.uk/&prev=search&pto=aue', 'https://register.lettingagentregistration.gov.scot/search', 'https://www.linguee.de/englisch-deutsch/uebersetzung/letting+agent.html', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.diamonds-salesandlettings.co.uk/&prev=search&pto=aue', 'https://www.theonlinelettingagents.co.uk/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.lettingagentproperties.com/&prev=search&pto=aue', 'http://www.paul-partner.com/', 'https://www.homeday.de/de/homeday-makler/rhein-main-gebiet-sued/?utm_medium=seo&utm_source=gmb&utm_campaign=rhein_main_gebiet_sued', 'https://www.propertymark.co.uk/professional-standards/consumer-guides/landlords/what-does-a-letting-agent-do.html', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.citizensadvice.org.uk/housing/renting-privately/during-your-tenancy/complaining-about-your-letting-agent/&prev=search&pto=aue', 'https://safeagents.co.uk/', 'https://charlesroseproperties.co.uk/news/letting-agent-vs-estate-agent-the-differences/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.landlordaccreditationscotland.com/letting-agent-training/&prev=search&pto=aue', 'https://move.uk.net/', 'https://www.winkworth.co.uk/', 'https://translate.google.co.uk/translate?hl=de&sl=en&u=https://www.cubittandwest.co.uk/landlord-guides/what-is-a-letting-agent/&prev=search&pto=aue'}