Google scrape returns no description or email

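I am trying to get the description and email from each Google search result, but only the title and link are returned. I use Selenium to open the page and bs4 to scrape the actual content.

What am I doing wrong? Please help. Thanks!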

import re

from bs4 import BeautifulSoup
from bs4.element import Tag

# `driver` is the Selenium WebDriver that has already loaded the results page
soup = BeautifulSoup(driver.page_source, 'lxml')
result_div = soup.find_all('div', attrs={'class': 'g'})

links = []
titles = []
descriptions = []
emails = []
phones = []

for r in result_div:
    # Check whether each element is present; otherwise an exception is raised
    try:
        # link
        link = r.find('a', href=True)

        # title
        title = r.find('h3')
        if isinstance(title, Tag):
            title = title.get_text()

        # description
        description = r.find('div', attrs={'class': 'IsZvec'})
        # description = r.find('span')
        if isinstance(description, Tag):
            description = description.get_text()
            print(description)

        # email
        email = r.find(string=re.compile(r'[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*'))
    except Exception as exc:
        # The posted snippet ends above; this minimal handler is assumed
        # only so that the block parses.
        print(exc)
        continue

The main issue here is that the class names are dynamic, so you have to change your strategy and select your elements by tag or id.

...
data = []

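# Match on structure instead of class names: a div whose child div
# holds a link that wraps an h3 (the result header).
for e in soup.select('div:has(> div > a h3)'):
    data.append({
        'title': e.h3.text,
        'url': e.a.get('href'),
        # the snippet is the node that follows the header div
        'desc': e.next_sibling.text,
        # first email-like string in the enclosing result block, else None
        'email': m.group(0) if (m := re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', e.parent.text)) else None
    })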
    
data

Output

[{'title': 'Email design at Stack Overflow',
  'url': 'https://stackoverflow.design/email/guidelines/getting-started/',
  'desc': 'An email design system that helps us work together to create consistently-designed, properly-rendered email for all Stack Overflow users.',
  'email': None},
 {'title': 'Is email from do-not-reply@stackoverflow.email legit? - Meta ...',
  'url': 'https://meta.stackoverflow.com/questions/338332/is-email-from-do-not-replystackoverflow-email-legit',
  'desc': '23.11.2016 · 1\xa0AntwortYes it is legit. We use it to protect stackoverflow.com user cookies from third parties. The links in the email are all rewritten to a\xa0...',
  'email': 'do-not-reply@stackoverflow.email'},
 {'title': "Newest 'email' Questions - Stack Overflow",
  'url': 'https://stackoverflow.com/questions/tagged/email',
  'desc': 'Use this tag for questions involving code to send or receive email messages. Posting to ask why the emails you send are marked as spam is off-topic for Stack\xa0...',
  'email': None},
 {'title': 'Contact information - contact us today - Stack Overflow',
  'url': 'https://stackoverflow.co/company/contact',
  'desc': "A private, secure home for your team's questions and answers. Perfect for teams of 10-500 members. No more digging through stale wikis and lost emails—give your\xa0...",
  'email': None},
 {'title': 'How can I get the email of a stackoverflow user? - Meta Stack ...',
  'url': 'https://meta.stackexchange.com/questions/64970/how-can-i-get-the-email-of-a-stackoverflow-user',
  'desc': '18.09.2010 · 1\xa0AntwortYou can\'t. Read your own profile. The e-mail box says "never displayed". The closest we have to private messaging is commenting as a reply\xa0...',
  'email': None},...]
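
To try the same idea without opening a browser, here is a minimal sketch that runs the structural selector against a small hand-written HTML string. The sample markup, the `parse_results` function and the `EMAIL_RE` name are illustrative assumptions and do not reflect Google's actual page structure. The sketch also swaps `next_sibling` for `find_next_sibling()`, which skips whitespace text nodes and lets a result without a snippet yield `None` instead of raising `AttributeError`.

```python
import re

from bs4 import BeautifulSoup

# Same email pattern as in the answer above.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

# Hypothetical, simplified stand-in for a results page (assumption:
# real markup is more deeply nested and changes over time).
SAMPLE_HTML = """
<div id="search">
  <div class="result">
    <div class="hdr"><div><a href="https://example.com/a"><h3>First result</h3></a></div></div>
    <div class="snip">Contact us at info@example.com for details.</div>
  </div>
  <div class="result">
    <div class="hdr"><div><a href="https://example.com/b"><h3>Second result</h3></a></div></div>
  </div>
</div>
"""


def parse_results(html):
    """Return one dict per result header found by the structural selector."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for e in soup.select('div:has(> div > a h3)'):
        # Next sibling *tag*, or None when the result has no snippet.
        sibling = e.find_next_sibling()
        # Search the whole result block so emails in the snippet are found.
        match = EMAIL_RE.search(e.parent.get_text(' ', strip=True))
        rows.append({
            'title': e.h3.get_text(strip=True),
            'url': e.a.get('href'),
            'desc': sibling.get_text(' ', strip=True) if sibling is not None else None,
            'email': match.group(0) if match else None,
        })
    return rows


if __name__ == '__main__':
    for row in parse_results(SAMPLE_HTML):
        print(row)
```

Most search snippets contain no email address at all, so `None` in the `email` field is the expected result for the majority of rows rather than a sign that the selector failed.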