Get the star ratings from an HTML page using BeautifulSoup

I am trying to get the star ratings from this page (https://www.edmunds.com/tesla/model-3/2019/consumer-reviews/).

I mean the sections such as Safety, Performance, Comfort, and so on.

Here is the HTML:

<div class="justify-content-between flex-column flex-md-row row"><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Safety</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Technology</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Performance</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Interior</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star 
icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Comfort</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Reliability</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Value</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl></div></div></div>

If the code is too long, I can post a screenshot instead.

This is the code I am using, but it does not work for the tags shown above:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent  # assumed source of UserAgent

data = []
ua = UserAgent()
header = {'User-Agent': str(ua.safari)}
url = 'https://www.edmunds.com/tesla/model-3/2019/consumer-reviews/'
response = requests.get(url, headers=header)
html_soup = BeautifulSoup(response.text, 'lxml')
content_list = html_soup.find_all('div', attrs={'class': 'review-item'})
for e in content_list:
    # Collect the basic fields of each review
    d = {
        'review_title': e.a.text,
        'review_content': e.select_one('p').text,
        'overall_rating': e.select_one('span.sr-only').text,
        'reviewer_name': e.div.text.split(',')[0].strip(),
        'review_date': e.div.text.split(',')[1].strip(),
    }
    data.append(d)
df = pd.DataFrame(data)
# Keep only the first occurrence of each reviewer/title pair
df1 = df.drop_duplicates(subset=['reviewer_name', 'review_title'], keep='first')

Basically, what I want to achieve is a column for each star rating, for example Safety: 5.0, Performance: 5.0, Comfort: 5.0, and so on.

I am trying to use this piece of code:

d.update(dict(s.stripped_strings for s in e.select('span.rating-stars span.sr-only')))
data.append(d)

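However, it does not work. Also, the tags holding the overall star rating and the detailed star ratings share the same class; the only difference is that they sit under different parent tags (I hope I have not made this too complicated). Anyway, I hope someone can help me.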

EDIT: I edited the code slightly, because the code I pasted did not seem to work, which is strange.

In general, using stripped_strings works fine as long as you select the right elements:

d.update(dict(s.stripped_strings for s in e.select('dl')))
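
To see why the choice of selector matters, here is a minimal sketch on a trimmed-down, hypothetical copy of the markup from the question (two categories, with 'html.parser' used only to avoid the lxml dependency). Each dl yields a label and a rating string, which dict() can pair up, while each span.sr-only yields only one string:

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical copy of the markup from the question (two categories only).
html = """
<div class="review-item">
  <dl><dt>Safety</dt><dd><span class="rating-stars"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span></span></dd></dl>
  <dl><dt>Value</dt><dd><span class="rating-stars"><span class="sr-only">4 out of 5 stars</span></span></dd></dl>
</div>
"""
e = BeautifulSoup(html, "html.parser").select_one("div.review-item")

# Each <dl> yields exactly two strings: the label and the rating text.
pairs = [tuple(s.stripped_strings) for s in e.select("dl")]

# Each span.sr-only yields a single string, which dict() cannot use as a key/value pair.
singles = [tuple(s.stripped_strings) for s in e.select("span.rating-stars span.sr-only")]

ratings = dict(pairs)
print(ratings)
```

Note that the values here are still strings such as "5 out of 5 stars", not numbers.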

Given your expected output, I would suggest selecting the strings for the key and the value separately:

...
d.update({s.dt.text:float(s.dd.text.split()[0]) for s in e.select('dl')})

data.append(d)
...

This will update your dict with:

{'Safety': 5.0, 'Technology': 5.0, 'Performance': 5.0, 'Interior': 5.0, 'Comfort': 5.0, 'Reliability': 5.0, 'Value': 5.0}

or, if the ResultSet is empty, with an empty dict (so nothing is added).
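
Both cases can be checked with a small self-contained sketch. The markup below is a simplified, hypothetical stand-in for the review items, and extract_ratings is a helper name introduced only for this illustration:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: one review with detailed ratings, one without.
html = """
<div class="review-item"><a>Great car</a>
  <dl><dt>Safety</dt><dd><span class="rating-stars"><span class="sr-only">5 out of 5 stars</span></span></dd></dl>
  <dl><dt>Comfort</dt><dd><span class="rating-stars"><span class="sr-only">4 out of 5 stars</span></span></dd></dl>
</div>
<div class="review-item"><a>No details</a></div>
"""


def extract_ratings(item):
    """Map each <dt> label to the leading number of its <dd> text (hypothetical helper)."""
    return {s.dt.text: float(s.dd.text.split()[0]) for s in item.select("dl")}


soup = BeautifulSoup(html, "html.parser")
data = []
for e in soup.find_all("div", attrs={"class": "review-item"}):
    d = {"review_title": e.a.text}
    d.update(extract_ratings(e))  # adds nothing when the item has no <dl>
    data.append(d)

print(data)
```

When a list like this is passed to pd.DataFrame, each rating becomes its own column, and reviews without a given rating should end up with NaN in that column.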