Issues with web scraping in Python: putting the desired tags into a list leaves some list entries empty and others not

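I am new to web scraping and I am using BeautifulSoup for it.

My problem is that when I copy the content I want from one list, which holds some tags, into a second list, the second list ends up with missing values.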

Here is my list from which I am getting values

I want to build a list of the reviews from this list, so I use:

names = []
for item in basic_info:
  for i in item:
    names.append(i.find_all("p", attrs = {"class" : "review-body"}))

The problem is that the output looks like this:

The output of the above code

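So basically, instead of getting the values one after another, I get them at every other position in the list: the first entry is empty, the second has data, the third is empty, the fourth has data, and so on.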

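Note: It would be great to provide all relevant information as text rather than as images.

Assuming you want to extract several pieces of information per review, you should select all the containers and iterate over them to scrape and store structured data: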

data = []

for e in soup.select('div.consumer-review-container'):
    data.append({
        'review-title':e.h3.text,
        'review-type':e.select_one('div.review-type').text,
        'review-section':e.select_one('p.review-body').text
    })
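If some containers might lack one of these elements, select_one returns None and the .text access raises an AttributeError. The following is only a sketch of one way to guard against that; the helper name text_or_none and the trimmed-down markup are assumptions made for illustration and do not come from the original page:

```python
from bs4 import BeautifulSoup


def text_or_none(container, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = container.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Hypothetical markup: the second review has no div.review-type.
html = '''
<div class="consumer-review-container">
  <h3>Complete review</h3>
  <div class="review-type"><strong>Owns this car</strong></div>
  <p class="review-body">Great car.</p>
</div>
<div class="consumer-review-container">
  <h3>Review without a type</h3>
  <p class="review-body">Still fine.</p>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

data = []
for e in soup.select('div.consumer-review-container'):
    data.append({
        'review-title': text_or_none(e, 'h3'),
        'review-type': text_or_none(e, 'div.review-type'),
        'review-section': text_or_none(e, 'p.review-body'),
    })

print(data)
```

With this approach every review still yields one dictionary, and a missing field shows up as None instead of stopping the loop.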

Example

sample = '''
<div class="sds-container consumer-review-container">
    <h3 class="sds-heading--7 title">Excellent Car</h3>
    <div class="review-byline review-section">
      <div>June  8, 2021</div>
      <div>By ValC from Fairfield, CT</div>
        <div class="review-type"><strong>Owns this car</strong></div>
    </div>
    <div class="review-section">
      <p class="review-body">I love that BMW makes a mid size suv that is part electric now. We purchased the x3 edrive. I do mostly local driving during the week so only need to fill up with gas once a month to every six weeks. Excellent car!</p>
    </div>
</div>
<div class="sds-container consumer-review-container">
    <h3 class="sds-heading--7 title">Best Purchase for the Value and Cost</h3>
    <div class="review-byline review-section">
      <div>June  7, 2021</div>
      <div>By Brandon from Peachtree City from Peachtree City, GA</div>
        <div class="review-type"><strong>Owns this car</strong></div>
    </div>
    <div class="review-section">
      <p class="review-body">The BMW X3 is not a crossover that should be ignored. It is more than I expected coming from someone who has owned a 3 Series BMW for the last 6 years. Now I ask myself, why didn't I opt for the X3 much sooner, especially since it is more spacious with better features than my 3 series. Plus the price was only about ,000 more for plenty more space and amenities. You owe it to yourself to test drive one soon!</p>
    </div>
</div>

'''

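from bs4 import BeautifulSoup
# Name the parser explicitly so bs4 does not warn and the result is the same on every machine
soup = BeautifulSoup(sample, 'html.parser')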

data = []

for e in soup.select('div.consumer-review-container'):
    data.append({
        'review-title':e.h3.text,
        'review-type':e.select_one('div.review-type').text,
        'review-section':e.select_one('p.review-body').text
    })
    
print(data)
Output
[{'review-title': 'Excellent Car',
  'review-type': 'Owns this car',
  'review-section': 'I love that BMW makes a mid size suv that is part electric now. We purchased the x3 edrive. I do mostly local driving during the week so only need to fill up with gas once a month to every six weeks. Excellent car!'},
 {'review-title': 'Best Purchase for the Value and Cost',
  'review-type': 'Owns this car',
  'review-section': "The BMW X3 is not a crossover that should be ignored. It is more than I expected coming from someone who has owned a 3 Series BMW for the last 6 years. Now I ask myself, why didn't I opt for the X3 much sooner, especially since it is more spacious with better features than my 3 series. Plus the price was only about ,000 more for plenty more space and amenities. You owe it to yourself to test drive one soon!"}]
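As for why every other entry came out empty in the question: the screenshots are not available, so this is only a guess. In the sample markup both the byline div and the body div carry the class review-section, so if basic_info was built by searching for that class, every review would contribute one div without a p.review-body and one with it. The sketch below reproduces that pattern on simplified, hypothetical markup; the way basic_info is built here is an assumption:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the sample: each review holds two divs
# that both carry the class "review-section".
html = '''
<div class="consumer-review-container">
  <div class="review-byline review-section"><div>June 8, 2021</div></div>
  <div class="review-section"><p class="review-body">First review</p></div>
</div>
<div class="consumer-review-container">
  <div class="review-byline review-section"><div>June 7, 2021</div></div>
  <div class="review-section"><p class="review-body">Second review</p></div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Assumption: basic_info in the question was built roughly like this.
basic_info = [soup.find_all('div', attrs={'class': 'review-section'})]

names = []
for item in basic_info:
    for i in item:
        names.append(i.find_all('p', attrs={'class': 'review-body'}))

# The byline divs contain no p.review-body, so every other entry is empty.
pattern = [len(found) for found in names]
print(pattern)

# Selecting the paragraphs directly skips the byline divs entirely.
reviews = [p.get_text(strip=True) for p in soup.select('p.review-body')]
print(reviews)
```

If only the review text is needed, selecting p.review-body directly is the shortest route; the container-based loop is the better fit when several fields per review should stay together.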