Trying to scrape texts with the same divs and no other info

Question

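This HTML has three divs that share the same class name, accounts-table__count, but each one holds a different kind of information.

I am trying to get the post count and the follower count from this page. Is there a way to get the text using CSS selectors?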

<div class='directory__card__extra'>
    <div class='accounts-table__count'>
        629
        <small>posts</small>
    </div>
    <div class='accounts-table__count'>
        72
        <small>followers</small>
    </div>
    <div class='accounts-table__count'>
        <time class='time-ago' datetime='2021-05-18' title='May 18, 2021'>May 18, 2021</time>
        <small>last active</small>
    </div>
</div>

My code:

    def parse(self, response):
        for users in response.css('div.directory__card'):
            yield {
                'id': users.css('span::text').get().replace('@','').replace('.','-'),
                'name': users.css('strong.p-name::text').get(),
                'posts': '',      # this is the post count
                'followers': '',  # this is the follower count
                'description': users.css('p::text').get(),
                'fediverse': users.css('span::text').get(),
                'link': users.css('a.directory__card__bar__name').attrib['href'],
                'image': users.css('img.u-photo').attrib['src'],
                'bg-image': users.css('img').attrib['src'],

            }
        for nextpage in response.css('span.next'):
            next_page = nextpage.css('a').attrib.get('href')
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)

Answer 1

For example, iterate over the cards, get the text values for each card, and filter out the empty ones.

raw_data = response.css(".directory__card")[0].css(".accounts-table__count::text").getall()
values = list(filter(lambda s: s != "", map(lambda s: s.strip(), raw_data)))

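Some of the values returned by the .accounts-table__count::text CSS selector are empty (whitespace only), because some div elements with this class have no text of their own; the text sits inside other HTML elements nested in them, such as time or small.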
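
To make the filtering step concrete, here is a minimal sketch that uses only plain Python. The helper name extract_counts and the sample list are hypothetical; the sample is an assumption about what .getall() would return for one card of the HTML in the question, including the whitespace-only text nodes. It also assumes the first non-empty value is the post count and the second is the follower count, as in the markup shown.

```python
def extract_counts(raw_texts):
    """Strip each text node and keep the non-empty ones, in document order."""
    values = [text.strip() for text in raw_texts if text.strip()]
    # Assumption: posts come first and followers second, as in the question's HTML.
    posts = values[0] if len(values) > 0 else None
    followers = values[1] if len(values) > 1 else None
    return {"posts": posts, "followers": followers}


# Hypothetical sample of the text nodes for one card (assumed, not captured output).
sample = [
    "\n        629\n        ", "\n    ",       # first div: number, then whitespace
    "\n        72\n        ", "\n    ",        # second div: number, then whitespace
    "\n        ", "\n        ", "\n    ",      # third div: whitespace around <time> and <small>
]
counts = extract_counts(sample)
print(counts)
```

Inside the spider's parse loop this could be called as extract_counts(users.css('.accounts-table__count::text').getall()), with the result used to fill the 'posts' and 'followers' keys. This presumes that the directory__card__extra block sits inside each div.directory__card, which the question does not show.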

scrapy