Web scraping HTML <time class=""></time>
I want to get the date, but I only get None.
<div class="article-cover">
<div class="article-cover-img">
<img src="https://api.hvg.hu/Img/da658e97-86c0-40f3-acd3-b0a850f32c30/e6f183bb-25a9-468e-ae30-6d98952ffc00.jpg" alt="Will Smith lemondott amerikai filmakadémiai tagságáról" width="800" height="370">
</div>
<div class="article-cover-text">
<div class="article-info byline">
<div class="info">
<time class="article-datetime" datetime="2022-04-02T08:13:00.0000000+02:00">2022. április. 02. 08:13</time>
<time class="lastdate" datetime="2022-04-02T08:16:17.0000000+02:00">2022. április. 02. 08:16</time>
<a href="/kultura" class="uppercase">Kult</a>
</div>
</div>
<div class="article-title article-title">
<h1>Will Smith lemondott amerikai filmakadémiai tagságáról</h1>
</div>
</div>
<button class="articlesavebutton bookmark large" data-id="994e8ff1-f28e-4153-9f6c-87283f187af7" data-event-category="Myhvg_article_save" data-event-action="ClickOnLink" data-event-label="Article_save_MTI"></button>
</div>
I want to get back: 2022. április. 02. 08:13
My code is:
article_soup = BeautifulSoup(article.content, "html.parser")
d=article_soup.find('time', class\_='article-datetime')
soup = BeautifulSoup(html, "html.parser")
adate = soup.findAll("time", {"class": "article-datetime"})
print(adate[0].get_text())
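The main issue is the typo in class\_='article-datetime': the keyword argument has to be written class_='article-datetime', without the backslash.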
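Why BeautifulSoup uses `class_` at all: `class` is a reserved word in Python, so it cannot be passed as a keyword argument, and the backslash form `class\_` does not parse either. This stdlib-only sketch checks both points by compiling small snippets; `f` is a placeholder call target that is never executed, and `parses` is a hypothetical helper name.

```python
import keyword


def parses(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<snippet>", "eval")
    except SyntaxError:
        return False
    return True


# `class` is a reserved keyword, so it cannot be used as an argument name.
print(keyword.iskeyword("class"))
print(parses("f('time', class='article-datetime')"))
# The spelling from the question, with the backslash, does not compile either.
print(parses("f('time', class\\_='article-datetime')"))
# The spelling BeautifulSoup expects compiles fine.
print(parses("f('time', class_='article-datetime')"))
```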
To get the text, simply use the get_text() method:
article_soup.find('time', class_='article-datetime').get_text()
or
article_soup.find('time', class_='article-datetime').text
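The `None` in the question is what `find()` returns when no matching tag exists, and calling `.get_text()` on it then raises `AttributeError`. A minimal sketch that guards against that case and also reads the machine-readable `datetime` attribute; the helper name `article_date` is hypothetical, and it assumes BeautifulSoup 4 (`bs4`) with the built-in `html.parser`.

```python
from typing import Optional, Tuple

from bs4 import BeautifulSoup


def article_date(html: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (visible text, datetime attribute) of <time class="article-datetime">,
    or None when the page has no such tag."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("time", class_="article-datetime")
    if tag is None:  # find() returns None when nothing matches
        return None
    # get_text(strip=True) drops surrounding whitespace; .get() returns None
    # instead of raising KeyError if the attribute is missing.
    return tag.get_text(strip=True), tag.get("datetime")


sample = (
    '<div class="info">'
    '<time class="article-datetime" datetime="2022-04-02T08:13:00.0000000+02:00">'
    "2022. április. 02. 08:13</time>"
    '<time class="lastdate" datetime="2022-04-02T08:16:17.0000000+02:00">'
    "2022. április. 02. 08:16</time>"
    "</div>"
)
print(article_date(sample))
print(article_date("<p>no date here</p>"))
```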
Example
from bs4 import BeautifulSoup

html = '''
<div class="article-cover">
<div class="article-cover-img">
<img src="https://api.hvg.hu/Img/da658e97-86c0-40f3-acd3-b0a850f32c30/e6f183bb-25a9-468e-ae30-6d98952ffc00.jpg" alt="Will Smith lemondott amerikai filmakadémiai tagságáról" width="800" height="370">
</div>
<div class="article-cover-text">
<div class="article-info byline">
<div class="info">
<time class="article-datetime" datetime="2022-04-02T08:13:00.0000000+02:00">2022. április. 02. 08:13</time>
<time class="lastdate" datetime="2022-04-02T08:16:17.0000000+02:00">2022. április. 02. 08:16</time>
<a href="/kultura" class="uppercase">Kult</a>
</div>
</div>
<div class="article-title article-title">
<h1>Will Smith lemondott amerikai filmakadémiai tagságáról</h1>
</div>
</div>
<button class="articlesavebutton bookmark large" data-id="994e8ff1-f28e-4153-9f6c-87283f187af7" data-event-category="Myhvg_article_save" data-event-action="ClickOnLink" data-event-label="Article_save_MTI"></button>
</div>'''
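# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
article_soup = BeautifulSoup(html, "html.parser")
print(article_soup.find('time', class_='article-datetime').get_text())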
Output
2022. április. 02. 08:13
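If an actual date value is needed rather than the Hungarian display text, the `datetime` attribute is easier to work with. Its seven fractional-second digits are rejected by `datetime.fromisoformat()` on Python 3.7 to 3.10, which only accept 3 or 6 digits, so this stdlib-only sketch drops the fractional part first (harmless here, since these timestamps are whole seconds). `parse_time_attr` is a hypothetical helper name.

```python
import re
from datetime import datetime


def parse_time_attr(value: str) -> datetime:
    """Parse a value such as '2022-04-02T08:13:00.0000000+02:00'."""
    # Remove the fractional seconds (".0000000") so that fromisoformat()
    # accepts the string on Python 3.7+; sub-second precision is lost.
    trimmed = re.sub(r"\.\d+", "", value, count=1)
    return datetime.fromisoformat(trimmed)


published = parse_time_attr("2022-04-02T08:13:00.0000000+02:00")
print(published.isoformat())                  # 2022-04-02T08:13:00+02:00
print(published.strftime("%Y-%m-%d %H:%M"))   # 2022-04-02 08:13
```

The value passed in would typically come from `article_soup.find('time', class_='article-datetime')['datetime']`.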