How do I parse HTML and extract the text from tags in Node.js?
I have some HTML as text in Node.js, like this:
var htmlText = `<div class="X7NTVe">
<a class="tHmfQe" href="/link1">
<div class="am3QBf">
<div>
<span>
<div class="BNeawe deIvCb AP7Wnd">
<span dir="rtl">My First Text</span>
</div>
</span>
</div>
</div>
</a>
<div class="HBTM6d XS7yGd">
<a href="/anotherLink1">
<div class="BNeawe mAdjQc uEec3 AP7Wnd">></div>
</a>
</div>
</div>
<div class="x54gtf"></div>
<div class="X7NTVe">
<a class="tHmfQe" href="/link2">
<div class="am3QBf">
<div>
<span>
<div class="BNeawe deIvCb AP7Wnd">
<span dir="rtl">My Second Text</span>
</div>
</span>
</div>
</div>
</a>
<div class="HBTM6d XS7yGd">
<a href="/anotherLink2">
<div class="BNeawe mAdjQc uEec3 AP7Wnd">></div>
</a>
</div>
</div>
<div class="x54gtf"></div>`
Now I want to get the texts as an array. For this example, it must return My First Text and My Second Text. How can I do that?
Note: I want to do this in JavaScript on Node.js.
Method #1
Strip all tags by replacing every match of the regular expression /<[^>]*>/g.
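A minimal sketch of the regex approach, assuming each tag is replaced with a newline so that the remaining text fragments can be split into an array. The helper name extractTextWithRegex and the shortened regexSample input are made up for illustration. Note that the regex keeps every piece of text between tags, including the stray ">" inside the second div, so the result still needs filtering.

```javascript
// Hypothetical helper: strips tags with the regex from Method #1.
// Each tag becomes a newline so the leftover text can be split into pieces.
function extractTextWithRegex(html) {
  return html
    .replace(/<[^>]*>/g, '\n')      // remove every tag
    .split('\n')                    // one candidate string per line
    .map(piece => piece.trim())     // drop surrounding whitespace
    .filter(piece => piece.length > 0); // drop empty pieces
}

// Shortened version of the HTML from the question (an assumption for brevity).
const regexSample = `<div class="BNeawe deIvCb AP7Wnd">
<span dir="rtl">My First Text</span>
</div>
<div class="BNeawe mAdjQc uEec3 AP7Wnd">></div>`;

console.log(extractTextWithRegex(regexSample));
// → [ 'My First Text', '>' ]
```

The regex has no notion of which element the text came from, so it cannot select only the span[dir] texts on its own.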
Method #2
Parse the HTML with jsdom and access the HTML nodes through the DOM document API.
Method #2 is much more flexible.
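With cheerio:
const cheerio = require('cheerio')
let $ = cheerio.load(htmlText)
// Select the span[dir] children of the target divs and collect their text
let strings = $('div[class="BNeawe deIvCb AP7Wnd"]>span[dir]')
.get().map(span => $(span).text())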