How to parse and fetch text from HTML tags in Node.js?

I have some HTML as a text string in Node.js, like this:

var htmlText = `<div class="X7NTVe">
        <a class="tHmfQe" href="/link1">
            <div class="am3QBf">
                <div>
                    <span>
                        <div class="BNeawe deIvCb AP7Wnd">
                            <span dir="rtl">My First Text</span>
                        </div>
                    </span>
                </div>
            </div>
        </a>
        <div class="HBTM6d XS7yGd">
            <a href="/anotherLink1">
                <div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div>
            </a>
        </div>
    </div>
    <div class="x54gtf"></div>
    <div class="X7NTVe">
        <a class="tHmfQe" href="/link2">
            <div class="am3QBf">
                <div>
                    <span>
                        <div class="BNeawe deIvCb AP7Wnd">
                            <span dir="rtl">My Second Text</span>
                        </div>
                    </span>
                </div>
            </div>
        </a>
        <div class="HBTM6d XS7yGd">
            <a href="/anotherLink2">
                <div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div>
            </a>
        </div>
    </div>
    <div class="x54gtf"></div>`

Now I want to fetch the texts from it as an array. In this example, it must return My First Text and My Second Text. How can I do that?

Note: I want to do this in JavaScript, in Node.js.

Method #1

Replace all tags using the regular expression /<[^>]*>/g
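A minimal sketch of the regex approach, assuming a shortened sample of the question's HTML. The names sampleHtml and stripTags are made up for illustration and are not part of any library. Each tag is replaced with a newline rather than an empty string, so the text pieces stay separated and can be split into an array.

```javascript
// Hypothetical sample input, shortened from the question's htmlText.
const sampleHtml = `
<div class="BNeawe deIvCb AP7Wnd"><span dir="rtl">My First Text</span></div>
<div class="BNeawe mAdjQc uEec3 AP7Wnd">&gt;</div>
<div class="BNeawe deIvCb AP7Wnd"><span dir="rtl">My Second Text</span></div>`;

// stripTags is a hypothetical helper name used only in this sketch.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, '\n')      // swap every tag for a newline so text pieces stay apart
    .split('\n')                    // one candidate piece per line
    .map(piece => piece.trim())     // drop indentation and stray whitespace
    .filter(piece => piece.length > 0); // discard the empty pieces left by adjacent tags
}

const texts = stripTags(sampleHtml);
console.log(texts);
```

Note that this keeps every text node, including the undecoded `&gt;` entity from the other div, so the result would still need filtering to get only the two wanted strings.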

Method #2

Parse the HTML with jsdom and access the HTML nodes through the JS document API.

Method #2 is much more flexible.

With cheerio:

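const cheerio = require('cheerio')

// Load the question's HTML string (htmlText) and collect the text of each matching span
let $ = cheerio.load(htmlText)
let strings = $('div[class="BNeawe deIvCb AP7Wnd"]>span[dir]')
              .get().map(span => $(span).text())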