Puppeteer returns an innerText value in uppercase because of CSS

Using the correct selector, an evaluate function, and the innerText property, I am trying to extract the content of a div, for example:

<div class="abc">Interesting stuff</div>

However, a CSS class transforms the content to uppercase: INTERESTING STUFF

Is it normal for the innerText property to return the uppercase text rather than the "original" text? Is there a way to get this "original" text?

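Yes, this is expected: innerText reflects the text as rendered, including the CSS text-transform. To get the original text, you can use one of the following properties: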

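  • innerHTML returns the content as HTML markup (including any child tags), so it takes longer.
  • textContent works with plain text and does not process HTML, so it is faster.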

Examples:

innerHTML:

const text = await page.$eval('.abc', elem => elem.innerHTML); // returns 'Interesting stuff'

textContent:

const text = await page.$eval('.abc', elem => elem.textContent); // returns 'Interesting stuff'
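To see the difference side by side, the behavior can be reproduced on an in-memory page. The following is only a minimal sketch: `compareTextProperties` is a hypothetical helper name, `page` is assumed to be a Puppeteer Page (for example from `await browser.newPage()`), and the `text-transform: uppercase` rule is an assumption about what the `abc` class does in the question.

```javascript
// Hypothetical helper (not part of Puppeteer): loads a tiny test page and
// reads three text-related properties of the same element in one call.
async function compareTextProperties(page) {
  // Assumed markup: the div from the question plus an uppercase rule.
  await page.setContent(`
    <style>.abc { text-transform: uppercase; }</style>
    <div class="abc">Interesting stuff</div>
  `);

  // page.$eval runs the callback in the browser context and sends the
  // serializable result back to Node.
  return page.$eval('.abc', (elem) => ({
    innerText: elem.innerText,     // rendered text, affected by CSS
    textContent: elem.textContent, // raw text of the node
    innerHTML: elem.innerHTML,     // markup of the element's children
  }));
}
```

Called with a real page, innerText should come back uppercased (as described in the question), while textContent and innerHTML should keep the original casing.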

From the API docs:

innerHTML returns an HTML or XML fragment that is generated based on the current contents of the element, so the markup and formatting of the returned fragment is likely not to match the original page markup.

textContent returns the text of every element in the node. In contrast, innerText is aware of styling and won’t return the text of “hidden” elements. Moreover, since innerText takes CSS styles into account, reading the value of innerText triggers a reflow to ensure up-to-date computed styles. (Reflows can be computationally expensive, and thus should be avoided when possible.)
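Because textContent avoids the reflow mentioned above, it is also a reasonable choice when reading many elements at once. A minimal sketch, assuming `page` is a Puppeteer Page; `getOriginalTexts` is a hypothetical helper name, and trimming the whitespace is an added convenience rather than something the question asked for:

```javascript
// Hypothetical helper (not part of Puppeteer): returns the unstyled text of
// every element matching `selector`, in document order.
async function getOriginalTexts(page, selector) {
  // page.$$eval passes an array of all matching elements to the callback,
  // which runs in the browser context.
  return page.$$eval(selector, (elems) =>
    elems.map((elem) => elem.textContent.trim())
  );
}
```

Usage would look like `const texts = await getOriginalTexts(page, '.abc');`, which yields one string per matching element.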