Using Puppeteer to extract text from a span

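I'm using Puppeteer to extract the text of a span by its class name, but nothing is returned. I'm not sure whether the page simply hasn't finished loading in time.

Here is my current code: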

async function Reload() {
    Page.reload()

    Price = await Page.evaluate(() => document.getElementsByClassName("text-robux-lg wait-for-i18n-format-render"))
    console.log(Price)
}
Reload()

HTML

<div class="icon-text-wrapper clearfix icon-robux-price-container">
     <span class="icon-robux-16x16 wait-for-i18n-format-render"></span>
     <span class="text-robux-lg wait-for-i18n-format-render">689</span>
</div>

This happens because the function you pass to page.evaluate() returns a non-serializable value.

From the official Puppeteer documentation:

If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined

So the function you pass to page.evaluate() has to return the text of the span element rather than the span's Element object.

For example:

const puppeteer = require('puppeteer');

const htmlCode = `
  <div class="icon-text-wrapper clearfix icon-robux-price-container">
     <span class="icon-robux-16x16 wait-for-i18n-format-render"></span>
     <span class="text-robux-lg wait-for-i18n-format-render">689</span>
  </div>
`;

(async () => {
  const browser = await puppeteer.launch();

  const page = await browser.newPage();
  await page.setContent(htmlCode);

  const price = await page.evaluate(() => {
    const elements = document.getElementsByClassName('text-robux-lg wait-for-i18n-format-render');
    return Array.from(elements).map(element => element.innerText); // returns an array of strings instead of an array of elements
  })

  console.log(price); // logs the text of every element that has the class above
  console.log(price[0]); // logs the text of the first element that has the class above

  // other actions...
  await browser.close();
})();

NOTE: if you want to load the HTML from another site by its URL, use page.goto() instead of page.setContent().
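
On a live page, the loading concern raised in the question can be handled by awaiting the reload and waiting for the span before reading it. This is only a minimal sketch: it assumes `page` is a Puppeteer Page created elsewhere, and the helper name `reloadAndReadPrice`, the default selector, and the 10 second timeout are illustrative choices, not part of the original code.

```javascript
// Hypothetical helper: reloads the page, waits for the span, then returns its text.
// `page` is assumed to be a Puppeteer Page (e.g. from browser.newPage()).
async function reloadAndReadPrice(page, selector = 'span.text-robux-lg') {
  // Await the reload so navigation has finished before the DOM is queried.
  await page.reload({ waitUntil: 'domcontentloaded' });

  // Wait until the span exists; this rejects if it does not appear in time.
  await page.waitForSelector(selector, { timeout: 10000 });

  // Return a string (serializable), not the Element itself.
  return page.$eval(selector, (el) => el.textContent.trim());
}
```

It would be called as `const price = await reloadAndReadPrice(page);` from inside an async function. Note that the original Reload() calls Page.reload() without await, so the query can run before the new document is ready.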

NOTE: because document.getElementsByClassName() returns a collection, the function passed to page.evaluate() in the code above returns an array of texts, not a single text as it would with document.getElementById().
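
The array-versus-single-value distinction can be wrapped in two small helpers. This is a sketch under assumptions: `page` is a Puppeteer Page, the names `allTexts` and `firstText` are hypothetical, and a CSS selector such as 'span.text-robux-lg' is used instead of a class-name lookup.

```javascript
// Hypothetical helpers; `page` is assumed to be a Puppeteer Page.

// Texts of all matching elements, returned as an array of strings.
async function allTexts(page, selector) {
  // page.$$eval passes an array of the matched elements to the callback.
  return page.$$eval(selector, (els) => els.map((el) => el.textContent));
}

// Text of the first matching element, or null when nothing matches.
async function firstText(page, selector) {
  const texts = await allTexts(page, selector);
  return texts.length > 0 ? texts[0] : null;
}
```

Both helpers return plain strings (or null), so the result survives the trip from the browser context back to Node.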

NOTE: to learn the difference between serializable and non-serializable objects, read the answers to this question on Stack Overflow.