rvest - remove tags and their content from an HTML string

Suppose I have the following text:

x <- "<p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>covr</code>. Furthermore, the results from <code>testthat</code> should be saved in the JUnit XML format and the results from <code>covr</code> should be saved in the Cobertura format.</p>\n\n<p>The following code does the trick (when <code>getwd()</code> is the root of the package):</p>\n\n<pre><code>options(\"testthat.output_file\" = \"test-results.xml\")\ndevtools::test(reporter = testthat::JunitReporter$new())\n\ncov &lt;- covr::package_coverage()\ncovr::to_cobertura(cov, \"coverage.xml\")\n</code></pre>\n\n<p>However, the tests are executed <em>twice</em>. Once with <code>devtools::test</code> and once with <code>covr::package_coverage</code>. </p>\n\n<p>My understanding is that <code>covr::package_coverage</code> executes the tests, but it does not produce <code>test-results.xml</code>.</p>\n\n<p>As the title suggests, I would like get both <code>test-results.xml</code> and <code>coverage.xml</code> with a single execution of the test suite.</p>\n" 

**Question:** I need to remove all <code></code> tags and their content, whether they stand alone or are nested inside another tag.


I tried the following, but as you can see, the tags are still there:

content <- xml2::read_html(x) %>%
    rvest::html_nodes(css = ":not(code)")
print(content)

This is the result I get; the code tags are still present inside the selected nodes:

{xml_nodeset (8)}
[1] <body>\n<p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>cov ...
[2] <p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>covr</code> ...
[3] <p>The following code does the trick (when <code>getwd()</code> is the root of the package):</p>
[4] <pre><code>options("testthat.output_file" = "test-results.xml")\ndevtools::test(reporter = testthat::JunitReporter$new ...
[5] <p>However, the tests are executed <em>twice</em>. Once with <code>devtools::test</code> and once with <code>covr::pac ...
[6] <em>twice</em>
[7] <p>My understanding is that <code>covr::package_coverage</code> executes the tests, but it does not produce <code>test ...
[8] <p>As the title suggests, I would like get both <code>test-results.xml</code> and <code>coverage.xml</code> with a sin ...

The solution is as follows:

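  # assumes %>% is available, e.g. via library(magrittr)
  content <- xml2::read_html(x)

  # select every <code> node, then remove the nodes together with their content
  toRemove <- content %>% rvest::html_nodes(css = "code")
  xml2::xml_remove(toRemove)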

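xml_remove modifies the document in place, so after that, content contains neither the code tags nor their content, and no string manipulation was needed.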
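To check the result, here is a minimal sketch of the same removal followed by reading back what is left. It assumes xml2 and rvest are installed and attached; x_small is a hypothetical, shortened stand-in for the x from the question, and doc is just an illustrative name. The original attempt with ":not(code)" leaves the tags in place because that selector returns every element that is not a code element, and each of those elements still contains its code descendants.

```r
library(xml2)
library(rvest)

# Hypothetical, shortened stand-in for the `x` in the question
x_small <- paste0(
  "<p>Run tests with <code>testthat</code>.</p>",
  "<pre><code>covr::package_coverage()</code></pre>"
)

doc <- read_html(x_small)

# Remove every <code> node, and its content, from the document in place
xml_remove(html_nodes(doc, "code"))

# Count the <code> nodes that remain in the document
length(html_nodes(doc, "code"))

# Plain text of what is left
html_text(doc)

# Or the remaining markup of the body, as an HTML string
as.character(html_node(doc, "body"))
```

Applied to the full x, the same calls should work unchanged. Note that a pre element that held only a code block stays behind as an empty element; if that matters, it can be selected and removed the same way.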