rvest - remove tags and their content from an HTML string
Suppose I have the following text:
x <- "<p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>covr</code>. Furthermore, the results from <code>testthat</code> should be saved in the JUnit XML format and the results from <code>covr</code> should be saved in the Cobertura format.</p>\n\n<p>The following code does the trick (when <code>getwd()</code> is the root of the package):</p>\n\n<pre><code>options(\"testthat.output_file\" = \"test-results.xml\")\ndevtools::test(reporter = testthat::JunitReporter$new())\n\ncov <- covr::package_coverage()\ncovr::to_cobertura(cov, \"coverage.xml\")\n</code></pre>\n\n<p>However, the tests are executed <em>twice</em>. Once with <code>devtools::test</code> and once with <code>covr::package_coverage</code>. </p>\n\n<p>My understanding is that <code>covr::package_coverage</code> executes the tests, but it does not produce <code>test-results.xml</code>.</p>\n\n<p>As the title suggests, I would like get both <code>test-results.xml</code> and <code>coverage.xml</code> with a single execution of the test suite.</p>\n"
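**Problem:**
I need to remove all <code></code> tags and their content, whether they stand alone or are nested inside another tag.
I tried the following, but as you can see, the tags are still there: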
library(magrittr)  # provides the %>% pipe
content <- xml2::read_html(x) %>%
  rvest::html_nodes(css = ":not(code)")
print(content)
This is the result I get; the <code> tags are still present:
{xml_nodeset (8)}
[1] <body>\n<p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>cov ...
[2] <p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>covr</code> ...
[3] <p>The following code does the trick (when <code>getwd()</code> is the root of the package):</p>
[4] <pre><code>options("testthat.output_file" = "test-results.xml")\ndevtools::test(reporter = testthat::JunitReporter$new ...
[5] <p>However, the tests are executed <em>twice</em>. Once with <code>devtools::test</code> and once with <code>covr::pac ...
[6] <em>twice</em>
[7] <p>My understanding is that <code>covr::package_coverage</code> executes the tests, but it does not produce <code>test ...
[8] <p>As the title suggests, I would like get both <code>test-results.xml</code> and <code>coverage.xml</code> with a sin ...
The solution is as follows:
content <- xml2::read_html(x)
toRemove <- content %>% rvest::html_nodes(css = "code")
# xml_remove() modifies `content` in place, so no reassignment is needed
xml2::xml_remove(toRemove)
After that, content no longer contains any code tags or their content, and no string manipulation was needed.
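To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. It assumes the xml2 and rvest packages are installed, and it uses a shortened, hypothetical stand-in (`html`) for the `x` string from the question. The selector `:not(code)` only chooses which elements are returned; a matched `<p>` still carries its `<code>` children, which is why the tags appeared in the printed node set. Removing the matched `code` nodes changes the document itself.

```r
library(xml2)
library(rvest)

# Hypothetical, shortened stand-in for the `x` string in the question.
html <- "<p>Run tests with <code>testthat</code> and <em>then</em> report.</p><pre><code>devtools::test()</code></pre>"

doc <- read_html(html)

# ":not(code)" selects every element that is not <code>, but the selected
# parents (<body>, <p>, <pre>) still serialize with their <code> children.
not_code <- html_nodes(doc, css = ":not(code)")
still_has_code <- any(grepl("<code>", as.character(not_code), fixed = TRUE))

# Select the <code> nodes themselves and remove them.
# xml_remove() edits `doc` in place; its return value can be ignored.
code_nodes <- html_nodes(doc, css = "code")
xml_remove(code_nodes)

# Check the result: no <code> nodes should be left in the document.
remaining <- html_nodes(doc, css = "code")
text_only <- html_text(doc)

print(still_has_code)
print(length(remaining))
print(text_only)
```

Because xml2 documents have reference semantics, any node set taken from `doc` before the removal also reflects the change afterwards. If the original document is still needed, parsing the string a second time with `read_html()` is probably the simplest way to keep an untouched copy.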