Parse XML into R
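I am trying to parse XML into R, but I get this error:
Entity `thinsp` not defined
I found the `&thinsp;` entity, but I don't know how to handle it.
I would greatly appreciate any help. I tried the following: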
file1 <- xmlTreeParse("1496019.xml",useInternalNodes = TRUE)
file2 <- xmlParse("1496019.xml",useInternalNodes = TRUE)
Please find the sample XML below:
<!DOCTYPE om PUBLIC "" "sm.dtd"><servinfo>
<servinfosub>
<title>Circuit Description</title>
<ptxt>The commanded throttle position (TP) is compared to the actual TP.</ptxt>
</servinfosub>
<servinfosub>
<title>DTC Descriptor</title>
<ptxt>This diagnostic procedure supports the following DTC:</ptxt>
<ptxt>DTC P2101 Throttle Actuator Position Performance</ptxt>
</servinfosub>
<servinfosub>
<title>Diagnostic Aids</title>
<list1 type="unordered-bullet">
<item><ptxt>The throttle valve should be open approximately 20 percent. </ptxt></item>
<item><ptxt>If the throttle blade becomes stuck, DTC P1516 and/or P2119 will set. </ptxt></item>
<item>
<important><title>Important</title><ptxt> this function.</ptxt></important>
<ptxt>The scan tool has the ability to operate the throttle control system using Special Functions. </ptxt></item>
<item><ptxt>Inspect for the following conditions:</ptxt></item>
<list2 type="unordered-dash">
<item><ptxt>Use the <object-link object-id="8917"/> Connector Test Adapter Kit for any test that requires probing the PCM harness connector or a component harness connector.</ptxt></item>
<item><ptxt>Poor connections at the PCM or at the component—Inspect the harness connectors for a poor terminal to wire connection. Refer to <cell-link cell-id="62112"/> for the proper procedure.</ptxt></item>
<item><ptxt>For intermittents, refer to <cell-link cell-id="81512"/>.</ptxt></item>
</list2>
</list1>
</servinfosub>
</servinfo>
One way to solve this is to preprocess the document and replace the unknown entities:
library(XML)
txt <- '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><entry>abc&thinsp;</entry>'
xml <- xmlParse(txt, asText = TRUE)
# Error: 1: Entity 'thinsp' not defined
# Strip the undefined entity as a literal string (fixed = TRUE disables regex)
txt <- gsub("&thinsp;", "", txt, fixed = TRUE)
(xml <- xmlParse(txt, asText = TRUE))
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
# <entry>abc</entry>
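Deleting the entity also drops the thin space from the text. If the character should be kept, a variation on the same preprocessing idea is to swap the named entity for its numeric character reference, which any XML parser accepts without a DTD. The following is only a sketch: `entity_map` and `replace_entities` are hypothetical names introduced for illustration, and the only mapping taken from the question is `&thinsp;` (U+2009, written as `&#8201;`).

```r
library(XML)

# Hypothetical lookup table: named entity -> numeric character reference.
# Only &thinsp; comes from the question; add other entities as needed.
entity_map <- c("&thinsp;" = "&#8201;")

# Hypothetical helper: replace each named entity with its numeric reference,
# using fixed-string matching so "&" and ";" are not treated as regex syntax.
replace_entities <- function(txt, map = entity_map) {
  for (name in names(map)) {
    txt <- gsub(name, map[[name]], txt, fixed = TRUE)
  }
  txt
}

txt <- '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><entry>abc&thinsp;</entry>'
xml <- xmlParse(replace_entities(txt), asText = TRUE)

# The text node now ends with the thin-space character instead of losing it
val <- xmlValue(xmlRoot(xml))
```

To apply this to a file such as the `1496019.xml` from the question, the contents could first be read into one string, for example with `paste(readLines("1496019.xml"), collapse = "\n")`, and then passed through the helper before calling `xmlParse(..., asText = TRUE)`.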