Get node values along with parent attribute
I have an XML file shaped like this:
```xml
<dataDscr>
  <var ID="V335" name="question1" files="F1" dcml="0" intrvl="discrete">
    <location width="1"/>
    <labl>
      question 1 label
    </labl>
    <qstn>
      <qstnLit>
        question 1 literal question
      </qstnLit>
      <ivuInstr>
        question 1 interviewer instructions
      </ivuInstr>
    </qstn>
  </var>
  <var ID="V335" name="question2" files="F1" dcml="0" intrvl="discrete">
    <location width="1"/>
    <labl>
      question 2 label
    </labl>
    <qstn>
      <preQTxt>
        question 2 pre question text
      </preQTxt>
      <qstnLit>
        question 2 literal question
      </qstnLit>
      <ivuInstr>
        question 2 interviewer instructions
      </ivuInstr>
    </qstn>
  </var>
  <var ID="V335" name="question3" files="F1" dcml="0" intrvl="discrete">
    <location width="1"/>
    <labl>
      question 3 label
    </labl>
    <qstn>
      <preQTxt>
        question 3 pre question text
      </preQTxt>
      <qstnLit>
        question 3 literal question
      </qstnLit>
    </qstn>
  </var>
</dataDscr>
```
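I want to collect the values of all `<qstn>` children, together with the `name` attribute of the parent `<var>` tag (e.g. "question1"). Note that `<qstn>` has a varying number of children: question1, for example, has two children, `<qstnLit>` and `<ivuInstr>`, while question2 has every child that `<qstn>` can have.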
I would like the final result to look like this:
```
# name      | preQTxt | qstnLit | ivuInstr
# ------------------------------------------
# question1 | ...     | ...     | ...
# question2 | ...     | ...     | ...
# question3 | ...     | ...     | ...
```
Thanks!
This should work for your case:
```r
library(tidyverse)
library(xml2)

doc <- read_xml("data.xml")

# get all var elements
vars <- xml_find_all(doc, "//var")

# extract from each "var" element the children of its "qstn" element,
# then put the tag names and the enclosed (trimmed) text in two columns
df_long <- do.call(rbind, lapply(vars, function(x) {
  lbl  <- xml_attr(x, "name")
  tags <- xml_find_all(x, "qstn/*")
  data.frame(name = lbl,
             col  = xml_name(tags),
             txt  = trimws(xml_text(tags)),
             stringsAsFactors = FALSE)
}))

# spread the data frame to wide format: one row per name, one column per tag
df <- df_long %>% pivot_wider(id_cols = name, names_from = col, values_from = txt)
```
Output:
```
# A tibble: 3 x 4
  name      qstnLit                     ivuInstr                            preQTxt
  <chr>     <chr>                       <chr>                               <chr>
1 question1 question 1 literal question question 1 interviewer instructions NA
2 question2 question 2 literal question question 2 interviewer instructions question 2 pre question text
3 question3 question 3 literal question NA                                  question 3 pre question text
```
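Here, `pivot_wider` handles the varying number of columns: it creates one column per distinct tag name and fills in `NA` wherever a `var` element lacks that child element.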
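By default `pivot_wider` orders the new columns by first appearance (`qstnLit`, `ivuInstr`, `preQTxt`), not in the order requested in the question. Below is a minimal, self-contained sketch that reproduces the result from an inline, abridged copy of the XML (only the `name` attribute and the `<qstn>` subtrees, which is all the extraction reads) and then reorders the columns by name. The inline string stands in for `data.xml`, and the explicit column vector assumes these four tags are the only ones you need.

```r
library(xml2)
library(tidyr)

# Abridged inline copy of the XML, standing in for "data.xml" (assumption)
xml_txt <- '<dataDscr>
  <var name="question1">
    <qstn>
      <qstnLit>question 1 literal question</qstnLit>
      <ivuInstr>question 1 interviewer instructions</ivuInstr>
    </qstn>
  </var>
  <var name="question2">
    <qstn>
      <preQTxt>question 2 pre question text</preQTxt>
      <qstnLit>question 2 literal question</qstnLit>
      <ivuInstr>question 2 interviewer instructions</ivuInstr>
    </qstn>
  </var>
  <var name="question3">
    <qstn>
      <preQTxt>question 3 pre question text</preQTxt>
      <qstnLit>question 3 literal question</qstnLit>
    </qstn>
  </var>
</dataDscr>'

doc  <- read_xml(xml_txt)   # a string containing "<" is parsed as literal XML
vars <- xml_find_all(doc, "//var")

# long format: one row per <qstn> child, tagged with the parent's name
df_long <- do.call(rbind, lapply(vars, function(x) {
  tags <- xml_find_all(x, "qstn/*")
  data.frame(name = xml_attr(x, "name"),
             col  = xml_name(tags),
             txt  = trimws(xml_text(tags)),
             stringsAsFactors = FALSE)
}))

# wide format, then select columns by name in the requested order
df <- pivot_wider(df_long, id_cols = name, names_from = col, values_from = txt)
df <- df[c("name", "preQTxt", "qstnLit", "ivuInstr")]

print(df)
```

Selecting by name fails if one of the listed tags never occurs anywhere in the file, because `pivot_wider` only creates columns for tag names it actually encounters.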