org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}

I am trying to load XML-formatted data into a Hive table.

My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

First I load the XML data into a managed table, then I parse it with the xpath UDF functions and insert the result into my main table. These are the Hive queries I am running:

create table XmlSample(xmlData string);


load data inpath 'EmployeeDetails.xml' into table XmlSample;

create table xpath_table(id int,genre string,price string);

Insert overwrite table xpath_table select xpath_int(xmlData, '/catalog/book/id/text()'), xpath_string(xmlData, '/catalog/book/genre/text()'), xpath_string(xmlData, '/catalog/book/price/text()') from XmlSample;

But I get this exception:

    java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.Child.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
    ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public int org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(java.lang.String,java.lang.String)  on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger@37fd3f of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger with arguments {<?xml version="1.0" encoding="UTF-8"?>:java.lang.String, /catalog/book/id/text():java.lang.String} of size 2
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1030)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
    ... 9 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1006)
    ... 20 more
Caused by: java.lang.RuntimeException: Invalid expression '/catalog/book/id/text()'
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNumber(UDFXPathUtil.java:87)
    at org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(UDFXPathInteger.java:35)

Can someone tell me how to avoid this exception?

Try this:

1. Put each record on a single line (and remove the catalog tags):

cat EmployeeDetails.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</book>|</book>\n|g' | sed 's/<catalog>//g' | grep -v '^\s*$' | sed '3d' > EmployeeDetails1.xml
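
Why this step matters: a Hive table stored as plain text treats every line of the file as one row, which is why the failing row in the stack trace is just the XML declaration rather than a whole document. The one-liner above removes the closing catalog tag by line number, so it fits a file with exactly two books. Below is a minimal sketch of the same flattening that does not depend on the number of records. The function name `flatten_books` and the scratch directory are assumptions made for illustration, and it assumes every record ends with `</book>`.

```shell
#!/bin/sh
# Sketch only: flatten_books is a hypothetical helper, not part of Hive or Hadoop.
# It reads XML on stdin and writes one <book>...</book> record per line.
flatten_books() {
  # 1) join all lines into one, 2) drop the XML declaration and catalog tags,
  # 3) break the line after every </book>, 4) trim leading blanks, drop empty lines.
  tr '\r\n' '  ' |
    sed -e 's|<?xml[^>]*?>||' -e 's|<catalog>||g' -e 's|</catalog>||g' |
    sed 's|</book>|&\
|g' |
    sed -e 's/^[[:space:]]*//' -e '/^$/d'
}

# Demo on the sample catalog from the question, written to a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/EmployeeDetails.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>
EOF

flatten_books < "$workdir/EmployeeDetails.xml" > "$workdir/EmployeeDetails1.xml"

# Each line of the result is what Hive will see as a single row.
cat "$workdir/EmployeeDetails1.xml"
```

On a real file the call would be `flatten_books < EmployeeDetails.xml > EmployeeDetails1.xml`. Because each row then has `<book>` as its root element, the XPath expressions in step 6 start at `book` instead of `/catalog/book`.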

2. Create a directory in HDFS and copy the converted XML file into it:

hadoop fs -mkdir /usr/xml/

hadoop fs -put EmployeeDetails1.xml /usr/xml/EmployeeDetails.xml

3. Create a table in Hive to hold the XML:

create table XmlSample(xmldata string);

4. Load the XML file from HDFS into the Hive XML table:

load data inpath '/usr/xml/EmployeeDetails.xml' into table XmlSample;

5. Create a table in Hive to receive the data extracted from the XML table:

create table xpath_table(id int,genre string,price string);

6. Insert the data extracted from the XML table into that Hive table:

insert overwrite table xpath_table select xpath_int(xmldata,'book/id'), xpath_string(xmldata,'book/genre'), xpath_string(xmldata,'book/price') from XmlSample;

Note: I only added step 1 and changed the XPath expressions in your step 6. These steps worked for me. Good luck :)