org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
I am trying to load XML data into a Hive table.
My XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
First I load the XML data into a managed table, then I use the xpath UDFs to parse the XML and insert the result into my main table. These are the Hive queries I am trying:
create table XmlSample(xmlData string);
load data inpath 'EmployeeDetails.xml' into table XmlSample;
create table xpath_table(id int,genre string,price string);
Insert overwrite table xpath_table select xpath_int(xmlData, '/catalog/book/id/text()'), xpath_string(xmlData, '/catalog/book/genre/text()'), xpath_string(xmlData, '/catalog/book/price/text()') from XmlSample;
But I get this exception:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.Child.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:544)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public int org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(java.lang.String,java.lang.String) on object org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger@37fd3f of class org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger with arguments {<?xml version="1.0" encoding="UTF-8"?>:java.lang.String, /catalog/book/id/text():java.lang.String} of size 2
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1030)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:80)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1006)
... 20 more
Caused by: java.lang.RuntimeException: Invalid expression '/catalog/book/id/text()'
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:74)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.evalNumber(UDFXPathUtil.java:87)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathInteger.evaluate(UDFXPathInteger.java:35)
Can someone tell me how to avoid these exceptions?
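With the default text format, Hive treats every line of the file as a separate row, so the xpath functions are called on fragments such as the XML declaration alone (the row shown in the error) and not on a complete document. Try this:
1. Put each record on a single line (and remove the catalog tag):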
cat EmployeeDetails.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</book>|</book>\n|g' | sed 's/<catalog>//g' | grep -v '^\s*$' | sed '3d' > EmployeeDetails1.xml
2. Create a directory and copy the converted XML file into HDFS:
hadoop fs -mkdir /usr/xml/
hadoop fs -put EmployeeDetails1.xml /usr/xml/EmployeeDetails.xml
3. Create a table in Hive to hold the XML:
create table XmlSample(xmldata string);
4. Load the XML file from HDFS into the Hive XML table:
load data inpath '/usr/xml/EmployeeDetails.xml' into table XmlSample;
5. Create a table in Hive for the data extracted from the XML table:
create table xpath_table(id int,genre string,price string);
6. Insert the data extracted from the XML table into the Hive table:
insert overwrite table xpath_table select xpath_int(xmldata,'book/id'), xpath_string(xmldata,'book/genre'), xpath_string(xmldata,'book/price') from XmlSample;
Note: I only added step 1 and changed the XPath expressions in your step 6. These steps worked for me. Good luck :)
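In case your real file has more than two books: the `sed '3d'` in step 1 drops the third output line, which for this sample happens to be the leftover `</catalog>`. Below is a rough sketch of a variant that strips the XML declaration and both catalog tags by pattern instead. It assumes GNU sed (for `\n` in the replacement) and that `<book>` elements are not nested, and it recreates the sample file from the question only so the snippet runs on its own. Unlike step 1 it does not delete `&` characters, so entity references are kept as they are.

```shell
# Recreate the sample file from the question so the sketch is self-contained.
cat > EmployeeDetails.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
<id>11</id>
<genre>Computer</genre>
<price>44</price>
</book>
<book>
<id>44</id>
<genre>Fantasy</genre>
<price>5</price>
</book>
</catalog>
EOF

# Join all lines into one, remove the XML declaration and the <catalog>
# wrapper, then break after every </book> so each record is on its own line.
# The second sed trims leading spaces and drops lines that end up empty.
tr -d '\r' < EmployeeDetails.xml | tr '\n' ' ' \
  | sed -e 's|<?xml[^>]*?>||' -e 's|</\{0,1\}catalog>||g' -e 's|</book>|</book>\n|g' \
  | sed -e 's/^ *//' -e '/^$/d' > EmployeeDetails1.xml
```

Each line of EmployeeDetails1.xml should then be one complete `<book>` element, which is what the relative paths in step 6 (`book/id` and so on) expect.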