HBase: put multiple versions of a row at the same time using JSON
According to the Cloudera HBase REST API docs, this is the XML structure for PUTting multiple rows at the same time:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<CellSet>
  <Row key="cm93NQo=">
    <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
    <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
  </Row>
  <Row key="cm93NQo=">
    <Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell>
  </Row>
</CellSet>
Question: how can I do the same with JSON?
What I have tried so far:
- With the CellSet key, which fails with the following error:
Error 500 Unrecognized field "CellSet" (Class org.apache.hadoop.hbase.rest.model.CellSetModel), not marked as ignorable
{
  "CellSet": {
    "Row": [
      {
        "key": "cm93NQo=",
        "Cell": [
          {
            "column": "Y2Y6ZQo=",
            "$": "dmFsdWU1Cg=="
          },
          {
            "column": "Y2Y6ZQo=",
            "$": "dmFsdWU1Cg=="
          }
        ]
      },
      {
        "key": "cm93NQo=",
        "Cell": [
          {
            "column": "Y2Y6ZQo=",
            "$": "dmFsdWU1Cg=="
          }
        ]
      }
    ]
  }
}
- Without the CellSet key, which returns no error but stores only one version per row:
{
  "Row": [
    {
      "key": "cm93NQo=",
      "Cell": [
        {
          "column": "Y2Y6ZQo=",
          "$": "dmFsdWU1Cg=="
        },
        {
          "column": "Y2Y6ZQo=",
          "$": "dmFsdWU1Cg=="
        }
      ]
    },
    {
      "key": "cm93NQo=",
      "Cell": [
        {
          "column": "Y2Y6ZQo=",
          "$": "dmFsdWU1Cg=="
        }
      ]
    }
  ]
}
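Of course, multiple versions of a row cannot be inserted if they all have the same timestamp. In your example, the data is identified only by row key and column. I don't use Cloudera and have never used the HBase REST API, but according to the source code on GitHub, CellModel allows a cell timestamp to be set. So I suggest adding it to your request:
{
  "Row": [
    {
      "key": "myRowKey",
      "Cell": [
        {
          "column": "myColumn",
          "$": "value1",
          "timestamp": 1473379200
        },
        {
          "column": "myColumn",
          "$": "value2",
          "timestamp": 1470000000
        }
      ]
    }
  ]
}
Also, your example contains two rows with the same key; please check that the data is correct.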
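To make the suggestion concrete, here is a minimal Python sketch that builds such a request body. It is only an illustration under several assumptions: the helper names `b64` and `build_versions_payload` are made up for this example, the row key `row5` and column `cf:e` are placeholders, and the timestamps are written in epoch milliseconds (the usual HBase convention) rather than the second-resolution values shown above. As in the question's payloads, the key, column, and value are Base64-encoded.

```python
import base64
import json


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string, matching the encoding used in the question's payloads."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_versions_payload(row_key, column, versions):
    """Build one row whose cells share a column but carry different timestamps.

    `versions` is an iterable of (value, timestamp) pairs; the helper itself
    is hypothetical and not part of any HBase client library.
    """
    cells = [
        {"column": b64(column), "timestamp": ts, "$": b64(value)}
        for value, ts in versions
    ]
    return {"Row": [{"key": b64(row_key), "Cell": cells}]}


# Two versions of the same cell, distinguished only by timestamp
# (millisecond values are an assumption, see the note above).
payload = build_versions_payload(
    "row5",
    "cf:e",
    [("value1", 1473379200000), ("value2", 1470000000000)],
)
print(json.dumps(payload, indent=2))
```

The printed JSON has no top-level CellSet key, in line with the variant from the question that was accepted without error. Sending it to the REST server is left out here. Whether both versions are actually kept may also depend on how many versions the column family is configured to retain.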