HBase, Region Servers, Storefile Size, Indexes

Question

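Do you use compression for index tables in HBase? If so, what type of compression do you use?

I have noticed that my index tables are very large and growing every day. After adding new storage, they became even larger.

For example, I have table A, with a size of 108.3 G.

In /apps/hbase/data/data/default, the index table size is 380.0 G,

and in /apps/hbase/data/archive/data/default, the index table size is 1.2 T.

Can you tell me how to deal with the size of the index tables?

Why is the archived data on HDFS so large? /apps/hbase/data/archive/data/default

Is it possible to manage the size of the archive directory on HDFS in some way? The archive takes up more than 2/3 of my HDFS space.

I have also noticed that three of my tables have more than a hundred 'split regions', while the other tables have none. Do you know what could be the reason?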

Answer 1

Yes, I use Snappy, like this:

 create 't1', { NAME => 'cf1', COMPRESSION => 'SNAPPY' }
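For a table that already exists, as in the question, the codec can be changed with `alter`; existing store files are only rewritten with the new codec when they are compacted, so a major compaction usually follows. The sketch below only prints the HBase shell statements so they can be reviewed first. The table name `index_table` and the helper name `print_enable_compression` are hypothetical, and `cf1` is taken from the example above.

```shell
#!/usr/bin/env bash
# Sketch: print the HBase shell statements that switch an existing
# column family to a given codec and then rewrite its store files.
# 'index_table' is a hypothetical table name used for illustration.

print_enable_compression() {
  table="$1"
  family="$2"
  codec="$3"
  # Change the codec on the column family descriptor.
  printf "alter '%s', { NAME => '%s', COMPRESSION => '%s' }\n" "$table" "$family" "$codec"
  # Existing files keep the old encoding until they are compacted.
  printf "major_compact '%s'\n" "$table"
}

print_enable_compression index_table cf1 SNAPPY
```

To apply the statements rather than just print them, the output can be piped into `hbase shell -n`.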

Compression support Check

Use CompressionTest to verify snappy support is enabled and the libs can be loaded ON ALL NODES of your cluster:

$ hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase snappy
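Since the check has to pass on every node, one way to organize it is to generate one command per host. This is a minimal sketch that only prints the commands; the host names are hypothetical, SSH access to each node is assumed, and the `hdfs://host/path/to/hbase` placeholder is kept exactly as in the command above.

```shell
#!/usr/bin/env bash
# Sketch: print one CompressionTest invocation per cluster node.
# Host names are hypothetical; the HDFS path is the same placeholder
# used in the answer and must be replaced with a real path.

print_compression_checks() {
  codec="$1"
  shift
  for host in "$@"; do
    printf 'ssh %s hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://host/path/to/hbase %s\n' \
      "$host" "$codec"
  done
}

print_compression_checks snappy rs1.example.com rs2.example.com
```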

Compression will help with most of the problems you describe above.

I have also noticed that three of my tables have more than a hundred 'split regions', while the other tables have none. Do you know what could be the reason?

Make sure the table is pre-split across a limited range of keys, for example 0-9.
Run a compaction on the table's regions.
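The pre-splitting advice can be sketched as follows, assuming row keys begin with a digit from 0 to 9 (an illustrative assumption, not something stated in the question). Nine split points `'1'` through `'9'` give ten initial regions. The helper names are hypothetical, `t1` and `cf1` come from the earlier example, and the script only prints the `create` statement.

```shell
#!/usr/bin/env bash
# Sketch: print an HBase shell 'create' statement for a table that is
# pre-split on the key prefixes 0-9 (assumed key layout).

# Build the split-point list '1', '2', ..., '9'.
build_splits() {
  splits=""
  for d in 1 2 3 4 5 6 7 8 9; do
    splits="${splits}'${d}', "
  done
  # Strip the trailing ", " left by the loop.
  printf '%s' "${splits%, }"
}

print_presplit_create() {
  table="$1"
  family="$2"
  printf "create '%s', { NAME => '%s', COMPRESSION => 'SNAPPY' }, SPLITS => [%s]\n" \
    "$table" "$family" "$(build_splits)"
}

print_presplit_create t1 cf1
```

Printing the statement instead of executing it keeps the sketch safe to run anywhere; piping the output into `hbase shell -n` would apply it.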

Answer 2

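In my staging environment, I found that the large amount of data in /apps/hbase/data/archive/ was caused by daily HBase snapshots taken by a cron job.

So I am now going to rewrite the script to keep only one or two snapshots per table.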
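A retention rule like that could look roughly like the sketch below. Files in the archive directory stay on HDFS for as long as a snapshot still references them, so deleting old snapshots is what lets HBase clean them up. The function name and the date-suffixed snapshot names are hypothetical, and the names are assumed to be passed oldest first (for example, taken from `list_snapshots` output and sorted). The script prints `delete_snapshot` statements rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: keep the newest N snapshots and print delete statements for
# the older ones. Snapshot names must be given oldest first.

print_snapshot_deletes() {
  keep="$1"
  shift
  # Number of leading (oldest) snapshots to delete; may be <= 0.
  to_delete=$(( $# - keep ))
  i=0
  for name in "$@"; do
    if [ "$i" -lt "$to_delete" ]; then
      printf "delete_snapshot '%s'\n" "$name"
    fi
    i=$(( i + 1 ))
  done
}

# Hypothetical snapshot names with a sortable date suffix.
print_snapshot_deletes 2 tableA-20240101 tableA-20240102 tableA-20240103 tableA-20240104
```

When fewer snapshots exist than the number to keep, nothing is printed, so the cron job can run the same script every day.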
