Sending many GET requests efficiently in HBase

Question

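I am trying to create a generic method in Java for querying HBase.

I have currently written one that takes 3 parameters:

A Range (to scan the table)
A Column (to return) ... and
A Condition (i.e. browser==Chrome)

So a statement (if written in an SQL-ish language) might look like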

SELECT OS FROM TABLE WHERE BROWSER==CHROME IN RANGE (5 WEEKS AGO -> 2 WEEKS AGO)

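Now, I know I am not using HBase the way it is meant to be used (running ordinary column queries instead of relying on the rowkey, etc.), but for the sake of experimenting I would like to try it, to help me learn.

So the first thing I do is set a Range on the Scan (5 weeks ago to 2 weeks ago). Since the rowkey is the timestamp, this is very efficient.

Then I set a SingleColumnValueFilter (browser = Chrome). Applied after the range restriction, this is still pretty fast.

Then I store all the rowkeys (from the scan) in an array.

For each rowkey (in the array) I perform a GET operation to get the corresponding OS.

I have tried using MultiGet, which sped up the process a lot.

I then tried using normal GET requests, each one spawning a new thread and all running concurrently, which halved the query time! But it is still not fast enough.

I have considered limiting the number of threads that use a single connection to the database, i.e. 100 threads per connection.

Given my circumstances, what is the most efficient way to perform these GETs, or am I totally approaching it incorrectly?

Any help is hugely appreciated.

EDIT (here is my threaded GET attempt)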

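// Shared result list; synchronized because many threads append to it.
List<String> newresults = Collections.synchronizedList(new ArrayList<String>());

// Spawn one thread per rowkey returned by the scan.
for (String rowkey : result) {
    spawnGetThread(rowkey, colname);
}

public void spawnGetThread(final String rk, final String cn) {
    new Thread(new Runnable() {
        public void run() {
            String rt = "";
            Get get = new Get(Bytes.toBytes(rk));
            // addColumn takes byte[] arguments, so the qualifier is converted here.
            get.addColumn(COL_FAM, Bytes.toBytes(cn));
            try {
                // tb is shared by every thread; if it is an HTable, note that
                // HTable is documented as not thread-safe.
                Result getResult = tb.get(get);
                rt = Bytes.toString(getResult.value());
            } catch (IOException e) {
                e.printStackTrace(); // report the failure instead of swallowing it
            }
            newresults.add(rt);
        }
    }).start();
}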
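
The idea of capping the number of threads, mentioned above, could be sketched with a fixed-size pool instead of one thread per rowkey. This is only an illustration, not the asker's code: `BoundedLookups`, `fetchAll` and the `lookup` function are hypothetical names, and `lookup` stands in for whatever performs a single GET. The sketch also waits for every lookup to finish and keeps the results in the same order as the rowkeys, which the thread-per-key version does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Runs one lookup per row key on a fixed-size pool instead of one thread per key. */
final class BoundedLookups {

    /**
     * @param rowKeys    keys to look up
     * @param lookup     hypothetical stand-in for a single GET (row key -> value)
     * @param maxThreads upper bound on lookups running at the same time
     * @return one value per row key, in the same order as rowKeys
     */
    static List<String> fetchAll(List<String> rowKeys,
                                 Function<String, String> lookup,
                                 int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // Queue every lookup; the pool runs at most maxThreads of them at once.
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String rowKey : rowKeys) {
                pending.add(CompletableFuture.supplyAsync(() -> lookup.apply(rowKey), pool));
            }
            // join() blocks until that lookup is done and rethrows its failure, if any.
            List<String> values = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                values.add(future.join());
            }
            return values;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call might look like `BoundedLookups.fetchAll(result, rk -> getOs(rk), 16)`, where `getOs` is an assumed helper wrapping the GET. The pool size is a tuning knob; 16 is an arbitrary starting point, not a recommendation.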

Answer 1

Given my circumstances, what is the most efficient way to perform these GETs, or am I totally approaching it incorrectly?

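I would suggest the following approach.

Get is a good fit if you know in advance which rowkeys you need to access.

In that case you can use a method like the one below; it returns an array of Results.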

/**
 * Fetches the given rows with a single batched (multi-get) call.
 *
 * @param listOfRowKeys row keys to fetch
 * @return array of Results for the requested rows
 * @throws IOException if the batched get fails
 */
private Result[] getDetailRecords(final List<String> listOfRowKeys) throws IOException {
    // HBaseConnection, TBL_DETAIL, saltedRowKey, COLUMN_FAMILY and yourcolumnname
    // are helpers/placeholders from the answerer's own code; substitute your own.
    final HTableInterface table = HBaseConnection.getHTable(TBL_DETAIL);
    final List<Get> listOfGets = new ArrayList<Get>();
    Result[] results = null;
    try {
        // Prepare a batch of Gets, one per row key.
        for (final String rowkey : listOfRowKeys) {
            // System.err.println("get 'yourtablename', '" + saltIndexPrefix + rowkey + "'");
            final Get get = new Get(Bytes.toBytes(saltedRowKey(rowkey)));
            get.addColumn(COLUMN_FAMILY, Bytes.toBytes(yourcolumnname));
            listOfGets.add(get);
        }
        // Send all Gets to the table in one call.
        results = table.get(listOfGets);
    } finally {
        table.close();
    }
    return results;
}
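
If the list of row keys gets very large, one option (my own assumption, not something the method above does) is to split it into fixed-size chunks and issue one batched get per chunk. A minimal sketch of that chunking, using only the JDK: `BatchedLookups`, `partition`, `fetchInBatches` and `batchFetch` are hypothetical names, and `batchFetch` stands in for a call like the `getDetailRecords` method above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Splits a long list of row keys into chunks and fetches chunk by chunk. */
final class BatchedLookups {

    /** Splits rowKeys into consecutive chunks of at most batchSize elements. */
    static List<List<String>> partition(List<String> rowKeys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rowKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rowKeys.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(rowKeys.subList(start, end)));
        }
        return batches;
    }

    /**
     * @param batchFetch hypothetical stand-in for one batched get
     *                   (list of row keys -> list of values)
     * @return the values of all batches, concatenated in batch order
     */
    static List<String> fetchInBatches(List<String> rowKeys, int batchSize,
                                       Function<List<String>, List<String>> batchFetch) {
        List<String> values = new ArrayList<>();
        for (List<String> batch : partition(rowKeys, batchSize)) {
            values.addAll(batchFetch.apply(batch));
        }
        return values;
    }
}
```

Whether chunking helps at all, and what chunk size is sensible, depends on the cluster and row size, so treat the batch size as something to measure rather than a fixed rule.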

Additional note: row filters are always faster than column value filters (the latter perform a full table scan).

Suggested reading: HBase: The Definitive Guide --> Client API: Advanced Features

Tags: java, multithreading, hadoop, hbase