'pool_name' 在 CREATE TABLE 语句中是什么意思?

What does 'pool_name' mean in CREATE TABLE-statement?

CREATE TABLE-statement 末尾的 Impala 中,您可以按照我的理解设置复制因子:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
    ...
    [CACHED IN 'pool_name' [WITH REPLICATION = integer] | UNCACHED]

总之,我有点不明白pool_name指的是什么。这是HDFS中存放数据的路径吗?

不完全是,它实际上指的是使用hdfs cacheadmin -addPool...命令定义的HDFS池,参见hdfs command guide。反过来,一个池确实包含一堆 缓存指令 ,它们引用要缓存的 hdfs 路径。来自 apache 文档:

A cache pool is an administrative entity used to manage groups of cache directives. Cache pools have UNIX-like permissions, which restrict which users and groups have access to the pool. Write permissions allow users to add and remove cache directives to the pool. Read permissions allow users to list the cache directives in a pool, as well as additional metadata. Execute permissions are unused.

Cache pools are also used for resource management. Pools can enforce a maximum limit, which restricts the number of bytes that can be cached in aggregate by directives in the pool. Normally, the sum of the pool limits will approximately equal the amount of aggregate memory reserved for HDFS caching on the cluster. Cache pools also track a number of statistics to help cluster users determine what is and should be cached.

Pools also can enforce a maximum time-to-live. This restricts the maximum expiration time of directives being added to the pool.

有关如何在 Impala 中使用此 HDFS 功能的详细信息,请参阅 Impala Guide