How to unload only 1000 rows from a Redshift table

Question

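I am creating a copy of a production Redshift database at the development level. I know how to unload data from my production instance/cluster to S3 and then copy that data into my development instance/cluster, but only if I unload all of the data at once. What I would like to do instead is copy only about 1000 rows from each of my tables, to cut down on space and on transfer time between my Redshift instances.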

For example:

UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') TO 's3://my-bucket' CREDENTIALS etcetcetc

Is there a way to apply this LIMIT with UNLOAD, or will I have to switch to a bulk-insert-style paradigm?

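EDIT: I am unloading and copying a bunch of tables programmatically, so I don't want to hard-code any key-based limits, in case we add new tables or change table structures, etc.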

Answer 1

Although "LIMIT" is not part of the actual "UNLOAD" command, the Redshift documentation on UNLOAD provides a few alternatives:

Limit Clause

The SELECT query cannot use a LIMIT clause in the outer SELECT. For example, the following UNLOAD statement will fail:
unload ('select * from venue limit 10') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'; 
Instead, use a nested LIMIT clause. For example:
unload ('select * from venue where venueid in 
(select venueid from venue order by venueid desc limit 10)') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
Alternatively, you could populate a table using SELECT…INTO or CREATE TABLE AS using a LIMIT clause, then unload from that table.
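A rough sketch of that last alternative, assuming the statements are generated from Python since the question mentions doing this programmatically. The helper name staged_unload_statements, the sample_ naming scheme for the staging table, and the placeholder credentials string are all hypothetical and do not come from the Redshift documentation. The function only builds the SQL strings; running them against a cluster is left to whichever database client you use.

```python
def staged_unload_statements(table, s3_prefix, auth_clause, limit=1000):
    """Build the SQL for a stage-then-unload of at most `limit` rows.

    `auth_clause` is passed through verbatim (for example a CREDENTIALS
    clause); its contents are an assumption of this sketch.
    """
    # Hypothetical naming scheme: myschema.mytable -> sample_myschema_mytable
    staging = "sample_" + table.replace(".", "_")
    return [
        # 1. Copy a limited number of rows into a temporary table.
        f"CREATE TEMP TABLE {staging} AS SELECT * FROM {table} LIMIT {int(limit)};",
        # 2. Unload the whole staging table, so UNLOAD itself needs no LIMIT.
        f"UNLOAD ('SELECT * FROM {staging}') TO '{s3_prefix}' {auth_clause};",
        # 3. Drop the staging table once the unload has finished.
        f"DROP TABLE {staging};",
    ]


if __name__ == "__main__":
    for statement in staged_unload_statements(
        "myschema.mytable",
        "s3://my-bucket/mytable_",
        "CREDENTIALS '<credentials>'",  # placeholder, not a real value
    ):
        print(statement)
```

Because the row limit lives in the CREATE TABLE AS step, nothing here depends on the table's keys or columns.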

Answer 2

If your table was created with a distribution style (other than the "All" distribution style), there is no need for a limit at all.

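Say you created the table with the "even" distribution style (which is the default distribution style) and you have 4 different child nodes; then unloading will generate a total of 4 files per table in Amazon S3.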

Answer 3

When unloading from a Redshift table, you can apply a LIMIT clause by wrapping the LIMIT query in SELECT * FROM ().

Example for your case:

UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') 
TO 's3://my-bucket' CREDENTIALS etcetcetc

becomes

UNLOAD ('SELECT * FROM 
                       (SELECT * FROM myschema.mytable LIMIT 1000)'
       ) 
TO 's3://my-bucket' CREDENTIALS etcetcetc
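Since the question asks for something that works across many tables without hard-coded keys, here is a minimal Python sketch that generates the wrapped statement for each table in a list. The function names limited_unload and unload_all, the S3 prefix layout, and the placeholder credentials are hypothetical. The subquery alias t is added only as a precaution and is an assumption of this sketch, not part of the answer above. Single quotes inside the query are doubled, since the query is itself passed to UNLOAD as a quoted string.

```python
def limited_unload(table, s3_prefix, auth_clause, limit=1000):
    """Build one UNLOAD statement whose LIMIT sits in a nested SELECT."""
    # The LIMIT is in the inner SELECT, not in the outer one.
    query = f"SELECT * FROM (SELECT * FROM {table} LIMIT {int(limit)}) AS t"
    # Double any single quotes, because the query is a quoted string literal.
    quoted = query.replace("'", "''")
    return f"UNLOAD ('{quoted}') TO '{s3_prefix}' {auth_clause};"


def unload_all(tables, bucket, auth_clause, limit=1000):
    """Build one statement per table, each with its own S3 prefix."""
    # Hypothetical layout: s3://bucket/<schema.table>_
    return [
        limited_unload(table, f"{bucket}/{table}_", auth_clause, limit)
        for table in tables
    ]


if __name__ == "__main__":
    for statement in unload_all(
        ["myschema.mytable", "myschema.othertable"],
        "s3://my-bucket",
        "CREDENTIALS '<credentials>'",  # placeholder, not a real value
    ):
        print(statement)
```

The table list could come from a catalog query instead of a literal, so new tables are picked up without code changes.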

amazon-s3

amazon-redshift