S3 Select CSV Headers
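I am using S3 Select to read a CSV file from an S3 bucket and output it as CSV. The output contains only the data rows, not the headers. How can I get output that includes the headers?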
import boto3

s3 = boto3.client('s3')
r = s3.select_object_content(
    Bucket='demo_bucket',
    Key='demo.csv',
    ExpressionType='SQL',
    Expression="select * from s3object s",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
CSV
Name, Age, Status
Rob, 25, Single
Sam, 26, Married
Output from S3 Select
Rob, 25, Single
Sam, 26, Married
Amazon S3 Select does not output headers.
In your code, you can simply include a print statement that outputs the headers before looping through the results.
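A minimal sketch of that suggestion, assuming the header line is already known to the caller. The function name `select_csv_with_header` and the `header_line` parameter are hypothetical, and `s3` is expected to be a client such as the `boto3.client('s3')` from the question. The payload bytes are joined before decoding because a multi-byte character could be split across two events.

```python
def select_csv_with_header(s3, bucket, key, header_line):
    """Return the S3 Select result as text with header_line prepended.

    header_line is supplied by the caller (hypothetical parameter);
    S3 Select itself does not return it.
    """
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression="select * from s3object s",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    # Only 'Records' events carry row data; other events are skipped.
    chunks = [
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    ]
    # Decode once, after joining, so split multi-byte characters stay intact.
    return header_line + "\n" + b"".join(chunks).decode("utf-8")
```

With the file from the question this could be called as `print(select_csv_with_header(boto3.client('s3'), 'demo_bucket', 'demo.csv', 'Name, Age, Status'), end='')`.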
Change InputSerialization={'CSV': {"FileHeaderInfo": "Use"}}
to InputSerialization={'CSV': {"FileHeaderInfo": "NONE"}}.
It will then print the full content, including the header.
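Explanation:
FileHeaderInfo accepts one of NONE, USE, or IGNORE.
With NONE instead of USE, S3 Select treats the first line of the file as an ordinary record rather than as a header, so it is returned along with the other rows.
Hope this helps.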
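As a rough illustration of the NONE option, the sketch below reads the whole file with `FileHeaderInfo` set to `NONE` and then separates the first returned row (the header) from the rest on the client side. The function name `select_header_and_rows` is hypothetical, `s3` is assumed to be a boto3 S3 client, and `skipinitialspace=True` is only there because the sample file has a space after each comma.

```python
import csv
import io


def select_header_and_rows(s3, bucket, key):
    """Return (header, rows) using FileHeaderInfo NONE (illustrative sketch)."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression="select * from s3object s",
        # NONE: the first line is returned as a normal record.
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        OutputSerialization={"CSV": {}},
    )
    data = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    # Parse the returned CSV text; the first parsed row is the header line.
    rows = list(csv.reader(io.StringIO(data.decode("utf-8")), skipinitialspace=True))
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

Because no header is parsed in this mode, columns in the SQL expression have to be addressed by position (for example `s._1`) rather than by name.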
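Red Boy's solution does not let you use column names in the query; you have to use column indexes instead.
That did not work for me, so my solution is to run a second query that fetches only the headers and concatenate them with the actual query results. This is in JavaScript, but the same approach applies in Python: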
const params = {
    Bucket: bucket,
    Key: "file.csv",
    ExpressionType: 'SQL',
    Expression: `select * from s3object s where s."date" >= '${fromDate}'`,
    InputSerialization: {'CSV': {"FileHeaderInfo": "USE"}},
    OutputSerialization: {'CSV': {}},
};

// s3 select doesn't return the headers, so need to run another query to only get the headers (see '{"FileHeaderInfo": "NONE"}')
const headerParams = {
    Bucket: bucket,
    Key: "file.csv",
    ExpressionType: 'SQL',
    // this will only get the first record of the csv, and since we are not parsing headers, they will be included
    Expression: "select * from s3object s limit 1",
    InputSerialization: {'CSV': {"FileHeaderInfo": "NONE"}},
    OutputSerialization: {'CSV': {}},
};

// concatenate header + data -- getObject is a method that handles the request
return await this.getObject(s3, headerParams) + await this.getObject(s3, params);
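For readers on boto3, the same two-query idea might look roughly like the sketch below. The helper names `run_select` and `select_with_header_row` are hypothetical stand-ins for the `getObject` method above, and `s3` is assumed to be a boto3 S3 client passed in by the caller.

```python
def run_select(s3, bucket, key, expression, header_info):
    """Run one S3 Select query and return its CSV output as text."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": header_info}},
        OutputSerialization={"CSV": {}},
    )
    data = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    return data.decode("utf-8")


def select_with_header_row(s3, bucket, key, expression):
    """Fetch the header line with one query and the data with another."""
    # With NONE the first line is an ordinary record, so LIMIT 1 returns it.
    header = run_select(s3, bucket, key, "select * from s3object s limit 1", "NONE")
    # With USE the header is parsed, so `expression` may refer to column names.
    body = run_select(s3, bucket, key, expression, "USE")
    return header + body
```

A call could look like `select_with_header_row(boto3.client('s3'), 'demo_bucket', 'demo.csv', "select * from s3object s where s.\"Name\" = 'Sam'")`. The cost is one extra request per file, in exchange for keeping column names available in the main query.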