Writing contents of S3 to CSV
I'm writing a script to pull my S3 data down to my local machine. The data I receive is usually Hive-partitioned. I'm getting a No such file or directory error even though the file definitely exists. Can someone explain what I'm doing wrong and how I should approach this differently? This is the section of code the error refers to:
bucket = conn.get_bucket(bucket_name)

for sub in bucket.list(prefix = 'some_prefix'):
    matched = re.search(re.compile(read_key_pattern), sub.name)
    if matched:
        with open(sub.name, 'rb') as fin:
            reader = csv.reader(fin, delimiter = '\x01')
            contents = [line for line in reader]
        with open('output.csv', 'wb') as fout:
            writer = csv.writer(fout, quotechar = '', quoting = csv.QUOTE_NONE, escapechar = '\\')
            writer.writerows(contents)
IOError: [Errno 2] No such file or directory: 'my_prefix/54c91e35-4dd0-4da6-a7b7-283dff0f4483-000000'
The file exists, and it is the correct folder and file that I'm trying to retrieve.
As @roganjosh said, it looks like you haven't actually downloaded the file after testing the name match. I've added comments below to show how you can handle the file in-memory in Python 2:
from io import StringIO  # alternatively use BytesIO
import contextlib

bucket = conn.get_bucket(bucket_name)

# use re.compile outside of the for loop
# it has slightly better performance characteristics
matcher = re.compile(read_key_pattern)

for sub in bucket.list(prefix = 'some_prefix'):
    # bucket.list returns an iterator over s3.Key objects
    # so we can use `sub` directly as the Key object
    matched = matcher.search(sub.name)
    if matched:
        # download the file to an in-memory buffer
        with contextlib.closing(StringIO()) as fp:
            sub.get_contents_to_file(fp)
            fp.seek(0)
            # read straight from the memory buffer
            reader = csv.reader(fp, delimiter = '\x01')
            contents = [line for line in reader]
        with open('output.csv', 'wb') as fout:
            writer = csv.writer(fout, quotechar = '', quoting = csv.QUOTE_NONE, escapechar = '\\')
            writer.writerows(contents)
For Python 3, you'll need to change the with statement as discussed in the comments of the answer for this question.
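A minimal sketch of what that Python 3 variant might look like, assuming the same boto bucket object and read_key_pattern and UTF-8 encoded data: download into a BytesIO buffer, decode it for csv.reader, and open the output file in text mode with newline=''.

from io import BytesIO, StringIO
import csv
import re

matcher = re.compile(read_key_pattern)

for sub in bucket.list(prefix = 'some_prefix'):
    if matcher.search(sub.name):
        # boto writes raw bytes, so buffer them in BytesIO first
        buf = BytesIO()
        sub.get_contents_to_file(buf)
        # decode the bytes to text so csv.reader can consume them
        text = StringIO(buf.getvalue().decode('utf-8'))
        reader = csv.reader(text, delimiter = '\x01')
        contents = [line for line in reader]

        # in Python 3, csv.writer expects a text-mode file opened with newline='';
        # the quotechar = '' setting from the Python 2 version may not be accepted
        # by every Python 3 release, so quoting is simply disabled here
        with open('output.csv', 'w', newline='') as fout:
            writer = csv.writer(fout, quoting = csv.QUOTE_NONE, escapechar = '\\')
            writer.writerows(contents)

If holding the whole object in memory is a concern, boto 2's sub.get_contents_to_filename(local_path) would download the key to disk instead, and the local file could then be opened normally.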