Cloud Function Sending CSV to Cloud Storage

I have a Cloud Function that builds a CSV from an API call and then sends that CSV to Cloud Storage.

Here is my code:

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Create a csv and load it to google cloud storage
    new_list = new_list.to_csv('/tmp/temp.csv')
    def upload_blob(bucket_name, source_file_name, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_file(source_file_name)

    message = "Data for CSV file"    # ERROR HERE
    csv = open(new_list, "w")
    csv.write(message)
    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)

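I tried writing the file under /tmp so that I could send it to storage, but without much success. The API call works fine, and I can produce the CSV locally. The upload to Cloud Storage is where I get the error.

Any help is much appreciated!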

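As far as I can tell, you have a few problems here.

First, DataFrame.to_csv does not return anything when a file path or buffer is given as its argument. So this line writes the file and also assigns the value None to new_list.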

new_list = new_list.to_csv('/tmp/temp.csv')

To fix this, just drop the assignment. All you need is the line new_list.to_csv('/tmp/temp.csv').
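
The return value is easy to check in isolation. A minimal sketch, assuming pandas is installed; the sample frame and the scratch path are made up for illustration and are not part of the original function:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the flattened API response.
df = pd.DataFrame({"keyword": ["alpha", "beta"], "rank": [1, 2]})

# Use a scratch directory so the sketch runs anywhere, not only where /tmp exists.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "temp.csv")

    # With a path, to_csv writes the file and returns None.
    result = df.to_csv(csv_path)
    file_was_written = os.path.exists(csv_path)

# Without a path, to_csv returns the CSV content as a string.
csv_text = df.to_csv()

print(result)          # None
print(type(csv_text))  # <class 'str'>
```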

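That first mistake causes the later problems, because you cannot open a file at location None. Instead, pass a string path as the argument to open. Also, if you open the file in mode 'w', the CSV data already written there will be overwritten. What format do you want here? Did you mean to use 'a' to append to the file?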

message = "Data for CSV file"    # ERROR HERE
csv = open(new_list, "w")
csv.write(message)
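
The difference between the two modes can be seen on a scratch file. A small sketch using only the standard library; the file contents are made up for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "temp.csv")

    # Stand-in for the CSV that to_csv already wrote.
    with open(path, "w") as f:
        f.write("keyword,rank\nalpha,1\n")

    # Mode "a" keeps the existing rows and adds to the end.
    with open(path, "a") as f:
        f.write("beta,2\n")
    with open(path) as f:
        after_append = f.read()

    # Mode "w" truncates the file first, so the earlier rows are lost.
    with open(path, "w") as f:
        f.write("Data for CSV file")
    with open(path) as f:
        after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```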

Finally, you are passing a file object where a string is expected, this time as the source_file_name parameter of the upload_blob function.


    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

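I think you can skip opening the file here and just pass the file path as the second argument. Note that blob.upload_from_file expects an open file object, so if you pass a path, upload_blob should call blob.upload_from_filename instead.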
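
A rough sketch of that path-based variant, assuming the google-cloud-storage client library and the 'data-exports' bucket from the question. The helper names upload_csv_file and build_blob_name are hypothetical, and the helper takes the bucket as an argument only so that it can be tried out without credentials:

```python
import datetime


def upload_csv_file(bucket, source_path, destination_blob_name):
    """Upload the file at source_path to the given bucket (hypothetical helper)."""
    blob = bucket.blob(destination_blob_name)
    # upload_from_filename takes a path; upload_from_file takes an open file object.
    blob.upload_from_filename(source_path, content_type="text/csv")


def build_blob_name(day):
    """Build the object name used in the question, e.g. data-2024-01-31.csv."""
    return "data-" + str(day) + ".csv"


def main():
    # Imported here so the helpers above can be exercised without the client library.
    from google.cloud import storage

    bucket = storage.Client().get_bucket("data-exports")
    upload_csv_file(bucket, "/tmp/temp.csv", build_blob_name(datetime.date.today()))
```

Calling main() inside the Cloud Function, after new_list.to_csv('/tmp/temp.csv') has run, would then upload the file that was just written.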

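Instead of trying to use temporary storage in your Cloud Function, try converting your DataFrame to a string and uploading the result to Google Cloud Storage.

Consider, for example: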

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Convert your df to str: it is straightforward, just do not provide
    # any value for the first param path_or_buf
    csv_str = new_list.to_csv()

    # Then, upload it to cloud storage
    def upload_blob(bucket_name, data, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Note the use of upload_from_string here. Please, provide
        # the appropriate content type if you wish
        blob.upload_from_string(data, content_type='text/csv')

    upload_blob('data-exports', csv_str, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)