How to send a bunch of dataframe records to an API using PySpark
How can I send dataframe records to an API in batches? My current single-threaded approach looks like this:
import requests

headers = {
    'Content-Type': 'application/json',
    'Accept': '*/*'
}
data = {"some_key": "some_value", "another_key": "another_value"}
r = requests.post('https://api.somewhere/batch', params={}, headers=headers, json=data)
If the JSON payload comes from a DataFrame in PySpark, how can I leverage Spark to parallelize this currently single-threaded approach?
You can convert the dataframe to JSON and send the records from the executors:
def batch_json(row):
    # Anything you want to do with each row goes here.
    # Each "row" from toJSON() is already a JSON string, so send it as the
    # raw request body (data=) rather than re-encoding it with json=.
    r = requests.post('https://api.somewhere/batch', params={}, headers=headers, data=row)
    print(r.status_code)

df.toJSON().foreach(batch_json)

# OR
# "batch_json" cannot be used as-is with foreachPartition: it receives an
# iterator over the partition's rows, not a single row, so adapt it to your
# needs (see the sketch below).
df.toJSON().foreachPartition(batch_json)
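As a minimal sketch of a partition-level version, assuming the /batch endpoint accepts a JSON array of records (an assumption; the question doesn't specify the payload format), you could join each partition's rows into one payload and make a single request per partition:

import requests

def batch_json_partition(rows):
    # Hypothetical partition-level sender: "rows" is an iterator of JSON
    # strings, one per record in the partition.
    body = ','.join(rows)
    if not body:
        return  # skip empty partitions
    # Join the records into one JSON array and POST it in a single request.
    # Assumes the endpoint accepts an array payload; adjust to your API.
    r = requests.post('https://api.somewhere/batch', headers=headers, data='[' + body + ']')
    print(r.status_code)

df.toJSON().foreachPartition(batch_json_partition)

One request per partition amortizes the HTTP overhead across many records; repartition the dataframe first if you need to control the batch size.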
Quick test code:
def batch(row):
    # Print each JSON record to verify the serialization looks right.
    print(row)

df.toJSON().foreach(batch)
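Note that foreach runs on the executors, so the print output shows up in the executor logs, not on the driver. If you want the status codes back on the driver, one option (again a sketch, using the same hypothetical array-accepting endpoint) is mapPartitions, which returns the results as an RDD you can collect:

def post_partition(rows):
    # Same idea as the foreachPartition sketch above, but yields each
    # response's status code so the driver can collect the results.
    import requests
    body = ','.join(rows)
    if body:
        r = requests.post('https://api.somewhere/batch', headers=headers, data='[' + body + ']')
        yield r.status_code

status_codes = df.toJSON().mapPartitions(post_partition).collect()
print(status_codes)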