Download products via API in recursion + store them in Postgres -> every recursion consumes more RAM. How to clear RAM after recursion?
I need to download 400k records via an API and store them in a local database. Each request returns 500 records plus a cursor for the next page. I use recursion, and I noticed that the python3 process consumes more RAM with every recursion (roughly 50 MB per call, so I would need about 40 GB of RAM to download everything). What is the way to clear RAM after each recursion?
import os

import psycopg2
import requests
from psycopg2.extras import execute_values


class SomeClass:
    PRODUCTS_TABLE_NAME = 'products'  # placeholder table name (not shown in the original snippet)

    def __init__(self):
        with psycopg2.connect(
            host='localhost',
            dbname=os.getenv("POSTGRES_DBNAME"),
            user=os.getenv("POSTGRES_USER"),
            password=os.getenv("POSTGRES_PASSWORD")
        ) as self.conn:
            self.products_download_n_save()

    def products_download_n_save(self, cursor=''):
        basic_url = 'SOME_URL'
        if not cursor:
            url = basic_url
        else:
            url = f'{basic_url}?cursor={cursor}'
        r = requests.get(url)
        response = r.json()
        products = response['products']
        # Need to loop over every product, because different products have different sets of fields
        for product_d in products:
            values = [list(product_d.values())]
            columns = product_d.keys()
            do_update_query = ','.join([f'{column} = excluded.{column}' for column in columns])
            query = f"INSERT INTO {self.PRODUCTS_TABLE_NAME} ({','.join(columns)}) VALUES %s " \
                    f"ON CONFLICT (productCode) " \
                    f"DO UPDATE SET {do_update_query};"
            # note: this database cursor shadows the API page cursor argument
            with self.conn.cursor() as cursor:
                execute_values(cursor, query, values)
            self.conn.commit()
        response_cursor = response.get('nextCursor', '')
        if response_cursor:
            self.products_download_n_save(response_cursor)
Found the solution - moved the code from recursion into a `while response_cursor` loop. The recursive version keeps every frame's locals (`response`, `products`, etc.) alive until the deepest call returns, and Python does not optimize tail calls, so each of the roughly 800 pages adds its own data to memory; with a plain loop, the previous page's locals are released as soon as they are reassigned. A sketch of the iterative version is below.
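For reference, here is a minimal sketch of that refactor - a drop-in replacement for products_download_n_save in the class above, reusing the same placeholder URL, table name, and upsert query from the question:

    def products_download_n_save(self):
        basic_url = 'SOME_URL'  # placeholder endpoint, as in the question
        response_cursor = ''
        while True:
            url = f'{basic_url}?cursor={response_cursor}' if response_cursor else basic_url
            response = requests.get(url).json()
            # upsert the current page; these locals are overwritten on the next
            # iteration, so only one page of data is referenced at a time
            for product_d in response['products']:
                columns = product_d.keys()
                values = [list(product_d.values())]
                do_update_query = ','.join(f'{column} = excluded.{column}' for column in columns)
                query = f"INSERT INTO {self.PRODUCTS_TABLE_NAME} ({','.join(columns)}) VALUES %s " \
                        f"ON CONFLICT (productCode) " \
                        f"DO UPDATE SET {do_update_query};"
                with self.conn.cursor() as db_cursor:
                    execute_values(db_cursor, query, values)
                self.conn.commit()
            # follow the pagination cursor until the API stops returning one
            response_cursor = response.get('nextCursor', '')
            if not response_cursor:
                break

With this version, memory use stays roughly at the size of a single 500-record page instead of growing with the number of pages downloaded.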