How to use Python multiprocessing to get a global sum after traversing HTTP requests?
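I am trying to write an algorithm that traverses an entire collection of nodes and returns the sum of their rewards. Each reward must be counted only once. The input to the algorithm is the URL of the starting node, e.g. http://fake.url/a.
Each GET request to a URL returns JSON like this: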
{
    "children": [
        "http://fake.url/b",
        "http://fake.url/c"
    ],
    "reward": 1
}
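To make the response shape concrete, here is a minimal sketch of turning one response body into a reward and a list of child URLs. The helper name `parse_node` is hypothetical. Falling back to an empty list when "children" is absent matches the leaf responses in the output shown later in the question (for example `{'reward': 4}`); treating a missing "reward" as 0 is my own assumption.

```python
import json


def parse_node(body):
    """Return (reward, children) parsed from one JSON response body."""
    data = json.loads(body)
    # Assumption: a node without a "reward" key contributes nothing.
    reward = data.get("reward", 0)
    # Leaf nodes omit "children", so default to an empty list.
    children = data.get("children", [])
    return reward, children


print(parse_node('{"children": ["http://fake.url/b", "http://fake.url/c"], "reward": 1}'))
# (1, ['http://fake.url/b', 'http://fake.url/c'])
print(parse_node('{"reward": 4}'))
# (4, [])
```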
Here is what I have tried:
import multiprocessing
import requests
import json

my_q = multiprocessing.Queue()
my_list = ['http://fake.url/']
reward_sum = 0

def enqueue(q):
    for data in my_list:
        q.put(data)

def get_it(q):
    while not q.empty():
        item = q.get()
        print(item)
        response = requests.get(item)
        kids = json.loads(response.content)
        print(f'URL: {item} --> {kids["reward"]}')
        for kid in kids['children']:
            print(kid)
            q.put(kid)

p1 = multiprocessing.Process(target=enqueue, args=(my_q,))
p2 = multiprocessing.Process(target=get_it, args=(my_q,))
p1.start()
p2.start()
p1.join()
p2.join()
What works in the code above:
- I am using multiprocessing.
- I am accessing the children and the rewards correctly.
- I get output like this:
http://fake.url/a
URL: http://fake.url/a --> 1
{'children': ['http://fake.url/b', 'http://fake.url/c'], 'reward': 1}
http://fake.url/b
http://fake.url/c
http://fake.url/b
URL: http://fake.url/b --> 2
{'children': ['http://fake.url/d', 'http://fake.url/e'], 'reward': 2}
http://fake.url/d
http://fake.url/e
http://fake.url/c
URL: http://fake.url/c --> 3
{'children': ['http://fake.url/f', 'http://fake.url/g'], 'reward': 3}
http://fake.url/f
http://fake.url/g
http://fake.url/d
URL: http://fake.url/d --> 4
{'reward': 4}
http://fake.url/e
URL: http://fake.url/e --> 5
{'reward': 5}
http://fake.url/f
URL: http://fake.url/f --> 6
{'children': ['http://fake.url/h'], 'reward': 6}
http://fake.url/h
http://fake.url/g
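What I need help with:
- How do I keep track of the total reward in a global variable?
- How do I keep a global "seen" set so that I do not add duplicates to the total reward sum?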
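def get_it(q):
    # Keep the running total and the "seen" set local to the single worker
    # that drains the queue, so no state has to be shared between processes.
    rewards_total = 0
    seen = set()
    while not q.empty():
        item = q.get()
        print(item)
        if item in seen:
            continue  # this URL was already counted
        seen.add(item)
        response = requests.get(item)
        kids = json.loads(response.content)
        rewards_total += kids["reward"]
        print(f'URL: {item} --> {kids["reward"]}')
        # Leaf nodes omit "children", so fall back to an empty list.
        for kid in kids.get('children', []):
            print(kid)
            q.put(kid)
    # A Process target's return value is discarded; call get_it directly
    # or send the total back through a queue to read it.
    return rewards_total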
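If several requests should be in flight at once, one option is the sketch below. It swaps multiprocessing for a thread pool, which is usually enough for I/O-bound HTTP calls, and walks the graph one level at a time. Only the main thread touches the total and the "seen" set, so neither needs a lock or shared memory. The names `fetch_node`, `total_reward`, and `max_workers` are hypothetical, the standard library's `urllib.request` stands in for `requests` to avoid a third-party dependency, and treating a missing "reward" as 0 is an assumption.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_node(url):
    """Fetch one node over HTTP and return (reward, children)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        data = json.load(response)
    # Assumption: missing keys mean no reward and no children.
    return data.get("reward", 0), data.get("children", [])


def total_reward(start_url, fetch=fetch_node, max_workers=8):
    """Sum the reward of every node reachable from start_url, once each."""
    seen = {start_url}       # URLs already scheduled for fetching
    frontier = [start_url]   # URLs to fetch in the current level
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            # Worker threads only run fetch(); results come back here.
            for reward, children in pool.map(fetch, frontier):
                total += reward
                for child in children:
                    if child not in seen:
                        seen.add(child)
                        next_frontier.append(child)
            frontier = next_frontier
    return total
```

Calling `total_reward("http://fake.url/a")` would return the sum for the graph in the question. Passing a different `fetch` callable, such as a lookup in a dictionary, makes it possible to try the traversal without a network.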