Python multiprocessing: Reading a large file and updating an imported dictionary
I need to read a large file and update an imported dictionary accordingly, using multiprocessing Pool and Manager. Here is my code:
from multiprocessing import Pool, Manager

manager = Manager()
d = manager.dict()
imported_dic = json.load(~/file.json) #loading a file containing a large dictionary
d.update(imported_dic)

def f(line):
    data = line.split('\t')
    uid = data[0]
    tweet = data[2].decode('utf-8')
    if #sth in tweet:
        d[uid] += 1

p = Pool(4)
with open('~/test_1k.txt') as source_file:
    p.map(f, source_file)
But it doesn't work. Any idea what I'm doing wrong here?
Try this code:
d = init_dictionary()  # your initialization magic here

def f(line):
    # Return a list of (uid, keyword, count) tuples rather than
    # yielding: a generator cannot be pickled, so it cannot be sent
    # back from a worker process to the parent.
    results = []
    data = line.split('\t')
    uid = data[0]
    tweet = data[2].decode('utf-8')
    if uid in d:
        for n in d[uid].keys():
            results.append((uid, n, 1 if n in tweet else 0))
    return results

p = Pool(4)
with open('~/test_1k.txt') as source_file:
    for stats in p.map(f, source_file):
        for uid, n, r in stats:
            d[uid][n] += r
This is the same solution, but without the shared dictionary: the workers only read the dictionary and return their results, and the parent process does all the updating, so no Manager is needed.