How to share an array index between threads in Python?
I have the following code:
import threading
import numpy as np

def task1():
    for url in splitarr[0]:
        print(url)  # print is a debugging stand-in for scrape_induvidual_page()

def task2():
    for url in splitarr[1]:
        print(url)

def task3():
    for url in splitarr[2]:
        print(url)

def task4():
    for url in splitarr[3]:
        print(url)

def task5():
    for url in splitarr[4]:
        print(url)

def task6():
    for url in splitarr[5]:
        print(url)

def task7():
    for url in splitarr[6]:
        print(url)

def task8():
    for url in splitarr[7]:
        print(url)

# urllist is the list of URLs to process (defined elsewhere, not shown)
splitarr = np.array_split(urllist, 8)

t1 = threading.Thread(target=task1, name='t1')
t2 = threading.Thread(target=task2, name='t2')
t3 = threading.Thread(target=task3, name='t3')
t4 = threading.Thread(target=task4, name='t4')
t5 = threading.Thread(target=task5, name='t5')
t6 = threading.Thread(target=task6, name='t6')
t7 = threading.Thread(target=task7, name='t7')
t8 = threading.Thread(target=task8, name='t8')

t1.start()
t2.start()
t3.start()
t4.start()
t5.start()
t6.start()
t7.start()
t8.start()

t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
t6.join()
t7.join()
t8.join()
It does produce the desired output, with no duplicates or anything:
https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60fps-eng-flac-webdl-2160p-x264-zmachine-t1041079.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60-fps-flac-webrip-2160p-x265-zmachine-t1041689.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/tkillaahh-big-buck-bunny-dvd-720p-2lions-team-t87503.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-nhd-x264-nhanc3-t127050.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html
However, all the repeated def taskx(): definitions feel redundant, so I tried to condense the code into a single task:
x = 0

def task1():
    global x
    for url in splitarr[x]:
        print(url)
        x = x + 1

t1 = threading.Thread(target=task1, name='t1')
t2 = threading.Thread(target=task1, name='t2')
t3 = threading.Thread(target=task1, name='t3')
t4 = threading.Thread(target=task1, name='t4')
t5 = threading.Thread(target=task1, name='t5')
t6 = threading.Thread(target=task1, name='t6')
t7 = threading.Thread(target=task1, name='t7')
t8 = threading.Thread(target=task1, name='t8')

t1.start()
t2.start()
t3.start()
t4.start()
t5.start()
t6.start()
t7.start()
t8.start()

t1.join()
t2.join()
t3.join()
t4.join()
t5.join()
t6.join()
t7.join()
t8.join()
However, this produces unwanted duplicate output:
https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-1080p-h264-aac-5-1-tntvillage-t115783.html
https://kickasstorrents.to/big-buck-bunny-4k-uhd-hfr-60-fps-flac-webrip-2160p-x265-zmachine-t1041689.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/big-buck-bunny-2008-720p-bluray-x264-don-no-rars-t11623.html
https://kickasstorrents.to/tkillaahh-big-buck-bunny-dvd-720p-2lions-team-t87503.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html
https://kickasstorrents.to/big-buck-bunny-2008-brrip-720p-x264-mitzep-t172753.html
How can I make x increment correctly in a multithreaded program?
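The statement for url in splitarr[x]: creates an iterator over the sequence in splitarr[x]. Incrementing x afterwards makes no difference, because the iterator has already been built. And since there is a print in the loop, it is quite likely that several threads read x before it has changed (for example while it is still zero) and so iterate over the same sequence.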
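To make the iterator point concrete, here is a minimal single-threaded sketch. The chunks list and its contents are hypothetical stand-ins for splitarr, not data from the question. It shows that the expression after "in" is evaluated once, when the loop starts, so changing x inside the loop does not switch to another chunk.

```python
# Hypothetical stand-in data: two short chunks instead of real URLs.
chunks = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]

x = 0
seen = []
for item in chunks[x]:  # chunks[x] is evaluated once, here, while x == 0
    seen.append(item)
    x = x + 1           # rebinding x does not affect the iterator in use

print(seen)  # ['a0', 'a1', 'a2']
print(x)     # 3
```

The loop walks the first chunk to the end even though x has moved on, which is why a shared counter cannot steer a loop that is already running.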
One solution is to pass each thread its own index through the args parameter of threading.Thread. A thread pool is easier, though.
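The args approach could look roughly like the sketch below. It assumes a small generated splitarr in place of np.array_split(urllist, 8), and the function name task and the thread names are illustrative choices. Each thread gets its index at creation time, so no global counter is shared.

```python
import threading

# Hypothetical stand-in for np.array_split(urllist, 8).
splitarr = [[f"url_{i}_{j}" for j in range(2)] for i in range(8)]

def task(index):
    # Each thread receives its own index, so no shared counter is needed.
    for url in splitarr[index]:
        print(url)

# One thread per chunk; args must be a tuple, hence the trailing comma.
threads = [
    threading.Thread(target=task, args=(i,), name=f"t{i + 1}")
    for i in range(len(splitarr))
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Building the threads in a list also removes the eight hand-written start() and join() calls.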
from multiprocessing.pool import ThreadPool

# generate test array
splitarr = []
for i in range(8):
    splitarr.append([f"url_{i}_{j}" for j in range(4)])

def task(splitarr_column):
    for url in splitarr_column:
        print(url)

with ThreadPool(len(splitarr)) as pool:
    result = pool.map(task, splitarr)
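In this example, len(splitarr) is used to create one thread for each sequence in splitarr. Each of those sequences is then mapped to the task function. Because the pool has as many threads as there are sequences, they can all run at once. When the map completes, the with block exits, the pool is closed, and the threads terminate.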
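Once print is swapped for real scraping, the return value of pool.map becomes useful: it is a list with one entry per input sequence, in input order. The sketch below assumes a hypothetical scrape_chunk function that just upper-cases each URL as a placeholder for actual scraping work.

```python
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in data: 8 chunks of 4 generated URLs each.
splitarr = [[f"url_{i}_{j}" for j in range(4)] for i in range(8)]

def scrape_chunk(urls):
    # Placeholder for real scraping: return one result per URL.
    return [url.upper() for url in urls]

with ThreadPool(len(splitarr)) as pool:
    # map blocks until every chunk is done and keeps the input order.
    results = pool.map(scrape_chunk, splitarr)

print(len(results))   # 8
print(results[0][0])  # URL_0_0
```

Returning values from the worker avoids having threads write into shared state, so no lock is needed to collect the results.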