如何运行一个线程函数,returns一个变量?
How to run a threaded function that returns a variable?
使用 Python 3.6,我想要完成的是创建一个函数,该函数在脚本的其余部分执行的同时,不断从网页中抓取 dynamic/changing 数据,并且能够引用从连续函数返回的数据。
我知道这可能是一个线程任务,但我对它还不是很了解。我可能认为的伪代码看起来像这样
def continuous_scraper():
# Pull data from webpage
scraped_table = pd.read_html(url)
return scraped_table
# start the continuous scraper function here, to run either indefinitely, or preferably stop after a predefined amount of time
scraped_table = thread(continuous_scraper)
# the rest of the script is run here, making use of the updating “scraped_table”
while True:
print(scraped_table[“Col_1”].iloc[0]
我建议查看 multiprocessing 库和 Pool
class。
文档中有多个示例说明如何使用它。
问题本身太笼统,无法简单回答。
这是一个相当简单的示例,使用了一些似乎每两秒更新一次的股票市场页面。
import threading, time
import pandas as pd
# A lock is used to ensure only one thread reads or writes the variable at any one time
scraped_table_lock = threading.Lock()
# Initially set to None so we know when its value has changed
scraped_table = None
# This bad-boy will be called only once in a separate thread
def continuous_scraper():
# Tell Python this is a global variable, so it rebinds scraped_table
# instead of creating a local variable that is also named scraped_table
global scraped_table
url = r"https://tradingeconomics.com/australia/stock-market"
while True:
# Pull data from webpage
result = pd.read_html(url, match="Dow Jones")[0]
# Acquire the lock to ensure thread-safety, then assign the new result
# This is done after read_html returns so it doesn't hold the lock for so long
with scraped_table_lock:
scraped_table = result
# You don't wanna flog the server, so wait 2 seconds after each
# response before sending another request
time.sleep(2)
# Make the thread daemonic, so the thread doesn't continue to run once the
# main script and any other non-daemonic threads have ended
scraper_thread = threading.Thread(target=continuous_scraper, daemon=True)
# start the continuous scraper function here, to run either indefinitely, or
# preferably stop after a predefined amount of time
scraper_thread.start()
# the rest of the script is run here, making use of the updating “scraped_table”
for _ in range(100):
print("Time:", time.time())
# Acquire the lock to ensure thread-safety
with scraped_table_lock:
# Check if it has been changed from the default value of None
if scraped_table is not None:
print(" ", scraped_table)
else:
print("scraped_table is None")
# You probably don't wanna flog your stdout, either, dawg!
time.sleep(0.5)
一定要阅读有关多线程编程和线程安全的内容。很容易出错。如果有错误,它通常只在罕见且看似随机的情况下出现,因此很难调试。
使用 Python 3.6,我想要完成的是创建一个函数,该函数在脚本的其余部分执行的同时,不断从网页中抓取 dynamic/changing 数据,并且能够引用从连续函数返回的数据。
我知道这可能是一个线程任务,但我对它还不是很了解。我可能认为的伪代码看起来像这样
def continuous_scraper():
# Pull data from webpage
scraped_table = pd.read_html(url)
return scraped_table
# start the continuous scraper function here, to run either indefinitely, or preferably stop after a predefined amount of time
scraped_table = thread(continuous_scraper)
# the rest of the script is run here, making use of the updating “scraped_table”
while True:
print(scraped_table[“Col_1”].iloc[0]
我建议查看 multiprocessing 库和 Pool
class。
文档中有多个示例说明如何使用它。
问题本身太笼统,无法简单回答。
这是一个相当简单的示例,使用了一些似乎每两秒更新一次的股票市场页面。
import threading, time
import pandas as pd
# A lock is used to ensure only one thread reads or writes the variable at any one time
scraped_table_lock = threading.Lock()
# Initially set to None so we know when its value has changed
scraped_table = None
# This bad-boy will be called only once in a separate thread
def continuous_scraper():
# Tell Python this is a global variable, so it rebinds scraped_table
# instead of creating a local variable that is also named scraped_table
global scraped_table
url = r"https://tradingeconomics.com/australia/stock-market"
while True:
# Pull data from webpage
result = pd.read_html(url, match="Dow Jones")[0]
# Acquire the lock to ensure thread-safety, then assign the new result
# This is done after read_html returns so it doesn't hold the lock for so long
with scraped_table_lock:
scraped_table = result
# You don't wanna flog the server, so wait 2 seconds after each
# response before sending another request
time.sleep(2)
# Make the thread daemonic, so the thread doesn't continue to run once the
# main script and any other non-daemonic threads have ended
scraper_thread = threading.Thread(target=continuous_scraper, daemon=True)
# start the continuous scraper function here, to run either indefinitely, or
# preferably stop after a predefined amount of time
scraper_thread.start()
# the rest of the script is run here, making use of the updating “scraped_table”
for _ in range(100):
print("Time:", time.time())
# Acquire the lock to ensure thread-safety
with scraped_table_lock:
# Check if it has been changed from the default value of None
if scraped_table is not None:
print(" ", scraped_table)
else:
print("scraped_table is None")
# You probably don't wanna flog your stdout, either, dawg!
time.sleep(0.5)
一定要阅读有关多线程编程和线程安全的内容。很容易出错。如果有错误,它通常只在罕见且看似随机的情况下出现,因此很难调试。