Multithreading using Python and pymongo
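Hi, I want to write a program that classifies tweets about a company, already stored in MongoDB, as positive or negative and, once each tweet is classified, updates an integer according to the result.
I have written code that does this, but I would like to multithread the program. I have no experience with threading in Python and have been trying to follow tutorials, but unfortunately the program just starts and exits without running any of the code.
If anyone could help me with this I would greatly appreciate it. The code for the program, with the intended multithreading, is below.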
from textblob.classifiers import NaiveBayesClassifier
import pymongo
import datetime
from threading import Thread
train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg'),
    (':)', 'pos'),
    (':(', 'neg'),
    ('gr8', 'pos'),
    ('gr8t', 'pos'),
    ('lol', 'pos'),
    ('bff', 'neg'),
]
test = [
    'The beer was good.',
    'I do not enjoy my job',
    "I ain't feeling dandy today.",
    "I feel amazing!",
    'Gary is a friend of mine.',
    "I can't believe I'm doing this.",
]
filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Yahoo', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
                  'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
                  'Investec', 'WWE', 'Time Warner', 'Santander Group']
# Create pos/neg counter variables for each company using dicts
vars = {}
for word in filterKeywords:
    vars[word + "SentimentOverall"] = 0
# Initialising the classifier
cl = NaiveBayesClassifier(train)
class TrainingClassification():
    def __init__(self):
        #creating the mongodb connection
        try:
            conn = pymongo.MongoClient('localhost', 27017)
            print "Connected successfully!!!"
            global db
            db = conn.TwitterDB
        except pymongo.errors.ConnectionFailure, e:
            print "Could not connect to MongoDB: %s" % e
        thread1 = Thread(target=self.apple_thread, args=())
        thread1.start()
        thread1.join()
        print "thread finished...exiting"
    def apple_thread(self):
        appleSentimentText = []
        for record in db.Apple.find():
            if record.get('created_at'):
                created_at = record.get('created_at')
                dt = datetime.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y')
                if record.get('text') and dt > datetime.today():
                    appleSentimentText.append(record.get("text"))
        for targetText in appleSentimentText:
            classificationApple = cl.classify(targetText)
            if classificationApple == "pos":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] + 1
            elif classificationApple == "neg":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] - 1
The main problem with your code is here:
thread1.start()
thread1.join()
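When you call join on a thread, the currently running thread (the main thread in your case) waits for that thread (here thread1) to finish. So your code will not actually be any faster: it just starts one thread and waits for it. It will even be slightly slower because of the cost of creating the thread.
Here is the correct way to do multithreading: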
thread1.start()
thread2.start()
thread1.join()
thread2.join()
In this code, thread1 and thread2 both run in parallel.
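To make the start-then-join pattern concrete, here is a minimal sketch in Python 3 syntax. It is not your program: `count_sentiment`, `run_all`, and the sample labels are hypothetical names I made up, and `time.sleep` only stands in for the MongoDB query. Every thread is started first, and only then is each one joined.

```python
import threading
import time


def count_sentiment(company, labels, results):
    """Hypothetical worker: adds 1 per 'pos' label and subtracts 1 per 'neg'."""
    time.sleep(0.1)  # stand-in for I/O such as a database query
    results[company] = sum(1 if label == "pos" else -1 for label in labels)


def run_all(work):
    """Start one thread per company, then wait for all of them to finish."""
    results = {}
    threads = [
        threading.Thread(target=count_sentiment, args=(company, labels, results))
        for company, labels in work.items()
    ]
    for t in threads:
        t.start()  # start every thread before joining any of them
    for t in threads:
        t.join()   # the main thread now waits while the workers overlap
    return results


if __name__ == "__main__":
    work = {"Apple": ["pos", "pos", "neg"], "Google": ["neg", "neg"]}
    print(run_all(work))
```

Each worker writes to its own key in `results`, so this sketch does not need a lock. If several threads updated the same counter, you would probably want a `threading.Lock` around the update.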
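Important: note that in Python this is "simulated" parallelism. Because the core of the standard interpreter (CPython) is not thread-safe (mainly because of the way it does garbage collection), it uses a GIL (Global Interpreter Lock), so only one thread in a process executes Python bytecode at a time and all the threads effectively run on only one core.
If you want true parallelism (for example, if your two threads are CPU-bound rather than I/O-bound), take a look at the multiprocessing module.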
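As a rough illustration of the multiprocessing route, here is a small Python 3 sketch using `multiprocessing.Pool`. The `score` and `score_all` functions and the sample batches are hypothetical placeholders for your classification step, not part of your code; the assumption is that each batch can be processed independently.

```python
from multiprocessing import Pool


def score(labels):
    """Hypothetical CPU-bound step: net sentiment for one batch of labels."""
    return sum(1 if label == "pos" else -1 for label in labels)


def score_all(batches, processes=2):
    """Score each batch in a separate worker process."""
    with Pool(processes=processes) as pool:
        # map() returns the results in the same order as the input batches
        return pool.map(score, batches)


if __name__ == "__main__":
    # The __main__ guard matters: on platforms that spawn new interpreters,
    # each worker re-imports this module and must not create its own pool.
    print(score_all([["pos", "pos", "neg"], ["neg", "neg"]]))
```

Each worker is a separate process with its own interpreter and its own GIL, so the batches can run on different cores. The trade-off is that arguments and results are pickled and copied between processes, which means a shared dict like `vars` would not be updated in place.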