Pymongo: select random sequences?
I have a MongoDB collection like this:
```
{ "_id" : ObjectId("5e2d4b6479799acab037af68"),
  "timestamp" : 1577152302, # hourly
  "login" : 'A4FC9',
  # 240 more fields
}
```
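I want to select random sequences from the MongoDB collection, each made of 24 consecutive records going back from a random starting point.
The database does not have a record for every hour (some hours are missing), so I can't generate a random timestamp and then search a timestamp range for the previous 24 records.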
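A small pure-Python illustration of why a time-range query falls short when hours are missing. The timestamps are made up, and `by_time_range` / `by_count` are hypothetical helper names: the first mimics a "last 24 hours" range query, the second mimics the question's `.sort("timestamp", -1).limit(24)` approach.

```python
HOUR = 3600

def by_time_range(timestamps, end_ts, hours=24):
    """Records inside the `hours`-long window ending at end_ts (a range query)."""
    start = end_ts - hours * HOUR
    return [t for t in timestamps if start < t <= end_ts]

def by_count(timestamps, end_ts, n=24):
    """The n most recent records at or before end_ts (what sort + limit returns)."""
    earlier = sorted((t for t in timestamps if t <= end_ts), reverse=True)
    return earlier[:n]

# Made-up hourly data in which every 5th hour is missing
timestamps = [h * HOUR for h in range(100) if h % 5 != 0]
end_ts = 99 * HOUR

print(len(by_time_range(timestamps, end_ts)))  # 20: four hours in the window are missing
print(len(by_count(timestamps, end_ts)))       # 24
```

Counting records instead of hours is why the question sorts and limits rather than filtering on a timestamp range.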
This is what I'm doing at the moment, but it is really slow and I'd like to improve the performance:
```python
db = mongodb...

def sequence(size):
    # First, pick one random document and read its timestamp
    random_ts = next(db.aggregate([{'$sample': {'size': 1}}]))['timestamp']
    # Then fetch the `size` records at or before that timestamp, newest first
    next_rows = db.find({'timestamp': {'$lte': random_ts}}).sort("timestamp", -1).limit(size)
    return next_rows

SEQUENCES = 100
batch = list()
for x in range(SEQUENCES):
    rand_sequence = sequence(24)
    batch.append(rand_sequence)
```
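This returns 100 sequences, each starting at a random timestamp and going back 24 records.
Fetching all the data takes 30 minutes.
Is there a way to do this with just one query?
Also, if there is another way to do it, please advise.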
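One way to fold the 100 round trips into a single `aggregate` call, sketched under assumptions: `$sample` picks all the anchors at once, and a `$lookup` with a sub-pipeline (available since MongoDB 3.6) attaches to each anchor its most recent records at or before its timestamp. The helper `build_sequence_pipeline` and the collection name `"records"` are hypothetical. Whether the inner `$match` on `$expr` can use an index on `timestamp` depends on the server version, so check with `explain` before relying on it for speed.

```python
def build_sequence_pipeline(collection_name, n_sequences=100, length=24):
    """Hypothetical helper: one aggregation that samples n_sequences anchors and
    attaches the `length` newest records at or before each anchor's timestamp."""
    return [
        # Pick all random anchor documents in one stage
        {"$sample": {"size": n_sequences}},
        # For each anchor, run a sub-pipeline against the same collection
        {"$lookup": {
            "from": collection_name,
            "let": {"anchor_ts": "$timestamp"},
            "pipeline": [
                {"$match": {"$expr": {"$lte": ["$timestamp", "$$anchor_ts"]}}},
                {"$sort": {"timestamp": -1}},
                {"$limit": length},
            ],
            "as": "sequence",
        }},
        # Keep only the attached sequence
        {"$project": {"_id": 0, "sequence": 1}},
    ]
```

Usage would look like `batch = [doc["sequence"] for doc in db.aggregate(build_sequence_pipeline(db.name, 100, 24))]`. As with the original loop, an anchor that lands near the oldest records yields a shorter sequence, and an index on `timestamp` (e.g. `db.create_index("timestamp")`) is likely to matter for either approach.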
I did it!!
```python
from random import randrange

db = mongodb...

# Pick a single random anchor timestamp
random_ts = next(db.aggregate([{'$sample': {'size': 1}}]))['timestamp']

# Then fetch the 50000 records at or before it, newest first, in one query
maxrows = 50000
SEQUENCES = 100
SEQUENCE_LENGTH = 100
next_rows = db.find({'timestamp': {'$lte': random_ts}}).sort("timestamp", -1).limit(maxrows)

# One random start index per sequence; the upper bound leaves room for a full window
chosen = [randrange(0, maxrows - SEQUENCE_LENGTH + 1) for _ in range(SEQUENCES)]

# One list per sequence
sequences_list = [[] for _ in range(len(chosen))]

# Stream the cursor once and append each row to every sequence whose
# window [start, start + SEQUENCE_LENGTH) contains the row's index
for i, row in enumerate(next_rows):
    for j, start in enumerate(chosen):
        if start <= i < start + SEQUENCE_LENGTH:
            sequences_list[j].append(row)
```
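The nested loop checks every row against every start index (maxrows × SEQUENCES comparisons). A sketch of the same window selection with plain list slicing, assuming you are willing to materialize the rows first with `list(next_rows)`; with 50,000 documents of roughly 240 fields each that may need a lot of memory. `pick_windows` is a hypothetical helper name.

```python
import random

def pick_windows(rows, n_sequences, length, rng=random):
    """Return n_sequences contiguous, full-length windows of `rows`,
    each starting at a uniformly random index."""
    if len(rows) < length:
        raise ValueError("not enough rows for one full window")
    # Largest start index that still leaves room for a whole window
    last_start = len(rows) - length
    starts = [rng.randint(0, last_start) for _ in range(n_sequences)]
    return [rows[s:s + length] for s in starts]
```

With the answer's names this would be `sequences_list = pick_windows(list(next_rows), SEQUENCES, SEQUENCE_LENGTH)`. Either way, every sequence comes from the block of rows before one random anchor, so the sequences are not spread uniformly over the whole collection.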