Pymongo, select random sequences?

I have a MongoDB collection with documents like this:

{ "_id" : ObjectId("5e2d4b6479799acab037af68"), 
  "timestamp" : 1577152302, # hourly
  "login" : 'A4FC9',
  # 240 columns more
}

I want to select random sequences from the database, where each sequence is a randomly chosen record together with the 24 records before it.

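The database does not have a record for every hour. Some hours are missing, so I cannot just generate a random timestamp and search a fixed timestamp range to get the previous 24 records.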
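To make the gap problem concrete, here is a minimal sketch on made-up in-memory timestamps. The data (every 5th hour missing) and the `window_count` helper are assumptions for illustration only, not taken from the real collection. It compares a 24-hour window query with selecting the 24 preceding records by position.

```python
HOUR = 3600
START = 1577152302  # example epoch timestamp, same shape as the "timestamp" field

# Hypothetical hourly data with every 5th hour missing.
timestamps = [START + h * HOUR for h in range(100) if h % 5 != 0]

def window_count(ts_list, end_ts, hours=24):
    """Count records that fall in the `hours` hourly slots ending at end_ts (inclusive)."""
    lo = end_ts - (hours - 1) * HOUR
    return sum(1 for t in ts_list if lo <= t <= end_ts)

end_ts = timestamps[-1]

# Range-based selection: limited by the clock, so gaps shrink the result.
by_window = window_count(timestamps, end_ts)

# Position-based selection: the last 24 records at or before end_ts.
by_position = len([t for t in timestamps if t <= end_ts][-24:])

print(by_window, by_position)
```

The window query comes back short whenever hours are missing, which is why the code in this post sorts by timestamp and uses a limit instead of a time range.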

This is what I am doing at the moment, but it is really slow and I would like to improve the performance:


db = mongodb...

def sequence( size ):
     # first pick one random document and read its timestamp
     random_ts = next( db.aggregate( [{ '$sample': { 'size'  : 1 }}] ) )['timestamp']

     # then fetch up to `size` documents at or before that timestamp, newest first
     next_rows = db.find( {'timestamp': {'$lte': random_ts }} ).sort("timestamp",-1).limit( size )
     # materialize the cursor so the caller gets the documents themselves
     return list( next_rows )

SEQUENCES = 100

batch = list()
for x in range( SEQUENCES ):
    rand_sequence = sequence( 24 )
    batch.append( rand_sequence ) 

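This returns 100 sequences, each starting at a random timestamp and containing the 24 previous records. Fetching all the data takes 30 minutes. Is there a way to do this with a single query?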

Suggestions for other approaches are also welcome.
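One possible shape for a single query, as an untested sketch rather than a confirmed solution: sample all the start rows with `$sample`, then use the pipeline form of `$lookup` (MongoDB 3.6 or later) to attach each start row's predecessors. The function name `build_sequence_pipeline` and the collection name `"mycollection"` are placeholders. Whether the inner `$match` can use an index on `timestamp` depends on the server version, so this may or may not be faster in practice.

```python
def build_sequence_pipeline(collection_name, n_sequences, size):
    """Build an aggregation pipeline: sample start rows, then look up each one's predecessors."""
    return [
        # Pick n_sequences random documents as sequence end points.
        {"$sample": {"size": n_sequences}},
        # For each sampled document, self-join on the same collection.
        {"$lookup": {
            "from": collection_name,
            "let": {"ts": "$timestamp"},
            "pipeline": [
                # Rows at or before the sampled timestamp...
                {"$match": {"$expr": {"$lte": ["$timestamp", "$$ts"]}}},
                # ...newest first, capped at `size` rows.
                {"$sort": {"timestamp": -1}},
                {"$limit": size},
            ],
            "as": "sequence",
        }},
        # Keep only the looked-up rows (plus _id).
        {"$project": {"sequence": 1}},
    ]

pipeline = build_sequence_pipeline("mycollection", 100, 24)
# With a real collection handle `coll` (assumed):
# batch = [doc["sequence"] for doc in coll.aggregate(pipeline)]
```

An index on `timestamp` is likely to matter for both this pipeline and the original `find().sort().limit()` loop, since each lookup sorts by that field.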

I got it working!!

from random import randrange

db = mongodb...

# pick one random document and use its timestamp as the upper bound
random_ts = next( db.aggregate( [{ '$sample': { 'size'  : 1 }}] ) )['timestamp']

# then fetch up to 50000 documents at or before it, newest first, in one query
maxrows = 50000
SEQUENCES = 100
SEQUENCE_LENGTH = 100

next_rows = db.find( {'timestamp': {'$lte': random_ts }} ).sort("timestamp",-1).limit(maxrows)

# random start index for each sequence
chosen = [ randrange( 0, maxrows - SEQUENCE_LENGTH + 1 ) for x in range( SEQUENCES ) ]

# one empty list per sequence
sequences_list = [ [] for x in range( len(chosen) ) ]

# append row i to every sequence whose window [start, start + SEQUENCE_LENGTH) contains i;
# sequences come out shorter if the query returns fewer than maxrows rows
for i, x in enumerate( next_rows ):
    for j, y in enumerate(chosen):
        if y <= i < y + SEQUENCE_LENGTH:
            sequences_list[j].append( x )
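The nested loop above compares every row against every start index. A possible simplification, sketched here on stand-in data, is to load the rows into a list once and cut the sequences with slices. The helper name `pick_sequences` and the generated `rows` are assumptions for illustration. In the real code `rows` would be `list(next_rows)`.

```python
from random import randrange

def pick_sequences(rows, n_sequences, length):
    """Return n_sequences random contiguous slices of `length` items from rows."""
    if len(rows) < length:
        raise ValueError("not enough rows for one sequence")
    # Start indexes are bounded by the actual number of rows, not by maxrows,
    # so every slice has exactly `length` items.
    starts = [randrange(0, len(rows) - length + 1) for _ in range(n_sequences)]
    return [rows[s:s + length] for s in starts]

# Stand-in data, newest first, mimicking the sorted query result.
rows = [{"timestamp": 1577152302 - i * 3600} for i in range(1000)]

sequences_list = pick_sequences(rows, n_sequences=100, length=24)
```

This keeps the single-query idea and only changes how the fetched rows are split up. The trade-off is that all rows must fit in memory at once.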