django taggit similar_objects very slow query
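I'm trying to select the 3 most recently published items that share any tag with the current item (plus a few other filters).
I can't find an efficient way to do it, and there are a lot of items in the database.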
from django.contrib.sites.models import Site
from django.db import models
from taggit_autosuggest.managers import TaggableManager

class Item(models.Model):
    publish_date = models.DateField()
    tags = TaggableManager()
    sites = models.ManyToManyField(Site)

def my_view():
    ...
    current_item = ...  # get current item
    related_items = Item.active_objects.filter(
        sites=current_site,
        id__in=[x.id for x in current_item.tags.similar_objects()]
    ).order_by('-publish_date')[:3]
    ...
But this causes a pretty big performance problem in the similar_objects() method. The more tags current_item has, the exponentially worse it gets:
# Query_time: 20.613503 Lock_time: 0.000182 Rows_sent: 83 Rows_examined: 7566504
SELECT `taggit_taggeditem`.`content_type_id`, `taggit_taggeditem`.`object_id`, COUNT(`taggit_taggeditem`.`id`) AS `n`
FROM `taggit_taggeditem`
WHERE (NOT (`taggit_taggeditem`.`object_id` = 205636 AND `taggit_taggeditem`.`content_type_id` = 11)
  AND (`taggit_taggeditem`.`tag_id`) IN (
    SELECT DISTINCT `taggit_tag`.`id`
    FROM `taggit_tag`
    INNER JOIN `taggit_taggeditem` ON (`taggit_tag`.`id` = `taggit_taggeditem`.`tag_id`)
    WHERE (`taggit_taggeditem`.`object_id` = 205636 AND `taggit_taggeditem`.`content_type_id` = 11)))
GROUP BY `taggit_taggeditem`.`content_type_id`, `taggit_taggeditem`.`object_id`
ORDER BY `n` DESC;
I also tried it without the similar_objects method:
related_items = Item.active_objects.filter(
    sites=current_site,
    tags__in=current_item.tags.all()
).exclude(slug=slug).order_by('-publish_date').distinct()[:3]
context['tagged'] = tags.order_by('-publish_date').distinct()[:3]
That was consistently worse (some queries took as long as 120 seconds, yuck).
What is the 'nice' way to do this?!
I think you could do this:
related_items = current_item.tags.similar_objects().filter(
    sites=current_site,
).order_by('-publish_date')[:3]
Although I think you would have to include the logic behind active_objects in that filter again.
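My assumption is that fetching the tags and following the tag -> item relation is more efficient than searching through all items.
So we build a queryset of all the TaggedItems, collect the ids of all the objects, and then apply our filters.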
from taggit.models import TaggedItem

related_items = TaggedItem.objects.none()
for tag in current_item.tags.all():
    # build queryset of all TaggedItems
    related_items |= tag.taggit_taggeditem_items.all()

# TaggedItem doesn't have a direct link to the object, have to grab ids
ids = related_items.values_list('object_id', flat=True)
return Item.objects.filter(id__in=ids, sites=current_site).exclude(id=current_item.id).order_by('-publish_date')[:3]
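To make the tag -> item idea above concrete, here is a minimal sketch using only the standard-library sqlite3 module instead of Django. The two tables (item, taggeditem) and the sample rows are hypothetical simplifications, not the real taggit schema. Because TaggedItem rows are generic (they carry a content_type_id next to object_id), the sketch also restricts the id lookup to one content type, on the assumption that tags may be shared with other models.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the item and tagged-item tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, publish_date TEXT);
CREATE TABLE taggeditem (tag_id INTEGER, content_type_id INTEGER, object_id INTEGER);
""")
conn.executemany("INSERT INTO item VALUES (?, ?)", [
    (1, "2013-01-01"), (2, "2013-02-01"), (3, "2013-03-01"),
    (4, "2013-04-01"), (5, "2013-05-01"), (6, "2013-06-01"),
])
conn.executemany("INSERT INTO taggeditem VALUES (?, ?, ?)", [
    (10, 11, 1), (20, 11, 1),  # item 1 (the current item) has tags 10 and 20
    (10, 11, 2),
    (20, 11, 3),
    (10, 11, 4), (20, 11, 4),
    (30, 11, 5),               # item 5 only has an unrelated tag
    (10, 99, 5),               # same object_id, different content type: ignored
    (10, 11, 6),
])

def related_item_ids(conn, current_id, content_type_id, limit=3):
    """Ids of the newest items sharing at least one tag with the current item."""
    rows = conn.execute(
        """
        SELECT id FROM item
        WHERE id != ?
          AND id IN (
            SELECT object_id FROM taggeditem
            WHERE content_type_id = ?
              AND tag_id IN (
                SELECT tag_id FROM taggeditem
                WHERE object_id = ? AND content_type_id = ?
              )
          )
        ORDER BY publish_date DESC
        LIMIT ?
        """,
        (current_id, content_type_id, current_id, content_type_id, limit),
    ).fetchall()
    return [row[0] for row in rows]

newest_related = related_item_ids(conn, current_id=1, content_type_id=11)
```

The point of this shape is that the tag lookup only produces a list of ids, and the ordering and the limit of 3 are applied on the item table, so no per-object tag count has to be computed. Whether it is actually faster on a real database depends on the indexes available, so it is worth checking with EXPLAIN.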
I'm using the following (I wanted the overlap as a percentage, hence the hocus-pocus with job_tags_len).
from django.db.models import Count
job = Job.objects.get(id=job_id)
job_tags = list(job.tags.names())
job_tags_len = 100 / len(job_tags)  # percentage points per matching tag
filtered = (Worker.objects.filter(tags__name__in=job_tags)
            .annotate(overlap=Count('id') * job_tags_len)
            .order_by('-overlap').distinct())
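As I read it, the filter joins each worker to one row per matching tag, so Count('id') is the number of tags a worker shares with the job, and multiplying by 100 / len(job_tags) turns that into a percentage. Here is a plain-Python sketch of the same arithmetic. The function name and sample data are hypothetical, and unlike the query above it returns nothing for a job without tags instead of dividing by zero.

```python
from collections import Counter

def overlap_percentages(job_tags, worker_tags):
    """Map each worker to the percentage of job_tags they share, highest first."""
    job_tags = set(job_tags)
    if not job_tags:
        return {}  # a job without tags has no overlap to measure
    per_tag = 100 / len(job_tags)  # same role as job_tags_len above
    shared_counts = Counter()
    for worker, tags in worker_tags.items():
        shared = job_tags & set(tags)
        if shared:  # workers with no shared tag are left out, like the filter
            shared_counts[worker] = len(shared)
    return {worker: n * per_tag for worker, n in shared_counts.most_common()}

# Hypothetical sample data
sample_job_tags = ["python", "django", "sql", "css"]
sample_worker_tags = {
    "alice": ["python", "django", "sql"],
    "bob": ["css", "html"],
    "carol": ["java"],
}
overlaps = overlap_percentages(sample_job_tags, sample_worker_tags)
```

With four job tags each shared tag is worth 25 points, so a worker matching three of them scores 75.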