MySQL Multiple Join Performance with Django Rest Framework
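I have been struggling with a problem that I am sure everyone runs into at some point. I currently have a small database of 150k products (and growing as I write this).
I am using DRF for the API and have been struggling with performance for categories that contain a lot of products.
For example, I have a category called Dresses which contains 34633 products.
As for how my database is designed, each product has several relations underneath it.
Products have categories, attributes, colors, sizes, and related products (M2M).
Queries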
Count Query: 809.83 ms
SELECT COUNT(*)
FROM (
SELECT DISTINCT `catalog_products`.`id` AS Col1
FROM `catalog_products`
INNER JOIN `catalog_products_category` ON (`catalog_products`.`id` =
`catalog_products_category`.`products_id`)
WHERE (`catalog_products`.`deleted` = 0
AND `catalog_products`.`in_stock` = 1
AND `catalog_products_category`.`categories_id` = 183)
) subquery
Result Query: 2139.52 ms
SELECT DISTINCT `catalog_products`.`id`, `catalog_products`.`sku`,
`catalog_products`.`title`, `catalog_products`.`old_price`,
`catalog_products`.`price`, `catalog_products`.`sale`,
`catalog_products`.`original_categories`,
`catalog_products`.`original_conv_color`, `catalog_products`.`original_sizes`
FROM `catalog_products`
INNER JOIN `catalog_products_category` ON (`catalog_products`.`id` =
`catalog_products_category`.`products_id`)
WHERE (`catalog_products`.`deleted` = 0
AND `catalog_products`.`in_stock` = 1
AND `catalog_products_category`.`categories_id` = 183)
ORDER BY `catalog_products`.`title` ASC LIMIT 48
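As you can see, the queries take a very long time. The tricky part is that when I apply filters, e.g. select a color and a size filter, the time starts to go down.
Queries with filters applied
Count Query: 264.63 ms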
SELECT COUNT(*) FROM (
SELECT DISTINCT `catalog_products`.`id` AS Col1
FROM `catalog_products`
INNER JOIN `catalog_products_color` ON (`catalog_products`.`id` =
`catalog_products_color`.`products_id`)
INNER JOIN `catalog_products_category` ON (`catalog_products`.`id` =
`catalog_products_category`.`products_id`)
INNER JOIN `catalog_sizethrough` ON (`catalog_products`.`id` =
`catalog_sizethrough`.`product_id`)
WHERE (`catalog_products`.`deleted` = 0
AND `catalog_products`.`in_stock` = 1
AND `catalog_products_color`.`color_id` = 1
AND `catalog_products_category`.`categories_id` = 183
AND `catalog_sizethrough`.`size_id` IN (262)
AND `catalog_sizethrough`.`stock` = 1)
) subquery
Result Query: 351.43 ms
SELECT DISTINCT `catalog_products`.`id`, `catalog_products`.`sku`,
`catalog_products`.`title`, `catalog_products`.`old_price`,
`catalog_products`.`price`, `catalog_products`.`sale`,
`catalog_products`.`original_categories`,
`catalog_products`.`original_conv_color`,
`catalog_products`.`original_sizes`
FROM `catalog_products`
INNER JOIN `catalog_products_color` ON (`catalog_products`.`id` =
`catalog_products_color`.`products_id`)
INNER JOIN `catalog_products_category` ON (`catalog_products`.`id` =
`catalog_products_category`.`products_id`)
INNER JOIN `catalog_sizethrough` ON (`catalog_products`.`id` =
`catalog_sizethrough`.`product_id`)
WHERE (`catalog_products`.`deleted` = 0
AND `catalog_products`.`in_stock` = 1
AND `catalog_products_color`.`color_id` = 1
AND `catalog_products_category`.`categories_id` = 183
AND `catalog_sizethrough`.`size_id` IN (262)
AND `catalog_sizethrough`.`stock` = 1)
ORDER BY `catalog_products`.`title` ASC
LIMIT 48
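I have tried many things but could not solve this. I need to improve my page load speed, and the time these queries take makes for a poor user experience.
I am already using eager loading, so that will not help any further, unless you have something to add.
Code
Serializer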
from rest_framework import serializers

# ImagesSerializer and Products are defined elsewhere in the project (not shown).
class ProductsListSerializer(serializers.ModelSerializer):
    images = ImagesSerializer(many=True, source='get_first_two_images')
    related_color = serializers.SerializerMethodField()

    def get_related_color(self, obj):
        # Number of related color variants for this product.
        return obj.related_color.count()

    class Meta:
        fields = (
            'id',
            'sku',
            "title",
            "old_price",
            "price",
            "sale",
            "images",
            "original_categories",
            "related_color",
            "original_conv_color",
            "original_sizes",
        )
        model = Products

    @staticmethod
    def setup_eager_loading(queryset):
        # Load only the listed columns and prefetch the M2M relations.
        queryset = queryset.only(
            'id', 'sku', 'title', 'old_price', 'price', 'sale',
            'original_categories', 'original_conv_color', 'original_sizes',
        ).prefetch_related('images', 'related_color')
        return queryset
View
from django_filters.rest_framework import DjangoFilterBackend
from rest_framework import filters, viewsets
from rest_framework.permissions import DjangoModelPermissionsOrAnonReadOnly

# CustomFilter, SizeFilter, StandardResultsSetPagination and ProductsSerializer
# are defined elsewhere in the project (not shown).
class ProductsViewSet(viewsets.ReadOnlyModelViewSet):
    queryset = Products.objects.all()
    permission_classes = [DjangoModelPermissionsOrAnonReadOnly]
    filter_backends = (filters.SearchFilter, DjangoFilterBackend, filters.OrderingFilter, CustomFilter, SizeFilter)
    filter_fields = ('slug', 'code', 'sku', 'color', 'attributes', 'category', 'original_color')
    min_max_fields = ('price', 'sale')
    search_fields = ('title', 'original_color', 'original_categories', 'original_conv_color', 'original_sizes')
    ordering_fields = ('sale', 'price', 'created_at')
    pagination_class = StandardResultsSetPagination

    def get_queryset(self):
        # The list view only shows live, in-stock products with eager loading applied.
        if self.action == 'list':
            queryset = self.get_serializer_class().setup_eager_loading(
                self.queryset.filter(deleted=0, in_stock=1)
            )
            return queryset
        return self.queryset

    def get_serializer_class(self):
        if self.action == 'list':
            return ProductsListSerializer
        if self.action == 'retrieve':
            return ProductsSerializer
        return ProductsSerializer
Just a suggestion.
Looking at your query code, make sure you have proper composite indexes on:
table catalog_products index on (deleted, in_stock, id)
table catalog_products_category index on (categories_id, products_id, id)
and avoid the useless () around the conditions:
SELECT COUNT(*)
FROM (
SELECT DISTINCT `catalog_products`.`id` AS Col1
FROM `catalog_products`
INNER JOIN `catalog_products_category`
ON `catalog_products`.`id` = `catalog_products_category`.`products_id`
WHERE `catalog_products`.`deleted` = 0
AND `catalog_products`.`in_stock` = 1
AND `catalog_products_category`.`categories_id` = 183
) subquery
SELECT DISTINCT `catalog_products`.`id`
, `catalog_products`.`sku`
, `catalog_products`.`title`
, `catalog_products`.`old_price`
, `catalog_products`.`price`
, `catalog_products`.`sale`
, `catalog_products`.`original_categories`
, `catalog_products`.`original_conv_color`
, `catalog_products`.`original_sizes`
FROM `catalog_products`
INNER JOIN `catalog_products_category`
ON `catalog_products`.`id` = `catalog_products_category`.`products_id`
WHERE `catalog_products`.`deleted` = 0
AND `catalog_products`.`in_stock` = 1
AND `catalog_products_category`.`categories_id` = 183
ORDER BY `catalog_products`.`title` ASC LIMIT 48
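As a final suggestion, keep in mind that ORDER BY has a considerable impact on performance: even though LIMIT restricts the result to 48 rows, all matching rows still have to be selected and sorted before the limit can be applied.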
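The point about ORDER BY combined with LIMIT can be illustrated with a small sketch. It uses SQLite from the Python standard library as a stand-in for MySQL, and the `products` table and `first_titles` helper are hypothetical names made up for the illustration. The row that sorts first is inserted last, so the engine cannot stop after reading `limit` rows; unless an index already delivers rows in title order, it has to look at every matching row.

```python
import sqlite3


def first_titles(titles, limit):
    """Return the first `limit` titles in ascending order via ORDER BY ... LIMIT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO products (title) VALUES (?)", [(t,) for t in titles]
    )
    rows = conn.execute(
        "SELECT title FROM products ORDER BY title ASC LIMIT ?", (limit,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # "Apron" is inserted last, yet it comes back first: the limit is applied
    # only after all rows have been ordered.
    print(first_titles(["Dress", "Coat", "Belt", "Apron"], 2))
```

With 34633 matching rows in the Dresses category, the same reasoning suggests that the sort, not the 48-row page itself, is likely where much of the time goes.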
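Honestly, optimizing your queries looks very feasible. I am sure it is a matter of using the right indexes.
I don't know all the details of the column selectivity for each table (which is important), so I will assume that categories_id = 183 actually filters out most of the rows; I could be wrong. I will also assume that all the related tables (catalog_products_category, catalog_products_color, and catalog_sizethrough) have similar selectivity.
If that is the case, I would recommend the following indexes to speed up the search: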
create index ix1 on catalog_products_category (categories_id, products_id);
create index ix2 on catalog_products_color (color_id, products_id);
create index ix3 on catalog_sizethrough (size_id, stock, product_id);
create index ix4 on catalog_products (deleted, in_stock, id);
Give it a try. If your queries are still slow, please post the execution plan of the slowest one.
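As a rough sketch of the "create the index, then check the execution plan" workflow, the snippet below builds two of the tables in an in-memory SQLite database, adds the ix1 and ix4 indexes suggested above, and asks the engine how it would run a category lookup. SQLite is only a stand-in here: its planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, and the column types, sample rows, and helper names (`build_demo_db`, `explain`) are assumptions made for the illustration.

```python
import sqlite3

# Simplified, assumed schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE catalog_products (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0,
    in_stock INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE catalog_products_category (
    id INTEGER PRIMARY KEY,
    products_id INTEGER NOT NULL,
    categories_id INTEGER NOT NULL
);
-- Composite indexes as suggested in the answer above.
CREATE INDEX ix1 ON catalog_products_category (categories_id, products_id);
CREATE INDEX ix4 ON catalog_products (deleted, in_stock, id);
"""

CATEGORY_LOOKUP = (
    "SELECT products_id FROM catalog_products_category "
    "WHERE categories_id = ?"
)


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO catalog_products (id, title, deleted, in_stock) "
        "VALUES (?, ?, ?, ?)",
        [(1, "Red dress", 0, 1), (2, "Blue dress", 0, 1), (3, "Old dress", 1, 1)],
    )
    conn.executemany(
        "INSERT INTO catalog_products_category (products_id, categories_id) "
        "VALUES (?, ?)",
        [(1, 183), (2, 183), (3, 183), (2, 7)],
    )
    conn.commit()
    return conn


def explain(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    # Each printed line describes one step of the plan, including any index used.
    for detail in explain(conn, CATEGORY_LOOKUP, (183,)):
        print(detail)
```

On MySQL the equivalent check is to prefix the slow query with EXPLAIN and read the key column to see which index, if any, was chosen.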