MySQL Optimization: Multiple Modes (Most Common Value) in a Single Query

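One feature of my real estate website lets users subscribe to specific markets and receive periodic updates by email (called "market analyses"). The analyses need to compute some values as the mode (the most common value). After some research, I learned that MySQL does not have a MODE() function, in part because a column can have several modes or none at all, and because you cannot even get a single mode unless the column has at least two values.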

Which leads me to this query:

SELECT AVG(Price) as AveragePrice,
AVG(BedroomsTotal) as AverageNumberOfBedrooms,
AVG(BathroomsTotal) as AverageNumberOfBathrooms,
AVG(SquareFeetTotal) as AverageSquareFeetTotal,
AVG(LotSize) as AverageLotSize,
AVG(AssociationFee) as AverageAssociationFee,
(SELECT PropertyType FROM (SELECT PropertyType, count(PropertyType) AS magnitude
FROM listings
GROUP BY PropertyType
ORDER BY magnitude DESC
LIMIT 1) as mpt) as MajorityPropertyType,
(SELECT magnitude FROM (SELECT PropertyType, count(PropertyType) AS magnitude
FROM listings
GROUP BY PropertyType
ORDER BY magnitude DESC
LIMIT 1) as mptc) as MajorityPropertyTypeCount,
(SELECT ArchitecturalStyle FROM (SELECT ArchitecturalStyle, count(ArchitecturalStyle) AS magnitude
FROM listings
GROUP BY ArchitecturalStyle
ORDER BY magnitude DESC
LIMIT 1) as mas) as MajorityArchitecturalStyle,
(SELECT magnitude FROM (SELECT ArchitecturalStyle, count(ArchitecturalStyle) AS magnitude
FROM listings
GROUP BY ArchitecturalStyle
ORDER BY magnitude DESC
LIMIT 1) as masc) as MajorityArchitecturalStyleCount,
AVG(YearBuilt) as AverageYearBuilt,
(SELECT PropertyCondition FROM (SELECT PropertyCondition, count(PropertyCondition) AS magnitude
FROM listings
GROUP BY PropertyCondition
ORDER BY magnitude DESC
LIMIT 1) as mpc) as MajorityPropertyCondition,
(SELECT magnitude FROM (SELECT PropertyCondition, count(PropertyCondition) AS magnitude
FROM listings
GROUP BY PropertyCondition
ORDER BY magnitude DESC
LIMIT 1) as mpcc) as MajorityPropertyConditionCount
FROM srep.active_listings 
WHERE concat(City, ', ', StateOrProvince)
LIKE "Boston, MA";

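This query works fine, but it takes 10 seconds to execute, has a query cost of 11,000, and does not even include a fraction of the conditional statements that belong in the WHERE clause. There are 18 more conditional statements still to be added.

Question: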

How can I optimize this query? Should I be using a newer version of MySQL? Should I be using a different database altogether?

Current execution plan

Results

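Use another language (such as Python) to get the mode of a column. Below is an example that exposes the result through a web API. You must install the mysqlclient and flask packages before running this code.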

App.py

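import MySQLdb
import MySQLdb.cursors
from statistics import mode
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/')
def bar():
  # DictCursor returns each row as a dict keyed by column name.
  conn = MySQLdb.connect('localhost', user='root', cursorclass=MySQLdb.cursors.DictCursor)
  try:
    cursor = conn.cursor()
    cursor.execute('SELECT ArchitecturalStyle FROM srep.active_listings')
    data = cursor.fetchall()
  finally:
    # Close the connection on every request, even if the query fails.
    conn.close()
  # Compute the mode in Python instead of in SQL.
  # statistics.mode raises StatisticsError if the list is empty.
  values = [obj['ArchitecturalStyle'] for obj in data]
  return jsonify({"ArchitecturalStyle": mode(values)})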

Test run using all 10 attributes

As you can see, Python takes about 1/10 of the time that MySQL needs to get the same result.
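To extend this to several columns without one query per attribute, the counting can be done in a single pass over the fetched rows. The following is only a sketch: `column_modes` and `sample_rows` are hypothetical names, and the sample rows stand in for what `cursor.fetchall()` would return with a DictCursor. It skips NULL values the way `COUNT(column)` does and also returns the count, which the original query reports as the `...Count` columns.

```python
from collections import Counter

def column_modes(rows, columns):
    """Return {column: (most_common_value, count)} from one pass over rows."""
    counters = {col: Counter() for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is not None:  # skip NULLs, as COUNT(column) does in SQL
                counters[col][value] += 1
    # A column with no non-NULL values has no mode; report (None, 0) for it.
    return {
        col: (counter.most_common(1)[0] if counter else (None, 0))
        for col, counter in counters.items()
    }

# Hypothetical sample rows standing in for cursor.fetchall() with a DictCursor.
sample_rows = [
    {"PropertyType": "Condo", "ArchitecturalStyle": "Colonial"},
    {"PropertyType": "Condo", "ArchitecturalStyle": None},
    {"PropertyType": "Single Family", "ArchitecturalStyle": "Colonial"},
]
modes = column_modes(sample_rows, ["PropertyType", "ArchitecturalStyle"])
print(modes)
```

When two values tie, `Counter.most_common` returns the one that was counted first, so the result for a tied column depends on row order, much like `LIMIT 1` without a tie-breaker in SQL.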

One improvement is to get rid of half of the subqueries:

    (   SELECT  PropertyType
            FROM  
            (
                SELECT  PropertyType, count(PropertyType) AS magnitude
                    FROM  listings
                    GROUP BY  PropertyType
                    ORDER BY  magnitude DESC
                    LIMIT  1) as mpt
    ) as MajorityPropertyType,

-->

    (   SELECT  PropertyType
                    FROM  listings
                    GROUP BY  PropertyType
                    ORDER BY  COUNT(*) DESC
                    LIMIT  1
    ) as MajorityPropertyType

This particular query needs INDEX(PropertyType) (unless it is already the PRIMARY KEY).
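As a quick sanity check that the flattened subquery picks the same value as the nested one, here is a small sketch that runs both forms against an in-memory SQLite database. SQLite is only a stand-in so the example is self-contained; the table contents and the index name `idx_property_type` are made up for illustration, and MySQL's optimizer may behave differently.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL listings table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (PropertyType TEXT)")
conn.executemany(
    "INSERT INTO listings (PropertyType) VALUES (?)",
    [("Condo",), ("Condo",), ("Single Family",), ("Condo",), ("Land",)],
)
# Hypothetical index name; corresponds to INDEX(PropertyType) in the answer.
conn.execute("CREATE INDEX idx_property_type ON listings (PropertyType)")

# Original form: derived table wrapped in an outer SELECT.
nested = conn.execute("""
    SELECT PropertyType FROM (
        SELECT PropertyType, COUNT(PropertyType) AS magnitude
        FROM listings
        GROUP BY PropertyType
        ORDER BY magnitude DESC
        LIMIT 1) AS mpt
""").fetchone()[0]

# Simplified form: order by the aggregate directly.
flat = conn.execute("""
    SELECT PropertyType
    FROM listings
    GROUP BY PropertyType
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()[0]

print(nested, flat)
conn.close()
```

One difference to keep in mind: `COUNT(PropertyType)` ignores NULLs while `COUNT(*)` counts them, so if NULL is the most frequent entry in the column, the simplified form would return NULL. Ordering by `COUNT(PropertyType)` instead keeps the original behavior.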

Another improvement is to avoid hiding indexed columns inside function calls:

WHERE concat(City, ', ', StateOrProvince) LIKE "Boston, MA"

-->

WHERE City = 'Boston' AND StateOrProvince = 'MA'

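This goes together with a composite INDEX(City, StateOrProvince) (the columns can be in either order). That avoids scanning the entire table and instead looks only at the Boston, MA rows.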
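The effect of wrapping indexed columns in an expression can be seen in a query plan. The sketch below uses an in-memory SQLite database purely as a self-contained stand-in (in MySQL you would run `EXPLAIN` on the real query instead); the index name `idx_city_state` and the helper `plan` are hypothetical, and SQLite's `||` operator takes the place of `concat()`.

```python
import sqlite3

# In-memory SQLite stand-in for srep.active_listings (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE active_listings (City TEXT, StateOrProvince TEXT, Price REAL)"
)
# Hypothetical index name; corresponds to INDEX(City, StateOrProvince).
conn.execute(
    "CREATE INDEX idx_city_state ON active_listings (City, StateOrProvince)"
)

def plan(sql):
    """Return the detail strings reported by EXPLAIN QUERY PLAN for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Predicate on an expression built from the columns: the index cannot be used.
wrapped = plan(
    "SELECT AVG(Price) FROM active_listings "
    "WHERE City || ', ' || StateOrProvince LIKE 'Boston, MA'"
)
# Predicate on the bare columns: the composite index can be searched.
direct = plan(
    "SELECT AVG(Price) FROM active_listings "
    "WHERE City = 'Boston' AND StateOrProvince = 'MA'"
)
print(wrapped)
print(direct)
```

The first plan should report a scan of the whole table, while the second should report a search using the composite index. The exact wording of the plan text varies between SQLite versions.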

Even if there were a MODE function, it probably would not be any faster; it would essentially have to do what your code does.