What would cause a MySQL query based on MONTH() and YEAR() to be faster than querying against its numeric equivalent directly?

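I am trying to speed up a SQL query in MySQL 5.7 by replacing the YEAR() and MONTH() function calls with numeric equivalents that are computed at insert time. Specifically, I added the columns reportMonth and reportYear, both bigint(20), for this purpose.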

Interestingly, this approach is much slower. Why? Shouldn't a query that calls fewer functions run faster?

This takes about 12 seconds to complete (using the YEAR() and MONTH() functions):

SELECT 
   ProductTitle AS 'ProductTitle',  
   YEAR(ReportPeriodEndDay) AS 'Year',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 1  THEN OrderedRevenue END) AS 'Jan',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 2  THEN OrderedRevenue END) AS 'Feb',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 3  THEN OrderedRevenue END) AS 'Mar',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 4  THEN OrderedRevenue END) AS 'Apr',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 5  THEN OrderedRevenue END) AS 'May',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 6  THEN OrderedRevenue END) AS 'Jun',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 7  THEN OrderedRevenue END) AS 'Jul',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 8  THEN OrderedRevenue END) AS 'Aug',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 9  THEN OrderedRevenue END) AS 'Sep',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 10 THEN OrderedRevenue END) AS 'Oct',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 11 THEN OrderedRevenue END) AS 'Nov',
   SUM(CASE WHEN MONTH(ReportPeriodEndDay) = 12 THEN OrderedRevenue END) AS 'Dec',
   SUM(OrderedRevenue) AS 'TOTAL'
 FROM 
   `sales_diagnostic_summary_orderedrevenuelevel`
 GROUP BY ProductTitle, Year
 WITH ROLLUP;

EXPLAIN result:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, sales_diagnostic_summary_orderedrevenuelevel, , ALL, , , , , 745140, 100.00, Using temporary; Using filesort

This takes over 120 seconds (using the numeric equivalents):

SELECT 
   ProductTitle AS 'ProductTitle',  
   reportYear AS 'Year',
   SUM(CASE WHEN reportMonth = 1  THEN OrderedRevenue END) AS 'Jan',
   SUM(CASE WHEN reportMonth = 2  THEN OrderedRevenue END) AS 'Feb',
   SUM(CASE WHEN reportMonth = 3  THEN OrderedRevenue END) AS 'Mar',
   SUM(CASE WHEN reportMonth = 4  THEN OrderedRevenue END) AS 'Apr',
   SUM(CASE WHEN reportMonth = 5  THEN OrderedRevenue END) AS 'May',
   SUM(CASE WHEN reportMonth = 6  THEN OrderedRevenue END) AS 'Jun',
   SUM(CASE WHEN reportMonth = 7  THEN OrderedRevenue END) AS 'Jul',
   SUM(CASE WHEN reportMonth = 8  THEN OrderedRevenue END) AS 'Aug',
   SUM(CASE WHEN reportMonth = 9  THEN OrderedRevenue END) AS 'Sep',
   SUM(CASE WHEN reportMonth = 10 THEN OrderedRevenue END) AS 'Oct',
   SUM(CASE WHEN reportMonth = 11 THEN OrderedRevenue END) AS 'Nov',
   SUM(CASE WHEN reportMonth = 12 THEN OrderedRevenue END) AS 'Dec',
   SUM(OrderedRevenue) AS 'TOTAL'
 FROM 
   `sales_diagnostic_summary_orderedrevenuelevel`
 GROUP BY ProductTitle, Year
 WITH ROLLUP;

EXPLAIN result:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, sales_diagnostic_summary_orderedrevenuelevel, , ALL, , , , , 745140, 100.00, Using filesort

Table definition via DESCRIBE:

# Field, Type, Null, Key, Default, Extra
ASIN, text, YES, MUL, , 
ProductTitle, text, YES, , , 
OrderedRevenue, double, YES, , , 
OrderedRevenuePercentOfTotal, double, YES, , , 
OrderedRevenuePriorPeriod, double, YES, , , 
OrderedRevenueLastYear, double, YES, , , 
OrderedUnits, double, YES, , , 
OrderedUnitsPercentOfTotal, double, YES, , , 
OrderedUnitsPriorPeriod, double, YES, , , 
OrderedUnitsLastYear, double, YES, , , 
SubcategorySalesRank, bigint(20), YES, , , 
SubcategoryBetterWorse, double, YES, , , 
AverageSalesPrice, double, YES, , , 
AverageSalesPricePriorPeriod, double, YES, , , 
ChangeInGVPriorPeriod, double, YES, , , 
ChangeInGVLastYear, double, YES, , , 
RepOOS, double, YES, , , 
RepOOSPercentOfTotal, double, YES, , , 
RepOOSPriorPeriod, double, YES, , , 
LBBPrice, double, YES, , , 
ReportPeriodStartDay, datetime, YES, , , 
ReportPeriodEndDay, datetime, YES, , , 
ReportDownloadDate, datetime, YES, , , 
ReportPeriod, text, YES, , , 
ReportFilename, text, YES, , , 
marketplace, text, YES, , , 
vendorId, text, YES, , , 
reportYear, bigint(20), YES, MUL, , 
reportMonth, bigint(20), YES, MUL, , 
reportWeek, bigint(20), YES, , , 
reportQuarter, bigint(20), YES, , , 
reportDayOfWeek, bigint(20), YES, , , 
reportDayOfYear, bigint(20), YES, , , 

It looks as if some optimization is tied to the YEAR function, one that DESCRIBE/EXPLAIN knows nothing about (which is logical).

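The way I would have implemented it: when YEAR is called and it notices that MONTH is also being called, it does some additional binning on the month value. That part of the work is then already done, which is better than going through a CASE on an unrelated field (the fact that it is called reportMonth does not make it related).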

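Since a year never has more than 12 months, this seems a worthwhile optimization: it would not use much memory and the potential payoff is considerable.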

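If each product has a large number of sales rows, you could try grouping directly by reportYear and reportMonth first, and then run the CASE pivot in a wrapping SELECT. Something like: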

SELECT 
    ProductTitle,  
    reportYear AS `Year`,
    SUM(IF(reportMonth = 1, OrderedRevenue, 0)) AS 'Jan',
    ...
    SUM(IF(reportMonth = 12, OrderedRevenue, 0)) AS 'Dec',
    SUM(OrderedRevenue) AS 'TOTAL'
FROM (
    -- First pass: one row per product, year and month
    SELECT ProductTitle, 
        reportYear, 
        reportMonth, 
        SUM(OrderedRevenue) AS OrderedRevenue
    FROM
        `sales_diagnostic_summary_orderedrevenuelevel`
    GROUP BY ProductTitle, reportYear, reportMonth
) AS firstGrouping
-- Second pass: pivot the monthly rows into columns
GROUP BY ProductTitle, reportYear;

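Quite possibly, an index such as

-- ProductTitle is TEXT, so MySQL requires a key prefix length here;
-- 100 is an assumed value, adjust it to your data.
CREATE INDEX myIndex ON
sales_diagnostic_summary_orderedrevenuelevel(ProductTitle(100), 
   reportYear, reportMonth, OrderedRevenue);

would help: it costs something on every UPDATE/DELETE/INSERT, but this kind of SELECT should improve. You might also want to try the DATE version of the double SELECT, with matching indexing, on for size.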
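For reference, the DATE version of the double SELECT could look roughly like the sketch below. It is untested against your data; the aliases yr, mon and MonthlyRevenue are names I made up for illustration, and WITH ROLLUP is carried over from your original query. Note that IF(..., 0) yields 0 for a month without rows, whereas your CASE without ELSE yields NULL.

```sql
-- Sketch only: two-stage aggregation driven by ReportPeriodEndDay.
-- yr, mon and MonthlyRevenue are illustrative aliases, not existing columns.
SELECT
    ProductTitle,
    yr AS `Year`,
    SUM(IF(mon = 1,  MonthlyRevenue, 0)) AS 'Jan',
    SUM(IF(mon = 2,  MonthlyRevenue, 0)) AS 'Feb',
    SUM(IF(mon = 3,  MonthlyRevenue, 0)) AS 'Mar',
    SUM(IF(mon = 4,  MonthlyRevenue, 0)) AS 'Apr',
    SUM(IF(mon = 5,  MonthlyRevenue, 0)) AS 'May',
    SUM(IF(mon = 6,  MonthlyRevenue, 0)) AS 'Jun',
    SUM(IF(mon = 7,  MonthlyRevenue, 0)) AS 'Jul',
    SUM(IF(mon = 8,  MonthlyRevenue, 0)) AS 'Aug',
    SUM(IF(mon = 9,  MonthlyRevenue, 0)) AS 'Sep',
    SUM(IF(mon = 10, MonthlyRevenue, 0)) AS 'Oct',
    SUM(IF(mon = 11, MonthlyRevenue, 0)) AS 'Nov',
    SUM(IF(mon = 12, MonthlyRevenue, 0)) AS 'Dec',
    SUM(MonthlyRevenue) AS 'TOTAL'
FROM (
    -- First pass: collapse the raw rows to one row per product and month
    SELECT
        ProductTitle,
        YEAR(ReportPeriodEndDay)  AS yr,
        MONTH(ReportPeriodEndDay) AS mon,
        SUM(OrderedRevenue)       AS MonthlyRevenue
    FROM `sales_diagnostic_summary_orderedrevenuelevel`
    GROUP BY ProductTitle, YEAR(ReportPeriodEndDay), MONTH(ReportPeriodEndDay)
) AS firstGrouping
-- Second pass: pivot at most 12 rows per product and year into columns
GROUP BY ProductTitle, yr
WITH ROLLUP;
```

Running EXPLAIN on this and on the reportYear/reportMonth variant should show whether either one avoids the full scan or the filesort.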

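Also, I see no reason to store the year, month and week as BIGINT. It will not make much difference in performance or storage, but it still smells a bit off to me.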
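If you want to narrow those columns, something along these lines should do it. This is a sketch: the target types are my assumption of what fits (a year fits in SMALLINT, a month or week number in TINYINT), and the ALTER may rebuild the table, so try it on a copy first.

```sql
-- Sketch only: shrink the date-part columns from BIGINT to smaller integer types.
-- The chosen types are assumptions; the columns stay nullable as in DESCRIBE.
ALTER TABLE `sales_diagnostic_summary_orderedrevenuelevel`
    MODIFY reportYear  SMALLINT UNSIGNED NULL,  -- e.g. 2017
    MODIFY reportMonth TINYINT  UNSIGNED NULL,  -- 1..12
    MODIFY reportWeek  TINYINT  UNSIGNED NULL;  -- week number of the year
```

The remaining report* columns could be treated the same way if you keep them.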