如何在聚合查询中使用索引?
How is an index used in an query with aggregation?
给定这样的查询
SELECT franchise, MAX(worth)
FROM figurines
GROUP BY franchise
什么样的索引可以加快查询速度,数据库将如何使用该索引?
如果需要更多详细信息,请假设列 franchise
的基数相对较低,而 worth
的基数非常高。
我个人正在使用 mysql,但我正在寻找对算法的一般理解,而不是特定于供应商的实现细节。
场景一:无索引(阅读全文table)
foreach(page in table.pages)
{
foreach(row in page.rows)
{
Compare and accumulate franchise and worth from row
}
}
-- Total IO = table.pages
情形 2:仅特许经营指数
foreach(page in index.pages)
{
foreach(indexRow in page.rows)
{
tableRow = table.fetchRow(indexRow); // + 1 page of IO for each row
Compare and accumulate franchise from indexRow and worth from tableRow
}
}
-- Total IO = index.pages + table.rows
-- this is likely to be greater than Scenario 1...
-- so optimizer should prefer that plan instead.
场景 3:按顺序覆盖指数(特许经营权、价值)。
foreach(page in index.pages)
{
foreach(row in page.rows)
{
Compare and accumulate franchise and worth from row
}
}
-- Total IO = index.pages
-- Assuming that index is thinner than table, a win!
场景 4:使用场景 3 中索引的已知特许经营列表的不同查询
foreach(franchise in franchises)
{
SELECT MAX(worth) FROM figurines WHERE franchise = franchise
}
...
foreach(franchise in franchises)
{
search into the index looking for the last record with this franchise
// this is usually less than 10 pages of IO in my experience.
}
-- Total IO = count of franchise * 10
-- super win!
场景 4 不同,因为它记录的是查找而不是扫描。
给定这样的查询
SELECT franchise, MAX(worth)
FROM figurines
GROUP BY franchise
什么样的索引可以加快查询速度,数据库将如何使用该索引?
如果需要更多详细信息,请假设列 franchise
的基数相对较低,而 worth
的基数非常高。
我个人正在使用 mysql,但我正在寻找对算法的一般理解,而不是特定于供应商的实现细节。
场景一:无索引(阅读全文table)
foreach(page in table.pages)
{
foreach(row in page.rows)
{
Compare and accumulate franchise and worth from row
}
}
-- Total IO = table.pages
情形 2:仅特许经营指数
foreach(page in index.pages)
{
foreach(indexRow in page.rows)
{
tableRow = table.fetchRow(indexRow); // + 1 page of IO for each row
Compare and accumulate franchise from indexRow and worth from tableRow
}
}
-- Total IO = index.pages + table.rows
-- this is likely to be greater than Scenario 1...
-- so optimizer should prefer that plan instead.
场景 3:按顺序覆盖指数(特许经营权、价值)。
foreach(page in index.pages)
{
foreach(row in page.rows)
{
Compare and accumulate franchise and worth from row
}
}
-- Total IO = index.pages
-- Assuming that index is thinner than table, a win!
场景 4:使用场景 3 中索引的已知特许经营列表的不同查询
foreach(franchise in franchises)
{
SELECT MAX(worth) FROM figurines WHERE franchise = franchise
}
...
foreach(franchise in franchises)
{
search into the index looking for the last record with this franchise
// this is usually less than 10 pages of IO in my experience.
}
-- Total IO = count of franchise * 10
-- super win!
场景 4 不同,因为它记录的是查找而不是扫描。