SQL Server indexing includes questions

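I have been having trouble with some poorly performing SQL calls in an application at work. I have been reading up on indexing, tuning, and benchmarking. Here are some rules I have gathered (please let me know if they sound right):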

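One thing I am having trouble finding information on: what happens if a query selects columns that are not part of any index, but its WHERE clause uses a column that is indexed? Is the index used, with the leaf node pointing to the table so the associated row can be read?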

Example table:

Id col1 col2 col3

CREATE INDEX my_index
ON my_table (col1)

SELECT Id, col1, col2, col3
FROM my_table
WHERE col1 >= 3 AND col1 <= 6

Is my_index used here? If so, how does it resolve Id, col2, and col3? Does it point to the table row and fetch the values from there?

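To answer your question: yes, my_index can be used, although the optimizer may still choose to scan the table if the range matches a large share of the rows. And yes, the index points back to the table row, and the Id, col2, and col3 values are read from there. That is what an index does.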
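One way to check this on your own data, rather than take it on faith, is to compare the logical reads for the query as the optimizer runs it with the reads when the index is forced. This is only a sketch, assuming the my_table and my_index definitions from the question; the index hint is there for comparison and is not a recommendation.

```sql
-- Report logical reads per table for the statements that follow.
SET STATISTICS IO ON;

-- Let the optimizer choose: it may seek on my_index or scan the table.
SELECT Id, col1, col2, col3
FROM my_table
WHERE col1 >= 3 AND col1 <= 6;

-- Force my_index so the two read counts can be compared.
SELECT Id, col1, col2, col3
FROM my_table WITH (INDEX (my_index))
WHERE col1 >= 3 AND col1 <= 6;

SET STATISTICS IO OFF;
```

If both statements report similar reads, the optimizer was most likely already using the index. The actual execution plan shows this directly as an Index Seek followed by a Key Lookup or RID Lookup.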

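About your 'rules':

  • Rule 1 makes sense, except that I usually do not 'include' other columns in an index. As mentioned above, the index will go back to the table and quickly retrieve the rows you need.

  • Rule 2, I do not really understand. You create the indexes, and SQL Server decides which ones to use and which to ignore. You really do not have to worry about it.

  • Rule 3, the order does not make any difference.

Hope this helps.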

From dba.stackexchange.com

There are a few concepts and terms that are important to understand when dealing with indexes. Seeks, scans, and lookups are some of the ways that indexes will be utilized through select statements. Selectivity of key columns is integral to determining how effective an index can be.

A seek happens when the SQL Server Query Optimizer determines that the best way to find the data you have requested is to navigate the index directly to a single row or a range of rows. Seeks are possible when the query's predicates are on the leading columns of the index key. A query is "covered" by an index when, in addition, every column it returns is either in the key or included, so no further access to the table is needed. A scan happens when the SQL Server Query Optimizer determines that the best way to find the data is to read the entire index and then filter the results. A lookup typically occurs when an index does not contain all requested columns, either in the index key or in the included columns. The query optimizer will then use either the clustering key (against a clustered index) or the RID (against a heap) to look up the other requested columns.
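To make the idea of covering concrete, here is a minimal sketch using the my_table example from the question. The index name my_index_covering is made up for illustration, and the sketch assumes Id is the clustered primary key, in which case it is carried in every nonclustered index automatically and does not need to be listed.

```sql
-- Key column col1 supports the range seek; col2 and col3 are stored
-- only at the leaf level so the query below needs no lookup.
CREATE NONCLUSTERED INDEX my_index_covering
ON my_table (col1)
INCLUDE (col2, col3);

-- All four columns can now be answered from the index alone
-- (Id is available as the clustering key, under the assumption above).
SELECT Id, col1, col2, col3
FROM my_table
WHERE col1 >= 3 AND col1 <= 6;
```

The trade-off is a wider index: included columns take extra space and must be maintained whenever col2 or col3 changes.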

Typically, seek operations are more efficient than scans because they read a smaller set of data. There are situations where this is not the case, such as a very small initial data set, but that goes beyond the scope of your question.

Now, you asked how to determine how effective an index is, and there are a few things to keep in mind. A clustered index's key columns are called the clustering key. This is how records are identified uniquely in the context of a clustered index. On a table with a clustered index, every nonclustered index includes the clustering key by default, so that lookups can be performed when necessary. Every index on a table must also be maintained by DML: each insert and delete touches all of them, and each update touches every index that contains a modified column. For that reason, it is best to balance performance gains in select statements against performance hits in insert, delete, and update statements.
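A rough way to see that balance for an existing table is to compare how often each index is read with how often it is written. The query below is a sketch against the sys.dm_db_index_usage_stats view; my_table is the example table from the question. The counters are reset when the instance restarts, and reading them requires the VIEW SERVER STATE permission.

```sql
-- Reads (seeks, scans, lookups) versus writes (updates) per index on my_table.
SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name                   AS index_name,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
    ON  s.object_id   = i.object_id
    AND s.index_id    = i.index_id
    AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'my_table');
```

An index with a high user_updates count and few or no seeks, scans, or lookups is costing writes without helping reads, which makes it a candidate for review.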

In order to determine how effective an index is, you must determine the selectivity of your index keys. Selectivity can be defined as the ratio of distinct values to total records, expressed as a percentage. If I have a [person] table with 100 total records and the [first_name] column contains 90 distinct values, we can say that the [first_name] column is 90% selective. The higher the selectivity, the more efficient the index key. Keeping selectivity in mind, it is generally best to put your most selective columns first in your index key. Using my previous [person] example, what if we had a [last_name] column that was 95% selective? We would want to create an index with [last_name], [first_name] as the index key.
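The selectivity figures can be measured rather than guessed. The following sketch assumes a [person] table with [first_name] and [last_name] columns as in the example. The index name ix_person_last_first is made up for illustration.

```sql
-- Selectivity as distinct values divided by total rows, in percent.
-- NULLIF avoids a divide-by-zero error on an empty table.
SELECT
    COUNT(*)                   AS total_rows,
    COUNT(DISTINCT first_name) AS distinct_first_names,
    COUNT(DISTINCT last_name)  AS distinct_last_names,
    100.0 * COUNT(DISTINCT first_name) / NULLIF(COUNT(*), 0) AS first_name_selectivity_pct,
    100.0 * COUNT(DISTINCT last_name)  / NULLIF(COUNT(*), 0) AS last_name_selectivity_pct
FROM [person];

-- If last_name turns out to be the more selective column, lead with it.
CREATE NONCLUSTERED INDEX ix_person_last_first
ON [person] (last_name, first_name);
```

Keep in mind that an index keyed on (last_name, first_name) can only be used for a seek when the query filters on last_name, so the predicates your queries actually use matter as well as the raw selectivity numbers.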

I know this was a somewhat long-winded answer, but a lot of things go into determining how effective an index will be, and there are a lot of costs you must weigh any performance gains against.