|
Missing or
incorrect application of indexes within a MS SQL server 2000 db can decrease
performance . There is a plethora of information about the correct ways to
apply indexes, but I would like to focus on scenarios of improper use and then
look how the problems can be averted.First let’s look at some basic
information. MS SQL Server 2000 supports two types of indexes:
Clustered
indexes define the physical sorting of a database table’s rows . For this
reason, each database table may have only one clustered index. If a PRIMARY KEY
constraint is created for a database table and no clustered index currently
exists for that table, SQL Server automatically creates a clustered index on
the primary key. Think of a telephone directory when thinking of clustered
indexes
Non-clustered indexes are created
outside of the database table and contain a sorted list of references to the
table itself. MS SQL Server 2000 supports a maximum of 249 non-clustered
indexes per table. However, it’s important to keep in mind that non-clustered
indexes slow down the data modification and insertion process, so indexes
should be kept to a minimum. Think of a book where the index is in one place
and the pages in another.
Second,MS SQL Server 2000 indexes
are B-Trees. There is a single root page at the top of the tree, branching out
into N number of pages at each intermediate level until it reaches the bottom,
or leaf level, of the index. The index tree is traversed by following pointers
from the upper-level pages down through the lower-level pages. In addition,
each index level is a distinct page chain. There may be many intermediate
levels in an index. The number of levels is dependent on the index key width,
the type of index, and the number of rows/ pages in the table. The key point
for this articles is: the number of levels is important in relation to index
performance
Using this information , if a
developer had to set up an exact match query , they would most likely use a
non-clustered index. Because the index contains entries describing the exact
location in the table of the data values being searched for in the queries. For
example , if a search had the the WHERE clause “customer_id = @customer_id”
then SQL Server looks through the index , to find the exact entry that points
to exact page/row.
Another example, if an application frequently executes a
query to retrieve records between a range of dates, a clustered index would
most likely be the first choice. It can quickly locate the row containing the
beginning date, and then retrieve all adjacent rows in the table until the last
date is reached. Also, if there are columns used frequently to sort the data
retrieved from a table, it can be advantageous to cluster (physically sort) the
table on those columns to save the cost of a sort each time the columns are
queried.
Let’s look at some scenarios where a
poor choice is made for index choice and see what changes could be made to
increase performance.
First Example,A table exists with a
column called “yesNo” of data type CHAR with the INSERT statement allowing only
a “1” or “0”. There are 27,000 rows in the table. A non-clustered
index is added, in the belief that having an index is better than not having
one. Generally the Query Optmizer will
not use a non clustered index if the column values do not add up to about 95%
uniqueness. Uniqueness , meaning that at least 95% of the total table column
rows are unique. The consequence is unnecessary use of disk space used.
Second Example,a query is set up with
a BETWEEN clause. The BETWEEN clause is
pointing to a column called “DateOfOp”. A non clustered index is set up. SQL
Server will find the lower bound item by using the B-Tree (as mentioned above)
and use the pointer to get to the actual data page. Continuing, SQL Server
will walk through the index's leaf pages to the upper bound of the query.
The problem here is that because our data may be spread out over lots of
different data pages, the physical I/O could be high. Depending on the selectivity of our query (as
determined by the statistics that SQL Server gathers), the query optimizer may
decide it is cheaper to still do a table scan to answer the query.
Based on these observations, it
might be useful to make this column into a clustered index. In other words, we
could use the B-Tree to get to the first record, and from there read
sequentially through the data pages to the last record. Clustered Indexes
produce an Ordered Heap (on the assumption that a Heap is a table without
a clustered index) The data is physically stored in the order of the clustering
key. Finally, SQL Server builds a B-Tree structure on top of the newly
structured data with pointers to the data page. Keep in mind that when
considering to apply a clustered index ask these questions 1)Is the clustering
key something that is building sequentially 2)How is the data to be searched?
Third example,this table looks at
“myTable” which has 27,000 rows. The
developer has written a query that will generally return 20,000 rows on a WHERE
clause that points to a non- clustered index. The Query Optimizer considers a
non- clustered index to cost at minimum one page I/O per returned row. Now, it
takes very few page I/O before it becomes more efficient to scan the
whole table. The recommendation would be to either change the index to
clustered (keeping in mind it is only
possible to have one clusetered index per table) or change the size of the
resultset (if possible) .
In conclusion,the Query Optimizer
can rapidly and effectively analyze hundreds, or even thousands of index and
join possibilities. Having a greater number of narrow indexes provides the
Query Optimizer with more possibilities to choose from, which usually helps
performance. Having a lesser number of wide, multi-column indexes provides the
optimizer with fewer possibilities to choose from, which may hurt performance.
You
shouldn't have more indexes than are necessary to achieve adequate read
performance because of the overhead involved in updating those indexes. However
even most update-oriented operations require far more reading than writing.
Therefore keep experimenting with new indexes , if they might help. You can
always adjust it later.
|