|
Recently, I had the following problem to consider.
Two similar SELECT statements , the one difference being the WHERE clause.
If you notice the statement 1 uses a variable whereas statement 2 uses date
literals.
The problem related to the difference in performance. Statement 2
consistently performed better , by a margin of 10 seconds on average.
How could this be?I ran the same queries
on a 2 different machines , cleared caches but still the difference remained
consistent.
(statement 1)
Declare @Date as datetime
set @Date = '01-04-06'
SELECT prod as code
FROM orders
WHERE (ship_date between @Date and dateadd(d,6,@Date))
GROUP BY prod
(statement 2)
SELECT prod as code
FROM orders
WHERE (ship_date between '01-04-06' and dateadd(d,6,'01-04-06'))
GROUP BY prod
The first thing I did to diagnose was check the execution plan.
I noticed that INDEX SCAN was being used rather that INDEX SEEK.
This pointed to low selectivity in the data , meaning that the data posessed
a relatively low level of uniqueness. Essentially the index in place had a
low level of
helpfulness.
Usually a SEEK is more effective than a SCAN , but there are exceptions. For
example , if the table is small and memory cached.In this case the table had
over 1 million records
(As an aside if all else fails when it comes to hitting indexes , you can
state the actual index to hit directly withing the SQL statement , such as :
SELECT prod as code
FROM orders with (index (ix_my_index index))
WHERE (ship_date between '01-04-06' and dateadd(d,6,'01-04-06'))
GROUP BY prod
The second thing I did (a SQL 2005 feature) was to utilise the OPTIMIZE FOR
.The statement then became:
SELECT prod as code
FROM orders
WHERE (ship_date between '01-04-06' and dateadd(d,6,'01-04-06'))
GROUP BY prod
OPTION (OPTIMIZE FOR (@Date = '20060405'))
The literal used in the hint just needs to be one that suggests to the
optimizer what should be the desired query plan.
Essentially one that generates a similar rowcount estimate to values you are
likely to use for the @Date value.
Also, a colleague pointed out to me that using the date format 'dd-mm-yy'
has different interpretation whereas 'YYYYMMDD' only has 1 interpretation
Eventually, I didn't get the queries equal but only on avearge a couple of
seconds difference.
|