Introduction
When a performance issue comes up, the first thing the DBA needs to do is define what the problem actually is. The first thing I ask when someone says, “it’s running slow…” is, “can you please give me a list of the top 10 worst queries?” Usually, the response is, “I don’t know exactly what they are…”
This note will explain how to isolate the queries by letting the computer tell you where the problems are.
The process is simple and follows this methodology:
- Turn on SQL Server Profiler
- Run it for a few hours filtering on long duration or high reads
- Save the profiler trace into a temporary table
- Run a few queries against the data
- Prioritize them as a working list to attack
The key concept: Long running queries hammer the disk and cause poor cache hit ratios. If too many users run them, the disk subsystem can suffer because a few users are monopolizing all the resources.
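A quick way to see whether the cache is actually suffering is to look at the buffer cache hit ratio counters. The sketch below reads them from master.dbo.sysperfinfo (on newer versions the same counters also appear in sys.dm_os_performance_counters); the values are cumulative since the instance started, so watch how the ratio trends while the heavy queries are running:
-- Buffer cache hit ratio = ratio counter divided by its base counter, as a percentage
select 100.0 * r.cntr_value / b.cntr_value 'Buffer cache hit ratio %'
from master.dbo.sysperfinfo r
join master.dbo.sysperfinfo b
  on b.object_name = r.object_name
where r.counter_name = 'Buffer cache hit ratio'
  and b.counter_name = 'Buffer cache hit ratio base'
go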
Collecting the Data
Typically, I’ll start up Profiler and run it for 2 or 3 hours to capture a representative sample of data, then use that information to make my decisions. The data collected also serves as a baseline for judging whether things get better or worse as I tune.
- Start up SQL Server Profiler. Collect on these two events:
- RPC:Completed
- SQL:BatchCompleted
- These two will show queries that have completed.
- Filter on columns:
- Duration, and/or
- The criteria should start off at 30,000.
- The unit of measure is milliseconds, hence 30,000 = 30 seconds.
- Reads
- The criteria should start at 10,000.
- The unit of measure is an 8K page, hence 10,000 reads = 81,920,000 bytes of IO. If a single query is doing roughly 82MB of IO, it probably needs investigating!
Let the trace run for a while. Then stop it and use “Save As” to write a profiler trace file. Once it’s in a file, the DBA can start analyzing the data.
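If you would rather not leave the Profiler GUI running for hours, the same collection can be scripted as a server-side trace. The following is only a sketch of that approach using sp_trace_create, sp_trace_setevent, sp_trace_setfilter and sp_trace_setstatus; the file path is just an example, and note that on SQL Server 2005 and later the Duration column is reported in microseconds, so the 30-second threshold would be 30,000,000 there.
declare @TraceID int, @maxfilesize bigint, @on bit, @minduration bigint
set @maxfilesize = 100          -- MB per trace file
set @on = 1
set @minduration = 30000        -- 30 seconds in ms (use 30000000 on 2005+ where Duration is in microseconds)

-- Create the trace; SQL Server appends .trc to the file name
exec sp_trace_create @TraceID output, 0, N'c:\tmp\profilerdata', @maxfilesize, NULL

-- Collect TextData (1), LoginName (11), Duration (13) and Reads (16)
-- for RPC:Completed (event 10) and SQL:BatchCompleted (event 12)
exec sp_trace_setevent @TraceID, 10, 1, @on
exec sp_trace_setevent @TraceID, 10, 11, @on
exec sp_trace_setevent @TraceID, 10, 13, @on
exec sp_trace_setevent @TraceID, 10, 16, @on
exec sp_trace_setevent @TraceID, 12, 1, @on
exec sp_trace_setevent @TraceID, 12, 11, @on
exec sp_trace_setevent @TraceID, 12, 13, @on
exec sp_trace_setevent @TraceID, 12, 16, @on

-- Keep only events whose Duration is at or above the threshold
-- (a Reads filter could be added the same way with column 16)
exec sp_trace_setfilter @TraceID, 13, 0, 4, @minduration

-- Start the trace; stop it later with: exec sp_trace_setstatus @TraceID, 0
exec sp_trace_setstatus @TraceID, 1
select @TraceID 'TraceID'
go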
Rolling Up Queries
Usually, the easiest way to analyze the information is from within SQL Server. Import the trace file and then run queries against it to find the problems.
The trace file itself has the issues in it. We’ve already filtered for long running queries. Now, we just need to organize the data a bit.
First import the trace file using the following SQL Server function call:
use tempdb
go
SELECT IDENTITY(int, 1, 1) AS RowNumber, * INTO profiler_analysis
FROM ::fn_trace_gettable('c:\tmp\profilerdata.trc', default)
go
Next, get an idea of what you are looking at. For example, how much IO occurred for the monitoring run? What was the overall duration for all the long running queries?
select sum(Reads) * 8192. 'Bytes Read' from profiler_analysis where Reads is not NULL;
go
Bytes Read
---------------------------------------
277,818,179,584
(1 row(s) affected)
select sum(Duration) / 1000. 'Number of Seconds' from profiler_analysis where Duration is not NULL;
go
Number of Seconds
---------------------------------------
8914.941000
(1 row(s) affected)
select sum(Duration) / 1000. / 3600. 'Number of Hours' from profiler_analysis where Duration is not NULL;
go
Number of Hours
---------------------------------------
2.47637250000
(1 row(s) affected)
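If you prefer a single pass over the table, the three numbers can also be computed together; this is just a convenience variant of the queries above (SUM ignores NULLs, so the separate NULL filters are not strictly needed):
select sum(Reads) * 8192. 'Bytes Read',
       sum(Duration) / 1000. 'Number of Seconds',
       sum(Duration) / 1000. / 3600. 'Number of Hours'
from profiler_analysis
go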
The following query shows the total amount of Reads by user:
select convert(char(20), LoginName) 'User Name', sum(Reads) * 8192. 'Total Bytes Read'
from profiler_analysis
where Reads is not NULL
group by LoginName
order by sum(Reads) desc
go
User Name            Total Bytes Read
-------------------- ---------------------------------------
jde                  178276974592
sa                   53321981952
usera                20445822976
userb                10917101568
userc                5227069440
userd                2638151680
usere                2081947648
userf                2063392768
userg                1147445248
userh                670384128
useri                406921216
userj                316260352
userk                169639936
userl                55287808
userm                43941888
usern                19152896
usero                9584640
userp                4866048
userq                2252800
(19 row(s) affected)
The following query shows the total amount of seconds by user:
select convert(char(20), LoginName) 'User Name', sum(Duration) / 1000. 'Seconds Run'
from profiler_analysis
where Duration is not NULL
group by LoginName
order by sum(Duration) desc
go
User Name            Seconds Run
-------------------- ---------------------------------------
jde                  5456.860000
JDEService           1999.540000
sa                   313.579000
usera                240.462000
userb                176.452000
userc                135.483000
userd                115.636000
usere                100.881000
userf                90.918000
userg                76.247000
userh                52.656000
useri                40.941000
userj                37.466000
userk                28.084000
userl                19.438000
userm                11.656000
usern                11.329000
usero                4.673000
userp                2.640000
(19 row(s) affected)
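Per-user totals show who is generating the load; it is often just as useful to roll the trace up by statement. TextData is a text column, so the sketch below groups on the first 100 characters as a rough key (different parameter values embedded in the text will still land in separate groups, so treat the numbers as approximate):
select convert(varchar(100), TextData) 'Statement (first 100 chars)',
       count(*) 'Executions',
       sum(Reads) 'Total Reads',
       sum(Duration) / 1000. 'Total Seconds'
from profiler_analysis
where TextData is not NULL
group by convert(varchar(100), TextData)
order by sum(Reads) desc
go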
Finally, these two queries show the DBA the top 10 queries for Reads and Duration:
select top 10 RowNumber, Duration, Reads, LoginName
from profiler_analysis
order by Reads desc
go
RowNumber   Duration             Reads                LoginName
----------- -------------------- -------------------- -----------
485         257230               3886609              sa
239         87690                1370174              usera
853         101810               1264835              userb
629         142370               1264577              jde
711         8890                 1264197              JDE
747         8596                 801035               sa
289         1397                 740066               sa
264         7063                 661617               sa
665         8576                 356531               jde
193         3483                 313031               userb
(10 row(s) affected)
select top 10 RowNumber, Duration, Reads, LoginName
from profiler_analysis
order by Duration desc
go
RowNumber   Duration             Reads                LoginName
----------- -------------------- -------------------- -----------
503         335213               23                   JDEService
502         333026               631                  JDEService
485         257230               3886609              sa
528         224200               108896               jde
831         203590               2                    JDEService
347         184183               103651               jde
532         181400               14                   JDEService
627         175056               77320                jde
411         153933               307751               JDE
823         152746               23                   JDEService
(10 row(s) affected)
To find the actual query for RowNumber 485, run a select statement and retrieve the TextData column, which contains the statement text. The following analysis shows that the high IO and duration were due to an index being built:
select RowNumber, Duration, Reads, TextData
from profiler_analysis
where RowNumber = 485
go
RowNumber   Duration             Reads                TextData
----------- -------------------- -------------------- ---------------
485         257230               3886609
create index x on CRPDTA.F4111( ILITM, ILLOCN, ILLOTN, ILKCOO, ILDOCO, ILDCTO, ILLNID )
The query for RowNumber 155 did 162,386 reads, or 1,330,266,112 bytes, probably because the user put a wildcard on both the front and the back of the criterion, WHERE (F4220.SWLOTN LIKE '%208547%'), which forced a table scan.
select RowNumber, Duration, Reads, TextData
from profiler_analysis
where RowNumber = 155
go
RowNumber   Duration             Reads
----------- -------------------- --------------------
155         10186                162386
(1 row(s) affected)
declare @P1 int set @P1 = 180151263
declare @P2 int set @P2 = 1
declare @P3 int set @P3 = 1
declare @P4 int set @P4 = 5
exec sp_cursoropen @P1 output, N'SELECT F4220.SWSHPJ, F4220.SWLITM, F4220.SWDCTO, F4220.SWSRL2, F4220.SWDSC2, F4220.SWSRL1, F4220.SWLOTN, F4220.SWLOCN, F4220.SWAITM, F4220.SWSFXO, F4220.SWDOCO, F4220.SWAN8, F4220.SWITM, F4220.SWMCU, F4220.SWDSC1, F4220.SWLNID, F4220.SWORDJ, F4220.SWKCOO, F4220.SWVEND, F4220.SWSHAN FROM PRODDTA.F4220 F4220 WHERE (F4220.SWLOTN LIKE @P1) ORDER BY F4220.SWLITM ASC, F4220.SWLOTN ASC, F4220.SWSRL1 ASC ', @P2 output, @P3 output, @P4 output, N'@P1 varchar(8000) ', '%208547%'
select @P1, @P2, @P3, @P4
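To see why that pattern hurts, compare the two predicates below against the same table. With an index on SWLOTN (the index here is hypothetical, purely for illustration), the version without the leading % is sargable and can use an index seek; LIKE '%208547%' cannot, so every row has to be read:
-- Hypothetical supporting index, for illustration only
create index IX_F4220_SWLOTN on PRODDTA.F4220 (SWLOTN)
go

-- Leading and trailing wildcard: the predicate is not sargable, so the table is scanned
select SWDOCO, SWLITM, SWLOTN
from PRODDTA.F4220
where SWLOTN like '%208547%'

-- Trailing wildcard only: the optimizer can seek on the SWLOTN index
select SWDOCO, SWLITM, SWLOTN
from PRODDTA.F4220
where SWLOTN like '208547%'
go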
Summary
The queries above collect the data and roll it up into a digestible format instead of a pile of random bits. Once the DBA has the performance data, they can prioritize the issues and figure out a plan of attack. For example, change the configuration of the application so users cannot put in leading wildcards that force a table scan.
Knowing your system and where its issues are forms the backbone of being able to isolate and resolve performance problems. The good news is that it can be done with a small toolbox.
The key concept is to let SQL Server collect the information and then, with a few simple queries like those shown above, let it show you exactly where the problems are.