
Finding the Top 10 Long Running Queries


Introduction


When there is a performance issue, the first thing the DBA needs to do is define what the problem actually is. When someone says, “it’s running slow…”, my first response is, “can you please give me a list of the top 10 worst queries?” Usually, the answer is, “I don’t know exactly what they are…”


This note will explain how to isolate those queries by letting the computer tell you where the problems are.


The process is simple; it encompasses the following methodology:


  1. Turn on SQL Server Profiler
  2. Run it for a few hours filtering on long duration or high reads
  3. Save the Profiler trace to a file and import it into a table
  4. Run a few queries against the data
  5. Prioritize them as a working list to attack

The key concept: Long running queries hammer the disk and cause poor cache hit ratios. If too many users run them, the disk subsystem can suffer because a few users are monopolizing all the resources.
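
As a quick sanity check before tracing, you can look at the buffer cache hit ratio itself. The sketch below reads the SQL Server 2000-era sysperfinfo view (newer versions expose the same counters through sys.dm_os_performance_counters); the counter names here are my assumption of the standard ones, so verify them on your instance:

-- Rough buffer cache hit ratio check (a sketch; verify counter names on your build,
-- and note the ratio must be computed against its "base" counter)
select convert(decimal(5,2), a.cntr_value * 100. / b.cntr_value) 'Buffer Cache Hit Ratio %'
from master.dbo.sysperfinfo a, master.dbo.sysperfinfo b
where a.counter_name = 'Buffer cache hit ratio'
and b.counter_name = 'Buffer cache hit ratio base'

go

A sustained value well below 99% on an OLTP system tends to back up the theory that long runners are flushing the cache.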


Collecting the Data


Typically, I’ll start up Profiler and run it for 2 or 3 hours to capture a representative sample of data. Then I’ll use this information to make my decisions. The data collected will also serve as a baseline for judging whether things get better or worse as I tune.


  1. Start up SQL Server Profiler. Collect on these two events, which show queries that have completed:
    1. RPC:Completed
    2. SQL:BatchCompleted
  2. Filter on these columns:
    1. Duration, and/or
      1. The criteria should start off at 30,000.
      2. The unit of measure is milliseconds, hence 30,000 = 30 seconds.
    2. Reads
      1. The criteria should start at 10,000.
      2. The unit of measure is an 8K page, so 10,000 reads = 81,920,000 bytes of IO. If you are doing 81MB of IO, you probably have a query that needs investigating!


Let the trace run for a while. Then stop it and “Save As” a profiler trace file. Once it’s in a file, the DBA can start analyzing the data.
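
As an aside, if you’d rather not leave the Profiler GUI running for hours, the same collection can be scripted as a server-side trace. The sketch below is my own outline, not part of the original walkthrough; the event IDs (10 = RPC:Completed, 12 = SQL:BatchCompleted) and column IDs (1 = TextData, 11 = LoginName, 13 = Duration, 14 = StartTime, 16 = Reads) should be double-checked against Books Online for your version:

-- Server-side equivalent of the Profiler setup above (a sketch; verify IDs for your version)
declare @TraceID int, @maxfilesize bigint, @on bit, @duration bigint
set @maxfilesize = 100
set @on = 1

-- Create the trace; SQL Server appends .trc to the file name
exec sp_trace_create @TraceID output, 0, N'c:\tmp\profilerdata', @maxfilesize, NULL

-- Events 10 (RPC:Completed) and 12 (SQL:BatchCompleted), with the columns analyzed later
exec sp_trace_setevent @TraceID, 10, 1, @on   -- TextData
exec sp_trace_setevent @TraceID, 10, 11, @on  -- LoginName
exec sp_trace_setevent @TraceID, 10, 13, @on  -- Duration
exec sp_trace_setevent @TraceID, 10, 14, @on  -- StartTime
exec sp_trace_setevent @TraceID, 10, 16, @on  -- Reads
exec sp_trace_setevent @TraceID, 12, 1, @on
exec sp_trace_setevent @TraceID, 12, 11, @on
exec sp_trace_setevent @TraceID, 12, 13, @on
exec sp_trace_setevent @TraceID, 12, 14, @on
exec sp_trace_setevent @TraceID, 12, 16, @on

-- Duration >= 30,000 (milliseconds here; note that SQL Server 2005+ traces Duration
-- in microseconds). A Reads >= 10,000 filter can be added the same way on column 16.
set @duration = 30000
exec sp_trace_setfilter @TraceID, 13, 0, 4, @duration

-- Start the trace (status 1 = start, 0 = stop, 2 = close and delete the definition)
exec sp_trace_setstatus @TraceID, 1

go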


Rolling Up Queries


Usually, the easiest way to analyze the information is from within SQL Server. Import the trace file and then run queries against it to find the problems.


The trace file itself has the issues in it. We’ve already filtered for long running queries. Now, we just need to organize the data a bit.


First import the trace file using the following SQL Server function call:


use tempdb

go

SELECT IDENTITY(int, 1, 1) AS RowNumber, *
INTO profiler_analysis
FROM ::fn_trace_gettable('c:\tmp\profilerdata.trc', default)

go
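
It’s worth a quick check that the import worked and to see the window of time the trace covers. fn_trace_gettable returns the StartTime column regardless (it will simply be NULL if it wasn’t captured), so this is safe to run:

select count(*) 'Rows Imported', min(StartTime) 'Trace Start', max(StartTime) 'Trace End'
from profiler_analysis

go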


Next, get an idea of what you are looking at. For example, how much IO occurred for the monitoring run? What was the overall duration for all the long running queries?


select sum(Reads) * 8192. 'Bytes Read'
from profiler_analysis
where Reads is not NULL;

go



Bytes Read

---------------------------------------

277,818,179,584

(1 row(s) affected)


select sum(Duration) / 1000. 'Number of Seconds'
from profiler_analysis
where Duration is not NULL;

go



Number of Seconds

---------------------------------------

8914.941000

(1 row(s) affected)



select sum(Duration) / 1000. / 3600. 'Number of Hours'
from profiler_analysis
where Duration is not NULL;

go



Number of Hours

---------------------------------------

2.47637250000

(1 row(s) affected)



The following query shows the total bytes read, by user:


select convert(char(20), LoginName) 'User Name', sum(Reads) * 8192. 'Total Bytes Read'
from profiler_analysis
where Reads is not NULL
group by LoginName
order by sum(Reads) desc

go



User Name            Total Bytes Read
-------------------- ---------------------------------------
jde                  178276974592
sa                   53321981952
usera                20445822976
userb                10917101568
userc                5227069440
userd                2638151680
usere                2081947648
userf                2063392768
userg                1147445248
userh                670384128
useri                406921216
userj                316260352
userk                169639936
userl                55287808
userm                43941888
usern                19152896
usero                9584640
userp                4866048
userq                2252800

(19 row(s) affected)
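
A small variation I sometimes add (not part of the original set of queries) expresses each login’s share of the total IO, which makes the monopolizers jump out:

select convert(char(20), LoginName) 'User Name',
convert(decimal(5,2), sum(Reads) * 100. / (select sum(Reads) from profiler_analysis where Reads is not NULL)) 'Pct of Total IO'
from profiler_analysis
where Reads is not NULL
group by LoginName
order by sum(Reads) desc

go

With the data above, jde alone accounts for roughly 64% of all bytes read.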




The following query shows the total amount of seconds by user:



select convert(char(20), LoginName) 'User Name', sum(Duration) / 1000. 'Seconds Run'
from profiler_analysis
where Duration is not NULL
group by LoginName
order by sum(Duration) desc

go



User Name            Seconds Run
-------------------- ---------------------------------------
jde                  5456.860000
JDEService           1999.540000
sa                   313.579000
usera                240.462000
userb                176.452000
userc                135.483000
userd                115.636000
usere                100.881000
userf                90.918000
userg                76.247000
userh                52.656000
useri                40.941000
userj                37.466000
userk                28.084000
userl                19.438000
userm                11.656000
usern                11.329000
usero                4.673000
userp                2.640000

(19 row(s) affected)



Finally, these two queries show the DBA the top 10 queries for Reads and Duration:



select top 10 RowNumber, Duration, Reads, LoginName
from profiler_analysis
order by Reads desc

go



RowNumber   Duration             Reads                LoginName
----------- -------------------- -------------------- -----------
485         257230               3886609              sa
239         87690                1370174              usera
853         101810               1264835              userb
629         142370               1264577              jde
711         8890                 1264197              JDE
747         8596                 801035               sa
289         13970                740066               sa
264         7063                 661617               sa
665         8576                 356531               jde
193         3483                 313031               userb

(10 row(s) affected)



select top 10 RowNumber, Duration, Reads, LoginName
from profiler_analysis
order by Duration desc

go



RowNumber   Duration             Reads                LoginName
----------- -------------------- -------------------- -----------
503         335213               23                   JDEService
502         333026               631                  JDEService
485         257230               3886609              sa
528         224200               108896               jde
831         203590               2                    JDEService
347         184183               103651               jde
532         181400               14                   JDEService
627         175056               77320                jde
411         153933               307751               JDE
823         152746               23                   JDEService

(10 row(s) affected)
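
The same statement often appears many times in a trace under different row numbers. To roll duplicates together, group on the statement text itself. TextData comes back as ntext, so it has to be converted before it can be grouped; the nvarchar(4000) truncation below is an arbitrary choice for this sketch:

select convert(nvarchar(4000), TextData) 'Statement', count(*) 'Executions',
sum(Duration) / 1000. 'Total Seconds', sum(Reads) 'Total Reads'
from profiler_analysis
where TextData is not NULL
group by convert(nvarchar(4000), TextData)
order by sum(Reads) desc

go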



To find the actual query for RowNumber 485, run a select statement and retrieve the TextData column, which holds the statement text. The following analysis shows that the high IO and duration were due to an index being built:



select RowNumber, Duration, Reads, TextData
from profiler_analysis
where RowNumber = 485

go



RowNumber   Duration             Reads                TextData
----------- -------------------- -------------------- ---------------
485         257230               3886609              create index x on CRPDTA.F4111( ILITM, ILLOCN, ILLOTN, ILKCOO, ILDOCO, ILDCTO, ILLNID )




The query for RowNumber 155 did 162,386 Reads, or 1,330,266,112 bytes, probably because the user put a wildcard on both the front and back of the criteria, WHERE (F4220.SWLOTN LIKE @P1) with @P1 = '%208547%', which forced a table scan.



select RowNumber, Duration, Reads, TextData
from profiler_analysis
where RowNumber = 155

go



RowNumber   Duration             Reads
----------- -------------------- --------------------
155         10186                162386

(1 row(s) affected)

declare @P1 int set @P1=180151263 declare @P2 int set @P2=1 declare @P3 int set @P3=1 declare @P4 int set @P4=5 exec sp_cursoropen @P1 output, N'SELECT F4220.SWSHPJ, F4220.SWLITM, F4220.SWDCTO, F4220.SWSRL2, F4220.SWDSC2, F4220.SWSRL1, F4220.SWLOTN, F4220.SWLOCN, F4220.SWAITM, F4220.SWSFXO, F4220.SWDOCO, F4220.SWAN8, F4220.SWITM, F4220.SWMCU, F4220.SWDSC1, F4220.SWLNID, F4220.SWORDJ, F4220.SWKCOO, F4220.SWVEND, F4220.SWSHAN FROM PRODDTA.F4220 F4220 WHERE (F4220.SWLOTN LIKE @P1) ORDER BY F4220.SWLITM ASC, F4220.SWLOTN ASC, F4220.SWSRL1 ASC ', @P2 output, @P3 output, @P4 output, N'@P1 varchar(8000) ', '%208547%' select @P1, @P2, @P3, @P4
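
To see the difference the leading wildcard makes, compare the two predicates below. This is a hypothetical demonstration against the same table, not output from the trace; only the second form lets the optimizer seek on an index over SWLOTN:

-- Leading wildcard: no seek possible, every row's SWLOTN must be examined (table or index scan)
select SWDOCO, SWLOTN from PRODDTA.F4220 where SWLOTN like '%208547%'

-- Trailing wildcard only: an index on SWLOTN can seek straight to the matching range
select SWDOCO, SWLOTN from PRODDTA.F4220 where SWLOTN like '208547%'

go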




Summary


The above queries collect data and roll it up into a digestible format instead of a pile of random bits. Once the DBA has the performance data, they can prioritize the issues and figure out a plan of attack. For example, change the configuration of the application so users cannot put in leading wildcards that force a table scan.


Knowing your system and where its issues are forms the backbone of being able to isolate and resolve performance problems. The good news is that it can be done with a small tool box.

The key concept is to let SQL Server collect the information and then, with the few simple queries shown above, let it show you exactly where the problems are.

