Friday, 23 September 2016

SQL QUERY NIGHTMARE

Last week, we faced a basic and yet the only major problem that sql developers/admin had to deal with and yes your guess is right, It was with SQL QUERY RUNNING SLOW.  I will be explaining the problem below, however I want everyone reading this to help me find answers because I still haven't found one!

One more amazing aspect of the issue was that it worked fairly quickly in Development but was running like a tortoise in Production even though the number of records were practically the same.

I'll start by throwing some light on to the execution plan, so that we narrow down to the code which was the culprit here. The code is enclosed into a stored proc. On contacting my dba to enquire about the issue, I got the following replay:-



I must hand it to our dba to provide such a detailed response. 

So my job was now to look at the code and modify it. What I did was to shrink the calender table into a temp table consisting of few rows that would speed up the operation. Below you would see the culprit code and a rectified version of it.

Culprit Code :-

-- Customer Table and latest advertiser with complete hierarichy details                                              
        INSERT  INTO reporting.Obtain_CustAM_Hier
                SELECT  cod.AdvertiserId ,
                        cod.SalesConsultantId AS end_employee_sid ,
                        COALESCE(st10.end_org_team_key, stmax.END_ORG_TEAM_KEY) end_org_team_key ,
                        COALESCE(st10.end_org_area_key, stmax.END_ORG_AREA_KEY) end_org_area_key ,
                        COALESCE(st10.end_org_region_key,
                                 stmax.END_ORG_REGION_KEY) end_org_region_key ,
                        COALESCE(st10.end_org_channel_key,
                                 stmax.END_ORG_CHANNEL_KEY) end_org_channel_key ,
                        COALESCE(st10.Cycle_Qtr, latest_cod.FIN_QTR_KEY) Cycle_Qtr ,
                        cod.FromDateKey ,
                        cod.ToDateKey
                FROM    #ADOWNER cod with (nolock)
                        INNER JOIN ( SELECT DISTINCT
                                            DT.FIN_QTR_KEY ,
                                            AdvertiserId ,
                                            MAX(fromdatekey) FROMDATEKEY ,
                                            MAX(ToDateKey) TODATEKEY
                                     FROM   ( SELECT    ADV.* ,
                                                        DT.FIN_QTR_KEY END_QTR
                                              FROM      ( SELECT
                                                              AdvertiserId ,
                                                              DT.FIN_QTR_KEY STRT_QTR ,
                                                              MAX(FromDateKey) AS fromdatekey ,
                                                              MAX(ToDateKey) AS ToDateKey
                                                          FROM
                                                              #ADOWNER AO
                                                              INNER JOIN CustomerMeasurementTool.workflow.Calendar DT ON DT.DATE_KEY = AO.FromDateKey
                                                              AND FIN_QTR_KEY >= 20131
                                                          GROUP BY AdvertiserId ,
                                                              DT.FIN_QTR_KEY
                                                        ) ADV
                                                        LEFT JOIN CustomerMeasurementTool.workflow.Calendar DT ON DT.DATE_KEY = ADV.ToDateKey
                                            ) AO
                                            LEFT JOIN CustomerMeasurementTool.workflow.Calendar DT ON DT.DATE_KEY BETWEEN fromdatekey AND ToDateKey
                                     WHERE  CALENDAR_DATE <= CONVERT(DATETIME, CONVERT(VARCHAR, GETDATE()
                                            - 1, 101))
                                     GROUP BY FIN_QTR_KEY ,
                                            AdvertiserId
                                   ) latest_cod ON cod.AdvertiserId = latest_cod.AdvertiserId
                                                   AND cod.FromDateKey = latest_cod.FROMDATEKEY
                        LEFT JOIN #ALL_CONSULATANTS_Prev st10 ON cod.SalesConsultantId = st10.end_employee_sid
                                                              AND latest_cod.FIN_QTR_KEY = st10.Cycle_Qtr
                        LEFT JOIN ( SELECT  EHD.EMPLOYEE_SID AS END_EMPLOYEE_SID ,
                                            EHD.ORG_TEAM_KEY AS END_ORG_TEAM_KEY ,
                                            EHD.ORG_AREA_KEY AS END_ORG_AREA_KEY ,
                                            EHD.ORG_REGION_KEY AS END_ORG_REGION_KEY ,
                                            EHD.ORG_CHANNEL_KEY AS END_ORG_CHANNEL_KEY
                                    FROM    [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] EHD with (nolock)
                                            INNER JOIN ( SELECT
                                                              EMPLOYEE_SID ,
                                                              MAX(BEGIN_DATE_KEY) AS BEGIN_DATE_KEY
                                                         FROM [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] with (nolock)
                                                         WHERE
                                                              POSITION_KEY > 0
                                                         GROUP BY EMPLOYEE_SID
                                                       ) EMP ON EHD.EMPLOYEE_SID = EMP.EMPLOYEE_SID
                                                              AND EHD.BEGIN_DATE_KEY = EMP.BEGIN_DATE_KEY
                                  ) stmax ON cod.SalesConsultantId = stmax.END_EMPLOYEE_SID; 
        
      

CHANGED CODE :-

I have changed the yellowed part of the code and insert new code higlighted in green as :-


-- Temp Tables to build a mini version of the Calender Table.
select * into #Mini_Cal from CustomerMeasurementTool.workflow.Calendar where FIN_QTR_KEY >= 20131

SELECT DISTINCT
                                            DT.FIN_QTR_KEY ,
                                            AdvertiserId ,
                                            MAX(fromdatekey) FROMDATEKEY ,
                                            MAX(ToDateKey) TODATEKEY into #latest_cod
                                     FROM   ( SELECT    ADV.* ,
                                                        DT.FIN_QTR_KEY END_QTR
                                              FROM      ( SELECT
                                                              AdvertiserId ,
                                                              DT.FIN_QTR_KEY STRT_QTR ,
                                                              MAX(FromDateKey) AS fromdatekey ,
                                                              MAX(ToDateKey) AS ToDateKey
                                                          FROM
                                                              #ADOWNER AO
                                                              INNER JOIN #Mini_Cal DT ON DT.DATE_KEY = AO.FromDateKey
                                                             
                                                          GROUP BY AdvertiserId ,
                                                              DT.FIN_QTR_KEY
                                                        ) ADV
                                                        LEFT JOIN #Mini_Cal DT ON DT.DATE_KEY = ADV.ToDateKey
                                            ) AO
                                            LEFT JOIN #Mini_Cal DT ON DT.DATE_KEY BETWEEN fromdatekey AND ToDateKey
                                     WHERE  CALENDAR_DATE <= CONVERT(DATETIME, CONVERT(VARCHAR, GETDATE()
                                            - 1, 101))
                                     GROUP BY FIN_QTR_KEY ,
                                            AdvertiserId

     
-- Customer Table and latest advertiser with complete hierarichy details                                              
        INSERT  INTO reporting.Obtain_CustAM_Hier
                SELECT  cod.AdvertiserId ,
                        cod.SalesConsultantId AS end_employee_sid ,
                        COALESCE(st10.end_org_team_key, stmax.END_ORG_TEAM_KEY) end_org_team_key ,
                        COALESCE(st10.end_org_area_key, stmax.END_ORG_AREA_KEY) end_org_area_key ,
                        COALESCE(st10.end_org_region_key,
                                 stmax.END_ORG_REGION_KEY) end_org_region_key ,
                        COALESCE(st10.end_org_channel_key,
                                 stmax.END_ORG_CHANNEL_KEY) end_org_channel_key ,
                        COALESCE(st10.Cycle_Qtr, latest_cod.FIN_QTR_KEY) Cycle_Qtr ,
                        cod.FromDateKey ,
                        cod.ToDateKey
                FROM    #ADOWNER cod with (nolock)
                        INNER JOIN  #latest_cod latest_cod ON cod.AdvertiserId = latest_cod.AdvertiserId
                                                   AND cod.FromDateKey = latest_cod.FROMDATEKEY
                        LEFT JOIN #ALL_CONSULATANTS_Prev st10 ON cod.SalesConsultantId = st10.end_employee_sid
                                                              AND latest_cod.FIN_QTR_KEY = st10.Cycle_Qtr
                        LEFT JOIN ( SELECT  EHD.EMPLOYEE_SID AS END_EMPLOYEE_SID ,
                                            EHD.ORG_TEAM_KEY AS END_ORG_TEAM_KEY ,
                                            EHD.ORG_AREA_KEY AS END_ORG_AREA_KEY ,
                                            EHD.ORG_REGION_KEY AS END_ORG_REGION_KEY ,
                                            EHD.ORG_CHANNEL_KEY AS END_ORG_CHANNEL_KEY
                                    FROM    [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] EHD with (nolock)
                                            INNER JOIN ( SELECT
                                                              EMPLOYEE_SID ,
                                                              MAX(BEGIN_DATE_KEY) AS BEGIN_DATE_KEY
                                                         FROM [BIStage].[dbo].[EMPLOYEE_HIST_DIM_V] with (nolock)
                                                         WHERE
                                                              POSITION_KEY > 0
                                                         GROUP BY EMPLOYEE_SID
                                                       ) EMP ON EHD.EMPLOYEE_SID = EMP.EMPLOYEE_SID
                                                              AND EHD.BEGIN_DATE_KEY = EMP.BEGIN_DATE_KEY

                                  ) stmax ON cod.SalesConsultantId = stmax.END_EMPLOYEE_SID;



Apologies for such a messy code, however if you are able to follow or understand the code (by going through highlighted rows) you will find that I have done following things:-
1. Limited the Calendar table.
2. Taken the complicated code out and put it into a temp table.


The above changes did the trick and the stored proc ran within the SLA of 6 minutes (was taking 3 hours on that fatal day).

However the query which remained unanswered is why the stored proc ran much faster in our DEV environment than compared to PROD environment.

I know I may not have given the complete info, please ask questions to solve this conundrum!


15 comments:

  1. Hi, First in comparing test to production, are the databases similar in size?

    Usually, when identical databases perform so drastically different for the same query, I would suspect that indexing and statistics related to those indexes are probably in need of an update. With a rebuild of all indexes and fresh statistics, let us know how your database query is performing.

    Good luck.
    Jeff Bennett
    Missouri, US

    ReplyDelete
    Replies
    1. Hi Benett....thanks for your comment. I was able to fine tune the query as you can see in the description. My main reason for writing this blog was to find out what was the issue. Coming back to your suggestion, development database has the stats refresh every week whereas in prod its on daily basis.

      Didnt try rebuilding the indexes though.

      Delete
  2. Is parameter sniffing a possible culprit?

    ReplyDelete
    Replies
    1. Hi Jon. There were no input parameters so that ruled it out.

      Delete
  3. Is this a new query in Prod or has it run fine before?

    Have you compared the query plans for Prod and Development, what about the io statistics for the query?

    Have you checked that the prod and development databases are in the same compatibility mode / using the same cardinality estimates.

    You say that the databases are about the same size but is the data similar?

    ReplyDelete
    Replies
    1. 1. Old query

      2. They were identically the same. In prod cpu io went to 100% which caused the stored proc to run very slow and you eventually have to terminate the execution. Whereas in dev io too reached 100 % but momentarily.

      3. Didnt check that would do tommorow, whats the impact for this please?

      4. its a transactional data. prod has a bit more data than dev, however not essentially the same

      Delete
  4. I counted a lot scans:
    > Outer Joins.
    > A function call in a WHERE clause.
    > Searches on greater than, less than. Note that all inequalities will always cause scans.

    I don't have the time to dissect your query but you can assume that a big query with a lot of scans will take some time to execute and consume much if not most of the RAM on your server.

    I think you can further improve the performance of your revised query.

    I recommend:
    > Break the query into smaller pieces.
    > Use the sub-queries to populate Temp tables before you execute the main query. In other words, take the sub-queries our of the main query. Join on the temp tables in the main query. Sometimes, you can use Table variables for this. There are rules that matter with Table variables; I don't have the time to cover them here. Used improperly, Table variables cause scans.
    > Try to eliminate outer joins wherever possible.
    > Try to eliminate any data comparison that is not an equality.
    > Eliminate all function calls from WHERE, ON, and HAVING statements.
    > Think about every component of your query and how many locks it must hold to complete. Do everything you can to reduce the number of locks held and their duration.
    > Get rid of the DISTINCT statement if you can. It causes additional work and execution time.

    One last thing. When databases are small and servers are not under load, every query tends to execute quickly. My guess is this might describe your test server.

    The production server is likely a very different environment. The database(s) are likely much bigger. It's a contentious environment where queries are competing for server resources and access to the same data. Of course, it's going to take longer to execute than it did on a test server.

    In the production environment, you want to minimize the locking and blocking times by dividing your query up into smaller segments that "nibble" at the database rather than take one large bite. Then, at the end, you can assemble the desired result by using one large query that JOINS on your temp tables. They're your temp tables so your query will not block other queries, nor can it be blocked.

    Hope this helps.

    ReplyDelete
    Replies
    1. Thanks for the detailed explanation. As I explained in the description the data in the dev and prod databases is almost the same.

      All I want to see why a particular stored proc runs for 5 minutes in dev however takes ages to run in prod even though nothing has changed in the prod and the same code was running without any problems since so long.

      Delete
  5. Another difference that could have in production environment and developer, is the traceflag's.

    In case of difference on traceflag 4199 I suggest you to see if the version of SQL Server was the same in both and all hotfixes was installed. This leads you to different plans.

    Another shot could be sp_configure: e.g. cost threshold for parallelism

    And the last, connection options: Ansi_nulls, Ansi_Warnings..

    ReplyDelete
  6. Are the server specs the same?
    How about the max parallelism?
    Any other settings on SQL different between servers?
    Any resource governor in place on live?

    ReplyDelete
  7. Have you compared execution plans between 2 environments? Any differences?
    Something as simple as disk fragmentation could cause the differences, or how many concurrent processes are running on both environments.

    ReplyDelete
  8. I appreciate your work on Sql Server It's such a wonderful read on MSBI course. Keep sharing stuffs like this. I am also educating people on similar MSBI training so if you are interested to know more you can watch this MSBI tutorial:-

    https://www.youtube.com/watch?v=tFG-VkaSvhI

    and regarding the Certification Process and step by step instructions please check on this LInk :-

    https://www.youtube.com/watch?v=yf__UkGxQ8c

    and if you are a beginner in MSBI then its important to get started from the scratch so check this link for better understanding:-

    https://www.youtube.com/watch?v=OzmdY0zCw4g
    https://www.youtube.com/watch?v=EdF9tZliIok

    ReplyDelete

SSIS Issues : A Day of Learning

Seldom are the days when you run into complex issues but resolve them in the shortest interval of time. Thanks to Larry Page and my fello...