Tuesday, 6 June 2017

Be Aware of CARTESIAN PRODUCT When Using Join Keys in SQL

I got a massive satisfaction last week when I was able to bring down the execution time of a sql stored procedure from > 3 hours to mere 2 minutes. It all boiled down to a missing joining key that led to a cartesian product within the tables.

Here is how the basic structure of the query (there were plenty of joining columns, however for our convenience lets consider 2 columns only) looked like

Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                     left join table4 t4 on t1.k1 = t4.k1


And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I met the road block. For example:-

Step1 Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
was returning rows

Step 2 
Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                             
was returning rows

Step 3
Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                     left join table4 t4 on t1.k1 = t4.k1
ran forever

As you can clearly notice, this query was missing a joining condition with column k2. On including the joining condition for column k2, query executed 1000 times quickly.

In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a cartesian product.

Happy coding!

5 comments:

  1. Could you please post some data with this also? It will give more clear picture.

    ReplyDelete
    Replies
    1. Apologies mate, cant share data as it is a NDA project. The parent table i.e. table1 had 100k records.

      Delete
  2. Lokesh,
    It is right that code runs longers because table is filtered go k1 column. But i don't think that it is a Cartesian product. Please put some light on this.

    ReplyDelete
  3. I don't see the Cartesian product on this either. Usually when I see Cartesian is because the non-ANSI join syntax is used and the join criteria for a table is completely left out. For example

    select count(*)
    from t1, t2, t3
    where t1.c1 = t2.c1

    Notice there is no join criteria for t3. I see this problem more with Oracle than SQL as many Oracle developers have been slower to adopt the ANSI syntax.

    ReplyDelete
  4. This really has covered a great insight on joining Keys can led to Cartisan product in table will led to reduction in Execution Time. I found myself lucky to visit your page and came across this insightful read on Sql Query tutorial. Please allow me to share similar work on Sql training course:-

    https://www.youtube.com/watch?v=phvcwekT9ZA

    ReplyDelete

SSIS Issues : A Day of Learning

Seldom are the days when you run into complex issues but resolve them in the shortest interval of time. Thanks to Larry Page and my fello...