Be Aware of CARTESIAN PRODUCT When Using Join Keys in SQL

I got a massive satisfaction last week when I was able to bring down the execution time of a sql stored procedure from > 3 hours to mere 2 minutes. It all boiled down to a missing joining key that led to a cartesian product within the tables.

Here is how the basic structure of the query (there were plenty of joining columns, however for our convenience lets consider 2 columns only) looked like

Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                     left join table4 t4 on t1.k1 = t4.k1


And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I met the road block. For example:-

Step1 Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
was returning rows

Step 2 
Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                             
was returning rows

Step 3
Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                     left join table4 t4 on t1.k1 = t4.k1
ran forever

As you can clearly notice, this query was missing a joining condition with column k2. On including the joining condition for column k2, query executed 1000 times quickly.

In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a cartesian product.

Happy coding!

Comments

  1. Could you please post some data with this also? It will give more clear picture.

    ReplyDelete
    Replies
    1. Apologies mate, cant share data as it is a NDA project. The parent table i.e. table1 had 100k records.

      Delete
  2. Lokesh,
    It is right that code runs longers because table is filtered go k1 column. But i don't think that it is a Cartesian product. Please put some light on this.

    ReplyDelete
  3. I don't see the Cartesian product on this either. Usually when I see Cartesian is because the non-ANSI join syntax is used and the join criteria for a table is completely left out. For example

    select count(*)
    from t1, t2, t3
    where t1.c1 = t2.c1

    Notice there is no join criteria for t3. I see this problem more with Oracle than SQL as many Oracle developers have been slower to adopt the ANSI syntax.

    ReplyDelete
  4. This really has covered a great insight on joining Keys can led to Cartisan product in table will led to reduction in Execution Time. I found myself lucky to visit your page and came across this insightful read on Sql Query tutorial. Please allow me to share similar work on Sql training course:-

    https://www.youtube.com/watch?v=phvcwekT9ZA

    ReplyDelete

Post a Comment

Popular posts from this blog

SQL QUERY NIGHTMARE

Visual Studio Git Error | "terminal prompts disabled"

Issues Integrating Azure Data Factory with GITHUB | IN spite of admin rights on repository