Be Aware of CARTESIAN PRODUCT When Using Join Keys in SQL
I got a massive satisfaction last week when I was able to bring down the execution time of a sql stored procedure from > 3 hours to mere 2 minutes. It all boiled down to a missing joining key that led to a cartesian product within the tables.
Here is how the basic structure of the query (there were plenty of joining columns, however for our convenience lets consider 2 columns only) looked like
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
left join table4 t4 on t1.k1 = t4.k1
And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I met the road block. For example:-
Step1 Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
was returning rows
Step 2
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
was returning rows
Step 3
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
left join table4 t4 on t1.k1 = t4.k1
ran forever
As you can clearly notice, this query was missing a joining condition with column k2. On including the joining condition for column k2, query executed 1000 times quickly.
In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a cartesian product.
Happy coding!
Here is how the basic structure of the query (there were plenty of joining columns, however for our convenience lets consider 2 columns only) looked like
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
left join table4 t4 on t1.k1 = t4.k1
And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I met the road block. For example:-
Step1 Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
was returning rows
Step 2
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
was returning rows
Step 3
Select * from
table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
left join table4 t4 on t1.k1 = t4.k1
ran forever
As you can clearly notice, this query was missing a joining condition with column k2. On including the joining condition for column k2, query executed 1000 times quickly.
In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a cartesian product.
Happy coding!
Could you please post some data with this also? It will give more clear picture.
ReplyDeleteApologies mate, cant share data as it is a NDA project. The parent table i.e. table1 had 100k records.
DeleteLokesh,
ReplyDeleteIt is right that code runs longers because table is filtered go k1 column. But i don't think that it is a Cartesian product. Please put some light on this.
I don't see the Cartesian product on this either. Usually when I see Cartesian is because the non-ANSI join syntax is used and the join criteria for a table is completely left out. For example
ReplyDeleteselect count(*)
from t1, t2, t3
where t1.c1 = t2.c1
Notice there is no join criteria for t3. I see this problem more with Oracle than SQL as many Oracle developers have been slower to adopt the ANSI syntax.
This really has covered a great insight on joining Keys can led to Cartisan product in table will led to reduction in Execution Time. I found myself lucky to visit your page and came across this insightful read on Sql Query tutorial. Please allow me to share similar work on Sql training course:-
ReplyDeletehttps://www.youtube.com/watch?v=phvcwekT9ZA