Inner Join

Inner join, outer joy

Intro🔗

Joining tables is the most basic skill that is needed when someone works with SQL. The most common ones are the INNER JOIN, the LEFT OUTER JOIN and sometimes the RIGH OUTER JOIN. This post is about the INNER JOIN, and it's variants.

DB2, traditional join🔗

IIRCC when Iseries DB2 went from V5.3 to V5.4, the order in which the tables where declared suddenly mattered. And we had to change the following (bad) code.

file: i5_bad.sql

SELECT *
FROM main a,
	subsub c,
	sub b
WHERE
		a.id = b.main_id
	AND b.id = c.sub_id

Into the better:

file: i5_better.sql

SELECT *
FROM main a,
	sub b,
	subsub c
WHERE
		a.id = b.main_id
	AND b.id = c.sub_id

Of course this is a very simple example. The queries that we made (cough, generated by php, couch), were much more complicated: many tables, Group By's, COALESCE's, CASE, .. And because the database was multi-tennant, the relational model of these tables was more complex. To join 2 tables, we had to match 2-4 columns.

Here is a more realistic example; we'll just show the joining conditions and leave the rest simple.

file: real.sql

SELECT *
FROM ororta a,
	bebeta b,
	opopta c,
	plplta d,
	repota e
WHERE
		a.ororaa = b.ororbb
	AND a.cocoaa = b.cocobb
	AND b.opopbb = c.opopcc
	AND b.cocobb = c.cococc
	AND d.orordd = b.ororbb
	AND d.opopdd = b.opopbb
	AND b.ororbb = e.ororee
	AND e.opopee = b.opopee
	AND b.typebb = 'AU'
	AND b.statee <> 'L'
	AND d.typdd = 'P'
	AND e.typeee = 'P'
	AND a.cocoaa = 15
	AND a.stataa IN ( 'Z', 'A' )
	AND a.qtyaaa - a.delaaa > 0

That is horrible to read and a nigthmare for maintenance. Let's improve that a bit by using better aliases then 'a', 'b', 'c', 'd', 'e'.

file: improved.sql

SELECT *
FROM ororta order,
	bebeta treatment,
	opopta operation,
	plplta planning,
	repota reporting
WHERE
		order.ororaa = treatment.ororbb
	AND order.cocoaa = treatment.cocobb
	AND treatment.opopbb = operation.opopcc
	AND treatment.cocobb = operation.cococc
	AND planning.orordd = treatment.ororbb
	AND planning.opopdd = treatment.opopbb
	AND treatment.ororbb = reporting.ororee
	AND reporting.opopee = treatment.opopbb
	AND treatment.typebb = 'AU'
	AND treatment.statbb <> 'L'
	AND planning.typedd = 'P'
	AND reporting.typeee = 'P'
	AND order.cocoaa = 15
	AND order.stataa IN ( 'Z', 'A' )
	AND order.qtyaaa - order.delaaa > 0

Now that we know what table is what, lets move the conditions from the WHERE clause into a INNER JOIN

file: inner.sql

SELECT *
FROM ororta order
 INNER JOIN bebeta treatment ON
	   order.ororaa = treatment.ororbb
	AND order.cocoaa = treatment.cocobb
	AND treatment.typebb = 'AU'
	AND treatment.statbb <> 'L'
 INNER JOIN opopta operation ON
	AND treatment.opopbb = operation.opopcc
	AND treatment.cocobb = operation.cococc
 INNER JOIN plplta planning ON
	AND planning.orordd = treatment.ororbb
	AND planning.opopdd = treatment.opopbb
	AND planning.typedd = 'P'
 INNER JOIN repota reporting ON
	AND treatment.ororbb = reporting.ororee
	AND reporting.opopdd = treatment.opopbb
	AND reporting.typeee = 'P'
WHERE
	AND order.cocoaa = 15
	AND order.stataa IN ( 'Z', 'A' )
	AND order.qtyaaa - order.delaaa > 0

Wow, we can see immediately what table is by what criteria. Nevertheless, it is still difficult to read as our mind needs to parse the aliases on both sides of the '='.

file: readable.sql

SELECT *
FROM ororta order
 INNER JOIN bebeta treatment ON
	   treatment.ororbb = order.ororaa
	AND treatment.cocobb = order.cocoaa
	AND treatment.typebb = 'AU'
	AND treatment.statbb <> 'L'
 INNER JOIN opopta operation ON
	AND operation.opopcc = treatment.opopbb
	AND operation.cococc = treatment.cocobb
 INNER JOIN plplta planning ON
	AND planning.orordd = treatment.ororbb
	AND planning.opopdd = treatment.opopbb
	AND planning.typedd = 'P'
 INNER JOIN repota reporting ON
	AND reporting.ororee = treatment.ororbb
	AND reporting.opopee = treatment.opopbb
	AND reporting.typeee = 'P'
WHERE
	AND order.cocoaa = 15
	AND order.stataa IN ( 'Z', 'A' )
	AND order.qtyaaa - order.delaaa > 0

Now we have query that is is rather easy to read and to maintain. I lik this way of organizing queries, and if you are consistent, the step to 'LEFT/RIGHT OUTER JOIN` is realy small.

Side effect: Speedup🔗

By moving a complex, query from the WHERE clause to INNER JOIN, we sometimes measured speed ups from around 13 seconds to 3 seconds. Of course, with better indexes there would probably be even more improvements possible. It could be that this is a DB2 specific thing and that the optimizer could make a better execution plan. I do not know, the report got a bit faster.

Disadvantage🔗

The real problem that I see is that the oracle die-hards have problems with the positioning of the critaria fields because of the (+) (-) feature.