Inner Join

Inner join, outer joy

Intro🔗

Joining tables is the most basic skill that is needed when someone works with SQL. The most common ones are the INNER JOIN, the LEFT OUTER JOIN and sometimes the RIGH OUTER JOIN. This post is about the INNER JOIN, and it's variants.

DB2, traditional join🔗

IIRCC when Iseries DB2 went from V5.3 to V5.4, the order in which the tables where declared suddenly mattered. And we had to change the following (bad) code.

file: i5_bad.sql

SELECT *
FROM main a,
    subsub c,
    sub b
WHERE
        a.id = b.main_id
    AND b.id = c.sub_id

Into the better:

file: i5_better.sql

SELECT *
FROM main a,
    sub b,
    subsub c
WHERE
        a.id = b.main_id
    AND b.id = c.sub_id

Of course this is a very simple example. The queries that we made (cough, generated by php, couch), were much more complicated: many tables, Group By's, COALESCE's, CASE, .. And because the database was multi-tennant, the relational model of these tables was more complex. To join 2 tables, we had to match 2-4 columns.

Here is a more realistic example; we'll just show the joining conditions and leave the rest simple.

file: real.sql

SELECT *
FROM ororta a,
    bebeta b,
    opopta c,
    plplta d,
    repota e
WHERE
        a.ororaa = b.ororbb
    AND a.cocoaa = b.cocobb
    AND b.opopbb = c.opopcc
    AND b.cocobb = c.cococc
    AND d.orordd = b.ororbb
    AND d.opopdd = b.opopbb
    AND b.ororbb = e.ororee
    AND e.opopee = b.opopee
    AND b.typebb = 'AU'
    AND b.statee <> 'L'
    AND d.typdd = 'P'
    AND e.typeee = 'P'
    AND a.cocoaa = 15
    AND a.stataa IN ( 'Z', 'A' )
    AND a.qtyaaa - a.delaaa > 0

That is horrible to read and a nigthmare for maintenance. Let's improve that a bit by using better aliases then 'a', 'b', 'c', 'd', 'e'.

file: improved.sql

SELECT *
FROM ororta order,
    bebeta treatment,
    opopta operation,
    plplta planning,
    repota reporting
WHERE
        order.ororaa = treatment.ororbb
    AND order.cocoaa = treatment.cocobb
    AND treatment.opopbb = operation.opopcc
    AND treatment.cocobb = operation.cococc
    AND planning.orordd = treatment.ororbb
    AND planning.opopdd = treatment.opopbb
    AND treatment.ororbb = reporting.ororee
    AND reporting.opopee = treatment.opopbb
    AND treatment.typebb = 'AU'
    AND treatment.statbb <> 'L'
    AND planning.typedd = 'P'
    AND reporting.typeee = 'P'
    AND order.cocoaa = 15
    AND order.stataa IN ( 'Z', 'A' )
    AND order.qtyaaa - order.delaaa > 0

Now that we know what table is what, lets move the conditions from the WHERE clause into a INNER JOIN

file: inner.sql

SELECT *
FROM ororta order
 INNER JOIN bebeta treatment ON
       order.ororaa = treatment.ororbb
    AND order.cocoaa = treatment.cocobb
    AND treatment.typebb = 'AU'
    AND treatment.statbb <> 'L'
 INNER JOIN opopta operation ON
    AND treatment.opopbb = operation.opopcc
    AND treatment.cocobb = operation.cococc
 INNER JOIN plplta planning ON
    AND planning.orordd = treatment.ororbb
    AND planning.opopdd = treatment.opopbb
    AND planning.typedd = 'P'
 INNER JOIN repota reporting ON
    AND treatment.ororbb = reporting.ororee
    AND reporting.opopdd = treatment.opopbb
    AND reporting.typeee = 'P'
WHERE
    AND order.cocoaa = 15
    AND order.stataa IN ( 'Z', 'A' )
    AND order.qtyaaa - order.delaaa > 0

Wow, we can see immediately what table is by what criteria. Nevertheless, it is still difficult to read as our mind needs to parse the aliases on both sides of the '='.

file: readable.sql

SELECT *
FROM ororta order
 INNER JOIN bebeta treatment ON
       treatment.ororbb = order.ororaa
    AND treatment.cocobb = order.cocoaa
    AND treatment.typebb = 'AU'
    AND treatment.statbb <> 'L'
 INNER JOIN opopta operation ON
    AND operation.opopcc = treatment.opopbb
    AND operation.cococc = treatment.cocobb
 INNER JOIN plplta planning ON
    AND planning.orordd = treatment.ororbb
    AND planning.opopdd = treatment.opopbb
    AND planning.typedd = 'P'
 INNER JOIN repota reporting ON
    AND reporting.ororee = treatment.ororbb
    AND reporting.opopee = treatment.opopbb
    AND reporting.typeee = 'P'
WHERE
    AND order.cocoaa = 15
    AND order.stataa IN ( 'Z', 'A' )
    AND order.qtyaaa - order.delaaa > 0

Now we have query that is is rather easy to read and to maintain. I lik this way of organizing queries, and if you are consistent, the step to 'LEFT/RIGHT OUTER JOIN` is realy small.

Side effect: Speedup🔗

By moving a complex, query from the WHERE clause to INNER JOIN, we sometimes measured speed ups from around 13 seconds to 3 seconds. Of course, with better indexes there would probably be even more improvements possible. It could be that this is a DB2 specific thing and that the optimizer could make a better execution plan. I do not know, the report got a bit faster.

Disadvantage🔗

The real problem that I see is that the oracle die-hards have problems with the positioning of the critaria fields because of the (+) (-) feature.