Wednesday, June 23, 2010

SQL Overlap Test

Many times I come across SQL where the developer is trying to check for overlap between two ranges (usually date ranges, but sometimes numbers). For example, to meet the requirement "select all employees whose hired from and to dates overlap with the project start and end dates".

The developer sketches out all the possible overlap scenarios and finds four:
1) End of range A overlaps start of range B:

2) Start of range A overlaps end of range B:

3) Range A falls entirely within range B:

4) Range B falls entirely within range A:
This leads to SQL like this (assuming all values are not null):
where (a.start < b.start 
and a.end between b.start and b.end)
or (a.start between b.start
and b.end and a.end > b.end)
or (a.start between b.start and b.end
and a.end between b.start and b.end)
or (b.start between a.start and a.end
and b.end between a.start and a.end)
If, as is often the case, the end dates are allowed to be null, meaning "forever", then the SQL becomes yet more complex. In some cases I have seen attempts at this where the developer has got it wrong and missed out one of the cases altogether.

In fact it is much easier to look at the cases where A and B do not overlap, because there are only two such cases:
1) Range A ends before range B starts:

2) Range A starts after range B ends:
This leads to the much simpler SQL:
where not (a.end < b.start or a.start > b.end)
which can be rearranged to the even simpler (though perhaps less intuitive):
where a.end >= b.start 
and a.start <= b.end
Even if we have to allow for null end dates this is now very simple:
where nvl(a.end,b.start) >= b.start 
and a.start <= nvl(b.end,a.start)
I don't claim that any of the above is original, I am sure this algorithm appears in many SQL and other books. But I see variants of the long-winded version (sometimes bug-ridden ones) so often I thought it worth documenting here so I can point to it in future.