Many people (myself included) start of as programmers, and only later start to work with databases. After a while, the developer notices two things about databases:
- Tables have a fixed set of columns; adding a column to a table involves changing any application code that accesses it.
- There are lots of “lookup” tables that typically have a code and a description.
Being a natural problem solver, and knowing that code re-use is a “good thing”, the developer thinks he can do better. Quite often, there is no one more experienced around to advise against, and so the developer implements his ideas; this is so common that both ideas have a name: the One True Lookup Table (OTLT) and the Entity-Attribute-Value (EAV) model.
One True Lookup Table (OTLT)
The idea: instead of having lots of “lookup” tables like these:
create table order_status (status_code varchar2(10), status_desc varchar2(40) );
create table country (country_code varchar2(3), country_name varchar2(30) );
create table priority (priority_no number(1), priority_desc varchar2(40) );
…
… why not just have ONE lookup table like this?:
create table lookup (lookup_type varchar2(10), lookup_code varchar2(20), lookup_desc varchar2(100) );
Great! Now we only need one “maintain lookup” screen instead of 3 (or 33 or whatever).
The trouble is, the developer doesn’t consider the disadvantages of this. Most importantly, we can no longer use foreign key constraints to protect the integrity of the data – unless one of the following conditions is true:
- lookup_code is unique within table lookup
- every child table uses 2 columns referencing (lookup_type,lookup_code) for its foreign keys
In most cases, neither of the above applies, and responsibility for data integrity resides solely in the application code. Show me such a database, and I’ll show you some data where the code does not match anything in the lookup table.
Entity-Attribute-Value (EAV)
The idea: instead of “hard-coding” columns into the table like this:
create table emp (empno integer primary key, ename varchar2(20), sal number, job varchar2(10));
insert into emp (empno, ename, sal, job) values (1234,’ANDREWS’,1000,’CLERK’);
… why not have total flexibility like this:
create table emp (empno integer primary key );
create table emp_value (empno references emp, code varchar2(20), value varchar2(100));
insert into emp (empno) values (1234);
insert into emp_values (‘NAME’,’ANDREWS’);
insert into emp_values (‘SAL’,’1000’);
insert into emp_values (‘JOB’,’CLERK’);
Great! Now we are free to invent new “attributes” at any time, without having to alter the table or the application!
However, consider a simple query: “show the names of all employees who are clerks and earn less than 2000”.
With the standard emp table:
select ename from emp where job=’CLERK’ and sal < 2000;
With the EAV tables:
select ev1.name
from emp_values ev1, emp_values ev2 emp_values ev3
where ev1.code = ‘NAME’
and ev1.empno = ev2.empno
and ev2.code = ‘JOB’
and ev2.value = ‘CLERK’
and ev1.empno = ev3.empno
and ev3.code = ‘SAL’
and TO_NUMBER(ev3.value) < 2000;
Not only is that much harder to follow, it is likely to be much slower to process too. And this is about the most simple of queries!
Conclusion
OTLT and EAV are “generic” approaches that are seductive to programmers, but are actually a bad idea for most databases. Resist the temptation!