Some very occasional musings on matters concerning Oracle specifically or databases in general.
Tuesday, November 23, 2004
Vadim Tropashko on Complex Constraints
http://www.dbazine.com/tropashko8.shtml
As well as approaching the subject from first principles, the article also suggests further applications of such constraints, e.g. "foreign keys" to more than one parent table, and enforcing uniqueness of column values across more than one table.
Thursday, October 21, 2004
OTLT and EAV: the two big design mistakes all beginners make
Many people (myself included) start off as programmers, and only later start to work with databases. After a while, the developer notices two things about databases:
- Tables have a fixed set of columns; adding a column to a table involves changing any application code that accesses it.
- There are lots of “lookup” tables that typically have a code and a description.
Being a natural problem solver, and knowing that code re-use is a “good thing”, the developer thinks he can do better. Quite often, there is no one more experienced around to advise against it, and so the developer implements his ideas; this is so common that both ideas have names: the One True Lookup Table (OTLT) and the Entity-Attribute-Value (EAV) model.
One True Lookup Table (OTLT)
The idea: instead of having lots of “lookup” tables like these:
create table order_status (status_code varchar2(10), status_desc varchar2(40) );
create table country (country_code varchar2(3), country_name varchar2(30) );
create table priority (priority_no number(1), priority_desc varchar2(40) );
…
… why not just have ONE lookup table like this?:
create table lookup (lookup_type varchar2(10), lookup_code varchar2(20), lookup_desc varchar2(100) );
Great! Now we only need one “maintain lookup” screen instead of 3 (or 33 or whatever).
The trouble is, the developer doesn’t consider the disadvantages of this. Most importantly, we can no longer use foreign key constraints to protect the integrity of the data – unless one of the following conditions is true:
- lookup_code is unique within table lookup
- every child table uses 2 columns referencing (lookup_type,lookup_code) for its foreign keys
In most cases, neither of the above applies, and responsibility for data integrity resides solely in the application code. Show me such a database, and I’ll show you some data where the code does not match anything in the lookup table.
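For reference, the two-column foreign key workaround looks something like this (a sketch only; the orders table and the 'STATUS' lookup_type value are invented for illustration):
-- Make (lookup_type, lookup_code) the key of the OTLT table
alter table lookup add constraint lookup_pk
primary key (lookup_type, lookup_code);
-- Each child table must carry the lookup_type as well as the code,
-- pinned to the right value with a check constraint
create table orders
( order_no    integer primary key
, status_type varchar2(10) default 'STATUS' not null
    constraint orders_status_type_chk check (status_type = 'STATUS')
, status_code varchar2(20)
, constraint orders_status_fk
    foreign key (status_type, status_code)
    references lookup (lookup_type, lookup_code)
);
Even then, every child table has to carry an extra column whose only purpose is to point at the right slice of the lookup table, which is hardly simpler than a plain foreign key to a dedicated order_status table.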
Entity-Attribute-Value (EAV)
The idea: instead of “hard-coding” columns into the table like this:
create table emp (empno integer primary key, ename varchar2(20), sal number, job varchar2(10));
insert into emp (empno, ename, sal, job) values (1234, 'ANDREWS', 1000, 'CLERK');
… why not have total flexibility like this:
create table emp (empno integer primary key );
create table emp_values (empno references emp, code varchar2(20), value varchar2(100));
insert into emp (empno) values (1234);
insert into emp_values (empno, code, value) values (1234, 'NAME', 'ANDREWS');
insert into emp_values (empno, code, value) values (1234, 'SAL', '1000');
insert into emp_values (empno, code, value) values (1234, 'JOB', 'CLERK');
Great! Now we are free to invent new “attributes” at any time, without having to alter the table or the application!
However, consider a simple query: “show the names of all employees who are clerks and earn less than 2000”.
With the standard emp table:
select ename from emp where job = 'CLERK' and sal < 2000;
With the EAV tables:
select ev1.value
from emp_values ev1, emp_values ev2, emp_values ev3
where ev1.code = 'NAME'
and ev1.empno = ev2.empno
and ev2.code = 'JOB'
and ev2.value = 'CLERK'
and ev1.empno = ev3.empno
and ev3.code = 'SAL'
and TO_NUMBER(ev3.value) < 2000;
Not only is that much harder to follow, it is also likely to be much slower to process. And this is about the simplest of queries!
Conclusion
OTLT and EAV are “generic” approaches that are seductive to programmers, but are actually a bad idea for most databases. Resist the temptation!
Friday, October 15, 2004
"Pivot" Queries
DEPTNO JOB HEADCOUNT
---------- --------- ----------
10 CLERK 1
10 MANAGER 1
20 ANALYST 2
20 CLERK 2
20 MANAGER 1
30 CLERK 1
30 MANAGER 1
30 SALESMAN 4
… people would like to know how to produce this:
DEPTNO ANALYST CLERK MANAGER SALESMAN
---------- ---------- ---------- ---------- ----------
10 1 1
20 2 2 1
30 1 1 4
In Oracle, the normal way is to write a query using DECODE like this:
select deptno
, count(DECODE(job,'ANALYST', 1)) as "ANALYST"
, count(DECODE(job,'CLERK', 1)) as "CLERK"
, count(DECODE(job,'MANAGER', 1)) as "MANAGER"
, count(DECODE(job,'SALESMAN', 1)) as "SALESMAN"
from emp
group by deptno
order by deptno;
The DECODE ensures that the jobs go in the right columns.
Now this type of query is quite easy to write (when you know how), but it is laborious; and if the list of jobs changes, the query has to change too.
To make producing such queries a “piece of cake” I have built a package called pivot that can be used to generate the SQL like this:
begin
pivot.print_pivot_query
( 'deptno'
, 'job'
, 'emp'
, '1'
, agg_types => 'count'
);
end;
/
select deptno
, count(DECODE(job,'ANALYST', 1)) as "ANALYST"
, count(DECODE(job,'CLERK', 1)) as "CLERK"
, count(DECODE(job,'MANAGER', 1)) as "MANAGER"
, count(DECODE(job,'SALESMAN', 1)) as "SALESMAN"
from emp
group by deptno
order by deptno
PL/SQL procedure successfully completed.
The package code follows. Note that it requires the “parse” package that I posted earlier (reproduced below).
CREATE OR REPLACE PACKAGE pivot IS
TYPE ref_cursor IS REF CURSOR;
PROCEDURE print_pivot_query
( group_cols IN VARCHAR2 -- Comma-separated list of column(s), including table alias if applicable,
-- to be used to group the records
, pivot_col IN VARCHAR2 -- The name of the column to be pivoted, including table alias if applicable
, tables IN VARCHAR2 -- Comma-separated list of table(s), with table alias if applicable
, value_cols IN VARCHAR2 -- Comma-separated list of column(s), including table alias if applicable
-- for which aggregates are to be shown at each intersection
, pivot_values IN VARCHAR2 DEFAULT NULL -- Comma-separated list of values of pivot column to be used;
-- if omitted, all values are used (determined dynamically)
, agg_types IN VARCHAR2 DEFAULT NULL -- Comma-separated list of aggregate types, corresponding to value_cols;
-- if omitted, SUM is the default
, where_clause IN VARCHAR2 DEFAULT NULL -- where clause of query
, order_by IN VARCHAR2 DEFAULT NULL -- order by clause; if omitted, output is ordered by group_cols
);
FUNCTION pivot_query
( group_cols IN VARCHAR2
, pivot_col IN VARCHAR2
, tables IN VARCHAR2
, value_cols IN VARCHAR2
, pivot_values IN VARCHAR2 DEFAULT NULL
, agg_types IN VARCHAR2 DEFAULT NULL
, where_clause IN VARCHAR2 DEFAULT NULL
, order_by IN VARCHAR2 DEFAULT NULL
) RETURN VARCHAR2;
FUNCTION pivot_cursor
( group_cols IN VARCHAR2
, pivot_col IN VARCHAR2
, tables IN VARCHAR2
, value_cols IN VARCHAR2
, pivot_values IN VARCHAR2 DEFAULT NULL
, agg_types IN VARCHAR2 DEFAULT NULL
, where_clause IN VARCHAR2 DEFAULT NULL
, order_by IN VARCHAR2 DEFAULT NULL
) RETURN ref_cursor;
END;
/
CREATE OR REPLACE PACKAGE BODY pivot IS
g_mode varchar2(10);
g_sql varchar2(32767);
PROCEDURE pr
( p_text in varchar2
)
IS
v_text VARCHAR2(32767) := p_text;
BEGIN
if g_mode = 'PRINT' then
WHILE LENGTH(v_text) > 255 LOOP
DBMS_OUTPUT.PUT_LINE( SUBSTR(v_text,1,255) );
v_text := SUBSTR(v_text,256);
END LOOP;
DBMS_OUTPUT.PUT_LINE( v_text );
else
g_sql := g_sql || ' ' || p_text;
end if;
END pr;
/*
|| Generates the SQL statement for a pivot query based on parameters
|| Example:
|| define_pivot_query
|| ( group_cols => 'd.dname'
|| , pivot_col => 'e.job'
|| , tables => 'emp e, dept d'
|| , value_cols => 'e.sal,e.age'
|| , agg_types => 'min,max'
|| , where_clause => 'e.deptno = d.deptno'
|| );
|| Generates a query like:
|| select d.dname
|| , min(DECODE(e.job,'ANALYST', e.sal)) as "min_ANALYST_esal"
|| , max(DECODE(e.job,'ANALYST', e.age)) as "max_ANALYST_eage"
|| , min(DECODE(e.job,'CLERK', e.sal)) as "min_CLERK_esal"
|| , max(DECODE(e.job,'CLERK', e.age)) as "max_CLERK_eage"
|| , min(DECODE(e.job,'MANAGER', e.sal)) as "min_MANAGER_esal"
|| , max(DECODE(e.job,'MANAGER', e.age)) as "max_MANAGER_eage"
|| , min(DECODE(e.job,'PRESIDENT', e.sal)) as "min_PRESIDENT_esal"
|| , max(DECODE(e.job,'PRESIDENT', e.age)) as "max_PRESIDENT_eage"
|| , min(DECODE(e.job,'SALESMAN', e.sal)) as "min_SALESMAN_esal"
|| , max(DECODE(e.job,'SALESMAN', e.age)) as "max_SALESMAN_eage"
|| from emp e, dept d
|| where e.deptno = d.deptno
|| group by d.dname
|| order by d.dname
||
|| i.e. the parameters are used like this:
|| select <group_cols>
|| , <agg_type>(DECODE(<pivot_col>, '<pivot_value>', <value_col>)) as "<alias>"
|| from <tables>
|| where <where_clause>
|| group by <group_cols>
|| order by <order_by>
||
*/
PROCEDURE define_pivot_query
( group_cols IN VARCHAR2
, pivot_col IN VARCHAR2
, tables IN VARCHAR2
, value_cols IN VARCHAR2
, pivot_values IN VARCHAR2 DEFAULT NULL
, agg_types IN VARCHAR2 DEFAULT NULL
, where_clause IN VARCHAR2 DEFAULT NULL
, order_by IN VARCHAR2 DEFAULT NULL
)
IS
type ref_cursor is ref cursor;
rc ref_cursor;
pv_tab parse.varchar2_table;
val_tab parse.varchar2_table;
agg_tab parse.varchar2_table;
num_pvs integer := 0;
num_vals integer := 0;
num_aggs integer := 0;
alias varchar2(100);
BEGIN
g_sql := NULL;
-- Determine pivot values: use list if given, otherwise construct query to find them
if pivot_values is not null then
parse.delimstring_to_table( pivot_values, pv_tab, num_pvs );
else
open rc for 'select distinct ' || pivot_col || ' from ' || tables || ' where ' || nvl(where_clause,'1=1');
loop
num_pvs := num_pvs+1;
fetch rc into pv_tab(num_pvs);
exit when rc%notfound;
end loop;
close rc;
num_pvs := num_pvs-1;
end if;
parse.delimstring_to_table( value_cols, val_tab, num_vals );
-- Determine aggregate functions (default is SUM)
if agg_types is not null then
parse.delimstring_to_table( agg_types, agg_tab, num_aggs );
end if;
if num_aggs <> num_vals then
for i in num_aggs+1..num_vals loop
agg_tab(i) := 'sum';
end loop;
end if;
pr('select '||group_cols);
for pv in 1..num_pvs loop
pv_tab(pv) := trim(pv_tab(pv));
for val in 1..num_vals loop
val_tab(val) := trim(val_tab(val));
if num_vals = 1 then
alias := substr(pv_tab(pv),1,30);
else
alias := substr(agg_tab(val) || '_' || TRANSLATE(pv_tab(pv),'x. -()<>','x') || '_' || TRANSLATE(val_tab(val),'x. -()<>','x'),1,30);
end if;
pr( ', '||agg_tab(val)||'(DECODE(' || pivot_col || ',''' || pv_tab(pv) || ''', ' || val_tab(val) || '))'
|| ' as "' || alias || '"' );
end loop;
end loop;
pr('from ' || tables );
if where_clause is not null then
pr('where ' || where_clause);
end if;
pr('group by ' || group_cols);
pr('order by ' || NVL(order_by,group_cols));
END define_pivot_query;
PROCEDURE print_pivot_query
( group_cols IN VARCHAR2
, pivot_col IN VARCHAR2
, tables IN VARCHAR2
, value_cols IN VARCHAR2
, pivot_values IN VARCHAR2 DEFAULT NULL
, agg_types IN VARCHAR2 DEFAULT NULL
, where_clause IN VARCHAR2 DEFAULT NULL
, order_by IN VARCHAR2 DEFAULT NULL
)
IS
BEGIN
g_mode := 'PRINT';
define_pivot_query
( group_cols
, pivot_col
, tables
, value_cols
, pivot_values
, agg_types
, where_clause
, order_by
);
END;
FUNCTION pivot_query
( group_cols IN VARCHAR2
, pivot_col IN VARCHAR2
, tables IN VARCHAR2
, value_cols IN VARCHAR2
, pivot_values IN VARCHAR2 DEFAULT NULL
, agg_types IN VARCHAR2 DEFAULT NULL
, where_clause IN VARCHAR2 DEFAULT NULL
, order_by IN VARCHAR2 DEFAULT NULL
) RETURN VARCHAR2
IS
BEGIN
g_mode := 'TEXT';
define_pivot_query
( group_cols
, pivot_col
, tables
, value_cols
, pivot_values
, agg_types
, where_clause
, order_by
);
RETURN g_sql;
END;
FUNCTION pivot_cursor
( group_cols IN VARCHAR2
, pivot_col IN VARCHAR2
, tables IN VARCHAR2
, value_cols IN VARCHAR2
, pivot_values IN VARCHAR2 DEFAULT NULL
, agg_types IN VARCHAR2 DEFAULT NULL
, where_clause IN VARCHAR2 DEFAULT NULL
, order_by IN VARCHAR2 DEFAULT NULL
) RETURN ref_cursor
IS
rc ref_cursor;
BEGIN
g_mode := 'CURSOR';
define_pivot_query
( group_cols
, pivot_col
, tables
, value_cols
, pivot_values
, agg_types
, where_clause
, order_by
);
OPEN rc FOR g_sql;
RETURN rc;
END;
END;
/
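The pivot_cursor function can also be used directly from PL/SQL. Here is a minimal sketch: I pass pivot_values explicitly so that the number and order of columns being fetched is known in advance, and the variable names are just illustrative.
declare
  rc          pivot.ref_cursor;
  v_deptno    emp.deptno%type;
  v_analyst   number;
  v_clerk     number;
  v_manager   number;
  v_salesman  number;
begin
  rc := pivot.pivot_cursor
        ( group_cols   => 'deptno'
        , pivot_col    => 'job'
        , tables       => 'emp'
        , value_cols   => '1'
        , pivot_values => 'ANALYST,CLERK,MANAGER,SALESMAN' -- fixes the column order
        , agg_types    => 'count'
        );
  loop
    fetch rc into v_deptno, v_analyst, v_clerk, v_manager, v_salesman;
    exit when rc%notfound;
    dbms_output.put_line
    ( v_deptno || ': ' || v_analyst || ' ' || v_clerk || ' ' || v_manager || ' ' || v_salesman );
  end loop;
  close rc;
end;
/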
Parsing delimited fields in a character string
A common requirement is to split a delimited string into its individual fields (and to build such a string from individual values). The following package facilitates this:
CREATE OR REPLACE PACKAGE parse AS
/*
|| Package of utility procedures for parsing delimited or fixed position strings into tables
|| of individual values, and vice versa.
*/
TYPE varchar2_table IS TABLE OF VARCHAR2(32767) INDEX BY BINARY_INTEGER;
PROCEDURE delimstring_to_table
( p_delimstring IN VARCHAR2
, p_table OUT varchar2_table
, p_nfields OUT INTEGER
, p_delim IN VARCHAR2 DEFAULT ','
);
PROCEDURE table_to_delimstring
( p_table IN varchar2_table
, p_delimstring OUT VARCHAR2
, p_delim IN VARCHAR2 DEFAULT ','
);
END parse;
/
CREATE OR REPLACE PACKAGE BODY parse AS
PROCEDURE delimstring_to_table
( p_delimstring IN VARCHAR2
, p_table OUT varchar2_table
, p_nfields OUT INTEGER
, p_delim IN VARCHAR2 DEFAULT ','
)
IS
v_string VARCHAR2(32767) := p_delimstring;
v_nfields PLS_INTEGER := 1;
v_table varchar2_table;
v_delimpos PLS_INTEGER := INSTR(p_delimstring, p_delim);
v_delimlen PLS_INTEGER := LENGTH(p_delim);
BEGIN
WHILE v_delimpos > 0
LOOP
v_table(v_nfields) := SUBSTR(v_string,1,v_delimpos-1);
v_string := SUBSTR(v_string,v_delimpos+v_delimlen);
v_nfields := v_nfields+1;
v_delimpos := INSTR(v_string, p_delim);
END LOOP;
v_table(v_nfields) := v_string;
p_table := v_table;
p_nfields := v_nfields;
END delimstring_to_table;
PROCEDURE table_to_delimstring
( p_table IN varchar2_table
, p_delimstring OUT VARCHAR2
, p_delim IN VARCHAR2 DEFAULT ','
)
IS
v_nfields PLS_INTEGER := p_table.COUNT;
v_string VARCHAR2(32767);
BEGIN
FOR i IN 1..v_nfields
LOOP
v_string := v_string || p_table(i);
IF i != v_nfields THEN
v_string := v_string || p_delim;
END IF;
END LOOP;
p_delimstring := v_string;
END table_to_delimstring;
END parse;
/
This is how you might use it with a standard comma-delimited string:
SQL> declare
2 v_tab parse.varchar2_table;
3 v_nfields integer;
4 v_string varchar2(1000) := '1000,Smith,John,13-May-1970';
5 begin
6 parse.delimstring_to_table (v_string, v_tab, v_nfields);
7 for i in 1..v_nfields loop
8 dbms_output.put_line('Field('||i||') = '||v_tab(i));
9 end loop;
10 end;
11 /
Field(1) = 1000
Field(2) = Smith
Field(3) = John
Field(4) = 13-May-1970
PL/SQL procedure successfully completed.
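Going the other way, table_to_delimstring turns a table of values back into a delimited string. A minimal sketch, reusing the same values:
declare
  v_tab    parse.varchar2_table;
  v_string varchar2(1000);
begin
  v_tab(1) := '1000';
  v_tab(2) := 'Smith';
  v_tab(3) := 'John';
  v_tab(4) := '13-May-1970';
  parse.table_to_delimstring (v_tab, v_string);
  dbms_output.put_line (v_string);  -- 1000,Smith,John,13-May-1970
end;
/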
Enforcing complex constraints in Oracle
Oracle provides the following declarative constraints for enforcing data rules:
- Primary Key: Uniquely identifies a row in the table
- Unique: Other columns that must be unique
- Foreign Key: Column value must match value in another table
- Check: Simple single-table, single-row data rules.
"start_date <= end_date"
"check_ind in ('Y','N')"
"amount between 0 and 99999.99"
However, many more complex business rules cannot be implemented via the above constraints. For example:
- Employee Project Assignment start date and end date must fall between Project start date and end date
- An employee may not have 2 Employee Project Assignments that overlap date-wise
- A Project cannot have more than 10 Employee Assignments
- An Employee cannot book time to a Task on a Project to which he is not assigned
- The manager of a Department must belong to the Department.
Traditionally, such rules are enforced procedurally, using one or more of:
- Code in the application screens
- Code in stored procedures (APIs) called from the application screens
- Database triggers
Each of these approaches has drawbacks:
- Code in application screens can be bypassed, and the same rule must often be implemented in many places (e.g. in both the Project screen and the Assignment screen).
- Code in stored procedures must also often be implemented in many places (e.g. in both the Project package and the Assignment package). Also, to prevent bypassing of the rules, all updates must be done via the package, which limits the functionality available to the user (cannot write ad hoc updates)
- Code in triggers must also often be implemented in many places (e.g. in both the Project triggers and the Assignment triggers). Also, triggers can often become complex due to work-arounds for mutating table issues etc.
- Any procedural solution must explicitly lock records to prevent corruption in a multi-user environment – e.g. if a user is amending an Employee Assignment, then no other user may be allowed to amend Employee Assignments for the same employee, or to amend the Project dates.
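To illustrate the last point: a stored-procedure implementation of the "Assignment dates must fall within the Project dates" rule would need to do something like the following. This is just a sketch; the procedure name and error number are invented, and the project and emp_proj_assign tables are defined further down.
create or replace procedure check_assignment_dates
( p_projno     in project.projno%type
, p_start_date in date
, p_end_date   in date
)
is
  v_proj_start project.start_date%type;
  v_proj_end   project.end_date%type;
begin
  -- Lock the parent project row so its dates cannot be changed
  -- by another session while we are validating against them
  select start_date, end_date
  into   v_proj_start, v_proj_end
  from   project
  where  projno = p_projno
  for update;
  if p_start_date < v_proj_start or p_end_date > v_proj_end then
    raise_application_error
    ( -20001, 'Assignment dates must fall within the project dates' );
  end if;
end;
/
An equivalent check would also be needed wherever the Project dates themselves can be updated, which is exactly the duplication (and locking burden) described above.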
This began with an on-line discussion I had with Tom Kyte about enforcing complex constraints here: http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:21389386132607
Objective
Ideally, given a complex rule such as one of the examples above, we would like to be able to create a complex check constraint (or “assertion” in ANSI SQL terms) such as:
CONSTRAINT c ON TABLE project p
CHECK (NOT EXISTS (SELECT null FROM emp_proj_assign ep
WHERE ep.projno = p.projno
AND (ep.start_date < p.start_date OR ep.end_date > p.end_date)));
Such a constraint (if it could be implemented) would raise an error in any of the following situations:
- User attempts to insert an emp_proj_assign record with dates outside the Project dates
- User attempts to update an emp_proj_assign record with dates outside the Project dates
- User attempts to update a Project record with dates that do not encompass all the associated emp_proj_assign dates
Solution: use Materialized Views
Materialized Views (a.k.a. Snapshots) are actually tables that are maintained by Oracle, whose contents correspond to the result of a query – i.e. like a view, but “materialized” because the result is actually stored in a table.
These provide the mechanism we need to implement the complex constraint as follows:
- Create a materialized view to select data that violates the desired constraint (e.g. assignments where the dates are outside the associated project dates). The MV must be defined with REFRESH COMPLETE ON COMMIT so that it is updated before the end of the transaction.
- Create a check constraint on the materialized view that always evaluates to FALSE – e.g. CHECK (1=0)
Issues
- Oracle 8i cannot support REFRESH ON COMMIT on materialized views of the complexity required for some rules. 9i can handle some but not all (it does not allow self-joins in MVs). 10G can handle self-joins, but does not seem to allow subqueries. So this approach cannot be used for all rules.
- Efficiency: needs to be benchmarked. Is a FAST refresh preferable to a COMPLETE refresh? With COMPLETE we have a query that looks at all rows of the MV’s base tables at the end of every transaction that affects those tables.
- Cannot create a REFRESH ON COMMIT materialized view with a HAVING clause. For such cases (see example 3 below), the materialized view cannot include the constraint violation (e.g. HAVING COUNT(*) > 10), and so the check constraint must do it (e.g. CHECK (cnt <= 10)). Note that in this case the materialized view will consume space in the database.
The examples below are based on the following tables:
create table project
( projno int primary key
, start_date date not null
, end_date date not null
);
create table emp_proj_assign
( empno int not null
, projno int not null
, start_date date not null
, end_date date not null
, primary key (empno, start_date)
);
1) Rule: An employee cannot have overlapping project assignments.
This is implemented as follows:
create materialized view emp_proj_mv1
refresh complete on commit as
select 1 dummy
from emp_proj_assign ep1, emp_proj_assign ep2
where ep1.empno = ep2.empno
and ep1.start_date <> ep2.start_date -- don't compare a row with itself: (empno, start_date) is the PK
and ep1.start_date <= ep2.end_date
and ep1.end_date >= ep2.start_date;
alter table emp_proj_mv1
add constraint emp_proj_mv1_chk
check (1=0) deferrable;
2) An employee's project assignment dates must fall between the project start and end dates.
create materialized view emp_proj_mv2
refresh complete on commit as
select 1 dummy
from emp_proj_assign ep, project p
where ep.projno = p.projno
and (ep.start_date < p.start_date or ep.end_date > p.end_date);
alter table emp_proj_mv2
add constraint emp_proj_mv2_chk
check (1=0) deferrable;
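As a quick sanity check of example 2 (a sketch; the projno and empno values are invented), the commit below should fail, because the on-commit refresh of emp_proj_mv2 would insert a row and violate its check constraint:
insert into project (projno, start_date, end_date)
values (1, to_date('01-01-2004','DD-MM-YYYY'), to_date('30-06-2004','DD-MM-YYYY'));
-- the assignment ends after the project does, so the transaction should be rejected
insert into emp_proj_assign (empno, projno, start_date, end_date)
values (100, 1, to_date('01-03-2004','DD-MM-YYYY'), to_date('31-12-2004','DD-MM-YYYY'));
commit;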
3) A project may not have more than 10 assignments (in total, ever!):
create materialized view emp_proj_mv3
build immediate
refresh complete on commit as
select projno, count(*) cnt
from emp_proj_assign
group by projno;
alter table emp_proj_mv3
add constraint emp_proj_mv3_chk
check (cnt <= 10)
deferrable;
4) A project cannot have more than 10 employees assigned at the same time.
(I have not yet worked this one out!)