Friday, October 15, 2004

Enforcing complex constraints in Oracle

Oracle supports various kinds of declarative integrity constraints:
  • Primary Key: Uniquely identifies a row in the table
  • Unique: Other columns that must be unique
  • Foreign Key: Column value must match value in another table
  • Check: Simple single-table, single-row data rules.
Examples of possible check constraints are:
“start_date <= end_date”
“check_ind in ('Y','N')”
“amount between 0 and 99999.99”
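As a minimal sketch, the three conditions above could be declared together on one table. The table name booking, its columns, and the constraint names are hypothetical and chosen only for illustration:

```sql
-- Hypothetical table showing the three example check constraints in context
create table booking
( booking_id int primary key
, start_date date not null
, end_date date not null
, check_ind char(1) not null
, amount number(7,2) not null
  -- each rule looks only at columns of the row being inserted or updated
, constraint booking_dates_chk check (start_date <= end_date)
, constraint booking_ind_chk check (check_ind in ('Y','N'))
, constraint booking_amount_chk check (amount between 0 and 99999.99)
);
```

Each constraint can only refer to columns of the single row being checked, which is why the more complex rules that follow are out of reach.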

However, many more complex business rules cannot be implemented via the above constraints. For example:

  • Employee Project Assignment start date and end date must fall between Project start date and end date
  • An employee may not have 2 Employee Project Assignments that overlap date-wise
  • A Project cannot have more than 10 Employee Assignments
  • An Employee cannot book time to a Task on a Project to which he is not assigned
  • The manager of a Department must belong to the Department.
These are usually enforced (if at all) procedurally, by one of the following methods:
  • Code in the application screens
  • Code in stored procedures (APIs) called from the application screens
  • Database triggers
These all have disadvantages compared to the declarative approach of constraints:
  • Code in application screens can be bypassed, and also the same rule must often be implemented in many places (e.g. in both the Project screen and the Assignment screen).
  • Code in stored procedures must also often be implemented in many places (e.g. in both the Project package and the Assignment package). Also, to prevent the rules being bypassed, all updates must be done via the package, which limits the functionality available to the user (no ad hoc updates can be written).
  • Code in triggers must also often be implemented in many places (e.g. in both the Project triggers and the Assignment triggers). Also, triggers often become complex because of the workarounds needed to avoid mutating-table errors and similar issues.
  • Any procedural solution must explicitly lock records to prevent corruption in a multi-user environment – e.g. if a user is amending an Employee Assignment, then no other user may be allowed to amend Employee Assignments for the same employee, or to amend the Project dates.
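
To make the locking point concrete, here is a rough sketch of what a procedural (API) version of one rule might look like. It assumes the project and emp_proj_assign tables used in the worked examples of this article; the procedure name assign_emp_to_proj and the error number -20001 are hypothetical:

```sql
-- Hypothetical API procedure: checks assignment dates against the project dates
create or replace procedure assign_emp_to_proj
( p_empno      in emp_proj_assign.empno%type
, p_projno     in emp_proj_assign.projno%type
, p_start_date in date
, p_end_date   in date
) as
  l_proj_start project.start_date%type;
  l_proj_end   project.end_date%type;
begin
  -- Lock the project row so that no other session can change its dates
  -- between our check and our commit
  select start_date, end_date
  into   l_proj_start, l_proj_end
  from   project
  where  projno = p_projno
  for update;

  if p_start_date < l_proj_start or p_end_date > l_proj_end then
    raise_application_error(-20001,
      'Assignment dates fall outside the project dates');
  end if;

  insert into emp_proj_assign (empno, projno, start_date, end_date)
  values (p_empno, p_projno, p_start_date, p_end_date);
end;
/
```

Note that this covers only inserts of assignments. The same rule would have to be coded again for updates of assignments and for updates of the project dates, and none of it applies to a user who issues the INSERT directly.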
This paper shows how such complex constraints can be implemented declaratively, such that the rules are defined once and then applied to all updates from whatever source.

This began with an on-line discussion I had with Tom Kyte about enforcing complex constraints here: http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:21389386132607

Objective

Ideally, given a complex rule such as one of the examples above, we would like to be able to create a complex check constraint (or “assertion” in ANSI SQL terms) such as:

CONSTRAINT c ON TABLE project p
CHECK (NOT EXISTS (SELECT null FROM emp_proj_assign ep
WHERE ep.projno = p.projno
AND (ep.start_date < p.start_date OR ep.end_date > p.end_date)));

Such a constraint (if it could be implemented) would raise an error in any of the following situations:
  • User attempts to insert an emp_proj_assign record with dates outside the Project dates
  • User attempts to update an emp_proj_assign record with dates outside the Project dates
  • User attempts to update a Project record with dates that do not encompass all the associated emp_proj_assign dates
However, we can’t achieve that via constraints alone.

Solution: use Materialized Views

Materialized Views (a.k.a. Snapshots) are actually tables that are maintained by Oracle, whose contents correspond to the result of a query – i.e. like a view, but “materialized” because the result is actually stored in a table.

These provide the mechanism we need to implement the complex constraint as follows:
  • Create a materialized view to select data that violates the desired constraint (e.g. assignments where the dates are outside the associated project dates). The MV must be defined with REFRESH COMPLETE ON COMMIT so that it is updated before the end of the transaction.
  • Create a check constraint on the materialized view that always evaluates to FALSE – e.g. CHECK (1=0)
That’s it. Whenever the underlying tables are updated, the materialized view is refreshed. If the update violates the rule, then a row will be inserted into the materialized view; but the check constraint on the MV disallows any inserts into it, and so the transaction fails.
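
As an end-to-end illustration of the two steps, here is a small sketch for the rule "the manager of a Department must belong to the Department". The dept and emp tables, their columns, and the MV and constraint names are all hypothetical, and the exact error reported at commit may vary by Oracle version:

```sql
-- Hypothetical tables for the department manager rule
create table dept
( deptno int primary key
, mgr_empno int
);
create table emp
( empno int primary key
, deptno int not null
);

-- Step 1: the MV selects only violations
-- (a manager who is assigned to a different department)
create materialized view dept_mgr_mv
refresh complete on commit as
select 1 dummy
from dept d, emp e
where d.mgr_empno = e.empno
and e.deptno <> d.deptno;

-- Step 2: a check constraint that no row can ever satisfy
alter table dept_mgr_mv
add constraint dept_mgr_mv_chk
check (1=0) deferrable;

-- Valid data: employee 1 belongs to department 10 and manages it
insert into dept (deptno, mgr_empno) values (10, null);
insert into dept (deptno, mgr_empno) values (20, null);
insert into emp (empno, deptno) values (1, 10);
update dept set mgr_empno = 1 where deptno = 10;
commit;  -- expected to succeed: the MV query returns no rows

-- Invalid data: employee 1 does not belong to department 20
update dept set mgr_empno = 1 where deptno = 20;
commit;  -- expected to fail: the refresh tries to store a row in dept_mgr_mv
         -- and the check constraint is violated (ORA-02290)
```

The same violation would be caught if it were caused by moving the employee to another department instead, because the MV is refreshed when either base table changes.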

Issues
  • Oracle 8i cannot support REFRESH ON COMMIT on materialized views of the complexity required for some rules. 9i can handle some but not all (it does not allow self-joins in MVs). 10g can handle self-joins, but does not seem to allow subqueries. So this approach cannot be used for all rules.
  • Efficiency: needs to be benchmarked. Is a FAST refresh preferable to a COMPLETE refresh? With COMPLETE we have a query that looks at all rows of the MV’s base tables at the end of every transaction that affects those tables.
  • Cannot create a REFRESH ON COMMIT materialized view with a HAVING clause. For such cases (see example 3 below), the materialized view cannot include the constraint violation (e.g. HAVING COUNT(*) > 10), and so the check constraint must do it (e.g. CHECK (cnt <= 10)). Note that in this case the materialized view will consume space in the database.
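
On the efficiency question, a FAST refresh variant of a join-based rule might look like the sketch below. This is untested and rests on assumptions: that the rule is "assignment dates must fall within the project dates" on the project and emp_proj_assign tables of the worked examples, and that the Oracle version in use accepts this query as fast-refreshable. The name emp_proj_mv2_fast is hypothetical. Fast refresh of a join MV needs materialized view logs with rowids on both base tables, and the rowids of both tables in the select list:

```sql
-- Materialized view logs record changes to the base tables
create materialized view log on project with rowid;
create materialized view log on emp_proj_assign with rowid;

-- Hypothetical fast-refresh version of the violation MV
create materialized view emp_proj_mv2_fast
refresh fast on commit as
select ep.rowid ep_rowid, p.rowid p_rowid
from emp_proj_assign ep, project p
where ep.projno = p.projno
and (ep.start_date < p.start_date or ep.end_date > p.end_date);

alter table emp_proj_mv2_fast
add constraint emp_proj_mv2_fast_chk
check (1=0) deferrable;

-- Optional: ask Oracle which refresh methods it considers possible.
-- Requires MV_CAPABILITIES_TABLE, created by the utlxmv.sql script.
begin
  dbms_mview.explain_mview('EMP_PROJ_MV2_FAST');
end;
/
select capability_name, possible, msgtxt
from mv_capabilities_table
where mvname = 'EMP_PROJ_MV2_FAST'
and capability_name like 'REFRESH_FAST%';
```

Whether this is actually quicker than COMPLETE would still need to be benchmarked, since the logs add overhead to every change to the base tables.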
Worked Examples

Based on the following tables:

create table project
( projno int primary key
, start_date date not null
, end_date date not null
);
create table emp_proj_assign
( empno int not null
, projno int not null
, start_date date not null
, end_date date not null
, primary key (empno, start_date)
);

1) Rule: An employee cannot have overlapping project assignments.

This is implemented as follows:

create materialized view emp_proj_mv1
refresh complete on commit as
select 1 dummy
from emp_proj_assign ep1, emp_proj_assign ep2
where ep1.empno = ep2.empno
-- do not pair a row with itself: (empno, start_date) is the primary key
and ep1.start_date <> ep2.start_date
and ep1.start_date <= ep2.end_date
and ep1.end_date >= ep2.start_date;


alter table emp_proj_mv1
add constraint emp_proj_mv1_chk
check (1=0) deferrable;

2) Rule: An employee's project assignment dates must fall between the project start and end dates.

create materialized view emp_proj_mv2
refresh complete on commit as
select 1 dummy
from emp_proj_assign ep, project p
where ep.projno = p.projno
and (ep.start_date < p.start_date or ep.end_date > p.end_date);

alter table emp_proj_mv2
add constraint emp_proj_mv2_chk
check (1=0) deferrable;

3) Rule: A project may not have more than 10 assignments (in total, ever!).

create materialized view emp_proj_mv3
build immediate
refresh complete on commit as
select projno, count(*) cnt
from emp_proj_assign
group by projno;
alter table emp_proj_mv3
add constraint emp_proj_mv3_chk
check (cnt <= 10)
deferrable;
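
A quick way to try this rule out is sketched below. It assumes that emp_proj_mv3 is the only constraint MV on these tables, and that the Oracle version supports generating rows with CONNECT BY LEVEL on dual (10g or later); the data values are made up:

```sql
-- Hypothetical test data for project 1
insert into project (projno, start_date, end_date)
values (1, date '2004-01-01', date '2004-12-31');

-- Ten assignments for project 1, one for each of employees 1 to 10
insert into emp_proj_assign (empno, projno, start_date, end_date)
select level, 1, date '2004-01-01', date '2004-12-31'
from dual
connect by level <= 10;

commit;  -- expected to succeed: cnt = 10 satisfies the check

-- Unlike the first two examples, this MV stores a row per project
select projno, cnt from emp_proj_mv3;

-- An eleventh assignment would take cnt to 11
insert into emp_proj_assign (empno, projno, start_date, end_date)
values (11, 1, date '2004-01-01', date '2004-12-31');

commit;  -- expected to fail: the refreshed row violates emp_proj_mv3_chk
```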

4) Rule: A project cannot have more than 10 employees assigned at the same time.

(I have not yet worked this one out!)

8 comments:

Ravi said...

I like the idea of using materialized views to enforce complex integrity constraints.

One of the issues with it is that the implementation is not obvious to a developer new to the project.

The overlapping date ranges can be solved using analytic functions in triggers.

You claim that complex rules can be implemented declaratively. Yet your article only shows a neat trick which is not declarative.

The only way to do it declaratively is to let the developer/designer enter declarative constraints in a "pre-compiler" language. The pre-compiler (or code generator) would then use the declarative statements to implement them in whatever way it saw fit: materialized views, code, etc.

The developers would only have to look at the declaration part to know that it is being implemented.

This is just like the current situation with regard to referential integrity, where the statement is declarative and implementation is left to the DBMS.

Ravi

Tony Andrews said...

Thanks for your comments. But I must take issue with this one:

> You claim that complex rules can be implemented declaratively. Yet your article only shows a neat trick which is not declarative.

I would say that "create materialized view" is declarative, in the same way that a constraint is declarative: there is no procedural code involved, and the DBMS takes care of enforcement of the "declared" rule. In what sense do you consider that to be non-declarative?

I agree that it is a "trick" that might not be immediately obvious to a new developer. That is a reason for providing good documentation. A tool where you could define the required constraint and it would generate the materialized view etc. as you suggest would be good too.

Ravi said...

Hi,

Since we are attempting to enforce constraints on a table and data that it can hold, the "declarative" part should be included in the definition of the table, just like referential integrity, unique constraints, etc.

When there is no option to make it a part of the table declaration, but must be applied after the fact of table creation, then I do not believe it to be truly declarative.

One thing we must do is ask the DBMS vendors for more useful functionality, such as complex constraints, type hierarchies, etc.

In the absence of any vendor feature, we could implement it using an interpreter that converts these constraints to meaningful code.

For example, the following would be declarative:

Create table t1 (
project_id number(6) not null,
employee_id number(6) not null,
start_date date not null,
end_date date not null,
...
enforce (sum(employee_id) <= 10
over (project_id, range(start_date, end_date)),
...
);


A precompiler could convert the "enforce" clause to relevant code.

The advantage of this is that there is no explicit reference to materialized views. The DBMS/compiler may choose to enforce it any appropriate way.

That is what I mean by a truly declarative statement.

When you explicitly issue commands like:

Step 1: Create a table.
Step 2: Create a materialized view to enforce my constraint.
Step 3: ...

It is no longer declarative, but becomes procedural.

Ravi

Tony Andrews said...

OK, I do see the distinction you are making. And asking the DBMS vendor (Oracle) to enhance its declarative constraints is really where I started with this question on Ask Tom:

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:21389386132607

But since right now there is no built-in, declarative, solution, we have to do the best we can. And to me, a solution that does not require writing ANY procedural code, but instead involves the (declarative) definition of a materialized view and the (declarative) definition of a constraint on that materialized view seems, well, declarative!

But if you prefer I will modify that to "pseudo-declarative" ;-)

Gary Myers said...

Just to point out that Standard Edition Oracle doesn't allow ON COMMIT REFRESH. It is an Enterprise Edition feature.

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:4541191739042