Sunday, February 15, 2009

UKOUG - Apex SIG

I was lucky enough to be able to attend the UK Oracle User Group's Application Express SIG on Friday - my first experience of any UKOUG event. The event was a "sell out", with all available places taken up well in advance. This in itself is great news, because it shows that APEX really is taking off, which means I can be fairly confident I haven't backed the wrong horse career-wise.

David Peake, the APEX Product Manager, gave the first two presentations:

  • The Latest and Greatest from Development

  • APEX @ Oracle


The first of these was a tantalising glimpse of what will be in 3.2 (coming soon) and 4.0 (coming later this year, hopefully). These are described in the APEX Statement of Direction, but we were able to see some of them in a live demo. I am particularly looking forward to 4.0 for its improved tabular forms, and for Dynamic Actions - i.e. a "rich client" experience without having to write any JavaScript.

Dimitri Gielis and John Scott from Apex Evangelists both gave interesting presentations, on Charting in Apex and Dispelling Myths about APEX respectively. Dimitri's piqued my interest to go back and look at charts again, and John's provided a lot of useful ammo for fighting against the usual "Apex is just a power user toy" kind of myths.

There was also a presentation from Matt Nolan and Vincent Migue from e-DBA called "Using Apex to Expose your Business to the Web" - and their website is a fine example of that.

After a short Q&A forum, we all went to the local pub, which was a great opportunity to get to know some of these people a little better.

All in all, a great day out (good grief, how sad do I sound?!) and I'd definitely like to attend the next one, should there be one.

Thursday, November 01, 2007

Apex Impact Analysis script (v2)

After receiving useful comments from Patrick Wolf on my previous post, here is an enhanced version of the script that also checks condition expressions. I'm not including condition_expression2 because I don't think that ever contains anything other than literal values (am I right?)



column obj_name format a50

undef search_text

accept search_text prompt "Enter search text: "

select application_id, page_id, 'Region' obj_type, region_name obj_name, 'Source' usage_type
from apex_application_page_regions
where lower(region_source) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Item' obj_type, item_name obj_name, 'Source' usage_type
from apex_application_page_items
where lower(item_source) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Process' obj_type, process_name obj_name, 'Source' usage_type
from apex_application_page_proc
where lower(process_source) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Branch' obj_type, TO_CHAR(process_sequence) obj_name, 'Source' usage_type
from apex_application_page_branches
where lower(branch_action) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Region' obj_type, region_name obj_name, 'Condition' usage_type
from apex_application_page_regions
where lower(condition_expression1) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Item' obj_type, item_name obj_name, 'Condition' usage_type
from apex_application_page_items
where lower(condition_expression1) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Item' obj_type, item_name obj_name, 'Read Only' usage_type
from apex_application_page_items
where lower(read_only_condition_exp1) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Process' obj_type, process_name obj_name, 'Condition' usage_type
from apex_application_page_proc
where lower(condition_expression1) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Branch' obj_type, TO_CHAR(process_sequence) obj_name, 'Condition' usage_type
from apex_application_page_branches
where lower(condition_expression1) like '%&search_text.%'
ORDER BY 1,2,3,4
/

Wednesday, October 31, 2007

Apex: impact analysis script

Ever wondered where a particular package is being used by your Apex application(s)? This simple SQL*Plus script may be of use:



column obj_name format a50

undef search_text

accept search_text prompt "Enter search text: "

select application_id, page_id, 'Region' objtype, region_name obj_name
from apex_application_page_regions
where lower(region_source) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Item' objtype, item_name obj_name
from apex_application_page_items
where lower(item_source) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Process' objtype, process_name obj_name
from apex_application_page_proc
where lower(process_source) like '%&search_text.%'
UNION ALL
select application_id, page_id, 'Branch' objtype, TO_CHAR(process_sequence) obj_name
from apex_application_page_branches
where lower(branch_action) like '%&search_text.%'
ORDER BY 1,2,3,4
/


For example:

Enter search text: my_package.my_fun

APPLICATION_ID    PAGE_ID OBJTYPE OBJ_NAME
-------------- ---------- ------- ------------------------------------------
           101        123 Process Call my function
           101        127 Branch  30
           102        128 Region  My Report
           102        129 Region  My Query
           103         21 Item    P21_MY_ITEM


Feel free to tell me if (a) I have missed anything, or (b) I have re-invented the wheel and should be using some built-in Apex utility!

Sunday, February 11, 2007

Apex 3.0

I have finally got access to the Apex 3.0 evaluation instance, after some problems reading the email containing my login credentials. It looks good. The "What's New in Apex 3.0" document details lots of interesting new features; however, there are a couple of things on my "wish list" that are not mentioned there but have been added. These are not very exciting, but they assist the developer:
  • Region Static ID: now you can give your regions an identifier of your choice, rather than having to refer to Apex region IDs like R17346384974630405 in your code. Not only can your own identifiers be easier to read and remember, they won't change when you copy the page.
  • #BUTTON_ID#: this new substitution variable means you can reference the button ID in the button template, and so enable manipulation of the button via Javascript.
Up to now I have devised my own work-arounds for these, but these new features will make life simpler in future.

No doubt I'll stumble across other goodies as I spend more time playing with, er I mean "evaluating", 3.0.

Wednesday, January 04, 2006

Why I Love HTMLDB

Over the last 3 years or so I have got to know HTMLDB well, and have become a big fan of it - as a development tool, not as a "power user toy". I have successfully built and deployed a number of HTMLDB applications. I find it appeals to me in a way that .Net and J2EE never have. Why is this? Here are some reasons:

  • all programming is in PL/SQL, my favourite programming language

  • all the "boring bits" like pagination of record sets are built-in

  • integration with the Oracle database is seamless

  • you don't need a big application server to support it

  • there is no question of reinventing the wheel by caching data, locking data, implementing business rules in the "middle tier" (there is NO "middle tier"!)


There are, I must admit, some shortcomings:

  • it was not designed with large applications and large development teams in mind, so code and build management is a challenge (but one that we are learning to overcome)

  • some of the functionality is not yet quite as robust as it could be (hopefully this will improve as the product matures)


However, HTMLDB doesn't seem to be to everyone's taste. Some hardened J2EE fanatics think it is a toy that is not good enough to be used by "programming professionals" - see for example this exchange I had on Oracle WTF.

I guess one of the key comments from the J2EE fan was this: "and as an ex Forms programmer NOTHING is a backward step from oracle forms!"

Now, I will admit that Oracle Forms is now getting rather old and tired - but in its heyday (around 1990-1995) it was a great product. What it did was integrate seamlessly with the database and do all the drudgery of paginating record sets etc. while you got on with implementing the business requirements. In fact, all the reasons I gave above for liking HTMLDB apply equally to Oracle Forms.

But for some years now, Forms has been looking like a "legacy system" - shoe-horned into the ubiquitous browser interface as WebForms but never looking quite at home there. And the alternatives I saw seemed to be a step backwards in terms of data management:

  • ASP/.Net: write lots of VB code and hand-crank things like record pagination

  • J2EE: write lots of Java code (which looks a LOT like C!), build your own classes to wrap around database objects etc. And cope with the "standards" that came and went like pop idols (JSF is it currently?)


Then I came across HTMLDB, promoted by Tom Kyte on his AskTom website. You could try it out for free, without even downloading anything, on Oracle's hosted HTMLDB development website. Of course, as with anything, there was a learning curve to get over, which probably took me a few months (elapsed: I was only playing with HTMLDB in my spare time.) But what was very soon apparent to me was that this could be the successor to Oracle Forms that I had been waiting for.

However, in their wisdom Oracle have decided to position HTMLDB as a "power user toy" to replace Excel and Access for very small databases - which it certainly can do. But this marketing undermines HTMLDB as a tool for "serious" application developers. It has been suggested to me that this may be because HTMLDB didn't come out of the Oracle Tools group, which of course has JDeveloper to promote.

Once I had built some small-scale applications with HTMLDB to replace aging Forms applications used only internally, the people I work for were sufficiently impressed to consider using it for something a lot bigger. I built a prototype which was received favourably, but we knew we ought to at least look at JDeveloper with ADF, since that is the approach Oracle recommend for moving away from Forms. So we got Oracle to send us someone to demonstrate JDeveloper to us. It all looked very slick, knocking up an "emp and dept" application in a few minutes - but nothing that couldn't have been done just as easily with HTMLDB. And when he tried to customise the application at our request, he quickly came unstuck. Soon afterwards, a colleague went to an Oracle "Developer Day" (or some such title) where JDeveloper was demoed again. This time, the compilation process failed with a fatal error, and the demonstrator gave up and moved on to something else. What with these unimpressive demonstrations and the fact that our business logic already exists in the form of PL/SQL packages, it was very clear that HTMLDB was the only realistic option to replace the system within the timescale and budget allowed. (My impression is that JDeveloper might be really good in about 3 years' time!)

But even if we were building a brand new system, my philosophy would be to build the business logic as PL/SQL packages and then build a light UI application on top of that: I really don't buy the whole J2EE way of doing things. It implies that the application is king, rather than the data. As Tom Kyte often says: applications come and go, but the data lasts "for ever".

So, for now at least, I am staking my future on HTMLDB rather than J2EE. Maybe I'll regret that one day, but I doubt it.

Monday, January 02, 2006

PL/SQL Su Doku Solver

Su Doku has taken over the UK, if not the world, in the last year. I am now addicted, but at first I couldn't see the point in these puzzles and preferred to write a program to solve them for me. What was the point in that? Absolutely none! But in case you ever feel the need to get a Su Doku puzzle solved without actually doing it yourself, here is the code - think of it as a late Christmas present!

First, we need to create a table:

create table cells
( x number(1,0) not null
, y number(1,0) not null
, z varchar2(9)
, done varchar2(1) not null
, constraint cells_pk primary key (x, y)
)
/


Then a package:

create or replace package cell_pkg as
  procedure solve(p_state in varchar2);
  procedure print;
end;
/

create or replace package body cell_pkg as

  procedure solve(p_state in varchar2)
  is
    v varchar2(1);
    cnt integer;
    processed integer;
    it_failed exception;
  begin
    -- Set up a clean board
    delete cells;
    for r in 1..9 loop
      for c in 1..9 loop
        insert into cells (x,y,z,done) values (r,c,'123456789','N');
      end loop;
    end loop;
    -- Apply initial state
    for r in 1..9 loop
      for c in 1..9 loop
        v := substr(p_state,(r-1)*9+c,1);
        if v between '1' and '9' then
          update cells
          set    z = v
          where  x = r
          and    y = c;
        end if;
      end loop;
    end loop;
    -- Start processing
    loop
      processed := 0;
      -- Process cells that are solved but not yet marked as "done":
      for rec in (select * from cells where length(z)=1 and done='N')
      loop
        -- Remove that cell's value from the possible values for other cells
        -- in same row, column or square
        update cells
        set    z = replace(z,rec.z)
        where  ( x = rec.x
                 or y = rec.y
                 or ( floor((x-1)/3) = floor((rec.x-1)/3)
                      and floor((y-1)/3) = floor((rec.y-1)/3)
                    )
               )
        and    (x <> rec.x or y <> rec.y); -- Exclude self!
        -- Now mark this cell as done
        update cells
        set    done = 'Y'
        where  x = rec.x and y = rec.y;
        -- Note how many cells processed on this pass
        processed := processed+1;
      end loop;
      -- Look for cells that are not solved, but where they are the only cell
      -- in their row, column or square containing a given value
      for i in 1..9 loop
        for rec in (select * from cells c1
                    where length(z) > 1
                    and   z like '%'||i||'%'
                    and   ( not exists
                            (select null
                             from   cells c2
                             where  c2.x = c1.x -- Same row
                             and    (c1.x <> c2.x or c1.y <> c2.y) -- Exclude self!
                             and    c2.z like '%'||i||'%'
                            )
                         or not exists
                            (select null
                             from   cells c2
                             where  c2.y = c1.y -- Same column
                             and    (c1.x <> c2.x or c1.y <> c2.y) -- Exclude self!
                             and    c2.z like '%'||i||'%'
                            )
                         or not exists
                            (select null
                             from   cells c2
                             where  floor((c2.x-1)/3) = floor((c1.x-1)/3) -- Same
                             and    floor((c2.y-1)/3) = floor((c1.y-1)/3) -- square
                             and    (c1.x <> c2.x or c1.y <> c2.y) -- Exclude self!
                             and    c2.z like '%'||i||'%'
                            )
                          )
                   )
        loop
          update cells
          set    z = i
          where  x = rec.x
          and    y = rec.y;
          -- Note how many cells processed on this pass
          processed := processed+1;
        end loop;
      end loop;
      -- Have we solved it yet?
      select count(*) into cnt from cells where length(z) > 1 and rownum = 1;
      exit when cnt = 0;
      -- No. If we didn't achieve anything on this pass then give up
      if processed = 0 then
        raise it_failed;
      end if;
    end loop;
    print;
    dbms_output.put_line('SUCCESS!!!');
  exception
    when it_failed then
      print;
      dbms_output.put_line('FAILED!!!');
  end;

  procedure print
  is
  begin
    for rec in (select * from cells order by x,y)
    loop
      dbms_output.put(rec.z||' ');
      if rec.y = 9 then
        dbms_output.put_line('');
      end if;
    end loop;
  end;

end;
/

Now, how to use it. Let's take the following Su Doku puzzle as an example:
.6.1.4.5.
..83.56..
2.......1
8..4.7..6
..6...3..
7..9.1..4
5.......2
..72.69..
.4.5.8.7.


We can solve this as follows:
begin
cell_pkg.solve(' 6 1 4 5 83 56 2 18 4 7 6 6 3 7 9 1 45 2 72 69 4 5 8 7 ');
end;
/

(Enter all the grid values reading from left to right, top to bottom, including ALL spaces).
The output will look like this:
9 6 3 1 7 4 2 5 8
1 7 8 3 2 5 6 4 9
2 5 4 6 8 9 7 3 1
8 2 1 4 3 7 5 9 6
4 9 6 8 5 2 3 1 7
7 3 5 9 6 1 8 2 4
5 8 9 7 1 3 4 6 2
3 1 7 2 4 6 9 8 5
6 4 2 5 9 8 1 7 3
SUCCESS!!!


But perhaps you can do better? If you can write a better version, please let me know!

UPDATE 20 April 2006: Bill Magee has picked up the challenge and run with it, and his improved version is here.

Wednesday, August 24, 2005

What does GROUP BY CUBE do?

This is one of those "new" features that has actually been around for a long time, but I had never used it. Then recently I was asked to tune a long-running report, and it occurred to me that it might just be what I needed.

The report consisted of 4 summary queries, each of which presented the same data but summarised by a different "dimension". Imagine it was based on the EMP table; the report would then show:

Employee Summary by Job
-----------------------

Job                                          # Emps  Total Sal
---------------------------------------- ---------- ----------
CLERK                                             4       4150
ANALYST                                           2       6000
MANAGER                                           3       8275
SALESMAN                                          4       5600
PRESIDENT                                         1       5000
                                         ---------- ----------
                                                 14      29025


Employee Summary by Manager
---------------------------

Mgr                                          # Emps  Total Sal
---------------------------------------- ---------- ----------
7566                                              2       6000
7698                                              5       6550
7782                                              1       1300
7788                                              1       1100
7839                                              3       8275
7902                                              1        800
                                         ---------- ----------
                                                 14      29025

Employee Summary by Department
------------------------------

Dept                                         # Emps  Total Sal
---------------------------------------- ---------- ----------
10                                                3       8750
20                                                5      10875
30                                                6       9400
                                         ---------- ----------
                                                 14      29025


That is only 3 queries, but you get the idea.

Each query took 10 minutes to run (reasonable given the complexity of the query and the amount of data), so the whole report took 40 minutes.

My goal was to reduce the time to more like 10 minutes - perhaps by pre-computing the results grouped by all 4 dimensions into a temporary table and then reporting on that. However, it occurred to me that GROUP BY CUBE might be what I needed, so I tried it out. Here is a simple query:

SQL> select job, mgr, deptno, count(*) numemps, sum(sal) totsal
2 from emp
3 group by cube(job, mgr, deptno);

JOB              MGR     DEPTNO    NUMEMPS     TOTSAL
--------- ---------- ---------- ---------- ----------
                                         1       5000
                                        14      29025
                             10          1       5000
                             10          3       8750
                             20          5      10875
                             30          6       9400
                7566                     2       6000
                7566         20          2       6000
                7698                     5       6550
                7698         30          5       6550
                7782                     1       1300
                7782         10          1       1300
                7788                     1       1100
                7788         20          1       1100
                7839                     3       8275
                7839         10          1       2450
                7839         20          1       2975
                7839         30          1       2850
                7902                     1        800
                7902         20          1        800
CLERK                                    4       4150
CLERK                        10          1       1300
CLERK                        20          2       1900
CLERK                        30          1        950
CLERK           7698                     1        950
CLERK           7698         30          1        950
CLERK           7782                     1       1300
CLERK           7782         10          1       1300
CLERK           7788                     1       1100
CLERK           7788         20          1       1100
CLERK           7902                     1        800
CLERK           7902         20          1        800
ANALYST                                  2       6000
ANALYST                      20          2       6000
ANALYST         7566                     2       6000
ANALYST         7566         20          2       6000
MANAGER                                  3       8275
MANAGER                      10          1       2450
MANAGER                      20          1       2975
MANAGER                      30          1       2850
MANAGER         7839                     3       8275
MANAGER         7839         10          1       2450
MANAGER         7839         20          1       2975
MANAGER         7839         30          1       2850
SALESMAN                                 4       5600
SALESMAN                     30          4       5600
SALESMAN        7698                     4       5600
SALESMAN        7698         30          4       5600
PRESIDENT                                1       5000
PRESIDENT                                1       5000
PRESIDENT                    10          1       5000
PRESIDENT                    10          1       5000

52 rows selected.


Buried in amongst all those rows is all the data I need. There is a GROUPING function that can be used to see which grouping a row belongs to: GROUPING(col) returns 1 when the row aggregates over col (so CUBE has left col null), and 0 when the row is actually grouped by col:

SQL> select job, mgr, deptno, count(*) numemps, sum(sal) totsal
2 , grouping(job) gj, grouping(mgr) gm, grouping(deptno) gd
3 from emp
4 group by cube(job, mgr, deptno);

JOB              MGR     DEPTNO    NUMEMPS     TOTSAL         GJ         GM         GD
--------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
                                         1       5000          1          0          1
                                        14      29025          1          1          1
                             10          1       5000          1          0          0
                             10          3       8750          1          1          0
                             20          5      10875          1          1          0
                             30          6       9400          1          1          0
                7566                     2       6000          1          0          1
                7566         20          2       6000          1          0          0
                7698                     5       6550          1          0          1
                7698         30          5       6550          1          0          0
                7782                     1       1300          1          0          1
                7782         10          1       1300          1          0          0
                7788                     1       1100          1          0          1
                7788         20          1       1100          1          0          0
                7839                     3       8275          1          0          1
                7839         10          1       2450          1          0          0
                7839         20          1       2975          1          0          0
                7839         30          1       2850          1          0          0
                7902                     1        800          1          0          1
                7902         20          1        800          1          0          0
CLERK                                    4       4150          0          1          1
CLERK                        10          1       1300          0          1          0
CLERK                        20          2       1900          0          1          0
CLERK                        30          1        950          0          1          0
CLERK           7698                     1        950          0          0          1
CLERK           7698         30          1        950          0          0          0
CLERK           7782                     1       1300          0          0          1
CLERK           7782         10          1       1300          0          0          0
CLERK           7788                     1       1100          0          0          1
CLERK           7788         20          1       1100          0          0          0
CLERK           7902                     1        800          0          0          1
CLERK           7902         20          1        800          0          0          0
ANALYST                                  2       6000          0          1          1
ANALYST                      20          2       6000          0          1          0
ANALYST         7566                     2       6000          0          0          1
ANALYST         7566         20          2       6000          0          0          0
MANAGER                                  3       8275          0          1          1
MANAGER                      10          1       2450          0          1          0
MANAGER                      20          1       2975          0          1          0
MANAGER                      30          1       2850          0          1          0
MANAGER         7839                     3       8275          0          0          1
MANAGER         7839         10          1       2450          0          0          0
MANAGER         7839         20          1       2975          0          0          0
MANAGER         7839         30          1       2850          0          0          0
SALESMAN                                 4       5600          0          1          1
SALESMAN                     30          4       5600          0          1          0
SALESMAN        7698                     4       5600          0          0          1
SALESMAN        7698         30          4       5600          0          0          0
PRESIDENT                                1       5000          0          0          1
PRESIDENT                                1       5000          0          1          1
PRESIDENT                    10          1       5000          0          0          0
PRESIDENT                    10          1       5000          0          1          0


Based on that, I can filter out the rows I'm not interested in and format the results like this:

SQL> select case when gj=0 and gm=1 and gd=1 then 'Job'
2 when gj=1 and gm=0 and gd=1 then 'Manager'
3 when gj=1 and gm=1 and gd=0 then 'Dept'
4 end as keytype
5 , case when gj=0 and gm=1 and gd=1 then job
6 when gj=1 and gm=0 and gd=1 then to_char(mgr)
7 when gj=1 and gm=1 and gd=0 then to_char(deptno)
8 end as keyval
9 , numemps, totsal
10 from
11 (
12 select job, mgr, deptno, count(*) numemps, sum(sal) totsal
13 , grouping(job) gj, grouping(mgr) gm, grouping(deptno) gd
14 from emp
15 group by cube(job, mgr, deptno)
16 )
17 where (gj=0 and gm=1 and gd=1)
18 or (gj=1 and gm=0 and gd=1)
19 or (gj=1 and gm=1 and gd=0);

KEYTYPE KEYVAL                                      NUMEMPS     TOTSAL
------- ---------------------------------------- ---------- ----------
Manager 7566                                              2       6000
        7698                                              5       6550
        7782                                              1       1300
        7788                                              1       1100
        7839                                              3       8275
        7902                                              1        800
                                                          1       5000
*******                                          ---------- ----------
sum                                                      14      29025
Job     ANALYST                                           2       6000
        CLERK                                             4       4150
        MANAGER                                           3       8275
        PRESIDENT                                         1       5000
        SALESMAN                                          4       5600
*******                                          ---------- ----------
sum                                                      14      29025
Dept    10                                                3       8750
        20                                                5      10875
        30                                                6       9400
*******                                          ---------- ----------
sum                                                      14      29025


... which gives me the breakdown I want in a single query, and provides the performance boost I was hoping for!
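As an aside, Oracle also offers GROUP BY GROUPING SETS, which computes only the groupings you list rather than every CUBE combination, so the WHERE filter on the GROUPING flags becomes unnecessary. A sketch of the equivalent aggregation (I have not benchmarked it against the CUBE version):

select job, mgr, deptno, count(*) numemps, sum(sal) totsal
, grouping(job) gj, grouping(mgr) gm, grouping(deptno) gd
from emp
group by grouping sets ((job), (mgr), (deptno));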

Wednesday, April 20, 2005

Business Logic APIs

Database application development generally has two major components:

1) The user interface
2) The database

These have to talk to each other somehow, typically via an "Application Programming Interface" or API. What should such an API look like logically?

Let's take as an example a screen that allows the user to transfer funds from one account to another. It looks something like this:

Source Account: ..........
Target Account: ..........
Amount: ..........

[OK] [Cancel]

The database has the following tables:

CREATE TABLE accounts
( account_id INTEGER PRIMARY KEY
, account_name VARCHAR2(30)
, current_balance NUMBER CHECK (current_balance >= 0)
);

CREATE TABLE transactions
( transaction_id INTEGER PRIMARY KEY
, transaction_date DATE NOT NULL
, source_account_id INTEGER NOT NULL REFERENCES accounts
, target_account_id INTEGER NOT NULL REFERENCES accounts
, amount NUMBER CHECK (amount >= 0)
);

What needs to be done:

Once the user has entered the 2 account numbers and the amount, we must:

  1. Update the source account and decrease its balance by the amount (subject to the check constraint that says the balance cannot become negative)

  2. Update the target account and increase its balance by the amount

  3. Create a transactions record for the transfer

How should we build the code?

The wrong approach

This is the approach that I so often see used:

1) Create a "table API" for table ACCOUNTS with a procedure:
PROCEDURE update_account_balance
( p_account_id IN INTEGER
, p_amount IN NUMBER
);

2) Create a “table API” for table TRANSACTIONS with a procedure:
PROCEDURE create_transaction
( p_source_account_id IN INTEGER
, p_target_account_id IN INTEGER
, p_amount IN NUMBER
);

3) In the application, invoke the following code when OK is pressed:
  • accounts_api.update_account_balance (:source_account_id, :amount * -1);
  • accounts_api.update_account_balance (:target_account_id, :amount);
  • transactions_api.create_transaction (:source_account_id, :target_account_id, :amount);
It works, but it is poor design. The client application is made responsible for carrying out the transaction properly: for example, if there are insufficient funds in the source account then the transfer must be aborted.

What is wrong with that?

  • Suppose there are other variants of the transfer process, implemented in different client screens. The business logic must be duplicated between the various screens. This leads to increased maintenance, increased testing, increased chance of something going wrong.

  • Suppose this application is a Windows client/server app, but now we want to add an alternative web-based interface. Yet again, we will be duplicating the business logic, with the attendant costs and risks.

The right approach

What is required is a Business Logic API, in which each complete transaction is represented by a single procedure like this:
PROCEDURE transfer_funds
( p_source_account_id IN INTEGER
, p_target_account_id IN INTEGER
, p_amount IN NUMBER
);

The body of that procedure performs all 3 steps of the process, and handles any errors. It either succeeds in all 3 steps, or leaves the database unchanged.
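For illustration, here is a minimal sketch of how that body might look - assuming a sequence called transactions_seq for generating transaction IDs, which is not part of the tables shown above:

CREATE OR REPLACE PACKAGE BODY accounts_api AS
  PROCEDURE transfer_funds
  ( p_source_account_id IN INTEGER
  , p_target_account_id IN INTEGER
  , p_amount IN NUMBER
  )
  IS
  BEGIN
    -- Decrease the source balance; if it would go negative, the check
    -- constraint on current_balance raises an error
    UPDATE accounts
    SET current_balance = current_balance - p_amount
    WHERE account_id = p_source_account_id;
    -- Increase the target balance
    UPDATE accounts
    SET current_balance = current_balance + p_amount
    WHERE account_id = p_target_account_id;
    -- Record the transfer (transactions_seq is an assumed sequence)
    INSERT INTO transactions
    ( transaction_id, transaction_date
    , source_account_id, target_account_id, amount )
    VALUES
    ( transactions_seq.NEXTVAL, SYSDATE
    , p_source_account_id, p_target_account_id, p_amount );
  END transfer_funds;
END accounts_api;
/

Note there is no COMMIT inside the procedure - transaction control stays with the caller - and if any of the three statements fails, the unhandled error means Oracle rolls back everything the call did.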

In the application, the code behind the OK button is:

accounts_api.Transfer_funds (:source_account_id, :target_account_id, :amount);

It is not possible for the application to corrupt the database, because it doesn’t contain any business logic – it is just an interface to the business logic that is in the API.

This has many advantages:
  • The business logic resides in exactly one place, the API – so only needs to be maintained once, tested once

  • The application is simpler – less maintenance, less testing

  • The application is lower risk

  • Adding an alternative interface is easier and low-risk, because we will only be creating a user interface, not new (duplicated) business logic

Conclusion

Have I built a straw man here? Isn’t the above obvious? You would think so, but as I already said I actually see code developed the wrong way all the time. I’m not making it up! In my experience it is usually because the developer (who may be fairly inexperienced in software design principles) is left to his or her own devices to build the application and the API at the same time, without any direction from a senior designer. Since he is building all the code, and only building one client module, what does he care whether the logic is in the client application or in the API?

It is only later, when changes are made to the rules, or a new interface is wanted, that the whole sorry mess becomes apparent.

Tuesday, November 23, 2004

Vadim Tropashko on Complex Constraints

I am flattered to find that my first article here (Enforcing Complex Constraints in Oracle) is credited in the Acknowledgements of this interesting article by Vadim Tropashko on DBAZine:

http://www.dbazine.com/tropashko8.shtml

As well as approaching the subject from first principles, the article also suggests further applications of such constraints, e.g. "foreign keys" to more than one parent table, and enforcing uniqueness of column values across more than one table.

Thursday, October 21, 2004

OTLT and EAV: the two big design mistakes all beginners make

Many people (myself included) start off as programmers, and only later start to work with databases. After a while, the developer notices two things about databases:

  1. Tables have a fixed set of columns; adding a column to a table involves changing any application code that accesses it.
  2. There are lots of “lookup” tables that typically have a code and a description.

Being a natural problem solver, and knowing that code re-use is a “good thing”, the developer thinks he can do better. Quite often, there is no one more experienced around to advise against, and so the developer implements his ideas; this is so common that both ideas have a name: the One True Lookup Table (OTLT) and the Entity-Attribute-Value (EAV) model.

One True Lookup Table (OTLT)

The idea: instead of having lots of “lookup” tables like these:

create table order_status (status_code varchar2(10), status_desc varchar2(40) );

create table country (country_code varchar2(3), country_name varchar2(30) );

create table priority (priority_no number(1), priority_desc varchar2(40) );

… why not just have ONE lookup table like this?:

create table lookup (lookup_type varchar2(10), lookup_code varchar2(20), lookup_desc varchar2(100) );

Great! Now we only need one “maintain lookup” screen instead of 3 (or 33 or whatever).

The trouble is, the developer doesn’t consider the disadvantages of this. Most importantly, we can no longer use foreign key constraints to protect the integrity of the data – unless one of the following conditions is true:

  • lookup_code is unique within table lookup
  • every child table uses 2 columns referencing (lookup_type,lookup_code) for its foreign keys

In most cases, neither of the above applies, and responsibility for data integrity resides solely in the application code. Show me such a database, and I’ll show you some data where the code does not match anything in the lookup table.
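For completeness, the second option can be made to work, at the cost of an extra column in every child table. A sketch - the orders table and the 'ORD_STATUS' lookup type are invented for illustration:

create table lookup
( lookup_type varchar2(10)
, lookup_code varchar2(20)
, lookup_desc varchar2(100)
, constraint lookup_pk primary key (lookup_type, lookup_code)
);

create table orders
( order_id integer primary key
, status_type varchar2(10) default 'ORD_STATUS' not null check (status_type = 'ORD_STATUS')
, status_code varchar2(20) not null
, constraint orders_status_fk foreign key (status_type, status_code) references lookup (lookup_type, lookup_code)
);

Even then, every child table carries a column whose only job is to point the foreign key at the right slice of the lookup table - which rather gives the game away.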

Entity-Attribute-Value (EAV)

The idea: instead of “hard-coding” columns into the table like this:

create table emp (empno integer primary key, ename varchar2(20), sal number, job varchar2(10));

insert into emp (empno, ename, sal, job) values (1234,'ANDREWS',1000,'CLERK');

… why not have total flexibility like this:

create table emp (empno integer primary key );

create table emp_values (empno integer references emp, code varchar2(20), value varchar2(100));

insert into emp (empno) values (1234);

insert into emp_values (empno, code, value) values (1234,'NAME','ANDREWS');

insert into emp_values (empno, code, value) values (1234,'SAL','1000');

insert into emp_values (empno, code, value) values (1234,'JOB','CLERK');

Great! Now we are free to invent new “attributes” at any time, without having to alter the table or the application!

However, consider a simple query: “show the names of all employees who are clerks and earn less than 2000”.

With the standard emp table:

select ename from emp where job='CLERK' and sal < 2000;

With the EAV tables:

select ev1.value as ename
from emp_values ev1, emp_values ev2, emp_values ev3
where ev1.code = 'NAME'
and ev1.empno = ev2.empno
and ev2.code = 'JOB'
and ev2.value = 'CLERK'
and ev1.empno = ev3.empno
and ev3.code = 'SAL'
and TO_NUMBER(ev3.value) < 2000;

Not only is that much harder to follow, it is likely to be much slower to process too. And this is about the simplest of queries!

Conclusion

OTLT and EAV are “generic” approaches that are seductive to programmers, but are actually a bad idea for most databases. Resist the temptation!


Friday, October 15, 2004

"Pivot" Queries

One of the frequently asked questions on SQL forums is how to present data “horizontally” rather than “vertically”, known as a “pivot query”. For example, instead of this:

DEPTNO      JOB         HEADCOUNT
----------  ---------  ----------
10          CLERK               1
10          MANAGER             1
20          ANALYST             2
20          CLERK               2
20          MANAGER             1
30          CLERK               1
30          MANAGER             1
30          SALESMAN            4

… people would like to know how to produce this:

DEPTNO         ANALYST       CLERK     MANAGER    SALESMAN
----------  ----------  ----------  ----------  ----------
10                               1           1
20                   2           2           1
30                               1           1           4


In Oracle, the normal way is to write a query using DECODE like this:

select deptno
, count(DECODE(job,'ANALYST', 1)) as "ANALYST"
, count(DECODE(job,'CLERK', 1)) as "CLERK"
, count(DECODE(job,'MANAGER', 1)) as "MANAGER"
, count(DECODE(job,'SALESMAN', 1)) as "SALESMAN"
from emp
group by deptno
order by deptno;

The DECODE ensures that the jobs go in the right columns.
Now this type of query is quite easy to write (when you know how), but is also quite laborious; also, if the list of jobs changes then the query needs to be changed too.
To make producing such queries a “piece of cake” I have built a package called pivot that can be used to generate the SQL like this:
begin
pivot.print_pivot_query
( 'deptno'
, 'job'
, 'emp'
, '1'
, agg_types => 'count'
);
end;
/
select deptno
, count(DECODE(job,'ANALYST', 1)) as "ANALYST"
, count(DECODE(job,'CLERK', 1)) as "CLERK"
, count(DECODE(job,'MANAGER', 1)) as "MANAGER"
, count(DECODE(job,'SALESMAN', 1)) as "SALESMAN"
from emp
group by deptno
order by deptno
PL/SQL procedure successfully completed.


The package code follows. Note that it requires the “parse” package that I posted earlier.

CREATE OR REPLACE PACKAGE pivot IS
  TYPE ref_cursor IS REF CURSOR;
  PROCEDURE print_pivot_query
  ( group_cols   IN VARCHAR2              -- Comma-separated list of column(s), including table alias if applicable,
                                          -- to be used to group the records
  , pivot_col    IN VARCHAR2              -- The name of the column to be pivoted, including table alias if applicable
  , tables       IN VARCHAR2              -- Comma-separated list of table(s), with table alias if applicable
  , value_cols   IN VARCHAR2              -- Comma-separated list of column(s), including table alias if applicable
                                          -- for which aggregates are to be shown at each intersection
  , pivot_values IN VARCHAR2 DEFAULT NULL -- Comma-separated list of values of pivot column to be used;
                                          -- if omitted, all values are used (determined dynamically)
  , agg_types    IN VARCHAR2 DEFAULT NULL -- Comma-separated list of aggregate types, corresponding to value_cols;
                                          -- if omitted, SUM is the default
  , where_clause IN VARCHAR2 DEFAULT NULL -- where clause of query
  , order_by     IN VARCHAR2 DEFAULT NULL -- order by clause; if omitted, output is ordered by group_cols
  );
  FUNCTION pivot_query
  ( group_cols IN VARCHAR2
  , pivot_col IN VARCHAR2
  , tables IN VARCHAR2
  , value_cols IN VARCHAR2
  , pivot_values IN VARCHAR2 DEFAULT NULL
  , agg_types IN VARCHAR2 DEFAULT NULL
  , where_clause IN VARCHAR2 DEFAULT NULL
  , order_by IN VARCHAR2 DEFAULT NULL
  ) RETURN VARCHAR2;
  FUNCTION pivot_cursor
  ( group_cols IN VARCHAR2
  , pivot_col IN VARCHAR2
  , tables IN VARCHAR2
  , value_cols IN VARCHAR2
  , pivot_values IN VARCHAR2 DEFAULT NULL
  , agg_types IN VARCHAR2 DEFAULT NULL
  , where_clause IN VARCHAR2 DEFAULT NULL
  , order_by IN VARCHAR2 DEFAULT NULL
  ) RETURN ref_cursor;
END;
/
CREATE OR REPLACE PACKAGE BODY pivot IS
  g_mode varchar2(10);
  g_sql varchar2(32767);
  PROCEDURE pr
  ( p_text in varchar2
  )
  IS
    v_text VARCHAR2(32767) := p_text;
  BEGIN
    if g_mode = 'PRINT' then
      WHILE LENGTH(v_text) > 255 LOOP
        DBMS_OUTPUT.PUT_LINE( SUBSTR(v_text,1,255) );
        v_text := SUBSTR(v_text,256);
      END LOOP;
      DBMS_OUTPUT.PUT_LINE( v_text );
    else
      g_sql := g_sql || ' ' || p_text;
    end if;
  END pr;
  /*
  || Generates the SQL statement for a pivot query based on parameters
  || Example:
  || create_pivot_query
  || ( group_cols => 'd.dname'
  || , pivot_col => 'e.job'
  || , tables => 'emp e, dept d'
  || , value_cols => 'e.sal,e.age'
  || , agg_types => 'min,max'
  || , where_clause => 'e.deptno = d.deptno'
  || );
  || Generates a query like:
  || select d.dname
  || , min(DECODE(e.job,'ANALYST', e.sal, 0 )) as "min_ANALYST_esal"
  || , max(DECODE(e.job,'ANALYST', e.age, 0 )) as "max_ANALYST_eage"
  || , min(DECODE(e.job,'CLERK', e.sal, 0 )) as "min_CLERK_esal"
  || , max(DECODE(e.job,'CLERK', e.age, 0 )) as "max_CLERK_eage"
  || , min(DECODE(e.job,'MANAGER', e.sal, 0 )) as "min_MANAGER_esal"
  || , max(DECODE(e.job,'MANAGER', e.age, 0 )) as "max_MANAGER_eage"
  || , min(DECODE(e.job,'PRESIDENT', e.sal, 0 )) as "min_PRESIDENT_esal"
  || , max(DECODE(e.job,'PRESIDENT', e.age, 0 )) as "max_PRESIDENT_eage"
  || , min(DECODE(e.job,'SALESMAN', e.sal, 0 )) as "min_SALESMAN_esal"
  || , max(DECODE(e.job,'SALESMAN', e.age, 0 )) as "max_SALESMAN_eage"
  || from emp e, dept d
  || where e.deptno = d.deptno
  || group by d.dname
  || order by d.dname
  ||
  || i.e. the parameters are used like this:
  || select <group_cols>
  || , <agg_type>(DECODE(<pivot_col>, <pivot_value>, <value_col>, 0 ))
  || from <tables>
  || where <where_clause>
  || group by <group_cols>
  || order by <order_by>
  ||
  */
  PROCEDURE define_pivot_query
  ( group_cols IN VARCHAR2
  , pivot_col IN VARCHAR2
  , tables IN VARCHAR2
  , value_cols IN VARCHAR2
  , pivot_values IN VARCHAR2 DEFAULT NULL
  , agg_types IN VARCHAR2 DEFAULT NULL
  , where_clause IN VARCHAR2 DEFAULT NULL
  , order_by IN VARCHAR2 DEFAULT NULL
  )
  IS
    type ref_cursor is ref cursor;
    rc ref_cursor;
    pv_tab parse.varchar2_table;
    val_tab parse.varchar2_table;
    agg_tab parse.varchar2_table;
    num_pvs integer := 0;
    num_vals integer := 0;
    num_aggs integer := 0;
    alias varchar2(100);
  BEGIN
    g_sql := NULL;
    -- Determine pivot values: use list if given, otherwise construct query to find them
    if pivot_values is not null then
      parse.delimstring_to_table( pivot_values, pv_tab, num_pvs );
    else
      open rc for 'select distinct ' || pivot_col || ' from ' || tables || ' where ' || nvl(where_clause,'1=1');
      loop
        num_pvs := num_pvs+1;
        fetch rc into pv_tab(num_pvs);
        exit when rc%notfound;
      end loop;
      close rc;
      num_pvs := num_pvs-1;
    end if;
    parse.delimstring_to_table( value_cols, val_tab, num_vals );
    -- Determine aggregate functions (default is SUM)
    if agg_types is not null then
      parse.delimstring_to_table( agg_types, agg_tab, num_aggs );
    end if;
    if num_aggs <> num_vals then
      for i in num_aggs+1..num_vals loop
        agg_tab(i) := 'sum';
      end loop;
    end if;
    pr('select '||group_cols);
    for pv in 1..num_pvs loop
      pv_tab(pv) := trim(pv_tab(pv));
      for val in 1..num_vals loop
        val_tab(val) := trim(val_tab(val));
        if num_vals = 1 then
          alias := substr(pv_tab(pv),1,30);
        else
          alias := substr(agg_tab(val) || '_' || TRANSLATE(pv_tab(pv),'x. -()<>','x') || '_' || TRANSLATE(val_tab(val),'x. -()<>','x'),1,30);
        end if;
        pr( ', '||agg_tab(val)||'(DECODE(' || pivot_col || ',''' || pv_tab(pv) || ''', ' || val_tab(val) || '))'
            || ' as "' || alias || '"' );
      end loop;
    end loop;
    pr('from ' || tables );
    if where_clause is not null then
      pr('where ' || where_clause);
    end if;
    pr('group by ' || group_cols);
    pr('order by ' || NVL(order_by,group_cols));
  END define_pivot_query;
  PROCEDURE print_pivot_query
  ( group_cols IN VARCHAR2
  , pivot_col IN VARCHAR2
  , tables IN VARCHAR2
  , value_cols IN VARCHAR2
  , pivot_values IN VARCHAR2 DEFAULT NULL
  , agg_types IN VARCHAR2 DEFAULT NULL
  , where_clause IN VARCHAR2 DEFAULT NULL
  , order_by IN VARCHAR2 DEFAULT NULL
  )
  IS
  BEGIN
    g_mode := 'PRINT';
    define_pivot_query
    ( group_cols
    , pivot_col
    , tables
    , value_cols
    , pivot_values
    , agg_types
    , where_clause
    , order_by
    );
  END;
  FUNCTION pivot_query
  ( group_cols IN VARCHAR2
  , pivot_col IN VARCHAR2
  , tables IN VARCHAR2
  , value_cols IN VARCHAR2
  , pivot_values IN VARCHAR2 DEFAULT NULL
  , agg_types IN VARCHAR2 DEFAULT NULL
  , where_clause IN VARCHAR2 DEFAULT NULL
  , order_by IN VARCHAR2 DEFAULT NULL
  ) RETURN VARCHAR2
  IS
  BEGIN
    g_mode := 'TEXT';
    define_pivot_query
    ( group_cols
    , pivot_col
    , tables
    , value_cols
    , pivot_values
    , agg_types
    , where_clause
    , order_by
    );
    RETURN g_sql;
  END;
  FUNCTION pivot_cursor
  ( group_cols IN VARCHAR2
  , pivot_col IN VARCHAR2
  , tables IN VARCHAR2
  , value_cols IN VARCHAR2
  , pivot_values IN VARCHAR2 DEFAULT NULL
  , agg_types IN VARCHAR2 DEFAULT NULL
  , where_clause IN VARCHAR2 DEFAULT NULL
  , order_by IN VARCHAR2 DEFAULT NULL
  ) RETURN ref_cursor
  IS
    rc ref_cursor;
  BEGIN
    g_mode := 'CURSOR';
    define_pivot_query
    ( group_cols
    , pivot_col
    , tables
    , value_cols
    , pivot_values
    , agg_types
    , where_clause
    , order_by
    );
    OPEN rc FOR g_sql;
    RETURN rc;
  END;
END;

/
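Incidentally, the pivot_cursor function can be used directly from SQL*Plus via a ref cursor bind variable - a quick sketch using the same emp example as above:

variable rc refcursor

begin
  :rc := pivot.pivot_cursor
         ( group_cols => 'deptno'
         , pivot_col  => 'job'
         , tables     => 'emp'
         , value_cols => '1'
         , agg_types  => 'count'
         );
end;
/

print rc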

Parsing delimited fields in a character string

Often we have a need to parse a character string to get data from fields within it. Of course, SQL*Loader handles this nicely, but sometimes we may be getting the data via a different route such as from a table or via UTL_FILE.

The following package facilitates this:

CREATE OR REPLACE PACKAGE parse AS
  /*
  || Package of utility procedures for parsing delimited or fixed position strings into tables
  || of individual values, and vice versa.
  */
  TYPE varchar2_table IS TABLE OF VARCHAR2(32767) INDEX BY BINARY_INTEGER;
  PROCEDURE delimstring_to_table
    ( p_delimstring IN VARCHAR2
    , p_table OUT varchar2_table
    , p_nfields OUT INTEGER
    , p_delim IN VARCHAR2 DEFAULT ','
    );
  PROCEDURE table_to_delimstring
    ( p_table IN varchar2_table
    , p_delimstring OUT VARCHAR2
    , p_delim IN VARCHAR2 DEFAULT ','
    );
END parse;
/
CREATE OR REPLACE PACKAGE BODY parse AS
  PROCEDURE delimstring_to_table
    ( p_delimstring IN VARCHAR2
    , p_table OUT varchar2_table
    , p_nfields OUT INTEGER
    , p_delim IN VARCHAR2 DEFAULT ','
    )
  IS
    v_string VARCHAR2(32767) := p_delimstring;
    v_nfields PLS_INTEGER := 1;
    v_table varchar2_table;
    v_delimpos PLS_INTEGER := INSTR(p_delimstring, p_delim);
    v_delimlen PLS_INTEGER := LENGTH(p_delim);
  BEGIN
    WHILE v_delimpos > 0
    LOOP
      v_table(v_nfields) := SUBSTR(v_string,1,v_delimpos-1);
      v_string := SUBSTR(v_string,v_delimpos+v_delimlen);
      v_nfields := v_nfields+1;
      v_delimpos := INSTR(v_string, p_delim);
    END LOOP;
    v_table(v_nfields) := v_string;
    p_table := v_table;
    p_nfields := v_nfields;
  END delimstring_to_table;
  PROCEDURE table_to_delimstring
    ( p_table IN varchar2_table
    , p_delimstring OUT VARCHAR2
    , p_delim IN VARCHAR2 DEFAULT ','
    )
  IS
    v_nfields PLS_INTEGER := p_table.COUNT;
    v_string VARCHAR2(32767);
  BEGIN
    FOR i IN 1..v_nfields
    LOOP
      v_string := v_string || p_table(i);
      IF i != v_nfields THEN
        v_string := v_string || p_delim;
      END IF;
    END LOOP;
    p_delimstring := v_string;
  END table_to_delimstring;
END parse;
/

This is how you might use it with a standard comma-delimited string:

SQL> declare
  2    v_tab parse.varchar2_table;
  3    v_nfields integer;
  4    v_string varchar2(1000) := '1000,Smith,John,13-May-1970';
  5  begin
  6    parse.delimstring_to_table (v_string, v_tab, v_nfields);
  7    for i in 1..v_nfields loop
  8      dbms_output.put_line('Field('||i||') = '||v_tab(i));
  9    end loop;
 10  end;
11 /
Field(1) = 1000
Field(2) = Smith
Field(3) = John
Field(4) = 13-May-1970

PL/SQL procedure successfully completed.

Enforcing complex constraints in Oracle

Oracle supports various kinds of declarative integrity constraints:
  • Primary Key: Uniquely identifies a row in the table
  • Unique: Other columns that must be unique
  • Foreign Key: Column value must match value in another table
  • Check: Simple single-table, single-row data rules.
Examples of possible check constraints are:
  • start_date <= end_date
  • check_ind in ('Y','N')
  • amount between 0 and 99999.99

However, many more complex business rules cannot be implemented via the above constraints. For example:

  • Employee Project Assignment start date and end date must fall between Project start date and end date
  • An employee may not have 2 Employee Project Assignments that overlap date-wise
  • A Project cannot have more than 10 Employee Assignments
  • An Employee cannot book time to a Task on a Project to which he is not assigned
  • The manager of a Department must belong to the Department.
These are usually enforced (if at all) procedurally, by one of the following methods:
  • Code in the application screens
  • Code in stored procedures (APIs) called from the application screens
  • Database triggers
These all have disadvantages compared to the declarative approach of constraints:
  • Code in application screens can be bypassed, and also the same rule must often be implemented in many places (e.g. in both the Project screen and the Assignment screen).
  • Code in stored procedures must also often be implemented in many places (e.g. in both the Project package and the Assignment package). Also, to prevent bypassing of the rules, all updates must be done via the package, which limits the functionality available to the user (cannot write ad hoc updates)
  • Code in triggers must also often be implemented in many places (e.g. in both the Project triggers and the Assignment triggers). Also, triggers can often become complex due to work-arounds to avoid mutating tables issues etc.
  • Any procedural solution must explicitly lock records to prevent corruption in a multi-user environment – e.g. if a user is amending an Employee Assignment, then no other user may be allowed to amend Employee Assignments for the same employee, or to amend the Project dates.
This paper shows how such complex constraints can be implemented declaratively, such that the rules are defined once and then applied to all updates from whatever source.

This began with an on-line discussion I had with Tom Kyte about enforcing complex constraints here: http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:21389386132607

Objective

Ideally, given a complex rule such as one of the examples above, we would like to be able to create a complex check constraint (or “assertion” in ANSI SQL terms) such as:

CONSTRAINT c ON TABLE project p
CHECK (NOT EXISTS (SELECT null FROM emp_proj_assign ep
WHERE ep.projno = p.projno
AND (ep.start_date < p.start_date OR ep.end_date > p.end_date)));

Such a constraint (if it could be implemented) would raise an error in any of the following situations:
  • User attempts to insert an emp_proj_assign record with dates outside the Project dates
  • User attempts to update an emp_proj_assign record with dates outside the Project dates
  • User attempts to update a Project record with dates that do not encompass all the associated emp_proj_assign dates
However, we can’t achieve that via constraints alone.

Solution: use Materialized Views

Materialized Views (a.k.a. Snapshots) are actually tables that are maintained by Oracle, whose contents correspond to the result of a query – i.e. like a view, but “materialized” because the result is actually stored in a table.

These provide the mechanism we need to implement the complex constraint as follows:
  • Create a materialized view to select data that violates the desired constraint (e.g. assignments where the dates are outside the associated project dates). The MV must be defined with REFRESH COMPLETE ON COMMIT so that it is updated before the end of the transaction.
  • Create a check constraint on the materialized view that always evaluates to FALSE – e.g. CHECK (1=0)
That’s it. Whenever the underlying tables are updated, the materialized view is refreshed. If the update violates the rule, then a row will be inserted into the materialized view; but the check constraint on the MV disallows any inserts into it, and so the transaction fails.

Issues
  • Oracle 8i cannot support REFRESH ON COMMIT on materialized views of the complexity required for some rules. 9i can handle some but not all (it does not allow self-joins in MVs). 10g can handle self-joins, but does not seem to allow subqueries. So this approach cannot be used for all rules.
  • Efficiency: needs to be benchmarked. Is a FAST refresh preferable to a COMPLETE refresh? With COMPLETE we have a query that looks at all rows of the MV’s base tables at the end of every transaction that affects those tables.
  • Cannot create a REFRESH ON COMMIT materialized view with a HAVING clause. For such cases (see example 3 below), the materialized view cannot include the constraint violation (e.g. HAVING COUNT(*) > 10), and so the check constraint must do it (e.g. CHECK (cnt <= 10)). Note that in this case the materialized view will consume space in the database.
Worked Examples

Based on the following tables:

create table project
( projno int primary key
, start_date date not null
, end_date date not null
);
create table emp_proj_assign
( empno int not null
, projno int not null
, start_date date not null
, end_date date not null
, primary key (empno, start_date)
);

1) Rule: An employee cannot have overlapping project assignments.

This is implemented as follows:

create materialized view emp_proj_mv1
refresh complete on commit as
select 1 dummy
from emp_proj_assign ep1, emp_proj_assign ep2
where ep1.empno = ep2.empno
and ep1.start_date <> ep2.start_date -- exclude the row matching itself (PK is empno, start_date)
and ep1.start_date <= ep2.end_date
and ep1.end_date >= ep2.start_date;


alter table emp_proj_mv1
add constraint emp_proj_mv1_chk
check (1=0) deferrable;
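
To see it in action, insert two overlapping assignments for the same employee and commit - a sketch with invented sample data:

insert into project values (1, date '2004-01-01', date '2004-12-31');
insert into emp_proj_assign values (100, 1, date '2004-02-01', date '2004-06-30');
insert into emp_proj_assign values (100, 1, date '2004-06-01', date '2004-09-30');
commit; -- the on-commit refresh tries to insert a row into emp_proj_mv1,
        -- so the commit fails with a violation of emp_proj_mv1_chk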

2) Rule: An employee's project assignment dates must fall between the project start and end dates.

create materialized view emp_proj_mv2
refresh complete on commit as
select 1 dummy
from emp_proj_assign ep, project p
where ep.projno = p.projno
and (ep.start_date < p.start_date or ep.end_date > p.end_date);

alter table emp_proj_mv2
add constraint emp_proj_mv2_chk
check (1=0) deferrable;

3) A project may not have more than 10 assignments (in total, ever!):

create materialized view emp_proj_mv3
build immediate
refresh complete on commit as
select projno, count(*) cnt
from emp_proj_assign
group by projno;
alter table emp_proj_mv3
add constraint emp_proj_mv3_chk
check (cnt <= 10)
deferrable;

4) A project cannot have more than 10 employees assigned at the same time.

(I have not yet worked this one out!)