Tony Andrews on Oracle and Databases: What does GROUP BY CUBE do?

This is one of those "new" features that has actually been around for a long time, but I had never used it. Then recently I was asked to tune a long-running report, and it occured to me that it might just be what I needed.

The report consisted of 4 summary queries, each of which presented the same data but summarised by a different "dimension". Imagine it was based on the EMP table, then the report would show:

Employee Summary by Job
-----------------------

Job                                          # Emps  Total Sal
---------------------------------------- ---------- ----------
CLERK                                             4       4150
ANALYST                                           2       6000
MANAGER                                           3       8275
SALESMAN                                          4       5600
PRESIDENT                                         1       5000
                                         ---------- ----------
                                                 14      29025
                                                 

Employee Summary by Manager
---------------------------

Mgr                                          # Emps  Total Sal
---------------------------------------- ---------- ----------
7566                                              2       6000
7698                                              5       6550
7782                                              1       1300
7788                                              1       1100
7839                                              3       8275
7902                                              1        800
                                         ---------- ----------
                                                 14      29025

Employee Summary by Department
------------------------------

Dept                                         # Emps  Total Sal
---------------------------------------- ---------- ----------
10                                                3       8750
20                                                5      10875
30                                                6       9400
                                         ---------- ----------
                                                 14      29025

That is only 3 queries, but you get the idea.

Each query took 10 minutes to run (reasonable given the complexity of the query and the amount of data), so the whole report took 40 minutes.

My goal was to reduce the time to more like 10 minutes - perhaps by pre-computing the results grouped by all 4 dimensions into a temporary table and then reporting on that. However, it occured to me that GROUP BY CUBE might be what I needed, so I tried it out. Here is a simple query:

SQL> select job, mgr, deptno, count(*) numemps, sum(sal) tots
  2  from emp
  3  group by cube(job, mgr, deptno);

JOB              MGR     DEPTNO    NUMEMPS     TOTSAL
--------- ---------- ---------- ---------- ----------
                                         1       5000
                                        14      29025
                             10          1       5000
                             10          3       8750
                             20          5      10875
                             30          6       9400
                7566                     2       6000
                7566         20          2       6000
                7698                     5       6550
                7698         30          5       6550
                7782                     1       1300
                7782         10          1       1300
                7788                     1       1100
                7788         20          1       1100
                7839                     3       8275
                7839         10          1       2450
                7839         20          1       2975
                7839         30          1       2850
                7902                     1        800
                7902         20          1        800
CLERK                                    4       4150
CLERK                        10          1       1300
CLERK                        20          2       1900
CLERK                        30          1        950
CLERK           7698                     1        950
CLERK           7698         30          1        950
CLERK           7782                     1       1300
CLERK           7782         10          1       1300
CLERK           7788                     1       1100
CLERK           7788         20          1       1100
CLERK           7902                     1        800
CLERK           7902         20          1        800
ANALYST                                  2       6000
ANALYST                      20          2       6000
ANALYST         7566                     2       6000
ANALYST         7566         20          2       6000
MANAGER                                  3       8275
MANAGER                      10          1       2450
MANAGER                      20          1       2975
MANAGER                      30          1       2850
MANAGER         7839                     3       8275
MANAGER         7839         10          1       2450
MANAGER         7839         20          1       2975
MANAGER         7839         30          1       2850
SALESMAN                                 4       5600
SALESMAN                     30          4       5600
SALESMAN        7698                     4       5600
SALESMAN        7698         30          4       5600
PRESIDENT                                1       5000
PRESIDENT                    10          1       5000
PRESIDENT                    10          1       5000

52 rows selected.

Buried in amongst all those rows is all the data I need. There is a GROUPING function that can be used to see which group a row belongs to:

SQL> select job, mgr, deptno, count(*) numemps, sum(sal) totsal
  2  ,      grouping(job) gj, grouping(mgr) gm, grouping(deptno) gd
  3  from emp
  4  group by cube(job, mgr, deptno);

JOB              MGR     DEPTNO    NUMEMPS     TOTSAL         GJ         GM         GD
--------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
                                         1       5000          1          0          1
                                        14      29025          1          1          1
                             10          1       5000          1          0          0
                             10          3       8750          1          1          0
                             20          5      10875          1          1          0
                             30          6       9400          1          1          0
                7566                     2       6000          1          0          1
                7566         20          2       6000          1          0          0
                7698                     5       6550          1          0          1
                7698         30          5       6550          1          0          0
                7782                     1       1300          1          0          1
                7782         10          1       1300          1          0          0
                7788                     1       1100          1          0          1
                7788         20          1       1100          1          0          0
                7839                     3       8275          1          0          1
                7839         10          1       2450          1          0          0
                7839         20          1       2975          1          0          0
                7839         30          1       2850          1          0          0
                7902                     1        800          1          0          1
                7902         20          1        800          1          0          0
CLERK                                    4       4150          0          1          1
CLERK                        10          1       1300          0          1          0
CLERK                        20          2       1900          0          1          0
CLERK                        30          1        950          0          1          0
CLERK           7698                     1        950          0          0          1
CLERK           7698         30          1        950          0          0          0
CLERK           7782                     1       1300          0          0          1
CLERK           7782         10          1       1300          0          0          0
CLERK           7788                     1       1100          0          0          1
CLERK           7788         20          1       1100          0          0          0
CLERK           7902                     1        800          0          0          1
CLERK           7902         20          1        800          0          0          0
ANALYST                                  2       6000          0          1          1
ANALYST                      20          2       6000          0          1          0
ANALYST         7566                     2       6000          0          0          1
ANALYST         7566         20          2       6000          0          0          0
MANAGER                                  3       8275          0          1          1
MANAGER                      10          1       2450          0          1          0
MANAGER                      20          1       2975          0          1          0
MANAGER                      30          1       2850          0          1          0
MANAGER         7839                     3       8275          0          0          1
MANAGER         7839         10          1       2450          0          0          0
MANAGER         7839         20          1       2975          0          0          0
MANAGER         7839         30          1       2850          0          0          0
SALESMAN                                 4       5600          0          1          1
SALESMAN                     30          4       5600          0          1          0
SALESMAN        7698                     4       5600          0          0          1
SALESMAN        7698         30          4       5600          0          0          0
PRESIDENT                                1       5000          0          0          1
PRESIDENT                                1       5000          0          1          1
PRESIDENT                    10          1       5000          0          0          0
PRESIDENT                    10          1       5000          0          1          0

Based on that, I can filter out the rows I'm not interested in and format the results like this:

SQL> select case when gj=0 and gm=1 and gd=1 then 'Job'
  2              when gj=1 and gm=0 and gd=1 then 'Manager'
  3              when gj=1 and gm=1 and gd=0 then 'Dept'
  4              end as keytype
  5  ,      case when gj=0 and gm=1 and gd=1 then job
  6              when gj=1 and gm=0 and gd=1 then to_char(mgr)
  7              when gj=1 and gm=1 and gd=0 then to_char(deptno)
  8              end as keyval
  9  ,      numemps, totsal
 10  from
 11  (
 12  select job, mgr, deptno, count(*) numemps, sum(sal) totsal
 13  ,      grouping(job) gj, grouping(mgr) gm, grouping(deptno) gd
 14  from emp
 15  group by cube(job, mgr, deptno)
 16  )
 17  where (gj=0 and gm=1 and gd=1)
 18  or    (gj=1 and gm=0 and gd=1)
 19  or    (gj=1 and gm=1 and gd=0);

KEYTYPE KEYVAL                                      NUMEMPS     TOTSAL
------- ---------------------------------------- ---------- ----------
Manager 7566                                              2       6000
        7698                                              5       6550
        7782                                              1       1300
        7788                                              1       1100
        7839                                              3       8275
        7902                                              1        800
                                                          1       5000
*******                                          ---------- ----------
sum                                                      14      29025
Job     ANALYST                                           2       6000
        CLERK                                             4       4150
        MANAGER                                           3       8275
        PRESIDENT                                         1       5000
        SALESMAN                                          4       5600
*******                                          ---------- ----------
sum                                                      14      29025
Dept    10                                                3       8750
        20                                                5      10875
        30                                                6       9400
*******                                          ---------- ----------
sum                                                      14      29025

... which gives me the breakdown I want in a single query, and provides the performance boost I was hoping for!

4 comments:

Anonymous said...: nice!

the combined sql is ugly, but hey, the performance improvement is more important, isn't it

but what happens if the application adds another job type or dept? have you created a maintenance nightmare?; Sunday, September 11, 2005
Tony Andrews said...: > but what happens if the application adds another job type or dept? have you created a maintenance nightmare?

No, nothing in the SQL depends on the particular jobs or depts in the data. The only need for maintenance would be if a completely new "dimension" was required, such as "grade" or "project".; Sunday, September 11, 2005
Tony Andrews said...: Actually, that final SQL can be simplified a little to:

SQL> select case when gj=0 then 'Job'
2 when gm=0 then 'Manager'
3 when gd=0 then 'Dept'
4 end as keytype
5 , case when gj=0 then job
6 when gm=0 then to_char(mgr)
7 when gd=0 then to_char(deptno)
8 end as keyval
9 , numemps, totsal
10 from
11 (
12 select job, mgr, deptno, count(*) numemps, sum(sal) totsal
13 , grouping(job) gj, grouping(mgr) gm, grouping(deptno) gd
14 from emp
15 group by cube(job, mgr, deptno)
16 )
17 where gj+gm+gd = 2; Sunday, September 11, 2005
Rob van Wijk said...: Instead of aggregating and later on filtering out some of the aggregates, you can use grouping sets to exactly specify which groups you want. Even more readable, in my opinion.

SQL> select case
2 when grouping(job)=0 then 'Job'
3 when grouping(mgr)=0 then 'Manager'
4 when grouping(deptno)=0 then 'Dept'
5 end as keytype
6 , case
7 when grouping(job)=0 then job
8 when grouping(mgr)=0 then to_char(mgr)
9 when grouping(deptno)=0 then to_char(deptno)
10 end as keyval
11 , count(*) numemps
12 , sum(sal) totsal
13 from emp
14 group by grouping sets (job,mgr,deptno); Monday, September 17, 2007

Tony Andrews on Oracle and Databases

Wednesday, August 24, 2005

What does GROUP BY CUBE do?

4 comments:

Total Pageviews

About Me