Indexing NULL column values


After reading some great articles from Burleson Consulting about the “is null/is not null” predicates I decided to make some additional tests. It’s a very common mistake to write SQL with this kind of predicates since the NULL values aren’t included in regular indexes (B*Tree, not function-based), so almost every “IS NULL” results in a TABLE FULL SCAN.

I want to test how much a index can improve a IS [NOT] NULL predicate measuring the #requests per minute that a legacy box can serve. In the old days this server was a mission-critical server, but nowadays is retired as test server and can be compared to a very low end box (P3-512M ram).
Well, I measure the time that the select statement takes to execute (using JMeter)  instead of cost because a 30% reduction cost does not mean a 30% time reduction, and usually the end-user speak in terms of time, and there are many “tricks” like moving a table to the KEEP_POOL that can speed-up SQL execution although the cost is the same.

For this test I’ve setup a test table with “create table as select * from dba_objects” and tested the following scenarios:

  • A: select count(*) from test where owner is null
  • B: select count(*) from test where owner is not null

With these indexes:

  1. no index at all
  2. index on owner column (b*tree)
  3. function based index on nvl2(owner,0,1) and rewriting the sql above as “… where NVL2(owner,0,1)=?”

The combinations led to 6 tests. In fact, I repeated every test 3 times in batches of 2k, 20k, and 50k exeutions with 1 concurrent user, but I’ll show only the 50k results. Before starting a test, the database is restarted to avoid caching effects.

The results are that NVL2 indexing wins hands down, about 5x times faster that normal indexing:

Both cases 1a and 1b have this execution plan (varying the #rows in the inner step)

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   161   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST | 50875 |   298K|   161   (2)| 00:00:02 |
---------------------------------------------------------------------------

With a normal index on OWNER column, the IS NULL (2a) has the same plan as 1a/1b

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   161   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     1 |     6 |   161   (2)| 00:00:02 |
---------------------------------------------------------------------------

But the IS NOT NULL (2b) query, is improved because Oracle can use the index to read all keys directly, instead of the table.

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |     6 |    29   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |     6 |            |          |
|*  2 |   INDEX FAST FULL SCAN| TEST_INDEX1 | 50875 |   298K|    29   (4)| 00:00:01 |
-------------------------------------------------------------------------------------

In the 3a/3b query, both is null/is not null are solved throught a index fast full scan. But comparing to 2a/2b, the #bytes read in the step 3 is 100 times lower than 2b:

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |     6 |    25  (12)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |     6 |            |          |
|*  2 |   INDEX FAST FULL SCAN| TEST_INDEX1 |   509 |  3054 |    25  (12)| 00:00:01 |
-------------------------------------------------------------------------------------

Detecting full table scans due to this kind of predicates seems to be a great opportunity to reduce the IO and increase the capacity of the database.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: