Tuesday, September 25, 2007

Clean your data

Whenever I am helping people create tables, I always try to get the data as clean as possible before it is inserted. There all several different degrees of cleaning, but what I look for is putting things in a usable format. For example if you find yourself always having to do a TRIM or an UPPER function on field in your where statement then that data probably should have been cleaned before it got inserted. A simple function on their indexed column makes that index useless.

Something like this would never use a standard index.

Select * from participant
where upper(name) like 'SMITH, JOHN%'

If I can't track down all the code doing the data modificiations on the table then I will use table triggers. This will trim or same case the data for the table. It is much better to not use triggers if you can get away with it. While triggers will get the job done, people often forget about them and they become the magic behind the scenes. This can lead to unintended conconsequences too. Like slowing down any bulk operations you do against that table.

However, sometimes you don't have the ability to change the table data. If this is the case then I will consider building a functioned based index. Since Oracle 8i, Oracle has had the ability to create an index based on a function.

I'll create a simple test
create table
foobar
as select *
from all_objects;

create index foobar_idx1 on foobar (object_name);

BEGIN
dbms_stats.gather_table_stats(ownname=>'XXXXXX',
tabname=>'FOOBAR',estimate_percent=>100,cascade=>TRUE);
END;


Here is my normal query
select * from foobar where object_name like 'BLAH%'

0 SELECT STATEMENT 2 188
1 TABLE ACCESS BY INDEX ROWID FOOBAR 2 188 *
2 INDEX RANGE SCAN FOOBAR_IDX1 2

But lets say, I have to upper object_name because it has mixed case in it

select * from foobar where upper(object_name) like 'BLAH%'

Now I am back to a full table scan. Ouch.

0 SELECT STATEMENT 2589 237K 166 (3) *
1 TABLE ACCESS FULL FOOBAR 2589 237K 166 (3)

Now I will create a function based index.

create index foobar_idx2 on foobar (upper(object_name));

BEGIN
dbms_stats.gather_table_stats(ownname=>'XXXXXX',
tabname=>'FOOBAR',estimate_percent=>100,cascade=>TRUE);
END

0 SELECT STATEMENT 2 196
1 TABLE ACCESS BY INDEX ROWID FOOBAR 2 196 *
2 INDEX RANGE SCAN FOOBAR_IDX2 2

Now its using an index. Things should be much faster now.

Friday, September 21, 2007

Thanks Tom

Tom Kyte explained a problem on his blog that had been bugging me all summer long. There was this little gremlin in an application that was popping up every few weeks. I traced the problem down to a single query that seemed for whatever reason to loose its mind. I had figured out that the problems always happened after stats had been run on the tables. In fact, it had been discovered that doing stats again usually fixed the situation. There seemed to be something fishy with stats, but I knew things didn't work that way. However, I never made the next leap that Tom did. The problem as it turns out was caused by over binding. All the constants in the where clause were variables. The developers were using them to document their code. The table in question was very simple. Consisting of just two columns. An id and a status. The status field was highly skewed. 99% of the table had a Processed status and 1% had an Unprocessed status. There was even a histogram so Oracle would know that the field was skewed.

The query that was misbehaving looked like this

select *
from sometable
where status = c_unprocessed;

What I didn't realize is that there was another process that was slightly different.

select *
from sometable
where status = c_processed;

Now with the histogram, Oracle would peek at the variable value and know whether or not it should use an index. The plan would get stored to the SQL cache. It probably stayed there for weeks since the processes ran every couple of hours. The act of doing stats caused the plan to drop out of the SQL cache so that a new plan could be done. The problem is that only 1 plan is stored for both of those SQL statements. Whichever one runs first after the stats get done is the plan that makes it to the SQL cache. Bingo, that explained the crazy behavior.
The fix was very simple. Just remove those variables from the code.

select *
from sometable
where status = 'U';

select *
from
sometablewhere status = 'P';

Wednesday, September 19, 2007

Functions

Over the years I have come across lots of developers. When they discover they can use functions in SQL statements they end up using them every where. Coming from that background myself I can see how they can be very seductive. The idea was always to try to reuse as much code as possible. You wanted to make things as modular as possible. Functions allowed you do that. Using functions you could effectively outer join a table or tables to your main query. They seem great by allowing you to cut your query into chunks and even reuse the pieces. Of course, when you see how a function laden query performs you start to see the problem. Developers never believe me when I say they need to remove some or all the functions in the query. I am not one that says you can never use you a function. You just need to be prepared for how much performance functions can suck up. The bigger your resultset, the worse your performance.

Here is a very simple example.

function fn_1 (p1 in number) return number is
v_cnt number;
begin
select count(distinct col3) into v_cnt
from sometable2
where col1 = p1;
return v_cnt;
end;

Select col1, fn_1(col1) fn1
from sometable1 a
where col2 = 'blah';

I had actually come across one query that had 8 functions in it. The developer wanted me to do my magic and make the database go faster. Of course, its not the database having issues its the SQL statement.

A simple solution to my example above is something like this.

Select col1, count(distinct b.col3) cnt
from sometable1 a, sometable2 b
where a.col1 = b.col1 (+)
and a.col2 = 'blah'
group by a.col1;

A single query like this allows the database to evaluate the full task and do that SQL operation. When you have something like the original query, the database executes the query then loops through and executes the function for each row. Looking at the explain plan for the original query you would never see the work that is being done in the function. The function's work would be hidden. You would have to pull out the SQL and run explain plans on each query inside the function to know what its doing. Every step you do is one more step that slows down the query. Developers love using functions and Oracle has responded by adding a special cache in 11g that will help speed up function laden queries. Of course even with a cache you will never go faster than the just doing things in one query. The cache would help functions that get called with the same input often. So if your function took the 2 character state code and returned the full state name, its possible that most of those calls would be cached after a while. However something like taking in an account number and returning a balance would probably never be in the cache.

The main thing know about functions that reference tables being used in SQL is to try to not use them if at all possible, functions that are passed parameters and do logic with them can make the main query simpler. Ask "Is there a table in my function?" Yes - bad. No - good. Sometimes it makes sense to use them and its nice to have the flexibility to do that. I would say using them should be the exception for your code and not the rule.

Monday, September 17, 2007

First Post

I am new to the blog thing, so I'll see how all this goes. First the background. I am an Oracle DBA. I work very closely with the developers to make sure that their SQL and table designs are good. I have done this job for 5 years or so. Before that I was a developer coding against Oracle databases. Back further I did Sybase. Even further back I did IMS and Cobol.

The purpose of this blog is to share some of my knowledge that I have gained in working with Oracle to a wider audience. I am the first to admit that I am no guru. In fact, it seems I am always learning something new about the database. I find myself giving database lessons to developers on a one on one basis. I have no issue with teaching. Only that 1 or 2 people get the benefit of the lesson. So this little blog will allow others to benefit from these lessons.