When you execute a query in an Oracle database, Oracle has to decide how to retrieve the data. That’s where the query optimizer steps in. The query optimizer decides which indexes to use, whether to perform a full table scan, and so on, based on the tables, columns, calculations and joins in the query. Oracle can do this quite efficiently with the cost-based optimizer, but only if accurate statistics are available for it to use.
A short history
Originally the query optimizer worked on a static set of around 20 rules. These rules would be applied regardless of the size and type of data in a table.
Oracle 7 introduced the cost-based optimizer which can make more intelligent optimization decisions. By analyzing pre-gathered statistics on database objects the cost-based optimizer estimates the “cost” of processing several possible execution plans. The cost-based optimizer then chooses the cheapest execution plan and the database executes the plan.
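To see which plan the optimizer settles on and the cost it estimates, one option is EXPLAIN PLAN together with DBMS_XPLAN (available from 9i Release 2). The following is only a sketch: it assumes a PLAN_TABLE exists, it borrows the ENROLLMENT_DATA table name from the examples further down, and the COURSE_NUMBER column and its value are hypothetical.

```sql
-- Ask the optimizer for its chosen plan without actually running the query.
-- COURSE_NUMBER is a hypothetical column used only for illustration.
EXPLAIN PLAN FOR
  SELECT *
    FROM enrollment_data
   WHERE course_number = 'MATH101';

-- Show the plan, including the estimated cost of each step.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running this before and after gathering statistics is a simple way to see whether fresh statistics change the plan.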
In Oracle Database 7 through 9i either the cost-based or rule-based optimizer could be used. The rule-based optimizer is no longer supported as of Oracle 10g.
How can I tell which optimizer mode I am using?
If you are running Database 10g or later, you are using the cost-based optimizer.
If you are running Database 7 through 9i you should check the optimizer_mode parameter:
SQL> show parameter optimizer_mode
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_mode                       string      CHOOSE
RULE means your database is using the old rule-based optimizer. The good news is you do not have to gather statistics on your data. The bad news is your queries probably aren’t running as well as they could.
CHOOSE was introduced as a stop-gap between rule- and cost-based optimizers. If there are no statistics available, the rule-based optimizer will be used; however if statistics are available the query optimizer will default to cost-based mode.
ALL_ROWS will force your database to use the cost-based optimizer regardless of statistics, so you’d better gather them.
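Since the behaviour of CHOOSE and ALL_ROWS depends on whether statistics exist, it can help to check what has already been gathered. A minimal sketch using the USER_TABLES dictionary view for the current schema; a NULL LAST_ANALYZED means the table has never been analyzed.

```sql
-- List tables in the current schema, never-analyzed tables first.
SELECT table_name,
       num_rows,        -- row count recorded at the last analysis
       last_analyzed    -- NULL if no statistics have been gathered
  FROM user_tables
 ORDER BY last_analyzed NULLS FIRST;
```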
How can I gather statistics for my index, table, schema, database, etc.?
The DBMS_STATS package is used to gather statistics for the cost-based optimizer. Historically, the ANALYZE command performed similar operations.
Here are a few popular examples. Of course you should always consult the documentation for your Oracle distribution before using a new command.
These can be run through SQL*Plus, but you will probably want to automate them for more active databases. There are more options than I have chosen to show here, but these should be a good start.
EXECUTE DBMS_STATS.GATHER_DATABASE_STATS
This will analyze statistics for your entire database. It is likely to take quite a while (hours) and generally should not be necessary, but if you want to analyze the whole database this will do it with one command.
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'JEMMONS')
This will analyze statistics for everything owned by the user ‘JEMMONS’. It is important to put the username in single quotes and all capital letters. To analyze the current user’s schema instead, pass NULL as ownname.
EXECUTE DBMS_STATS.GATHER_TABLE_STATS(ownname => 'JEMMONS', tabname => 'ENROLLMENT_DATA')
This will gather statistics for a specific table and all its indexes. This may be a good idea on tables which change drastically on a regular basis.
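Whether index statistics are gathered along with the table depends on the cascade parameter, whose default differs between Oracle versions, so it may be worth stating it explicitly. A sketch of the same call written as a PL/SQL block, assuming a version that provides DBMS_STATS.AUTO_SAMPLE_SIZE (9i or later):

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'JEMMONS',
    tabname          => 'ENROLLMENT_DATA',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,  -- let Oracle choose the sample size
    cascade          => TRUE                          -- also gather statistics on the table's indexes
  );
END;
/
```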
EXECUTE DBMS_STATS.GATHER_INDEX_STATS(ownname => 'JEMMONS', indname => 'ENROLLMENT_CRSE_NUMB')
This will gather statistics on a specific index. If there is a need to drop and rebuild an index you could use this to re-analyze the index after rebuild.
The GATHER AUTO option can and should be added to the commands above after initial analysis. This will cause only objects with missing stats or more than 10% changed since last analysis (via insert, update or delete) to be analyzed. The resulting command should look something like the following:
EXECUTE DBMS_STATS.GATHER_DATABASE_STATS(options => 'GATHER AUTO')
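To get a rough idea of which tables are approaching that 10% threshold, the change counts Oracle tracks can be compared with the recorded row counts. This is a sketch only: it assumes table monitoring is enabled, and the counts in USER_TAB_MODIFICATIONS can lag behind recent activity until Oracle flushes its monitoring information.

```sql
-- Approximate percentage of rows changed since the last analysis,
-- for non-partitioned tables in the current schema.
SELECT m.table_name,
       t.num_rows,
       m.inserts + m.updates + m.deletes AS changes,
       ROUND(100 * (m.inserts + m.updates + m.deletes)
             / NULLIF(t.num_rows, 0), 1) AS pct_changed  -- NULL when num_rows is 0 or unknown
  FROM user_tab_modifications m
  JOIN user_tables t
    ON t.table_name = m.table_name
 WHERE m.partition_name IS NULL
 ORDER BY pct_changed DESC NULLS LAST;
```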
How often should I gather statistics?
This is a question I cannot answer. I have schemas which do not change often that I may analyze once a month, others that I will gather new statistics on once a day. The GATHER AUTO option should be used to automatically gather missing and stale statistics rather than re-analyzing everything; however, sometimes there may be an advantage to re-analyzing an entire schema or database.
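For schemas that need regular attention, one way to automate the GATHER AUTO run on 10g or later is a DBMS_SCHEDULER job. The sketch below assumes the CREATE JOB privilege; the job name and the 2 a.m. schedule are hypothetical choices, not recommendations.

```sql
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'NIGHTLY_STATS_JEMMONS',  -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(ownname => ''JEMMONS'', options => ''GATHER AUTO''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0; BYSECOND=0',  -- every day at 02:00
    enabled         => TRUE
  );
END;
/
```

On 9i and earlier, DBMS_JOB or an operating-system scheduler calling SQL*Plus would fill the same role.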