Optimizer Statistics part #2

See Statistics gathering part #1 here

The more I think about optimizer statistics, the better I imagine their critical role to select the best execution plan for an SQL query. Statistics provide details about the database its object used by the optimizer, according to documentation. Categories include table statistics such as numbers of rows, index statistics such as B-tree levels, system statistics such as CPU and I/O performance, and column statistics such as number of nulls.

Without them, the optimizer will assume uniform data value distribution and will estimate row source size with no correlation to data, stored in a database. Hence keeping optimizer statistics up-to-date guarantees that they accurately represent the data that’s stored in the database

Statistics on tables and indexes are maintaned with DBMS_STATS package subprograms. In some circumstances, such as for cross-checking, better not to publish statistics immediately. If that’s the case, before statistics gathering it requre to postpone its publishing:

EXEC DBMS_STATS.set_table_prefs('HR', 'EMPLOYEES', 'PUBLISH', 'false');

Optionally need to delete statistics:

EXEC DBMS_STATS.DELETE_INDEX_STATS(OWNNAME=>'HR', INDNAME=>'EMP_BIG_HIRE_DATE_IX');

Or to import statistics:

EXEC DBMS_STATS.IMPORT_TABLE_STATS (ownname => 'HR', tabname => 'EMPLOYEES',stattab => 'STATTAB',statown =>'HR', cascade => TRUE);

Or to gather new statistics:

EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname =>'HR', indname =>'EMP_BIG_HIRE_DATE_IX', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, degree => DBMS_STATS.DEFAULT_DEGREE);

Of course, it possible to run an SQL statement against pending statistics by enable it:

ALTER SESSION SET OPTIMIZER_USE_PENDING_STATISTICS=TRUE;

To publish all pending statistics for the particular table:

EXEC DBMS_STATS.publish_pending_stats('HR','EMPLOYEES');