SQL directives beat SPM (or not?)

Last edited 1-Sept-2016

For quite a time I have been a happy user of SPM(SQL Plan Management) to resolve SQL performance issues by using a lazy man’s approach and Oracle capabilities:

in the latest versions Oracle will generate many SQL plans in it’s search of a better SQL plan. All you have to do is to select the one which makes sense and load it into SPM.

Until one day a valid and enabled SPM would stop to enforce the chosen plan. dbms_xplan.display_cursor would  show “Sql Plan Directive used for this statement”

Deleting the Sql Plan Directive on the objects concerned has solved the issue.

Apparently, SQL Directives take precedence over SPM.

Here are some answers I’ve got from Oracle:

https://blogs.oracle.com/optimizer/entry/how_to_use_sql_plan

BEGIN EDIT 1-Sept-2016:

Here are quite interesting thoughts of Jonathan Lewis https://jonathanlewis.wordpress.com/2016/08/02/adaptive-mayhem/#comments

Analyzing 10053 does support his statement “Anything short of a completely specificied plan could be modified by dynamic sampling, so I think an outline would be much more likely to be affected by an SQL directive than an SQL Plan Baseline since outlines tend to be less well specified”.

10053 Trace logs SPD part at the very end of the trace file, meaning SPD takes precedence over – over what ? That’s a good question.  SPD must be SQL Plan Directives. Looking into the SPD part of the trace one finds hinted SQL:

SPD: BEGIN context at statement level
…
******* UNPARSED QUERY IS *******
SELECT /*+ LEADING (“T4” “T1” “T2” “T3”) FULL (“T4”) USE_HASH (“T3”) FULL (“T3”) USE_HASH (“T2”) FULL (“T2”) USE_HASH (“T1”) FULL (“T1”) */ …
…
Permutations for Starting Table :0
Join order[1]: T4[T4]#0 T1[T1]#1 T2[T2]#2 T3[T3]#3

Those are SQL Plan Baseline hints. Still, SPD acronym looks suspicious. At the very least, this part of CBO probably considers both SQL Directives and SQL Plan Baselines.

END EDIT 1-Sept-2016

All that said, one thing is clear. IT guy must always ask one simple question: what is the precedence?

In Oracle CBO (Cots Based Optimizer) we have many factors:

Statistics (don’t forget System Statistics)

Dynamic Statistics

Hints

Outlines

SQL Profiles

SQL Patches

SPM

SQL Directives

For quite a time the precedence of factors influencing CBO was rather self-evident and had no need to be documented explicitly. Not any longer.

 

 

 

 

 

 

Posted in Uncategorized | Leave a comment

Oracle database on Amazon RDS

Adding more on 8-Sept-2016:

Adding more on 2-Sept-2016:

Time to throw a few notes about Oracle database on Amazon RDS.

  • It works
  • Issues getting access to data files: file system access is generally forbidden on RDS for reasons understandable. However, imagine you ran into  RDS limits. For example, you need a one-off patch (which RDS would not allow you to install). You need to escape RDS quickly into a normal Oracle database. You can exp/edpdp/imp/impdp/GoldenGate/Amazon Migration Service/SQL interface. But SQL interface is sloooow. Here you are – caught by RDS for good.
    NOTE: colleague of mine claims  Amazon already has native MS SQL Server backup into S3. So far it seems to be a full db backup with no control to have incremental or transaction log backups. However, if one day Amazon implements Oracle native RMAN backups into S3 then a rolling restore becomes an option – that would open an RDS esacape window.
    NOTE2: Amazon seems to be supporting MySQL and Maria RDS binary log based replication into external database:  http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Exporting.NonRDSRepl.html
    I can confirm that Aurora RDS (Aurora is about MySQL) successfully replicates into another Aurora RDS in another region. This Pythian link was instrumental to get it working, however – Amazon document has a nasty bug:
    https://www.pythian.com/blog/creating-external-slave-live-aws-aurora-instance/
  • Once RDS starts doing something (backuping/creating/patching/restoring/etc) you have very little idea how long it is going to take. Example, restoring Aurora database from a snapshot – you get status “creating”, you get button logs, but there are no logs yet.
  • Access to Oracle redo log files. Take a look here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.Oracle.html
    Amazon document writers do not even get that Oracle redo logs are light yeas away from Oracle trace and audit logs.
    Log miner: can do it. Log retention can be set too: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.CommonDBATasks.html#Appendix.Oracle.CommonDBATasks.RetainRedoLogs
  • Character sets. Amazon does support a number of them but a few older ones are not included.
  • Issues setting hidden parameters (I failed) – note, some hidden parameters have been exposed by Amazon (i.e. can be changed).
  • Can’t change  some parameters like service name. I failed to add service name. Want to rename database – it’s copy&paste (i.e. back database and restore as a different thing – do this on a few terabytes to get taste of it). Want to add service name for application purposes (resource management, end-to-end user tracing) – no no.
  • Tablespace growth control. So far I recommend using bigfile tablespaces. Then you can execute “alter tablespace … autoextend on/off maxsize …” If you use small file tablespace then “alter database” raises privileges error immediately. Btw, Amazon default Oracle RDS is keen to let you grow (and pay) – it has the maximum limit on bigfile tablespace, SYSTEM tablespace including. Let it be with payments but letting database grow out of control will end badly for your database (and you personally).
  • One-off patches are no no (as was mentioned already). I do not know a good RDS escape strategy. RDS is a trap. I personally would like to hear if Amazon allowed a single RDS customer to get hands on RDS datafiles and migrate for good whererver he or she wished – that would be nice of Amazon.
  • Database Flashback – I failed to get it running. I assume it can’t.
  • Some performance questions linger regarding High Availability and Disaster Recovery implementation – the Multi Availability Zones option. It’s an open secret what technology Amazon is using (not a Data Guard, which is stated openly). Then it is File/Storage block replication vs redo log replication. I do assume that redo log [vectors] replication does require way less bandwidth and has a better latency.
  • MultiAZ RDS failover happens within minutes (less than 5 minutes).
  • RMAN backups. What RMAN backups ? RMAN restore of a particular failed block, data file, tablespace, ctonrol file, log file – no no. Amazon RDS restore granularity is the whole database or nothing. How long will the restore take: very little idea. However, you can read logs.
  • Amazon RDS point in time restore is of 5 minutes granularity, mine and other people tests seem to confirm this number
  • Can you freeze RDS database backup so that it does not expire at the end of backup retention period ? Seems like not: you can freeze database snapshot, true. But I did not find how to freeze backups of database archive logs. Here is what Amazon says:   http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
    “You can restore to any point in time during your backup retention period”
    If this is truly so then it is a serious limitation for some testing scenarious.
  • Take a look into v$datafile. I get it that Amazon attaches a single EBS volume for your datafiles. A single EBS volume has it’s limits of IOPS. On Amazon you combine many EBS volumes to get IOPS you need. Throughput (bandwidth in Amazon terms as EBS is attached to network interfaces) is another concern – it’s limited to Amazon instance network card, naturally. Forget about gigabytes per second. About 700MB/sec EBS throughput is maximum what I was told regarding normal Oracle installation on one of high end Amazon instances. RDS maximum is even less.
  • Upgrades. I have upgraded 12.1.0.1 SE1 into 12.1.0.2 SE2. It took about an hour on 2 core (not damned vCPU but CORE!) server.
    Amazon did a few backups underway – even incremental backups do take time. TBD: perhaps those backups involve only archive log backups?
    The fact you have no control over backups kicking in is unnerving.
    One note: When doing upgrade on a rather small instance db.m3.large the upgrade took a staggering 4 and a half hours. Note, that was a skeleton test database. “one off pacthes and performance improvements” phase kept improving the performance for exactly 3.5 hours. Perhaps it was caused by instance having just one core.
  • Now, the paranoid part. If anything happens with the database then you have no access to the RDS host operating system. You are at Amazon mercy.

Ok, enough for the Bad and Ugly. What about the Good. Provisioning is reasonably fast. Not light fast but still fast. Backups and restores do work. Failover does work. Upgrade – the one I have done (see above) – was successful. Some basic monitoring is included. Overall, Oracle RDS is a very reasonable offer for application that can get away with what was said above. 24×7 aspect is the most troublesome thing for me: as I has already said, if anything non-trivial happens with your database then RDS is a black box.

Posted in Uncategorized | Leave a comment

Recreate/migrate user with password: CREATE USER “USER1” IDENTIFIED BY VALUES ‘S:…’

Just execute


select sys.dbms_metadata.get_ddl('USER', 'USER1') from dual;

to get the new fancier “identified by values” syntax – remember, Oracle has changed and made stronger database user passwords. The old “identified by values” won’t work anymore.

 

 

 

Posted in Uncategorized | Leave a comment

log writer latency

if lgwr latency is more important than throughput then as of 11.2.0.4(and 3) consider the following:

– apparently log_buffer size governs group commit batch size. A bigger log buffer means bigger write batch size which means longer session waits on commit. See lgwr warnings of over 500ms log writes under heavier throughput: the bigger log buffer size – the bigger warning size is reported

– 11g lgwr polling might increase log file sync wait. Use parameter _use_adaptive_log_file_sync to disable polling

– Consider binding processes lgwr and nssN (this is a data guard process) to dedicated cpu’s

Posted in Uncategorized | Leave a comment

Oracle Solaris Sun pwrite performance investigation

1. Collect explorer
**********************

Run explorer in the following manner :

/opt/SUNWexplo/bin/explorer -w default

The explorer data collector may be obtained from :

http://www.oracle.com/us/support/systems/premier/services-tools-bundle-sun-systems-163717.html

Download from : https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1153444.1

Useful documents on explorer :

Oracle Explorer Data Collector – Product Information Center ( Doc ID 1312847.1 )
Oracle Services Tools Bundle (STB) – Explorer, SNEEP, LWACT, RDA ( Doc ID 1153444.1 )
Oracle Explorer Commands ( Doc ID 1003863.1 )
How to Run Oracle Explorer and Forward the Data to an Oracle Service Engineer ( Doc ID 1018748.1 )
How to Upload Files to Oracle Support ( Doc ID 1547088.2 )
Oracle Diagnostic File Upload ( Doc ID 1547088.2 )
My Oracle Support Help – Attaching Support Files to a Service Request ( Doc ID 1447472.1 )
How to Send a File to Oracle Support Using SFTP (sftp.oracle.com) ( Doc ID 464666.1 )
How to Collect a Live Solaris Kernel Dump ( Doc ID 1004608.1 )

GUDS – A Script for Gathering Solaris Performance Data ( Doc ID 1285485.1 )

*************************************
2. Collect guds using this options
*************************************

Collecte GUDS in GLOBAL Zone ONLY

# # ./guds_3_3 -q -X3 -c15 -i30 -n5 -w0 -T -H0 -L10 -r -s -D/var/tmp -d “Change this comment to something useful about the performance at the time this collections was made.”

Make sure ‘guds’ is run during the time that the problem is occurring.

If Oracle DB is part of performance issue, please collect AWR data for SAME time period as GUDS has been collected. <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Guds may be obtained from this document : GUDS – A Solaris Performance Data Gathering Script ( Doc ID 1285485.1 )

***********************************
3. Respond to these questions
***********************************


Please write up a detailed problem description; including answers
to each of the questions below:

– what is the problem you wish to have Oracle address?

Identify the application, and what specifically is having the problem;
if latency, is this for a batch job, or user interaction, etc. ?

– when did the problem first occur?

– what is this system used for, and when is the problem occurring?

– what data have you collected that characterizes the level of system
performance as a problem?

What is slow?

How much slower is it than expected?

– have there been any changes that might help understand the origin of
the problem?

– what is affected, how long, other applications that may be running at
the time, any hardware or software changes (patches …), tuning,
change in workload, change in storage, any error messages, other
systems affected, ….?

– are there any similar systems that you might expect to have the problem
but do not?

The idea is to give us as complete a picture of the system and problem as
possible from the problem description.

Posted in Uncategorized | Leave a comment

Disable indexes for data load

investigate index disable options in Oracle 11g.

Short answer – it works mostly nicely.

Example (a full test script is given bellow):

alter index test1_i1  unusable;

insert /*+append*/ into test1 select * from dba_objects where rownum<=1000;
commit;

alter index test1_i1  rebuild;

This method saves you programmming: indexes can be created in advance.
The only things you have to do are:
disable (unusable)  them before insert.
Enable (rebuild) them after insert.

Now an important part:
A unique index still has to be dropped. Oracle takes care to guard you against error in case your data load code relies on unique index to catch unique constraints violation errors.

However there is an exception: if unique index also has a unique constraint associated with it then a programmer can explicitely disable the constraint and take the responsibility:

alter table test1 add constraint test1_i2 primary key    (owner, object_name, object_type);

alter index test1_i2 unusable;
alter table test1  disable constraint test1_i2  ;

And then the insert works:
insert /*+append*/ into test1 select * from dba_objects where rownum<=1000;

But of course you will detect the “unique key violated” errors only after trying to enable the constraint:

alter table test1  enable constraint test1_i2  ;

Full test script:

create table test1 as select * from dba_objects where rownum<=1000;

create index test1_i1 on  s28543.test1 (owner);

truncate table test1;

drop index test1_i2 ;
create UNIQUE index test1_i2 on  test1 (owner, object_name, object_type);
alter table test1 add constraint test1_i2 primary key    (owner, object_name, object_type);

alter index test1_i1  unusable;
alter index test1_i2 unusable;
select * from dba_constraints where table_name=’TEST1′;
alter table test1  disable constraint test1_i2  ;

alter table test1  enable constraint test1_i2  ;

insert /*+append*/ into test1 select * from dba_objects where rownum<=1000;
commit;
select status from dba_indexes  where index_name=’TEST1_I1′;
rem UNUSABLE

insert into test1 select * from dba_objects where rownum<=1000;
commit;

select status from dba_indexes  where index_name=’TEST1_I1′;
alter index test1_i1  rebuild;
select status from dba_indexes  where index_name=’TEST1_I1′;

Posted in Uncategorized | Leave a comment

unaccounted time in v$sql elapsed_time >> cpu_time + user_io_wait_time + concurrency_wait_time + etc

Regarding unaccounted time in v$sql, when elapsed_time is much bigger than cpu_time + user_io_wait_time + concurrency_wait_time + etc.

One reason: Log writer waits are not included into user_io_wait_time – it is truly a USER io time.

Test it: Stop log writer: kill -s SIGSTOP <lgwr_pid>. See how elapsed_time increases but other times stay unchanged.

Another credible reason: OS is losing time on paging (swapping)

 

 

Posted in Uncategorized | Leave a comment