Nigel Thomas on Achtung! Naga Vegesna on v$rman_status slow 11gR2 Modify Baseline Wind… on Oracle 11gR2 new AWR views and… laimisnd on 11g ora-1555 on activated stan… Jakub Wartak on 11g ora-1555 on activated stan…
- January 2017
- November 2016
- August 2016
- October 2014
- September 2014
- July 2014
- June 2014
- May 2014
- April 2014
- February 2014
- January 2014
- December 2013
- November 2013
- October 2013
- August 2013
- April 2013
- December 2012
- November 2012
- October 2012
- February 2012
- November 2011
- September 2011
- July 2011
- June 2011
- April 2011
- March 2011
Take a look into TIME-WAIT (sometimes spelled like TIME_WAIT by netstat utility) TCP/IP state:
Net effect of how TCP/IP protocol works is that one client can generate about 500 database connections per second and then it [client] will hit OS limit. The number 500 comes from 30K ports available / 60 TIME-WAIT seconds.
Who would do that in the database world you may ask. Answer: if it can be done it will be done. I personally came across 3 applications in the last month doing exactly that, one finally hit the 30K connections limit in TIME-WAIT state.
Truth be told those were MySQL applications. MySQL is pretty forgiving when it comes to 500 connects/disconnects per second, but a) MySQL is Oracle now b) not a big deal to do the same on Oracle RDBMS either.
Btw, sometimes the situation can be both masked (and saved) by forcing local connections which bypass the TCP/IP stack. In that case all works fine till some part of application starts using TCP/IP.
Last edited 1-Sept-2016
For quite a time I have been a happy user of SPM(SQL Plan Management) to resolve SQL performance issues by using a lazy man’s approach and Oracle capabilities:
in the latest versions Oracle will generate many SQL plans in it’s search of a better SQL plan. All you have to do is to select the one which makes sense and load it into SPM.
Until one day a valid and enabled SPM would stop to enforce the chosen plan. dbms_xplan.display_cursor would show “Sql Plan Directive used for this statement”
Deleting the Sql Plan Directive on the objects concerned has solved the issue.
Apparently, SQL Directives take precedence over SPM.
Here are some answers I’ve got from Oracle:
BEGIN EDIT 1-Sept-2016:
Here are quite interesting thoughts of Jonathan Lewis https://jonathanlewis.wordpress.com/2016/08/02/adaptive-mayhem/#comments
Analyzing 10053 does support his statement “Anything short of a completely specificied plan could be modified by dynamic sampling, so I think an outline would be much more likely to be affected by an SQL directive than an SQL Plan Baseline since outlines tend to be less well specified”.
10053 Trace logs SPD part at the very end of the trace file, meaning SPD takes precedence over – over what ? That’s a good question. SPD must be SQL Plan Directives. Looking into the SPD part of the trace one finds hinted SQL:
SPD: BEGIN context at statement level … ******* UNPARSED QUERY IS ******* SELECT /*+ LEADING (“T4” “T1” “T2” “T3”) FULL (“T4”) USE_HASH (“T3”) FULL (“T3”) USE_HASH (“T2”) FULL (“T2”) USE_HASH (“T1”) FULL (“T1”) */ … … Permutations for Starting Table :0 Join order: T4[T4]#0 T1[T1]#1 T2[T2]#2 T3[T3]#3
Those are SQL Plan Baseline hints. Still, SPD acronym looks suspicious. At the very least, this part of CBO probably considers both SQL Directives and SQL Plan Baselines.
END EDIT 1-Sept-2016
All that said, one thing is clear. IT guy must always ask one simple question: what is the precedence?
In Oracle CBO (Cots Based Optimizer) we have many factors:
Statistics (don’t forget System Statistics)
For quite a time the precedence of factors influencing CBO was rather self-evident and had no need to be documented explicitly. Not any longer.
Adding more on 8-Sept-2016:
Adding more on 2-Sept-2016:
Time to throw a few notes about Oracle database on Amazon RDS.
- It works
- Issues getting access to data files: file system access is generally forbidden on RDS for reasons understandable. However, imagine you ran into RDS limits. For example, you need a one-off patch (which RDS would not allow you to install). You need to escape RDS quickly into a normal Oracle database. You can exp/edpdp/imp/impdp/GoldenGate/Amazon Migration Service/SQL interface. But SQL interface is sloooow. Here you are – caught by RDS for good.
NOTE: colleague of mine claims Amazon already has native MS SQL Server backup into S3. So far it seems to be a full db backup with no control to have incremental or transaction log backups. However, if one day Amazon implements Oracle native RMAN backups into S3 then a rolling restore becomes an option – that would open an RDS esacape window.
NOTE2: Amazon seems to be supporting MySQL and Maria RDS binary log based replication into external database: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Exporting.NonRDSRepl.html
I can confirm that Aurora RDS (Aurora is about MySQL) successfully replicates into another Aurora RDS in another region. This Pythian link was instrumental to get it working, however – Amazon document has a nasty bug:
- Once RDS starts doing something (backuping/creating/patching/restoring/etc) you have very little idea how long it is going to take. Example, restoring Aurora database from a snapshot – you get status “creating”, you get button logs, but there are no logs yet.
- Access to Oracle redo log files. Take a look here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.Oracle.html
Amazon document writers do not even get that Oracle redo logs are light yeas away from Oracle trace and audit logs.
Log miner: can do it. Log retention can be set too: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.Oracle.CommonDBATasks.html#Appendix.Oracle.CommonDBATasks.RetainRedoLogs
- Character sets. Amazon does support a number of them but a few older ones are not included.
- Issues setting hidden parameters (I failed) – note, some hidden parameters have been exposed by Amazon (i.e. can be changed).
- Can’t change some parameters like service name. I failed to add service name. Want to rename database – it’s copy&paste (i.e. back database and restore as a different thing – do this on a few terabytes to get taste of it). Want to add service name for application purposes (resource management, end-to-end user tracing) – no no.
- Tablespace growth control. So far I recommend using bigfile tablespaces. Then you can execute “alter tablespace … autoextend on/off maxsize …” If you use small file tablespace then “alter database” raises privileges error immediately. Btw, Amazon default Oracle RDS is keen to let you grow (and pay) – it has the maximum limit on bigfile tablespace, SYSTEM tablespace including. Let it be with payments but letting database grow out of control will end badly for your database (and you personally).
- One-off patches are no no (as was mentioned already). I do not know a good RDS escape strategy. RDS is a trap. I personally would like to hear if Amazon allowed a single RDS customer to get hands on RDS datafiles and migrate for good whererver he or she wished – that would be nice of Amazon.
- Database Flashback – I failed to get it running. I assume it can’t.
- Some performance questions linger regarding High Availability and Disaster Recovery implementation – the Multi Availability Zones option. It’s an open secret what technology Amazon is using (not a Data Guard, which is stated openly). Then it is File/Storage block replication vs redo log replication. I do assume that redo log [vectors] replication does require way less bandwidth and has a better latency.
- MultiAZ RDS failover happens within minutes (less than 5 minutes).
- RMAN backups. What RMAN backups ? RMAN restore of a particular failed block, data file, tablespace, ctonrol file, log file – no no. Amazon RDS restore granularity is the whole database or nothing. How long will the restore take: very little idea. However, you can read logs.
- Amazon RDS point in time restore is of 5 minutes granularity, mine and other people tests seem to confirm this number
- Can you freeze RDS database backup so that it does not expire at the end of backup retention period ? Seems like not: you can freeze database snapshot, true. But I did not find how to freeze backups of database archive logs. Here is what Amazon says: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
“You can restore to any point in time during your backup retention period”
If this is truly so then it is a serious limitation for some testing scenarious.
- Take a look into v$datafile. I get it that Amazon attaches a single EBS volume for your datafiles. A single EBS volume has it’s limits of IOPS. On Amazon you combine many EBS volumes to get IOPS you need. Throughput (bandwidth in Amazon terms as EBS is attached to network interfaces) is another concern – it’s limited to Amazon instance network card, naturally. Forget about gigabytes per second. About 700MB/sec EBS throughput is maximum what I was told regarding normal Oracle installation on one of high end Amazon instances. RDS maximum is even less.
- Upgrades. I have upgraded 18.104.22.168 SE1 into 22.214.171.124 SE2. It took about an hour on 2 core (not damned vCPU but CORE!) server.
Amazon did a few backups underway – even incremental backups do take time. TBD: perhaps those backups involve only archive log backups?
The fact you have no control over backups kicking in is unnerving.
One note: When doing upgrade on a rather small instance db.m3.large the upgrade took a staggering 4 and a half hours. Note, that was a skeleton test database. “one off pacthes and performance improvements” phase kept improving the performance for exactly 3.5 hours. Perhaps it was caused by instance having just one core.
- Now, the paranoid part. If anything happens with the database then you have no access to the RDS host operating system. You are at Amazon mercy.
Ok, enough for the Bad and Ugly. What about the Good. Provisioning is reasonably fast. Not light fast but still fast. Backups and restores do work. Failover does work. Upgrade – the one I have done (see above) – was successful. Some basic monitoring is included. Overall, Oracle RDS is a very reasonable offer for application that can get away with what was said above. 24×7 aspect is the most troublesome thing for me: as I has already said, if anything non-trivial happens with your database then RDS is a black box.
select sys.dbms_metadata.get_ddl('USER', 'USER1') from dual;
to get the new fancier “identified by values” syntax – remember, Oracle has changed and made stronger database user passwords. The old “identified by values” won’t work anymore.
if lgwr latency is more important than throughput then as of 126.96.36.199(and 3) consider the following:
– apparently log_buffer size governs group commit batch size. A bigger log buffer means bigger write batch size which means longer session waits on commit. See lgwr warnings of over 500ms log writes under heavier throughput: the bigger log buffer size – the bigger warning size is reported
– 11g lgwr polling might increase log file sync wait. Use parameter _use_adaptive_log_file_sync to disable polling
– Consider binding processes lgwr and nssN (this is a data guard process) to dedicated cpu’s
1. Collect explorer
Run explorer in the following manner :
/opt/SUNWexplo/bin/explorer -w default
The explorer data collector may be obtained from :
Useful documents on explorer :
Oracle Explorer Data Collector – Product Information Center ( Doc ID 1312847.1 )
Oracle Services Tools Bundle (STB) – Explorer, SNEEP, LWACT, RDA ( Doc ID 1153444.1 )
Oracle Explorer Commands ( Doc ID 1003863.1 )
How to Run Oracle Explorer and Forward the Data to an Oracle Service Engineer ( Doc ID 1018748.1 )
How to Upload Files to Oracle Support ( Doc ID 1547088.2 )
Oracle Diagnostic File Upload ( Doc ID 1547088.2 )
My Oracle Support Help – Attaching Support Files to a Service Request ( Doc ID 1447472.1 )
How to Send a File to Oracle Support Using SFTP (sftp.oracle.com) ( Doc ID 464666.1 )
How to Collect a Live Solaris Kernel Dump ( Doc ID 1004608.1 )
GUDS – A Script for Gathering Solaris Performance Data ( Doc ID 1285485.1 )
2. Collect guds using this options
Collecte GUDS in GLOBAL Zone ONLY
# # ./guds_3_3 -q -X3 -c15 -i30 -n5 -w0 -T -H0 -L10 -r -s -D/var/tmp -d “Change this comment to something useful about the performance at the time this collections was made.”
Make sure ‘guds’ is run during the time that the problem is occurring.
If Oracle DB is part of performance issue, please collect AWR data for SAME time period as GUDS has been collected. <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Guds may be obtained from this document : GUDS – A Solaris Performance Data Gathering Script ( Doc ID 1285485.1 )
3. Respond to these questions
Please write up a detailed problem description; including answers
to each of the questions below:
– what is the problem you wish to have Oracle address?
Identify the application, and what specifically is having the problem;
if latency, is this for a batch job, or user interaction, etc. ?
– when did the problem first occur?
– what is this system used for, and when is the problem occurring?
– what data have you collected that characterizes the level of system
performance as a problem?
What is slow?
How much slower is it than expected?
– have there been any changes that might help understand the origin of
– what is affected, how long, other applications that may be running at
the time, any hardware or software changes (patches …), tuning,
change in workload, change in storage, any error messages, other
systems affected, ….?
– are there any similar systems that you might expect to have the problem
but do not?
The idea is to give us as complete a picture of the system and problem as
possible from the problem description.