I’m now an Oracle ACE Director

Posted January 15, 2010 by James Morle
Categories: Oracle

Just a quick post in celebration of my recent appointment as an Oracle ACE Director! I’m very happy to be invited to this group, and look forward to getting involved in some of the discussions.

The Oracle Wait Interface Is Useless – News About Part Two

Posted January 14, 2010 by James Morle
Categories: Oracle

As many have probably noticed, the blogging experiment with Tanel hasn’t been quite as productive as it might optimally have been. Tanel doesn’t currently have the time to write part two of this article, so I’ll I’m going to pick up the baton and write the next part. Watch this space!

The Oracle Wait Interface Is Useless (sometimes) – Part One: The Problem Definition

Posted November 9, 2009 by James Morle
Categories: Oracle

So here we go, this is part one of this experiment in blogging and co-writing. Tanel has actually written some good stuff already for this, but I wanted to try and formalise things under a common title and make it easier to follow between our sites.

I thought it would be logical to start this process by producing a more concrete problem definition, so that’s the focus of this part. It’s unlikely that we will come up with a complete method in this initial work, but hopefully the wheels will at least turn a little by the end of it!

So first of all, why would I dare to say that the Oracle Wait Interface is useless? Well, partly because I quite like titles that are a little bit catchy, and partly because it is indeed sometimes useless. The emphasis is on the word sometimes, though, because the Oracle Wait Interface is still the single most useful feature in any database product. Wow – that’s quite a claim, isn’t it? This isn’t the place to fully explain why that is, and many others have written great works on this subject already. Check out Cary Millsap’s works, notably his book, Optimizing Oracle Performance, which focuses in great detail on this subject. For the sake of this article, however, here’s why it is so useful: It tells you where the time goes. Think about it: If something is running too slowly, knowing where the time is used up is the single piece of information required to focus on the right subject for tuning.

So what’s wrong with the Oracle wait interface? Just one thing, actually – it is designed to  provide visibility of relatively slow waits. The reason for this is simply that there is a slight overhead in timing every single wait. If that overhead becomes a noticeable proportion of the actual wait itself then the measurement becomes inaccurate (and makes the problem worse). On UNIX-like platforms (yes, that includes Linux), the actual timing interface is implemented using gettimeofday(2) system calls, one before the event and one after the event. This call gives microsecond granularity of timing, at least in theory (on my Opteron 280 test machine, gettimeofday() calls take 1.5 microseconds). So, using this kind of mechanism for events that take a relatively long time makes perfect sense – disk I/O, for example, that will take at least three orders of magnitude longer to complete than the timing calls themselves. Conversely, they make no sense at all for calls that take even as little as 50 microseconds, as the 3 microsecond penalty for measuring the wait is 6% of the actual event time itself at that point. There you go, that’s the beginning of the justification that the wait interface is useless,  in a nutshell.

But hang on, isn’t 50 microseconds really, really fast? Well no, actually, it isn’t. Taking Intel’s Nehalem processor (with Quickpath) as an example, a memory latency is around the 50 ns range – three orders of magnitude faster than the 50 microsecond cut-off that I just arbitrarily invented. Memory access is also the slowest thing that a CPU can do (without factoring in peripheral cards) – in this case the CPU has to wait for about 150 cycles while that memory access takes place. So it’s very possible to have a function call that does fairly complex work and is still an order of magnitude or two faster than the gettimeofday() system call.

Time for an example. Actually, this is a variation on the example that made me start thinking about this – I had been perfectly happy with the Oracle Wait Interface until this point for 99% of cases!
Read the rest of this post »

And Now For Something Completely Different…

Posted October 12, 2009 by James Morle
Categories: Oracle

So just a short post today, more of an announcement…

At 11.05am on Wednesday 2nd December, I will co-present a talk at the UKOUG conference with Tanel Poder. OK, so nothing particularly earth-shattering there. However, we’ve decided to do something a little bit different with the creative process. The talk is entitled “The Oracle Wait Interface Is Useless (sometimes)”, and is a subject that both Tanel and I have been working on separately for a little while. The premise is this: The wait interface is great for ‘slow’ waits, but what about waits that are not instrumented by the wait interface? What about waits that are not waits from Oracle’s perspective, such as reading a page of memory? What about pure inefficiency? There is, of course, the concept of DB Time, but it is not currently granular enough.

So, you probably can see the idea behind the presentation: The goal is to present some alternative diagnostic techniques to determine the cause of poor performance. We happen to both have similar ideas on this, and they don’t just stop at Oracle.

So here’s the new concept, at least for us: We are going to write the content for the presentation as a tag team effort between our two blogs. Airing our dirty laundry in public, so to speak. We think this will give a fairly unique opportunity for public comment before the presentation is actually given!

So, over to Tanel for part one…

Latency Hiding For Fun and Profit

Posted September 28, 2009 by James Morle
Categories: Generic, Oracle

Tags: , , ,

Yep, another post with the word ‘latency’ written all over it.

I’ve talked a lot about latency, and how it is more often than not completely immutable. So, if the latency cannot be improved upon because of some pesky law of physics, what can be done to reduce that wasted time? Just three things, actually:

  1. Don’t do it.
  2. Do it less often.
  3. Be productive with the otherwise wasted time.

The first option is constantly overlooked – do you really need to be doing this task that makes you wait around? The second option is the classic ‘do things in bigger lumps between the latency’ – making less roundtrips being the classic example. This post is about the third option, which is technically referred to as latency hiding.

Everybody knows what latency hiding is, but most don’t realise it. Here’s a classic example:

I need some salad to go with the chicken I am about to roast. Do I:

(a) go to the supermarket immediately and buy the salad, then worry about cooking the chicken?

OR

(b) get the chicken in the oven right away, then go to the supermarket?

Unless the time required to buy the salad is much longer than the chicken’s cook-time, the answer is always going to be (b), right? That’s latency hiding, also known as Asynchronous Processing. Let’s look at the numbers:

Variable definitions:

Supermarket Trip=1800s

Chicken Cook-Time=4800s

Calculations:

Option (a)=1800s+4800s=6600s (oh man, nearly two hours until dinner!)

Option (b)=4800s (with 1800s supermarket time hidden within it)

Here’s another example: You have a big code compile to do, and an empty stomach to fill. In which order do you execute those tasks? Hit ‘make’, then grab a sandwich, right?

As a side note, this is one of my classic character flaws – I just live for having tasks running in parallel this way. Not a flaw, I hear you say? Anyone that has tried to get software parallelism  (such as Oracle Parallel Execution) knows the problem – some tasks finish quicker than expected, and then there’s a bunch of idle threads.  In the real world, this means that my lunch is often very delayed, much to the chagrin of my lunch buddies.

OK, so how does this kind of thing work with software? Let’s look at a couple of examples:

  1. Read Ahead
  2. Async Writes

Read ahead  from physical disk is the most common example of (1), but it equally applies to result prefetching in, say, AJAX applications. Whatever the specific type, it capitalises on parallel processing from two resources. Let’s look at the disk example for clarification.

Disk read ahead is where additional, unrequested, reads are carried out after an initial batch of real requested reads. So, if a batch job makes a read request for blocks 1,2,3 and 4 of a file, “the disk” returns those blocks back and then immediately goes on to read blocks 5,6,7,8, keeping them in cache. If blocks 5,6,7,8 are then subsequently requested by the batch job after the first blocks are processed, they can immediately be returned from cache, thus hiding the latency from the batch job. This has the impact of hiding the latency from the batch job and increases throughput as a direct result.

Async writes are essentially the exact opposite of read-ahead. Let’s take the well-known Oracle example of async writes, that of the DBWR process flushing out dirty buffers to disk. The synchronous way to do this is to generate a list of dirty buffers and then issue a series of synchronous writes (one after the other) until they are all complete. Then start again by looking for more dirty buffers. The async I/O way to do the same operation is to generate the list, issue an async write request (which returns instantly), and immediately start looking for more dirty buffers. This way, the DBWR process can spend more useful time looking for buffers – the latency is hidden, assuming the storage can keep up.

By the way, the other horror of the synchronous write is that there is no way that the I/O queues can be pushed hard enough for efficient I/O when sending out a single buffer at a time. Async writes remedy that problem.

I’ve left a lot of the technical detail out of that last example, such as the reaping of return results from the async I/O process, but didn’t want to cloud the issue. Oops, I guess I just clouded the issue, just ignore that last sentence…

Spotting the Red Flags (Part 1 of n)

Posted September 24, 2009 by James Morle
Categories: Oracle

Tags:

As a consultant I get to see many different systems, managed by different people. A large proportion of these systems are broken in exactly the same ways to others, which makes spotting the problems pretty straightforward! It occurred to me at some point that this is a bit like a crime scene – somebody had murdered the system, and it was my job to find out ‘whodunnit’, or at least ‘whatdunnit’. The ‘what’ is quite frequently one (or few) actions performed by the administrator, normally with good intention, that result in system performance or availability carnage. I call these actions ‘Red Flags’, and spotting them can save a lot of time.

A couple of years ago at the Hotsos conference, I did a small impromptu survey. It’s not all that often that you have 500 Oracle geeks in a room that you can solicit opinion from, so it seemed like a good chance to cast the net for Red Flags. I seeded the process with a few of my personal favourites and then started writing as the suggestions came in. Here are the ‘seed’ entries:

  • Any non-default setting for optimizer_index_cost_adj
  • Carefully configured KEEP and RECYCLE pools
  • Multiple block sizes
  • Some kind of spin_count configuration

Remember, these are not necessarily the wrong thing to do to your system, but they probably are. That’s why they attract my attention in the first instance.

I got a whole load of responses. Of those responses, some had missed the point and just identified something that was broken. This is more subtle than that – these are things that are forensic evidence of system homicide. Some of the decent ones I got back follow:

  • /*+ RULE */ hint in abundance (I know, lot’s of apps do this, and it sometimes makes sense)
  • Numeric data in VARCHAR2 columns
  • Indexes and tables on ‘separate disks’
  • All columns in a table NULLable

Of course, there were more cynical responses too, such as  (he he): cluster_database=true

I was going to write some kind of presentation about it, but I think this might work nicely as a blog entry, potentially multiple entries! If anyone has some good suggestions, please let me know and I’ll start adding them into part 2, along with any more I think of… 🙂

The amazing truth about the Red Flags, though, is that they are incredibly hard to dislodge. Once optimizer_index_cost_adj is set to anything but default, it is nigh on impossible to convince anyone that the system is probably now tuned on its head. Same with a multiple buffer pool configuration, and same with multiple block sizes.

Forget I/O Bound, You’re Latency Bound, Bub

Posted September 21, 2009 by James Morle
Categories: Storage

Tags: , , ,

Since it’s been nearly ten years since I wrote my book, Scaling Oracle8i, I thought it was about time that I started writing again. I thought I would start with the new-fangled blogging thing, and see where it takes me. Here goes.

As some will know, I run a small consulting company called Scale Abilities, based out of the UK. We get involved in all sorts of fun projects and problems (or are they the same thing?), but one area that I seem to find myself focusing on a lot is storage. Specifically, the performance and architecture of storage in Oracle database environments. In fact I’m doing this so much that, whenever I am writing presentations for conferences these days, it always seems to be the dominant subject at the front of my mind.

One particular common thread has been the effect of latency. This isn’t just a storage issue, of course, as I endeavoured to point out in my Hotsos Symposium 2008 presentation “Latency and Skew”. Latency, as the subtitle of that particular talk said, is a silent killer. Silent, in that it often goes undetected, and the effects of it can kill performance (and still remain undetected). I’m not going to go into all the analogies about latency here, but let’s try and put a simple definition out for it:

Latency is the time taken between a request and a response.

If that’s such a simple definition, why is it so difficult to spot? Surely if a log period of time passes between a request and a response, the latency will be simple to nail? No.

The problem is that it is the small latencies that cause the problems. Specifically, it is the “small, but not so small that they are not important” ones that are so difficult to spot and yet cause so many problems. Perhaps an example is now in order:

A couple of years ago, a customer of mine was experiencing a performance problem on their newly virtualised database server (VMware 3.5). The problem statement went a little bit like this: Oracle on VMware is broken – it runs much slower on VMware than on physical servers. The customer was preparing a new physical server in order to remove VMware from the equation. Upon further investigation, I determined the following:

  1. The VMware host (physical server running VMware) was a completely different architecture to the previous dedicated server. The old server was one of the Intel Prescott core type (3.2GHz, Global Warming included at no extra cost), and the new one was one of the Core 2 type with VT instructions.
  2. Most measurable jobs were actually faster on the virtualised platform
  3. Only one job was slower

The single job that was slower was so much slower that it overshadowed all the other timings that had improved. Of the four critical batch jobs, the timings looked like this:

Physical server:

  • Job 1: 26s
  • Job 2: 201s
  • Job 3: 457s
  • Job 4: 934s
  • Total: 1618s

Virtualised Server:

  • Job 1: 15s
  • Job 2: 111s
  • Job 3: 208s
  • Job 4: 2820s
  • Total: 3154s

It can be seen that, if one takes the total as the yardstick, the virtualised server is almost twice as slow: Therein lies the danger of using averages and leaping to conclusions. If Job 4 is excluded, the totals are 684s vs 334s, making the virtualised server more than twice as quick as the physical one.

Upon tracing Job 4 on the VMware platform with Oracle extended SQL tracing (10046 level 8), I discovered that it was making a lot of roundtrips to the database. Hang on, let me give that the right emphasis: A LOT of roundtrips. However, each roundtrip was really  fast – about 0.2ms if memory serves. So where’s the problem with that? It’s not exactly high latency, is it? Well it is if you have to do several million of them.

As it turns out, there was something in the VMware stack (perhaps just additional codepath to get through the vSwitch) that was adding around 0.1ms latency to each roundtrip. When this tenth of a millisecond is multiplied by several million (something like 20 million in this case), it becomes a long time. About 2000s to be precise, which more than made up for the extra time. The real answer – do less roundtrips, the new server will be at least twice as fast as the old server.

So what does this have to do with I/O? Plenty. The simple fact is that roundtrips are a classic source of unwanted latency. Latency is the performance killer.

Let’s look at some I/O examples from a real customer system. Note: this is not a particularly well tuned customer system:

  • Latency of 8KB sequential reads: 0.32ms
  • Latency of 4MB sequential reads: 6ms

Obviously, the 4MB reads are taking a lot longer (18x), but that makes sense, right? Apart from one thing: The 4MB reads are serving 512x more data in each payload. The net result of these numbers is as follows:

  • Time to sequentially read 2.1GB in 8KB pieces: 76s
  • Time to sequentially read 2.1GB in 4MB pieces: 24s

So what happened to the 52s difference in these examples? Was this some kind of tuning problem? Was this a fault on the storage array? No. The 52s of time was lost in latency, nowhere else.

Here’s another definition of latency: Latency is wasted time.

So, look out for that latency. Think about it when selecting nested-loop table joins instead of full-table scans. Think about it when doing single-row processing in your Java app. Think about it before you blame your I/O system!