So here we go, this is part one of this experiment in blogging and co-writing. Tanel has actually written some good stuff already for this, but I wanted to try and formalise things under a common title and make it easier to follow between our sites.
I thought it would be logical to start this process by producing a more concrete problem definition, so that’s the focus of this part. It’s unlikely that we will come up with a complete method in this initial work, but hopefully the wheels will at least turn a little by the end of it!
So first of all, why would I dare to say that the Oracle Wait Interface is useless? Well, partly because I quite like titles that are a little bit catchy, and partly because it is indeed sometimes useless. The emphasis is on the word sometimes, though, because the Oracle Wait Interface is still the single most useful feature in any database product. Wow – that’s quite a claim, isn’t it? This isn’t the place to fully explain why that is, and many others have written great works on this subject already. Check out Cary Millsap’s works, notably his book, Optimizing Oracle Performance, which focuses in great detail on this subject. For the sake of this article, however, here’s why it is so useful: It tells you where the time goes. Think about it: If something is running too slowly, knowing where the time is used up is the single piece of information required to focus on the right subject for tuning.
So what’s wrong with the Oracle Wait Interface? Just one thing, actually: it is designed to provide visibility of relatively slow waits. The reason for this is simply that there is a slight overhead in timing every single wait. If that overhead becomes a noticeable proportion of the actual wait itself then the measurement becomes inaccurate (and makes the problem worse). On UNIX-like platforms (yes, that includes Linux), the actual timing interface is implemented using gettimeofday(2) system calls, one before the event and one after the event. This call gives microsecond granularity of timing, at least in theory (on my Opteron 280 test machine, each gettimeofday() call takes 1.5 microseconds, so every timed event carries about 3 microseconds of overhead). So, using this kind of mechanism for events that take a relatively long time makes perfect sense: disk I/O, for example, will take at least three orders of magnitude longer to complete than the timing calls themselves. Conversely, it makes no sense at all for calls that take even as little as 50 microseconds, as the 3 microsecond penalty for measuring the wait is 6% of the actual event time itself at that point. There you go, that’s the beginning of the justification that the wait interface is useless, in a nutshell.
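To put numbers on that argument, here is a minimal sketch of the overhead arithmetic. The 1.5 microsecond cost per call and the two calls per event come from the text above; the function name, the 5 ms disk I/O figure and the 1 microsecond "fast call" are my own illustrative assumptions, not measurements.

```python
def timing_overhead_fraction(event_us, timer_call_us=1.5, calls_per_event=2):
    """Return the timing cost as a fraction of the event's own duration.

    timer_call_us=1.5 is the gettimeofday() cost quoted for the Opteron 280
    test machine; calls_per_event=2 is one call before and one after the event.
    """
    return (timer_call_us * calls_per_event) / event_us


# Illustrative event durations in microseconds. Only the 50 us case is
# taken from the article; the other two are assumed for comparison.
examples = [
    ("disk I/O (assumed 5 ms)", 5000.0),
    ("50 us call (from the text)", 50.0),
    ("fast call (assumed 1 us)", 1.0),
]

for label, event_us in examples:
    fraction = timing_overhead_fraction(event_us)
    print(f"{label}: timing overhead is {fraction:.2%} of the event")
```

The point of the sketch is simply that the overhead is fixed while the event time is not, so the shorter the event, the more the act of measuring it dominates.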
But hang on, isn’t 50 microseconds really, really fast? Well no, actually, it isn’t. Taking Intel’s Nehalem processor (with QuickPath) as an example, memory latency is in the region of 50 ns, three orders of magnitude faster than the 50 microsecond cut-off that I just arbitrarily invented. Memory access is also the slowest thing that a CPU can do (without factoring in peripheral cards); in this case the CPU has to wait for about 150 cycles (at a clock of roughly 3 GHz) while that memory access takes place. So it’s very possible to have a function call that does fairly complex work and is still an order of magnitude or two faster than the gettimeofday() system call.
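The same back-of-the-envelope sums can be written down for the memory latency figures. This is only a sketch: the 50 ns latency, the 50 microsecond cut-off and the 3 microsecond timing penalty are the figures quoted above, while the 3 GHz clock speed is an assumption on my part (it is what the 150 cycle figure implies), and the function names are hypothetical.

```python
def stall_cycles(latency_ns, clock_ghz):
    """CPU cycles spent waiting for one memory access.

    A clock of N GHz ticks N times per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


def accesses_per_interval(interval_us, latency_ns):
    """How many back-to-back memory accesses fit into an interval."""
    return interval_us * 1000.0 / latency_ns


MEMORY_LATENCY_NS = 50.0   # figure quoted for Nehalem in the text
ASSUMED_CLOCK_GHZ = 3.0    # assumption: implied by "about 150 cycles"
CUTOFF_US = 50.0           # the arbitrary cut-off from the text
TIMING_PENALTY_US = 3.0    # two gettimeofday() calls at 1.5 us each

print("cycles stalled per memory access:",
      stall_cycles(MEMORY_LATENCY_NS, ASSUMED_CLOCK_GHZ))
print("memory accesses that fit in the 50 us cut-off:",
      accesses_per_interval(CUTOFF_US, MEMORY_LATENCY_NS))
print("memory accesses that fit in the timing penalty alone:",
      accesses_per_interval(TIMING_PENALTY_US, MEMORY_LATENCY_NS))
```

In other words, under these assumptions a function could take dozens of full trips to main memory and still finish before the two timing calls around it have.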
Time for an example. Actually, this is a variation on the example that made me start thinking about this – I had been perfectly happy with the Oracle Wait Interface until this point for 99% of cases!