
“Flash” Storage Will Be Cheap – The End of the World is Nigh

May 29, 2010

A couple of weeks ago I tweeted a projection that the $/GB for flash drives will meet the $/GB for hard drives within 3-4 years. It was more of a feeling based upon current pricing with Moore’s Law applied than a well-researched statement, but it felt about right. I’ve since been thinking some more about this within the context of current storage industry offerings from the likes of EMC, Netapp and Oracle, and wondering what this might mean.

First of all, I did a bit of research – if my 3-4 year guess-timate was out by an order of magnitude then there is not much point in writing this article (yet). I wanted to find out what the actual trends in flash memory pricing look like and how these might project over time, and I came across the following article: Enterprise Flash Drive Cost and Technology Projections. Though this article is now over a year old, it shows the following chart, which illustrates the effect of the observed 60% year-on-year decline in flash memory pricing:

Flash Drive Pricing Projections

This 60% annual drop in costs is actually an accelerated version of Moore’s Law, and does not take into account any radical technology advances that may happen within the period. The drop is probably driven for the most part by the consumer thirst for flash technology in iPods and so forth, but naturally ripples back up into the enterprise space in the same way that Intel and AMD’s processor technologies do.
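To make the projection concrete, here is a minimal sketch of how a 60% annual decline plays out against hard drive pricing. The starting prices and the hard drive decline rate are my own illustrative assumptions, not figures from the article; only the 60% flash decline comes from the chart above.

    # Minimal, illustrative projection only. The starting prices and the hard
    # drive decline rate are assumptions for the sake of the example; the 60%
    # annual flash decline is the figure quoted above.
    flash_price = 4.00    # assumed $/GB for enterprise flash today
    hdd_price = 0.40      # assumed $/GB for enterprise hard drives today
    flash_decline = 0.60  # 60% per year, per the chart above
    hdd_decline = 0.20    # assumed annual decline for hard drives

    years = 0
    while flash_price > hdd_price:
        flash_price *= (1 - flash_decline)
        hdd_price *= (1 - hdd_decline)
        years += 1

    print(f"Crossover after roughly {years} years: "
          f"flash ${flash_price:.2f}/GB vs hard drive ${hdd_price:.2f}/GB")
    # With these particular assumptions the lines cross in about 4 years.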

So let’s just assume that my guess-timate and the above chart are correct (they almost precisely agree) – what does that mean for storage products moving forward?

Looking at recent applications of flash technology, we see that EMC were the first off the blocks by offering relatively large flash drives as drop-in replacements for the hard drives in their storage arrays. Netapp took a different approach of putting the flash memory in front of the drives as another caching layer in the stack. Oracle have various options in their (formerly Sun) product line and a formalised mechanism for using flash technology built into the 11g database software and into the Exadata v2 storage platform. Various vendors offer internal flash drives that look like hard drives to the operating system (whether connected by traditional storage interconnects such as SATA or by PCI Express). If we assume that the cost of flash technology becomes equivalent to hard drive storage in the next three years, I believe all of these technologies will quickly become the wrong way to deploy flash, and only one vendor (Oracle) has an architecture which lends itself to the most appropriate future model (IMHO).

Let’s flip back to reality and look at how storage is used and where flash technology disrupts that when it is cheap enough to do so.

First, location of data: local or networked in some way? I don’t believe that flash technology disrupts this decision at all. Data will still need to be local in certain cases and networked via some high-speed technology in others, in much the same way as it is today. I believe that the networking technology will need to change for flash technology, but more on that later.

Next, the memory hierarchy: Where does current storage sit in the memory hierarchy? Well, of course, it is at the bottom of the pile, just above tape and other backup technologies if you include those. This is the crucial area where flash technology disrupts all current thinking – the final resting place for data now operates at speeds close or equal to those of DRAM. One disruptive implication of this is that storage interconnects (such as Fibre Channel, Ethernet, SAS and SATA) are now a latency and bandwidth bottleneck. The other, potentially huge, disruption is what happens to the software architecture when this bottleneck is removed.

Next, capacity: How does the flash capacity sit with hard drive capacity? Well that’s kind of the point of this posting… it’s currently quite a way behind, but my prediction is that they will be equal by 2013/2014. Importantly though, they will then start to accelerate away from hard drives. Given the exponential growth of data volumes, perhaps only semiconductor based storage can keep up with the demand?

Next, IOPS: This is the hugely disruptive part of flash technology, and is a direct result of a dramatically lowered latency (access time) when compared to hard disk technology. Not only is the latency lowered, but semiconductor-based storage is more or less contention-free given the absence of serialised moving parts such as a disk head. Think about it – the service time for a given hard drive I/O is directly affected by the preceding I/O and where the head was left on the platter. With solid-state storage this does not occur and service times are more uniform (though writes are consistently slower than reads).
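As a rough illustration of why this is so disruptive, here is a back-of-envelope IOPS comparison. Both service times are my own ballpark assumptions, not measurements of any particular product:

    # Approximate IOPS from service time, for a single outstanding request.
    # Both service times below are assumed ballpark figures for illustration.
    def iops(service_time_ms):
        return 1000.0 / service_time_ms

    hdd_service_ms = 5.0    # assumed average seek + rotational delay for a hard drive
    flash_service_ms = 0.2  # assumed flash read service time

    print(f"Hard drive: ~{iops(hdd_service_ms):.0f} IOPS per spindle")
    print(f"Flash     : ~{iops(flash_service_ms):.0f} IOPS per device")
    # Roughly 200 vs 5,000 - and the flash service time does not depend on
    # where the previous I/O left the head.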

These disruptions mean that the current architectures of storage systems are not making the most of semiconductor-based storage. Hey, why do I keep calling it “semiconductor-based storage” instead of SSD or flash? The reason is that the technologies used in this area are changing frequently: from DRAM-based systems to NOR-based flash to NAND-based flash to DRAM-fronted flash; single-level cells to multi-level cells; battery-backed to “Super Cap” backed. Flash, as we know it today, could be outdated as a technology in the near future, but “semiconductor-based” storage is the future regardless.

I think that we now need technologies that look more like Oracle Exadata v2, with low-latency RDMA interfaces directly into the Operating System/Database. However, they need to easily and natively support other types of storage (unstructured data such as files, VMware datastores and so forth). The Exadata architecture lends itself well to changes in this area in both hardware trends and access protocols.

Perhaps more importantly, we are also only just beginning to understand the implications in software architecture for the disrupted memory hierarchy. We simply cannot continue to treat semiconductor-based storage as “fast disk” and need to start thinking, literally, outside the box.

Forget I/O Bound, You’re Latency Bound, Bub

September 21, 2009

It’s been nearly ten years since I wrote my book, Scaling Oracle8i, so I thought it was about time that I started writing again. I thought I would start with the new-fangled blogging thing and see where it takes me. Here goes.

As some will know, I run a small consulting company called Scale Abilities, based out of the UK. We get involved in all sorts of fun projects and problems (or are they the same thing?), but one area that I seem to find myself focusing on a lot is storage. Specifically, the performance and architecture of storage in Oracle database environments. In fact I’m doing this so much that, whenever I am writing presentations for conferences these days, it always seems to be the dominant subject at the front of my mind.

One particular common thread has been the effect of latency. This isn’t just a storage issue, of course, as I endeavoured to point out in my Hotsos Symposium 2008 presentation “Latency and Skew”. Latency, as the subtitle of that particular talk said, is a silent killer. Silent, in that it often goes undetected, and the effects of it can kill performance (and still remain undetected). I’m not going to go into all the analogies about latency here, but let’s try and put a simple definition out for it:

Latency is the time taken between a request and a response.

If that’s such a simple definition, why is it so difficult to spot? Surely if a long period of time passes between a request and a response, the latency will be simple to nail? No.

The problem is that it is the small latencies that cause the problems. Specifically, it is the “small, but not so small that they are not important” ones that are so difficult to spot and yet cause so many problems. Perhaps an example is now in order:

A couple of years ago, a customer of mine was experiencing a performance problem on their newly virtualised database server (VMware 3.5). The problem statement went a little bit like this: Oracle on VMware is broken – it runs much slower on VMware than on physical servers. The customer was preparing a new physical server in order to remove VMware from the equation. Upon further investigation, I determined the following:

  1. The VMware host (physical server running VMware) was a completely different architecture to the previous dedicated server. The old server was one of the Intel Prescott core type (3.2GHz, Global Warming included at no extra cost), and the new one was one of the Core 2 type with VT instructions.
  2. Most measurable jobs were actually faster on the virtualised platform.
  3. Only one job was slower.

The single job that was slower was so much slower that it overshadowed all the other timings that had improved. Of the four critical batch jobs, the timings looked like this:

Physical server:

  • Job 1: 26s
  • Job 2: 201s
  • Job 3: 457s
  • Job 4: 934s
  • Total: 1618s

Virtualised Server:

  • Job 1: 15s
  • Job 2: 111s
  • Job 3: 208s
  • Job 4: 2820s
  • Total: 3154s

It can be seen that, if one takes the total as the yardstick, the virtualised server is almost twice as slow. Therein lies the danger of using averages and leaping to conclusions: if Job 4 is excluded, the totals are 684s vs 334s, making the virtualised server more than twice as quick as the physical one.

Upon tracing Job 4 on the VMware platform with Oracle extended SQL tracing (10046 level 8), I discovered that it was making a lot of roundtrips to the database. Hang on, let me give that the right emphasis: A LOT of roundtrips. However, each roundtrip was really fast – about 0.2ms if memory serves. So where’s the problem with that? It’s not exactly high latency, is it? Well it is if you have to do several million of them.

As it turns out, there was something in the VMware stack (perhaps just the additional codepath to get through the vSwitch) that was adding around 0.1ms of latency to each roundtrip. When this tenth of a millisecond is multiplied by several million roundtrips (something like 20 million in this case), it becomes a long time – about 2000s, to be precise, which more than accounts for the extra elapsed time of Job 4. The real answer: do fewer roundtrips, and the new server will be at least twice as fast as the old one.
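The arithmetic behind that conclusion is simple enough to show directly, using the figures quoted above:

    # Extra elapsed time from a "tiny" per-roundtrip latency, figures as above.
    extra_latency_ms = 0.1    # additional latency per roundtrip through the VMware stack
    roundtrips = 20_000_000   # roughly 20 million roundtrips in the traced job

    extra_seconds = extra_latency_ms / 1000 * roundtrips
    print(f"Extra elapsed time: {extra_seconds:.0f}s")   # about 2000s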

So what does this have to do with I/O? Plenty. The simple fact is that roundtrips are a classic source of unwanted latency. Latency is the performance killer.

Let’s look at some I/O examples from a real customer system. Note: this is not a particularly well tuned customer system:

  • Latency of 8KB sequential reads: 0.32ms
  • Latency of 4MB sequential reads: 6ms

Obviously, the 4MB reads are taking a lot longer (18x), but that makes sense, right? Apart from one thing: The 4MB reads are serving 512x more data in each payload. The net result of these numbers is as follows:

  • Time to sequentially read 2.1GB in 8KB pieces: 76s
  • Time to sequentially read 2.1GB in 4MB pieces: 24s
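A quick sketch of the ratios behind those numbers (sizes and latencies as quoted above; the totals measured on the real system also include other overheads):

    # Latency ratio vs payload ratio, using the per-read figures quoted above.
    small_kb, small_ms = 8, 0.32        # 8KB sequential read
    large_kb, large_ms = 4 * 1024, 6.0  # 4MB sequential read

    print(f"Latency ratio : {large_ms / small_ms:.1f}x")   # ~18.8x slower per read
    print(f"Payload ratio : {large_kb // small_kb}x")      # 512x more data per read
    print(f"Per-I/O rate, 8KB reads: {small_kb / 1024 / (small_ms / 1000):.0f} MB/s")
    print(f"Per-I/O rate, 4MB reads: {large_kb / 1024 / (large_ms / 1000):.0f} MB/s")

Each large read pays a little more latency per request but returns vastly more data for it, which is why the small-read run spends so much of its time waiting rather than transferring.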

So what happened to the 52s difference in these examples? Was this some kind of tuning problem? Was this a fault on the storage array? No. The 52s of time was lost in latency, nowhere else.

Here’s another definition of latency: Latency is wasted time.

So, look out for that latency. Think about it when selecting nested-loop table joins instead of full-table scans. Think about it when doing single-row processing in your Java app. Think about it before you blame your I/O system!