How many disks does it take to store a disk-full of data?
by Zsolt Kerekes, editor - November 10, 2010
How many disks does it take to store a disk-full of data? ...in a way which ensures you can always get to it quickly.
Is this a trick
question?
Yes. No. Yes.
Ok. I confess. Maybe.
And
this comes as no surprise to you - because you can see gallons of text
dripping down into the bottom of your browser. Let's agree that it's not going
to be the short and obvious answer.
But I promise that if you stay
with me - as we wade through this article - you'll see I'm laying the
foundations for some serious rethinking of the economics and received wisdom about how data storage is done.
At this stage - it doesn't
matter whether we're talking about
hard disks or the
solid state kind. But I
promise to return to that difference later - and to show why it could make
a difference to the calculation.
So - what do I mean by a disk-full of
data?
I'm not being tricky here. I mean simply a quantity of data
that is unique and incompressible - and fills up a disk.
When I started
thinking about this article a few years ago - I was going to phrase the question
like this...
"how many terabyte disks does it take to store a
terabyte of data?"
But now we've got 3TB disks - and one day
we'll have 10TB disks - and eventually
much bigger ones too.
I want to avoid you (or me) having to reach for a calculator when following
this article. Any maths involved will be the simple kind that can be done by
counting fingers (and toes).
Let's start...
It would be perfectly reasonable to say - I've got a disk full of data. "A disk" is equal to one disk. So the answer to the question is one disk.
End of calculation. End of article.
Click
and go onto the next web page in your busy browsing day.
Did I mention this before? This data is very, very valuable.
You're running a
VC backed start-up
company and it includes all the customer inquiries from your first month
emerging from stealth mode. Or it's the compressed digital output of that new
movie you've been editing - which everyone expects will be on the short list for
the next Oscar. Or it's this morning's orders for your online retailing
business. Just imagine whatever data it is that you wouldn't like to lose.
That's what's on it. (If on the other hand you would be very happy to lose a
disk full of data - for reasons you don't want anyone else to know - and you
would prefer the data to vanish beyond
forensic recall -
there's another bunch of articles which will help you
here.)
OK
- so maybe at the local level - another disk would be a good idea. Let's call
it the backup disk. To make sure we don't forget to do it - we're going to run
the backup disk and the
original disk (that's 2 disks so far) as a
RAID 1 system. That's 2
mirrored disks. They're local. How local? Same box? Same office? - That's good
enough for now.
If disk 1 fails - then disk 2 keeps me running - and
vice versa.
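If you prefer to see that written down - here's a minimal sketch of the mirroring idea in Python. The Disk and Raid1 classes are made up purely for illustration - a real RAID controller does a great deal more than this:

```python
# Minimal RAID 1 (mirroring) sketch - the Disk and Raid1 classes are
# hypothetical stand-ins; a real controller also handles rebuilds,
# consistency checks, hot spares and much more.

class Disk:
    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, addr, data):
        if not self.failed:
            self.blocks[addr] = data

    def read(self, addr):
        if self.failed:
            raise IOError(f"{self.name} has failed")
        return self.blocks[addr]


class Raid1:
    """Every write goes to both disks; a read only needs one survivor."""

    def __init__(self, disk_a, disk_b):
        self.disks = [disk_a, disk_b]

    def write(self, addr, data):
        for disk in self.disks:
            disk.write(addr, data)

    def read(self, addr):
        for disk in self.disks:
            if not disk.failed:
                return disk.read(addr)
        raise IOError("both mirrors have failed - the data is gone")


mirror = Raid1(Disk("disk 1"), Disk("disk 2"))
mirror.write(0, "this morning's orders")
mirror.disks[0].failed = True      # disk 1 dies...
print(mirror.read(0))              # ...disk 2 keeps you running
```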
But then again - if anything bad happened to either one of
those 2 disks - like a simple hardware failure - you'd be back to where you
started - and vulnerable. So let's have a hot stand-by - so that if one of those
RAIDed disks fails - then the system automatically starts a RAID rebuild and
creates another local copy on the standby disk before the lone surviving disk
with the data fails. But that rebuild can take hours - and during those hours
that sole survivor disk can fail too - before it has finished cloning
itself. And even if it doesn't - there's another hazard to prepare for...
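Before we get to that hazard - it's worth putting a rough number on the rebuild-window risk just described. The figures in this sketch (per-drive MTBF and rebuild time) are assumptions chosen purely for illustration, and the independence assumption flatters reality - drives from the same batch, under rebuild stress, fail in a much more correlated way:

```python
import math

# Assumed figures - illustration only, not measurements.
MTBF_HOURS = 300_000      # assumed mean time between failures per drive
REBUILD_HOURS = 8         # assumed time to clone onto the hot standby

# Chance the lone surviving disk dies before its clone is finished,
# using a simple exponential failure model.
p_loss = 1 - math.exp(-REBUILD_HOURS / MTBF_HOURS)
print(f"data-loss risk per rebuild: {p_loss:.4%}")   # roughly 0.003%
```

Small per rebuild - but it scales with the number of arrays you run, and it says nothing about the failure modes which take out every local disk at once.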
Because
all the local disks could fail together - for any of the following reasons -
and some more too.
- your building has burnt down, or been flooded or suffered some local
disaster
- your office has been broken into - and the disks stolen
- a virus or software error or systems administration mistake has wiped all
the local disks
- a lightning strike - or power surge zapped the power supplies in such a
way that all the equipment in your RAID system got fried
Not to worry. You're already ahead of me there. That's why you have an off-site backup. And we're still counting disks - remember?
For the same
reasons discussed above - for every disk-full of unique data - the other site
is RAIDing your data - and making sure there's a hot standby.
So
we're up to 6 disks.
But what if your office suffers a disaster - and the online backup service goes bust or stops the service when you need it? It happens a lot.
Backups fail just when you need them. They may have actually failed before - but
you didn't know because you didn't need them. But that's another story.
If
you read the longer articles on my
storage reliability
page - you'll see that to ensure a realistic probability of getting your data
back - you really need to spread your data risk across more than 3 disparate sites.
But let's skimp on the cost and call it 3 sites.
So - 3 sites - each
with 3 disks - that's 9 disks.
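To see why 3 disparate sites is the sensible skimping point - here's a minimal sketch. The 5% annual per-site loss figure is an assumption for illustration only, and the calculation assumes the sites fail independently - which is exactly what the geographic diversity is meant to buy you:

```python
# Assumed, for illustration only: each site independently fails to
# return your data in a given year with probability 5%.
p_site_loss = 0.05
disks_per_site = 3        # mirror pair plus hot standby, as counted above

for sites in (1, 2, 3):
    p_all_lost = p_site_loss ** sites
    print(f"{sites} site(s), {sites * disks_per_site} disks: "
          f"chance all copies are lost = {p_all_lost:.4%}")
# -> roughly 5%, then 0.25%, then 0.0125% as each extra site is added
```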
But what if I still need the
same data - in 5 years' time?
It may even be worth more then - because
you might have figured out how to make more money out of it by then. Or it may
be that you need the data for legal or other reasons. You're a bigger company
now - and you can scale the value of that data. But only if you've still got it.
The only problem is that the typical life of a hard drive is 3 years - so you
have to buy another 9 disks at some time. They will have more capacity than you
need - due to technical progress. But putting 2 copies of your data on the
same disk doesn't help you get instant access in the zap or flood situation.
(Although it might help with
data recovery.)
As
you can see - the cost of replacing all those redundant disks starts to mount
up.
So in a 5 year timescale you (or your backup surrogates) have
been obliged to buy or rent about 18 disks.
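That tally can be checked on fingers and toes - or with a few lines of arithmetic. The price per disk below is an arbitrary placeholder, not a market figure:

```python
SITES = 3                   # primary site plus 2 disparate backup sites
DISKS_PER_SITE = 3          # mirror pair plus hot standby
DRIVE_LIFE_YEARS = 3        # typical hard drive service life
HORIZON_YEARS = 5
PRICE_PER_DISK = 100        # arbitrary placeholder price

disks_at_any_time = SITES * DISKS_PER_SITE              # 9
generations = -(-HORIZON_YEARS // DRIVE_LIFE_YEARS)     # ceil(5 / 3) = 2
disks_bought = disks_at_any_time * generations          # 18

print(f"{disks_bought} disks bought or rented "
      f"(~{disks_bought * PRICE_PER_DISK} spent) to keep one disk-full "
      f"of data safe and reachable for {HORIZON_YEARS} years")
```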
Although there is some benefit from scaling up - if you own 1,000 disks worth of data you don't incur exactly the same level of overhead - the necessity to diversify across geography and across common-mode vulnerabilities is a high overhead in all cases. And if you aren't doing this - you are only fooling yourself that you are covered.
And before someone emails me and suggests that cloud
storage is the answer - consider this.
Just because you can't see
all those disks out there failing - and just because they are someone else's day
to day problem - doesn't make the maths go away. And when a big natural
disaster or medium business disaster - or a little bad software upgrade -
prevents you from talking to all those cloud disks - you will not be comforted
by the knowledge that somebody else was responsible for counting the disks -
looking at the flashing LEDs - and caring for them.
Did I remember to say that your business model means that you have to ensure that any of the data is accessible in a little more than 50 milliseconds? So you can't use tape backup - because it can take 30 seconds for
tape libraries to find the
right tape and access random data. And retrieving data from tape is far from
being a certain process with a happy ending.
Enough of the pessimism.
Here's some good news.
In real life, not all data is unique.
And not all data is incompressible.
And not all data is equally valuable - but for the purposes of this article we are going to suspend disbelief and maintain that it is. (Remember the Oscar? You're going to remaster that movie to make a 3D version...)
The mitigating factors I mentioned above mean that organizations which own multiple disks worth of data have a fighting chance of making their data survivable - using an order of magnitude fewer disks than suggested by my single-disk case.
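To make that concrete - here's a minimal content-addressed deduplication sketch. The chunk size and the sample 'backups' are made up, and real dedupe engines add variable-size chunking, compression and heavy metadata protection on top of this basic idea:

```python
import hashlib

CHUNK = 4096
store = {}              # chunk hash -> chunk bytes (the only copies kept)

def dedup_write(data):
    """Store only the chunks we haven't seen before; return a recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)
        recipe.append(key)
    return recipe

def dedup_read(recipe):
    return b"".join(store[key] for key in recipe)

# Ten made-up 'backups' which are 90% identical to each other.
backups = [b"A" * 9 * CHUNK + bytes([n]) * CHUNK for n in range(10)]
recipes = [dedup_write(b) for b in backups]

logical = sum(len(b) for b in backups)
physical = sum(len(c) for c in store.values())
print(f"logical {logical} bytes, physical {physical} bytes, "
      f"ratio {logical / physical:.1f}x")
assert dedup_read(recipes[0]) == backups[0]
```

Ten nearly identical copies end up costing little more than one - which is where that order of magnitude comes from.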
But
if you add in the random access time requirement - then most hard disk based
compression and dedupe systems still fail to meet the operational requirements.
Solid
state storage systems can, however, deliver real-time compression and dedupe and
still offer random access times which are as good as - or even better than - those of uncompressed and undeduped hard drive arrays.
The reason they can do this is that a fast SSD's raw random IOPS can be hundreds of times higher than a hard disk's (at the single disk level).
So even if the overhead of dedupe and compression creates 50x more disk churn - the net result for a data-packed bulk storage SSD system can still be better than that of an unpacked HDD system.
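Rough numbers make the point. The IOPS figures below are ballpark assumptions - not benchmarks of any particular product:

```python
# Ballpark assumptions - not benchmarks of any particular product.
HDD_RANDOM_IOPS = 200        # a fast enterprise hard drive
SSD_RANDOM_IOPS = 40_000     # a fast enterprise flash SSD
CHURN_FACTOR = 50            # assumed extra I/O from dedupe + compression

effective_packed_ssd_iops = SSD_RANDOM_IOPS / CHURN_FACTOR   # 800
print(f"packed SSD: {effective_packed_ssd_iops:.0f} IOPS vs "
      f"unpacked HDD: {HDD_RANDOM_IOPS} IOPS")
# Even after paying the 50x overhead - the packed SSD still comes out
# ahead of the raw, unpacked hard drive.
```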
When you also take into account
that reliable SSDs
(as opposed to badly
designed / flaky SSDs) may offer operating lives which are on average 3x
as long as the best enterprise hard drives - then you see just one of the
many reasons why the economics of
bulk storage flash
SSDs will start to look better than those of HDD arrays - in the datacenter - long before any convergence in the cost per raw terabyte of storage.