Multi-terabyte solid state storage arrays are seeping into the server environment in the same way that RAID systems did in the early 1990s. But just as the RAID pioneers learned that there was a lot more to making a reliable disk array than stuffing PC hard disks into a box with a fan and a power supply, so too will multi-terabyte SSD users discover that problems which are undetectable or harmless in small SSDs can lead to serious data corruption risks when those same SSDs are scaled up without the right architecture, and sometimes even with it in place.
I know from the emails I get that many readers think that once they've looked at the single issue of flash endurance, they've covered the bases for enterprise SSDs.
That's why storagesearch.com is planning to publish a collection of definitive technology articles, linked from this page starting in the 4th quarter of 2008, to help guide the industry through this risky transition.
Users with significant storage investments need simple guidelines to help them get the best results from the different types of media they use. That has always been true and will remain so. A good theoretical understanding of data failure modes is what lies behind the way that mature storage products are designed and managed, but these complex considerations can be translated into simple guides for users, as the table below shows.
The new articles will give users the theoretical justification they need when faced with the difficult economic choices that come from deploying different types of SSDs (with different cost models) in different applications within their organizations.
The first phase in the SSD market revolution was when users became aware of the potential benefits of SSDs and when these products reached price points many of them could afford. The next phase will be when enterprise users move away from the technology-focused market which vendors are offering them now towards an application-specific SSD market, in which they have to choose which products work best for their own deployments.
Users today face the dilemma of paying vastly different prices for products which are superficially similar in capacity and IOPS, but which may differ greatly in data reliability.
By "data reliability" I don't mean that the SSD has failed, but that some data within the SSD array has been altered or corrupted. (And that corruption will keep accumulating even if you swap in new replacement drives of the same type.)
The cost of data corruption varies from one application to another, and from one business to another.
Users balance risk against cost when they choose a supplier, even if they have not consciously analyzed the issues which matter. And choosing a more expensive supplier doesn't protect the user from being mis-sold the wrong type of product. Vendors and users alike will make many mistakes.
For the SSD market revolution to keep its momentum into the next phase, users need guidance they can trust to help them navigate the many complex decisions which go beyond performance speedup or power saving.
Megabyte's Simple List of Handling and Managing Storage Media

You've done this before. Here are some examples from past decades.

| storage media | handling tips |
|---|---|
| tape | don't forget to rewind the tape often (otherwise bad things happen) and don't cut little bits off just because you've run out of string |
| hard disk drives | don't drop the hard disk on the concrete floor, and don't store spare disks in your MRI scanner or under the magnet in your crane |
| RAM | don't zap the chips when doing memory upgrades (you look good in that nylon shirt BTW) |
| CD, DVD, UDO | wipe the jam from your sticky fingers before inserting optical media into the archive appliance |
| multi-terabyte SSDs | buy the right product for the job? Think of solid state storage by application type rather than technology type (RAM SSD vs flash SSD). That's what the new articles will help us to understand. |
By publishing this call for papers I'm inviting technology experts in the industry to contribute new original papers, or to let me know about suitable articles they have already written on this subject, to which I can direct readers (without readers needing to sign up to read them). I will also be contacting candidate authors by email.
In the meantime, take a look at these existing articles on this theme.
- Can you trust your flash SSD specs? The product which you carefully qualified may not be identical to the one that's going into your production line, because the SSD OEM has "improved" it. But the improvement makes another operating parameter, one which you care deeply about, unacceptably worse.
- Flash SSD Data Reliability and Lifetime (pdf): starting from a description of floating gates and going all the way up to the architecture of a flash SSD, this paper includes good descriptions of data failure modes, including erase failure, (erase) stress-induced long-term leakage, disturb faults, and the potential for inadequate error correction code coverage in MLC.
But don't come away with the idea that RAM SSD arrays don't have data corruption modes too. The difference may be that some long-established vendors in this part of the market have been designing products which mitigate these risk factors. But that doesn't mean that new market entrants know what they should be doing.
Even big OEMs can make elementary mistakes which cost billions of dollars in lost sales, as I described in my 2001 article Looking Back on Sun's Cache Memory Problem.