The 1st version of this article was published in 2009.
4 years later, in 2013, (as I predicted) things haven't got much better.
Zsolt Kerekes
editor
another new season for
that depressing consumer SSD saga
Editor:- October 18, 2011 -
Like anyone who makes a lot of
market predictions I'm
delighted if any of them come true - but there are 1 or 2 cases where I would
be just as happy to be proved wrong - in particular - on the subject of
consumer SSD
reliability.
2 years ago I wrote an article called - Why can
consumers expect to see more flaky flash SSDs? - which had the sub-headline - "You
need to stay vigilant because it's not going to get better anytime
soon." (That earlier article is at the bottom of this page.)
I get
a lot of emails about this subject - which is threatening to tarnish the
reputation of the whole SSD market - and not just the small part which is
consumer drives.
Today I got an email from a reader who told me
that out of 13 client SSDs he'd bought 7 months ago "4 have died so far."
He gave me a link to a blog on
codinghorror.com
which illustrates the soul searching and frustration that SSD unreliability
is causing to so many thoughtful people who want to get the speedup advantages
of SSDs but are rightly anxious about the flaky reputation of consumer SSDs.
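To put that reader's numbers in perspective, here is a rough back-of-envelope sketch in Python. It assumes (and this is my assumption, not something the reader reported) that failures arrive at a constant rate, so the 7 month survival fraction can be extrapolated to 12 months. The function name and the constant-rate model are illustrative only, and a sample of 13 drives is far too small to support firm conclusions.

```python
def annualized_failure_rate(failed, total, months_observed):
    """Extrapolate an observed failure fraction to a 12-month figure.

    Assumes a constant failure rate over time (an illustrative
    simplification; real SSD failures need not follow it).
    """
    if total <= 0 or months_observed <= 0:
        raise ValueError("total and months_observed must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    surviving_fraction = (total - failed) / total
    # Under a constant rate, survival compounds month over month.
    surviving_after_year = surviving_fraction ** (12 / months_observed)
    return 1 - surviving_after_year


# The reader's report: 4 of 13 client SSDs dead after 7 months.
observed = 4 / 13
afr = annualized_failure_rate(failed=4, total=13, months_observed=7)
print(f"observed over 7 months: {observed:.1%}")
print(f"annualized (constant-rate assumption): {afr:.1%}")
```

On those assumptions the observed loss of roughly 31% in 7 months works out to a bit under half the drives per year.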
I
agree this is a lamentable state of affairs - which needs some explanations.
This is a tidied up version of what I said.
SSDs aimed at the
consumer market are designed to deliver basic functionality at the lowest
price. That means the designers (originally due to ignorance but
nowadays with foreknowledge) have to decide what shortcuts they can take in the
production process and what
design factors they can
leave out to reduce the price - compared to a reliable
industrial /
military /
enterprise grade
SSD.
There are countless techniques they can use to get the cost down.
- Shutting off reliability features in the controller. For example the
SandForce SF-2200 controller (launched in
Feb 2011 and
optimized for consumer markets) has an option which enables oems to deliver an
SSD with a smaller or larger usable capacity when using exactly the same set of
flash chips. The bigger capacity sounds like it's better value for money to the
consumer but they are losing some of the RAIS protection which means the
SSD won't survive the failure of an entire flash chip. And that's just the tip
of the SSD
capacity iceberg.
- Using cheaper components in the
power loss
management system. The consequences are that the trigger events to save data
may come at the wrong time or that the capacitors don't hold enough charge to
maintain reliable operation for vital data saves because they have
drifted out of tolerance. That's before considerations like whether the
controller has an
intrinsic foolproof auto recovery architecture in the first place.
- Using no-name cheap flash memory. The difference in quality between the
best and worst flash manufacturers can be
many-fold.
Also, if the memory is of unknown origin, the controller parameters may not be
set up correctly for it, so the controller handles it wrongly.
- Saving time and cost on testing the design. Many consumer SSD products
aren't adequately validated before they're shipped. That's why you hear about
firmware upgrades and recalls. It's expensive for SSD makers to invest in
comprehensive tests before they ship - and some consumer SSD marketers worry
that if they delay their launches they run the risk of losing market share.
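The capacity trade-off in the first item above can be sketched with some simple arithmetic. This is a deliberately simplified model, not SandForce's actual RAIS layout: it assumes that chip-level protection costs the equivalent of one flash chip's worth of space, and the chip count and chip size are hypothetical figures chosen only for illustration.

```python
def usable_capacity_gb(chips, gb_per_chip, parity_chips):
    """Usable capacity when `parity_chips` worth of space is set aside
    for redundancy (simplified model; ignores other over-provisioning)."""
    if parity_chips < 0 or parity_chips >= chips:
        raise ValueError("parity_chips must be between 0 and chips - 1")
    return (chips - parity_chips) * gb_per_chip


# Hypothetical drive: 16 flash chips of 8 GB each (illustrative values).
CHIPS, GB_PER_CHIP = 16, 8

protected = usable_capacity_gb(CHIPS, GB_PER_CHIP, parity_chips=1)
unprotected = usable_capacity_gb(CHIPS, GB_PER_CHIP, parity_chips=0)

print(f"with chip-level redundancy:    {protected} GB usable")
print(f"without chip-level redundancy: {unprotected} GB usable")
print(f"extra capacity gained:         {unprotected - protected} GB")
```

In this model the same set of chips can be sold as either a 120 GB or a 128 GB drive. The extra 8 GB is exactly the space that would have let the drive survive the loss of one chip.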
When companies want to design reliable SSDs for non consumer
markets there are many additional steps they take in their processes
such as qualifying the memory to see which is the best, allocating more memory
to act as hot spares for defective blocks, using better reliability
architecture, and burn-in and functional test before shipping.
But even when all those precautions are taken - expensive SLC
flash enterprise SSDs can fail too. The difference in the enterprise is
that the data is more likely to be
backed up and the
storage system is likely to be protected by a
RAID-like or fail-over
architecture which means that life can go on without too much disruption.
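A rough sketch of why redundancy changes the picture: the Python below assumes that drive failures are independent (an optimistic assumption, since drives from the same batch with the same firmware can fail together) and uses a purely hypothetical per-drive failure probability. Under those assumptions a mirrored pair only loses data when both copies fail in the same period.

```python
def loss_probability(p_drive_failure, copies):
    """Probability that every one of `copies` independent drives fails
    in the same period, ignoring repair or replacement in between."""
    if not 0 <= p_drive_failure <= 1:
        raise ValueError("p_drive_failure must be in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_drive_failure ** copies


P_FAIL = 0.05  # hypothetical per-drive failure probability for the period

single = loss_probability(P_FAIL, copies=1)
mirrored = loss_probability(P_FAIL, copies=2)
print(f"single drive:  {single:.4f}")
print(f"mirrored pair: {mirrored:.4f}")
```

With these illustrative numbers the mirrored pair is 20 times less likely to lose data than the single drive, which is why a failure that ruins a consumer's week may be a routine maintenance event in the enterprise.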
If
it's any consolation the
hard disk industry was
even worse at one time. In 1986 when I was designing a demo RAID controller -
most of the brand new drives I got for the project arrived with serious
faults.
What is the SSD industry doing to improve the state of the
art? You can get an idea of who's doing what in the
SSD reliability papers.
Why can consumers expect to see more flaky flash SSDs? - the
2009 version
Editor:-
August 10, 2009
- Intel has been
in the computer news in recent weeks for suspending further shipments of its
new
X25-M -
SATA 2.5" MLC flash SSD due to a serious design problem.
Potential
customers were advised that shipments will resume after what is euphemistically
called - a "firmware upgrade".
This isn't the 1st time that
Intel has shipped a flaky SSD, and it's not the 1st time that a
flash SSD
manufacturer has shipped products where the design was incompletely verified
(or specified) in the 1st place - requiring a frantic firmware upgrade to make
its operational use more satisfactory.
And it won't be the last either.
These stories have become commonplace. And because the latest Intel fiasco
wasn't a surprise - it didn't rate more than a footnote in
these pages.
Newcomers
to StorageSearch.com may be
shocked by the frequency with which the storage market's reputation is
being splatted by the residue from so many unreliable new products. And by
that I mean - products whose operation you cannot rely on to be what you
reasonably expected - instead of the narrower meaning of "products
which are dead on arrival" - or which terminally cease operation due to
some form of wear-out,
environmental or age related process.
I named
storage reliability
as 1 of the 3 most important future trends in my
state of the storage market
article published in 2005. In that article I also predicted that uncorrectable
failures in storage systems (due to embedded design assumptions made in earlier
generations) could, if not dealt with by drive and interface designers, pose a
more serious threat to enterprise computer systems than the Y2K bug
in the late 1990s.
It's reasonable to ask - why has the flash SSD
market gained such a poor reputation?
I explained why users
shouldn't trust
published flash SSD benchmarks in an article published a year ago - in which
I discussed the technical environment and specific reasons related to
performance. But it's clear from readers' emails that many concerned SSD
consumers don't have the technical background to understand much of the
content in this and the more detailed articles comparing
MLC and SLC,
RAM and flash SSDs
etc. I often have to explain that when it comes to SSDs - StorageSearch.com
isn't aimed at the consumer market - but at enterprise users and oem specifiers
of SSD technology who - in the course of their vendor qualification -
invest a lot of time and resources into learning about SSDs - before making a
commitment which could cost them their jobs or business if the choice
turns out to be wrong.
When we started our intense SSD coverage more than
10 years ago,
a typical flash SSD cost $50K and a typical rackmount cost an order of magnitude
more.
Why do you have to be so "extremely" careful to
understand the internals of a flash SSD?
You probably don't - if all
you're doing is buying a single
notebook SSD
for a single notebook. (Treading "very" carefully - will suffice.) You
definitely do need to exercise extreme caution - if you're the person designing
SSDs into a new notebook or new SSD storage array,
IPTV server, voicemail
server, defense appliance or search-engine architecture. That's only a small
part of the spectrum of reader questions I get asked about SSDs -
and why I try to avoid giving simplistic answers. Until now...
Here's
a simplistic explanation of why you have to be careful about
understanding flash SSD technology before you deploy it in serious apps, and why
SSD vendors will continue shipping flaky SSDs and then recalling them or fixing
them after the sale for several more years...
Flash SSDs are solid
state - but different to processors because... When Intel or AMD or Sun design
a new processor - they run test suites on the new design - which encapsulate
market knowledge amassed during more than 20 years. These verification suites
simulate the kinds of loads which are common and uncommon in the market. When a
new processor emerges into the market - the design is already compatible with
what the market expects - because the test suite defines the product. Similar
test suites have been developed by all hard drive manufacturers - which encode
knowledge of all the tricky, quirky ways that operating systems and
applications are going to hit the hard drive - and how it is expected to
respond.
There are no such industry
test suites for flash
SSDs. That's because a flash SSD can look like a disk drive in some contexts -
and it can look like a processor accelerator in others. The range of
applications for SSDs is much greater than for hard drives - and SSDs have many
different ways they can implement the same features depending on which market
they were designed for. One reason why the flash SSD market has been going
wrong with so many new products is that most consumer SSD manufacturers have
simply re-used their hard drive test suites to validate their new SSDs. That has
put pressure on designers to tweak the
SSD controllers to
make their products look good in benchmarks (and look more saleable). But the
test suites don't test the weaknesses of the SSDs - only the strengths.
There
are 2 general exceptions.
Flash SSDs designed specifically for the
industrial oem
market tend to be better designed - because these vendors know their
products will be hammered by lengthy customer evaluations before being deployed.
Many industrial flash SSD oems are used to testing their products with the
entire code of their customers' embedded products. Over many years they have
built up experience of the weak parts of their products - and either adapted
the firmware in their SSDs - or suggested ways in which their customers can
change their software to work better with their SSDs.
Rackmount flash SSD
arrays designed for enterprise server acceleration include a span of products
and companies - whose applications experience ranges from nil to decades. But I
can offer a simple solution to shortlisting an SSD supplier here.
In
the world's first comprehensive survey of
What SSD Users Want -
instigated by StorageSearch.com in 2004 - we posed the question - "What
would make it easier for you to buy SSD technology and remove doubts and risks
which currently act as roadblocks?" - The top 2 factors quoted in replies
were performance guarantees and try before you buy. Some
enterprise SSD oems seeing this market feedback were quick to adapt these
concepts to the way they did business.
My advice? - If you're planning
to make a big enterprise SSD purchase - tell suppliers that you'll only consider
their products if they offer you a money back performance guarantee (which
they can easily do if they have enough experience with your type of application)
or ask them if they will let you "try before you buy" - (if your
application environment is unusual and outside the scope of their speedup
models).
How can consumers and SMBs navigate around SSD landmines?
If you're a consumer or small business looking at a modest spend
on flash SSDs it's probably unrealistic for you to invest the resources to
learn about this technology and safely qualify products for yourself. As I've
already said above - you can't trust magazine reviews either. They should just
be regarded as an indication of what is possible - rather than a guarantee of
what you'll see in practice.
My advice is - talk to a specialist
SSD
reseller.
I know of fewer than 10 SSD VARs worldwide who have
been focusing exclusively on SSDs for consumers as their primary business for
many years - but I'm not an expert on VARs. There may be more.
It's
counterproductive for SSD VARs to recommend products which are difficult to get
hold of - or which have high return rates. Tell them what is important to you -
and ask what they recommend. Ask them how long they've been in the SSD market
too. If it's less than 2 years - go somewhere else. One way you can
independently verify if what they say is true - is checking out web references
to their SSD activities - including for example their website listings in past
years in the
Internet Archive.
Will it
get easier to navigate the SSD market in future?
Yes. Sure. But
that could be another 5 years in the future. I think the SSD market will
get a lot more
complicated and
confusing before it gets any simpler.
To help you understand
what's going on and see the future clearly I hope you'll come back to
StorageSearch.com as we continue our long term mission - of "leading the
way to the new storage frontier".

If Megabyte spent less time on email
he'd get more SSD
articles written.
WARNING! - CONSUMER SSD
contents liable to
change without notice
Editor:- June 13, 2014 - it seems that the risk
of preplanned component substitutions by the original branded SSD maker (rather
than merely the supply chain risk of counterfeits by persons unknown) is another
uncertainty which readers in the consumer SSD market may now have to contend
with.
SSD Review exposes how
rebranded memory can adulterate consumer SSDs
Editor:- February 18, 2013 - the SSD Review recently published
an
in-depth article which shows how the memory chips in
consumer SSDs -
which appear to come from one source - may actually have come from somewhere
else.
The article - by Les Tokar, Editor-in-Chief
of the SSD Review - reads at times
like a gripping detective story - and looks into the murky topic of remarking
and rebranding flash chips - which can lead to adulteration and quality
problems in the memory supply chain - all in pursuit of getting the lowest
manufacturing cost.
These problems and risks have been well known in expert SSD circles
but Les Tokar's new exposé brings this shadow world into vivid focus.
"Not
every manufacturer takes product quality seriously. When an SSD
manufacturer tries to downgrade Nand Flash to lower the price and
impress consumers, they also pass on the risk of data loss to consumers."
...Email from
Renice Technology
(September 2011) warning about buying SSDs from oems which don't test
and qualify the quality and compatibility of their raw flash suppliers.
"If Intel's SSD design
business was a horse - it would have been shot a long time ago and put out of
its misery..."
...Editor commenting to
a reader (July 2011) about reports of yet another
flaky design problem
with Intel SSDs - this time related to
power cycling.
Surviving SSD
sudden power loss
Why should you care
what happens in an SSD when the power goes down?
This important design
feature - which barely rates a mention in most SSD datasheets and press releases
- has a strong impact on
SSD data integrity
and operational
reliability.
This article will help you understand why some
SSDs (which work perfectly well in one type of application) might fail in
others... even when the changes in the operational environment appear to be
negligible.
pushing the
SSD testing rock farther up the hill
Editor:- August 25, 2010 - I'm
mostly resistant to the idea of rehashing recent news stories - but yesterday
while talking about new SSD technologies a reader asked me to take another
look at
SNIA's SSD
performance testing guidelines - which I reported on
a month ago.
I
said I had been surprised it took
ORGs like
SNIA so long to look at
these issues - because I had been aware of "Halo effects" in
flash SSD benchmarks for years - and commented - "But I guess member
led ORGs have a built in lag factor and only move at the speed of the
slowest exec members."
The reader - Neal Ekker -
whom I knew from his
time
at
Texas Memory Systems -
put up a spirited defense for this particular ORG's opus and said...
"...We've
all known about the fishy-ness of SSD performance claims for years. But I'd like
to draw attention to what an impressive accomplishment the SNIA SSS PTS
represents, no matter its technical merits or ramifications. I watched it
happen, and I can tell you it was an amazing POLITICAL achievement. And
I don't mean that in a negative way. Any time there's more than one person in a
room, there's politics. For a collection of engineers representing both their
own egos and the interests of their employers to finally agree on even this
rather bare-bones beginning standard was just remarkable to observe. I can't
begin to give enough credit to some of the chief movers and shakers."
Neal Ekker added - "This is why I want more attention focused on
the SSS PTS right now, so we don't lose momentum entirely. There's still plenty
of work to be done. We need additional companies and fresh faces and energies to
step up and push this rock
a little farther up the hill."
Editor's comments:- During the majority of the SSS PTS development Neal
Ekker served as the SNIA SSSI Education Committee Chair. He's now a for-hire
independent SSD marketing consultant.
SSD Data Recovery
Concepts and Methodologies
It's hard enough understanding the design
of any single SSD. And there are so many different designs in the market.
If you've ever wondered what it looks like at
the other end of the SSD supply chain - when a user has a damaged SSD which
contains priceless data with no usable backup - this article - written by
Jeremy Brock, President, A+ Perfect Computers
- who is one of a rare new breed of
SSD recovery
experts - will give you some idea.