Editor's intro
Does the fatal gene of
"write endurance" built into
flash SSDs prevent
their deployment in intensive server acceleration applications? It was
certainly true as little as a few years ago. What's the risk with today's
devices?
...Later:-
February 27, 2008
Are MLC SSDs Ever
Safe in Enterprise Apps?
SLC versus MLC in Enterprise SSD arrays
The original purpose of
this SSD Myths article was to show that you needn't worry about wear-out if
you use "best of breed"
flash SSDs with
write-endurance on the order of 1 million cycles and above.
When it
was first published (in
March 2007)
all flash SSDs in traditional
hard disk form factors
used SLC.
But in the year following publication many leading SSD oems
(including Samsung, Mtron and STEC) have introduced MLC products too.
This new
follow up article
starts down a familiar lane but an unexpected technology twist takes you to a
new world of possibilities.
Flash based
solid state
disks would seem to be the ideal virtual storage device...
In
every other respect you can treat them in exactly the same way as a
hard drive:- same
interface, same software model. They even fit mechanically into the same
standard hard drive slots. And in many ways they are better - significantly
faster, consuming less electric power and more tolerant of ambient temperature
and vibration extremes. You mostly don't need to know about what's inside them.
They are the perfect "fit and forget" storage product.
In
the smaller form factors like 1.8"
and 2.5" - the gap
in capacity between SSDs and hard drives has disappeared. If it wasn't for the
price you'd use them - right? (The price advantage of SSDs in particular
applications is discussed in
another
article.)
What's wrong with this utopian vision?
And
why is it that even if you were offered a flash SSD accelerator for your server
absolutely FREE you might still hesitate about installing it?
The
answer explains why the flash SSD server acceleration market still isn't a
billion dollar plus market - even 4 years after I
first posed this exact same
question.
When you look in more detail at flash SSDs there is just
one skinny dark stormcrow hanging around the edge of this picture which makes
you feel uneasy about a technology which in other respects is acquiring an
untarnished reputation. That's the prickly issue of write endurance.
Write Endurance:- The number of write
cycles to any block of flash is limited - and once you've used up your quota
for that block - that's it! The disk can become unreliable.
In the early days of flash SSDs managing this was a real
headache for oems and users. The maximum number of write cycles to an address
block - the endurance - was initially small (about 10,000 write cycles in
1994, rising to 100,000 in 1997). And the capacity of flash storage was small
too. So the write endurance limit was more than just a theoretical
consideration. In the worst case - you could destroy a flash SSD in less than a
week! But in those days the SSD was being designed in by electronics engineers
who knew exactly how the SSD was going to be used. If it helped solve the
problem they could even rewrite the software a different way to lessen the
risk.
But when you buy an SSD for use in a notebook or server - you
don't write the software. You don't control the data. So how do you know in
advance if you're going to hit that brick wall?
This fear is an issue
which has slowed down the adoption of flash SSDs in commercial server
acceleration applications. Write endurance doesn't affect
RAM based SSDs - which have
until now dominated that part of the market - mainly due to their superior
speed. But the speed of flash SSDs has improved to the point where they could
replace RAM based SSDs
in many server acceleration slots at a much lower price - if it wasn't for the
worry about endurance.
Write endurance has been a FUD issue for
potential enterprise server users. They know it's lurking there - but who can
they trust to quantify the problem in their own language?
Server makers didn't want users
to know about SSDs (any type - period) during 2000 to 2006 - because more SSDs
meant selling fewer servers. In the
2005
edition of the SSD Buyers Guide I wrote about the problem...
"One
disadvantage, compared to RAM SSDs is that flash has an intrinsic limit on the
total number of write cycles to a particular destination. The limit varies,
according to manufacturer but is over millions of cycles in the most durable
products. Internal controllers within the flash SSD manage this phenomenon and
can reallocate physical media transparently to prolong media life. In most
applications, high endurance flash SSDs can have a reliable operating life which
is typically 3 times as high as that of a hard drive. But I would hesitate about
installing a flash SSD as a server speedup in a university maths research
department, for example, or in other applications where the ratio of data writes
to data reads is unusually high."
In May 2006 I came to
the conclusion that my earlier doubts may need to be revised.
It was
clear from reader emails and negative comments about SSDs which I saw in other
publications that fear and doubt about the impact of write endurance was slowing
down adoption of flash SSDs in the server acceleration market. It was also clear
that most users didn't know how to interpret the kind of data being offered by
SSD oems - which was designed for an elite audience of electronics designers -
and not for managers of storage systems. So I contacted all flash SSD oems with
the idea of setting up a standard way of presenting endurance life expectancy
data - with a proposal which I called the "SSD Half Life." That dialog
met with some enthusiasm but there wasn't enough vendor support to take it
further. The SSD oems I talked to took reliability very seriously - but didn't
want their own proprietary reliability schemes and models swamped by a general
industry wide scheme.
The way that SSD oems deal with the management
of write endurance internally within their products varies but they all have the
common theme of scoring how many times a block of memory has been written to,
and then reallocating physical blocks to logical blocks dynamically and
transparently to spread the load across the whole disk. In a well designed flash
SSD you would have to write to the whole disk the endurance number of cycles to
be in danger.
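The bookkeeping described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's algorithm: the class name `WearLevelledFlash`, its parameters and the "least-worn free block" policy are all hypothetical, and real controllers also have to deal with pages, garbage collection and bad blocks.

```python
class WearLevelledFlash:
    """Toy model (hypothetical): every logical write is redirected to the
    least-worn free physical block, and the old block is returned to the pool."""

    def __init__(self, num_blocks, spare_blocks=1):
        if spare_blocks < 1:
            raise ValueError("need at least one spare block to remap into")
        total = num_blocks + spare_blocks
        self.num_blocks = num_blocks
        self.erase_counts = [0] * total   # the per-block "score" of writes
        self.mapping = {}                 # logical block -> physical block
        self.free = set(range(total))     # physical blocks not holding live data

    def write(self, logical_block):
        if not 0 <= logical_block < self.num_blocks:
            raise IndexError("logical block out of range")
        # Pick the free physical block with the fewest erase/write cycles.
        target = min(self.free, key=lambda b: (self.erase_counts[b], b))
        self.free.remove(target)
        old = self.mapping.get(logical_block)
        if old is not None:
            self.free.add(old)            # the old copy becomes reusable
        self.mapping[logical_block] = target
        self.erase_counts[target] += 1


ssd = WearLevelledFlash(num_blocks=4, spare_blocks=1)
for _ in range(1000):
    ssd.write(0)                          # hammer a single logical block
print(ssd.erase_counts)                   # each of the 5 physical blocks gets 200
```

Even though the host only ever addresses logical block 0, the wear ends up spread evenly over every physical block, which is why the whole disk has to be written the endurance number of times before any block is in danger.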
Some manufacturers go a step further.
SiliconSystems
has a patented algorithm which delivers a lifetime which it claims is better
than simplistic wear levelling. Another manufacturer
Adtron actually has a
percentage of spare flash blocks in the SSD - which are invisible to the host
interface and don't show up as spare storage. But internally - when blocks get
close to the limit - the data is transparently switched over to the spare parts
of the disk to give an additional breathing space.
Increasing Flash Solid State Disk Reliability - an article by SiliconSystems
Solid
state disks, based on flash technology, have greatly improved in performance in
recent years and now compete head to head with RAM based accelerator systems.
Flash also has significant advantages in servers compared to RAM SSDs due to low
power consumption.
But if you think that all solid state disks which
use flash are equally reliable and enduring then think again.
That's
a bit like saying that a Mercedes 300SL sports coupe is as tough as a Tiger
tank because both were made in Germany and both are built out of metal. But as
Oddball (Donald Sutherland) says in the movie
Kelly's
Heroes "I ain't messing with no Tigers."
This article
by SiliconSystems, shows how their patented architecture cleverly manages the
wear out mechanisms inherent in all flash media to deliver a disk lifetime that
is about 4 times greater than that of other enterprise flash products and up to 100
times greater than intrinsic flash memory. ...read the article,
...SiliconSystems
profile, Solid state disks
The precise numbers are a
proprietary secret but are based on analyzing the software from real customers'
SSD applications over many years. OEMs, like these, which target high
reliability applications, are also more picky about which flash chips they use,
and qualify them according to the results they see from testing.
The Flash SSD Application from Hell* - the Rogue Data Recorder
In
most real-life applications the computer does a lot more reads from disk than
writes - and the duty cycle (that's the percentage of time that the disk is
being accessed at all) is low. But to estimate whether you should be worried
about write endurance with today's SSD technology I've chosen a worst case
example - the Rogue Data Recorder.
Real
hard disk based data
recorders from companies like
Conduant can record
data continuously in an endless loop. They are useful for a bunch of
applications such as capturing pre-trigger data in seismic events, capturing
unpredictable data for modelling and bugging phone calls. I managed a company
in the mid 80s which pushed storage technology to its limits to get wire speed
continuous recording onto disk and massive memory systems with inbuilt real-time
trigger processors, embedded workstations and array processors for various
types of industries and agencies. That was a good education for my day job now
of cutting and pasting.
Leading Data Recorder Company Comments
...Later:- Ken Owens, CEO,
Conduant commented.
"In
many applications that use a standard file system, the directory updates are a
major concern for using up the available flash life.
Even though
recording applications are inherently heavy on writing, the optimization of the
directory structure actually minimizes the number of times a specific location
is written thereby extending the life of the flash media.
This is on top of any wear leveling algorithms provided by the SSD
manufacturer. Conduant systems can even use Compact Flash in some recording
applications."
Most of you wouldn't set out to
design a real-time data recorder - and if you are doing that - this article
isn't going to tell you anything you don't already know. But by looking at the
worst thing which could happen and estimating a confidence boundary from that -
it can tell you how much you need to worry.
The nightmare scenario
for your new server acceleration flash SSD is that a piece of buggy
software written by the maths department in the university or the analytics
people in your marketing department is launched on a Friday afternoon just
before a holiday weekend - and behaves like a data recorder continuously writing
at maximum speed to your disk - and goes unnoticed.
How long have
you got before the disk is trashed?
For this illustrative
calculation I'm going to pick the following parameters:-
Configuration:- a single flash SSD. (Using more disks in an array could increase the operating life.)
Write endurance rating:- 2 million cycles. (The typical range today for flash SSDs is from 1 to 5 million. The technology trend has been for this to get better.)
Sustained write speed:- 80M bytes / sec. (That's the fastest for a flash SSD available today and assumes that the data is being written in big DMA blocks.)
Capacity:- 64G bytes - that's about an entry level size. (The bigger the capacity - the longer the operating life - in the write endurance context.)
Today single flash SSDs are available with 160G capacity in 2.5" form factor from
Adtron and 155G in a 3.5" form factor from BiTMICRO Networks.
Looking ahead to Q1 08 - 2.5" SSDs will be available up to 412GB from BiTMICRO.
And STEC will be shipping 512GB 3.5" SSDs.
To get that very high speed the
process will have to write big blocks (which also simplifies the calculation).
We
assume perfect wear levelling which means we need to fill the disk 2 million
times to get to the write endurance limit.
2 million (write endurance)
x 64G (capacity) divided by 80M bytes / sec gives the endurance limited life in
seconds.
That's a meaningless number - which needs to be divided by
seconds in an hour, hours in a day etc etc to give...
The end result
is 51 years!
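But you can see how just a handful of years ago -
when write endurance was 20x less than it is today - and disk capacities were
smaller - the same calculation would have given a life of about two and a half
years at best, and less still on the smaller disks.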
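The arithmetic above can be checked with a short Python sketch. It assumes decimal units (64G = 64e9 bytes, 80M = 80e6 bytes) and a 365-day year, neither of which the text spells out; the function name `endurance_life_years` is just an illustrative label.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def endurance_life_years(endurance_cycles, capacity_bytes, write_bytes_per_sec):
    """Worst-case life with perfect wear levelling: the whole disk must be
    filled `endurance_cycles` times at the sustained write speed."""
    life_seconds = endurance_cycles * capacity_bytes / write_bytes_per_sec
    return life_seconds / SECONDS_PER_YEAR


# Parameters from the text: 2 million cycles, 64G bytes, 80M bytes / sec.
life = endurance_life_years(2_000_000, 64e9, 80e6)
print(f"{life:.1f} years")        # 50.7 years, i.e. about 51

# Same disk with write endurance 20x lower (100,000 cycles).
older = endurance_life_years(100_000, 64e9, 80e6)
print(f"{older:.1f} years")       # 2.5 years
```

Because the life scales linearly with both endurance and capacity, halving either one halves the result, and halving the sustained write rate doubles it.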
For real-life applications refinements are needed to
the model which take into account the ratio and interaction of write block size,
cache operation and internal flash block size. I've assumed perfect cache
operation - and sequential writes - because otherwise you don't get the maximum
write speed. Conversely if you aren't writing at the maximum speed - then the
disk will last longer. Other factors which would tend to make the disk last
longer are that in most commercial server applications such as databases - the
ratio of reads to writes is higher than 5 to 1. And as there is no wear-out or
endurance limit on read operations - the implication is to increase the
operating life by the read to write ratio.
As a sanity check - I found
some data from
Mtron (one of the few
SSD oems who do quote endurance in a way that non specialists can understand).
In the data sheet for their
32G product - which incidentally has 5 million cycles write endurance - they
quote the write endurance for the disk as "greater than 85 years assuming
100G / day erase/write cycles" - which involves overwriting the disk 3
times a day.
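As a rough cross-check of the Mtron figures, the same perfect-wear-levelling model can be applied to a daily write volume instead of a write speed. This is only a sketch: decimal units are assumed, the function name is hypothetical, and the vendor's own model is not published here, so the result is an idealised upper bound rather than a reproduction of their number.

```python
def life_years_at_daily_volume(endurance_cycles, capacity_bytes, bytes_per_day):
    """Idealised life in years when a fixed volume is written every day."""
    overwrites_per_day = bytes_per_day / capacity_bytes
    return endurance_cycles / overwrites_per_day / 365


# Mtron data sheet values quoted above: 32G disk, 5 million cycles, 100G / day.
overwrites = 100e9 / 32e9
mtron_life = life_years_at_daily_volume(5_000_000, 32e9, 100e9)
print(f"{overwrites:.3f} overwrites per day")   # 3.125, i.e. about 3 times a day
print(mtron_life > 85)                          # True
```

Under these assumptions the idealised bound comes out far above the quoted "greater than 85 years", so the data sheet figure looks comfortably conservative.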
How to interpret these numbers?
With
current technologies write endurance is not a factor you should be worrying
about when deploying flash SSDs for server acceleration applications - even
in a university or other analytics intensive environment.
How about
RAID systems
stuffed with
flash SSDs?
The calculation above gives the worst case (shortest)
operating life based on stuffing data into a single disk at the fastest
possible speed. Having a faster interface coming into a
box stuffed with SSDs
doesn't make the life shorter - because the data can only be striped to any
individual disk at the limiting rate for that disk.
Au contraire:- not
only can an SSD RAID array offer a multiple of a single SSD's throughput,
and IOPs, just as with hard disks but depending on the array configuration the
operating life can be multiplied as well - because not all the disks
will operate at 100% duty cycle. That means that MTBF and not write endurance
will be the limiting factor. And although oem published
MTBF data for hard
disks has been discredited recently - the MTBF data for flash SSDs has been
verified for over a decade in more discriminating applications in high
reliability embedded systems.
I've been waiting years for storage oems
to start marketing flash SSD based storage arrays - as alternatives to RAM based
systems. What's held that market back has been the looming shadow of write
endurance. That myth - that flash SSDs wear out - now belongs to the past.
* clarifying why the Rogue Data Recorder is the Worst Case
Application
I didn't need to explain this choice to those
who design SSDs, but it's clear from some comments I've seen that some readers
who don't have an electronics / semiconductor education or don't know enough
about SSD internals have queried this choice.
Why, for example, does
the data recorder example stress a flash SSD more than say continuously writing
to the same sector?
The answer is that the data recorder - by writing
to successive sectors - makes the best use of the inbuilt block erase/write
circuits and the external (to the flash memory - but still internal to the SSD)
buffer / cache. In fact it's the only way you can get anywhere close to the
headline spec data write throughput and write IOPS.
This is because
you are statistically more likely to find that writing to different address
blocks finds blocks that are ready to write.
If you write a program
which keeps rewriting data to exactly the same address sector - all
successive sector writes are delayed until the current erase / write cycle for
that part of the flash is complete. So it actually runs at the slowest
possible write speed.
If you were patient enough to try writing a
million or so times to the same logical sector - then at some point the internal
wear levelling processor would have transparently assigned it to a different
physical address in flash by then. This is invisible to you. You think you're
still writing to the same memory - but you're not. It's only the logical address
that stays the same. In fact you are stuffing data throughout the whole
physical flash disk - while operating at the slowest possible write speed.
It
will take orders of magnitude longer wearing out the memory in this way than in
the rogue data recorder example. That's because writing to
flash is not the same as
writing to RAM, and also
because writing to a flash SSD sector is not the same as writing to a block of
dumb flash memory. There are many layers of virtualization between you and the
raw memory in an SSD. If you write to a dumb flash memory chip successively to
the same location - then you can see a bad result quite quickly. But
comparing dumb flash storage to intelligent flash SSDs is like comparing the
hiss on a 33 RPM vinyl music album to that on a CD. They are quite different
products - even though they can both play the same music.
...Later:- Clarifying Flash Endurance Specifications
I've
added this footnote in response to some reader emails which asked about the
variation in flash endurance specs quoted by different flash SSD oems.
Like
any semiconductor related spec (such as memory speed, or analog offset voltage
in an op-amp, or failed memory blocks in a high density RAM chip) - there's a
spread of performance which depends on the process and may vary over time in
the same wafer fab, or at the same time when chips are made in different fabs
within the same company.
A spec such as 100k or 1 million or 10 million erase-write cycles -
is a business decision made according to market conditions - which gives
generic semiconductor buyers a confidence level that if they buy 1 million
chips - then the reject rate - of those that will fail due to process
tolerances - will be acceptably low. The shape of the distribution curve may
not actually be gaussian - but there is a distribution curve in there which is
implied by the published specs.
Due to process variations between oems
(some designs will be automatically shrunk from old designs, other layout
geometries may be recompiled or optimised for that particular process point)
there will be vast differences between the endurance from different
chipmakers.
As the generic semiconductor flash market doesn't place a premium on
this spec - the "datasheet" published standard number will gradually
improve at a slow pace (every 2-3 years) even if some oems are making chips
today which are 10x better.
If I was designing a high reliability flash SSD - I would want to get
into the process details - qualify devices and order them to my own spec.
Currently SSD volumes are too low - to give much buying power with flash
chipmakers. Therefore few SSD oems are able to buy flash chips qualified to
their own specs. (This is done by batch testing samples and by negotiation with
the fab where the chips are made.) Some SSD oems make their own flash chips -
and while this gives them more control over the end to end process - it does not
necessarily mean that they start with the best chips.
For more information about
SSDs take a look at these resources
- Solid State Disks - is
our directory of SSD manufacturers, and includes current news stories related
to the SSD market
- Squeak! -
Why are Most Analysts Wrong About Solid State Disks? - describes the main
applications which account for nearly all the SSDs used, and gives the user
value propositions explaining why SSDs are taking over in these applications.
It includes strategic predictions about the market for the next several years.