SLC versus MLC in Enterprise SSD arrays
Editor:- February 27, 2008 - The original purpose of my
SSD Myths
article was to show that you needn't worry about wear-out if you use "best
of breed" flash
SSDs with write-endurance on the order of 1 million cycles and above.
When
it was first published (in
March 2007)
all flash SSDs in traditional
hard disk form factors
used SLC.
But in the year following publication many leading SSD oems (including
Samsung, Mtron and STEC) have introduced MLC products too.
MLC doubles the capacity of flash
memory by interpreting 4 digital states in the signal stored in a single cell
- instead of the traditional (binary) 2 digital states.
This technique
has been commercialized and proven over many years in hundreds of millions of
cell phones and MP3 / iPod music players - where the theoretical consequence of
data corruption (if anything went wrong with this risky "new" storage
technology) was no more serious than an inaudible sub millisecond sound blip or
invisible pixel splat.
In the
SSD market MLC yields much
lower cost storage than SLC with read / write speeds which are nearly as fast
as the best SLC devices.
The manufacturers of first generation "hard
disk replacement" MLC flash SSDs have responsibly classified them as aimed
at the "notebook market" and by subtle wording differentiated them
from their more pricey "enterprise" products. In the low duty cycle
world of a notebook these MLC SSDs should give a good operating life - typically
similar to the hard disks they replace. (Most SSD marketers would claim their
MTBFs are even better than HDDs).
But there's no way to tell the
difference between SLC and MLC SSDs externally (apart from the model numbers).
Put them in a rackmount system in a datacenter with fast processors which can
pump them continuously close to the maximum speed and what happens?
It's a simple matter to plug
new data for MLCs into the calculation I did for the worst case wear-out process
for flash SSDs - which I called the Rogue Data Recorder.
Instead of
the 64GB example I used then, I'll assume the MLC SSD has 128GB capacity.
MLC SSDs have more capacity than SLC. And more capacity means longer operating
life - before cells wear out.
I'll still use the 80M bytes / sec
sustained write speed - because the fastest MLC products (in Feb 2008) can
already do that. (Meanwhile the fastest SLC products have moved up in the world
and are about 50% faster.)
The next factor is where we hit the big
problem... Instead of a write endurance rating of 2 million cycles (for the best
SLC) - I can only use a figure of 10,000 for MLC. MLC has a much lower rating
due to the complex interaction of discriminating multiple logic levels reliably
coupled with the intrinsic failure mechanism of wear-out.
Plugging
these numbers in the same calculation gives an estimated MLC flash SSD operating
life (at max write throughput) which is 6 months! (instead of 51 years
for a 64GB SLC SSD).
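The arithmetic behind those two figures can be sketched as below. This is a minimal reconstruction, not the original article's worksheet: it assumes perfect wear leveling (every cell sees the same number of erase/write cycles), decimal gigabytes, and no write amplification, and the function name `wearout_years` is just an illustrative label.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def wearout_years(capacity_bytes, endurance_cycles, write_bytes_per_sec):
    """Years of continuous writing before every cell has used up its
    rated cycles, assuming ideal wear leveling (an assumption)."""
    total_writable_bytes = capacity_bytes * endurance_cycles
    return total_writable_bytes / write_bytes_per_sec / SECONDS_PER_YEAR

# 64GB SLC, 2 million cycles, 80MB/s sustained writes
slc_years = wearout_years(64e9, 2_000_000, 80e6)

# 128GB MLC, 10,000 cycles, same 80MB/s sustained writes
mlc_years = wearout_years(128e9, 10_000, 80e6)

print(f"SLC: about {slc_years:.0f} years")
print(f"MLC: about {mlc_years * 12:.0f} months")
```

Doubling the capacity only buys a factor of 2, while the drop in endurance rating costs a factor of 200, which is why the result collapses from decades to months.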
That's not good enough for a data driven
enterprise. There isn't a wide enough safety margin.
Proponents of MLC
might say - can't you batch select MLC chips for better write endurance in the
same way that some oems do for SLC wear out? - Couldn't that give a figure that
is 10x better?
There's not enough data to give a definitive answer -
but I suspect the answer would be no!
The reason is that you would be selecting for chips which sit inside the
favorable part of 2 different probability curves at the same time, for
what are already secondary characteristics. (Like looking for the ideal man in
Sex and the City.) Even in the unlikely event that you could find some devices
with the magic properties to do this - the yield would be small - pushing the
cost up and eliminating the main reason for using MLC.
That's where I thought
this "SLC versus MLC in enterprise SSDs" discussion would end. But
then another factor appeared out of the blue.
Sam Anderson at
EasyCo pointed out to me
that one side effect of their patent pending Managed
Flash Technology is that their software "effectively erases erase
blocks 10 to 100 times less frequently than drives doing traditional
random writes" because it writes address blocks monotonically.
EasyCo's
MFT was originally designed to give much faster system IOPS in flash SSD
arrays by using patent pending write algorithms which manage arrays of standard
SSDs in a way which reduces the probability of successive writes to an address
block which is already busy in a time consuming erase/write cycle.
This
new (to me) attribute of MFT opens up the possibility of yet another generation
of high speed rackmount SSDs with new price points which could be 50x lower
than
RAM SSDs while being
only 3x slower overall in typical applications.
I'll return to that
subject soon in a new article called - Demystifying SSD IOPS.
Conclusion?
I
can't give a definitive answer to the question - Are MLC SSDs Ever Safe in
Enterprise Apps?
With the current state of technology in 2008 - it
depends on the application and the consequences of data corruption.
I
wouldn't risk it if I were a bank - but I might not mind if my own bank risked
it and changed some pluses to minuses...
Seriously though I hope this
article has shown that there are serious risks inherent in using MLC flash
SSDs if they are not applied correctly.
Some of these risks can be
managed by choosing an SSD array supplier who has qualified and tested their
racks with products from a single known source (because every make of MLC flash
SSD has its own unique failure profile).
I know that despite my
warnings - MLC flash SSDs will get used in some enterprise apps - because the
cost difference (compared to other options) is very attractive.
In my
view using an MLC flash SSD array for an enterprise application without at least
using the (claimed) wear-out mitigating effects of a technology like
EasyCo's MFT is like jumping out of a plane without a parachute.
And
even with a parachute - strange things may still happen to wannabe MLC SSD
enterprise pioneers on the way down.
More Articles About Flash SSD Endurance
- article:- Increasing Flash Solid State Disk Reliability
- Squeak! - SSD Myths and Legends - "write endurance"
- article:- Is All CompactFlash Really Created Equal? (pdf)
- article:- Flash Disk Reliability Begins at the IC Level (pdf)
- article:- Flash Solid State Disk Write Endurance in Database Environments
Are MLC SSDs More Susceptible to Power Rail Disturbance?
As someone who in a
past career designed analog data acquisition products and systems which got
right down below the thermal noise, and who cared about the shape and material
of PCB tracks, I want to air another concern about the (in)advisability of
using MLC NAND flash in datacenter applications where there's a lot of power
rail disturbance.
Although MLC devices have been used in commercial
products since 2003 - the products they have been in (phones and portable music
players) have been battery operated environments where (inside the casing) the
environment's overall power rail and
electromagnetic
compatibility has been controlled and managed by the system designers who
know enough about these things. And as I say elsewhere in this article - the
consequences of misread data in these applications are trivial.
You
could say almost the same about the environment for an MLC flash SSD inside a
notebook PC. It's a known, testable environment. Although the user can plug
modules in - they're rarely a high energy disturbance product. The designers
would have tested it with a range of plug-ins, and they've sold millions of
similar notebooks before. There will be few surprises.
An array of
SSDs in a datacenter cabinet is not such a quiet place.
There are
plenty of fast processors all around. Above you - below you. The SSD designer
does not control that space. Every installation is unique.
Something
which you may not be aware of is that inside an MLC flash chip there are
effectively a 2 bit analog to digital converter (ADC) and a 2 bit DAC. Between
adjacent logic levels (there are 4) there is also an indeterminate band where
the signal should never be. Power line disturbances are 3x more likely to result
in a false read for MLC than SLC, and the overall error comparison is worse
still. There's also a bigger intrinsic risk (for MLC than for SLC) of an error
creeping in with the initial write charge. SSD designers deal with this by
surrounding blocks of MLC flash data with heavier error detection and correction
codes than they would normally use for SLC.
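One rough way to see where a factor like 3 can come from is to compare noise margins. The sketch below is an idealized model, not a description of any real chip: it assumes the logic levels are evenly spaced across the same sensing window and that read thresholds sit midway between adjacent levels. The function name and the normalized window are illustrative assumptions.

```python
def noise_margin(levels, window=1.0):
    """Distance from a nominal level to the nearest read threshold,
    assuming evenly spaced levels across the sensing window."""
    spacing = window / (levels - 1)   # gap between adjacent levels
    return spacing / 2                # threshold sits midway in the gap

slc_margin = noise_margin(2)   # 2 states per cell
mlc_margin = noise_margin(4)   # 4 states per cell

# How much smaller a disturbance needs to be to flip an MLC read
margin_ratio = slc_margin / mlc_margin
print(f"SLC margin: {slc_margin:.3f} of window")
print(f"MLC margin: {mlc_margin:.3f} of window")
print(f"ratio: {margin_ratio:.1f}")
```

In this model a disturbance one third the size is enough to push an MLC read across a threshold. How that translates into an actual error rate depends on the noise distribution and the real level placement, which this sketch does not attempt to model.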
I found a good detailed discussion of potential ECC problems in this Denali
article, from which the quote below comes:-
Memory ECC: A curiosity for decades, now essential for MLC NAND flash.
"With
the voltage levels closer together for MLC flash the devices are again more
susceptible to disturbs and transient occurrences, causing the generation of
errors which then have to be detected and corrected. If that is not enough for
the chip maker, it poses an even larger problem for the system designer, in that
there is more of a variety of technologies employed among competing flash chip
designs than DRAM makers, for example, would ever dream of."
For a
related discussion about what EMC (not the storage company) can mean for
signal integrity going into a flash SSD see the white paper -
Noise
Damping Techniques for PATA SSDs in Military-Embedded Systems (pdf) by
SiliconSystems.
Squeak! - SSD Myths and Legends - "write endurance"
Does the fatal gene of "write endurance" built into flash solid state
disks prevent their deployment in intensive server acceleration
applications - such as RAID systems?
It
was certainly true as little as a few years ago.
What's the risk with
today's devices?
This article looks at the current generation of
products and calculates how much (or how little) you should be worried.
RAM based SSDs have been used alongside RAID for years - but flash SSDs are
physically smaller and have bigger capacity (up to 412G in 2.5", 832G in
3.5") and are lower cost than RAM-SSDs and could actually be configured
in standard RAID boxes.
F-SSDs aren't as fast as RAM based products
but a single flash SSD can deliver 20,000 IOPS - which when scaled up in an
array - starts to look interesting.
More Conclusions
Flash
SSDs are complex systems with a lot of stuff going on inside.
Like cars
(which use the internal combustion engine) all flash SSDs from all manufacturers
are not the same.
Even if they have the same capacity and
interfaces.
There are many different process and media management
technologies inside a flash SSD which oems deal with (or not) in their own
proprietary ways. These are just some of the consequences:-
- best to worst wear leveling algorithms can vary product life by a factor of
3 to 1. (That's not too bad. Some so called "SSDs" - which are
actually dumb flash storage bolted to a disk interface - don't have wear
leveling and should not be used in servers at all.)
- best to worst SLC endurance can vary by 30 to 1.
- SLC to MLC endurance can vary from 10 to 1, up to 300 to 1
- intrinsic electrical noise susceptibility between SLC and MLC is hard to
quantify - but probably on the order of 10 to 1. Although hidden by wrap
around redundancy and error detection and correction - the possibility of
uncorrectable errors is still greater in MLC - which is unproven in enterprise
environments.
Buying flash SSDs for enterprise applications should be
regarded as an important qualifying process. Just as you wouldn't buy a
traditional RAID system
without knowing what type of hard disks were inside it, or without knowing
something about the experience of the vendor in enterprise apps - so too you
shouldn't buy flash SSDs without asking about the factors discussed in this
article.
The risk for users is that many oems who designed SSD
architectures for the notebook market - will try to capture business in the
enterprise market - with the same (or similar) products without dealing with
the datacenter's need for better resilience and data reliability.
And,
sadly, I know from my own inbox that some SSD marketers don't know how much
they don't know about their own market and how much more advanced their
competitors are in the field of reliability.