wrapping up SSD endurance
selective memories from 40 years of thinking about endurance by
editor - StorageSearch.com - July 20, 2018
The intertwined and evolving
mythical relationships between the write endurance of raw flash memory chips
and the reliability
of the SSD drive / array in which they are used as the primary storage
components have been among the most read subjects
on StorageSearch.com for over 12 years. However my own
editorial coverage of that subject started several years before that - at a
time when SSD makers were still nervous about talking openly about the very
idea that their SSDs had any wear-out issues - which could lead to sudden death
of the entire SSD - at all.
I must admit that the enduring interest in endurance and the high popularity of
these articles was at many times irritating for me - particularly when I had
just written about other aspects of SSD design architecture (which I thought
were just as important) - but the constant tides of memory cell shrinks and
SSD performance progress kept pulling me back to write again and again about
endurance. Including many articles I have now forgotten but which can be found
in the news
Each time that leading SSD thinkers had reached some
kind of consensus about the relationships between the different types of memories
and how best to manage and deploy them in SSDs, a new innovation in flash
controller design would come along to facilitate a stretch in application
elasticity which busted previous limits.
Early on in this long running
saga I told my readers that there were few hard rules except these.
- Raw memory endurance is not the same as SSD endurance.
An SSD can live much longer or much shorter than the average life expectancy of a
typical memory cell - when viewed from the R/W perspective of host write
come from differences in understanding and differences in design of controller
architecture (which includes software).
The quality of designs and
their footprint (chipcount, power usage and IP complexity) vary by orders of
magnitude - even in SSDs which superficially are aimed at similar markets and
which are being sold at the same time.
- The risk of early burn out is real
if you use an SSD in
a way which the designers didn't intend.
On the other hand the cost of
over specifying an SSD means that you may end up paying many times more than you
sell StorageSearch.com and will no longer be writing much about the SSD
market in 2019 I thought I'd write one last article which looks back at some
of my memories about endurance.
- That's why there is no such thing as an ideal endurance figure in a
flash memory, or an ideal DWPD for an SSD.
Context and business case are important boundary factors which define how
endurance factors are managed in the optimally affordable SSD.
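As a rough illustration of the rules above, the sketch below uses two commonly quoted back-of-envelope relationships: DWPD derived from a rated terabytes-written (TBW) figure, and drive life estimated from raw program/erase cycles divided by write amplification. The function names, the example numbers (3,000 P/E cycles, write amplification of 1.5 and 10, 0.5 TB of host writes per day) and the simplification that wear is spread perfectly evenly are all assumptions made for illustration, not figures from any particular product.

```python
def dwpd_from_tbw(tbw_terabytes, capacity_terabytes, warranty_years):
    """Drive writes per day implied by a rated TBW over the warranty period."""
    warranty_days = warranty_years * 365
    return tbw_terabytes / (capacity_terabytes * warranty_days)


def estimated_life_years(capacity_terabytes, pe_cycles,
                         write_amplification, host_tb_per_day):
    """Rough drive life, assuming perfectly even wear (an idealization).

    The raw write budget is capacity x P/E cycles. The controller's
    write amplification decides how much of that budget each host
    write actually consumes.
    """
    raw_budget_tb = capacity_terabytes * pe_cycles
    flash_tb_per_day = host_tb_per_day * write_amplification
    return raw_budget_tb / flash_tb_per_day / 365


# Hypothetical 1 TB drive rated for 1,825 TBW over a 5 year warranty.
print("DWPD:", dwpd_from_tbw(1825, 1, 5))

# Same hypothetical raw memory behind two hypothetical controller designs.
for wa in (1.5, 10):
    years = estimated_life_years(1, 3000, wa, 0.5)
    print(f"write amplification {wa}: about {years:.1f} years")
```

With the same raw memory, the two hypothetical controllers differ in estimated life by the ratio of their write amplification, which is one way of seeing why raw memory endurance and SSD endurance are not the same thing, and why the "right" DWPD depends on how much the host actually writes.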
nvm endurance in 1978 to 1980
My first encounter with the idea of write endurance in semiconductor
memories came in 1978 as a theoretical warning in a datasheet for a
new memory product called EAROM.
In those days I used to read datasheets for chips and processors in the same way
that editors nowadays read blogs and news stories. Having digested the
datasheet (but not having any immediate need for that memory myself in my own
designs) I wasn't greatly surprised when a company
I later worked for
- in 1980 - recalled their memory modules which had used those memories
inside because of failures in the field - due to premature remanence or
wear-out - I didn't ask my colleagues which. Temperature may also have been a
factor - because the AMD bit-slice processors to which the non volatile
memory had been attached in the 1979 PLC design ran hot. (The solution to that
design problem was battery backed CMOS RAM - an option which had been discounted
earlier because of its dependence on the reliability of the attached battery.)
The next time I met the subject of write
endurance - in 1984 - it was another incidental thing - and not
something I used in a production design. I noticed that when saving data to
Intel's 2816 (an EEPROM) some of the locations could be written to with far
fewer write pulses than others. This meant they had better cells and could be
written to more quickly. But Intel also cautioned anyone who might play around
with these chips that writing too aggressively could damage the chips. In later
non volatile memory chips - the write pulse mechanism was embedded in the chip.
This made writes more foolproof. And I don't think that most electronic
systems engineers gave any thought to the variability of what may be hidden
behind the write mechanism for another 20 years.
2004 - flash takes
aim at the server acceleration space of RAM SSDs
For me and my
working life the subject of write endurance in flash memory became a big deal
from 2004 onwards when
flash SSDs began
to infiltrate the server acceleration market. At first warnings from
experts in the SSD industry that users would experience short working lives with
flash SSDs due to burn out were proven to be correct. But this didn't deter
users who mostly liked the performance gains they were getting and in some cases
simply adjusted their buying behavior to refresh the early flash drives very
frequently. Also, early burnout wasn't inevitable in arrays which used
appropriate SSD controller architectures and related techniques.
- a classic article on wear leveling
Another angle on SSD endurance
was (and still is) longevity in industrial SSDs. In 2005
published a classic paper on
here on StorageSearch.com and invested a lot of resources in ensuing years to
ensure that industrial systems designers were familiar with the reliability factors
associated with different elements in the design of SSDs.
Also in 2005
I began a news thread on
which datamined related stories from
SSD news. In 2008
- as SSDs became a greater part of all the content here I collected up
SSD reliability papers
into one place. Even in those days endurance was just one part of the
reliability mix as you can see.
By 2010 the SSD market had become much
better acquainted with the idea that SSD controllers were an important and
separate part of every SSD design and specialist companies in that area were
surprised to learn how much hunger there was for trustworthy articles which
explained what they did and why. My article
Imprinting the brain
of the SSD noted how big a change that was compared to before.
After that time most stories about SSD endurance became part of mainstream
SSD news - but you can
sample how some of the metrics and ideas appeared in
versions of the SSD
controller page and infrequent updates to my 2007 article -
myths and legends.
An important sanity check is that most of the
key people in the SSD industry (including designers of SSDs and founders of SSD
companies and their biggest customers) were reading these pages during this
period. And my self appointed aim was to help guide the industry forwards in
directions which aligned with my own
- and the start of the SSD market Bubble
2010 was Year 1 of the
SSD Market Bubble. (For the significance of other years - see
From this point the
hitherto unknown and
controller industry invested huge intellectual resources and amazing talent
to enable each successive generation of (less reliable) flash memory to be
used in reliable SSDs and systems. And as the SSD market continued to grow in
revenue and strategic
importance - the big manufacturers of memory - which earlier had little
reason to understand the SSD potential of the chips they had been making - began
to digest lessons from the SSD market and understand the
applications for their raw memories better.
a 2018 perspective
Because users were deploying SSDs in different ways to earlier types of storage and
memory the industry took another 5 to 10 years to characterize what a "good"
level of endurance would be for particular applications.
And every few
years when new types of 2D flash memory came into production with greater
capacity but lower endurance - the wear mitigation arguments and analysis began
again from new (and more challenging) starting points.
In the past 10
years the 5 factors which have done the most to set the stage for the market
acceptance of flash memory endurance in usable SSD roles have been:-
- adoption of DWPD
- drive writes per day - as a standard way to signal which applications a
new SSD has been optimized for.
Endurance became a knowable factor
and users didn't need to be scared about its existence - as long as they
chose the right SSD for the application.
The SSD market grew
alongside other markets which it helped to create. So for example the idea of
a low DWPD SSD - in cloud infrastructure - as a valued and desirable product
would never have been in anyone's SSD business plan in 2004 - when the primary
proposition for enterprise use was server acceleration.
- flash care management & DSP integrity IP in SSDs - was a movement
in SSD controller design - to invest extremely sophisticated intelligence and
noise filtering techniques inside each SSD which - among other things -
enabled light weight (and less damaging) write pulses to be used -
compared to traditional hard codable ECC.
- The adoption of big SSD controller
architecture and using software
(for example Software-Defined Flash (Baidu
Host Managed SSD Technology (OCZ -
and other techniques and names), to leverage array level intelligence and
intelligence flow symmetry (see article for citations) to manage the
movement of data and reliability in SSD arrays has become the normal way of
Each AFA company and cloud integrator uses its own
brew of standard and proprietary IP tricks and this is an area of design
which is still evolving with in-situ processing.
- Machine Learning as the discovery tool for the best ways to
explore the optimum settings for R/W (timings and pulse shapes) when
characterizing new generations of 3D flash.
This is a technique
(first widely disclosed in 2013) which promises to maximise flash endurance
when used in conjunction with lightweight SSD controllers. (As opposed to the
kind of heavyweight energy and CPU footprint required by adaptive DSP to achieve
It was pioneered by
- the endurance rot stopped with 3D flash
The slide to
worsening endurance ratings in raw flash memory seemingly paused and
improved during the transition from 2D to 3D due to the use of more expensive
materials and more charge being trapped in each cell and with higher capacity
coming from more planes of flash cells rather than a single plane of smaller
All nvms have endurance issues
- although some are more serious than others - compared to flash. For example
first generation 3D XPoint PCIe SSDs from Intel had DWPD ratings similar to or
worse than best in class flash SSDs. Whereas other memory types such as MRAM
and FRAM have endurance which is orders of magnitude higher than flash -
although their data capacity per chip is currently orders of magnitude smaller.
It seems likely that DWPD will remain a useful way to
select SSDs for storage. However the best way to characterize the reliability
(and performance) of memories in new tiered memory systems (DIMM wars and cloud
adapted memory) is a problem which is as far away from any commonly agreed
useful solutions as today's neatly ordered classifications and segmentations of
SSDs were 10 years ago.
The subject of
memory endurance and how that relates to the reliability of SSDs and tiered
memory is one which has provided much food for thought for millions of my
readers in past years.
But there have been lighter moments too.
I combined a serious historic narrative with some attempt at humor in the "naughty
flash" description in
for the enterprise.
Whereas my article
razzle dazzling flash SSD
cell care and retirement plans was intended to show just how ridiculous some
of the comparative endurance management claims in the SSD market had already
become in 2012.