|This article surveys
SSD power management data survival strategies. The risk of data corruption
from power cycling isn't a random, unforeseeable event. It's a direct result of
choices made (or not made) when that SSD was designed.|
|SSD is going down! - We're
going down! |
If you've ever watched the movie Black Hawk Down - there's a
memorable scene in which
Super 64 has its tail hit by an RPG and becomes the 2nd chopper to go down.
From that moment it's clear to viewers that whatever the pilot
does at the controls - '64 will hit the ground real soon.
Deep in the brain of the SSD - a
nerve ending tugs to say - forget your other priorities pal - the power rail
is going down.
Is this the end of this promising young SSD's career?
Will data get corrupted?
That depends on what happens next and
the skill of the SSD's designer. Did the designer understand the range of slew
rates this product could see? Did they test for burst brownouts, in which the
power comes back up and then drops again as standby generators or batteries
kick in and get hammered by delayed power surges?
This article looks
at what happens inside various types of SSDs when the power goes down. This is
an area in which products differ a lot. I'll explain some of the architectural
parameters which constrain the freedom of SSD designers. You may think that
solving marketing driven constraints like
/ security and
are challenging enough. But the hardest part of an SSD design to get right is
deciding exactly what should happen in the short time remaining when the
power goes down - while the connected circuits can still respond to
The article will also help you understand another reason
why SSDs with apparently similar performance and datasheet specs behave
differently inside, and why it's risky redeploying an SSD you may have used in
one application environment to another. For example, why some SSDs designed
for notebooks are more likely to fail in rackmount arrays - even when their
write loads have been managed well within flash memory limits. The power management
system is actually one of the most important parts of the SSD (second
only to the memory management system). But many digital systems designers don't give it the
scrutiny it deserves. That's because most SSD designers have a background in
digital systems design - and they don't have the conceptual background to
imagine, model and control the range of less deterministic interactions
between components and data in the wild world of analog power spikes.
|Precisely how many
milliseconds the SSD has to perform shut down operations, and the nature of
the tasks to be done, depend on 3 main factors.
||RAM cache flash architecture.|
There is a wide range of commercially feasible
architectures which can be summarized as skinny, regular, fat
and hulk (a true RAM SSD).
More RAM makes it easier for the SSD performance architect to boost
true random IOPS performance. IOPS figures are headline datasheet characteristics which
are leveraged to sell the SSD to its intended markets. But more RAM in the SSD
also means that more data is in a vulnerable volatile state when the SSD power
goes down. The designer has to calculate worst case conditions to guarantee
saving the state of critical data using the local on-board technologies - which
nowadays are nearly always flash.
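To put rough numbers on that worst case calculation, here's a minimal Python sketch. It assumes a hypothetical SSD in which dirty RAM cache pages are spread round-robin across flash dies which program in parallel, with bus transfer time hidden behind program time. The function name, parameters and example figures are illustrative assumptions, not data from any real product.

```python
import math

def worst_case_flush_ms(dirty_bytes: int, page_bytes: int, parallel_dies: int,
                        t_prog_ms: float, metadata_pages: int = 0) -> float:
    """Rough time to save dirty RAM cache contents into flash at power down.

    Hypothetical model: each die programs one page at a time, all dies work
    in parallel, bus transfer time is ignored, and a few extra metadata pages
    (mapping tables, state) must also be written.
    """
    if page_bytes <= 0 or parallel_dies <= 0:
        raise ValueError("page size and die count must be positive")
    data_pages = math.ceil(dirty_bytes / page_bytes)
    total_pages = data_pages + metadata_pages
    # With round-robin placement the busiest die programs this many pages
    # back to back, so it sets the overall flush time.
    program_rounds = math.ceil(total_pages / parallel_dies)
    return program_rounds * t_prog_ms

# "Fat" design: 64 MiB of dirty cache, 16 KiB pages, 32 dies, 1.5 ms tPROG.
fat = worst_case_flush_ms(64 * 1024 * 1024, 16 * 1024, 32, 1.5, metadata_pages=8)
# "Skinny" design: no dirty user data, just a couple of state pages.
skinny = worst_case_flush_ms(0, 16 * 1024, 32, 1.5, metadata_pages=2)
print(f"fat: {fat} ms, skinny: {skinny} ms")
```

The gap between the two results is the point: the fat design needs its power rail held up for around 194 ms under these assumptions, while the skinny one needs roughly a single program time. A real designer would use worst case (not typical) program times and add margin for retries.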
||Assumptions about the power supply design and
availability of power hold up circuits.|
In most SSDs the nature of
the power hold up circuit (and the decision of whether there is one at all) is
the direct consequence of the RAM cache flash architecture. In most markets
in early 2011 (such as the enterprise server acceleration and notebook markets)
the power hold up architecture is not regarded as a headline
datasheet characteristic which sells the SSD. Most end users - if they give this
characteristic any thought at all - will regard it as neutral or not a
significant decision factor in their vendor qualification process. In contrast -
in many parts of the embedded SSD market - particularly in military, industrial
and telecoms markets - the power hold up design will receive much closer
scrutiny - because customers in these markets know from long experience that
appropriate management of power cycling events is critical to operational
reliability and ROI.
||Assumptions about whether the SSD and the host
processors it serves live in the same power rail zone are also critical
factors. This isn't simply predetermined by whether the storage device has a DAS
or NAS type of interface. For correct behavior in a power down situation - the
designer has to make assumptions about the operational environment in which the
SSD is used. This is another characteristic which can be a subtle difference in
the datasheet of the SSD product - but which can make a big difference to
operational reliability. For example the operating assumptions for an orderly
power shutdown in a 2.5" SATA disk are very different depending on whether that disk is
living in a notebook PC - or whether it is part of an array of disks in a
rackmount SAN. |
|As you can see from the notes
above there are many permutations which derive from these factors. For
simplicity I'm going to look at a handful of different situations below. These
will give you an idea about the interplay between these power down management
factors.|
|this SSD is going down -
the RAM SSD and fat flash SSD scenario|
RAM SSDs exist in a wide
range of form factors including:-
3.5" SSD and
Data in RAM systems is difficult to read back after power goes down. (But not
impossible if you work for a security agency or forensic
data recovery company
with the right technologies and a strong enough need to know the RAM
contents to justify the expense of reading most of it back.)
But in normal operation RAM has to be treated as volatile. Data will be
lost when the RAM is powered down. To counter this - early RAM SSDs included
significant battery backup / UPSs. In the early 2000s many RAM SSD designs
also started to include internal
hard drives. That
enabled designers to reduce the charge capacity and physical size of
batteries - which now only had to last long enough to ensure that data could be
written reliably to the on board HDD. Later,
Texas Memory Systems
became the first vendor to switch to using
flash SSD as
backup - and that has now become the norm in the industry because of faster save
and restore times (compared to HDD) and better reliability too.
Current custom and practice in RAM SSD design is that power rail hold up times,
which used to be hundreds of hours, have now dieted and shrunk their way
down to seconds. Designers in this market don't have to count the cost of
each extra millisecond in the same bean counting way as flash SSD designers,
and can afford to be sure that data is safely put away. Designers still have
to ensure that the flash they back up to itself shuts down in an orderly way -
but that's a 2nd order timing problem. Batteries and supercaps are the tools
of the RAM SSD shutdown trade and can't be avoided.
|this SSD is going down -
the regular flash SSD scenario|
These flash SSDs have RAM caches
whose contents take multiple flash write cycles to store
safely. This takes many milliseconds. These designs therefore include
internal supercaps or similar technologies to hold up the power rail long
enough for these processes to complete. Examples of these products include
GTR family and Oracle's
F20 PCIe SSD.
One of the disadvantages of this scheme is that the
extra RAM memory and supercaps add to the physical space, cost and
unreliability of the product (compared to skinny SSDs). But the advantage is
that the supercaps enable less rigid design rules in the SSD controller actions
to achieve high IOPS - without needing such meticulous micro managed internal
It is theoretically possible for this type of architecture to
skip the need for supercaps - if the RAM cache is implemented by non volatile
RAM. This isn't currently economic. Advocates of skinny SSD architecture might
also argue that the relaxed data handling rules enabled by regular and fat
RAM architectures mean it would be difficult to transition their designs to nvRAM
regardless of nvRAM costs - because designers still have to implement
write completion tagging mechanisms even when the time slots are shorter.
|this SSD is going down -
the skinny SSD scenario|
These flash SSDs either don't have external
RAM or the capacity used is unbelievably small. (I have been quoted numbers of
bytes - but can't disclose them.)
The key thing in these designs is
that the SSD does not have to save its state from RAM into flash when
the power goes down. The operational state of the SSD is always in flash and
it's always valid.
This requires a level of attention to detail in the
data management processes done by the SSD controller which is dramatically
different to that of all other types of SSD. It stems from a philosophically
different viewpoint in the design of SSDs which starts with the flash memory
media - and asks the question - what happens when the power goes down? and
extrapolates all the design rules back from that. Those rules of flash block
engagement mean there is very tight integration between the SSD controller
and the power management system. The PSU management system is not a "bolt-on"
or afterthought - as it appears to be in many other common flash SSD designs.
Skinny flash SSDs consequently are the most reliable SSDs in the power down scenario.
They are also more reliable than other types of SSD because they have fewer parts
which can go wrong (like capacitors and external RAM). Examples of companies which
design these products are:-
(all) and WD
However, as you'll see in a later part of this article
- when you combine multiple skinny SSDs into arrays the design thought which
went inside the SSD doesn't absolve the array designer from analyzing and
solving the power down problems introduced by the array logic.
|this SSD is going down -
some other special situations|
PSU of the SSD is the same as that of the host
Typical examples of this are notebook SSDs
and PCIe SSDs.
In the notebook environment it's tempting for the SSD designer to assume that
there is little need for sophisticated SSD power management - no matter what the
internal SSD architecture - because this is a low IOPS environment, the directly
attached host will usually have initiated the shutdown, and the battery in the
system enables a controlled rate of shutdown.
And due to price sensitivity in this market the power management in notebook
SSDs is often minimal and crude - but works 99% of the time. A risk factor for
the SSD's data integrity is if the notebook hangs while doing disk I/O and the
user decides to remove the battery pack to force a reset. However, notebook SSDs
are the simplest case for the SSD power management designer - because most of
the time the SSD would survive in this environment even if power management didn't exist.
In the server acceleration environment of the PCIe SSD - there's a
wide range of RAM cache architectures covering the full spectrum of design
choices. Examples include RAM SSD (with flash backup) from
flash SSD (with supercaps) - from
Oracle, skinny SSDs in
an array - from OCZ, and
skinny SSDs where the controller action and shutdown is controlled directly by
the host - from Fusion-io.
The successful operation of these products depends on the characteristics of the
host server power supply, the operating system, and the host processor speed.
The SSD designer has to be sure that the shutdown process (if loaded on the host
processor) has a high enough priority to complete - given all the other things
that it's got to do.
That's why qualification of PCIe SSDs of this
type with one server box and OS doesn't automatically guarantee that the
same SSD will work in a different server box design - or even in the same server
box but running a different OS.
Another approach is to offload all the
performance and shutdown functions from the host and have a more powerful on
board controller. That's done in designs by
Texas Memory Systems
and Virident Systems.
And there are many other SSD products which fill the spectrum between these extremes.
As always the question is - do I have enough
milliseconds to complete currently committed write operations? And if I don't -
have I set up a semaphore which says I started a write process on this
block but haven't said it finished? Then, when the power comes back up, the
controller can roll back to the previously known good version of this block.
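The write-started / write-finished semaphore described above can be modelled in a few lines. This is a minimal Python sketch, assuming a hypothetical controller which always writes a new version of a block to a fresh location and keeps the previous good location until the new write is confirmed. The `records` dictionary stands in for metadata which would really live in flash; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class BlockRecord:
    good_location: Optional[int] = None      # last version known to be complete
    pending_location: Optional[int] = None   # set = "write started, not finished"

@dataclass
class ToyController:
    # Stand-in for non-volatile metadata which survives a power cut.
    records: Dict[int, BlockRecord] = field(default_factory=dict)
    next_free: int = 0

    def begin_write(self, block: int) -> int:
        """Raise the semaphore, then return a fresh location for the new data."""
        rec = self.records.setdefault(block, BlockRecord())
        location = self.next_free
        self.next_free += 1
        rec.pending_location = location
        return location

    def finish_write(self, block: int) -> None:
        """Call only after the new data is verified in flash. In real flash
        this metadata update must itself be atomic."""
        rec = self.records[block]
        rec.good_location = rec.pending_location
        rec.pending_location = None

    def recover_after_power_up(self) -> List[int]:
        """Roll back every block whose write started but never finished."""
        rolled_back = []
        for block, rec in self.records.items():
            if rec.pending_location is not None:
                rec.pending_location = None   # discard the incomplete version
                rolled_back.append(block)
        return rolled_back

ctl = ToyController()
ctl.begin_write(7)
ctl.finish_write(7)          # version at location 0 is now the good copy
ctl.begin_write(7)           # power fails before finish_write() is called
print(ctl.recover_after_power_up(), ctl.records[7].good_location)  # [7] 0
```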
PSU of the
SSD is different to that of host servers
Typical examples of this
are rackmount SSDs - both SAN / NAS and DAS connected (usually via SAS). In
this case an additional requirement for the SSD is to shut down in a way that
prioritizes the completion of current data I/O requests - if possible - and
doesn't leave the host hanging around for data that it's not going to get.
There may also be multiple hosts accessing the same SSD rack. In many of these
enterprise configurations the host(s) will switch to an alternative SSD and carry
on serving apps - so a tidy switch-over is desirable. A rackmount SSD can be
implemented with any type of RAM cache architecture - but an additional
demand on the power management system comes from the fact that the SSDs are
located in an external system. Even if the SSDs are skinny flash and can shut
down fast with no data loss - the rack itself will typically have some kind of
additional logic which manages the array - ranging from the simplest case of
network to DAS routers and RAID controllers to more complex systems. These
devices and appropriate control of the memory and logic states within them
also come into the scope of the SSD power management system. Rack logic which
may have worked OK for hard drive arrays - may need to be redesigned (or power
down cushioned) to work properly with SSDs. That's because the SSDs may still
try to respond to write requests lower down the power rail voltage droop (and
later in time) than hard drives.
The design of an SSD's power down management system is a fundamental characteristic
of the SSD which can determine its suitability and compatibility with user
operational environments. Systems integrators must take this into account when
qualifying SSDs in new applications - because subtle differences in OS timings,
rack power loading and rack logic affect some types of SSDs more than others.
Users should be aware that power management inside the SSD (a factor which
doesn't get much space in most product datasheets) is as important to reliable
operation as management of
cost and other
Power Failure Protection (pdf) is an application note - published January
2011 - by SMART. It describes the 3 most vulnerable SSD areas which
can get corrupted due to sudden power loss - and describes typical
architectures to prevent it.
SMART's view is that supercaps aren't
reliable enough for enterprise SSDs. "For every 10°C of ambient
operating temperature rise, the life expectancy of a supercapacitor can be cut
approximately in half." So instead they use NbO capacitors in an array.
These have MTBFs 100x
better than Al based supercaps, have little degradation of capacitance with
temperature and fail open circuit (which is acceptable), and the array
guarantees there is sufficient capacitance remaining if this happens.
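Both effects in that paragraph are easy to quantify. In this minimal Python sketch the first function applies the quoted rule of thumb (supercap life roughly halves for every 10°C rise), and the other two model a parallel capacitor array in which a part that fails open simply drops out of the total. The rated life, temperatures and capacitance values in the example are hypothetical, not SMART's figures.

```python
def derated_life_hours(rated_life_h: float, rated_temp_c: float,
                       ambient_temp_c: float) -> float:
    """Quoted rule of thumb: life roughly halves per 10 C of temperature rise."""
    return rated_life_h * 2 ** ((rated_temp_c - ambient_temp_c) / 10.0)

def array_capacitance_uf(per_cap_uf: float, count: int, failed_open: int) -> float:
    """Capacitors in parallel add up; one that fails open contributes nothing."""
    if not 0 <= failed_open <= count:
        raise ValueError("failed_open must be between 0 and count")
    return per_cap_uf * (count - failed_open)

def array_is_sufficient(per_cap_uf: float, count: int, failed_open: int,
                        required_uf: float) -> bool:
    return array_capacitance_uf(per_cap_uf, count, failed_open) >= required_uf

# A hypothetical part rated for 2000 h at 65 C, run 20 C hotter:
print(derated_life_hours(2000, 65, 85))        # 500.0
# Ten 220 uF parts in parallel, one failed open, 1800 uF needed:
print(array_is_sufficient(220, 10, 1, 1800))   # True: 1980 uF remain
```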
Power Failure Recovery (pdf) - published in January 2010 - by
Fortasa Memory Systems
describes various techniques which the company uses in its SSDs.
For example, Fortasa's controller makes a redundant copy of the FAT structure when
doing a block write which is retained until after the write has been verified -
to "practically eliminate any chance of FAT table corruption." -
Fortasa also specifies that system designers should provide approximately "5mS
of reserve power to their SSDs to complete the NAND max program time, control
signal propagation delay and queuing." That means the designer doesn't have
to guess or over-design the power hold-up.
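Turning a hold-up time like that 5 ms figure into a capacitor value is a short energy calculation. A capacitor falling from V_start to V_min releases 0.5 * C * (V_start^2 - V_min^2); setting that (times a converter efficiency) equal to load power multiplied by time gives the required C. This Python sketch assumes a constant-power load, an ideal capacitor (no ESR, ageing or temperature derating) and hypothetical voltages and power. It is not Fortasa's own sizing method.

```python
import math

def required_capacitance_f(load_w: float, hold_up_s: float, v_start: float,
                           v_min: float, efficiency: float = 1.0) -> float:
    """C = 2 * P * t / (eta * (V_start^2 - V_min^2)) for a constant-power load."""
    if v_start <= v_min:
        raise ValueError("v_start must be above the minimum usable voltage")
    return 2.0 * load_w * hold_up_s / (efficiency * (v_start**2 - v_min**2))

def hold_up_time_s(capacitance_f: float, load_w: float, v_start: float,
                   v_min: float, efficiency: float = 1.0) -> float:
    """Inverse of the above: how long a given capacitor keeps the load alive."""
    return efficiency * capacitance_f * (v_start**2 - v_min**2) / (2.0 * load_w)

# Hypothetical: 3 W SSD, 5 ms reserve, caps charged to 5.0 V, regulator
# drops out at 3.0 V, 85% converter efficiency.
c = required_capacitance_f(3.0, 0.005, 5.0, 3.0, efficiency=0.85)
print(f"{c * 1e6:.0f} uF")  # 2206 uF
```

Because the stored energy depends on the square of the voltage, charging the hold-up capacitors to a higher voltage is one way a designer can reduce the capacitance needed.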
Whitepaper (pdf) - published in May 2010 - by
the design approach which the company has taken to minimize data
corruption in their E-Disk Altima SSD product range.
It includes good
systems analysis and block diagrams and this quote - "Power is a
fundamental need but it can also be the biggest threat to the reliability and
operation of any system."
Technology Power Failure Data Protection (pdf) - (data safety features ready
for unexpected power-loss) - published in June 2011 - by
architecture and circuits which mitigate this problem - and reveals a unique
feature in the company's SSDs which enables the health of the hold-up capacitors
to be tested and logged - a process which is initiated by a host command.
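The white paper doesn't say how that health test works internally. One common way to estimate capacitance is to discharge the bank with a known constant current and measure the voltage drop, since C = I * dt / dV. Here's a minimal Python sketch of that idea with a pass/fail log entry; the function name, the 80% threshold and the example readings are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CapHealthResult:
    measured_f: float   # estimated capacitance
    ratio: float        # measured / nominal
    healthy: bool

def check_holdup_caps(current_a: float, duration_s: float, v_before: float,
                      v_after: float, nominal_f: float, log: List[CapHealthResult],
                      min_ratio: float = 0.8) -> CapHealthResult:
    """Estimate capacitance from a constant-current discharge and log it."""
    dv = v_before - v_after
    if dv <= 0:
        raise ValueError("voltage must fall during the discharge test")
    measured = current_a * duration_s / dv        # C = I * dt / dV
    ratio = measured / nominal_f
    result = CapHealthResult(measured, ratio, ratio >= min_ratio)
    log.append(result)                            # history for trend analysis
    return result

history: List[CapHealthResult] = []
r = check_holdup_caps(0.1, 0.01, 5.0, 4.5, nominal_f=0.0022, log=history)
print(f"{r.measured_f * 1e6:.0f} uF, {r.ratio:.0%} of nominal, healthy={r.healthy}")
```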
DataWrite Assurance (pdf) - is a white paper which outlines power loss data
protection in OCZ's
Here's a quote - "In the case an OCZ SSD's
primary power source drops below a predefined threshold, the SSD will
automatically not accept any new commands from the OS. The power loss backup
circuitry, a self-contained secondary power source, is then activated ensuring
that any in-flight data is safely transferred and stored in the NAND flash."
It also warns about unnamed "enterprise" competitors who store vital
metadata in vulnerable host server RAM.
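The sequence in that quote (stop accepting commands first, then use the secondary power source to move in-flight data into NAND) can be sketched as a tiny state machine. This Python model is hypothetical: the class name, the threshold voltage and the list standing in for NAND are illustrative, not OCZ's firmware.

```python
from collections import deque
from typing import Deque, List

class PowerLossGuard:
    def __init__(self, threshold_v: float) -> None:
        self.threshold_v = threshold_v
        self.accepting = True
        self.in_flight: Deque[bytes] = deque()  # accepted, still volatile
        self.nand: List[bytes] = []             # stand-in for flash

    def submit(self, data: bytes) -> bool:
        """Return True if the command is accepted (and so must be preserved)."""
        if not self.accepting:
            return False
        self.in_flight.append(data)
        return True

    def on_voltage_sample(self, volts: float) -> None:
        if self.accepting and volts < self.threshold_v:
            self.accepting = False          # 1: refuse new work first
            self._flush_on_backup_power()   # 2: drain what was already accepted

    def _flush_on_backup_power(self) -> None:
        # Assumed to run from the self-contained secondary power source.
        while self.in_flight:
            self.nand.append(self.in_flight.popleft())

guard = PowerLossGuard(threshold_v=4.5)
guard.submit(b"A")
guard.submit(b"B")
guard.on_voltage_sample(4.2)            # rail sags below the threshold
print(guard.submit(b"C"), guard.nand)   # False [b'A', b'B']
```

The ordering matters: because new commands are refused before the flush starts, the backlog the backup source has to power cannot grow while it is being drained.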
Transaction Point Settings - overview - by
Datalight - describes
how its flash file system uses the concept of 2 states - the working state
and the committed state - to preserve the integrity of old data when new data
is written - to ensure that data is always preserved and valid - even through
a power disruption.
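The overview doesn't spell out Datalight's on-media format, but the working state / committed state idea can be illustrated with a generic "ping-pong" commit: the new state goes into the slot which does not hold the current committed state, tagged with a sequence number and a CRC, and after power up the newest slot with a valid CRC wins. This Python sketch shows that generic technique under assumed names; it is not Datalight's implementation.

```python
import json
import zlib
from typing import Dict, List, Optional

def _encode(seq: int, state: Dict[str, str]) -> bytes:
    """Serialize a committed state with a sequence number and a CRC32 header."""
    body = json.dumps({"seq": seq, "state": state}, sort_keys=True).encode()
    return zlib.crc32(body).to_bytes(4, "little") + body

def _decode(slot: Optional[bytes]) -> Optional[dict]:
    """Return the slot's contents, or None if it is empty or fails its CRC."""
    if slot is None or len(slot) < 4:
        return None
    crc, body = int.from_bytes(slot[:4], "little"), slot[4:]
    if zlib.crc32(body) != crc:
        return None                      # torn write: ignore this slot
    return json.loads(body)

class TwoStateStore:
    def __init__(self) -> None:
        self.slots: List[Optional[bytes]] = [None, None]  # stand-in for flash
        self.seq = 0
        self.committed: Dict[str, str] = {}
        self.working: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.working[key] = value        # only the working state changes

    def transaction_point(self) -> None:
        """Make the working state the new committed state."""
        new_state = {**self.committed, **self.working}
        self.seq += 1
        # The committed copy lives in slot (seq - 1) % 2, so write the other one.
        self.slots[self.seq % 2] = _encode(self.seq, new_state)
        self.committed, self.working = new_state, {}

    def mount(self) -> Dict[str, str]:
        """After power up: use the newest slot that passes its CRC check."""
        valid = [d for d in map(_decode, self.slots) if d is not None]
        if not valid:
            return {}
        newest = max(valid, key=lambda d: d["seq"])
        self.seq, self.committed, self.working = newest["seq"], newest["state"], {}
        return self.committed

store = TwoStateStore()
store.write("a", "1")
store.transaction_point()                # committed: {"a": "1"} in slot 1
store.write("a", "2")                    # working state only
torn = bytearray(_encode(store.seq + 1, {"a": "2"}))
torn[-1] ^= 0xFF                         # power fails mid-commit: corrupt slot
store.slots[(store.seq + 1) % 2] = bytes(torn)
print(store.mount())                     # {'a': '1'}
```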
the Perfect Memory (pdf) - this 2009 white paper from
AgigA Tech - described
the architecture of the company's pioneering flash backed RAM DIMMs. It also
included a historic tour of how various companies had encountered and solved
the contradictory demands of low latency random access memory with various data
integrity solutions designed to cope with random power loss.
|about the author|
Zsolt Kerekes is
the editor of StorageSearch.com. I first started giving serious thought to the
issue of data corruption in user programmable memory modules when I was
designing intelligent analog I/O in one of the programmable controller
design groups in Square D in 1980. Although most of my
career was digitally focused I also spent more than a year involved in
pure analog design - which involved research into new process control sensors
and inventing new measurement techniques - where I returned to the theme of
fully characterizing sensitive electronics products at any power slew rate and
any operating temperature. I returned again to the power disturbance theme many
times later - such as when designing wire speed disk capture systems for
national power grid testing and modelling.
And in my current job
I've been privileged to talk to many of the world's leading thinkers in
SSD design and architecture.
All electronics products benefit
from good power management and EMI compatibility. Data is a very sensitive
thing if you don't take care in the design to protect it.
If you are
in the SSD industry and wish to add useful comments to this article - email me
Anonymous emails will be disregarded.
|What did you say happens
when we run out of gas?
|what happens in the SSD
when the power goes down?|
|Why should users care what happens
inside an SSD when the power goes down?|
The simple answer is - it can
determine how much data in your SSD app is corrupted, and
whether the SSD itself is usable when the power comes back up.
You may be surprised to learn that the ideal state for an SSD is when it's in the
powered up state. That's when it's at its most reliable. A well designed SSD
will look after itself and its data when it's powered up. And apart from physical
environmental stresses (like being cooled or fried or zapped by static) it
should mostly last for a predictable number of years.
Although it's common to talk about SSDs as being "non volatile" memory /
storage devices - because they don't lose their data contents when the power
goes down (unlike most RAM) - getting from one state to the other is a risky
experience for all the chips in an SSD. What makes the difference is the skill
of the SSD designer in understanding the operational environment and making sure
that the process is always controlled and predictable.
SSDs are intrinsically more vulnerable to data loss from a sudden loss of power than
hard drives. The 3 main reasons are:-
- SSDs are mostly capable of higher R/W intensive activities than HDDs.
SSDs also include many more internal housekeeping functions. Therefore the
state of a typical SSD at any point in time is much more complex than that of a
typical HDD. In many SSDs the complexity of the internal data management
processes is more like that inside a traditional RAID system. With complexity
comes risk. There's more to go wrong - if the designer hasn't thought through
the issues and understood what needs to be done.
- There are many more distributed storage elements in an SSD than in a
hard drive. There can be thousands of flash chips in an SSD. All of these
have to be protected from spurious data writes as the logic system changes
from the powered up state - where operations are well defined - and the power
rail drops through voltage regions where the operation of each chip is
undefined. In contrast - in a typical hard drive only a handful of heads and
write amplifiers have to be controlled at the critical periods.
When the power goes off, the purpose of the power management system is to save the
state of the SSD (or enough of the state) to ensure that data integrity is maintained.
- The hard disk industry can draw on the experience of many years (and
billions of units in the field) in which hard drive architecture hasn't
changed much. So it's reasonable to assume that power management designs and
lessons can safely span to new product generations with tweaks rather than redesigns.
In contrast - the SSD experience in current market conditions is
that most product architectures are changing and evolving significantly from
one product generation to the next. This means that lessons learned from one
generation of SSD power management systems don't necessarily guarantee
coverage for all the critical events in the next. And field experience for
most SSD vendors is more limited too. That means designers still haven't got
the visibility of all the bad things which can kill their SSDs.
|"By creating an automatic
failure testing framework, we subjected 15 SSDs from 5 different vendors to more
than 3,000 fault injection cycles in total. Surprisingly, we find that 13 out
of the 15 devices, including the supposedly enterprise-class devices, exhibit
failure behavior contrary to our expectations."|
the Robustness of SSDs under Power Fault (pdf) - February 2013|
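The basic loop behind this kind of fault injection testing is: write data with known contents, cut the power at an unpredictable moment, then check that every write the device acknowledged is still there. The study quoted above tested real devices; this Python sketch runs the same loop against a hypothetical simulated device, one variant of which acknowledges writes while they still sit in a volatile cache.

```python
import random
from typing import Dict, List

class SimulatedSSD:
    """Hypothetical device. With volatile_cache=True it acknowledges writes
    while they sit in RAM and only flushes every `flush_every` writes."""

    def __init__(self, volatile_cache: bool, flush_every: int = 4) -> None:
        self.volatile_cache = volatile_cache
        self.flush_every = flush_every
        self.cache: Dict[int, int] = {}
        self.flash: Dict[int, int] = {}

    def write(self, lba: int, value: int) -> bool:
        if self.volatile_cache:
            self.cache[lba] = value
            if len(self.cache) >= self.flush_every:
                self.flash.update(self.cache)
                self.cache.clear()
        else:
            self.flash[lba] = value      # persisted before acknowledging
        return True                      # acknowledgement to the host

    def power_cut(self) -> None:
        self.cache.clear()               # volatile contents are lost

def fault_injection_run(volatile_cache: bool, n_writes: int,
                        rng: random.Random) -> List[int]:
    """One cycle: write known data, cut power at a random point, then
    return the LBAs whose acknowledged data did not survive."""
    ssd = SimulatedSSD(volatile_cache)
    cut_at = rng.randrange(n_writes + 1)
    acked: Dict[int, int] = {}
    for lba in range(cut_at):
        value = lba * 7                  # known, checkable pattern
        if ssd.write(lba, value):
            acked[lba] = value
    ssd.power_cut()
    return [lba for lba, v in acked.items() if ssd.flash.get(lba) != v]

rng = random.Random(2013)
runs = 500
lost_wt = sum(1 for _ in range(runs) if fault_injection_run(False, 20, rng))
lost_cached = sum(1 for _ in range(runs) if fault_injection_run(True, 20, rng))
print(f"write-through: {lost_wt}/{runs} cycles lost acknowledged data")
print(f"volatile cache: {lost_cached}/{runs} cycles lost acknowledged data")
```

Real test rigs also vary the slew rate and inject repeated brownouts, which a simulation like this can't capture.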
|"We had an SSD 320 600GB
2.5" SATA drive in for evaluation from our Intel rep. I was able to kill it
in 2 or 3 hours by power cycling it."|
|... from Intel's SSD
community site - June 2011|
consumption in SandForce driven SSDs|
- SandForce started
shipping its 2nd generation SF-2200 processors optimized for SSDs deployed in
client computing applications. One of the oem customizable features in this
family was the ability to set a maximum power budget.
Product Marketing Director Kent
Smith gave me this outline of how it works.
describe our new power management feature, our SSD manufacturers have the option
to set a max power envelope for the drive such that the drive will maximize the
performance it can get within that power envelope. One of the key levers in that
feature is controlling how many simultaneously active die are used at
one time. This feature would be set at the factory, but not controllable in the
field. Therefore the power spec of the drive can be set at the factory. If the
SSD manufacturer chooses not to enable this feature, the drive will always
maximize the performance with the maximum fully active simultaneously active
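As a rough illustration of that lever, here's a minimal Python sketch which converts a factory-set power envelope into a cap on simultaneously active die. It assumes a simple linear power model (a fixed baseline plus a fixed cost per active die) and uses integer milliwatts to avoid floating point rounding surprises. The numbers are hypothetical, not SandForce figures.

```python
from typing import Optional

def max_active_dies(total_dies: int, baseline_mw: int, per_die_mw: int,
                    budget_mw: Optional[int] = None) -> int:
    """Cap on simultaneously active die under an optional power envelope.

    budget_mw=None models the feature being disabled: all die may be active.
    """
    if per_die_mw <= 0:
        raise ValueError("per-die power must be positive")
    if budget_mw is None:
        return total_dies
    headroom_mw = budget_mw - baseline_mw
    return max(0, min(total_dies, headroom_mw // per_die_mw))

print(max_active_dies(32, baseline_mw=1000, per_die_mw=100, budget_mw=2500))  # 15
print(max_active_dies(32, baseline_mw=1000, per_die_mw=100))                  # 32
```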
|re different types of
Here's a comment from Woody Hutsell
- "From a reliability point of view, you can start to see some
architectures in systems that use distributed super-capacitors and others that
are using centralized redundant batteries. I tend to prefer the centralized
redundant backup power myself. This approach allows system designers to more
carefully provide redundancy."
|SSD caching software has
to be power crash aware too|
In June 2011 - I
asked Ted Sanford,
founder/CEO of - FlashSoft
- a leading company in the
SSD caching software market - what
are the steps taken to protect the state of the cached data and update the
external storage in the case of sudden power loss?
He said - "FlashSoft
employs a method called multi-level metadata management, which stores some cache
metadata in RAM, but most of it on the SSD itself (and employs a balanced tree
design for optimal efficiency). There are two benefits to this design: first, it
minimizes utilization of server memory. Only the hottest metadata runs in server
memory. The rest is cached in SSD. Also, the application regularly creates
snapshots of the metadata on the SSD, so that in the event of a server crash,
the cache metadata can be re-created from the snapshots + most recent metadata
almost immediately. Typical recovery is less than a second. (Keep in mind, our
team's background is at Veritas, Oracle, Symantec, etc. so
data recovery is a top
priority for the product design.)"
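FlashSoft didn't describe its metadata formats, but the "snapshots + most recent metadata" recovery pattern can be shown generically: start from the last snapshot, then replay only the log entries written after it, in sequence order. In this Python sketch a plain dict stands in for the balanced tree, and the entry fields and operation names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class JournalEntry:
    seq: int                 # monotonically increasing metadata sequence number
    op: str                  # "insert" or "evict"
    block: int               # backing-store block number
    location: Optional[int]  # SSD cache slot (None for evict)

def recover_cache_metadata(snapshot_seq: int, snapshot: Dict[int, int],
                           journal: List[JournalEntry]) -> Dict[int, int]:
    """Rebuild the block -> cache-slot map after a crash."""
    table = dict(snapshot)                   # never modify the snapshot itself
    for e in sorted(journal, key=lambda e: e.seq):
        if e.seq <= snapshot_seq:
            continue                         # already reflected in the snapshot
        if e.op == "insert":
            table[e.block] = e.location
        elif e.op == "evict":
            table.pop(e.block, None)
        else:
            raise ValueError(f"unknown op {e.op!r}")
    return table

snap = {1: 100, 2: 200}                      # block -> cache slot at seq 10
log = [JournalEntry(9, "insert", 3, 300),    # older than the snapshot: skipped
       JournalEntry(11, "evict", 1, None),
       JournalEntry(12, "insert", 4, 400)]
print(recover_cache_metadata(10, snap, log))  # {2: 200, 4: 400}
```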
|cold boot times for
In March 2011 I spoke to
about their new K2 (a rackmount RAM SSD which internally uses battery backed RAM
and hard disk backup).
He said he had read my recent article about
SSD power down management (which you're reading now). So while we were on that
subject I asked...
How long does it take to rebuild data onto a new
blade's hard drive?, and
How long does it take to boot up a new K2
from cold assuming a flat / failed battery system?
Dani Golan said
a couple of minutes for a single blade's HDD rebuild and about 20 minutes
for a cold boot from a battery failed system, respectively. He said he
thought the latter would be a very rare event.
|the mysterious case of the
silent lightning strike|
the Source of a Power Surge - is an interesting article by
LWG Consulting which
discusses the differences in the damage to data storage systems caused by
natural and artificial causes such as lightning surges, power grid switching
faults and equipment failures.
This type of forensic detection is where
Sherlock Holmes partners with a
data recovery version
of Dr Watson in civil and criminal legal cases. Most of the experience in this
market relates to hard drives - but SSDs will come under the scrutiny of the
magnifying glass in greater numbers too. Maybe even weaknesses in SSD
scenarios of flash data vulnerability at power voltage collapse|
|A blog by Virtium -
Against Data Corruption - (May 2015) outlines their thinking about
protecting data integrity in
from power loss events.|
The author - Tony Pond, Director of
Marcomms - identifies 3 scenarios for data corruption:-
1 - Power
fail during a write, but before the SSD has acknowledged receipt of data.
2 - Power fail after the SSD acknowledges that it has data but before data has
been committed to NAND flash.
3 - Power fail after the SSD has data in NAND but before it has been
committed to the correct logical block address (LBA).
How does the company design around these exigencies? ...read the article
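One way to keep those three cases straight when reviewing a design is to classify each outstanding write by how far it has progressed. This is a minimal Python sketch; the enum names, flags and risk notes are illustrative labels for the three scenarios listed above, not Virtium's terminology.

```python
from enum import Enum
from typing import Optional

class PowerFailScenario(Enum):
    BEFORE_ACK = 1          # scenario 1
    ACKED_NOT_IN_NAND = 2   # scenario 2
    IN_NAND_NOT_MAPPED = 3  # scenario 3

RISK = {
    PowerFailScenario.BEFORE_ACK:
        "host was never told the write succeeded, so it can retry it",
    PowerFailScenario.ACKED_NOT_IN_NAND:
        "host believes the data is safe but it is only in volatile RAM",
    PowerFailScenario.IN_NAND_NOT_MAPPED:
        "data is in flash but the LBA mapping still points at the old copy",
}

def classify(acknowledged: bool, in_nand: bool,
             mapped: bool) -> Optional[PowerFailScenario]:
    """Which scenario a write falls into if power fails now (None = committed)."""
    if in_nand and mapped:
        return None
    if not acknowledged:
        return PowerFailScenario.BEFORE_ACK
    if not in_nand:
        return PowerFailScenario.ACKED_NOT_IN_NAND
    return PowerFailScenario.IN_NAND_NOT_MAPPED

s = classify(acknowledged=True, in_nand=False, mapped=False)
print(s.name, "->", RISK[s])
```

In these terms, scenario 2 is what hold-up power exists to cover, and scenario 3 is what power-up recovery of the mapping tables has to handle.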
| hold up
capacitors in 2.5" military SSDs|
to be or not to be?
to three seconds are 2 numbers which demonstrate some of the
extreme diversity in
SSD design. My examples here are the hold up times inside 2 current
models of 2.5" SATA SSDs designed for the
- One from Microsemi
(HQ in Aliso Viejo, CA, USA).
I've touched on this kind of architectural design
difference many times before in earlier articles. But every time I revisit this
vast topic with fresh examples - I learn something new.
- And the other is from Solidata (HQ in
InnoDisk survives abnormal power events|
|Editor:- September 18, 2013 - Adding to the
growing body of
SSD data integrity in the event of sudden power loss - InnoDisk today published
a new SSD white paper (pdf)
which outlines how its Power Secure Technology copes with abnormal power
failure - including inadvertent disengagement of a live drive.|
|A key assumption in InnoDisk's design is that
some data corruption is inevitable at the point when power is interrupted -
despite the best efforts of the hold up capacitors etc - because other parts of
the system - outside this power protected zone are also disturbed. So their
algorithms - on power up - begin by looking for such errors and data
inconsistencies and proceed to clean up and rebuild the mapping tables. ...read the article (pdf)|
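InnoDisk's paper is the place to read about its actual algorithms. As a generic illustration of rebuilding mapping tables on power up, this Python sketch assumes each programmed flash page carries its LBA, a write sequence number and a CRC in its metadata. The scan drops pages whose CRC doesn't match (interrupted programs) and keeps the newest valid copy of each LBA. All names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class PageRecord:
    physical_page: int
    lba: int
    seq: int        # global write sequence number stored with the page
    data: bytes
    crc: int        # CRC recorded when the page was programmed

def make_page(physical_page: int, lba: int, seq: int, data: bytes) -> PageRecord:
    """Helper for the example: a correctly programmed page."""
    return PageRecord(physical_page, lba, seq, data, zlib.crc32(data))

def rebuild_mapping(pages: Iterable[PageRecord]) -> Dict[int, int]:
    """Scan every page and return a fresh LBA -> physical page map."""
    newest: Dict[int, PageRecord] = {}
    for page in pages:
        if zlib.crc32(page.data) != page.crc:
            continue    # interrupted or corrupted program: ignore it
        current = newest.get(page.lba)
        if current is None or page.seq > current.seq:
            newest[page.lba] = page
    return {lba: rec.physical_page for lba, rec in newest.items()}

scan = [
    make_page(0, lba=5, seq=1, data=b"old"),
    make_page(1, lba=5, seq=2, data=b"new"),
    # Power cut mid-program: deliberately wrong CRC.
    PageRecord(2, lba=5, seq=3, data=b"torn", crc=zlib.crc32(b"torn") ^ 1),
    make_page(3, lba=9, seq=4, data=b"x"),
]
print(rebuild_mapping(scan))  # {5: 1, 9: 3}
```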
|now where was I - before I
was so rudely interrupted?|
|Editor:- February 21, 2013 -
WD has recently
published a new white paper -
Art of SSD Power Fail Protection (pdf)|
If you've read up on the
subject of Surviving
SSD sudden power loss you may already be aware that the WD team has been
working on this theme for over 9 years - and even promoted educational
whitepapers on this subject
using banner ads in
In 2004 I was told that getting the SSD
to work reliably even when the SSD is subject to unexpected rapid power rail
disturbances was one of the starting points of the original SiliconDrive
designers - due to one of the founders having had a bad experience with an
earlier prototype flash drive failing such a test at an oem presentation while
at another company.
So what can WD tell us about this subject that's new?
Well - without mentioning names - there have been many examples
of other SSD companies who have got this factor wrong - and some of the reasons
why simplistic power protection schemes fail are mentioned in this paper.
The key to validating a reliable SSD design is testing:- with variable types of
applied power line disruptions which are applied at any time in the SSD
software. WD aren't going to reveal all their hard won patented design
secrets in this white paper - but you can learn a lot from it which may help
you better evaluate other products too. ...read
the article (pdf)