zero to three seconds - aspects of extreme diversity in SSD design
by Zsolt Kerekes, editor - StorageSearch.com - March 23, 2015
An ever-present tension in SSD design has always been that making one thing better can result in something else inevitably getting worse.
In a past article - 11 SSD design symmetries - I showed why stretching an SSD specification in one direction to optimize behavior for one beloved application can render the same SSD much less suitable for other starring roles.
That's one reason why there
can never be such a thing as the "perfect SSD".
And if
anyone tries to tell you otherwise - you know that you understand this a lot
better than they do.
I always find it interesting to think about the extreme cases of SSD design (and boundary conditions in markets) because you can learn a lot about what to expect to see in the market by pushing parameters to their limits.
You can do
this in your imagination. It's rarely affordable (or advisable) to test all
these extreme limits in the lab (or in your own systems).
Luckily the
market provides us with many examples we can learn from.
Power hold up time in 0 to 3 seconds
This note - the 0 to 3 seconds part
- is about the range of power hold up times I've seen in the market - in the
context of rugged / military 2.5" SATA SSDs.
Most (but not all) flash SSDs use internal capacitors to hold up the power island of the flash array and controller, which enables the SSD to save its state and stash data and metadata securely in the event of sudden power loss.
Generally -
the bigger the hold up capacitance, and the longer the hold up time - the less
likely it is that power line disturbances will corrupt data.
The firmware in the SSD controller, the type of flash used, and the size and technology makeup of the internal cached data together determine how long that minimum hold up time needs to be. And hold up isn't the only concern - line noise can be too. For more, see the papers linked in this article.
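The link between capacitance and hold up time can be made concrete with the standard energy balance for a capacitor: the usable energy is the stored energy between the charged voltage and the lowest voltage at which the SSD's regulator still works, and dividing that by the load power gives an estimate of hold up time. The sketch below assumes a constant-power load and a fixed converter efficiency; the function name and all the example values (voltages, load power, efficiency) are illustrative assumptions, not figures from any vendor mentioned here.

```python
def hold_up_time_s(capacitance_f: float, v_charged: float, v_min: float,
                   load_w: float, efficiency: float = 0.9) -> float:
    """Estimate hold up time (seconds) for a capacitor bank feeding a constant-power load.

    Usable energy between the charged voltage and the regulator's dropout voltage:
        E = 0.5 * C * (v_charged**2 - v_min**2)
    Only E * efficiency reaches the SSD, so t = E * efficiency / P.
    """
    if not 0 < v_min < v_charged:
        raise ValueError("need 0 < v_min < v_charged")
    if load_w <= 0 or not 0 < efficiency <= 1:
        raise ValueError("load_w must be positive and efficiency in (0, 1]")
    usable_j = 0.5 * capacitance_f * (v_charged ** 2 - v_min ** 2)
    return usable_j * efficiency / load_w


# Assumed numbers: a single 3F cell charged to 2.7V, usable down to 1.5V,
# feeding a 2.5W load through an ideal (lossless) converter.
print(f"{hold_up_time_s(3.0, 2.7, 1.5, 2.5, efficiency=1.0):.3f} s")
# For contrast: 100uF of bulk capacitance on a 5V rail that must stay above 4.5V.
print(f"{hold_up_time_s(100e-6, 5.0, 4.5, 2.5, efficiency=1.0) * 1e6:.1f} us")
```

With these assumed inputs the first case comes out at about 3 s and the second at about 95 µs, which is why a multi-second hold up time implies a large, farad-class capacitor rather than ordinary board-level bulk capacitance.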
the 3 seconds case
An extremely long example of hold up time is the Rana - a 2.5" rugged SSD from Solidata - which has the longest hold up time I'm aware of in a device this size and with this type of capacity, ruggedness and performance.
The hold up time is about 3 seconds!
But such a long hold up time, due to a big capacitor (3F), also has an impact on what happens at power up: it lengthens the time needed to reach the power up ready state, and it requires control of the charging current.
So I asked Solidata about
that aspect of the design.
Solidata said their Rana SSD includes a protection circuit to avoid
the current surges and it will take 2 to 4 seconds for the power on to ready
state.
That's similar to the power on ready time for hard drives -
and in most applications will be OK.
On the other hand - if you have
a military application which needs very fast cold boot - then this is not the
SSD for you.
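The power up side of this trade-off follows from the same capacitor. If the charging current is held at a constant limit I, the voltage rises at I/C, so reaching a target voltage takes t = C·ΔV / I; without a limiter, an empty capacitor briefly behaves like a short circuit and the initial surge is bounded mainly by the source voltage divided by the series resistance in the path. The sketch assumes an ideal constant-current limiter; the current limit, target voltage and series resistance are hypothetical values, not details of Solidata's protection circuit.

```python
def charge_time_s(capacitance_f: float, v_target: float, current_limit_a: float,
                  v_start: float = 0.0) -> float:
    """Time to charge from v_start to v_target with an ideal constant-current limiter.

    dV/dt = I / C, so t = C * (v_target - v_start) / I.
    """
    if current_limit_a <= 0 or v_target <= v_start:
        raise ValueError("need a positive current limit and v_target > v_start")
    return capacitance_f * (v_target - v_start) / current_limit_a


def initial_surge_a(v_source: float, series_resistance_ohm: float) -> float:
    """Initial current into an empty capacitor with no limiter (I = V / R at t = 0)."""
    return v_source / series_resistance_ohm


# Assumed values: 3F charged to 2.7V through a 2.5A limit, versus a 2.7V source
# connected directly through 50 milliohms of total series resistance.
print(f"charge time with limiter: {charge_time_s(3.0, 2.7, 2.5):.2f} s")
print(f"unlimited initial surge: {initial_surge_a(2.7, 0.05):.0f} A")
```

Under these assumptions the limited charge takes a few seconds, the same order as the 2 to 4 seconds Solidata quoted, while the unlimited case would briefly draw tens of amps. A real design also has to decide at what capacitor voltage the SSD may declare itself ready.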
Another question which arises in capacitive hold up systems is: what are the failure modes of the capacitor?
The important failure modes are:-
- fail to open circuit - in which case the failed capacitor no longer provides the hold up protection (which in some designs is mitigated by a parallel array of caps).
- fail to short circuit - which requires current limiting protection.
Some vendors choose caps which, due to their internal layout and materials, are known to fail in a particular mode - open circuit - which is the easiest mode to design around.
I didn't ask
Solidata about this aspect of their design - but there's nothing to stop you
asking them - if you need to know.
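The parallel-array mitigation for open-circuit failures can be reasoned about with a simple model. If each of n identical capacitors fails open independently with probability p over some period, the number of survivors is binomially distributed, and the bank still meets its hold up target as long as enough of them survive. The sketch below assumes independent failures and models only the open-circuit mode (a short-circuit failure needs current limiting protection instead); the function names, the 1% failure probability and the "any 3 of 4" sizing rule are hypothetical.

```python
from math import comb


def remaining_capacitance_f(n_caps: int, cap_each_f: float, n_failed_open: int) -> float:
    """Capacitance left in a parallel bank after some caps fail open (they simply drop out)."""
    if not 0 <= n_failed_open <= n_caps:
        raise ValueError("n_failed_open must be between 0 and n_caps")
    return (n_caps - n_failed_open) * cap_each_f


def prob_enough_survive(n_caps: int, k_needed: int, p_fail_open: float) -> float:
    """P(at least k_needed of n_caps still work), assuming independent open-circuit failures."""
    p_ok = 1.0 - p_fail_open
    return sum(comb(n_caps, k) * p_ok ** k * p_fail_open ** (n_caps - k)
               for k in range(k_needed, n_caps + 1))


# Hypothetical comparison: one cap that must survive, versus four caps sized so
# that any three of them still provide the required hold up time.
print(f"single cap:  {prob_enough_survive(1, 1, 0.01):.6f}")
print(f"3 of 4 caps: {prob_enough_survive(4, 3, 0.01):.6f}")
```

With a 1% per-cap failure probability the redundant bank comes out at about 0.9994 against 0.99 for the single cap, but only because the model ignores short-circuit failures and the extra parts needed to contain them.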
the 0 seconds case
This is the type of design in which no hold up capacitors (or batteries) are assumed in the design of the data integrity system. (Any capacitance in or around the SSD is purely for electromagnetic compatibility (EMC) and not for power fail protection.)
An example of this is the TOR from Microsemi, which I wrote about when the product was launched in March 2011. (In 2016 the product line and business unit were acquired by Mercury Systems.)
When I interviewed Jack Bogdanski, Director of New Product Development, in 2011, he didn't want to discuss the exact way they'd solved this design problem. But my guess (based on my experience and judgement) is that the SSD probably uses a combination of 2 (or more) things:-
- a small amount of raw fast write nvm (distinct from flash) for the write activity metadata, and
- a guardian angel state machine which filters all R/W activities to check that internal transactions which have started are known to have completed.
The initial assumption saved for every transaction is always that it is incomplete - unless proved otherwise. And if the activities haven't completed, the state machine rolls back to the last raw saved data fragments and resumes the rebuild in flash until it's known to be good. As you can imagine, you also have to ensure that all data in flight is controlled and monitored.
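The guessed scheme is close in spirit to write-ahead journaling, and a toy model helps show why the "presumed incomplete" rule matters. In the sketch below a dictionary stands in for the small fast-write nvm holding per-write records; each record is created in the not-yet-committed state before flash is touched, and is only marked committed after the flash write and mapping update finish. On recovery, anything not committed is rolled back to the previous known-good page. This is a hypothetical roll-back variant written for illustration, not Microsemi's design: it treats the mapping table as persistent, ignores garbage collection and wear leveling, and every class and method name is invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class PowerLoss(Exception):
    """Raised to simulate power disappearing part way through a write."""


@dataclass
class LogRecord:
    lba: int
    old_page: Optional[int]  # last known-good physical page, if any
    new_page: int
    committed: bool = False  # every transaction starts out presumed incomplete


class GuardedStore:
    """Toy flash translation layer guarded by a tiny write-metadata log in fast nvm."""

    def __init__(self) -> None:
        self.flash: Dict[int, bytes] = {}        # physical page -> data
        self.mapping: Dict[int, int] = {}        # logical block -> page (treated as persistent)
        self.nvm_log: Dict[int, LogRecord] = {}  # per-block records in the small nvm
        self.next_page = 0                       # old pages are never erased in this sketch

    def write(self, lba: int, data: bytes, lose_power_before_commit: bool = False) -> None:
        new_page = self.next_page
        self.next_page += 1
        # 1. Save the intent first, marked incomplete.
        self.nvm_log[lba] = LogRecord(lba, self.mapping.get(lba), new_page)
        # 2. Program flash out of place and point the block at the new page.
        self.flash[new_page] = data
        self.mapping[lba] = new_page
        if lose_power_before_commit:
            raise PowerLoss()
        # 3. Only now is the transaction known to be complete.
        self.nvm_log[lba].committed = True

    def recover(self) -> None:
        """Run at power up: undo every transaction that was not proved complete."""
        for rec in self.nvm_log.values():
            if rec.committed:
                continue
            self.flash.pop(rec.new_page, None)
            if rec.old_page is None:
                self.mapping.pop(rec.lba, None)
            else:
                self.mapping[rec.lba] = rec.old_page
        self.nvm_log.clear()

    def read(self, lba: int) -> bytes:
        return self.flash[self.mapping[lba]]


store = GuardedStore()
store.write(7, b"v1")
try:
    store.write(7, b"v2", lose_power_before_commit=True)
except PowerLoss:
    store.recover()
print(store.read(7))
```

The final read returns the old `b"v1"` data: the interrupted second write is discarded because its record was never marked committed. A real design would also have to make each log update itself atomic, which is one plausible reason to keep that metadata in a small byte-addressable nvm rather than in flash.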
A scheme like this can be built - if you have control of all the firmware inside the SSD, and if you invest a lot of work.
How much work?
Microsemi told me (in 2011) that a team of 5 people had worked for nearly 2 years on the SSD power management.
And I suspect they haven't stopped working on it, as they've added more security features in recent years, each of which carries its own data integrity burdens.
The only downside of this design approach (aside from the need to create the design IP and patents) is that a design which truly works with zero capacitive hold up time won't give you the highest data throughput - unless you scale up the alternative nvm, which you can't do in a very small footprint.
Another thing is that Microsemi's design is skinny, which means the ratio of its nvm cache registers to flash capacity is close to zero.
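One way to see why throughput depends on the amount of alternative nvm is Little's law: the amount of data in flight equals throughput multiplied by the time each write takes to become safe in flash. If every in-flight byte must be covered by nvm (because there is no capacitor to finish the job), the nvm capacity caps sustained write throughput. The sketch assumes exactly that simplified model; the function name, the nvm sizes and the 1 ms commit latency are hypothetical, not figures for the TOR.

```python
def max_protected_write_mb_s(nvm_bytes: int, commit_latency_s: float) -> float:
    """Upper bound on sustained write throughput (MB/s, 1 MB = 10**6 bytes) when every
    byte in flight must be held in nvm until flash confirms it.

    Little's law: bytes_in_flight = throughput * latency, and bytes_in_flight <= nvm_bytes,
    so throughput <= nvm_bytes / latency.
    """
    if nvm_bytes < 0 or commit_latency_s <= 0:
        raise ValueError("need nvm_bytes >= 0 and a positive commit latency")
    return nvm_bytes / commit_latency_s / 1e6


# Assumed: 1 ms for a write to be safely committed in flash.
for size in (64 * 1024, 1024 * 1024):
    print(f"{size // 1024:5d} KiB of nvm -> at most "
          f"{max_protected_write_mb_s(size, 1e-3):8.1f} MB/s")
```

Under these assumptions a skinny design with a few tens of KiB of nvm is limited to tens of MB/s, and only scaling up the nvm (which costs board area) raises the ceiling.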
Later (in 2017):-
Application notes by Everspin described how its ST-MRAM has been used in some SSD designs to protect data in flight and remove the need for large capacitors. "The use of ST-MRAM enables improving the power fail window from 100mS in the case of NAND Flash to less than 10µS." This demonstrates the principles involved in using a small NVRAM alongside a flash array.
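The capacitor saving implied by that quote can be estimated by turning the usual energy balance around: to carry a load of P watts for a window t while the rail droops from V1 to V2, you need C = 2Pt / (V1² − V2²). Because C scales linearly with t, shrinking the window from 100 ms to 10 µs cuts the required capacitance by a factor of 10,000, whatever the other values are. The function name, load power and rail voltages below are assumptions for illustration only.

```python
def required_capacitance_f(load_w: float, window_s: float, v_start: float, v_min: float) -> float:
    """Capacitance needed to supply load_w for window_s while the rail falls from v_start to v_min.

    From 0.5 * C * (v_start**2 - v_min**2) = load_w * window_s (lossless conversion assumed).
    """
    if not 0 < v_min < v_start:
        raise ValueError("need 0 < v_min < v_start")
    return 2 * load_w * window_s / (v_start ** 2 - v_min ** 2)


# Assumed: 2.5W drawn from a 5V rail that must stay above 4.5V.
nand_window = required_capacitance_f(2.5, 100e-3, 5.0, 4.5)
mram_window = required_capacitance_f(2.5, 10e-6, 5.0, 4.5)
print(f"100 ms window: {nand_window * 1e3:.1f} mF")
print(f" 10 us window: {mram_window * 1e6:.1f} uF")
```

With these assumed values the 100 ms window needs around a tenth of a farad, while the 10 µs window needs roughly 10 µF, which is within reach of ordinary ceramic capacitors.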
In a solo SSD context, the fewer components you start with, the more reliable the design. But there can be different considerations in an FT (fault tolerant) SSD array.
For more about reliability modeling problems see - FITs - data architecture and flaws in component based SSD failure analysis
summary
Even
within the narrow space of military oriented 2.5" SATA SSDs - you can find
a wide range of differences in design approaches in nearly every aspect of the
design.
I've used the power hold up architecture as my example today. But there are other aspects I could have chosen.
In any single
project - it's extremely unlikely that you'd be looking at the 2 SSDs above
for the exact same application - because of other factors such as:- power
consumption, IOPS performance, location of suppliers, and longevity of
supply - all of which would take precedence.
But I had noted these differences (which are directly due to differences in the RAM cache flash architecture) in my own reading before, and I was reminded of them today by an email from a reader who is designing power loss protection for another market and a different form factor.
For more about this subject see my article and related resources in - Surviving SSD sudden power loss
...Later:- prompted by my many questions, Clark Yu, R&D Engineer at Solidata, provided more details about the thinking behind the Rana military SSD in this article - Integral Power Loss Protection in Solidata's Rana series military SSD. The key point here is that the design uses "an industrial grade 3F capacitor."
Designers of military and secure industrial systems for whom SLC is the only flash memory good enough - but who also need higher capacities in their 2.5" SATA slots - have, until recently, had little choice but to consider SSDs with significant internal capacitor holdup for their toughest designs. And that, in turn, means a complex qualification process and really getting to know the ad hoc internal details of SSD architectures and related firmware, which might well change considerably over the lifetime of their projects.
new MIL SSD for those who loathe supercaps (July 16, 2015)
In 2011 SMART said they didn't think supercaps were reliable enough for enterprise SSDs because...
"For every 10 degrees C of ambient operating temperature rise, the life expectancy of a supercapacitor can be cut approximately in half."
So instead they used NbO capacitors in an array in their XceedIOPS2.
SSD news story - April 2011
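The rule of thumb in SMART's quote amounts to an exponential derating: life roughly halves for every 10°C rise. The sketch below applies it as L(T) = L_rated · 2^((T_rated − T)/10). The function name and the 1,000 hour rating at 65°C are hypothetical, and applying the rule far from the rated temperature (or below it) is an extrapolation; the specific part's datasheet is the authority.

```python
def derated_life_h(rated_life_h: float, rated_temp_c: float, operating_temp_c: float) -> float:
    """Apply the 'life halves per 10 C rise' rule of thumb to a rated supercap life."""
    return rated_life_h * 2 ** ((rated_temp_c - operating_temp_c) / 10)


# Hypothetical rating: 1,000 hours at 65 C.
for temp_c in (45, 65, 85):
    print(f"{temp_c} C -> about {derated_life_h(1000, 65, temp_c):,.0f} hours")
```

By this rule a 20°C rise in ambient temperature costs a factor of four in expected life, which helps explain why a design for hot enclosures might avoid supercaps altogether.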
Viking's DDR-3 flash-backed DRAM DIMM - the ArxCis-NV - relied for power fail protection on an optional external 25F supercap pack.
SSD news story - October 2011
"Power fail protection
is a differentiator for embedded SSDs, and many vendors tout solutions. However,
developing effective power fail protection is as much an art as it is a science,
and is not a trivial endeavor."
The above quote comes from a 2013 paper - The Art of SSD Power Fail Protection (pdf) - by WD.
Among other things it provides the justification and marketing support framework for a technology called PowerArmor - which WD acquired when it bought SiliconSystems.
SiliconSystems was the first SSD company to invest in user education and branding about surviving sudden SSD power loss - and ways of testing such schemes in system designs.
You can judge how much importance they attached to the
awareness of these reliability issues by the fact that they ran expensive
banner ads here on StorageSearch.com from 2005 to 2012 - to promote their
reliability whitepapers.
You can still see these old ads in my article - the cultivation and nurturing of "reliability" in a 2.5" SSD brand
The willingness to offer customization and professional design engineering support opens doors to valuable customers who are leaders in their own vertical markets but whose unit volumes are too small to be of interest to high volume standard SSD vendors.
some thoughts about SSD customization
Do the power up / down
tradeoffs in other types of non volatile memories provide a better applications
fit?
It's tempting to think that the grass is greener with nvms which
have faster write cycles and therefore shorter power holdup requirements.
But
aside from density constraints there are other systems problems which they
introduce.
ECC architectures can't simply be migrated from the DRAM and
nand/nor flash experience into newer emerging nvms. They have their own
problems.
The scale of these difficulties with soft error rates was discussed in an SSD news story in February 2017 - Soft-Error Mitigation for PCM and STT-RAM.