to be or not to be? hold up capacitor extremes in 2.5" MIL SSDs





To be? or Not to be?

hold up capacitors in 2.5" MIL SSDs

zero to three seconds - aspects of extreme diversity in SSD design

by Zsolt Kerekes, editor - StorageSearch.com - March 23, 2015

An ever present tension in SSD designs has always been - that making one thing better - can result in something else inevitably getting worse.

In a past article - 11 SSD design symmetries - I showed why stretching an SSD specification in one direction to optimize behavior for one beloved application can render the same SSD much less suitable for other starring roles.

That's one reason why there can never be such a thing as the "perfect SSD".

And if anyone tries to tell you otherwise - you know that you understand this a lot better than they do.

I always find it interesting to think about the extreme cases of SSD design (and boundary conditions in markets) because you can learn a lot about what to expect to see in the market - by pushing parameters to their limits.

You can do this in your imagination. It's rarely affordable (or advisable) to test all these extreme limits in the lab (or in your own systems).

Luckily the market provides us with many examples we can learn from.

Power hold up time in 0 to 3 seconds

This note - the 0 to 3 seconds part - is about the range of power hold up times I've seen in the market - in the context of rugged / military 2.5" SATA SSDs.

Most (but not all) flash SSDs use internal capacitors to hold up the power island of the flash array and controller, so that the SSD can save its state and stash data and metadata securely in the event of sudden power loss.

Generally - the bigger the hold up capacitance, and the longer the hold up time - the less likely it is that power line disturbances will corrupt data.

The firmware in the SSD controller, the type of flash used, and the size and technology make up of the internal cached data all factor into determining how long the minimum hold up time needs to be. And hold up isn't the only concern - line noise can be too. For more see the papers linked in this article.

the 3 seconds case

An extremely long example of hold up time is the Rana - a 2.5" rugged SSD from Solidata - which has the longest hold up time I'm aware of in a device this size and with this type of capacity, ruggedness and performance.

The hold up time is about 3 seconds!

But such a long hold up time - which comes from a big capacitor (3F) - also affects what happens at power up. Charging a capacitor of that size lengthens the power up ready time and requires control of the charging current.

So I asked Solidata about that aspect of the design.

Solidata said their Rana SSD includes a protection circuit to avoid the current surges and it will take 2 to 4 seconds for the power on to ready state.

That's similar to the power on ready time for hard drives - and in most applications will be OK.

On the other hand - if you have a military application which needs very fast cold boot - then this is not the SSD for you.
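To get a feel for how a 3F capacitor links hold up time to power up time, here is a rough back of envelope sketch. Only the 3F figure comes from Solidata. The capacitor voltages, the load power, the converter efficiency and the charge current limit are all my own assumptions for illustration, and the function names are hypothetical. Usable energy is taken as ½·C·(V_start² − V_min²), and charge time as C·V/I for a constant current limit starting from an empty capacitor.

```python
def holdup_seconds(cap_f, v_start, v_min, load_w, efficiency=0.9):
    """Seconds of hold up while the capacitor discharges from v_start to v_min."""
    usable_joules = 0.5 * cap_f * (v_start**2 - v_min**2)
    return usable_joules * efficiency / load_w


def charge_seconds(cap_f, v_target, limit_a):
    """Seconds to charge an empty capacitor to v_target at a constant current limit."""
    return cap_f * v_target / limit_a


# Assumed values (not from any datasheet): 2.7 V cell, usable down to 1.35 V,
# 2.5 W load during the flush, 90% converter efficiency, 2.5 A charge limit.
hold = holdup_seconds(cap_f=3.0, v_start=2.7, v_min=1.35, load_w=2.5)
charge = charge_seconds(cap_f=3.0, v_target=2.7, limit_a=2.5)
print(f"hold up ~{hold:.2f} s, charge ~{charge:.2f} s")
```

With these assumed numbers both figures land in the low seconds, which is the point: the same capacitance which buys a long hold up time also has to be charged through a current limit at power up.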

Another question which arises in capacitive hold up systems - is what are the failure modes of the capacitor?

The important failure modes are :-

fail to open circuit - in which case the failed capacitor no longer provides the hold up protection - (which in some designs is mitigated by a parallel array of caps).

fail to short circuit - which requires current limiting protection

fail either way

Some vendors choose caps which - due to their internal layout and materials - are known to fail in a particular mode - that is open circuit - which is the easiest mode to design around.
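The parallel array mitigation mentioned above can be sketched with simple arithmetic. This is a minimal model which assumes identical capacitors charged to the same voltage, so stored energy scales with the number of healthy parts. It covers open circuit failures only - a short circuit in a parallel bank is a different problem and needs the current limiting protection. The function names and the 0.5F part size are hypothetical.

```python
import math


def remaining_holdup_fraction(n_caps, n_failed_open):
    """Fraction of stored energy left when identical parallel caps fail open."""
    if not 0 <= n_failed_open <= n_caps:
        raise ValueError("failed count must be between 0 and n_caps")
    return (n_caps - n_failed_open) / n_caps


def caps_needed(required_f, cap_each_f, spare_failures=1):
    """Parts needed so the bank still meets required_f after spare_failures open failures."""
    return math.ceil(required_f / cap_each_f) + spare_failures


# Assumed example: a 3 F requirement built from hypothetical 0.5 F parts,
# sized to tolerate one open circuit failure.
n = caps_needed(required_f=3.0, cap_each_f=0.5, spare_failures=1)
print(n, remaining_holdup_fraction(n, 1))
```

The design choice here is to over provision by the number of failures you want to survive, which costs board space in a 2.5" form factor.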

I didn't ask Solidata about this aspect of their design - but there's nothing to stop you asking them - if you need to know.

the 0 seconds case

This is the type of design in which there are no hold up capacitors - or batteries - assumed in the design of the data integrity system. (And any capacitance in or around the SSD is purely for EMI compatibility - and not for power fail protection.)

An example of this is the TOR from Microsemi - which I wrote about when the product was launched in March 2011. (In 2016 the product line and business unit was acquired by Mercury Systems.)

When I interviewed Jack Bogdanski, Director of New Product Development in 2011 he didn't want to discuss the exact way they'd solved this design problem - but my guess (based on my experience and judgement) is that the SSD probably uses a combination of 2 (or more) things

a small amount of raw fast write nvm (distinct from flash) for the write activity metadata, and

a guardian angel state machine which filters all R/W activities to check that internal transactions which have started have been known to complete.

With the initial assumption always being saved as being incomplete - unless proved otherwise.

And if the activities haven't completed - rolling back to the last raw saved data fragments and resuming the rebuild in flash - until it's known to be good.
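The general idea in the guess above (record intent first, assume incomplete until proved otherwise, roll back on power up) can be illustrated with a toy model. This is emphatically not Microsemi's design, which was not disclosed. The class, its fields and the `power_fails` flag are all hypothetical, and a Python dict stands in for both the flash and the small fast nvm.

```python
from dataclasses import dataclass, field


@dataclass
class ToySSD:
    flash: dict = field(default_factory=dict)    # data keyed by logical block
    journal: dict = field(default_factory=dict)  # stands in for the small fast nvm

    def write(self, lba, data, power_fails=False):
        # Record the intent first, saved as incomplete by default.
        self.journal[lba] = {"old": self.flash.get(lba), "done": False}
        if power_fails:
            self.flash[lba] = data[: len(data) // 2]  # simulate a torn write
            return
        self.flash[lba] = data
        self.journal[lba]["done"] = True  # only now is the write proved complete

    def recover(self):
        """On power up, roll back every write which was not proved complete."""
        for lba, entry in list(self.journal.items()):
            if not entry["done"]:
                if entry["old"] is None:
                    self.flash.pop(lba, None)
                else:
                    self.flash[lba] = entry["old"]
            del self.journal[lba]


ssd = ToySSD()
ssd.write(7, b"good data")
ssd.write(7, b"newer data", power_fails=True)  # power is lost mid write
ssd.recover()
print(ssd.flash[7])  # b'good data'
```

A real controller would also have to protect the journal update itself and track data in flight, which is where the years of engineering effort go.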

As you can imagine - you have to ensure that all data in flight is controlled and monitored as well.

It can be done - if you have control of all the firmware inside the SSD. And invest a lot of work.

How much work?

Microsemi told me (in 2011) a team of 5 people had worked for nearly 2 years on the SSD power management.

And - I suspect they haven't stopped working on it - as they've added more security features in recent years - each of which carries its own data integrity burdens.

The only downside in this design approach (aside from the need to create the design IP and patents) is that a true zero capacitance hold up time compatible design won't give you the highest data throughput - unless you scale up the alternative nvm - which you can't do in a very small footprint.

And another thing is that Microsemi's design is skinny - which means the ratio of its nvm cache registers to flash capacity is close to zero.

Later (in 2017):- Application notes by Everspin described how its ST-MRAM has been used in some SSD designs to protect data in flight and remove the need for large capacitors. "The use of ST-MRAM enables improving the power fail window from 100mS in the case of NAND Flash to less than 10µS." This demonstrates the principles involved in using a small NVRAM alongside a flash array.
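A quick calculation shows why shrinking the power fail window removes the need for large capacitors. The two window lengths come from the Everspin quote above. The load power and the rail voltages are my own assumptions for illustration, and the function name is hypothetical. The capacitance needed to supply a constant load P for a time t while the rail droops from V_start to V_min is C = 2·P·t / (V_start² − V_min²).

```python
def capacitance_needed_f(load_w, window_s, v_start, v_min):
    """Farads needed to carry load_w for window_s while drooping from v_start to v_min."""
    return 2 * load_w * window_s / (v_start**2 - v_min**2)


# Assumed values: 2.5 W load, 5.0 V rail allowed to droop to 4.0 V.
windows = (("NAND flash, 100 ms", 100e-3), ("ST-MRAM, 10 us", 10e-6))
results = {}
for label, window_s in windows:
    results[label] = capacitance_needed_f(2.5, window_s, v_start=5.0, v_min=4.0)
    print(f"{label}: {results[label] * 1e6:.1f} uF")
```

Whatever load and voltages you assume, the required capacitance scales directly with the window, so the quoted change from 100mS to 10µS is a reduction of 10,000 to 1.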

In a solo SSD context the fewer components you start with the more reliable the design. But there can be different considerations in an FT SSD array. For more about reliability modeling problems see FITs - data architecture and flaws in component based SSD failure analysis

summary

Even within the narrow space of military oriented 2.5" SATA SSDs - you can find a wide range of differences in design approaches in nearly every aspect of the design.

I've used the power hold up architecture as my example today. But there are other aspects I could have chosen.

In any single project - it's extremely unlikely that you'd be looking at the 2 SSDs above for the exact same application - because of other factors such as:- power consumption, IOPS performance, location of suppliers, and longevity of supply - all of which would take precedence.

But I had noted these differences in my own reading before (which are directly due to differences in the RAM cache flash architecture) and I was reminded of them today by an email from a reader who is designing power loss protection for another market and a different form factor.

For more about this subject see my article and related resources in - Surviving SSD sudden power loss

...Later:- prompted by my many questions, Clark Yu, R&D Engineer - Solidata provided more details about the thinking in the Rana military SSD in this article - Integral Power Loss Protection in Solidata's Rana series military SSD. The key point here is that the design uses "an industrial grade 3F capacitor."


When the SSD socket fits - but the datasheet doesn't.

BOM control and the mythical "standard" industrial SSD


What did you say happens
when we run out of gas?

Surviving SSD sudden power loss

"We run 500 power on / off cycles on our industrial SSDs as part of the production process to ensure 100% security in case of any unexpected power fail. Most SSD manufacturers dont do this at all or test much fewer cycles."

RecaData in their blog - Curious about how our industrial grade SSDs are produced and tested in the factory? (July 28, 2016)

Designers of military and secure industrial systems for whom SLC is the only flash memory good enough - but who also needed higher capacities in their 2.5" SATA slots have - until recently - had little choice but to consider SSDs with significant internal capacitor holdup for their toughest designs. And that, in turn, means a complex qualification process and really getting to know the ad hoc internal details of SSD architectures and related firmware which might well change considerably over the lifetime of their projects.

new MIL SSD for those who loathe supercaps (July 16, 2015)

"No matter how much UPS you have...
power fails during writing to a page in NAND
are still possible."

Tony Pond, Virtium in his blog - which outlines the 3 main scenarios of flash data vulnerability at the instant of power voltage collapse (May 28, 2015)

In 2011 SMART said they didn't think supercaps were reliable enough for enterprise SSDs because...

"For every 10 degrees C of ambient operating temperature rise, the life expectancy of a supercapacitor can be cut approximately in half."

So instead they used NbO capacitors in an array in their XceedIOPS2.

SSD news story - April 2011
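The rule of thumb in the SMART quote above is easy to turn into numbers. This sketch takes the quote at face value: life halves for every 10 degrees C rise. The rated life and rated temperature in the example are hypothetical, not taken from any datasheet, and the function name is my own.

```python
def supercap_life_years(rated_life_years, rated_temp_c, ambient_temp_c):
    """Expected life if life halves for each 10 C rise above the rated temperature."""
    return rated_life_years * 2 ** ((rated_temp_c - ambient_temp_c) / 10)


# Assumed rating: 10 years at 25 C.
for ambient_c in (25, 55, 85):
    years = supercap_life_years(10, 25, ambient_c)
    print(f"{ambient_c} C: {years:.2f} years")
```

With that assumed rating the expected life falls from 10 years at 25C to 1.25 years at 55C and under 2 months at 85C, which is why operating temperature range matters so much in this choice.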

Viking's DDR-3 flash backed DRAM DIMM - the ArxCis-NV - relied for power fail protection on an optional external 25F supercap pack.

SSD news story - October 2011

how are big hold up caps used?

trading dc-dc converter architectures to suit different capacitor types in SSDs

In a 2012 brochure which describes its integrated power fail chips - Power Failure Protection for SSDs (pdf) - Lattice Semiconductor illustrates the traditional placement and role of big capacitors in enterprise SSD hold up schemes.

how healthy are the hold up capacitors?

Unigen's EnduraCharge™ Technology Power Failure Data Protection Scheme (pdf) includes a circuit which enables the host processor to test the health status of the SSD hold-up capacitors - and log the results - while it's in the deployed system.

"Power fail protection is a differentiator for embedded SSDs, and many vendors tout solutions. However, developing effective power fail protection is as much an art as it is a science, and is not a trivial endeavor."

The above quote comes from a 2013 paper - The Art of SSD Power Fail Protection (pdf) - by WD.

Among other things it provides the justification and marketing support framework for a technology called PowerArmor - which WD acquired when it bought SiliconSystems.

SiliconSystems was the first SSD company to invest in user education and branding about surviving sudden SSD power loss - and ways of testing such schemes in system designs.

You can judge how much importance they attached to the awareness of these reliability issues by the fact that they ran expensive banner ads here on StorageSearch.com from 2005 to 2012 - to promote their reliability whitepapers.

You can still see these old ads in my article - the cultivation and nurturing of "reliability" in a 2.5" SSD brand

The willingness to offer customization and professional design engineering support opens doors to valuable customers who are leaders in their own vertical markets but whose unit volumes are too small to be of interest to high volume standard SSD vendors.

some thoughts about SSD customization

Do the power up / down tradeoffs in other types of non volatile memories provide a better applications fit?

It's tempting to think that the grass is greener with nvms which have faster write cycles and therefore shorter power holdup requirements.

But aside from density constraints there are other systems problems which they introduce.

ECC architectures can't simply be migrated from the DRAM and nand/nor flash experience into newer emerging nvms. They have their own problems.

The scale of these difficulties with soft error rates was discussed in an SSD news story in February 2017 - Soft-Error Mitigation for PCM and STT-RAM.
