11 Key Symmetries in SSD design - what they are and why you need to know

by Zsolt Kerekes, editor - StorageSearch.com - April 20, 2012

Many of the important and sometimes mysterious behavioral aspects of SSDs which predetermine their application limitations and usable market roles can only be understood when you look at how well the designer has dealt with managing the symmetries and asymmetries which are implicit in the underlying technologies which are contained within the SSD.

Whether the designer consciously realizes that they are making a design decision or not isn't the pertinent issue. The customer - who wants to use that particular SSD in their applications environment - has to operate within the boundaries set by those architectural symmetry limits.

Some symmetries are intuitively obvious - and have long been part of the filtering process in specifying SSD shortlists.

Other symmetries are not so obvious or commonsensical - but when you think about them in the context of - why does this SSD work better than another? - the symmetry architecture is often the simplest explanation behind complex operational characteristics of SSDs which would otherwise appear to be mysterious.

There are many different ways to design an SSD to suit the purposes of exactly the same market. And due to the different starting points which SSD designers have in their initial IP strengths and weaknesses - customers will see a bewildering range of design techniques which when blended together in different combinations create usable SSDs. There's much genuine disagreement about the best way to design SSDs and where to put them in the apps environment.

This article isn't about those differences. This article instead describes the key symmetries which can be used to comparatively describe or evaluate any type of SSD using any memory technology and any type of interface.

In an ideal world - symmetry considerations would be on page one of the - how to design an SSD cookbook.

The fact that I've only written this article after more than 10 years writing about SSDs and more than 20 years thinking about them - shows that the need for a symmetry based view of SSD design has only become apparent after reading about and mentally evaluating thousands of actual SSDs in all types of markets and being dissatisfied with the understanding I could convey to my readers by using other ways to describe aspects of SSD design.

Another reason for this late introduction is that some of the symmetry models are actually abstractions of design concepts which didn't exist in the market or didn't have jargon to describe them until recently. And my reason for talking about these symmetries is to provide a practical way for readers to filter through the chaotic range of product offerings which they will see in the real world - rather than inventing hypothetical problems for them to worry about.

Although I may add more symmetries to my list later - the key symmetries in SSD design that I will discuss in this article are as follows.

R/W symmetry
power up/down symmetry
scalability symmetry
fault symmetry
age symmetry
sequential order symmetry
application type symmetry
roadmap symmetry
environmental symmetry
adaptive intelligence flow symmetry
security symmetry

Each of these is defined and discussed in a separate box and linked articles below.

SSD design symmetry is a big subject - whose scope will reach into every important aspect of SSD architecture and use.

Therefore this article isn't the last word on the subject. Instead it's my initial shot at launching a series of articles which may make everything else I've ever written about SSDs seem just like the prologue.

R/W symmetry

What are the symmetries in read/write behavior?

This includes such elements as throughput, latency and related characteristics like R/W IOPS.

This is the best known of all SSD symmetries - which is why I place it at the top of my list - to get you all comfortable.

Classical enterprise storage such as hard drives and RAM SSDs had truly symmetrical behavior which was a result of the underlying technologies.

In contrast - nand flash (which is used in most SSDs on the market today - in varieties such as SLC, eMLC, MLC and TLC) has important R/W asymmetries which impact SSD operational behavior.

SSD designers work hard to disguise the strong implicit asymmetries (which are R/W life cycles and R/W latency) by many different design techniques.

You can read more in these classic articles:- SSD myths - endurance, sugaring MLC for the enterprise and the problem with write IOPS in flash SSDs.
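
One rough way to put a number on R/W asymmetry is to take the ratio of the read figure to the write figure for each metric. The sketch below is only an illustration: the function name and the sample figures are hypothetical and are not taken from any real product.

```python
def rw_asymmetry(read_value: float, write_value: float) -> float:
    """Return the larger figure divided by the smaller one (1.0 means symmetric)."""
    if read_value <= 0 or write_value <= 0:
        raise ValueError("measurements must be positive")
    return max(read_value, write_value) / min(read_value, write_value)


# Hypothetical datasheet-style figures as (read, write) pairs.
example = {
    "iops": (100_000, 20_000),
    "latency_us": (50.0, 500.0),
}

for metric, (read_value, write_value) in example.items():
    print(f"{metric}: asymmetry ratio {rw_asymmetry(read_value, write_value):.1f}x")
```

A ratio close to 1 is what you would expect from the classical symmetric technologies mentioned above. Larger ratios show how much asymmetry the designer has left visible to the user.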

....

power up/down symmetry

How long does it take for the SSD to power down safely?

And compare this to...

How long does it take for the SSD to power up and be ready?

Why worry? - Because SSDs which have bad asymmetries in this dimension prevent you from doing things which you'd like to do.

Here are some examples

ruin your day - an enterprise SSD which powers down correctly - but takes 40 minutes (or 6 hours) to power up and be ready - due to having to reload / rebuild internal data.

roadblock to new markets - if SSDs could power up as fast as they power down - they would enable new markets.

These subjects are discussed in my articles:- Surviving SSD sudden power loss, this way to the petabyte SSD, the Case for Fast Boot SSDs

....

sequential order symmetry

How sensitive is the performance of the SSD to factors like this

the up/down direction of R/W requests into address space?

out of order requests to the same (or closely adjacent) address space?
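
To probe these two factors you could visit exactly the same set of addresses in different orders and compare the results. The sketch below only generates the address sequences - it does no I/O - and the function name and parameters are hypothetical. You would feed the lists into whatever benchmark tool you already trust.

```python
import random


def request_orders(start_lba: int, count: int, seed: int = 0) -> dict[str, list[int]]:
    """Build three orderings of the same logical block addresses."""
    ascending = list(range(start_lba, start_lba + count))
    descending = ascending[::-1]               # same addresses, opposite direction
    shuffled = ascending[:]
    random.Random(seed).shuffle(shuffled)      # same addresses, out of order
    return {
        "ascending": ascending,
        "descending": descending,
        "shuffled": shuffled,
    }


orders = request_orders(start_lba=1000, count=8)
for name, lbas in orders.items():
    print(name, lbas)
```

Because all three lists cover an identical address range, any performance difference between them is due to ordering rather than to which data was touched.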

....

scalability symmetry

How well does the overall SSD storage system perform (operationally and competitively) if any one of the following are changed?

size of the data blocks in R/W operations

more SSDs - for example installed in the same server

10x more memory chips vs 10x fewer memory chips

100x more capacity

many more examples are possible

Block size related performance asymmetries are well known in the SSD world. Some SSDs are much better than others at handling different sizes of data requests and also managing mixed size requests sequentially. Companies which handle this well have different marketing phrases to describe this - such as "spike-free performance" or "predictable performance" etc.

But it's all too easy for users to accidentally walk into scalability symmetry traps - which relate to the number of SSDs in the system.

They try and buy an SSD for their server. It works great. They buy a few more - still doing OK.

A bit later down the road they may find that they can't fill all their server or rack slots - because of PCIe server load, or because the electrical power draw generates too much heat. So the original SSD wasn't as scalable as they thought.

Or maybe there are limits in the managing SSD software which mean that they can't manage it above a certain capacity limit? Or that the speedup benefits decline after adding more than a few more units?

A related article is:- Why size matters in SSD architecture
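
A simple way to spot the declining speedup described above is to compare measured throughput at each SSD count with ideal linear scaling from a single SSD. This is a minimal sketch: the function name is hypothetical and the throughput figures are invented for illustration.

```python
def scaling_efficiency(throughput_by_count: dict[int, float]) -> dict[int, float]:
    """Map SSD count to measured throughput divided by ideal linear throughput."""
    if 1 not in throughput_by_count:
        raise ValueError("a single-SSD baseline measurement is required")
    baseline = throughput_by_count[1]
    return {
        count: throughput / (count * baseline)
        for count, throughput in sorted(throughput_by_count.items())
    }


# Hypothetical aggregate throughput in GB/s for 1, 2, 4 and 8 SSDs in one server.
measured = {1: 1.0, 2: 1.9, 4: 3.2, 8: 4.0}

for count, efficiency in scaling_efficiency(measured).items():
    print(f"{count} SSDs: {efficiency:.0%} of linear scaling")
```

If the efficiency falls sharply as you add units - as it does in these made-up numbers - that is a hint that something other than the SSD (server load, software limits, power or heat) has become the real limit.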

....

fault symmetry

What is the performance of the SSD with zero faults? And how does this compare to the performance with recoverable faults of various degrees from light (such as bad blocks or recoverable ECC) - up to and including the failure of major subsystems?

There is usually a performance penalty which creeps out of the woodwork when the accrued level of survivable faults reaches a critical mass.

Does that mean performance drops off to 70% of its normal level? Or can it get down below 10%?

Both. These are real numbers for real enterprise products in the market today.

Related articles are:- SSD reliability and HA/ FT enterprise SSDs.

Fault symmetry can interact with other symmetries.

For example:- if an SSD powers down with a RAID-like fault - this may adversely impact the power up/down symmetry - depending on how good the rebuild / hot-spares design is.

....

age symmetry

How does the SSD performance change relative to the time it has been running

in this system?
with this workload?
out of the system?

You already know about degradation due to wear-out effects and classical MTBF. Although it's good to remember that wear-out depends on the specific type of nvm and workload and doesn't occur at all with RAM SSDs.

But another really important age symmetry is related to caching effects.

When an SSD (or the data set) is new - there's no data in the caches and no knowledge about what data is hot and what data is not. Read performance is worst. Write performance may be best. (For many reasons such as high availability of pre-erased blocks - a notorious factor in early flash SSD benchmarks.)

In all SSD caches and even more so in auto tiering / SSD ASAPs the length of time the system has spent with the data has a big impact on performance.

You may be thinking microseconds or milliseconds - but actually the performance can change (3x, 5x or 20x) when observed over time periods from tens of minutes to hours or days - depending on how well the caching system understands the behavior of the workload.
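
The warm-up effect can be illustrated with a toy model. The sketch below replays a skewed workload through a simple LRU cache and reports the hit rate in successive windows. Everything here is an assumption made for illustration: real SSD caches and auto tiering systems use their own (usually proprietary) policies, and the time axis here is counted in accesses rather than minutes or hours.

```python
import random
from collections import OrderedDict


def windowed_hit_rates(accesses, capacity, window):
    """Replay block addresses through an LRU cache and return the hit rate per window."""
    cache = OrderedDict()
    rates, hits = [], 0
    for i, block in enumerate(accesses, start=1):
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
        if i % window == 0:
            rates.append(hits / window)
            hits = 0
    return rates


def skewed_workload(n, hot_blocks=100, cold_blocks=10_000, hot_fraction=0.9, seed=1):
    """Hypothetical workload: most accesses go to a small set of hot blocks."""
    rng = random.Random(seed)
    accesses = []
    for _ in range(n):
        if rng.random() < hot_fraction:
            accesses.append(rng.randrange(hot_blocks))
        else:
            accesses.append(hot_blocks + rng.randrange(cold_blocks))
    return accesses


rates = windowed_hit_rates(skewed_workload(5_000), capacity=200, window=500)
for index, rate in enumerate(rates):
    print(f"window {index}: hit rate {rate:.0%}")
```

The first window starts with an empty cache, so its hit rate is lower than in later windows once the hot blocks have been seen. A benchmark which only samples the first window would tell a very different story to one which samples the last.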

Finally in this category of SSD symmetries - you may be wondering what do I mean by - Age out of the system?

That means the behavior of the SSD after spending some time in an unpowered state. Suppose you preload data onto an SSD (it may be an OS with apps for a notebook - or a backup data set - or the control codes for a cruise missile).

If you go back after a few days, a few months, 6 months or 2 years - you will see very large differences in the data integrity on that SSD. SSDs which have high data integrity in the always or mostly powered state may be terrible if they are left for a long time unpowered. For flash SSDs - the cause is the difference in remanence between SLC and various flavors of MLC. These weaknesses can be resolved in the powered up state by healing processes - but when the power has been off for a long time - the intrinsic defects can be significant.

Age - in or out of the system - is also a factor in some RAM SSDs.

Age - out of the system - symmetries are also linked to power up / down symmetry - discussed elsewhere in this article.

....

application type symmetry

How sensitive is the SSD's performance relative to different types of applications which you may throw at it?

Maybe you've tested a new SSD for a critical project. You were happy with it. Then you bought some more for what you considered to be a very similar role. But the SSD performance wasn't as good as you expected.

You didn't know it at the time - but you now suspect that something about the SSD makes it work better on some workloads or in some racks than others. These apps asymmetries can be

interface type - such as FC-SAN vs iSCSI

apps type such as OS type, database vs web vs email vs VDI vs video on demand

The cause is usually related to caching assumptions, write amplification and other factors inside the SSD controller having been tweaked and optimized for one specific type of popular benchmark to make the SSD look good for marketing purposes - without adequate retesting to see how valid those assumptions were for a diverse range of apps.

Experienced buyers know that the indication of compatibility on a vendor's web site is no guarantee of its actual availability or quality (if it exists). This is especially true when it comes to how well the SSD will operate with different operating systems.

Some SSD solutions only work well with a single OS. Others work well for one OS and are mediocre or poor with others.

Now if you only ever think you're going to use one OS and one major apps type forever you may think that this type of symmetry isn't important for you. But think about what happens if the sole OS supplier drops or changes features in their product which were heavily relied on by the SSD designer who only supports that OS. An SSD with good apps symmetry - with multiple OS support - will not be so impacted - because their design has better internal symmetries.

....

roadmap symmetry

You like this SSD. It passes all your tests. Then a year or so later the follow on models from the same company just don't look competitive at all. You have to start your vendor compatibility testing all over again.

In SSD market history there are countless examples of companies whose products outshone all the others at one point in time. Then a few years later they become almost irrelevant. They are what is known as - one trick ponies.

When choosing an SSD supplier you need to make a judgement about how well their SSD IP and skills will hold up over time when challenged by factors like:-

changing memory types and process geometries

over reliance on roadmap success in a critical part of the design which they don't control such as an externally sourced embedded microprocessor or host interface

Like a VC - you're trying to judge how well the company will adapt to changes in the market over time. So you have to look at their track record - or if they're a new company - ask yourself questions like

do they have a credible plan for a 10x faster or 10x cheaper product?

do they have a plan which works in the memory generation 2-3 years down the road?

....

environmental symmetry

How well does the SSD perform when the physical environment is changed?

Performance and operation can be sensitive to a variety of environmental stress factors such as:-

temperature
vibration
humidity
RFI / EMC
ionizing radiation
altitude

These are standard considerations in the selection of industrial SSDs and military SSDs.

You may think you don't need to worry about it in the controlled environment of a server data center. But it can impact you there too - if you make the wrong assumptions.

For example - in the past some vendors attached peltier effect heat sinks to the fastest CPU chips to freeze them to get ultimately fast performance. Should you freeze or super cool your SSD?

That depends on the memory technology. It may actually make it slower. If there's a range of temperatures where your flash SSD runs faster or more reliably - should you ask your vendors what it is?

....

adaptive intelligence flow symmetry

How adaptive is the SSD behavior to changes in itself?

Also - how bidirectional (or multi-directional) is the ability of the SSD - when it learns one attribute of its internal state or data - to pass that knowledge or inference to another part of the SSD which can use it to optimize another aspect of performance or reliability?

All SSDs rely on processing data about the quality of the memory as part of their normal data integrity operations. They wouldn't work without it.

But some companies have SSD IP sets in which knowledge about different parts of the SSD can be optimized and fed back to control and enhance SSD functionality over and beyond the standard accepted SSD function block boundaries.

Here are some examples:-

PCIe SSD market - Fusion-io

All the intelligence for managing the flash is handled by the same software stack which talks to applications. The up/down flow of intelligence about the FTL and the application data is really just a different view seen by the same host processor. The ability to have data access for apps at the same latency level as raw flash management creates many opportunities to optimize system level behavior. ...read more

SSD controller IP market - DensBits

Intelligence flow symmetry is an essential requirement of adaptive R/W DSP flash care. The IP core from DensBits is a good example of what can be done when knowledge gleaned from one view is fed into controlling the actions of another.

rackmount SSDs market - Skyera

Every piece of knowledge about the memory or software state within a petabyte scale enterprise SSD rack which might be leveraged to improve reliability, efficiency or performance seems to be analyzed and leveraged within the design of Skyera's rackmount SSDs.

industrial SSD market - InnoDisk

InnoDisk's FlexiRemap - deployed in many of InnoDisk's COTS SSDs - leverages many years of insights into the interactions of the FTL with standard software. This has led to a design approach within the firmware in which the responsibility for flash management is partitioned between the CPU in the controller and the host CPU within the driver stack. These 2 levels collaborate: the lower level passes up data about raw conditions and the upper level passes down commands to trigger certain actions inside the SSD.

security symmetry

How easy is it to set up and enforce data security in the SSD?

How hard is it to defeat that security by using SSD recovery techniques?

What's the performance overhead of applying different levels of security?

For related articles see:- fast purge SSDs and storage security.

How important are SSD symmetries?

In the modern era of the SSD market - users have shown an admirable willingness to grapple with many difficult concepts related to SSD design. They know that increasing their education and knowledge about SSDs is their safest defense in a fast changing disruptive market. That market is still experimenting with revolutionary new SSD designs which intrinsically go beyond the limits of proven reliability and safe design rules as part of how they advance the SSD envelope (unlike 40 years of evolutionary processor chip and RAM design).

Driving this quest for deep SSD knowledge are 2 factors.

One is the tacit belief that SSDs can have magnifying effects on enterprise performance and cost. So users can't afford to ignore SSDs.

But the other side to the SSD knowledge quest is fear of getting it wrong.

Users know from what they read on the web that they can't blindly trust SSD suppliers to get things right. Too many SSD vendors clearly don't understand the SSD apps environment or SSD technology as well as they should. Many SSD vendors have come into the SSD market because it's a fast growing bubble with few established market leaders and declining barriers to entry.

I'd like to think that in the future - the symmetry view of SSD architecture - introduced in this article - will become one of the ways that serious people discuss SSDs.

And as I hinted before - "eleven" (the launch number of this key SSD symmetries article) isn't carved in stone. I'll be adding more symmetries and more linked articles later.

thanks for reading this - Zsolt Kerekes, editor


The simple idea - that one new SSD thing can replace one old SSD thing - is rarely as simple as the advocates of the new thing say.

PCIe SSDs versus memory channel SSDs

....

"I still can't see your reflection!"
- shouted Megabyte to Terrorbyte
... from a safe distance.

....

I think "SSD symmetry" is the next "endurance" - something that many SSD articles will be talking about

....

....

"There is an asymmetry in the flash's read/write/erase operations, where read and write are in the unit of page size, usually 8KB, and erase is in the unit of block size, usually 2MB, i.e., 256 times larger.

To reclaim a block, the FTL must copy all of the valid data in a block to an erased block before the erase operation can be carried out, resulting in amplified write operations.

It is well known that amplified writes, though necessary with the asymmetry between write and erase units, can significantly degrade the device's performance and demand a substantial portion of the device's raw space to accommodate out-of-place writes to achieve acceptable performance, as well as reduce the life of the flash by the amplification factor."

Baidu's SDF: Software-Defined Flash for Web-Scale Internet Storage Systems (pdf) (March 2014)

Editor's comments:- Baidu found that by modifying standard SSDs to be compatible with its workload optimized Software-Defined Flash SDF - which changes some of the management methods in the controller - the result was 2x the usable flash capacity and 3x the I/O bandwidth.
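
The page and block arithmetic in the quote above can be worked through in a few lines. The sketch uses the 8KB page and 2MB block sizes from the quote. The write amplification formula is a simplified steady-state model of my own choosing - it assumes every reclaimed block holds the same number of valid pages - and it is not Baidu's method. The function name is hypothetical.

```python
PAGE_BYTES = 8 * 1024                          # page size from the quote
BLOCK_BYTES = 2 * 1024 * 1024                  # erase block size from the quote
PAGES_PER_BLOCK = BLOCK_BYTES // PAGE_BYTES    # 256 pages per erase block


def write_amplification(valid_pages: int) -> float:
    """Flash pages written per host page when each reclaimed block holds valid_pages."""
    if not 0 <= valid_pages < PAGES_PER_BLOCK:
        raise ValueError("valid_pages must be between 0 and PAGES_PER_BLOCK - 1")
    # Reclaiming a block copies its valid pages elsewhere, which leaves
    # (PAGES_PER_BLOCK - valid_pages) pages free for new host writes.
    host_pages = PAGES_PER_BLOCK - valid_pages
    # Total flash writes = copied pages + host pages = PAGES_PER_BLOCK.
    return PAGES_PER_BLOCK / host_pages


for valid in (0, 128, 192, 230):
    print(f"{valid} valid pages -> write amplification {write_amplification(valid):.2f}")
```

In this model a block which is half full of valid data when it is reclaimed costs 2 flash writes for every host write, and a block which is three quarters full costs 4. That is why spare raw capacity matters so much: it lets the FTL wait until blocks hold fewer valid pages before reclaiming them.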

90% of the enterprise SSD companies which you know have no good reasons to survive

In one of the most highly read articles on StorageSearch.com in recent years - I looked at drivers, mechanisms and routes towards consolidation in the enterprise SSD systems market along with some other outrageous and dangerous ideas. The conclusion?

"90% of the enterprise SSD companies which you know have no good reasons to survive."

Before publication - I discussed these ideas with various readers for about 3 months and since publication you won't be surprised when I tell you it has been at the core of many conversations since. ...read the article

It's easy to see where you've gone wrong.

You won't need a petabyte of SSD to replace a petabyte of HDD.

With the new SSD software and efficient SSD server architectures I estimate 1PB of raw SSD will do the same job as 50PB of raw HDD.

And the replacement ratios will slash away at the installed flash base too.

This has already been happening in web scale installations - denting vendors' revenue expectations.

meet Ken - and the enterprise SSD software event horizon

....

....

Cache jitter and latencies are more than simply performance quality issues - they can be the root of security vulnerabilities too.

side-channel attack breaches Amazon's cloud server security walls (Oct 2015)

....

....

Kaminario recommends you read this article

Editor:- June 15, 2012 - I discovered that Gareth Taube, VP of Marketing at Kaminario published a new blog in which he recommends my article about SSD Symmetries.

Gareth says "Flexibility, such as being able to integrate multiple memory technologies into a single box (like Kaminario's K2-H), is going to be increasingly important to customers who want efficiency and customization options. This is especially true because there are many memory innovations coming on the near horizon." ...read Gareth's blog

Editor's comments:- when I was writing the symmetry article one of the things I had in mind to do was to put more examples in it. Then I realized that having lots of examples would simply make the article unreadable.

One of the examples I was going to use for good roadmap symmetry (but then forgot to put anywhere) was in fact Kaminario - because they can leverage off whatever Fusion-io (or later SanDisk) does with flash memory and furthermore Kaminario can also leverage off whatever server makers do with CPUs and RAM.

Roadmap symmetry is a long term consideration - important for big users who don't like supplier churn and important for VCs and investors too.

...Later:- the kind of roadmap symmetry which I referred to above - is also the idea behind vendors in the software defined storage market - for example Maxta, Nutanix, etc.

They don't have to worry about the details of flash trends - because their software and architecture leverages commodity enterprise SSDs - thereby avoiding having to make correct long range guesses in the enterprise SSD box riddle game.

....

....

big idea #2

There's no single best place to locate all the IO and management intelligence of a big SSD.

What were the big SSD ideas of 2015?

....

....

After more than 23 years of publishing SSD guides I thought it not unreasonable to consider this...

can we look forward to stability in SSD architecture and a slowdown in memory market disruptivity?

are we there yet? - the long view in mid 2017