optimizing CPUs for use with SSDs in the Post Modernist Era of SSD and memory systems


optimizing CPUs for use with SSDs

in the Post Modernist Era of SSD and Memory Systems

by Zsolt Kerekes, editor - StorageSearch.com - March 22, 2017

As you may have guessed I talk to a lot of companies which design SSDs, and SSD controllers. From time to time I also discuss the SSD market with people who design processors and who are interested in how their processors can best interoperate with new memory intensive systems - either by supplying the CPU used in future controllers - or as the CPU which runs the applications.

You may be surprised (given the number of articles I've written about SSDs in the past 10 to 15 years) to learn that my conversations with designers of enterprise processors long predate my conversations with SSD companies. That was because of my work in the SPARC systems market - especially during the 1990s - when the most important tool in advancing the data economy - from the hardware point of view - was the evolution of faster and affordable processors for use in the servers which were creating the infrastructure of the emerging dotcom economy.

The first time I suggested to a processor design team that they should look at adding support for solid state storage in their new CPUs instead of just adding more cores was about 2000.

I got the response at that time - what's an SSD? And nothing more came of the matter.

I had seen the interplay of ultrafast solid state storage and CPU architectures and software in my previous work in real-time systems companies. And I was surprised that these trade-offs were not more widely deployed in commercial products. But as we can see from SSD history in 2001 / 2002 - when the computer market was in recession - it didn't seem so important that general purpose server CPUs had already hit their clock speed limits, and that hard drives had peaked their RPM, and that DRAM systems would thereafter choose the route of greater capacity (and inevitably worsening latency).

2003 marked the start of the modern era of the SSD market - from which point onwards the only product which would displace an SSD from its role was another better SSD or SSD software.

I've been thinking recently that maybe 2015 / 16 / 17 (it's not clear which yet) will mark another critical phase in the SSD market - which I'm tentatively thinking about as the Post Modernist Era of SSD and Memory Systems.

What do I mean by that?

It's more than the continuation of SCM DIMM wars which hotted up in 2015. DIMM wars are just part of this long term memory systems trend.

The simplest way to capture that feeling is something I said in the Q4 2016 edition of the Top SSD Companies...

"At the end of 2016 the SSD market reached a dominant technology position where its products defined everything important in the external computing environment. But innovation hadn't stopped and the SSD market had already begun looking inwards towards the memory space and outwards towards capturing data from a wider internet of things."

You can feel the Post Modernist Era of SSD in the air everywhere.

Momentum has been building during the past 4 years with signals coming from the appearance of memory channel SSDs, talk of in-situ SSD processing, and much practical rethinking about RAM architecture.

And as I indicated in All Flash Arrays - what next? (January 2017) - I think the next foreseeable end point will be that storage becomes less relevant as a product and will instead become a supported legacy emulation concept within persistent memory systems.

I'm still going to call those new persistent memory boxes "rackmount SSDs" - however - because they will be packaged in racks and I've stretched my use of the word "SSD" to include all big memory systems in which the characteristics of the deployed resource are shaped by controller IP and software rather than by the raw cell characteristics of the memory chips.

This - for me - is simply a continuation of the SSD everywhere idea. And I don't think we need new words for it. "SSD" is good enough.

AFA? - what's that?

I was never a fan of the term "AFA" and always thought that the "All Flash Array" was a marketing rather than a technology idea. And soon the industry will have to forget AFAs as the term loses its novelty appeal to those who momentarily believed it had any.

Despite that - I don't think that replacing the term with words which include NV or MRAM or Optane or other branding nomenclature which hints at a truly blue-blooded NVRAM genealogy will assist our understanding of what's inside the box - beyond the fact that it's newer than the AFA box which came before. So let's just broaden our acceptance of "SSD" and have done with it.

This takes me back to processors...

When I think about processors working with SSDs - I see ranges of possibility which include:-

CPUs in SSDs - as controllers

SSD support in CPUs - such as persistent memory

CPUs (or FPGAs) as functional support units inside memory arrays

and the good old SSD-CPU equivalence of closely coupled SSD and CPU software

Some of this thinking obviously leaks out in my scribbles too - because it leads to interesting conversations.

In January 2017 I was approached by a new startup which has its own CPU architecture and IP who asked me what I could share with them about ways in which their product could be optimized for use in the SSD market. I won't say any more about them here - because that's confidential. But the question prompted me to update and aggregate some of my views on this - which I did in the words you can see below - which I posted temporarily on the home page of StorageSearch.com - because I was busy with other articles at the time and had nowhere else to put it.

So my apologies if you've seen it before. And I hope that the new introduction helps to make it easier to understand.

optimizing CPUs for use in SSDs

The characteristics of CPUs used within COTS SSDs vary widely.

One direction of influence comes from the anticipated market.

And this is the aspect which is easiest to understand.

So that results in different preferences for enterprise SSDs which have high solo performance (such as Mangstor) compared to 2.5" SSDs deployed in arrays.

And power consumption can be a key factor in industrial SSDs.

Hyperstone's controllers, which are optimized for low power consumption, are very different in their choice of CPU - and necessarily in their flash algorithms too - because they can't depend on the type of RAM cache which makes enterprise endurance management code easier to design.

But unfortunately I think that analyzing what happened in the past in the SSD controller / SSD processor market isn't a reliable predictor for future controllers.

An influence which has been trickling down from lessons learned in the array market is the powerful system level benefit of passing intelligent control from a viewpoint which is outside the ken of the SSD controller located in the flash storage form factor.

And partly due to that applications awareness - and likely to tear apart many controller business plans - are the contradictory requirements of custom and standard products. (This is described in more detail in my 2015 SSD ideas blog - SSD directional push-me pull-yous.)

And - as always with the SSD market - different companies can take very different approaches to how they pick the permutations which deliver their ideal SSD.

Added to that I think a new emerging factor in memory systems will be whether the CPUs are able to deliver applications level benefits by integrating nvm or persistent memory within the same die as the processor itself.

That's partly a process challenge but also a massive architectural and business gamble.

In the past it has been obvious that some SSDs incorporated nvm registers or other small persistent storage memory (apart from the external flash) to deliver power fail data integrity features which didn't need capacitor holdup.

What is less clear is the direction of travel with tiered memory on the CPU.

When it comes to chip space budget - is it worth trading cores to enable bigger (slower) persistent memory upstream of conventional cache?

This was already complicated when external memory was assumed to be only DRAM and being fed from HDD storage buckets. The new latency buckets of SSD storage and bigger tiered semiconductor main memory change the latency of the data bucket chain and the ability to perform in-situ memory processing may change CPU architecture too.

For some applications that might be a good trade. Better integration at lower latency with the CPU and the memory system. And merging of CPU and SSD functionality. But this would be a risky experiment for a component vendor who doesn't have a systems level marketing channel to sell the enhanced merged feature set.
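The latency bucket chain described above can be put into rough numbers. What follows is only a minimal sketch, assuming a simple fall-through model in which every request pays the latency of each tier it reaches. The `Tier` class, the `average_access_ns` function and all the latencies and hit rates are hypothetical round figures chosen for illustration - they are not measurements of any product.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    latency_ns: float  # time spent at this tier by every request that reaches it
    hit_rate: float    # fraction of those requests satisfied here (1.0 for the last tier)


def average_access_ns(chain):
    """Expected latency when each miss falls through to the next tier in the chain."""
    expected = 0.0
    reach = 1.0  # probability that a request gets as far as the current tier
    for tier in chain:
        expected += reach * tier.latency_ns
        reach *= 1.0 - tier.hit_rate
    return expected


# Hypothetical round numbers, chosen only to show the shape of the trade-off.
hdd_chain = [
    Tier("on-chip cache", 10, 0.95),
    Tier("DRAM", 100, 0.99),
    Tier("HDD", 5_000_000, 1.0),
]
ssd_chain = [
    Tier("on-chip cache", 10, 0.95),
    Tier("DRAM", 100, 0.99),
    Tier("SSD", 100_000, 1.0),
]

for label, chain in (("HDD-fed", hdd_chain), ("SSD-fed", ssd_chain)):
    print(f"{label}: {average_access_ns(chain):.0f} ns average")
```

With these made-up figures the slowest bucket dominates the HDD-fed average even though very few requests reach it. An on-die persistent memory tier could be tried out in the same model by inserting another `Tier` between the cache and DRAM and lowering the cache hit rate to reflect the die area given up.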

The only clear thing is that whatever made a good SSD in the past will no longer be good enough in the future.

More flexibility will be key.

It's not just the CPU making the SSD work better. The SSD makes the CPU work better too.

SSD-CPU equivalence and SSD and memory systems equivalence aren't new ideas - but the scope for innovative improvement is still massive.

Afterword...

I felt it was the right time for me to start talking about the Post Modernistic SSD Era - because this framework will be among the background assumptions in my upcoming article - the Survivors Guide to Memory (in the Post Modernistic SSD Era).

This way of thinking about things helps me. And I hope that if you've reached this far in this blog that parts of it may help your thinking too.

If you know others who might be interested - please send them along. The mice don't bite.

In an old blog which I still remember fondly (because its technology content is ageless due to its non appearance) - Can you tell me the best way to get to SSD Street? - I described how through the progression of many seemingly random conversations about the SSD market somehow useful clarifications do eventually emerge.

If you'd like to ask your own questions or contribute your own anecdotes or insights to any of these future articles - the easiest way is to contact me - either by linkedin, or (much better) via good old fashioned regular email. My address can be seen on the about the publisher page.


SCM DIMM wars in SSD servers is closely related to several multi-year technology themes in the SSD accelerator market.

From an SSD history perspective it can be viewed as being the successor to 2 earlier centers of focus in the modern era of SSDs.

1999 to 2007 - the dominant focus and center of gravity was FC SAN SSD accelerators.

2007 to 2014 - the dominant focus and center of gravity was PCIe SSD accelerators.

Since 2015 - the new dominant focus and center of gravity has been DIMM wars - which is revisiting long held assumptions such as:-

what is memory?

where's the best place for it to go?

and how much memory of each latency is best to have in each location?

the road to DIMM wars

...

UPMEM's Processing In-Memory technology is part of an evolving ecosystem and spectrum of in-situ SSD processing solutions which spans latency bands and memory types.

funding for RISC CPUs in DRAM (September 2017)

...

"Order of magnitude differences between commercial products are rare in computer architecture which may lead to the TPU becoming an archetype for domain-specific architectures...

Among the success factors of the TPU were the large matrix multiply (65,536 8 bit systolic MACs) and the substantial software controlled on chip memory (28MB)..."

In-Datacenter Performance Analysis of a 92 TOPS Tensor Processing Unit ASIC (pdf) - a paper by Developers at Google (June 26, 2017)
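The 92 TOPS figure in the paper's title can be roughly reconciled with the 65,536 MACs mentioned in the quote. This is only a back of envelope sketch: the 700MHz clock is not stated in the excerpt above and is used here as an assumption, as is the usual convention of counting each multiply-accumulate as 2 operations. The function name `peak_tops` is hypothetical.

```python
def peak_tops(mac_units, clock_hz, ops_per_mac=2):
    """Peak tera-operations per second if every MAC unit is busy on every clock cycle."""
    return mac_units * ops_per_mac * clock_hz / 1e12


# 65,536 MACs comes from the quote; the 700 MHz clock is an assumption.
tpu_peak = peak_tops(mac_units=65_536, clock_hz=700e6)
print(f"{tpu_peak:.2f} TOPS peak")
```

Under those assumptions the result lands just under 92 TOPS, which suggests the headline number is a peak rate rather than a sustained one.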

...

ReRAM in the CPU

Editor:- June 7, 2017 - These are some of the ideas which emerge from a slideshare - Rethink with ReRAM from Crossbar based on a presentation at the recent Memory+ Conference.

Standard memory busses are too slow to support the computational needs of new distributed (and always on) AI applications which leverage IoT.

The only way to improve ultimate "time to get answers" performance is to integrate storage on the same die as the processor.

ReRAM can be embedded in SoCs in any CMOS fab to deliver battery friendly latency under 5ns.

...

Tachyum says it will blow the cobwebs off Y2K fossilized CPU performance

Editor:- April 7, 2017 - I was fortunate enough to have had close relationships with technologists and marketers of high end server CPUs in the 1990s who explained to me in detail the performance limitations of CPU clock speeds and memories which would prevent CPUs getting much faster beyond the year 2000 due to physics and the lost latency due to the coherency of signals when they left silicon and hit copper pads.

That was one of the triggers which made me reconsider the significance of the earlier CPU-SSD equivalence and acceleration work I had stumbled across in my work in the late 1980s and write about it in these pages when I explained (in 2003) why I thought the enterprise SSD market (which at that time was worth only tens of millions of dollars) had the potential to become a much bigger $10 billion market by looking at server replacement costs and acceleration as the user value proposition for market adoption and disregarding irrelevant concerns about cost per gigabyte.

I was surprised these equivalencies weren't more widely known. And that's why I recognized the significance of what the pioneers of SSD accelerators on the SAN were doing in the early 2000s.

It's taken 17 years - but the clearest ever expression of the CPU GHz problem and why server architecture got stuck in that particular clock rut (for those of you who don't have the semiconductor background) appears in a recent press release from Tachyum which says (among other things)...

"The 10nm transistors in use today are much faster than the wires that connect them. But virtually all major processing chips were designed when just the opposite was true: transistors were very slow compared to the wires that connected them. That design philosophy is now baked into the industry and it is why PCs have been stuck at 3-4GHz for a decade with "incremental model year improvements" becoming the norm. Expecting processing chips designed for slow transistors and fast wires to still be a competitive design when the wires are slow and the transistors are fast, doesn't make sense."

The warm-up press release also says - "Tachyum is set to deliver increases of more than 10x in processing performance at fraction of the cost of any competing product. The company intends to release a major announcement within the next month or two."

Symbolic IO reveals more

Editor:- February 25, 2017 - Symbolic IO is a company of interest which I listed in my blog - 4 shining companies showing the way ahead - but until this week they haven't revealed much publicly about their technology.

Now you can read details in a new blog - Symbolic IO reveals tech - written by Chris Mellor at the Register who saw a demo system at an event in London.

As previously reported a key feature of the technology is that data is coded into a compact form - effectively a series of instructions for how to create it - with operations using a large persistent memory (supercap protected RAM).

Among other things Chris reports that the demo system had 160GB of raw, effectively persistent memory capacity - which yielded with coding compression - an effective (usable) memory capacity of 1.79TB.

Security in the system is rooted in the fact that each system evolves its own set of replacement codes computed on the fly and held in persistent memory - without which the raw data is meaningless. A security sensor module ("the Eye") located in a slot in the rack can erase the data relationship codes when GPS and other boundary conditions are crossed (as in some fast purge SSDs).

Editor's comments:- The data compaction and therefore CPU utilization claims do seem credible - although the gains are likely to be applications dependent.
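For scale, the demo figures reported above (160GB raw, 1.79TB effective) imply a compaction ratio of roughly 11 to 1. The short sketch below just does that arithmetic. The function name `compaction_ratio` is hypothetical, and whether the report meant 1,000 or 1,024 GB per TB is an assumption either way - so both are shown.

```python
def compaction_ratio(raw_gb, effective_tb, gb_per_tb=1000):
    """Effective capacity divided by raw capacity, both expressed in GB."""
    return effective_tb * gb_per_tb / raw_gb


# Figures as reported for the demo system; the unit convention is an assumption.
decimal_ratio = compaction_ratio(raw_gb=160, effective_tb=1.79)
binary_ratio = compaction_ratio(raw_gb=160, effective_tb=1.79, gb_per_tb=1024)
print(f"{decimal_ratio:.1f}x (decimal TB), {binary_ratio:.1f}x (binary TB)")
```

This is a single data point from one demo, so it says nothing about how the ratio would vary between applications.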

Throughout the data computing industry smart people are going back to first principles and tackling the embedded problems of inefficiencies and lack of intelligence which are buried in the way that data is stored and moved. The scope for improvement in CPU and storage utilization was discussed in my 2013 article - meet Ken - and the enterprise SSD software event horizon.

The potential for improvement is everywhere - not just in pre SSD era systems. For example Radian is picking away at inefficiencies caused within regular flash SSDs themselves by stripping away the FTL. Tachyum is aiming to push the limits of processing with new silicon aimed at memory centric systems. For a bigger list of companies pushing away at datasystems limitations you'd have to read the SSD news archive for the past year or so.

But all new approaches have risks.

I think the particular risks with Symbolic IO's architecture are these:-

Unknown vulnerability to data corruption in the code tables.

Partly this would be like having an encrypted system in which the keys have been lost - but the effect of recovery would be multiplied by the fact that each raw piece of data has higher value (due to compacting).

Conventional systems leverage decades of experience of data healing knowhow (and data recovery).

We don't know enough about the internal resiliency architecture in Symbolic IO's design.

It's reasonable to assume that there is something there. But all companies can make mistakes as we saw in server architecture with Sun's cache memory problem and in storage architecture when Cisco discovered common mode failure vulnerabilities in WhipTail's "high availability" flash arrays.

Difficult to quantify risk of "false positive" shutdowns from the security system.

This is a risk factor which I have written about in the context of the fast purge SSD market. Again this is a reliability architecture issue.

I expect that Symbolic will be saying much more about its reliability and data corruption sensitivities during the next few years. In any case - Symbolic's investment in its new data architecture will make us all rethink the bounds of what is possible from plain hardware.

Rambus and Xilinx partner on FPGA in DRAM array technology

Editor:- October 4, 2016 - Rambus today announced a license agreement with Xilinx that covers Rambus patented memory controller, SerDes and security technologies.

Rambus is also exploring the use of Xilinx FPGAs in its Smart Data Acceleration research program. The SDA - powered by an FPGA paired with 24 DIMMS - offers high DRAM memory densities and has potential uses as a CPU offload agent (in-situ memory computing).
