

is that even a real word?

and risk reward ratios with big memory "flash as RAM"

by Zsolt Kerekes, editor - February 16, 2017
A recent conversation I had with Kevin Wagner at Diablo Technologies began with the benchmarks they have been sharing for their Memory1 (128GB flash as RAM DIMMs) when running large scale analytics software. But it finished somewhere unexpected.

I'll start with the benchmarks.

Kevin said that some of the results (for SPARK SQL performance) came from a real financial customer who had run these tests themselves using data from their own production environments.

They were able to achieve a 4 to 1 server reduction using Memory1 enhanced servers (3 Memory1 nodes versus 12 DRAM-only nodes with 384GB each) and still get 24% faster performance.
[chart: real customer benchmark, Memory1, 2017]
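To put those numbers side by side, here is a minimal arithmetic sketch. It uses only the figures quoted above, and it assumes (my reading, not something Diablo stated) that "24% faster" means 1.24x the throughput and that each of the 12 DRAM-only nodes had 384GB. The function names are hypothetical.

```python
# Figures quoted in the article for the customer's Spark SQL benchmark.
DRAM_ONLY_NODES = 12
DRAM_PER_NODE_GB = 384      # ASSUMPTION: 384GB in each of the 12 DRAM-only nodes
MEMORY1_NODES = 3
SPEEDUP = 1.24              # ASSUMPTION: "24% faster" read as 1.24x throughput


def consolidation_ratio(baseline_nodes: int, new_nodes: int) -> float:
    """How many baseline nodes each new node replaces."""
    if new_nodes <= 0:
        raise ValueError("new_nodes must be positive")
    return baseline_nodes / new_nodes


def per_node_throughput_gain(baseline_nodes: int, new_nodes: int, speedup: float) -> float:
    """Work done per node relative to the baseline, given a cluster-level speedup."""
    return consolidation_ratio(baseline_nodes, new_nodes) * speedup


ratio = consolidation_ratio(DRAM_ONLY_NODES, MEMORY1_NODES)
gain = per_node_throughput_gain(DRAM_ONLY_NODES, MEMORY1_NODES, SPEEDUP)
baseline_dram_gb = DRAM_ONLY_NODES * DRAM_PER_NODE_GB

print(f"server consolidation: {ratio:.0f} to 1")
print(f"work per node versus a DRAM-only node: {gain:.2f}x")
print(f"DRAM in the baseline cluster: {baseline_dram_gb} GB")
```

Under those assumptions each Memory1 node is doing close to five times the work of a DRAM-only node, which is the figure that matters when you are pricing servers rather than racing them.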
Diablo also has other customer based benchmarks which show useful acceleration when using identical numbers of server nodes and comparing the results to alternative SSD implementations (NVMe and arrays of SATA SSDs).
[chart: real customer benchmark, Memory1, 2017]

risk reward ratios

As in the early phases of the flash array market 10 years ago - you have to filter yourself in or out of following up your interest in this, depending on whether you think you have the right type of problem.

The risk reward trade-off of using a DIMM based flash as RAM system like Memory1 is that users with big memory apps will be able to choose whether they prefer the idea of getting faster results (using a similar number of servers) or using fewer servers to save costs (or some combination).

But not all jobs will run faster.

See k-core benchmarks on Inspur servers (pdf) and sidebar article for details.

Small jobs which would have fitted comfortably into the DRAM anyway could run up to 30% slower in a Memory1 server cluster, although big jobs do run faster.
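Why would small jobs lose while big jobs win? A toy model makes the shape of the trade-off visible. This is a sketch under stated assumptions, not a description of how Memory1 works: every latency below is a hypothetical placeholder, the DRAM hit ratio assumes uniformly random access over the working set, and the "spill" tier stands in for wherever a DRAM-only server sends data that doesn't fit (storage, or another box).

```python
def dram_hit_ratio(dram_gb: float, working_set_gb: float) -> float:
    """Fraction of accesses served from DRAM, ASSUMING uniform random access."""
    if dram_gb < 0 or working_set_gb <= 0:
        raise ValueError("sizes must be positive")
    return min(1.0, dram_gb / working_set_gb)


def mean_access_ns(dram_gb: float, working_set_gb: float, dram_ns: float,
                   overflow_ns: float, overhead_ns: float = 0.0) -> float:
    """Average access time when whatever misses DRAM is served by an overflow tier."""
    hit = dram_hit_ratio(dram_gb, working_set_gb)
    return overhead_ns + hit * dram_ns + (1.0 - hit) * overflow_ns


# HYPOTHETICAL placeholder values, chosen only to show the shape of the trade-off.
DRAM_GB = 256.0
DRAM_NS = 100.0
FLASH_DIMM_NS = 5_000.0       # overflow tier in the flash as RAM server
SPILL_NS = 50_000.0           # overflow tier in the DRAM-only server
TIERING_OVERHEAD_NS = 20.0    # cost of the tiering layer on every access


def compare(working_set_gb: float) -> tuple:
    """Return (DRAM-only, flash as RAM) mean access times in nanoseconds."""
    dram_only = mean_access_ns(DRAM_GB, working_set_gb, DRAM_NS, SPILL_NS)
    tiered = mean_access_ns(DRAM_GB, working_set_gb, DRAM_NS,
                            FLASH_DIMM_NS, TIERING_OVERHEAD_NS)
    return dram_only, tiered


small_job = compare(128.0)    # fits in DRAM: the tiering overhead is pure cost
big_job = compare(1024.0)     # 4x DRAM: the cheaper overflow tier dominates

print("small job (ns):", small_job)
print("big job (ns):  ", big_job)
```

With these made-up numbers the small job pays the overhead and gets nothing back, while the big job wins because most of its accesses land in a much faster overflow tier. Real workloads are not uniformly random, so treat the crossover point as something to measure rather than calculate.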

Users who are evaluating this new tiered memory approach can buy preconfigured supported servers from a variety of sources and Diablo says that no changes are required to the OS or applications.

Compared to many of the alternative emerging new semiconductor memory approaches - flash (as RAM) seems like it will be the mainstream safe choice for the next 2 or 3 years - because it will take that long for the newcomers to prove their reliability and even after that - we have the issue of software support for the tiering and caching.


There will of course be alternative competing implementations of "flash as RAM in the DIMM form factor". (Which is not the only form factor for this concept - but I'm trying to keep this article as short as I can.)

Companies like Xitore and Netlist have been saying they want to get into the "flash as RAM in the DIMM form factor" market for a while now.

I haven't seen details from these expected competitors but my guess is that - unlike Diablo's product, which leverages the DRAM already in other DIMM sockets on the same motherboard - some of the later contestants in this market will take the approach of placing everything needed to provide transparent emulation and caching into a single DIMM.

That alternative approach might work better for smaller scale embedded systems which don't have a lot of DIMMs - but creates difficult design constraints - because the "all in a single DIMM" approach means there will be less flexibility about RAM flash cache ratios. The real RAM will be fixed into the design. (Unlike Diablo's solution where the ratio of flash to DRAM DIMMs is fluid.)

That lack of flexibility is why I predicted that the hybrid storage drive market would never succeed in the enterprise. So far no one has been silly enough to admit to stuffing JBODs with 2.5" hybrid flash-HDDs. Instead the hybrid storage appliance market picked and chose components from a wide range of best of breed flash modules.

But I think the "all in a single DIMM" approach to implementing flash as a RAM tier will succeed as a viable market too. (In addition to the Memory1 approach.) Personally I think the all in one DIMM solutions will work better in small capacity memory systems but be less upwardly scalable for large capacity servers.

The product definitions will involve some very complex segmentation and application analysis.

I expect that the "flash as RAM in a DIMM form factor" market will fragment into:-
  • applications which only need a single such DIMM on each server and
  • the other segment will be those applications which tend to use the maximum number of DIMM slots.

the interesting thing? - dataflow controllernomics

Let's pause for some perspective...

What's data? - Now hold on - that's too philosophical. Encode data a different way... to make it work better - now you're talking engineering. But that's a discussion for another time.

Right here we don't care what the data means.

It just comes and goes.

And it's surprising how far or how little it may have traveled.

From the cloud? Another storage device? Maybe it was computed just now from an earlier matching of data. Sometimes the data arrives in a rush only to sadly discover that it's not needed after all. There's a lot of data shuffling happening around the world. Most of it isn't even for you.

It's when that data (or lack of it) is the next thing which the software is going to look at - that the economics of having data in the right or wrong place suddenly becomes very serious. Because if we have to wait too long to get the data then we may need a faster processor (or more processors) to get the next thing done.

You may like to think that data lives in cables, or in storage media or flying around on electromagnetic waves. But from a memory systems perspective the time when data really comes alive is when it's in our memory locality.

We care that data comes when we need it.

And if we haven't got it in our live place (the memory) then we really care about and want to know where it lives. (And not just addresses in memory spaces - but locations in between the memory spaces - in transit.)

Even better if we can tell the data where to live. And if its comings and goings heed our calls.

(Sadly other controllers too have a say in this matter. And even when they think they're trying to be helpful their understanding is based on past customs of politeness.)

back to my conversation with Diablo

Our conversation took an interesting diversion when Kevin Wagner said something about the techniques Diablo uses in the management of its data caching.

We had discussed its DMX software in depth before and I wrote something about it last summer.

The new point which I latched onto is that Diablo has used machine learning to not only get a better understanding of the applications it commonly works with - but also to reverse engineer and understand the behavior of some of the external controllers which it encounters - in particular memory controllers.

That enables DMX to sometimes predict the best way it should request and deliver new data.
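I don't know what is inside DMX, so the following is emphatically not Diablo's method and it isn't machine learning either. It's a far simpler stand-in (a majority-vote stride predictor, with a hypothetical class name and parameters of my own choosing) which only illustrates the general idea of guessing the next request from recently observed behavior, and of declining to guess when the pattern is unclear.

```python
from collections import Counter, deque
from typing import Deque, Optional


class StridePredictor:
    """Guess the next address from the most common recent address stride.

    Illustrative only: a hypothetical helper, not part of any Diablo product.
    """

    def __init__(self, history: int = 8) -> None:
        if history < 1:
            raise ValueError("history must be at least 1")
        self._strides: Deque[int] = deque(maxlen=history)
        self._last: Optional[int] = None

    def observe(self, address: int) -> None:
        """Record one observed request address."""
        if self._last is not None:
            self._strides.append(address - self._last)
        self._last = address

    def predict(self) -> Optional[int]:
        """Return the predicted next address, or None when there is no clear pattern."""
        if self._last is None or not self._strides:
            return None
        stride, count = Counter(self._strides).most_common(1)[0]
        # Only commit to a guess when a strict majority of recent strides agree.
        if count * 2 <= len(self._strides):
            return None
        return self._last + stride


sequential = StridePredictor()
for addr in (0, 64, 128, 192):
    sequential.observe(addr)

scattered = StridePredictor()
for addr in (0, 10, 500, 3):
    scattered.observe(addr)

print("sequential stream, next guess:", sequential.predict())
print("scattered stream, next guess: ", scattered.predict())
```

The refusal to guess matters as much as the guess: a wrong prefetch costs bandwidth on the same data highways that everyone else is using.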

The behavior of controllers is a very important factor in the modern digital economy.

I've touched on this aspect in the past as you can see in past stories in the SSD controllers page and my article about controller and caching impacts on DRAM latency.

big datasystems controllernomics?

Analyzing how to get optimal performance from tiered memory, tiered storage etc will be at the center of future focus for much of our industry - especially in emerging fields like in-situ (SSD / memory) processing, fast elements and software.

Latencies for raw data media, communications and interfaces have been well understood and managed in their own ways for some time. But the science of how to manage large populations of different types of controllers in different localities is fragmented with differing purposes.

Every controller company has its own IP which does the best it can with the things it connects to and can control.

What is becoming more important - when you are in the memory zone - right in close with the RAM and processor - is getting a better understanding of the connected controllers in your space. Because application performance in the data world is limited by the complex interactions of controller-controller speak (from the cloud right down into each processor DRAM cache request) to a much greater extent than ever before.

When storage was slower and memories were smaller and the software was older - all the controller designs looked good in comparison to the other devices surrounding them. Now with faster storage, bigger memories and modern apps software - controllernomics has become the limiting factor.

So it's not how fast the intrinsic memory cells or blocks work... you never get that physical - because media controllers sit between you and noisy physics. (Remember the "memory modem" from DensBits - that encapsulated the problem brilliantly).

And if you are that media controller - speed (from the software's point of view) isn't just how well you and the host interface get along together. And it's not just how fast your application's CPU works either - because other CPUs and other tasks are competing on the same data highways.

Datasystems controllernomics is like figuring out traffic patterns - some of which you can anticipate (the effects of predicted snow, or the rush hour) but most of which you just have to react to as best you can (a big truck took the wrong turn). And mixing up the two things at the same time. And BTW - each time you call it wrong - you contribute to the next controllernomics snafu.

So you might ask... hey why doesn't some software manage all this? And what about the role of operating systems?

Let's look at the OS first. If you've read any histories of computing you'll know about the dotcom era (which was the last grand ball era when server CPUs, DRAM and hard drives all knew their place and were equally respected because they had all grown a little bit faster and fatter together up to that time). Chronologically that's up to about 1999-2000 - if you prefer a date. Well, up to that time the OSes took many of their responsibilities seriously.

After that we got into the causes of the great war (I mean the modern era of SSDs). I already dealt with most of the decline, fall and abandonment of the OS (in a useful memory systems context) in my 2012 article - where are we now with SSD software? (And how did we get into this mess?).

I won't repeat that analysis here. And to be fair to the OS companies - their traditional systems partners didn't know what was happening either. But in any case the OS companies had other distractions - like trying to be the next search engine destination, the next phone platform or trying to hang on while pesky open source OS startups were giving enterprise OSes away free to whoever could download them quickly enough.

Anyway that's how the critical software for SSDs got to be written by SSD companies themselves - because for a long time - no-one else was going to do it.

This brings us to the present day. And the SSD market has grown large enough to merit its own conferences, standards etc - which is how we got new form factors like M.2 and new software like NVMe. So the OS companies and the hypervisor companies are more than happy to gatecrash the SSD party. But...

And this is a big but... They have no real incentive to improve performance to the next level. And as their business models depend on remaining as hardware agnostic as possible - they have every reason to avoid tying themselves too closely to any quick changing deep piece of single sourced semiconductor trickery. And - even if that wasn't so - the enterprise OS companies have business models which depend on supporting hardware platforms which are already shipping in high volume - and not in creating new platforms.

Give them a problem like tiered memory - which can be solved with a purely software solution and yeah they'll support it eventually (or buy little software companies who can show them how to do it).

But give them a problem where the little pieces involve nanosecond hardware support in semiconductors and where the analysis comes from learning what they themselves have been doing wrong for years - and you can see why the OS companies are not where the best solutions are going to come from.

Diablo got into big datasystems controllernomics (that's my term for it - not theirs) because they spent a lot of time analyzing problems from a particular angle (in the memory close to the processors). And they discovered that even after you've understood the stacks and the apps and the architecture there's still another factor of modeling and predicting which it's worth getting to know - but only if you can do something about it.

And once you've done that - and are comfortably working in the memory and storage and controller-controller alternate universe - then just as Google found with search - you're in a better vantage point to learn more and stay ahead. And if you do - and occupy enough server boxes - then you might become the controller behavior which others in the controllernomics universe have to reverse analyze and understand.

And although this started out as a "flash as RAM" problem - the solution methodology isn't tied to flash.

Interesting times ahead.


Storage Class Memory - one idea many different approaches; software upcycles flash endurance, NVMe and NVDIMM variations...
what were the big SSD ideas which emerged in 2016?

latency matters
Latency? - the devil is in the detail.

"latency" - mentions on the mouse site
controllernomics - joins the memory latency to do list

As predicted 8 years ago - the widespread adoption of SSDs signed the death warrant for hardware RAID controllers.

Sleight of hand tricks which seemed impressive enough to make hard drive arrays (RAID) seem fast in the 1980s - when viewed in slow motion from an impatient SSD perspective - were just too inelegant and painfully slow to be of much use in true new dynasty SSD designs.

The confidence of "SSDs everywhere" means that the data processing market is marching swiftly on - without much pause for reflection - towards memory centric technologies. And many old ideas which seemed to make sense in 1990s architecture are failing new tests of questioning sanity.

For example - is DRAM the fastest main memory?

No - not when the capacity needed doesn't fit into a small enough space.

When the first "flash as RAM" solutions appeared in PCIe SSDs - in 2010 - their scope of interest was software compatibility. Now we have them emerging as DIMMs in the memory channel.

This is a context where software compatibility and memory latency aren't the only concerns. There's also the need to understand the interference effects of all those other pesky controllers in the memory space.

That was one of the interesting things which emerged in a recent conversation I had with Diablo Technologies about their Memory1.

See also:- how significant is Diablo's Memory1 for the enterprise data ecosystem? (August 13, 2015)

Your old style "AFA storage" is simply one of several software selectable emulation options in a future memory system - just as hard drives and RAID were in the years before the modern era of SSDs.
After AFAs? - what's next


"It all starts with memory and storage...

Programmable chips in the flow of data between server and data farm have the potential to meet some of the more rigorous requirements and solve some of the more vexing questions."
Ravi Thummarukudy, CEO - Mobiveil in his article - the Memory Evolution Starts at the Data Center (June 2017)

"We are at a junction point where we have to evolve the architecture of the last 20-30 years. We can't design for a workload so huge and diverse. It's not clear what part of it runs on any one machine. How do you know what to optimize? Past benchmarks are completely irrelevant."
Kushagra Vaid, Distinguished Engineer, Azure Infrastructure - quoted in a blog by Rambus - Designing new memory tiers for the data center (February 21, 2017)

What we've got now is a new SSD market melting pot in which all performance related storage is made from memories and the dividing line between storage and memory is also more fluid than before.
where are we heading with memory intensive systems?

Memory1 beats DRAM in big data multi box analytics
Editor:- February 7, 2017 - The tangible benefits of using flash as RAM in the DIMM form factor are illustrated in a new benchmark Apache Spark Graph Performance with Memory1 (pdf) - published today by Inspur Systems (the largest server manufacturer in China) in collaboration with Diablo Technologies.

The memory intensive tests were run on the same cluster of five servers (Inspur NF5180M4, two Intel Xeon CPU E5-2683 v3 processors, 28 cores each, 256GB DRAM, 1TB NVMe drive).

The servers were first configured to use only the installed DRAM to process multiple datasets. Next, the cluster was set up to run the tests on the same datasets with 2TB of Memory1 per server.

The k-core algorithm (which is typically used to analyze large amounts of data to detect cross-connectivity patterns and relationships) was run in an Apache Spark environment to analyze three graph datasets of varying sizes up to a 516GB set of 300 million vertices with 30 billion edges.

Completion times for the smallest sets were comparable. However, the medium-sized sets using Memory1 completed twice as fast as the traditional DRAM configuration (156 minutes versus 306 minutes). On the large sets, the Memory1 servers completed the job in 290 minutes, while the DRAM servers were unable to complete due to lack of memory space.
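For readers who want to check the "twice as fast" claim and see the cluster capacities involved, here is a short arithmetic sketch. It uses only the figures quoted in this report. The variable names are mine, and it draws no conclusion about why the DRAM-only run failed beyond what the report says.

```python
# Figures quoted from the Inspur / Diablo k-core benchmark report.
SERVERS = 5
DRAM_PER_SERVER_GB = 256
MEMORY1_PER_SERVER_TB = 2
LARGEST_DATASET_GB = 516

MEDIUM_SET_DRAM_MIN = 306       # medium dataset, DRAM-only configuration
MEDIUM_SET_MEMORY1_MIN = 156    # medium dataset, Memory1 configuration

speedup = MEDIUM_SET_DRAM_MIN / MEDIUM_SET_MEMORY1_MIN
time_saved_pct = 100.0 * (MEDIUM_SET_DRAM_MIN - MEDIUM_SET_MEMORY1_MIN) / MEDIUM_SET_DRAM_MIN

cluster_dram_gb = SERVERS * DRAM_PER_SERVER_GB
cluster_memory1_tb = SERVERS * MEMORY1_PER_SERVER_TB

print(f"medium dataset speedup: {speedup:.2f}x ({time_saved_pct:.1f}% less time)")
print(f"cluster DRAM: {cluster_dram_gb} GB, cluster Memory1: {cluster_memory1_tb} TB")
print(f"largest input dataset: {LARGEST_DATASET_GB} GB")
```

The speedup works out a little under 2x, so "twice as fast" is a fair rounding. Note that the largest input set is smaller than the cluster's total DRAM, yet the DRAM-only run still ran out of memory, which suggests the size of the input is a poor guide to the size of the working set.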

Editor's comments:- As has been noted in previously published research by others - being able to have more RAM emulation flash memory in a single server box can (in big data computing) give similar or better results than implementing the server set with more processors and more DRAM in more boxes.

This is due to the traffic controller and fabric latencies between server boxes which can negate most of the intrinsic benefits of the faster raw memory chips - if they are physically located in another box.

The key takeaway message from this benchmark is that a single Memory1 enhanced server can perform the same workload as 2 to 3 non NVDIMM enhanced servers when the size of the working data set is the limiting factor.

More useful however (as you will always find an ideal benchmark which is a good fit to the hardware) is that the Memory1 system places lower (3x lower) caching demands on the next level up in the storage system (in this case the attached NVMe SSDs). This provides a higher headroom of scalability before the SSDs themselves become the next critical bottleneck.

In their datasheet about Memory1 enhanced servers Inspur give another example of the advantages of this approach - quoting a 3 to 1 reduction in server footprint and faster job completion for a 500GB SORT.

the road to DIMM wars
are you ready to rethink RAM?
DRAM's indeterminate latencies

is data remanence in NVDIMMs a new risk factor?
maybe the risk was already there before with DRAM

DRAM's reputation for speed is like the old story about the 15K hard drives (more of the same is not always quickest nor best)
latency loving reasons for fading out DRAM

It's by no means inevitable that the biggest memory companies will also go on to become the biggest SSD companies. That's like expecting Exxon to be the biggest car maker.
boom bust cycles in memory markets

Why would any sane SSD company in recent years change its business plan from industrial flash controllers to HPC flash arrays?
a winter's tale of SSD market influences

In SSD land - rules are made to be broken.
7 tips to survive and thrive in enterprise SSD

There's a genuine characterization problem for the SCM industry which is:- what are the most useful metrics to judge tiered memory systems by?
is it realistic to talk about memory IOPS?

Many of the important and sometimes mysterious behavioral aspects of SSDs which predetermine their application limitations and usable market roles can only be understood when you look at how well the designer has dealt with managing the symmetries and asymmetries which are implicit in the underlying technologies which are contained within the SSD.
how fast can your SSD run backwards?

The enterprise SSD story...

why's the plot so complicated?

and was there ever a missed opportunity in the past to simplify it?
the elusive golden age of enterprise SSDs


I'm just saying (as I have been saying since 2003 in my article on SSD-CPU equivalence) why I think the TAM for server based SSDs is a percentage of the server market - and almost entirely decoupled from the cost of storage capacity on the SAN.
meet Ken and the enterprise SSD software event horizon


Compared to EMC...
ours is better

can you take these AFA companies seriously?

Now we're seeing new trends in pricing flash arrays which don't even pretend that you can analyze and predict the benefits using technical models.
Exiting the Astrological Age of Enterprise SSD Pricing

90% of the enterprise SSD companies which you know have no good reasons to survive.
market consolidation - why? how? when?

With hundreds of patents already pending in this topic there's a high probability that the SSD vendor won't give you the details. It's enough to get the general idea.
Adaptive flash R/W and DSP ECC IP in SSDs

Why buy SSDs?
6 user value propositions for buying SSDs

"Play it again Sam - as time goes by..."
the Problem with Write IOPS - in flash SSDs

If you spend a lot of your time analyzing the performance characteristics and limitations of flash SSDs - this article will help you to easily predict the characteristics of any new SSDs you encounter - by leveraging the knowledge you already have.
flash SSD performance characteristics and limitations

This is something which I forgot to ask Diablo in February 2017.

Do you have a supported RAM Disk emulation for Memory1?

And - if so - how do the benchmark numbers look? (compared to a similar quantity of flash - or maybe even the same physical devices) when they are configured as native flash SSDs?
a stupid question about RAMdisk emulation in flash DIMMs

The memory chip count ceiling around which the SSD controller IP is optimized - predetermines the efficiency of achieving system-wide goals like cost, performance and reliability.
size matters in SSD controller architecture