leading the way to the new storage frontier
after AFAs - what's the next box?
a winter's tale of SSD market influences
Capacitor hold up times in 2.5" military SSDs
where are we heading with memory intensive systems?
controllernomics and risk reward with big memory "flash as RAM"
optimizing CPUs for use with SSDs in the 3rd Era of SSD Systems

"We are at a junction point where we have to evolve the architecture of the last 20-30 years. We can't design for a workload so huge and diverse. It's not clear what part of it runs on any one machine. How do you know what to optimize? Past benchmarks are completely irrelevant.
Kushagra Vaid, Distinguished Engineer, Azure Infrastructure - quoted in a blog by Rambus - Designing new memory tiers for the data center (February 21, 2017)
Soft-Error Mitigation for PCM and STT-RAM
Editor:- February 21, 2017 - There's a vast body of knowledge about data integrity issues in nand flash memories. The underlying problems and fixes have been one of the underpinnings of SSD controller design. But what about newer emerging nvms such as PCM and STT-RAM?

You know that memories are real when you can read hard data about what goes wrong - because physics detests a perfect storage device.

A new paper - a Survey of Soft-Error Mitigation Techniques for Non-Volatile Memories (pdf) - by Sparsh Mittal, Assistant Professor at Indian Institute of Technology Hyderabad - describes the nature of soft error problems in these new memory types and shows why system level architectures will be needed to make them usable. Among other things:-
  • scrubbing in MLC PCM would be required in almost every cycle to keep the error rate at an acceptable level
  • read disturbance errors are expected to become the most severe bottleneck in STT-RAM scaling and performance
He concludes:- "Given the energy inefficiency of conventional memories and the reliability issues of NVMs, it is likely that future systems will use a hybrid memory design to bring the best of NVMs and conventional memories together. For example, in an SRAM-STT-RAM hybrid cache, read-intensive blocks can be migrated to SRAM to avoid RDEs in STT-RAM, and DRAM can be used as cache to reduce write operations to PCM memory for avoiding WDEs.

"However, since conventional memories also have reliability issues, practical realization and adoption of these hybrid memory designs are expected to be as challenging as those of NVM-based memory designs. Overcoming these challenges will require concerted efforts from both academia and industry." the article (pdf)

Editor's comments:- Reading this paper left me with the confidence that I was in good hands with Sparsh Mittal's identification of the important things which need to be known.

If you need to know more he's running a one day workshop on Advanced Memory System Architecture March 4, 2017.
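The hybrid cache idea quoted above (moving read-intensive blocks out of STT-RAM and into SRAM to avoid read disturbance errors) can be illustrated with a toy model. This is only a sketch under my own assumptions: the class name, the read threshold, the SRAM capacity and the oldest-first eviction rule are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class HybridCacheSketch:
    """Toy model: blocks live in STT-RAM until they prove read-intensive."""

    def __init__(self, read_threshold=4, sram_capacity=2):
        self.read_threshold = read_threshold  # assumed value, not from the paper
        self.sram_capacity = sram_capacity    # assumed number of SRAM block slots
        self.read_counts = defaultdict(int)   # reads seen while a block is in STT-RAM
        self.sram_blocks = []                 # migrated blocks, oldest first

    def read(self, block):
        """Record a read and return the name of the memory which served it."""
        if block in self.sram_blocks:
            return "SRAM"
        self.read_counts[block] += 1
        if self.read_counts[block] >= self.read_threshold:
            self._migrate(block)
        return "STT-RAM"

    def _migrate(self, block):
        """Move a hot block into SRAM, evicting the oldest one if SRAM is full."""
        if len(self.sram_blocks) >= self.sram_capacity:
            evicted = self.sram_blocks.pop(0)
            self.read_counts[evicted] = 0     # evicted block starts cold again
        self.sram_blocks.append(block)
```

In this model the read which crosses the threshold is still served by STT-RAM, and only later reads of that block come from SRAM. A real design would also have to weigh migration cost and write traffic, which this sketch ignores.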

See also:- an earlier paper by Sparsh Mittal - data compression techniques in caches and main memory

Getting acquainted with the needs of new big data apps
Editor:- February 13, 2017 - The nature of demands on storage and big memory systems has been changing.

A new slideshare - the new storage applications - by Nisha Talagala, VP Engineering at Parallel Machines, provides a strategic overview of the raw characteristics of the dataflows which occur in new apps involving advanced analytics, machine learning and deep learning.

It describes how these new trends differ from legacy enterprise storage patterns and discusses the convergence of RDBMS and analytics towards continuous streams of enquiries. And it shows why and where such new demands can only be satisfied by large capacity persistent memory systems.
Among the many interesting observations:-
  • Quality of service is different in the new apps.

    Random access is rare. Instead the data access patterns are heavily patterned and initiated by operations in some sort of array or matrix.
  • Correctness is hard to measure.

    And determinism and repeatability are not always present for streaming data because, for example, micro batch processing can produce different results depending on arrival time versus event time. (Computing the right answer too late is the wrong answer.)
Nisha concludes "Opportunities exist to significantly improve storage and memory for these use cases by understanding and exploiting their priorities and non-priorities for data." the article
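The point about arrival time versus event time is easy to see in a few lines. The sketch below is my own illustration, not something from the slides: the function name, the 10 second batch size and the sample timestamps are all made up for the purpose.

```python
from collections import Counter


def batch_counts(events, batch_seconds, key):
    """Count events per micro batch, bucketing by the chosen timestamp field."""
    return dict(Counter(e[key] // batch_seconds for e in events))


# Hypothetical events with timestamps in seconds (illustrative values only).
events = [
    {"event_time": 1, "arrival_time": 2},
    {"event_time": 4, "arrival_time": 4},
    {"event_time": 9, "arrival_time": 12},   # created in batch 0, arrives in batch 1
    {"event_time": 11, "arrival_time": 13},
]

by_event = batch_counts(events, 10, "event_time")
by_arrival = batch_counts(events, 10, "arrival_time")
```

The same four events give different per batch counts depending on which timestamp is used, because the third event was delayed across a batch boundary. Rerun the stream with different network delays and the arrival time answer changes again, which is the repeatability problem.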

SSD software news
where are we heading with memory intensive systems?


Xitore envisions NVDIMM tiered memory evolution
Editor:- February 7, 2017 - "Cache based NVDIMM architectures will be the predominant interface overtaking NVMe within the next 5-10 years in the race for performance" - is the concluding message of a recent presentation by Doug Fink, Director of Product Marketing - Xitore - Next Generation Persistent Memory Evolution - beyond the NVDIMM-N (pdf)


Among other things Doug's slides echo a theme discussed before - which is that new memory media (PCM, ReRAM, 3DXpoint) will have to compete in price and performance terms with flash based alternatives and this will slow down the adoption of the alt nvms.

Editor's comments:- Xitore (like others in the SCM DIMM wars market) is working on NVDIMM form factor based solutions and in this and an earlier paper they provide a useful summary of the classifications in this module category.

However, the wider market picture is that the retiring and retiering DRAM story cuts across form factors with many other permutations of feasible implementation possible.

So - whereas the NVDIMM is a seductively convenient form factor for systems architects to think around - the competitive market for big memory will use anything from SSDs on a chip up to (and including) populations of entire fast rackmount SSD boxes as part of such tiered solutions - if the economics, scale, interface fabric and software make the cost, performance and time to market sums emerge in a viable zone of business risk and doability.

SSD news
storage market research
RAM ain't what it used to be

SSD SoCs controllers
SSD controllers ..
industrial SSDs ..
PCIe SSDs ..
military storage directory and news
military SSDs ..

SSD news - February 2017

popular articles / SSD history / top SSD companies

Micron's SSDs tough enough for army use

Editor:- February 25, 2017 - Micron isn't a name that would spring to mind when thinking about military SSDs. Which is why I found a new applications white paper from Micron interesting.

Micron's IT SSDs withstand DRS' Toughest Tests (pdf) describes how DRS (which is a military SSD company) requalified an industrial SSD - M500IT - which had originally been designed for the automotive market so that it could be used in a large Army program. the article (pdf)

Symbolic IO reveals more

Editor:- February 25, 2017 - Symbolic IO is a company of interest which I listed in my blog - 4 shining companies showing the way ahead - but until this week they haven't revealed much publicly about their technology.

Now you can read details in a new blog - Symbolic IO reveals tech - written by Chris Mellor at the Register who saw a demo system at an event in London.

As previously reported a key feature of the technology is that data is coded into a compact form - effectively a series of instructions for how to create it - with operations using a large persistent memory (supercap protected RAM).

Among other things Chris reports that the demo system had 160GB of raw, effectively persistent memory capacity which - with coding compression - yielded an effective (usable) memory capacity of 1.79TB.
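As a quick sanity check on those reported figures, the implied compaction ratio works out at about 11 to 1. The sketch below is just that arithmetic. The function name is mine, and treating 1TB as 1,000GB is an assumption (the report doesn't say whether decimal or binary units were used).

```python
def compaction_ratio(raw_gb, effective_tb, gb_per_tb=1000):
    """Return effective capacity divided by raw capacity.

    gb_per_tb=1000 assumes decimal units, which the report does not confirm.
    """
    return effective_tb * gb_per_tb / raw_gb


# Figures as reported for the demo system: 160GB raw, 1.79TB effective.
ratio = compaction_ratio(raw_gb=160, effective_tb=1.79)
print(f"implied compaction ratio is roughly {ratio:.1f} to 1")
```

With binary units (gb_per_tb=1024) the ratio comes out slightly higher, but either way it is a single demo data point and says nothing about how the ratio varies with the application.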

Security in the system is rooted in the fact that each system evolves its own set of replacement codes computed on the fly and held in persistent memory - without which the raw data is meaningless. A security sensor module ("the Eye") located in a slot in the rack can erase the data relationship codes if GPS and other boundary conditions are crossed (as in some fast purge SSDs). the article

Editor's comments:- The data compaction and therefore CPU utilization claims do seem credible - although the gains are likely to be applications dependent.

Throughout the data computing industry smart people are going back to first principles and tackling the embedded problems of inefficiencies and lack of intelligence which are buried in the way that data is stored and moved. The scope for improvement in CPU and storage utilization was discussed in my 2013 article - meet Ken - and the enterprise SSD software event horizon.

The potential for improvement is everywhere - not just in pre SSD era systems. For example Radian is picking away at inefficiencies caused within regular flash SSDs themselves by stripping away the FTL. Tachyum is aiming to push the limits of processing with new silicon aimed at memory centric systems. For a bigger list of companies pushing away at datasystems limitations you'd have to read the SSD news archive for the past year or so.

But all new approaches have risks.

I think the particular risks with Symbolic IO's architecture are these:-
  • Unknown vulnerability to data corruption in the code tables.

    Partly this would be like having an encrypted system in which the keys have been lost - but the effect of recovery would be multiplied by the fact that each raw piece of data has higher value (due to compacting).

    Conventional systems leverage decades of experience of data healing knowhow (and data recovery).

    We don't know enough about the internal resiliency architecture in Symbolic IO's design.

    It's reasonable to assume that there is something there. But all companies can make mistakes as we saw in server architecture with Sun's cache memory problem and in storage architecture when Cisco discovered common mode failure vulnerabilities in WhipTail's "high availability" flash arrays.
  • Difficult to quantify risk of "false positive" shutdowns from the security system.

    This is a risk factor which I have written about in the context of the fast purge SSD market. Again this is a reliability architecture issue.
I expect that Symbolic will be saying much more about its reliability and data corruption sensitivities during the next few years. In any case - Symbolic's investment in its new data architecture will make us all rethink the bounds of what is possible from plain hardware.

NxGn Data is now called NGD Systems

Editor:- February 22, 2017 - NGD Systems (formerly called NxGn Data) today announced the availability of a new product aimed at the PCIe NVMe SSD market. The Catalina SSD has 24TB of 3D TLC flash which the company says uses less than 0.65 watts per terabyte.
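To put the power claim in context, a ceiling for the whole drive follows directly from the 2 numbers in the announcement. This is only the multiplication written out. The function name is mine, and I'm assuming the watts per terabyte figure applies to the full 24TB, although the announcement doesn't say whether it refers to active or idle power.

```python
def power_ceiling_watts(capacity_tb, watts_per_tb):
    """Upper bound on drive power implied by a 'less than X watts per TB' claim."""
    return capacity_tb * watts_per_tb


# Announced figures: 24TB capacity, less than 0.65 watts per terabyte.
ceiling = power_ceiling_watts(capacity_tb=24, watts_per_tb=0.65)
print(f"implied ceiling: under {ceiling:.1f} W for the whole drive")
```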

Editor's comments:- I couldn't see any mention of DWPD or performance or price on the NGD Systems web site. This is reminiscent of consumer product launches in which the picture supposedly tells the story rather than serious enterprise / cloud marketing.

MRAM's fitness for high altitude and hot environments discussed in a blog by Everspin

Editor:- February 22, 2017 - A new blog - MRAM earns its stripes in HiRel applications - by Duncan Bennett, Product Marketing Manager at Everspin lists some of the intrinsic characteristics of MRAM and their advantages for roles in aerospace applications. Among other things Duncan Bennett says:-
  • "MRAM memory bits are immune to the effects of alpha particles."
  • "MRAM outperforms other non-volatile memory technology when it comes to data retention at high temperatures." the article

Editor's comments:- this article only talks about the virtues of MRAM but another recent paper which I mentioned recently in SSD news (see sidebar left) - a Survey of Soft-Error Mitigation Techniques for Non-Volatile Memories (pdf) - raised doubts about the simplicity of using MRAM due to its soft error sensitivity to read disturb errors. Admittedly this was looking at an enterprise memory context where the more memory you have the sooner you are likely to witness such errors. But it's just my way of reminding you that there are no magic products in the memory ecosystem.

To be fair Everspin's article also mentioned that some mission critical customers use screening processes to select "hardened" MRAM - because - just as with traditional memories - some devices are just better than others.

PS - A useful sanity check in this context is a (2013) paper - MRAM Technology Status (pdf) - by Jason Heidecker at JPL NASA which includes a history of MRAM up to the first commercial product availability and anecdotal data about the space readiness of the technology based on data integrity tests in various flights. the article (pdf)

is Toshiba salami slicing its memory heirloom?

Editor:- February 14, 2017 - Toshiba has today, again, topped mainstream media tech headlines due to the resignation of the company's chairman. In recent weeks - were I so inclined (to fan the flames of rumor) - I could have inserted some story or other about the sale of Toshiba's semiconductor business in this news page every day.

Instead I came to the conclusion that the real story was that there probably wouldn't be a single big story because the sale of the entire memory systems business to a single buyer (most likely another memory or SSD company) would inevitably introduce a delay due to antitrust hurdles. And Toshiba needs financial bandaids now.

Therefore, what we've been seeing is a fragmented approach - which on linkedin I described as "salami slicing" the memory business heirloom. Western Digital got some - but no one will get it all quickly due to caution about the impact of regulator antiacid.

SanDisk announces the arrival of flight 2.5 NVMe

Editor:- February 10, 2017 - SanDisk recently recycled the "Skyhawk" SSD brand, which had previously been associated with a rackmount SSD product (launched in October 2014) from Skyera. Like SanDisk, Skyera is an SSD company which was acquired by Western Digital and, by coincidence, its founder's new company emerged from stealth this week. (See the story about Tachyum after this.)

The new SanDisk Skyhawk is aimed at the 2.5" NVMe PCIe SSD market.

Although SSD brand names can be important the significant thing about SanDisk's new Skyhawk is that it fixes a longstanding strategic weakness in its enterprise PCIe SSD product line which I commented on in October 2015 (when WD announced it would acquire SanDisk).

The irony is that Fusion-io (which created the enterprise PCIe SSD market and by whose acquisition SanDisk hoped in June 2014 to broaden its flash presence in the enterprise market) had been one of the earliest companies to demonstrate a prototype 2.5" PCIe SSD (in May 2012). But Fusion didn't productize that concept and chose instead to move upscale in form factor to boxes.

Decoupling from the complex legacy of the past is why it has taken nearly 5 years for SanDisk to launch its me too Skyhawk 2.5" NVMe SSD now.

our impact could be 100x SandForce - says cofounder of Tachyum

Editor:- February 8, 2017 - Tachyum emerged from stealth mode today announcing its "mission to conquer the performance plateau in nanometer-class chips and the systems they power."

Tachyum (named for the Greek "tachy," meaning speed, combined with "-um," indicating an element) was cofounded by Dr. Radoslav "Rado" Danilak, who is the inventor on more than 100 patents and has spent more than 25 years designing state-of-the-art processing systems and delivering significant products to market.

Among other things Rado founded or cofounded 2 significant companies in SSD market history:- Skyera - an ultra efficient petabyte scale rackmount SSD company acquired by WD in 2014 - and SandForce which designed the most widely deployed SSD controllers. SandForce was acquired by LSI for $322 million in 2011 and in 2014 LSI's SSD business was acquired by Seagate.

Rado's past work in processor applications includes a period at Wave Computing, where he architected the 10GHz Processing Element of their deep learning DPU.

Explaining the technology void and market gap which Tachyum will focus on Rado said - "We have entered a post-Moore's Law era where performance hit a plateau, cost reduction slowed dramatically, and process node shrinks and CPU release cycles are getting longer. An innovative new approach, from first principles is the only realistic chance we have of achieving performance improvements to rival those that powered the tech industry of past decades, and the opportunity is a hundred times greater than any venture I've been involved in."

Editor's comments:- on linkedin I said "I don't know any details but with so many physics rooted data agility problems still needing to be solved anything that Rado Danilak does will be worthy of our future attention."

Rado replied - "Like always you are right on target. In fact Tachyum is 100x of SandForce opportunity and impact."

See also:- in-situ SSD processing

who's well regarded in networked storage?

Editor:- February 1, 2017 - IT Brand Pulse today announced the results of its recent survey covering brand perceptions in the networked storage market.

Among other things:- "By nearly a 2-to-1 margin, Seagate outperformed second-place challenger (Western Digital) to capture its 5th Market Leader award for Enterprise HDDs." the article

See also:- Branding Strategies in the SSD Market, Storage SoothSayers
What happened before?


Michelangelo found David inside a rock.
Megabyte was looking for a solid state disk.
after AFAs? - the next box
Throughout the history of the data storage market we've always expected the capacity of enterprise user memory systems to be much smaller than the capacity of all the other attached storage in the same data processing environment.

A new blog on the home page of - cloud adapted memory systems - asks (among other things) if this will always be true.

Like many of you - I've been thinking a lot about the evolution of memory technologies and data architectures in the past year. I wasn't sure when would be the best time to share my thoughts about this one. But the timing seems right now. the article
controllernomics - joins the memory latency to do list
Editor:- February 20, 2017 - As predicted 8 years ago - the widespread adoption of SSDs signed the death warrant for hardware RAID controllers.

Sleight of hand tricks which seemed impressive enough to make hard drive arrays (RAID) seem fast in the 1980s - when viewed in slow motion from an impatient SSD perspective - were just too inelegant and painfully slow to be of much use in true new dynasty SSD designs.

The confidence of "SSDs everywhere" means that the data processing market is marching swiftly on - without much pause for reflection - towards memory centric technologies. And many old ideas which seemed to make sense in 1990s architecture are failing new tests of questioning sanity.

For example - is DRAM the fastest main memory?

No - not when the capacity needed doesn't fit into a small enough space.

When the first solutions of "flash as RAM" appeared in PCIe SSDs many years ago - their scope of interest was software compatibility. Now we have solutions appearing in DIMMs in the memory channel.

This is a context where software compatibility and memory latency aren't the only concerns. Another is understanding the interference effects of all those other pesky controllers in the memory space.

That was one of the interesting things which emerged in a recent conversation I had with Diablo Technologies about their Memory1. See what I learned in the blog - controllernomics and user risk reward with big memory "flash as RAM"
5 years back, 5 years forward
In February 2012 - Kaminario said that the percentage of its enterprise SSD systems which were pure RAM SSDs had declined to 10%. And 45% of the systems it was shipping (at that time) were all flash arrays.

That was a useful way of assessing progress in the succession of flash in the enterprise over the original RAM SSD market.

From the perspective of 2017 we now see of course that what was good for storage (capacity and IOPS) is good too for latency - as flash has started replacing DRAM as random access memory in high capacity RAM systems ranging from single servers to multiple racks.

That's because the low energy requirement of nvms (which don't need gas guzzling refresh) means you can fit more raw memory capacity into a single motherboard. And even the higher raw access times of flash (compared to DRAM) look good in comparison to box hopping fabrics.

(And other cost savings kick in too.)

One day in the future on this page we will be reporting when DRAM (in external chips and DIMMs) has reached the point where it is only 10% of all native main memory too.
"We are morphing from a storage hierarchy to a memory hierarchy. This is why I choose to work where I do. Memory rules."
Rob Peglar, Senior VP & CTO, Symbolic IO in a comment on LinkedIn (February 2, 2017).

what does "serverless software" really mean?
Editor:- February 20, 2017 - Did you want a side of SLBS (server less BS) with your software or hardware FUD? - is the title of an amusing new blog by Greg Schulz, founder of StorageIO.

Editor's comments:- I'm not going to quote from Greg's blog. To see what he says you'll just have to read it.

At one and the same time it provides a funny and cuttingly serious analysis of what happens when marketers stray too far off the edge - reminiscent of scenes in Looney Tunes.

See also:- Marketing Views

All the marketing noise coming from the DIMM wars market (flash as RAM and Optane etc) obscures some important underlying strategic and philosophical questions about the future of SSD.
where are we heading with memory intensive systems?
Memory1 beats DRAM in big data multi box analytics
Editor:- February 7, 2017 - The tangible benefits of using flash as RAM in the DIMM form factor are illustrated in a new benchmark Apache Spark Graph Performance with Memory1 (pdf) - published today by Inspur Systems (the largest server manufacturer in China) in collaboration with Diablo Technologies.

The memory intensive tests were run on the same cluster of five servers (Inspur NF5180M4, two Intel Xeon CPU E5-2683 v3 processors, 28 cores each, 256GB DRAM, 1TB NVME drive).

The servers were first configured to use only the installed DRAM to process multiple datasets. Next, the cluster was set up to run the tests on the same datasets with 2TB of Memory1 per server.

The k-core algorithm (which is typically used to analyze large amounts of data to detect cross-connectivity patterns and relationships) was run in an Apache Spark environment to analyze three graph datasets of varying sizes up to a 516GB set of 300 million vertices with 30 billion edges.

Completion times for the smallest sets were comparable. However, the medium-sized sets using Memory1 completed nearly twice as fast as the traditional DRAM configuration (156 minutes versus 306 minutes). On the large sets, the Memory1 servers completed the job in 290 minutes, while the DRAM servers were unable to complete due to lack of memory space.
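For anyone who wants to check the arithmetic on the medium-sized datasets, the speedup is simply the ratio of the 2 completion times quoted above. The function name below is mine, and the only inputs are the published minutes.

```python
def speedup(baseline_minutes, improved_minutes):
    """Return how many times faster the improved run is than the baseline."""
    return baseline_minutes / improved_minutes


# Medium-sized datasets: DRAM only took 306 minutes, Memory1 took 156 minutes.
medium_speedup = speedup(baseline_minutes=306, improved_minutes=156)
minutes_saved = 306 - 156
print(f"speedup is about {medium_speedup:.2f}x, saving {minutes_saved} minutes")
```

No equivalent ratio can be computed for the large datasets, because the DRAM only configuration never finished.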

Editor's comments:- As has been noted in previously published research by others - being able to have more RAM emulation flash memory in a single server box can (in big data computing) give similar or better results than implementing the server set with more processors and more DRAM in more boxes.

This is due to the traffic controller and fabric latencies between server boxes which can negate most of the intrinsic benefits of the faster raw memory chips - if they are physically located in another box.

The key takeaway message from this benchmark is that a single Memory1 enhanced server can perform the same workload as 2 to 3 non NVDIMM enhanced servers when the size of the working data set is the limiting factor.

More useful however (as you will always find an ideal benchmark which is a good fit to the hardware) is that the Memory1 system places lower (3x lower) caching demands on the next level up in the storage system (in this case the attached NVMe SSDs). This provides a higher headroom of scalability before the SSDs themselves become the next critical bottleneck.

In their datasheet about Memory1 enhanced servers Inspur give another example of the advantages of this approach - quoting a 3 to 1 reduction in server footprint and faster job completion for a 500GB SORT.

the road to DIMM wars
are you ready to rethink RAM?
DRAM's indeterminate latencies