"There is what looks like a living set of SSD reference documents
to be found at"
Dr. Terry Critchley in his book - High Availability IT Services, page 94, published December, 2014

can memory do more?

Editor:- June 17, 2016 - in a new blog I ask - where are we heading with memory intensive systems and software?

All the marketing noise coming from the DIMM wars market (flash as RAM and Optane etc) obscures some important underlying strategic and philosophical questions about the future of SSD.

When all storage is memory - are there still design techniques which can push the boundaries of what we assume memory can do?

Can we think of software as a heat pump to manage the entropy of memory arrays? (Nature of the memory - not just the heat of its data.)

Should we be asking more from memory systems? the blog

Infinidat has shipped over 400PB

Editor:- May 10, 2016 - Infinidat today announced that in Q1 2016 shipments of its InfiniBox enterprise storage array increased by 300% compared to the year ago period and now amount to 422 petabytes worldwide. One of Infinidat's Fortune 500 customers now has over 10PB of InfiniBox storage spread across multiple sites.

PrimaryIO ships application-aware FT caching

Editor:- March 8, 2016 - PrimaryIO (which changed its name from CacheBox in August 2015) today announced the general availability of its Application Performance Acceleration V1.0 (SSD aware software) for VMware vSphere 6.

PrimaryIO APA aggregates server-based flash storage across vSphere clusters as a cluster-wide resource and supports write-around and write-back caching with full fault tolerance in the face of node failures, since writes to cache are replicated to up to 2 additional nodes.
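PrimaryIO hasn't published the internals of APA, but the write-back-with-replication idea described above can be sketched in a few lines of Python. All class and method names here are hypothetical - this is the general technique, not PrimaryIO's API.

```python
# Sketch of write-back caching where each cached write is copied to
# peer nodes before it is acknowledged, so a node failure loses nothing.

class WriteBackCache:
    def __init__(self, local_store, peers, replicas=2):
        self.local = local_store              # dict: block -> data
        self.peers = peers                    # peer WriteBackCache nodes
        self.replicas = min(replicas, len(peers))
        self.dirty = set()                    # blocks not yet destaged

    def write(self, block, data):
        # The write is only "done" once it sits on the local flash
        # AND on `replicas` peer nodes.
        self.local[block] = data
        for peer in self.peers[:self.replicas]:
            peer.local[block] = data          # synchronous replica copy
        self.dirty.add(block)

    def flush(self, backend):
        # Background destage of dirty blocks to the backing store.
        for block in sorted(self.dirty):
            backend[block] = self.local[block]
        self.dirty.clear()

# usage: one cache node with two replica peers
node_a, node_b, node_c = {}, {}, {}
cache = WriteBackCache(node_a, [WriteBackCache(node_b, []),
                                WriteBackCache(node_c, [])])
cache.write(7, b"hot data")                   # now held on all three nodes
```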

Datalight's SSD firmware to boldly go

Editor:- September 17, 2015 - Datalight today said its embedded filesystem (Reliance Nitro) and FTL (FlashFX Tera) have been selected by NASA for use onboard future manned spacecraft in the Orion program.

3D TLC is good enough to last 7 years says Kaminario

Editor:- August 21, 2015 - One of the early new SSD ideas in 2014 was that 3D nand flash was tough enough to consider using in industrial SSDs so it was no surprise when 3D flash started to appear in volume production of enterprise SSD accelerators such as Samsung's 10 DWPD NVMe PCIe SSDs in September 2014.

So the recent announcement by Kaminario that it will soon ship 3D TLC (3 bits) flash in its K2 rackmount SSDs can be seen as a predictable marker in the long term trend of flash adoption in the enterprise.

Less predictable than the price (under $1,000/TB for usable systems capacity), however, is that Kaminario is offering a 7 year endurance related systems warranty.

disk writes per day in enterprise SSDs
This factor - discussed in a Kaminario blog - tells us more about Kaminario's customer base than about flash endurance, however.

Kaminario says its HealthShield "has been collecting endurance statistics for the past few years, and from analyzing the data we see that 97% of (our) customers are writing less than a single write per day (under 1 DWPD) of the entire capacity."
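For readers new to the metric: DWPD is simply daily write volume divided by usable capacity. A trivial illustration (the figures below are made up for the example, not Kaminario's):

```python
# DWPD (drive writes per day) relates daily host writes to usable capacity.

def dwpd(bytes_written_per_day, usable_capacity_bytes):
    return bytes_written_per_day / usable_capacity_bytes

TB = 10**12
# An array with 100TB usable capacity absorbing 40TB of writes a day:
print(dwpd(40 * TB, 100 * TB))   # 0.4 - comfortably under 1 DWPD
```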

This is one aspect of a trend I wrote about a few years ago - thinking inside the box - which is that designers of integrated systems have more freedom of choice in their memories than designers of standard SSD drives - because they have visibility and control of more layers of software and can leverage other architectural factors.

A competent box level SSD designer can make better decisions about how to translate raw R/W intentions (from the host) into optimized R/W activities at the flash.

This is especially the case when the designers are also collecting raw data about the workloads used in their own customer bases. The customer experience is more important than slavishly designing systems which look good in artificial benchmarks.

"more lanes of SAS than anyone else" - new 4U SavageStor

Editor:- July 28, 2015 - As the rackmount SSD market heads towards future consolidation, new business opportunities are being created for those brave hardware companies which accept the challenge of providing simple hardware platforms - offering high density, efficiency, performance or other combinations of valued technical features optimized for known use cases - while also being willing to sell them unbundled from expensive frivolous software.

In that category - Savage IO today launched its SavageStor - a 4U server storage box which, using a COTS array of hot swappable SAS SSDs, can provide up to 288TB flash capacity with 25GB/s peak internal bandwidth - with useful RAS features for embedded systems integrators who need high flash density in an untied / open platform.

Savage IO says its products "are intentionally sold software-free, to further eliminate performance drains and costs caused by poor integration, vendor lock-in, rigidly defined management, and unjustifiable licensing schemes."

Editor's comments:- I spoke to the company recently and most of you will instantly know if it's the right type of box for you or not.

Nimble video discusses 5 9's in 5,000 systems

Editor:- February 21, 2015 - Nimble Storage recently disclosed (in a sponsored video fronted by ESG) that its customer deployed rackmount storage systems are achieving better than 5 9's uptime - 99.999% availability.

This has been attained in a field population of 5,000 arrays representing 1,750 years of system run time - thanks to a combination of factors including the crowd sourced intelligence of its InfoSight management system, which can alert users to potential down time events so they can take evasive action before bad things happen.

Editor's comments:- While useful in telling us how many systems Nimble has sold, it's less useful as an indicator of availability, given that the average run time across the population is about 4 months.

It would be more impressive if they could repeat the disclosure in a few years' time and selectively extract the up-time of systems over different run times - up to 1 year, 1 to 2 years etc.

If indeed Nimble is still in a position to do so - and if it would still be meaningful, given that the consolidation in hardware and software which lies ahead for the enterprise SSD market may mean that vendors will be using the same hardware.
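A quick back of envelope check on the Nimble figures - the 5,000 arrays and 1,750 cumulative system-years quoted in the video, plus what five nines actually allows per year:

```python
# Average run time per array, and the downtime budget at 99.999%.

arrays = 5000
system_years = 1750

avg_runtime_months = system_years / arrays * 12
print(round(avg_runtime_months, 1))            # 4.2 - about 4 months per array

# At five nines, the allowed downtime per system-year:
minutes_per_year = 365.25 * 24 * 60
allowed_downtime_min = (1 - 0.99999) * minutes_per_year
print(round(allowed_downtime_min, 2))          # 5.26 minutes
```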

shared vulnerabilities may be another factor in pausing Cisco's UCS Invicta shipments

Editor:- October 24, 2014 - The discovery of single points of failure which could compromise the availability of the rackmount SSD family acquired by Cisco last year is among several design issues contributing to the continuing pause in shipments - according to reports by CRN.

HA SSD arrays - are now mainstream

Editor:- October 13, 2014 - I've long had an abiding interest in the architecture of fault tolerant / high availability electronic systems - ever since learning that such concepts existed - when (in about 1976) our digital systems design lecturer Dr R G 'Ben' Bennetts at Southampton University suggested we should read a paper about how NASA's Jet Propulsion Labs used triple modular redundancy.

(I can't remember the details of that paper - but the JPL people and their collaborators and descendants have never stopped inspiring and writing a rich literature about the design aspects of computer systems which operate a long way from a service engineer.)
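Triple modular redundancy itself is easy to illustrate: run three copies of a computation and let a voter mask any single faulty module. A toy sketch (nothing like the real JPL STAR hardware, of course - just the voting idea):

```python
# TMR: three redundant modules, majority vote masks a single fault.

from collections import Counter

def tmr(module_a, module_b, module_c, x):
    results = [module_a(x), module_b(x), module_c(x)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority - more than one module failed")
    return value

def good(x):
    return x * x

def faulty(x):
    return x * x + 1      # simulated single-module fault

print(tmr(good, faulty, good, 5))   # 25 - the faulty module is outvoted
```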

In the early part of my career - such ideas were good to know about - but far too exotic and expensive to incorporate into most products. But I was reminded about them in the 1990s - when, in the publication which preceded this one, some of my customers were advertising their FT/ HA SPARC servers for the telco market.

The more you investigate the architecture of FT/ HA computer systems the more you realize it's a philosophy - rather than a technology which you can implement as plug and play, inconsequentially, within the cost goals of mere mortals.

The results are always compromises - which balance reliability (aka functionable survivability) against other tradeoffs - such as performance. (And performance itself has many internal dimensions of fault tolerance too.)

Violin's 6000 SSD and HA

3 years ago (in September 2011) - when I was talking to Violin's CEO at that time, Don Basile, about the launch of Violin's first 6000 series (the first no single point of failure, competitively priced, fast flash rackmount SSD) - he expressed some concern about how I would tell you (my readers) what was unique about this product, and how I would signal whether it was relevant to you or not, as it was competing for attention with thousands of other SSD stories for applications ranging from phones to drones.

I didn't see that as a problem - because my readers are smart - and I had been publishing a directory page dedicated to SSD Reliability since 2008.

But just to make sure that the systems embodiments of FT/HA SSD architecture from a growing base of competitors didn't get washed away by other stories - I launched a dedicated FT/HA enterprise SSD directory in January 2012 to serve an emerging base of reliability focused readers - which in those days measured around 10,000 readers / year in that niche topic. (Until recently HA SSDs have rarely entered the top 30 SSD articles viewed by my readers.)

But something in the market has changed.

I noticed this week that the topic of HA/FT SSDs has risen to be 1 of the top 10 topics that you've been looking at this month. Which means it's mainstream.

Looking back at other past niche topics...

10 years ago I didn't think that more than a few hundred people would be interested in the intricacies of flash endurance. And to begin with - SSD vendors were nervous about even acknowledging that there was such a thing as SSD wear out. Now you can't shut them up. They all want to show you how clever they are at handling it.

The different types of flash memory and different generations of arcane flash care schemes spawned a huge industry literature of understanding and misunderstanding - so I wouldn't be surprised if the enterprise FT/HA flash array market now started to do something similar.

PS - After a communications gap of 37 years - I exchanged some emails with my old university lecturer - Ben Bennetts while writing this - to see if I had remembered things correctly.

He said - "Yes, that was me. I lectured on fault-tolerant systems, and JPL's Self-Test And Repair (STAR) computer, based on triple modular redundancy, used to feature in my presentations."

So that enables me to pinpoint the original source of that inspirational IEEE Transactions paper about fault tolerant computing - which I remember having read in 1976 (although I haven't read it since) - to Prof. Algirdas Antanas Avižienis, whose visionary work on what is today called "Dependable Computing and Fault-Tolerant Systems" continues today.

You don't need to worry about the endurance of our FlashSystems - says IBM

Editor:- October 7, 2014 - Worried about endurance?

"None of the thousands of FlashSystem products (fast rackmount SSDs) which IBM has shipped has ever worn out yet! - says Erik Eyberg, Flash Strategy & Business Development at IBM - in his new blog - Flash storage reliability: Aligning technology and marketing. "And our metrics suggest that will remain true in almost all cases for many, many years (certainly well beyond any normal and expected data center life cycle)"

Erik goes on to explain that's the reason IBM can now officially cover flash storage media wear-out as part of its standard IBM FlashSystem warranty and maintenance policies - without changing the prices for these services.

And his blog has a link to a white paper about the reliability architecture underlying this product (although it's behind a sign-up wall - which seems counter productive to me.)

Editor's comments:- Don't expect all other flash array vendors to follow suit (with no cost endurance guarantees) - because this product range from IBM is based on design rules and memory reliability architecture experience in FC SAN compatible enterprise SSD racks which have evolved since the 1st generation RamSan from TMS (in 2000) - and for more than a decade before that using other popular enterprise storage interfaces.

Holly Frost - who founded Texas Memory Systems - and who was the CEO when TMS was acquired - told me a revealing story about TMS's policies concerning the reliability of their SSD systems and customer care procedures.

This conversation took place in December 2011 - when the company was launching its first high availability SSD - which became the basis of IBM's FlashSystem.

It still makes interesting reading today. You can see it in this article - in the right hand column - scroll down to the box titled - "no single point of failure - except..."

HGST announces 2nd generation clustering software for FlashMAX PCIe SSDs

Editor:- September 9, 2014 - HGST today announced a new improved version of the high availability clustering capability previously available in the PCIe SSD product line acquired last year from Virident.

HGST's Virident Space allows clustering of up to 128 servers and 16 PCIe storage devices to deliver one or more shared volumes of high performance flash storage with a total usable capacity of more than 38TB.

HGST says its Virident HA provides a "high-throughput, low-latency synchronous replication across servers for data residing on FlashMAX PCIe devices. If the primary server fails, the secondary server can automatically start a standby copy of your application using the secondary replica of the data."
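HGST hasn't published the protocol details, but the synchronous-replication-with-failover behavior quoted above can be modeled in a few lines. This is a hypothetical sketch of the general technique, not the Virident HA implementation:

```python
# Synchronous replication to a secondary node, with read failover.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True

class ReplicatedVolume:
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, key, value):
        # Synchronous: the write completes only once both copies hold it.
        self.primary.data[key] = value
        self.secondary.data[key] = value

    def read(self, key):
        # If the primary server fails, serve from the secondary replica.
        node = self.primary if self.primary.alive else self.secondary
        return node.data[key]

vol = ReplicatedVolume(Node("p"), Node("s"))
vol.write("lba0", b"\x00" * 8)
vol.primary.alive = False                    # simulate primary failure
print(vol.read("lba0") == b"\x00" * 8)       # True - served by the secondary
```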

For more details see - HGST Virident Software 2.0 (pdf)

Editor's comments:- This capability had already been demonstrated last year - and ESG reported on the technology in January 2014.

But at that time - the clustering product, called vShare - was restricted to a small number of servers - and the data access fabric was restricted to InfiniBand.

With the rev 2.0 software - the number of connected devices has increased - and users also have the lower cost option of using Ethernet as an alternative supported fabric.

say hello to high availability CacheIO

Editor:- June 10, 2014 - CacheIO today announced results of a benchmark which is described by their collaborator Orange Silicon Valley (a telco) as - "One of the top tpm benchmark results accelerating low cost iSCSI SATA storage."

CacheIO says that the 2 million tpm benchmark on CacheIO accelerated commodity servers and storage shows that users can deploy its flash cache to accelerate their database performance without replacing or disrupting their existing servers and storage.

Editor's comments:- The only reason I mention this otherwise me-too sounding benchmark is that although I've known about CacheIO - and what they've been doing with various organizations in the broadcast and telco markets for over a year - I didn't list them here before.

That was partly because they didn't want me to name the customers they were working with at that time - but also because, with SSD caching companies becoming almost as numerous as tv channels on a satellite dish, I wanted to wait and see if they would be worth a repeat viewing. (And now I think they are.)

PS - I asked Bang Chang, CEO of CacheIO if he had a white paper which talked more about the company's cache architecture and philosophy. He sent me this - CacheIO High Availability Deployment (pdf) - from which I've extracted these quotes...
  • re network cache appliances - "At CacheIO we believe that network cache appliance is the best storage architecture to decouple performance from capacity and achieve the best of both worlds.

    Once deployed as a "bump in the wire" performance accelerator, our network cache appliance can also deliver additional value added services... Compared to server-side Flash cache, our network cache appliance is a shared resource that is more scalable, more reliable, supports clustered applications, and most importantly allows customers, especially cloud service providers, to monetize performance by dynamically allocating resources based on changing SLAs."
  • re operational transparency - "Implementing CacheIO network appliance requires no change to existing applications, servers, or storage. CacheIO can be slotted in, turned on to accelerate applications, and turned off if necessary, often without needing to stop the applications."
I found it interesting to see that in addition to conventional connections (SAN and InfiniBand) their HA paper also mentions emerging PCIe fabric.

Violin enters the fast HA SSD apps server market

Editor:- April 22, 2014 - Violin Memory today announced general availability of its Windows Flash Array (WFA) - a new 3U fast 10GbE high availability SSD accelerated apps server (64TB raw flash capacity) for Microsoft enterprise software environments, with unique software developed by Microsoft. This opens up a new market opportunity for Violin and enables it to compete head to head with vendors who sell enterprise servers accelerated by PCIe SSDs or memory channel SSDs - using its own proprietary solution, which integrates servers preloaded and configured with Windows Storage Server 2012 R2 with fast RDMA memory access to an internal fabric of Violin VIMM flash arrays.

Editor's comments:- This product announcement is the nexus of several past business strands for Violin and also I think marks another major reset point for the company in terms of its positioning and business potential in the SSD market.

Here are a couple of things to think about regarding this new product.
  • Violin was already one of the leading companies in the rackmount SSD market before this product line. Its previous products provided various ways to accelerate apps by connecting their storage to existing servers.

    With this new product - if you're using enterprise apps from Microsoft - Violin can now provide the servers too. And the fast flash is already integrated in the server blades with a low latency software connection into server RAM via custom VIMM acceleration code written by Microsoft.

    While not as scalable as some of the new vendors in the general cloud / SDS market (Violin's new system currently scales to 4 arrays) - Violin's box will be one of the 2 or 3 fastest boxes in the market today.
  • Look Ma - no PCIe SSDs!

    Less than a year after Violin made its late entry into the PCIe SSD market (in January 2013) the company's new management in 2014 was saying that it would soon exit that business.

    That exit strategy made sense even before today's announcement - because it's a different type of business. Violin was late to market - and PCIe SSDs require different technical and marketing support than systems. (You saw a mirror image of some of those problems with Fusion-io's clunky bootstrapping of itself into also being a systems company last year.)

    In Violin's case - seen from the company's business perspective - its WFA product shows why decoupling from the PCIe SSD market makes even more sense than it did before, for 2 more reasons.

    1- If Violin isn't using its own PCIe SSD form factors in its own servers (but using its proprietary VIMMs instead) then why should anyone else use them?

    2 - Another reason in the balance is that the WFA will compete directly with server makers who are in the market for PCIe SSDs. If Violin found it hard trying to sell PCIe SSD to server makers before - the company's entry into the SSD server market would shrink such design prospects even more.
  • A new style of marketing.

    When I spoke last week - ostensibly about the WFA product briefing - to Eric Herzog, Violin's new CMO - I started by repeating something I'd already said to him earlier by email.

    Violin's big Achilles heel in recent years in my view had been its marketing.

    Since September 2011 (long before the company's IPO) I had been saying to readers on these pages as directly as I could that Violin's marketing communications and advertising was not fit for purpose.

    I told Eric that I regarded his arrival in the company as a way to change all that.

    He told me that he agreed - and had already taken steps to correct a lot of past obvious mistakes.

    On the business development side he said the new management team at Violin included a strong hybrid of skills in both big enterprise companies and startups.

    On a personal level - having left a lot of money on the table from his previous post in coming to join Violin - he believed there was a great opportunity for Violin to be much more successful than it has been up to now. And part of what he would be doing is ensuring that the company communicated more clearly - in the same kind of language that its customers used - what Violin was all about and how its solutions made good business sense.
Going back to the WFA product launch. I think the WFA is the most significant new product direction for Violin since the company switched away from using DRAM in its rackmount SSD arrays - and over to using flash (in November 2008).

coming soon - an incrementally denser rackmount SSD from Nimbus

Editor:- February 25, 2014 - What does a petabyte of pure flash enterprise SSD look like?

It depends on who makes it, the relative speed category of the storage itself and the user's own preferences for high availability schemes and associated software.

At the densest (best) level - a petabyte of enterprise SSD can occupy as little as 2U rack space (skyEagle from Skyera).

The upper limit in size depends a lot on the internal architecture and memory type - but typically occupies 10x to 20x more physical space than the example I've given above.

On that theme - Nimbus Data today announced that in April it will sample the next generation of its own fast, software rich SSD systems - the Gemini X - which will provide an incremental 2x capacity density improvement over the company's previous offering.


The Gemini X implements 960TB (raw) unified SSD storage with 100 microseconds latency (at under 6W / TB) in 24U of rackspace (40GB/s internal bandwidth) with external connection via 10 dual ports which can be 16Gb FC, 40Gb Ethernet or 56Gb Infiniband.

Like most leading vendors - Nimbus likes to think that its products are better than those of some other named competitors. But all such comparisons are very selective.
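The density figures mentioned above are easier to compare on a common scale (TB per rack unit) - a quick back of envelope check:

```python
# Comparing rackmount flash density claims on a TB-per-U basis.

def tb_per_u(capacity_tb, rack_units):
    return capacity_tb / rack_units

print(tb_per_u(1000, 2))    # skyEagle: 1PB in 2U -> 500.0 TB/U
print(tb_per_u(960, 24))    # Gemini X: 960TB in 24U -> 40.0 TB/U

# And the quoted power envelope of "under 6W / TB" for a full 960TB system:
print(960 * 6)              # 5760 - i.e. a ceiling of about 5.8kW
```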

new blog by PernixData describes the intermediate states of play for its HA clustered write acceleration SSD cache

Editor:- November 5, 2013 - In a clustered, SSD ASAP VM environment which supports both read and write acceleration it's essential to know the detailed policies of any products you're considering - to see if the consequences for data vulnerability and performance comply with strategies which are acceptable for your own intended uses.

In a new blog - Fault Tolerant Write Acceleration - Frank Denneman, Technology Evangelist at PernixData, describes in a rarely seen level of detail the various states which his company's FVP goes through when it recognizes that a fault has occurred in either server or flash. And the blog describes the temporary consequences - such as loss of acceleration - which occur until replacement hardware is pulled in and configured automatically by the system software.

Stating the design principles of this product - Frank Denneman says - "Data loss needs to be avoided at all times, therefore the FVP platform is designed from the ground up to provide data consistency and availability. By replicating write data to neighboring flash devices data loss caused by host or component failure is prevented. Due to the clustered nature of the platform FVP is capable to keep the state between the write data on the source and replica hosts consistent and reduce the required space to a minimum without taxing the network connection too much." the article


McObject shows in-memory database resilience in NVDIMM

Editor:- October 9, 2013 - What happens if you pull out the power plug during intensive in-memory database transactions? For those who don't want to rely on batteries - but who also need ultimate speed - this is more than just an academic question.

Recently on these pages I've been talking a lot about a new type of memory channel SSD which is hoping to break into the application space owned by PCIe SSDs. But another solution in this area has always been DRAM with power fail features which save data to flash in the event of sudden power loss. (The only disadvantages being that the memory density and cost are constrained by the nature of DRAM.)

McObject (whose products include in-memory database software) yesterday published the results of benchmarks using AGIGA Tech's NVDIMM in which they did some unthinkable things which you would never wish to try out for yourself - like rebooting the server while it was running... The result? Everything was OK.

"The idea that there must be a tradeoff between performance and persistence/durability has become so ingrained in the database field that it is rarely questioned. This test shows that mission critical applications needn't accept latency as the price for recoverability. Developers working in a variety of application categories will view this as a breakthrough" said Steve Graves, CEO McObject.

Here's a quote from the whitepaper - Database Persistence, Without The Performance Penalty (pdf) - "In these tests eXtremeDB's inserts and updates with AGIGA's NVDIMM for main memory storage were 2x as fast as using the same IMDS with transaction logging, and approximately 5x faster for database updates (and this with the transaction log stored on RAM-disk, a solution that is (even) faster than storing the log on an SSD). The possibility of gaining so much speed while giving up nothing in terms of data durability or recoverability makes the IMDS with NVDIMM combination impossible to ignore in many application categories, including capital markets, telecom/networking, aerospace and industrial systems."

Editor's comments:- last year McObject published a paper showing the benefits of using PCIe SSDs for the transaction log too. They seem to have all angles covered for mission critical ultrafast databases that can be squeezed into memory.

OCZ ships PCIe SSD based SQL accelerator

Editor:- July 23, 2013 - OCZ today announced the general availability of its ZD-XL SQL Accelerator - an SSD ASAP appliance - delivered as a PCIe SSD (600GB, 800GB or 1.6TB) and bundled software - which optimizes caching of SQL Server data in Windows environments - and can provide up to 25x faster database performance.

HA functionality works through Microsoft SQL Server AlwaysOn technology, so that in the event of planned or unplanned downtime the system can continue operations from the stopping point, retaining all of its data as if no downtime had occurred.

"We believe that the industry is primed for this type of tightly integrated, plug-and-play use-case acceleration solution..." said Ralph Schmitt, CEO - OCZ Technology.

Editor's comments:- One of the differentiators in SSD caching products is the sophistication of their behavior when viewed from a time basis. This is 1 of the 11 key SSD symmetries - which I call "age symmetry".

In this respect - a key feature of ZD-XL SQL Accelerator is its business-rule pre-warming cache engine and cache warm-up analyzer that monitors SQL Server workloads and automatically pre-loads the cache in advance of critical, demanding or important SQL Server jobs. It achieves this by identifying repeated access patterns that enable DBAs to set periodic time schedules to pre-load the cache.
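OCZ hasn't published the engine's internals, but the pre-warming idea - spot recurring access patterns in past workloads, then load those blocks into cache before the scheduled job runs - can be sketched like this (all names are illustrative, not OCZ's API):

```python
# Toy schedule-driven cache pre-warming: find the blocks that recur in a
# workload trace, then pre-load the hottest ones ahead of the job.

from collections import Counter

def hot_blocks(trace, top_n):
    """Identify the most frequently accessed blocks in a past trace."""
    return [blk for blk, _ in Counter(trace).most_common(top_n)]

def prewarm(cache, backing_store, trace, top_n=2):
    for blk in hot_blocks(trace, top_n):
        cache[blk] = backing_store[blk]   # load before the job starts

store = {blk: f"data{blk}" for blk in range(10)}
trace = [3, 7, 3, 1, 3, 7, 2]             # repeated pattern from past runs
cache = {}
prewarm(cache, store, trace)
print(sorted(cache))                       # [3, 7] - the recurring blocks
```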

This product won Best of Show Award at an event called Interop in May.

Violin's new blog on PCIe vs FC SAN SSDs

Editor:- June 11, 2013 - How much confidence do you have in the fault tolerance of the SSD system which you're deploying? - And how different are the reliability costs when you scale PCIe SSDs compared to rackmount SSDs?

These are some of the issues discussed in a new blog by Violin - Flash Memory: to Array or to Card - written by the company's CTO of Software - Jonathan Goldick - who warns that when you're estimating the latency advantages between different ways of connecting SSDs - "Always be cognizant that you may just be moving the bottleneck to the software." the article

Kaminario drops PCIe and turns to SAS to get costs down in new HA rackmount

Editor:- April 18, 2013 - "You don't have to be an investment bank like JP Morgan to afford our style of fast, scalable high availability SSD systems any more" - was the key message I got from talking to Phil Williams, VP Business Development at Kaminario earlier this week, when he discussed aspects of the company's newest series of FC SAN compatible SSD arrays - the K2 v4 - which was launched yesterday.

Phil was referring to the expectation that their products had acquired a reputation of being out of reach pricewise - and not just in a class of their own for resilience and scalability. (The first generation products were entirely RAM based SSDs - then they moved onto RAM / flash hybrids and then mostly pure flash - the flash components being implemented in the previous generation of K2s by Fusion-io's PCIe SSDs - a relationship direction which, BTW, I suggested in a much earlier briefing conversation with Kaminario's CEO a few years ago.)

One of the ways that Kaminario has pulled off the affordability trick is to drop PCIe SSDs as the internal flash components and use instead SAS SSDs.

I've said before that in the enterprise arrays space - "SAS is the new SATA" - because there are so many companies which have moved into this segment that there's stiff competition. Unlike the PCIe SSD market - which is mostly sold on high performance - the SAS market includes a number of vendors who have been using adaptive R/W ECC to enable them to use cheap flash to build reliable, fast-enough SSDs.

Because Kaminario still has a lot of RAM cache in its server based architecture - it doesn't need the raw endurance and performance of FIO's ioMemory to deliver multi-gigabyte throughput at the rack level. And another factor is that Fusion-io itself is on course to become a significant supplier of rackmount SSDs (although not aimed at the same kind of customers.)

Kaminario didn't want to say which SAS product they're using. They might say later. But it doesn't really matter.

The K2 v4 also demonstrates that the key IP component in Kaminario's box is SSD software. When I suggested that future boxes could equally well discard SAS SSDs if 2.5" PCIe SSDs offered a better set of characteristics - Phil agreed that the company wasn't tied to any particular internal SSD drive form factor or interface.

Kaminario has paid Taneja Group to do some new testing on the performance aspects of simulated hard faults. These will be very useful for customers - and take the uncertainty out of the picture - giving hard numbers for various scenarios.

For example - when running at just under 200K IOPS and 5GB/s throughput - an entire node (controller) was removed to simulate a fault. I/O resumed after 23 seconds and performance dropped by less than 15% for 2 minutes before recovering fully.
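Those failover figures can be turned into a rough cost in deferred work - a back of envelope sketch using the numbers quoted above (23 seconds of no I/O at 5GB/s, then 2 minutes at 15% below full throughput):

```python
# Rough "lost throughput" cost of the simulated controller failover.

full_rate_gb_s = 5.0        # throughput before the fault
outage_s = 23               # I/O paused while the cluster reconfigures
degraded_s = 120            # recovery window at reduced performance
degraded_fraction = 0.15    # performance drop during recovery

lost_gb = (outage_s * full_rate_gb_s
           + degraded_s * full_rate_gb_s * degraded_fraction)
print(lost_gb)              # 205.0 GB of work deferred across the episode
```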

IBM paper on SSD's data challenges

Editor:- December 4, 2012 - I've been browsing the online papers from last week's Server Design Summit.

If you can only read one - it should be this:- How do we handle all the data? (pdf) - by Andy Walls, Distinguished Engineer, Storage Hardware Chief Architect, IBM.

I won't give away the plot - because I think you'll enjoy it more if you read it yourself. But here are a few bullet points which resonated with me.

Among other things - Andy Walls says...
  • "90% of all the data in the world today was created in the last 2 years."
  • "Flash is starting to free up the IO bottleneck - but the bottleneck is more complicated..." the article (pdf)

See also:- future SSD capacity ratios , SSD bottlenecks, where are we now with SSD software?

HA Support in Fusion-io's ION SAN kit

Editor:- August 2, 2012 - Yesterday - Fusion-io launched its new ION software - a toolkit for building your own network compatible SSD rack by adding some Fusion-io SSD cards and the new software to any leading server.

The concept isn't entirely new - because OEMs have been doing this with various different brands of PCIe SSDs for years and this is a well established alternative market segment for PCIe SSDs. What is new - is that it makes the whole thing much easier.

Fusion-io says this new software product "delivers breakthrough performance over Fibre Channel, InfiniBand and iSCSI using standard protocols." (1 million random IOPS (4kB), 6GB/s throughput and 60 microseconds latency in a 1U rack.)

It also supports fault tolerance between racks.

HA support in OCZ's PCIe SSD software

Editor:- July 3, 2012 - OCZ published a white paper today - Accelerating MS SQL Server 2012 with OCZ Flash Virtualization (pdf) which describes the performance of the company's PCIe SSDs (Z-Drive R4) and its VXL caching and virtualization software in this kind of environment.

The interesting angle (for me) was in the aspect of SSD fault tolerance rather than the 16x VM speedup.

The paper's author Allon Cohen (who has written many thought provoking performance blogs) explains in this paper - "VXL software has a unique storage virtualization feature-set that enables transparent mirroring of SQL Server logs between 2 flash cards, thereby assuring that the log files can be accessed with ultra high performance, while at the same time, are highly available for recovery if required." the article (pdf)

SSD FITs & reliability

Editor:- June 20, 2012 - the component level isn't always the best level of abstraction in modeling enterprise SSD reliability.

Extrapolating from the single SSD component level can give you a misleading idea - because SSDs are data architecture components.

A recent article on my SSD news page on this subject started with an email from a reader who knew a lot more about SSD component reliability than me.

GridIron's SSDs can serve hundreds of concurrent databases effectively

Editor:- May 30, 2012 - GridIron Systems describes the setup required to exceed 1 million (4kB) IOPS in a 40x MySQL environment with mirroring - all in a single cabinet (including servers) using its FlashCube SSD systems (up to 80TB in this configuration), and some 10GbE and 16Gb FC fabric switches in a new whitepaper (pdf) published today.

"In large-scale MySQL environments it's not uncommon to see hundreds or even thousands of database servers," said Dennis Martin, President of Demartek (which tested this configuration). "This reference architecture opens a new, more efficient architectural approach for serving increasing numbers of users and database queries per cabinet."

Pure Storage unveils new HA deduped array

Editor:- May 16, 2012 - Pure Storage today unveiled a new generation of fast-enough (100K write IOPS) HA/FT SSD arrays - with up to 100TB compressed capacity - which are clustered around InfiniBand.

new article on Enterprise SSD Array Reliability

Editor:- March 1, 2012 - Objective Analysis has published an article - Enterprise Reliability, Solid State Speed (pdf) - which examines the conflicts which arise from wanting to use SSD for enterprise acceleration - while also preserving data protection in the event of SSD failure.

New approaches and architectures are required - because traditional methods can negatively impact performance - or - as in the case of RAID - don't always work.

"RAID is configured for HDDs that fail infrequently and randomly. SSDs fail rarely as well, but fail predictably" says the author Jim Handy - who warns that "SSDs in the same RAID and given similar workloads can be expected to wear out at about the same time."

He examines in detail one of the many new approaches to high availability enterprise SSD design - that's used in Kaminario's K2. the article (pdf)
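Jim Handy's correlated wear-out point is easy to demonstrate with a toy model. The endurance rating and daily write figures below are invented for illustration - not taken from any vendor's datasheet.

```python
# Toy illustration of correlated wear-out in a RAID set of SSDs.
# Identical workloads -> identical remaining life -> simultaneous failure risk.

ENDURANCE_TB = 500        # assumed total-bytes-written rating per SSD
DAILY_WRITES_TB = 1.2     # assumed per-drive daily writes in a striped array

def days_to_wearout(already_written_tb: float) -> float:
    """Days of life left for a drive that has already absorbed some writes."""
    return (ENDURANCE_TB - already_written_tb) / DAILY_WRITES_TB

# Case 1: four drives installed together, sharing the same striped workload
same_day = [days_to_wearout(0) for _ in range(4)]

# Case 2: drives pre-aged by different amounts (e.g. staggered replacement)
staggered = [days_to_wearout(tb) for tb in (0, 100, 200, 300)]

print("same-day wearout spread :", max(same_day) - min(same_day), "days")
print("staggered wearout spread:", round(max(staggered) - min(staggered)), "days")
```

With identical installation and workload the spread is zero days - all members approach their endurance limits in the same window, which is exactly when a RAID rebuild needs the survivors to be healthy. Staggering drive ages spreads the wear-out dates apart.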

See also:- the SSD reliability papers, storage reliability, high availability enterprise SSD directory and SSD market analysts.

TMS packs 24TB fastest HA eMLC in 1U

Editor:- February 28, 2012 - I was just getting the measure of how much enterprise flash capacity can fit into 1U of rackspace - when Texas Memory Systems changed things yet again by doing even more.

TMS today announced a 24TB high availability system called the RamSan-820. This has similar internal architecture to their 720 which I discussed with their CEO Holly Frost last December - but it uses eMLC instead of SLC - hence the doubling of the storage density.

TMS today revealed more about the internal features of their proprietary rackmount SSDs. Their RamSan-OS has been in continuous development for over 5 years, initially shipping with the RamSan-500 flash SSD in 2007. The RamSan-OS is designed from the ground up to run on a cluster of CPU nodes and FPGAs distributed throughout the RamSan systems.

Speed is still a core differentiator from TMS.

"Many of our competitors claim they are software companies and that their products are Application Accelerators. While this may be fundamentally true, all TMS products are 2x faster than any other Application Accelerators shipping today," according to TMS CEO Holly Frost. "It comes down to very simple technical and business questions: Why put key functions into slow software when you can speed up these functions in fast hardware?"

Power consumption is an important part of the reliability budget - and to drive this point home TMS say they are happy to supply customers with a wattmeter so they can compare these new SSDs with competing products.

Huawei Symantec publishes SPC-1 results for Dorado2100 SSD

Editor:- January 12, 2012 - Huawei Symantec has published an SPC Benchmark report (66 pages pdf) for its high availability FC SAN rackmount SSD - the Oceanspace Dorado2100.

A 1 terabyte (approx) usable protected (mirrored) SSD system (2.4TB raw) delivered over 100K SPC-1 IOPS at a market price of $0.90/SPC-1 IOPS. Click here for summary (pdf)

Editor's comments:- these SPC reports are very technical and the $ per SPC-1 IOPS headline figures include a lot of detailed factors including 3 years of 4 hour on-site response warranty etc. But the documents also include market prices for everything which goes into these calculations. From which we learn that a 2.4TB Dorado2100 SSD system with 16x 8Gbps FC ports costs about $52,000. See also:- SSD pricing
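To unpack those headline numbers - a quick sketch, where the usable capacity is approximate and the hardware-only price excludes the 3 years of support costs that are baked into the SPC total price:

```python
# Unpacking the Dorado2100 SPC-1 headline figures quoted above.

system_price_usd = 52_000   # market price for the 2.4TB, 16x 8Gbps FC config
raw_tb = 2.4
usable_tb = 1.0             # approx usable after mirroring and overheads
spc1_iops = 100_000         # "over 100K" - rounded down here

print(f"$ per raw TB    : {system_price_usd / raw_tb:,.0f}")
print(f"$ per usable TB : {system_price_usd / usable_tb:,.0f}")
print(f"$ per SPC-1 IOPS (hardware only): {system_price_usd / spc1_iops:.2f}")
```

The hardware-only figure comes out around $0.52/SPC-1 IOPS - lower than the published $0.90 - because the SPC calculation uses the total tested price, which also folds in items like the 3 years of 4 hour on-site response warranty.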

will new RamSan rattle Violin?

Editor:- December 6, 2011 - Texas Memory Systems today announced imminent availability of the RamSan-720 - a 4 port (FC/IB) 1U rackmount SSD which provides 10TB of usable 2D (FPGA implemented) RAID protected and hot swappable - SLC capacity with 100/25 microseconds R/W latency (with all protections in place) delivering 400K IOPS (4KB), 5GB/s throughput - with no single point of failure (at $20K/TB approx list).

The new SSD uses a regular RAM cache flash architecture which in the event of sudden power loss has an ultra reliable battery array which holds up the SSD power for 30 seconds while automatically backing up all data in flight and translation tables to nonvolatile flash storage. On power up - the SSD is ready for full speed operation in less than a minute.
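A quick feasibility check on that 30 second hold-up budget - note that the cache size, table size and flash bandwidth below are my assumptions for illustration, not TMS's specifications:

```python
# Sanity check for a power-loss backup scheme like the one described above:
# can the battery hold-up window cover flushing the RAM cache contents and
# translation tables to nonvolatile flash? (All figures are assumed.)

holdup_s = 30.0           # battery hold-up time from the announcement
cache_gb = 16.0           # assumed RAM cache holding data in flight
tables_gb = 4.0           # assumed size of the translation tables
flash_write_gbs = 2.0     # assumed aggregate internal flash write bandwidth

flush_time_s = (cache_gb + tables_gb) / flash_write_gbs
margin_s = holdup_s - flush_time_s

print(f"flush takes {flush_time_s:.1f}s, leaving {margin_s:.1f}s of margin")
assert flush_time_s < holdup_s, "hold-up budget would be exceeded"
```

The design point is the margin: the battery array only has to guarantee the worst-case flush time plus headroom, which is why a 30 second hold-up can protect a system whose flush might only need a third of that.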

Aimed at HA tier 1 storage markets - the RamSan-720 consumes only 300-400 W - which makes it practical for high end users to install nearly 1/2 petabyte of SSD storage in a single cabinet - without having to worry about the secondary reliability and data integrity risks which can arise from high temperature build-ups in such enclosures.

Editor's comments:- I've been talking to TMS every month for over 10 years - and I've been writing about their memory appliances since the early 1990s - so you might think that I would have run out of things to say by now. When I saw the preliminary specs for the new RS-720 - the features which jumped out at me were:-
  • the low R/W latency for this class of no-SPOF product. Which is 2x as good as the next fastest product I know - the 6000 series from Violin - and several times faster than some other tier 1 SSD vendors such as Kaminario and Huawei Symantec
  • the high storage density - over 3x better than Violin delivers in SLC - and close to the usable RAIDed capacity that a Fusion-io 1U server can deliver in MLC when using Octal.
A few days ago I spoke to Holly Frost, CEO and Dan Scheel, President of Texas Memory Systems about their new SSD, what they think about what's going on in the SSD market, and the philosophy that steers the design of their SSDs. In an hour-long discussion I learned enough new stuff to write several new articles. So instead of condensing it down here into a couple of bullet points - I'm going to give you the benefit of what I learned in a new article tomorrow called - "StorageSearch talks SSD with Holly Frost."

Going back to my headline - will new RamSan rattle Violin?

I'm sure that Violin would say that this simply validates what they are doing (and shipping) already - and that the enterprise SSD market is big enough for all vendors in this category to keep growing at a healthy clip. It makes you wonder how much a company like TMS might be worth too...

Violin unveils naked cost advantages in reliable SSD arrays

Editor:- September 27, 2011 - Violin Memory today announced new models and options in its range of fast iSCSI / FC SAN rackmount SSDs.

The new 6000 series - designed for high availability applications with no single point of failure and hot swappable "everything" - provides 12TB SLC, or 22TB MLC usable capacity with 200/600 microseconds mixed latency, 1 million / 500K sustained RAIDed spike free write IOPS, in 3U rackspace at a list price around $37K / $20K per terabyte.

For less demanding applications (but still featuring hot swap memory modules) the company has also extended its lower priced 3000 series to 16TB SLC usable capacity.

Editor's comments:- when I spoke to Violin's CEO - Don Basile about the new 6000 series he was curious about how I would tell you what's unique about this product and signal whether it's relevant to you or not.

I said - when it comes to reliability - you've either got it - or you haven't - and there aren't too many enterprise SSD systems which have hot-swap everything. That's one of the reasons the latency looks slow - compared to many other fast SSDs - because the figures quoted here include the latency of the internal factory built protection schemes.

Another angle - I said is your product is an example of "big SSD architecture". When I explained what I meant - Don agreed and said what it means for the customer is lower price. Because when you look at the raw capacity that's lost to over-provisioning and RAID like protection and get down to the usable capacity that the customer sees in an MLC rack - say - then Violin's 6000 delivers about 70% of the raw capacity - versus nearer to 30% in an array of 2.5" SSDs for example. That confers a 2 to 1 native cost and density (SSD TB/U) advantage.
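Don's usable-capacity argument in arithmetic form - the 70% and 30% fractions come from the conversation above, and the raw capacity chosen is arbitrary since only the ratio matters:

```python
# The "big SSD architecture" cost argument in numbers: same raw flash,
# very different usable fractions after over-provisioning and RAID-like
# protection overheads.

raw_tb = 30.0                # arbitrary raw capacity; the ratio is the point
big_array_usable = 0.70      # Violin 6000 class: ~70% of raw flash is usable
drive_array_usable = 0.30    # array of 2.5" SSDs: nearer 30% usable

advantage = big_array_usable / drive_array_usable
print(f"usable TB: {raw_tb * big_array_usable:.1f} vs {raw_tb * drive_array_usable:.1f}")
print(f"native cost/density advantage: about {advantage:.1f} to 1")
```

Strictly the 70/30 split comes out nearer 2.3 to 1 - so the "2 to 1" advantage quoted is the conservative rounding.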

I said Violin's density looks good too - compared say to Kaminario's K2.

I also said - that our SSD readers would recognize what was meant by "spike-free" IOPS - because of various past articles about this - and because another enterprise flash vendor - Virident Systems - had made that one of the differences they talk about compared to some other flash PCIe SSD companies. I knew that in Violin's case that was due to their patented non-blocking write architecture - which was explained to me when their first flash products came to market in 2008.

Don said - that inside their protection array they're actually doing 5x more IOPS than the customer is seeing outside the box and on the datasheet - and that helps too.

I also asked about price - and where they were relative to $30K/TB - which is the ballpark for this type of product (you can see above where Violin sits). That's a competitive figure for a no-SPOF SSD.

I said that for people who are serious about enterprise SSDs it's relatively easy to decide what products you may want to focus in on after just seeing a couple of simple metrics.

Don did also mention a comparative write up - about their SSD versus another so called "tier 1" storage solution - from EMC. Violin think it makes them look pretty good - but I can't understand why anyone cares how they stack up to EMC - who never understood the SSD plot - which is why their (at one time) prime SSD supplier STEC has had a bumpy revenue stream in recent years.

I had one final question for Don - which wasn't about Violin's new SSD - but about something which had come to my attention while I was googling the company just before our conversation.

When can we expect to see a picture of a naked man featured on a Vmem poster ad? - I asked.

He laughed and indicated it wouldn't be anytime soon.
The world's first hot-swappable SCSI SSD was launched in November 2000 by BiTMICRO.
SSD market history
Why should a customer pay for your version of RAID when they have another (better) way of doing it?
Decloaking hidden SSD segments in the enterprise

Bottlenecks in the pure SSD datacenter will be more serious than in the HDD world - because responding slowly will be equivalent to transaction failure.
will SSDs end bottlenecks?


what happens if you pull out the power plug during intensive in-memory database transactions?

McObject shows in-memory database resilience in NVDIMM (October 9, 2013)

"Although flash storage is appearing nearly everywhere - according to Jim Handy, Objective Analysis - penetration is happening slightly slower in applications with high-resiliency requirements - because many support solutions are still unavailable from dominant storage vendors."
Exploring the Arguments for Flash-Based Storage
a blog (January 23, 2015)

a new market for factory configured HA / FT SSDs
by Zsolt Kerekes, editor - January 26, 2012

It's always been relatively easy for users and systems integrators to configure high availability rackmount SSD systems by using legacy failover and clustering techniques designed for traditional FC SAN or IP SAN storage systems - so you may ask - why have a different directory page which is focused on factory designed HA SSDs?

The answer is:- fault symmetry (performance in the failed vs unfailed state), ease of use, risk, complexity, and scalability.

Customer designed fault tolerant wraparounds usually operate outside the SSD controller loop. (The rare exceptions are big web / cloud entities like Baidu, Google etc.)

In cases where the HA / FT scheme doesn't have native controller support - and simply engages data at the host interface level - these schemes incur considerable losses in latency and failure recovery time compared to systems where the HA fault tolerant architecture has been designed inside the SSD system - and is aware of what's happening between the host interface and the SSD memory arrays.

And customized HA SSD designs can introduce software complexities and controller configuration issues - because even if the native SSD systems look like virtual storage - the FT wraparound introduces its own peculiar characteristics.

Anyone who has done a formal hazard analysis or failure analysis in a critical industry knows that it's all too easy to think that a particular FT problem has been solved whereas in fact there are still common modes of failure.

One of the invisible risks of "configure your own" HA arrays is that the user may incur the cost of assembling a DIY HA configuration only to discover that when a fault does occur - their solution becomes part of the problem instead of solving it.

That's another reason that factory designed HA SSDs are superior. They reduce risk - because they have been designed by people who spent more time thinking about the problems than you can afford to do yourself.

Vendors I've spoken to in the HA SSD market are excited that their products will open up new businesses - but a particular concern - first voiced to me in November 2011 by Don Basile, CEO of Violin was that HA SSDs could just get lost amidst a sea of other SSD announcements.

And if you're reading through a bunch of pages which talk about SSD performance and see some latency and performance figures for an HA SSD in the wrong context - you may well think - that doesn't sound so great - whereas in the context of a protected performance metric - it may instead be truly amazing.

In my past 20 years of publishing enterprise buyers guides - I've developed an instinct for judging when the market is ready for a new focused directory. Sometimes I've been too early - but with the momentum in the SSD market and the number of HA SSD vendors dipping into double digits - I think this is exactly the right time for a new directory.

how fast can your SSD run backwards?
SSDs are complex devices and there's a lot of mysterious behavior which isn't fully revealed by benchmarks, datasheets and whitepapers.
Underlying all the important aspects of SSD behavior are asymmetries which arise from the intrinsic technologies and architecture inside the SSD. the article

Overview of PCIe topologies for enterprise SSDs
Editor:- July 17, 2013 - Taking PCIe Out of the Box - is just one of the many different topologies discussed in a new white paper about PCIe and enterprise storage from PLX Technology called - Enterprise Storage and PCI Express

PCIe SSDs can be designed to be hot pluggable - whereas MCS (like DRAM DIMMs) doesn't support hot pluggability.
memory channel SSDs versus PCIe SSDs - are these really different markets?


SSD news
SSD videos
SSD education
SSD Bookmarks
the Fastest SSDs
the SSD Heresies
the SSD Buyers Guide
SSD Jargon Explained
SSD Reliability Papers
this way to the petabyte SSD
sugaring MLC for the enterprise
Data Recovery from Flash SSDs?
RAM Cache Ratios in flash SSDs
Surviving SSD sudden power loss
Big versus small - SSD architecture
flash SSD capacity - the iceberg syndrome
Power, Speed and Strength in SSD brands
the Problem with Write IOPS - in flash SSDs
SSD Myths and Legends - "write endurance"
Data Integrity Challenges in flash SSD Design
Market Trends in the Rackmount SSD Market
an introduction to SSD Data Recovery Concepts
the future of enterprise data storage (circa 2020)
RAM SSDs versus Flash SSDs - which is Best?
principles of bad block management in flash SSDs
Can you tell me the best way to get to SSD Street?
How Bad is - Choosing the Wrong SSD Supplier?
what will set the tone of the SSD market in 2012?
Clarifying SSD Pricing - where does all the money go?
Can you believe the word "reliability" in a 2.5" SSD ad?
if Fusion-io sells more does that mean Violin will sell less?


classic SSDs from SSD market history
XcelaSAN is a "revolutionary" self optimizing
2U enterprise SSD accelerator
from Dataram
Nowadays there are many companies in the auto-caching enterprise SSD market. But the first systems product was the XcelaSAN - launched in September 2009 - which was advertised here on StorageSearch.com.


STORAGEsearch is published by ACSL