| CAEN launches high
availability 2U FC AFA 
 Editor:- September 15, 2017 - CAEN Engineering today
introduced the
CAEN
CEI-826-FXD - a 2U 10GbE / 16Gb FC AFA (with 26 native 12Gb SAS bays) for
applications such as big data, HPC and Hadoop.
 
 The CEI-826-FXD's
dual-active controller architecture enables both controllers to concurrently
provide storage services in real time. Active-Active architecture doubles the
available host bandwidth and cache hit ratio, ensuring the greatest utilization
of system resources and maximum throughput. If one controller fails, the other
controller transparently takes over all storage services. In addition to storage
services, management services can transparently pass to the secondary
controller.
 
 The CAEN array offers high availability with no single
point of failure. All critical components are hot pluggable and engineered with
full redundancy. Thanks to this robust design, the system can withstand
multiple component failures and achieves 99.999% availability. The CAEN
CEI-826-FXD supports RAID levels 0, 1, 0+1, 3, 5, 6, 10, 30, 50 and 60,
and N-way mirror.
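Several of the RAID levels listed (3, 5, 6 and the nested 30, 50, 60) protect data with parity. As a minimal, generic sketch of single parity (not CAEN's implementation; the stripe contents are made up), XOR-ing the blocks of a stripe gives a parity block from which any one missing block can be rebuilt:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe (single-parity scheme, as in RAID 5)."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes]) -> bytes:
    """Rebuild the one lost block of a stripe from all survivors (data + parity)."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]   # hypothetical data blocks on 3 drives
    parity = make_parity(stripe)            # stored on a 4th drive
    # Simulate losing drive 1 and rebuilding it from the others plus parity.
    rebuilt = rebuild_missing([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])  # True
```

RAID 6 adds a second, independently computed syndrome (typically Reed-Solomon style) so that two simultaneous drive losses can be survived; that second calculation is not shown here.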
 
 
 Nimbus talks about SAS SSD array sauce
 
 Editor:-
August 10, 2017 - In a new article on StorageSearch.com -
sauce for the
SSD box gander - I discuss Nimbus's recent entry into the merchant SAS SSD
controller market.
 
 Based on conversations with Nimbus's CEO I asked,
among other things - What's the new business plan for Nimbus? Does selling
drives replace selling systems? ...read the
article
 
 
 HA lessons from BA?
 
 Editor:- June 15, 2017 - A
systematic failure in fault tolerant architecture and processes at the airline
BA led to hundreds
of UK flights being cancelled and delayed over the holiday weekend at the
end of May. The scale of the disruption made headlines in mainstream TV
and news outlets worldwide. But in the days during and immediately after the
story broke, it seemed that extracting a plausible explanation in the public
domain was like pulling teeth from the definitely reputation-damaged and
probably litigation-sensitive airline.
 
 It was obvious to many experts
from the start that failure in fault tolerant architecture and human error
were likely ingredients in the mix. But I thought I'd wait for a definitive
narrative to emerge before placing a note here, because it can be useful to
learn from the common mode failures of others.
 
 A
report
on Bloomberg - "Engineer Pulled Wrong Plug" - provides a good summary
and says (among other things) - "An engineer had disconnected a power
supply at a data center near London's Heathrow airport, causing a surge that
resulted in major damage when it was reconnected."
 
 Editor's
comments:- to me that's a design architecture fault and shows a failure
to learn any useful lessons from the past 3 decades of enterprise computing
(and in particular the lessons from companies affected by the terrorist
atrocity of 9/11). If BA had a disaster recovery plan, it was not fit
for purpose.
 
 
 some thoughts about availability prompted by the emerging picture
of Symbolic IO's revolutionary server storage architecture
 
 Editor:-
February 25, 2017 - In my comments on a news story this month about Symbolic IO (which
led with a focus on performance and utilization efficiencies) I also identified
what may be unique "availability" susceptibilities which I
think are implicit in this kind of architecture. Among other things I said...
 
 All
new approaches have risks.
 
 I think the particular risks with Symbolic
IO's architecture are these:-
 
Unknown vulnerability to data corruption in the code tables. 
 Partly
this would be like having an encrypted system in which the keys have been lost -
but the consequences for recovery would be multiplied by the fact that each raw piece
of data has higher value (due to compaction).
 
 Conventional systems
leverage decades of experience in data healing know-how (and
data recovery).
 
 We don't know enough about the internal resiliency architecture in  Symbolic
IO's design.
 
 It's reasonable to assume that there is something there.
But all companies can make mistakes as we saw in server architecture with
Sun's
cache memory problem and in storage architecture when
Cisco discovered common
mode failure vulnerabilities in
WhipTail's "high availability"
flash arrays.
 
Difficult to quantify risk of "false positive" shutdowns from the
security system.
 This is a risk factor which I have written about in
the context of the fast
purge SSD market. Again this is a reliability architecture issue.
 
...read the
article in SSD news February 2017
 
 Editor's comments:-
thinking about this again from a high availability perspective - at the top
level, the fact that Symbolic IO has designed an efficiently coded server
which can replace many conventional servers is itself a notable
reliability gain. And Symbolic IO (being clever enough to solve the coding
architecture tradeoffs) is surely clever enough to have given deep thought to
a SPOF mitigation architecture too. But as Symbolic IO is still emerging from
stealth mode, we will just have to wait for details.
 
 
 
 the new math of AFA fault tolerance
 
 Editor:-
November 14, 2016 - the old math regarding the
purchase price premium
of high availability storage was that more reliability requires 2x to 3x more
storage array hardware and therefore costs proportionately more.
 
 The
new math is that faster SSD arrays supported by new SSD software create
usable storage resources which can deliver
10x to 50x better
usable utilization of the raw storage than before, while also delivering
better application performance. Taken together with other TCO issues, the
hardware cost of providing good quality fast recovery can be essentially free,
because speed creates cost saving opportunities in other dimensions.
 
 This
was part of a discussion I had with a leading HPC flash array vendor about
changes in thinking throughout the SSD market. For more about this read my
article - a
winter's tale of SSD market influences.
 
 
 Elastifile gets patent for flash-aware adaptive cloud scale data
management
 
 Editor:- November 3, 2016 - Elastifile today
announced
it has been granted a US patent (No. 9,465,558) for a method of
flash-native, collaborative data storage when running on multiple interconnected
nodes.
 
 Elastifile's technology (which is integrated in software
solutions) is aimed at the hybrid cloud market.
 
 The patented
technology  enables efficient, distributed storage across full-mesh clustered
architectures in which all nodes interact with one another across multiple sites
and clouds, in complex or constantly varying network conditions, and/or at a
scale that may encompass thousands of diverse configurations.
 
 "One
of the greatest challenges for private and hybrid cloud data services has been
ensuring consistent performance for distributed data writing, especially due to
noisy and mixed environments," said Ezra Hoch, chief
architect at Elastifile. "Our patented approach adaptively and efficiently
manages how and where data is written, mitigating the constantly changing
conditions at cloud scale."
 
 
 Microsemi's rad tolerant FPGAs orbit Jupiter
 
 Editor:-
 September 20, 2016 - Microsemi
today
announced
that its radiation-tolerant FPGAs are in use on NASA's
Juno spacecraft within the
space vehicle's command and control systems, and in various instruments which
have now been deployed and are returning scientific data. Juno recently entered
Jupiter's orbit after a 5-year journey.
 
 See also:-
Juno
mission (pdf),  
data
chips in space
 
 
 data dematerialization in the DIMM?
 
 Editor:- July 27,
2016 - Some of the big SSD ideas in recent years have been:-
One way to interpret the essence of Symbolic IO's
architecture - which was partially unveiled in
May 2016 - may
be as a coming together of the 2 concepts in the same place...
 
 What
got me thinking this way was a recent blog -
a
look at Symbolic IO's patents - by Robin Harris on
his site - StorageMojo.com.
 
 Symbolic IO's founder Brian Ignomirello - who saw
and liked Robin's post - said among other things on
LinkedIn Pulse
- "yes we (do) materialize and dematerialize data." ...read
the article
 
 Editor's comments:- That means less
hardware, right? There's a reliability gain in there somewhere.
 
 
 PrimaryIO ships application-aware FT caching
 
 Editor:-
March 8, 2016 - PrimaryIO
(which
changed
its name from CacheBox in
August 2015)
today
announced
the general availability of its Application Performance Acceleration V1.0 (SSD aware software) for
VMware vSphere 6.
 
 PrimaryIO APA aggregates server-based flash storage
across vSphere clusters as a cluster-wide resource and supports write-around
and write-back caching with full
fault-tolerance
in the face of node failures, since writes to cache are replicated to up to 2
additional nodes.
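The fault-tolerance claim rests on a simple rule: a cached write is not acknowledged until copies of it exist on other nodes, so losing one node never loses dirty data. The toy model below illustrates that rule only; the class names, the "refuse write-back when too few peers are healthy" policy and the default of 2 replicas are assumptions for illustration, not PrimaryIO's code or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A hypothetical cluster node holding a local flash write-back cache."""
    name: str
    alive: bool = True
    cache: dict[int, bytes] = field(default_factory=dict)


class ReplicatedWriteBackCache:
    """Toy write-back cache that copies each write to `replicas` live peer
    nodes before acknowledging it (an assumed policy, for illustration)."""

    def __init__(self, local: Node, peers: list[Node], replicas: int = 2):
        self.local = local
        self.peers = peers
        self.replicas = replicas

    def write(self, block: int, data: bytes) -> bool:
        """Return True if the write was cached and replicated (acknowledged),
        False if there were too few healthy peers to protect dirty data."""
        live_peers = [p for p in self.peers if p.alive][: self.replicas]
        if len(live_peers) < self.replicas:
            # Caller should fall back to write-around (send the write straight
            # to backing storage) rather than hold unprotected dirty data.
            return False
        self.local.cache[block] = data
        for peer in live_peers:
            peer.cache[block] = data
        return True  # dirty data now exists on 1 + replicas nodes

    @staticmethod
    def recover(block: int, survivors: list[Node]) -> Optional[bytes]:
        """After the local node fails, fetch a dirty block from any live replica."""
        for node in survivors:
            if node.alive and block in node.cache:
                return node.cache[block]
        return None


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    cache = ReplicatedWriteBackCache(a, [b, c], replicas=2)
    print(cache.write(7, b"payload"))                   # True
    a.alive = False                                     # local node fails
    print(ReplicatedWriteBackCache.recover(7, [b, c]))  # b'payload'
```

The cost of this design is extra network traffic and latency on every write, which is why the replica count is usually small.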
 
 
 Datalight's SSD firmware to boldly go
 
 Editor:-
September 17, 2015 - Datalight
 today said its embedded filesystem (Reliance
Nitro) and FTL (FlashFX
Tera) have been selected by NASA for use
onboard future manned spacecraft in the
Orion
program.
 
 
 3D TLC is good enough to last 7 years says Kaminario
 
 Editor:-
August 21, 2015 - One of the early
new SSD
ideas in 2014 was that 3D NAND flash was tough enough to consider using
in industrial SSDs, so it was no surprise when 3D flash started to appear in
volume production of enterprise SSD accelerators such as Samsung's 10 DWPD
NVMe PCIe SSDs in September
2014.
 
 So the recent
announcement
by Kaminario
that it will soon ship 3D TLC (3 bits per cell) flash in its K2 rackmount SSDs can be
seen as a predictable marker in the long term trend of
flash adoption in
the enterprise.
 
 Less predictable than the price (under
$1,000/TB
of usable system capacity), however, is that Kaminario is offering a 7-year
endurance-related system warranty.
 
 This
factor - discussed in a
Kaminario
blog - tells us more about Kaminario's customer base than it tells us about
flash endurance,
however.
 
 Kaminario says its HealthShield "has been collecting
endurance statistics for the past few years, and from analyzing the data we see
that 97% of (our) customers are writing less than a single write per day  (under
1 DWPD) of the entire capacity."
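DWPD (drive writes per day) converts directly into total data written over a warranty period. A quick worked sketch, using an illustrative 100TB usable capacity (an assumed figure, not one quoted by Kaminario):

```python
def total_writes_tb(capacity_tb: float, dwpd: float, years: float) -> float:
    """Host data written (TB) if the full capacity is written `dwpd` times a day."""
    return capacity_tb * dwpd * 365 * years


if __name__ == "__main__":
    # Hypothetical 100TB usable system at 1 DWPD over a 7-year warranty.
    print(total_writes_tb(100, 1.0, 7))  # 255500.0 TB, roughly 255 PB
```

This is host-side traffic only; the wear actually seen by the flash also depends on write amplification and on data reduction, which is exactly the layer a box-level designer controls.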
 
 This is one aspect of a trend I
wrote about a few years ago -
thinking
inside the box - which is that designers of integrated systems have more
freedom of choice in their memories than designers of standard SSD drives -
because they have visibility and control of more layers of software and can
leverage other architectural factors.
 
 A competent box level SSD
designer can make better decisions about how to translate raw R/W intentions
(from the host) into optimized R/W activities at the flash.
 
 This
is especially the case when the designers are also collecting raw data
about the workloads used in their own customer bases. The customer experience is
more important than slavishly designing systems which look good in artificial
benchmarks.
 
 
 "more lanes of SAS than anyone else" - new 4U
SavageStor
 
 Editor:- July 28, 2015 - As the
rackmount SSD market
heads towards
future
consolidation, new business opportunities are being created for those
brave hardware companies which accept the challenge of providing simple
hardware platforms (which provide high density or efficiency or performance or
other combinations of valued technical features optimized for known use cases)
while also being willing to sell them unbundled from expensive frivolous
software.
 
 In that category - Savage IO today
launched
its SavageStor - a 4U
server storage box - which - using a COTS array of hot swappable SAS SSDs -
can provide up to 288TB flash capacity with 25GB/s peak internal bandwidth and
useful RAS features for embedded systems integrators who need high flash
density in an untied / open platform.
 
 Savage IO says its "products
are intentionally sold software-free, to further eliminate performance drains
and costs caused by poor integration, vendor lock-in, rigidly defined
management, and unjustifiable licensing schemes."
 
 Editor's
comments:- I spoke to the company recently, and most of you will
instantly know
if it's the right type of box for you or not.
 
 
 High Availability Thinking in Pure's Flash Arrays
 
 Editor:-
June 7, 2015 -
Purity:
Building Fast, Highly-Available Enterprise Flash Storage from Commodity
Components (pdf) by authors at Pure Storage
describes several interesting aspects of Pure's flash arrays, which internally
use consumer grade
SSDs.
 
 The paper, presented at
SIGMOD 2015, says among
other things:-
 
Purity can tolerate the loss of 2 SSDs without losing availability. Pure
encourages potential customers to pull drives and unplug controllers as part of
their evaluation.
Due to efficiencies
in deduplication, Pure's customers on average provision approximately 12x
more virtual space than physical storage.
Commenting on the
differences
in capability between the flash management which is possible at the single
drive level and at the global array level, the authors say - Pure's controller has a
global view of the workload and much more computational power than the SSD FTL,
allowing it to apply optimizations and make global decisions that the drives are
incapable of.
 Pure says that "improvements" in consumer solo drive
benchmarks do not
always follow through to deliver better performance in the managed enterprise
array context. Sometimes the optimized drives perform worse in a Pure array.
 
Re quality of service - Pure says that to avoid application failures
during controller failure, they have to guarantee that recovery will complete
in under 30 seconds.
 
...read the
article (pdf)
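The paper's 12x figure comes from data reduction, and the quotes above don't describe Pure's dedupe engine, so here is only a generic sketch of content-addressed deduplication (hash each block, store each unique block once) to show how a volume can address far more logical space than it physically consumes. The class and the 12-clone example are hypothetical; the round 12x result is contrived to echo the figure above, not a model of Pure's customers.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store: identical blocks are kept once."""

    def __init__(self) -> None:
        self.physical: dict[str, bytes] = {}           # SHA-256 digest -> block
        self.logical: dict[tuple[str, int], str] = {}  # (volume, LBA) -> digest

    def write(self, volume: str, lba: int, block: bytes) -> None:
        digest = hashlib.sha256(block).hexdigest()
        self.physical.setdefault(digest, block)  # store new content only once
        self.logical[(volume, lba)] = digest     # map the logical address to it

    def read(self, volume: str, lba: int) -> bytes:
        return self.physical[self.logical[(volume, lba)]]

    def reduction_ratio(self) -> float:
        """Logical blocks addressed per physical block stored.

        Simplification: blocks orphaned by overwrites are never reclaimed here.
        """
        return len(self.logical) / max(1, len(self.physical))


if __name__ == "__main__":
    store = DedupStore()
    golden_image = [bytes([i]) * 4096 for i in range(4)]  # 4 distinct 4KiB blocks
    for vm in range(12):                                  # 12 cloned VM volumes
        for lba, block in enumerate(golden_image):
            store.write(f"vm{vm}", lba, block)
    print(store.reduction_ratio())  # 12.0
```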
 
 Nimble video discusses 5 9's in 5,000 systems
 
 Editor:-
February 21, 2015 - Nimble
Storage recently disclosed (in a
sponsored video fronted
by ESG) that its
customer-deployed rackmount storage systems are achieving better than
5
9's uptime - 99.999%
availability.
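For scale, an availability percentage translates into an annual downtime budget. A minimal sketch (a 365-day year is assumed):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum yearly downtime permitted by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines, a in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
        print(f"{nines} nines: {downtime_minutes_per_year(a):.2f} minutes/year")
```

Five nines works out at roughly 5 minutes of downtime per system per year.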
 
 
  This
has been attained in a field population of 5,000 arrays representing 1,750
years of system run time, thanks to a combination of factors including the crowd-sourced
intelligence of its
InfoSight
management system, which can alert users to potential downtime events so they
can take evasive action before bad things happen.
 
 Editor's
comments:- While this is useful in telling us how many systems Nimble has sold, it's
less useful as an indicator of availability, given that the average run time
across the population is about 4 months.
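The 4 month figure follows directly from the two numbers in Nimble's disclosure:

```python
def average_run_time_months(arrays: int, system_years: float) -> float:
    """Average field run time per array, in months."""
    return system_years / arrays * 12


if __name__ == "__main__":
    # Figures from Nimble's disclosure: 5,000 arrays, 1,750 system-years.
    print(f"{average_run_time_months(5_000, 1_750):.1f} months")  # 4.2 months
```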
 
 It would be more impressive
if they could repeat the disclosure in a few years' time and selectively extract
the up-time of systems over different run times: up to 1 year, 1 to 2 years etc.
 
 That assumes Nimble is still in a position to do so, and that it would
still be meaningful, given that the
consolidation
in hardware and software which lies ahead for the enterprise SSD market may
mean that vendors will be using the same hardware.
 
 
 shared vulnerabilities may be another factor in pausing Cisco's
UCS Invicta shipments
 
 Editor:- October 24, 2014 - Newly discovered single points of failure which could compromise the availability of the
rackmount SSD
family acquired
by Cisco last
year are among several design issues contributing to the continuing pause in
shipments - according to
reports
by CRN.
 
 
 HA SSD arrays - are now mainstream
 
 Editor:- October
13, 2014 - I've long had an abiding interest in the architecture of fault
tolerant / high availability electronic systems - ever since learning that such
concepts existed - when (in about 1976) our digital systems design
lecturer Dr
R G 'Ben' Bennetts at Southampton
University suggested we should read a paper about how
NASA's Jet Propulsion Laboratory used triple
modular redundancy.
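Triple modular redundancy runs three copies of a module and lets a majority voter mask a single faulty output. A minimal sketch of the voting idea (generic, not the JPL design):

```python
from collections import Counter
from collections.abc import Hashable


def tmr_vote(a: Hashable, b: Hashable, c: Hashable) -> Hashable:
    """Majority vote over three redundant module outputs.

    A single faulty module is outvoted; if all three disagree the fault
    cannot be masked, so raise instead of guessing.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: more than one module disagrees")
    return value


def bitwise_tmr(a: int, b: int, c: int) -> int:
    """Per-bit majority, the function a hardware voter computes on each wire."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    print(tmr_vote(42, 42, 7))                       # 42
    print(bin(bitwise_tmr(0b1010, 0b1010, 0b0110)))  # 0b1010
```

The voter itself becomes a potential single point of failure, which is one reason real TMR designs often triplicate the voters too.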
 
 (I can't remember the details of that paper - but
the JPL people and their collaborators and descendants have never stopped
inspiring and writing a rich literature about the design aspects of computer
systems which operate a long way from a service engineer.)
 
 In the early
part of my career, such ideas were good to know about - but far too exotic and
expensive to incorporate into most products. But I was reminded about them in
the 1990s - when in the publication
which preceded StorageSearch.com - some of my customers were advertising
their FT/HA SPARC servers for the telco market.
 
 The more you
investigate the architecture of FT/HA computer systems, the more you realize
it's a philosophy rather than a technology which you can implement as a
plug-and-play add-on without consequences within the cost goals of mere mortals.
 
 The
results are always compromises - which balance reliability (aka functional
survivability) against other tradeoffs - such as performance. (And
performance itself has many internal
dimensions of
fault tolerance too.)
 
 Violin's 6000 SSD and HA
 
 3
years ago (in September
2011), when I was talking to Violin's CEO (at that
time) Don
Basile about the launch of Violin's first 6000 series (the first no
single point of failure, competitively priced, fast flash rackmount SSD), he
expressed some concern about how I would tell you (my readers) what was unique
about this product and signal whether it was relevant to you or not - as it was
competing for attention with thousands of other SSD stories for applications
ranging from phones to drones.
 
 I didn't see that as a problem - because
my readers are smart - and I had been publishing a directory page dedicated to
SSD Reliability
since 2008.
 
 But just to make sure that the systems embodiments of
FT/HA/SSD architecture from a growing base of competitors didn't get washed
away by other stories - I launched
a dedicated FT/HA
 enterprise SSD directory in
January 2012 - to
 serve an emerging base of reliability-focused readers - which in those days
measured around 10,000 readers / year in that niche topic. (Until recently
HA SSDs had rarely entered the
top 30 SSD
articles viewed by my readers.)
 
 But something in the market has
changed.
 
 I noticed this week that the topic of
HA/FT SSDs
has risen to be 1 of the top 10 topics that you've been looking at this month,
which means it's mainstream.
 
 Looking back at other past niche
topics...
 
 10 years ago I didn't think that more than a few hundred
people would be interested in the intricacies of
flash endurance.
 And to begin with, SSD vendors were nervous about even acknowledging that
there was such a thing as SSD wear-out. Now you can't shut them up. They all
want to show you how clever they are at handling it.
 
 The different
types of flash memory and
different generations of arcane
flash care schemes spawned a huge industry literature of understanding and
misunderstanding - so I wouldn't be surprised if the enterprise FT/HA flash
array market started to do something similar.
 
 PS - After a
communications gap of 37 years, I exchanged some emails with my old
university lecturer - Ben Bennetts - while writing this, to see if I had
remembered things correctly.
 
 He said - "Yes, that was me. I
lectured on fault-tolerant systems and JPL's Self-Test And Repair, STAR,
computer, based on triple modular redundancy, used to feature in my
presentations."
 
 So that enables me to pinpoint the original
source of that inspirational IEEE Transactions paper about fault tolerant
computing - which I remember having read in 1976 (although I haven't read it
since) - to Prof.
Algirdas Antanas Avižienis - whose visionary work on what is
today called "Dependable Computing and Fault-Tolerant Systems"
continues today.
 
 
 You don't need to worry about the endurance of our FlashSystems -
says IBM
 
 Editor:- October 7, 2014 - Worried about
endurance?
 
 "None of the thousands of
FlashSystem
products (fast rackmount SSDs) which IBM has shipped has ever
worn out yet!" says Erik
Eyberg, Flash Strategy & Business Development at IBM, in his new
blog -
Flash
storage reliability: Aligning technology and marketing. "And our
metrics suggest that will remain true in almost all cases for many, many years
(certainly well beyond any normal and expected data center life cycle)."
 
 Erik
goes on to explain that's the reason  IBM can  now officially cover flash
storage media wear-out as part of its standard IBM FlashSystem warranty and
maintenance policies - without changing the prices for these services.
 
 And
his blog has a
link
to a white paper about the reliability architecture underlying  this product
(although it's behind a sign-up wall - which seems counterproductive to me.)
 
 Editor's
comments:- Don't expect all other flash array vendors to follow suit (with
no cost endurance guarantees) - because this product range from IBM is based on
design rules and memory reliability architecture experience in FC SAN
compatible enterprise SSD racks which have evolved since the 1st generation
RamSan from TMS (in
2000) - and for more than a decade
before that
using other popular enterprise storage interfaces.
 
 Holly Frost - who founded
Texas Memory Systems - and who was the CEO when TMS was acquired - told me a
revealing story about TMS's policies concerning the reliability of their SSD
systems and customer care procedures.
 
 This conversation took place
in December 2011
- when the company was launching its first high availability SSD - which
became the basis of IBM's FlashSystem.
 
 It still makes interesting
reading today. You can see it in
this article -
in the right hand column - scroll down to the box titled - "no single point
of failure - except..."
 
 
 HGST announces 2nd generation clustering software for FlashMAX
PCIe SSDs
 
 Editor:- September 9, 2014 - HGST today
announced
a new improved version of the
high availability
clustering capability previously available in the
PCIe SSD product line
acquired last year from Virident.
 
 HGST's
Virident Space
allows clustering of up to 128 servers and 16 PCIe storage devices to deliver
one or more shared volumes of high performance flash storage with a total usable
capacity of more than 38TB.
 
 HGST says its Virident HA provides a "high-throughput,
low-latency synchronous replication across servers for data residing on FlashMAX
PCIe devices. If the primary server fails, the secondary server can
automatically start a standby copy of your application using the secondary
replica of the data."
 
 For more details see - 
HGST
 Virident Software 2.0 (pdf)
 
 Editor's comments:- This
capability had already been demonstrated last year - and
ESG reported on the
technology in January
2014.
 
 But at that time - the clustering product called vShare -
was restricted to a small number of servers - and the data access fabric was
restricted to InfiniBand
only.
 
 With the rev 2.0 software - the number of connected devices has
increased - and users also have the lower cost option of using
Ethernet as an alternative
supported fabric.
 
 
 say hello to high availability CacheIO
 
 Editor:- June
10, 2014 - CacheIO
today
announced results of a
benchmark which is
described by their collaborator Orange
Silicon Valley (a telco) as - "One of the top tpm benchmark results
accelerating low cost iSCSI
SATA storage."
 
 CacheIO says that the 2 million tpm benchmark on
CacheIO accelerated commodity servers and storage shows that users can
deploy its flash cache to accelerate their database performance without
replacing or disrupting their existing servers and storage.
 
 Editor's
comments:- The only reason I mention this otherwise me-too sounding
benchmark is that, although I've known about CacheIO and what they've been
doing with various organizations in the broadcast and telco markets for over a
year, I didn't list them on StorageSearch.com before.
 
 That was
partly because they didn't want me to name the customers they were working with
at that time - but also because with
SSD caching companies
becoming almost as numerous as TV channels on a satellite dish, I wanted to
wait and see if they would be worth a repeat viewing. (And now I think
they are.)
 
 PS - I asked Bang Chang,
CEO of CacheIO if he had a white paper which talked more about the company's
cache architecture and philosophy. He sent me this -
CacheIO High
Availability Deployment (pdf) - from which I've extracted these quotes...
 
re network cache appliances - "At CacheIO we believe that network
cache appliance is the best storage architecture to decouple performance from
capacity and achieve the best of both worlds. 
 Once deployed as a "bump
in the wire" performance accelerator, our network cache appliance can also 
deliver additional value added services... Compared to server-side Flash cache,
our network cache appliance is a shared resource that is more scalable, more
reliable, supports clustered applications, and most importantly allows
customers, especially cloud service providers, to monetize performance by
dynamically allocating resources based on changing SLAs."
 
I found it 
interesting to see that in addition to conventional connections (SAN and
InfiniBand) their HA
paper also mentions emerging PCIe fabric.
 
re operational transparency - "Implementing CacheIO network appliance
requires no change to existing applications, servers, or storage. CacheIO can be
slotted in, turned on to accelerate applications, and turned off if necessary,
often without needing to stop the applications."  
 
 new blog by PernixData describes the intermediate states of play
for its HA clustered write acceleration SSD cache
 
 Editor:-
November 5, 2013 - In a clustered,
SSD ASAP VM
environment which supports both read and write acceleration, it's essential to
know the detailed policies of any products you're considering - to see if the
consequences for data vulnerability and performance are
acceptable for your own intended uses.
 
 In a new blog -
Fault
Tolerant Write Acceleration - Frank Denneman,
Technology Evangelist at PernixData,
describes in a rarely seen level of detail the various states which his
company's FVP goes through when it recognizes that a fault has occurred in
either server or flash. And the blog describes the temporary consequences - such
as loss of acceleration - which occur until replacement hardware is pulled in
and configured automatically by the system software.
 
 Stating the design
principles of this product - Frank Denneman says - "Data loss needs to be
avoided at all times, therefore the FVP platform is designed from the ground up
to provide data consistency and availability. By replicating write data to
neighboring flash devices data loss caused by host or component failure is
prevented. Due to the clustered nature of the platform FVP is capable to keep
the state between the write data on the source and replica hosts consistent and
reduce the required space to a minimum without taxing the network connection too
much."  ...read
the article
 
 
 
 McObject shows in-memory database resilience in NVDIMM
 
 Editor:-
October 9, 2013 - What happens if you pull out the power plug during
intensive in-memory database transactions? For those who don't want to rely on
batteries - but who also need ultimate speed - this is more than just an
academic question.
 
 Recently on these pages I've been talking a lot
about a new type of
memory channel
SSDs which are hoping to break into the application space owned by
PCIe SSDs. But another
solution in this area has always been DRAM with power fail features which save
data to flash in the event of
sudden power
loss. (The only disadvantages being that the memory density and cost are
constrained by the nature of DRAM.)
 
 McObject (whose
products include in-memory database software) yesterday
published the results of
benchmarks using AGIGA
Tech's NVDIMM in which
they did some unthinkable things which you would never wish to try out for
yourself - like rebooting the server while it was running... The result?
Everything was OK.
 
 "The idea that there must be a tradeoff
between performance and persistence/durability has become so ingrained in the
database field that it is rarely questioned. This test shows that mission
critical applications needn't accept latency as the price for recoverability.
Developers working in a variety of application categories will view this as a
breakthrough"  said Steve Graves, 
CEO McObject.
 
 Here's a quote from the whitepaper -
Database
Persistence, Without The Performance Penalty (pdf) - "In these tests
eXtremeDB's inserts and updates with AGIGA's NVDIMM for main memory storage
were 2x as fast as using the same IMDS with transaction logging, and
approximately 5x faster for database updates (and this with the
transaction log stored on RAM-disk, a solution that is (even) faster than
storing the log on an SSD). The possibility of gaining so much speed while
giving up nothing in terms of data durability or recoverability makes the IMDS
with NVDIMM combination impossible to ignore in many application categories,
including capital markets, telecom/networking, aerospace and industrial
systems."
 
 Editor's comments:- last year McObject
published a paper showing the benefits of using PCIe SSDs for the transaction
log too. They seem to have all angles covered for mission critical ultrafast
databases that can be squeezed into memory.
 
 
 OCZ ships PCIe SSD based SQL accelerator
 
 Editor:-
July 23, 2013 - OCZ
today
announced
 the general availability of its
ZD-XL SQL
Accelerator - an SSD
ASAP appliance - delivered as a PCIe SSD (600GB, 800GB or 1.6TB) and
bundled software - which optimizes caching of SQL Server data in Windows
environments - and can provide up to 25x faster database performance.
 
 HA
functionality works through Microsoft SQL Server AlwaysOn technology, so that,
in the event of planned or unplanned downtime, the database can continue operations from the
stopping point, retaining all of its data as if no downtime had occurred.
 
 "We believe that the industry is primed for this type of tightly
integrated, plug-and-play use-case acceleration solution..."  said Ralph Schmitt,
CEO  - OCZ Technology.
 
 Editor's comments:- One of the
differentiators in SSD caching products is the sophistication of their
behavior when viewed on a time basis. This is 1 of the
11 key SSD
symmetries - which I call "age symmetry".
 
 In this respect,
a key feature of ZD-XL SQL Accelerator is its business-rule pre-warming
cache engine and cache warm-up analyzer that monitors SQL Server workloads and
automatically pre-loads the cache in advance of critical, demanding or important
SQL Server jobs. It achieves this by identifying repeated access patterns that
enable DBAs to set periodic time schedules to pre-load the cache.
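OCZ doesn't describe the analyzer's internals here, so the following is only a hypothetical sketch of the general idea: count which blocks are read in each hour of the day across past days, and preload the blocks that recur at the same hour on enough days. The function name, the (day, hour, block) log format and the `min_days` threshold are all assumptions.

```python
from collections import defaultdict
from typing import Iterable


def prewarm_schedule(
    access_log: Iterable[tuple[int, int, int]], min_days: int = 3
) -> dict[int, list[int]]:
    """Build a per-hour preload list from a history of (day, hour, block) reads.

    A block is scheduled for preloading before hour h if it was read during
    hour h on at least `min_days` distinct days (a hypothetical heuristic).
    """
    days_seen: dict[tuple[int, int], set[int]] = defaultdict(set)
    for day, hour, block in access_log:
        days_seen[(hour, block)].add(day)

    schedule: dict[int, list[int]] = defaultdict(list)
    for (hour, block), days in days_seen.items():
        if len(days) >= min_days:
            schedule[hour].append(block)
    return {hour: sorted(blocks) for hour, blocks in schedule.items()}


if __name__ == "__main__":
    # Block 100 is read at 02:00 every night (e.g. a batch job); others are one-offs.
    log = [(day, 2, 100) for day in range(5)] + [(0, 2, 200), (1, 14, 300)]
    print(prewarm_schedule(log))  # {2: [100]}
```

A DBA-facing tool would turn such a schedule into timed preload jobs that run shortly before each listed hour.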
 
 This
product won a Best of Show award at an event called Interop in
May.
 
 
 HA Support in Fusion-io's ION SAN kit
 
 Editor:- August
2, 2012 - Yesterday Fusion-io
launched
 its new ION software
- which is a toolkit for building your own network compatible
SSD rack by
adding some Fusion-io SSD cards and their new software to any leading server.
 
 The concept isn't entirely new - because OEMs have been doing this
with various different brands of
PCIe SSDs for years
and this is a well
established alternative market segment for PCIe SSDs. What is new is
that it makes the whole thing much easier.
 
 Fusion-io says this new
software product "delivers breakthrough performance over
Fibre Channel,
InfiniBand and
iSCSI using standard
protocols." (1 million random IOPS (4kB), 6GB/s throughput and 60
microseconds latency in a 1U rack.)
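The quoted figures can be sanity-checked with simple arithmetic. A minimal sketch, assuming 4kB means 4096 bytes and treating the three numbers as if they held at the same time (datasheet figures often don't; latency is typically quoted at low load): 1 million 4kB IOPS corresponds to about 4.1GB/s, so the 6GB/s figure presumably comes from larger transfers, and Little's law (I/Os in flight = IOPS x latency) says 60 microseconds at 1 million IOPS would need about 60 I/Os outstanding.

```python
# Figures quoted for ION in a 1U rack; 4kB is assumed to mean 4096 bytes.
iops = 1_000_000
block_bytes = 4 * 1024
latency_us = 60

# Bandwidth implied by the random IOPS figure alone (decimal GB/s).
random_bw_gb_s = iops * block_bytes / 1e9

# Little's law: average I/Os in flight = throughput (IOPS) x latency (seconds).
outstanding_ios = iops * latency_us / 1_000_000

print(f"{random_bw_gb_s:.2f} GB/s from 4kB random I/O")  # 4.10 GB/s
print(f"{outstanding_ios:.0f} I/Os in flight")           # 60
```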
 
 It also supports fault tolerance
between racks.
 
 
 HA support  in OCZ's PCIe SSD  software
 
 Editor:- July
3, 2012 - OCZ
published a white paper today  - 
Accelerating
MS SQL Server 2012 with OCZ Flash Virtualization (pdf) which describes   
the performance of the  company's
PCIe SSDs (Z-Drive R4)
and  its
VXL
caching and virtualization software in this kind of environment.
 
 The
interesting angle (for me) was
SSD fault
tolerance rather than the 16x VM speedup.
 
 The paper's
author Allon Cohen 
(who has written many thought provoking
performance blogs) 
explains in this paper - "VXL software has a unique storage virtualization
feature-set that enables transparent mirroring of SQL Server logs between 2
flash cards, thereby assuring that the log files can be accessed with ultra high
performance, while at the same time, are highly available for recovery if
required." ...read
the article (pdf)
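The paper's quote describes the outcome rather than the mechanism. The general idea of mirroring a log between two flash cards can be sketched as follows, assuming in-memory lists stand in for the cards: a write is acknowledged once every healthy copy has stored it, and after one card fails the log can still be read back from the survivor. The `Device` and `MirroredLog` classes are hypothetical and are not OCZ's VXL API; a real implementation would issue both writes in parallel and resynchronize a replaced card.

```python
class Device:
    """Stand-in for one flash card: an append-only record list that can fail."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.failed = False

    def append(self, record):
        if self.failed:
            raise OSError(f"{self.name} has failed")
        self.records.append(record)


class MirroredLog:
    """Acknowledge a log write only after every healthy mirror has stored it."""

    def __init__(self, primary, secondary):
        self.mirrors = [primary, secondary]

    def append(self, record):
        copies = 0
        for dev in self.mirrors:
            try:
                dev.append(record)
                copies += 1
            except OSError:
                pass  # degraded mode; a real system would alert and later resync
        if copies == 0:
            raise OSError("no healthy mirror; write not acknowledged")
        return copies  # number of cards now holding the record

    def read_all(self):
        """Recover the log from the first mirror that is still healthy."""
        for dev in self.mirrors:
            if not dev.failed:
                return list(dev.records)
        raise OSError("both mirrors have failed")


card0, card1 = Device("card0"), Device("card1")
log = MirroredLog(card0, card1)
log.append("txn 1")     # stored on both cards
card0.failed = True     # simulate losing one flash card
log.append("txn 2")     # still accepted, now on one card only
print(log.read_all())   # ['txn 1', 'txn 2']
```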
 
 
 SSD FITs & reliability
 
 Editor:- June 20, 2012
- the component level isn't always the best level of abstraction in modeling
enterprise SSD reliability.
 
 Extrapolating from the single SSD
component level can give you a misleading idea - because SSDs are data
architecture components.
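For reference, this is the component-level arithmetic the paragraph cautions against. A FIT is one failure per billion device-hours; with independent, constant-rate failures, the FITs of components in series simply add, MTBF = 1e9 / FIT, and the classic approximation for a two-way mirror with repair is MTTDL = MTBF^2 / (2 x MTTR). A minimal sketch with made-up FIT values, showing that the same components give very different answers depending on how they are arranged, and that the mirror formula rests on the independence assumption.

```python
def mtbf_hours(fit):
    """FIT = failures per 1e9 device-hours, so MTBF = 1e9 / FIT (constant failure rate)."""
    return 1e9 / fit


def series_fit(fits):
    """Any one component failing takes the unit down, so failure rates add."""
    return sum(fits)


def mirrored_mttdl_hours(mtbf, mttr):
    """Two-way mirror with repair: MTTDL ~= MTBF**2 / (2 * MTTR).

    Valid only if the two sides fail independently, which identical SSDs
    wearing out under identical workloads may not.
    """
    return mtbf ** 2 / (2 * mttr)


# Made-up component FITs for one SSD: controller, DRAM, flash array.
unit_fit = series_fit([200, 100, 500])           # 800 FIT
unit_mtbf = mtbf_hours(unit_fit)                 # 1,250,000 hours
pair_mttdl = mirrored_mttdl_hours(unit_mtbf, mttr=24)
print(f"single unit MTBF: {unit_mtbf:,.0f} h, mirrored MTTDL: {pair_mttdl:.3g} h")
```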
 
 A
recent article on my SSD news
page on this subject started with an email from a reader who knew a lot more
about SSD component reliability than me.
 
 
 GridIron's SSDs can serve hundreds of concurrent  databases
effectively
 
 Editor:- May 30, 2012 - GridIron Systems
describes the setup required to exceed 1 million (4kB) IOPS in a 40x MySQL
environment with mirroring - all in a single cabinet (including servers) using
its FlashCube
SSD systems (up to 80TB in this configuration), and some 10GbE and 16Gb FC
fabric switches in a new
whitepaper
(pdf)
published
today.
 
 "In large-scale MySQL environments it's not uncommon to see
hundreds or even thousands of database servers," said Dennis Martin,
President of Demartek
(which tested this configuration).  "This   reference architecture opens a
new, more efficient architectural approach for serving increasing numbers of
users and database queries per cabinet."
 
 
 Pure Storage unveils new HA deduped array
 
 Editor:-
May 16, 2012 - Pure
Storage today
unveiled
a new generation of fast-enough
(100K write IOPS)
HA/FT SSD arrays
- with up to 100TB compressed capacity - which are clustered around
InfiniBand.
 
 
 New article on Enterprise SSD Array Reliability
 
 Editor:-
March 1, 2012 - Objective Analysis
has published an article - Enterprise
Reliability, Solid State Speed (pdf) - which examines the conflicts that
arise from wanting to use SSD for enterprise acceleration - while also
preserving data protection in the event of SSD failure.
 
 New approaches
and architectures are required - because traditional methods can  negatively
impact performance - or -  as in the case of  RAID - don't always work.
 
 "RAID is configured for
HDDs that fail
infrequently and randomly. SSDs
fail rarely as well, but fail predictably" says the author Jim Handy -
who warns that "SSDs in the same RAID and   given similar workloads can be
expected to wear
out at about the same time."
 
 He examines in detail one of the
many new approaches to high availability enterprise SSD design - that's used in
Kaminario's
K2.
...read
the article (pdf)
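Handy's point about correlated wear-out can be illustrated with a small calculation. A minimal sketch, assuming each drive has a rated endurance in TB written and receives a constant daily write volume (both simplifications; the numbers below are made up): with RAID-balanced writes every drive in the set reaches its endurance limit on the same day, so a second drive may well wear out before a rebuild can finish. The function names and the 1 day rebuild window are hypothetical.

```python
from datetime import datetime, timedelta


def wear_out_dates(start, endurance_tb, daily_writes_tb):
    """Date each drive exhausts its rated endurance, assuming constant daily writes."""
    return [start + timedelta(days=e / w) for e, w in zip(endurance_tb, daily_writes_tb)]


def overlapping_wear_outs(dates, rebuild_window):
    """Consecutive wear-outs closer together than one RAID rebuild takes."""
    ordered = sorted(dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < rebuild_window]


start = datetime(2012, 3, 1)
endurance = [700] * 4                     # made-up rated TB written per drive

# RAID striping spreads writes evenly, so every drive gets the same load.
balanced = wear_out_dates(start, endurance, [0.5] * 4)
# For contrast only: deliberately uneven write loads.
uneven = wear_out_dates(start, endurance, [0.5, 0.45, 0.4, 0.35])

window = timedelta(days=1)
print(len(overlapping_wear_outs(balanced, window)))  # 3: all four wear out together
print(len(overlapping_wear_outs(uneven, window)))    # 0: months apart
```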
 
 See also:-
the SSD reliability
papers, storage
reliability, high
availability enterprise SSD directory and
SSD market analysts.
 
 
 TMS packs  24TB fastest HA eMLC in 1U
 
 Editor:-
February 28, 2012 - I was just getting the measure of how much
enterprise flash capacity can fit into 1U of rackspace - when Texas Memory Systems
changed things yet again by doing even more.
 
 TMS today
announced a 24TB
high availability
system called the
RamSan-820.
This has a similar internal architecture to their 720, which
I discussed with
their CEO Holly Frost last December - but it uses
eMLC instead of
SLC - hence the doubling of the storage density.
 
 TMS today revealed
more about the internal features of their proprietary rackmount SSDs. Their
RamSan-OS has been in continuous development for over 5 years, initially
shipping with the RamSan-500
flash SSD  in 2007.
 The RamSan-OS is designed from the ground up to run on a cluster of CPU
nodes and FPGAs distributed throughout the RamSan systems.
 
 Speed
is still a core differentiator for TMS.
 
 "Many of our competitors
claim they are software companies and that their products are Application
Accelerators.  While this may be fundamentally true, all TMS products are 2x
faster than any other Application Accelerators shipping today,"
according to TMS CEO Holly Frost. "It comes down to very simple
technical and business questions:  Why put key functions into slow software when
you can speed up these functions in fast hardware?"
 
 Power
consumption  is an important  part of the
reliability budget
- and to drive this point home TMS  say they are happy to supply customers with
a wattmeter so they can compare these new SSDs with competing products.
 
 
 Huawei Symantec publishes SPC-1 results for Dorado2100 SSD
 
 Editor:-
 January 12, 2012 -  Huawei
Symantec has published an
SPC
Benchmark report (66 pages pdf)  for its high availability FC SAN rackmount
SSD - the 
Oceanspace
Dorado2100.
 
 A 1 terabyte (approx) usable protected (mirrored) SSD
system (2.4TB raw) delivered over 100K SPC-1 IOPS at a market price of $0.90/SPC-1
IOPS. Click
here for summary (pdf)
 
 Editor's comments:- these
SPC
reports are very technical and the $ per SPC-1 IOPS headline
figures include a lot of detailed factors including 3 years of 4-hour on-site
response warranty etc. But the documents also include market prices for
everything that goes into these calculations, from which we learn that a
2.4TB Dorado2100 SSD system with 16x 8Gbps FC ports costs about $52,000. See
also:- SSD pricing
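The headline and hardware figures can be related with a few lines of arithmetic. A minimal sketch using only the rounded numbers above (the report itself has the exact SPC-1 IOPS and total price): 100K IOPS at $0.90 each implies a total price of at least about $90,000, roughly $38,000 more than the $52,000 hardware, a gap that presumably covers the 3 years of support and the other factors the report includes.

```python
# Rounded figures quoted above for the Oceanspace Dorado2100 SPC-1 result.
spc1_iops = 100_000            # "over 100K", so treat this as a lower bound
price_per_spc1_iops = 0.90     # $ per SPC-1 IOPS, on the report's total-price basis
hardware_price = 52_000        # $ for the 2.4TB raw system with 16x 8Gbps FC ports
raw_tb, usable_tb = 2.4, 1.0   # usable (mirrored) capacity is approximate

total_price = spc1_iops * price_per_spc1_iops   # lower bound on the priced config
other_costs = total_price - hardware_price      # support, warranty etc. (presumably)

print(f"implied total price:  ${total_price:,.0f}")
print(f"beyond hardware:      ${other_costs:,.0f}")
print(f"hardware $/raw TB:    ${hardware_price / raw_tb:,.0f}")
print(f"hardware $/usable TB: ${hardware_price / usable_tb:,.0f}")
```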
 
 
 Violin unveils naked  cost advantages in reliable SSD arrays
 
 Editor:-
September 27, 2011 -  
Violin Memory
today  announced
new models and options in its range of fast 
iSCSI /
FC SAN   rackmount  SSDs.
 
 The new
6000
series - designed for high availability applications with no single point
of failure and hot-swappable "everything" - provides 12TB SLC, or
22TB MLC usable capacity with 200 / 600 microseconds mixed latency, 1 million /
500K sustained RAIDed spike-free write IOPS, in 3U rackspace at a list price
of around $37K / $20K per terabyte.
 
 For less demanding applications (but
still featuring hot swap memory modules) the  company has also extended its
lower priced 
3000 series
to 16TB SLC usable capacity.
 
 Editor's comments:- when I spoke
to Violin's CEO -
Don
Basile - about the new 6000 series he was curious about how I would tell
you what's unique about this product and signal whether it's relevant to you or
not.
 
 I said - when it comes to reliability -
you've either got it - or you haven't - and there aren't too many enterprise
SSD systems which have hot-swap everything. That's one of the reasons the
latency looks slow - compared to many other fast SSDs - because the figures
quoted here include the latency of the internal factory-built protection
schemes.
 
 Another angle, I said, is that your product is an example of
"big SSD
architecture". When I explained what I meant - Don agreed and said
that what it means for the customer is
lower price.
That's because when you look at the raw capacity that's lost to over-provisioning
and RAID-like protection
and get down to the usable capacity that the customer sees - in an MLC rack, say
- then Violin's 6000 delivers about 70% of the raw capacity - versus nearer to
30% in an array of 2.5"
SSDs. That confers a 2 to 1 native cost and density
(SSD TB/U) advantage.
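The 2 to 1 figure follows from the usable-capacity percentages. A minimal sketch, assuming both systems pay the same price per raw TB (the "native" comparison): each usable TB costs 1 / usable-fraction raw TBs, so 70% versus 30% works out at about 2.3 to 1. The same ratio applies to density if both pack the same raw flash per U; for reference, the 6000 series figures quoted earlier (22TB MLC usable in 3U) come to about 7.3 usable TB/U.

```python
def cost_per_usable_tb(cost_per_raw_tb, usable_fraction):
    """Each usable TB carries the cost of 1 / usable_fraction raw TB."""
    return cost_per_raw_tb / usable_fraction


raw_cost = 1.0                                     # normalized: same raw $/TB for both
violin_6000 = cost_per_usable_tb(raw_cost, 0.70)   # ~70% of raw capacity usable
ssd_array = cost_per_usable_tb(raw_cost, 0.30)     # ~30% in an array of 2.5" SSDs
advantage = ssd_array / violin_6000                # 0.70 / 0.30, about 2.33

density_tb_per_u = 22 / 3                          # 22TB MLC usable in 3U
print(f"cost advantage {advantage:.2f}:1, 6000 MLC density {density_tb_per_u:.1f} TB/U")
```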
 
 I said Violin's density looks good too - compared
say to Kaminario's K2.
 
 I
also said that our SSD readers would recognize what was meant by "spike-free"
IOPS - because of various
past articles
about this - and because another enterprise flash vendor -
Virident Systems -
had made that one of the
differences they
talk about compared to some other flash
PCIe SSD companies. I
knew that in Violin's case that was due to their patented non-blocking write
architecture - which was explained to me when their
first flash
products came to market in 2008.
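For readers who want to check "spike-free" claims against their own benchmark logs, one simple approach is to record write IOPS per interval over a long sustained run and compare the worst interval with the typical one. A minimal sketch; the 90% floor and the function name are arbitrary choices for illustration, not a standard metric and not how Violin or Virident measure it.

```python
def iops_consistency(samples, floor=0.9):
    """Summarize how steady a series of per-interval IOPS samples is."""
    ordered = sorted(samples)
    median = ordered[len(ordered) // 2]          # upper median for even counts
    dips = [s for s in samples if s < floor * median]
    return {
        "median": median,
        "worst_to_median": ordered[0] / median,  # 1.0 means no dips at all
        "intervals_below_floor": len(dips),
    }


steady = [500_000] * 10
spiky = [500_000] * 8 + [120_000, 480_000]       # one interval stalls badly
print(iops_consistency(steady))
print(iops_consistency(spiky))
```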
 
 Don said - that inside their
protection array they're actually doing 5x more IOPS than the customer
is seeing outside the box and on the datasheet - and that helps too.
 
 I
also asked about price - and where they were relative to $30K / TB - which is
the ballpark for this type of product - and you can see Violin's figures above.
That's a competitive figure for a no-SPOF SSD.
 
 I said that for people
who are serious about enterprise SSDs it's relatively easy to decide which
products to focus on after seeing just a couple of simple
metrics.
 
 Don did also mention a comparative write-up - about their
SSD versus another so-called "tier 1" storage solution - from
EMC. Violin think it
makes them look pretty good - but I can't understand why anyone cares how they
stack up to EMC - who never understood the SSD plot - which is why their (at
one time) prime SSD supplier
STEC has had a bumpy
revenue stream in recent years.
 
 I had one final question for Don -
which wasn't about Violin's new SSD - but about
something
which had come to my attention while I was googling the company just before
our conversation.
 
 When can we expect to see a picture of a naked man
featured  on a
Vmem
poster ad? - I asked.
 
 He laughed and indicated   it wouldn't be
anytime soon.
 |  |  
| Bottlenecks in the pure SSD
datacenter will be  more serious than in the HDD world - because responding
slowly will be equivalent to transaction failure. |  
| will SSDs end
bottlenecks? |  |  |  
| a new
market for factory-configured HA / FT SSDs |  
| by
Zsolt Kerekes,
editor - January 26,
2012 
 It's always been relatively easy for users and
systems integrators to configure high availability rackmount SSD systems by
using legacy
failover and clustering techniques designed for traditional
FC SAN or
IP SAN storage systems -
so you may ask - why have a different directory page which is focused on factory-designed
HA SSDs?
 
 The answer is:-
fault symmetry
(performance in the failed vs unfailed state), ease of use, risk, complexity,
and scalability.
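Fault symmetry is described here qualitatively. One hypothetical way to put a number on it is to run the same workload with all components healthy and again with one component failed, then compare throughput and latency. A minimal sketch; the ratios are an assumed metric, and the figures in the example are made up for illustration rather than measurements of any product.

```python
from dataclasses import dataclass


@dataclass
class PerfSample:
    iops: float
    latency_us: float


def fault_symmetry(healthy: PerfSample, degraded: PerfSample) -> dict:
    """Ratios near 1.0 mean the system performs the same after a component failure."""
    return {
        "iops_retained": degraded.iops / healthy.iops,
        "latency_growth": degraded.latency_us / healthy.latency_us,
    }


# Made-up numbers: one system barely changes, the other slows down sharply.
healthy = PerfSample(iops=500_000, latency_us=200)
print(fault_symmetry(healthy, PerfSample(iops=450_000, latency_us=220)))
print(fault_symmetry(healthy, PerfSample(iops=250_000, latency_us=900)))
```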
 
 Customer-designed fault-tolerant wraparounds
usually operate outside the
SSD controller loop.
(The rare exceptions are
big web / cloud
entities like Baidu, Google etc.)
 
 In cases where the HA / FT
scheme doesn't have native controller support - and simply engages data at the
host interface level - these schemes incur considerable losses in latency and
failure recovery time compared to systems where the HA fault-tolerant
architecture has been designed inside the SSD system - and is aware of what's
happening between the host interface and the SSD memory arrays.
 
 And
customized HA SSD designs can introduce software complexities and
controller configuration issues - because even if the native SSD systems look
like virtual storage - the FT wraparound introduces its own peculiar
characteristics.
 
 Anyone who has done a formal  hazard analysis   or
failure analysis in a critical industry knows that it's all too easy to think
that a particular FT  problem has been solved whereas in fact there are still
common modes of failure.
 
 One of the invisible risks of "configure
your own" HA arrays is that the user may incur the cost of assembling a DIY
HA configuration only to discover that when a fault does occur - their solution
became part of the problem instead of solving it.
 
 That's another
reason that factory designed HA SSDs are superior. They reduce risk - due to the
fact that they have been designed by people who spent more time thinking about
the problems than you can afford to do yourself.
 
 Vendors I've spoken to
in the HA SSD market are excited that their products will open up new businesses
- but a particular concern - first voiced to me in November 2011 by
Don
Basile, CEO of Violin -
was that HA SSDs could just get lost amidst a sea of other SSD announcements.
 
 And if you're reading through a bunch of pages which talk about
SSD performance and
see some latency and performance figures for an HA SSD in the wrong context -
you may well think - that  doesn't sound so great - whereas in the context of a
protected performance metric - it may instead be truly amazing.
 
 In my
past 20 years of publishing enterprise buyers' guides - I've developed an
instinct for judging when the market is ready for a new focused directory.
Sometimes I've been too early - but with the momentum in the SSD market and the
number of HA SSD vendors dipping into double digits - I think this is
exactly the right time for a new directory.
 |  |  |  
|  |  |