what's in a number?
a new shorthand to describe any SSD accelerated server
by Zsolt Kerekes, editor - March 5, 2014
In this article I propose a new shorthand terminology which can usefully describe any enterprise server in an SSD architecture - from an SSD software latency envelope point of view - by a single rating number from 0 to 7.
The need for a precise but efficient way to describe the performance of any server (in a cluster, array, farm or cloud) from an operational context has become clear in recent conversations I've had about clusters and groups of SSD enhanced servers - when trying to define briefly and concisely what the exact minimum characteristics of each server have to be to support various software defined configurations.
When you're having a
conversation - about complex SSD configurations on the web - or in an email - or
in a voice conference - it's important to be sure that everyone has the same
mental picture of what's going on.
With existing terminology there are important factors which become cumbersome when building up layers of these concepts. Words get in the way. Pictures can help - but they become messy too as you scale up the number of servers you're talking about.
This article proposes a simple scheme which enables the essential characteristics of an SSD enhanced server to be communicated concisely.
It's aimed at architects who need to specify the assumptions they're making
about servers in the base sets of new system configurations. Hopefully it will
be useful to marketers and users too - when they have dialogs about the entry
level configuration assumptions for new software products and projects.
The SSD market is good at this. We've already invented lots of jargon - and without the jargon it's impossible to build on new concepts.
The new server language is easy and doesn't require any detailed knowledge of flash technology. It has 2 variations:-
- a lean description - which says whether a function is available or not, and
- a rich description - which includes more detail - which may be needed in complex high availability groups.
The language also extends to include boot options and fabrics.
The starting principle is to condense the
essential characteristics of any server into a number - which is extracted
from a matrix of key characteristics.
The example below gives you
the basic idea of the lean model.
There are 3 main columns which define
the main types of SSDs which may or may not be in the system, differentiated by
latency and market characteristics:-
- memory channel SSD - low latency fast flash or other NVM which fits into a DDR3, DDR4 or HMC socket and uses the RAM interface for non DRAM memory. (This includes storage class memories such as Memory Channel Storage, Optane and Memory1 but excludes flash backed DRAMs - aka NVDIMMs.)
- PCIe SSDs - any
type of fast PCIe SSD. The fabric and boot options go in a different place.
- SAS SSDs and
SATA SSDs - these are
both in the same column - because from the apps architecture point of view -
they have similar connections and latency. Differences in porting and HA options
will go into a different part of the label.
SSDs inside this server? (in each column - 0 means no, 1 means yes)

columns:- memory channel | PCIe | SAS / SATA

The lean number is the 3 column bits read as a single binary number - memory channel = 4, PCIe = 2, SAS / SATA = 1.

what do these lean SSDserver numbers mean? how to read them:-

0 (no SSDs) - This server has no SSDs installed in the usual sockets. Maybe it only has a hard drive, or maybe it boots from the network or another type of (slower) SSD - like USB. A type "0" server can play a part in some HA SSD configurations, as we'll see later.

1 (SAS / SATA) - This server has SATA or SAS SSDs installed. The software architect can depend on an entry level class of apps performance.

2 (PCIe) - This server has PCIe SSDs installed. From this part of the description we don't know what it boots from. But the software architect can rely on a PCIe SSD type of performance.

3 (PCIe + SAS / SATA) - This server has both PCIe SSD and SATA/SAS SSD installed. It may be that the software architect is specifying this option because they can tier between these different types and price bands of SSD. Or it may be that the SATA/SAS SSDs are required for boot.

4 (memory channel only) - This server only has memory channel SSDs installed. We might guess that it boots from the network or a hard drive. We can infer this server is aimed at high performance.

5 (memory channel + SAS / SATA) - This server has memory channel and SATA/SAS SSDs installed. At this stage we may guess that the different SSDs are tiered, or maybe the slower SSD is simply there for boot and housekeeping.

6 (memory channel + PCIe) - This server has memory channel and PCIe SSDs installed. At this stage we may infer that the SSDs are tiered. It's possible that the PCIe SSDs are also part of a fabric or HA scheme. We'll confirm that in the next part of the identification scheme.

7 (all three types) - This server has memory channel and PCIe SSD and SATA/SAS SSD installed. Although this looks like an unlikely configuration - it may be that neither the memory channel nor PCIe SSDs are assumed to be bootable.
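Since the lean number is just three presence bits, it can be computed mechanically. Here's a minimal sketch in Python - the bit weights (memory channel = 4, PCIe = 2, SAS/SATA = 1) are inferred from the lean table, and the function name is illustrative:

```python
def lean_rating(memory_channel: bool, pcie: bool, sas_sata: bool) -> int:
    """Pack the three SSD-presence flags into the 0-7 lean rating.

    Bit weights (inferred from the table): memory channel = 4,
    PCIe = 2, SAS/SATA = 1.
    """
    return (4 if memory_channel else 0) | (2 if pcie else 0) | (1 if sas_sata else 0)

# examples matching the table rows
print(lean_rating(False, False, False))  # 0 - no SSDs in the usual sockets
print(lean_rating(False, True, True))    # 3 - PCIe + SATA/SAS
print(lean_rating(True, True, True))     # 7 - all three types
```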
adding more details
The lean rating above - 0 to 7 - is just one part of the picture. Roughly speaking - a higher number means more assets installed, higher cost and more performance headroom. On the other hand - if your software works on a server with a low SSDserver number - that means it will be affordable by more users.
future-proofing the matrix
If computer architects create an entirely new type of bus or socket into which a different type of SSD can be installed - which is much faster, say, than memory channel - and which has markedly different characteristics - what can you do?
One answer might be to add another column to the left - which means the numbers would be in the range from 0 to F (hex), instead of 0 to 7. That keeps the numbering scheme backwards compatible too.
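To see why this stays backwards compatible: the new column simply becomes the next bit to the left, so existing 0-7 ratings keep their meaning. A hypothetical Python sketch - the "new bus" column and its weight of 8 are assumptions for illustration, not part of the scheme as defined:

```python
def extended_rating(new_bus: bool, memory_channel: bool, pcie: bool, sas_sata: bool) -> str:
    """Hypothetical 4 column rating: a new, faster-than-memory-channel
    bus takes the next bit to the left (weight 8), so legacy 0-7
    ratings keep their meaning and new configs run up to F (hex)."""
    value = ((8 if new_bus else 0) | (4 if memory_channel else 0)
             | (2 if pcie else 0) | (1 if sas_sata else 0))
    return format(value, "X")  # single hex digit, 0-F

print(extended_rating(False, True, False, True))  # "5" - unchanged from the old scheme
print(extended_rating(True, False, True, False))  # "A" - only expressible in the new scheme
```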
what about the fabric?
In the market today we have 4 general fabrics for SSD systems. So we can simply append these letters to the server's lean number as in the following examples.
- GbE - Ethernet
- FC - Fibre-Channel
- IB - Infiniband
- PCIe - PCI Express
1/GbE (or, as I prefer, 1/E - since the "Gb" is assumed by the context) is a server with SAS / SATA SSD specified which uses Ethernet as the fabric.
The minimum configuration for a typical software defined storage cluster might look like this:-
Cluster(1E, 1E, 0E) - is a 3 server ethernet linked
cluster which includes 2 servers with SAS/SATA SSD inside plus a 3rd server
with either HDD or no drives.
2/E is a server which has PCIe
SSD inside - but which uses ethernet as the fabric.
2/FC is a
server which has PCIe SSD inside - but which uses fibre-channel as the fabric.
You get the general idea.
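These compact labels can be expanded mechanically too. A minimal Python sketch, assuming the compact label form used in the cluster example (a rating digit followed by a fabric code, e.g. "1E" or "2FC"); the function name is illustrative:

```python
# the 4 general fabrics, keyed by their label suffixes
FABRICS = {"E": "Ethernet", "FC": "Fibre-Channel", "IB": "InfiniBand", "PCIe": "PCI Express"}

def describe(label: str) -> str:
    """Expand a compact lean label like '2FC' into words.

    Assumed format: one lean rating digit (0-7) followed by a
    fabric code from FABRICS.
    """
    rating, fabric = int(label[0]), label[1:]
    parts = []
    if rating & 4:
        parts.append("memory channel SSD")
    if rating & 2:
        parts.append("PCIe SSD")
    if rating & 1:
        parts.append("SAS/SATA SSD")
    inside = " + ".join(parts) if parts else "no SSDs"
    return f"{inside} inside, {FABRICS[fabric]} fabric"

print(describe("1E"))   # SAS/SATA SSD inside, Ethernet fabric
print(describe("2FC"))  # PCIe SSD inside, Fibre-Channel fabric
print(describe("0E"))   # no SSDs inside, Ethernet fabric
```

So the cluster example reads as Cluster(1E, 1E, 0E) = three Ethernet linked servers, two with SAS/SATA SSD inside plus one with no SSDs.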
how do we describe high availability?
When it comes to specifying the assumed requirements to support the software in a high availability context - we need to be more specific about the number of devices which must be present.
That's where the "rich" version of the
SSDserver description comes in.
For simplicity - we can use the same
type of table to construct the numbers - but instead of using binary values
in each cell we populate the matrix with the minimum number which is
required by the architecture definition.
What the system architect is
saying here is - you can have more - but the software won't deliver the quality
of services if you have less.
Here are some examples of how to create
the rich SSDserver shorthand base numbers.
SSDs inside this server? (each cell shows the minimum number required to support the system)

columns:- memory channel | PCIe | SAS / SATA

what do these rich SSDserver number examples mean? how to read them:-

(0, 0, 0) - This server has no SSDs installed in the usual sockets.

(0, 0, 1) - This server has at least 1 SATA or SAS SSD installed. If this is also part of a high availability configuration - you can infer something useful about the protection scheme by the fact that only a single SSD is attached to this server.

(0, 1, 2) - This server has at least 1 PCIe SSD but also 2 (or more) SATA/SAS SSDs too. Is the architect saying that this server is offering some kind of simple failover within the server at the SATA/SAS drive level? That's when you need to read the detailed architecture notes.

(0, 2, 0) - This server has at least 2 PCIe SSDs. 1 is enough for speed - 2 is telling you that there's something else going on in this system. Maybe the dual PCIe SSDs are supporting failover - or fabric. Time to read the detailed plan.
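A rich spec is just a vector of minimum counts, and "you can have more but not less" is a per-column comparison. A minimal Python sketch - the RichSpec class and its field names are illustrative assumptions, not something the scheme defines:

```python
from dataclasses import dataclass

@dataclass
class RichSpec:
    """Minimum SSD counts per column (hypothetical representation)."""
    memory_channel: int = 0  # minimum memory channel SSDs
    pcie: int = 0            # minimum PCIe SSDs
    sas_sata: int = 0        # minimum SAS/SATA SSDs

    def satisfied_by(self, mc: int, pcie: int, sata: int) -> bool:
        """You can have more than the spec says - but not less."""
        return (mc >= self.memory_channel
                and pcie >= self.pcie
                and sata >= self.sas_sata)

spec = RichSpec(pcie=1, sas_sata=2)  # the (0, 1, 2) example above
print(spec.satisfied_by(0, 2, 3))    # True  - exceeding the minimum is fine
print(spec.satisfied_by(0, 1, 1))    # False - one SATA/SAS SSD short
```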
At one level - if all your SSD projects look the same - the suggestions in this article are simple and trivial.
But if you spend all day discussing the
design options for new system architectures - or if you're planning an entirely
new software package - and arguing about the merits and complexities of drawing
the support line at different sets of minimum capability boundaries - then
having a simple language on your whiteboard to describe the key SSD variations
in your server boxes - is essential.
what's the next level?
The next level of abstraction is when you start with a population of SSDservers and
start to add different types of SSD system software.
start to add different types of SSD system software.
That's the reason
you need to get the hardware level clear.
Because when you start to
analyze the business and market permutations you can get by installing
different "software defined functions" to different classes of
SSDserver boxes (some of which are viable but some of which aren't) you need
to be clear about the fundamentals.
As to what we'll be calling all those new arrays of software defined SSD enhanced servers when they work together in tandem... and as to which ones merely emulate what has been done before - and which ones are indeed a new way of doing data processing? Those debates are still to come.
If you've found any of these ideas interesting - then feel free to spread the word around and credit me and StorageSearch.com. I don't expect this to be the last word on this subject. Rather, I hope it may be another new beginning.
If you replace words with numbers in a systematic way which enables useful analysis - then complex "what-if" problems become easier to talk about.