the Problem with Write IOPS - in flash SSDs
Do you remember that famous misquotation from the movie Casablanca? "Play it again Sam - as time goes by..."

Repeating write operations in some apps and some flash SSDs can take orders of magnitude longer than predicted by IOPS specifications. Time does indeed "go by" - potentially discrediting a long established performance modeling metric.
by Zsolt Kerekes, editor - StorageSearch.com - December 16, 2009
"Random IOPS"
has been long established as a useful concept for modeling server
applications performance and
bottlenecks
when using different types of storage.
As long ago as 1993 -
the world's 1st
NAS company
Auspex Systems quoted a
figure of "675 NFS IOPS" to illustrate the capability of its NS 3000
SPARC based server.
And
in 1995
a RAM SSD
manufacturer called CERAM was quoting a figure of "2,000 IOPs" for
its SPARC compatible SSD.
Random R/W IOPS had become a very common
metric used in the marketing parlance of high end SSD makers by
2003.
And the good thing was... it didn't matter if your background was in
rotating disk storage - because the figures being quoted gave a realistic idea
of what you could achieve if you used SSDs instead.
In those days - nearly all enterprise SSDs were "RAM SSDs" - and like hard disk arrays - they had symmetric R/W latencies and throughput performance. Therefore vendors didn't have to differentiate between "read IOPS" and "write IOPS" - it was rightly assumed that they were very similar.
SSD IOPS - start of the
slippery slope to a devalued metric
In
2007 the
SSD IOPS performance picture was starting to get hazy - with some
flash SSD makers
quoting high IOPS specs which actually exceeded those of entry level RAM
SSDs.
This seemed too good to be true - especially as the price
difference in those days (between RAM and flash SSDs) was more than 20 to 1 for
the same capacity.
So I asked some leading SSD makers from both sides
of the fence to contribute short pieces for a very popular article called -
RAM SSDs versus
Flash SSDs. That article (published in August 2007) listed important
factual data about the state of the art. Data based on real products was
compared for RAM SSDs, flash SSDs and hard disks.
2 important points were clarified by that article:
1 - flash SSDs (generally) had much
worse write IOPS than read IOPS - and the way that some flash SSD vendors
quoted a "single" IOPS metric was confusing or misleading.
Following publication of this article StorageSearch.com adopted a new editorial practice of differentiating whether IOPS figures quoted in vendor press releases and datasheets were really "read", "write" or "combined" (based on an assumed R/W ratio). Before that - some vendors had misleadingly just quoted read IOPS (the best figure) without attaching the important "read" appellation.
2 - for those already experienced with IOPS in HDDs and
RAM SSDs who wanted to model and predict flash SSD performance - the key metric
to look at was "write IOPS".
The critical effect of write IOPS on overall application performance was analyzed in a sub-article - by Douglas Dumitru, CTO EasyCo - called Understanding Flash SSD Performance (pdf). In various tables Dumitru pointed out the sensitivity of overall performance to write IOPS for various R/W mixes encountered in real applications.
Back in those days - the ratio of read to write IOPS for commercially available flash SSDs could be anywhere in the region from 10-to-1 to 100-to-1 due to the inherent asymmetries in NAND flash memory.
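To see how sensitive a mixed workload is to that asymmetry, here is a back-of-envelope sketch (the device figures below are hypothetical, not taken from Dumitru's tables): the effective IOPS of a mixed workload is the harmonic blend of the read and write rates, so the slow write side dominates.

```python
# Hypothetical worked example (not from Dumitru's paper) of why write
# IOPS dominates a mixed workload. Device figures are invented - a
# 50-to-1 R/W asymmetry, within the range quoted above.

def effective_iops(read_iops, write_iops, read_fraction):
    """Harmonic blend: each operation contributes its own mean latency."""
    write_fraction = 1.0 - read_fraction
    return 1.0 / (read_fraction / read_iops + write_fraction / write_iops)

# 50,000 read IOPS, 1,000 write IOPS, a 70/30 R/W mix:
print(round(effective_iops(50_000, 1_000, 0.70)))  # ~3185 IOPS
# even a 90/10 mix stays far below the headline read figure:
print(round(effective_iops(50_000, 1_000, 0.90)))  # ~8475 IOPS
```

In other words - a device quoted at 50,000 (read) IOPS could deliver well under a tenth of that on an ordinary mixed workload, which is exactly why the "write" figure was the one worth reading.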
Quoting
"read IOPS" and "write IOPS" clears things up - but not for
long
For me as editor - that seemed to settle the issue for a
while. As long as stories related to enterprise flash SSDs always included
separate figures for read and write IOPS - (or pointed out when they were
missing) experienced readers could do useful mental filtering between different
types of products.
Personally I found it a useful shortcut to
ignore read IOPS data entirely and just focus on write IOPS to rank the
fastest products.
I assumed that the ratio of R/W IOPS - in the fastest flash SSDs - might still improve in ensuing years - but would hover in the range of 5 or 10 to 1. But I was wrong.
Little more than a year later (in November 2008) - Violin Memory narrowed that gap to 2 to 1 in a 2U SLC SSD which had over 200K random read IOPS and 100K random write IOPS. Around the same time (also November 2008) Fusion-io started talking about 1 to 1 R/W IOPS in one of its PCIe SSDs.
Soon afterwards this R/W IOPS symmetry made an appearance in 2.5" SSDs too - when, in April 2009, SandForce unveiled its SF-1000 family of SSD Processors with 30,000 R/W IOPS.
Where Are We Now?
You might think that - with R/W IOPS symmetry in the fastest flash SSDs - we can now forget all about the underlying stuff that happens inside them. Just plug in the numbers - compare to hard disk arrays or RAM SSDs - and the predicted application speedups will follow as expected.
But that would be wrong
too. Because - as users are discovering when they test these newer super flash
SSDs in their applications - the IOPS model gives good results in some
applications (like video servers) - but is extremely unreliable at predicting
speedups in other applications (like traditional database transaction
processing).
The way that random write IOPS test software is written can inflate the results for flash SSDs and produce numbers which exaggerate real performance.
For hard disks - writing to a range of addresses which are spatially scattered is a good stress test - because it forces head movements and rotational delays - and these latencies get included in the results. So random write IOPS for an HDD produces a lower numeric result than repeatedly writing to addresses in close spatial proximity.
For enterprise flash SSDs with small amounts of RAM cache the opposite is true. Writing to a range of addresses which are spatially scattered is a bad stress test - because each write effectively goes to an erased block in the storage pool. (Assuming active garbage collection and over provisioning.) The resultant latency "measured" by the IOPS test is simply the buffer or cache latency - and is much less than the time taken to complete a write-erase cycle to the flash media. In contrast to the situation with HDDs - writing repeatedly to addresses in close spatial proximity in skinny flash SSDs produces a worse result - which can be orders of magnitude worse than measured on the same device for "random IOPS".
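To make the contrast concrete, here is a minimal sketch of the two write patterns - assuming a plain 1 GiB test file, 4 KiB blocks and invented counts. It illustrates the access patterns only; a rigorous test would need O_DIRECT or a raw device so the OS page cache doesn't mask the flash behavior.

```python
import os, random, time

# Sketch of the two write patterns discussed above. The file name,
# sizes and counts are assumptions - not a rigorous benchmark (a real
# test needs O_DIRECT or a raw device to bypass the OS page cache).
BLOCK = 4096          # 4 KiB per write
SPAN = 1 << 30        # 1 GiB test region
N = 10_000            # writes per pattern

def rough_write_iops(fd, offsets):
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    for off in offsets:
        os.pwrite(fd, buf, off)
    os.fsync(fd)      # flush, so sustained flash latency is counted
    return len(offsets) / (time.perf_counter() - start)

fd = os.open("testfile.bin", os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, SPAN)

# "random IOPS" pattern - scattered writes land on pre-erased blocks
scattered = [random.randrange(SPAN // BLOCK) * BLOCK for _ in range(N)]
# "play it again Sam" pattern - hammering the same few flash blocks
localized = [random.randrange(256) * BLOCK for _ in range(N)]

print("scattered:", int(rough_write_iops(fd, scattered)), "IOPS")
print("localized:", int(rough_write_iops(fd, localized)), "IOPS")
os.close(fd)
```

On a skinny flash SSD the localized figure can come out dramatically lower than the scattered one - the reverse of what the same harness shows on an HDD.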
If repeat writes to the same small address range (same flash blocks) are interspersed with read operations (the play it again Sam scenario - which occurs in database tables) the performance outcome varies significantly between fat and skinny flash SSDs. A fat flash SSD may produce results which more closely follow the behavior seen in HDDs and RAM SSDs - as predicted by the write IOPS spec. But performance in most skinny flash SSD designs will collapse. Not only will it be orders of magnitude lower than expected from the write IOPS spec - but it may only be 3x better than an HDD - instead of the 100x or 1,000x predicted by the spec for the product.
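Extending the hypothetical sketch above (same assumed constants), the interleaved hot-range pattern could be timed the same way - hot_blocks and the R/W mix are again invented numbers:

```python
# Interleaved variant of the sketch above: reads mixed with repeat
# writes to the same small hot range - the database-table pattern.
def rough_mixed_iops(fd, hot_blocks=256, n=10_000, read_fraction=0.7):
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    for _ in range(n):
        off = random.randrange(hot_blocks) * BLOCK
        if random.random() < read_fraction:
            os.pread(fd, BLOCK, off)   # read back from the hot range
        else:
            os.pwrite(fd, buf, off)    # rewrite the same flash blocks
    os.fsync(fd)
    return n / (time.perf_counter() - start)
```

A fat flash SSD should track its write IOPS spec under this pattern; a skinny design can collapse, as described above.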
The paradox for the potential user is that 2 different flash SSD designs which have apparently identical throughput, latency and random IOPS specs in their datasheets - can behave in ways that differ by orders of magnitude in real applications.
The sensitivity is due to
the type of application and the implementation of the RAM cache algorithms in
the SSD. None of these differences are explicitly stated in benchmark
performance figures.
That's because traditional storage benchmarks
and test suites don't model the type of behavior which correlates with common
applications - and instead of stressing the weakest parts of the SSD design -
they play to the tune of the SSD designer who has focused on key metrics which
make the product look good in such benchmark tests.
Conclusions

Random R/W IOPS data for flash SSDs is a poor predictor of likely application performance in some common types of enterprise server applications. Users need to proceed cautiously when shortlisting SSD products based only on paper or benchmark evaluations. New types of SSD benchmark tests are needed which stress the weakest parts of flash SSD designs - instead of recycling storage I/O tests designed for HDDs or RAM SSDs.
For a unified overview of SSD architecture see - 11 key symmetries in SSD design.
Related articles:
- Can you trust flash SSD specs & benchmarks?
- the Importance of Write Performance to RDBMS Apps
- Calling for an End to Unrealistic SSD vs HDD IOPS Comparisons
- Legacy vs New Dynasty - the new way of looking at Enterprise SSDs
"Be wary of arguments for enterprise SSD adoption which cite IOPS per dollar (or the other way round) as a justification for filling a gap in some cleverly drawn curve. I've seen this kind of thing from leading SSD companies who should know better."
- Clarifying SSD Pricing - where does all the money go?
See also:
- RAM SSDs
- the Fastest SSDs
- the Top 20 SSD companies
- Are you ready to rethink RAM?
"getting 100% of 400K IOPS is better than getting 25% of 500K IOPS. What really matters is attainable performance and realizable IOPS at given load/latency."
- IOPS schmIOPS! (pdf) (2013)
See also:
- what changed in SSD year 2015?
- RAM Cache Ratios in flash SSDs
- how fast can your SSD run backwards?
- flash wars in the enterprise SSD market
- factors which influence and limit flash SSD performance
Zsolt, re your article: the Problem with Write IOPS - in flash SSDs

comments by: Ron Bianchini, CEO, Avere Systems
This article raises an excellent point regarding SSD performance, testing and, fundamentally, SSD design.
The first SSD storage devices were very transparent in their design,
allowing for very simple tests, like Random R/W, that could clearly highlight
the differences between vendors. Modern SSD storage devices are much more
complicated and include performance enhancements, like data caches, that can
hide any performance deficiencies of the underlying media under certain bounded
workloads.
At this point, the SSD storage device must be treated as a system,
rather than as a transparent single media type.
Performance is now a
function of the underlying storage media and the algorithms used to manage the
various media in the system (including any internal caches).
At Avere Systems, we faced this exact problem when trying to evaluate
flash SSD media for use in our Avere
FXT NAS server.
Without a uniform set of benchmarks, we needed to test multiple solutions to see
how they performed under load in our system. Only then could we make an
informed decision.
We also addressed this problem in terms of the product that we delivered to market. Our Avere FXT is a tiered NAS server that stores data in one of several tiers of media, two of which are SSD - one RAM and one flash based. Our decisions about which media tier holds the data are based on application access patterns.
Most of our competitors require that the storage administrator decide when different media tiers are used for different applications.
The administrators are required to set policies that direct data to the various media. Administrator-set policies are a losing proposition because performance is completely dependent on the workload at the time of the operation, primarily due to the complexity of the design of modern SSDs.
By characterizing the performance of the SSDs in our system and making the decision internally, we can use them optimally for the application workload and outperform our competition.
In summary, as the complexity of SSD storage devices increases, so must the testing strategies used to evaluate the components. For Avere, this meant a thorough evaluation and crafting our algorithms to leverage the best of various media.
What is needed for the more general purpose user is a uniform test, or set of tests, that users can use to accurately compare products from the various vendors under typical user load. Only then can end users understand how the various components will perform in their system.
| "One million IOPS.
Yawn! Is that all you've got? In fact, I would argue if the extent of your
marketing message is your IOPS you don't have enough marketing talent." |
| Woody Hutsell
- (after more than 10 years of marketing enterprise SSDs) in his
first SSD blog
(December 20, 2010) | | |
. |
| "Its a measure of
capability, but not necessarily timeliness. You could get 4,000 MB/s delivered
after a 500 millisecond delay." |
| Jeremiah Peschka -
in his blog -
IOPS Are A
Scam (September 11, 2013) | | |