the Problem with Write IOPS - in flash SSDs
Do you remember that famous misquotation from the movie Casablanca? "Play it again Sam - as time goes by..."

Repeating write operations in some apps and some flash SSDs can take orders of magnitude longer than predicted by IOPS specifications. Time does indeed "go by" - potentially discrediting a long established performance modeling metric.
by Zsolt Kerekes, editor - StorageSearch.com - December 16, 2009
"Random IOPS"
has been long established as a useful concept for modeling server
applications performance and
bottlenecks
when using different types of storage.
As long ago as 1993 -
the world's 1st
NAS company
Auspex Systems quoted a
figure of "675 NFS IOPS" to illustrate the capability of its NS 3000
SPARC based server.
And
in 1995
a RAM SSD
manufacturer called CERAM was quoting a figure of "2,000 IOPs" for
its SPARC compatible SSD.
Random R/W IOPS had become a very common
metric used in the marketing parlance of high end SSD makers by
2003.
And the good thing was... it didn't matter if your background was in
rotating disk storage - because the figures being quoted gave a realistic idea
of what you could achieve if you used SSDs instead.
In those days - nearly all enterprise SSDs were "RAM SSDs" - and like hard disk arrays - they had symmetric R/W latencies and throughput performance. Therefore vendors didn't have to differentiate between "read IOPS" and "write IOPS" - it was rightly assumed that they were very similar.
SSD IOPS - start of the
slippery slope to a devalued metric
In
2007 the
SSD IOPS performance picture was starting to get hazy - with some
flash SSD makers
quoting high IOPS specs which actually exceeded those of entry level RAM
SSDs.
This seemed too good to be true - especially as the price
difference in those days (between RAM and flash SSDs) was more than 20 to 1 for
the same capacity.
So I asked some leading SSD makers from both sides
of the fence to contribute short pieces for a very popular article called -
RAM SSDs versus
Flash SSDs. That article (published in August 2007) listed important
factual data about the state of the art. Data based on real products was
compared for RAM SSDs, flash SSDs and hard disks.
2 important points were clarified by that article:
1 - flash SSDs (generally) had much
worse write IOPS than read IOPS - and the way that some flash SSD vendors
quoted a "single" IOPS metric was confusing or misleading.
Following publication of this article StorageSearch.com adopted a new editorial practice of differentiating whether IOPS figures quoted in vendor press releases and datasheets were really "read", "write" or "combined" (based on an assumed R/W ratio). Before that - some vendors had misleadingly just quoted read IOPS (the best figure) without attaching the important "read" appellation.
2 - for those already experienced with IOPS in HDDs and
RAM SSDs who wanted to model and predict flash SSD performance - the key metric
to look at was "write IOPS".
The critical effect of write IOPS on overall application performance was analyzed in a sub-article - by Douglas Dumitru, CTO EasyCo - called Understanding Flash SSD Performance (pdf). In various tables Dumitru pointed out the sensitivity of overall performance to write IOPS for various R/W mixes encountered in real applications.
Back in those days - the ratio of read to write IOPS for commercially available flash SSDs could be anywhere in the region from 10-to-1 to 100-to-1 due to the inherent asymmetries in NAND flash memory.
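To see how sensitive a mixed workload is to that asymmetry, here is a back-of-envelope sketch (the device figures below are hypothetical, not taken from Dumitru's tables): the effective IOPS of a mixed workload is the harmonic blend of the read and write rates, so the slow write side dominates.

```python
# Hypothetical worked example (not from Dumitru's paper) of why write
# IOPS dominates a mixed workload. Device figures are invented - a
# 50-to-1 R/W asymmetry, within the range quoted above.

def effective_iops(read_iops, write_iops, read_fraction):
    """Harmonic blend: each operation contributes its own mean latency."""
    write_fraction = 1.0 - read_fraction
    return 1.0 / (read_fraction / read_iops + write_fraction / write_iops)

# 50,000 read IOPS, 1,000 write IOPS, a 70/30 R/W mix:
print(round(effective_iops(50_000, 1_000, 0.70)))  # ~3185 IOPS
# even a 90/10 mix stays far below the headline read figure:
print(round(effective_iops(50_000, 1_000, 0.90)))  # ~8475 IOPS
```

In other words - a device quoted at 50,000 (read) IOPS could deliver well under a tenth of that on an ordinary mixed workload, which is exactly why the "write" figure was the one worth reading.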
Quoting
"read IOPS" and "write IOPS" clears things up - but not for
long
For me as editor - that seemed to settle the issue for a
while. As long as stories related to enterprise flash SSDs always included
separate figures for read and write IOPS - (or pointed out when they were
missing) experienced readers could do useful mental filtering between different
types of products.
Personally I found it a useful shortcut to
ignore read IOPS data entirely and just focus on write IOPS to rank the
fastest products.
I assumed that the ratio of R/W IOPS - in the fastest flash SSDs - might still improve in ensuing years - but would hover in the range of 5 or 10 to 1. But I was wrong.
Little more than a year later (in November 2008) - Violin Memory narrowed that gap to 2 to 1 in a 2U SLC SSD which had over 200K random read IOPS and 100K random write IOPS. Around the same time (also November 2008) Fusion-io started talking about 1 to 1 R/W IOPS in one of its PCIe SSDs.
Soon afterwards this R/W IOPS symmetry made an appearance in 2.5" SSDs too - when, in April 2009, SandForce unveiled its SF-1000 family of SSD Processors with 30,000 R/W IOPS.
Where Are We Now?
You might think that - with R/W IOPS symmetry in the fastest flash SSDs - we can now forget all about the underlying stuff that happens inside them. Just plug in the numbers - compare to hard disk arrays or RAM SSDs - and the predicted application speedups will follow as expected.
But that would be wrong
too. Because - as users are discovering when they test these newer super flash
SSDs in their applications - the IOPS model gives good results in some
applications (like video servers) - but is extremely unreliable at predicting
speedups in other applications (like traditional database transaction
processing).
The way that random write IOPS test software is written can inflate the results for flash SSDs and produce numbers which exaggerate real performance.
For hard disks - writing to a range of addresses which are spatially scattered is a good stress test - because it forces head movements and rotational delays - and these latencies get included in the results. So random write IOPS for an HDD produces a lower numeric result than repeatedly writing to addresses in close spatial proximity.
For enterprise flash SSDs with small amounts of RAM cache the opposite is true. Writing to a range of addresses which are spatially scattered is a bad stress test - because each write effectively goes to an erased block in the storage pool. (Assuming active garbage collection and over provisioning.) The resultant latency "measured" by the IOPS test is simply the buffer or cache latency - and is much less than the time taken to complete a write-erase cycle to the flash media. In contrast to the situation with HDDs - writing repeatedly to addresses in close spatial proximity in skinny flash SSDs produces a worse result - which can be orders of magnitude worse than measured on the same device for "random IOPS".
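To make the contrast concrete, here is a minimal sketch of the two write patterns - assuming a plain 1 GiB test file, 4 KiB blocks and invented counts. It illustrates the access patterns only; a rigorous test would need O_DIRECT or a raw device so the OS page cache doesn't mask the flash behavior.

```python
import os, random, time

# Sketch of the two write patterns discussed above. The file name,
# sizes and counts are assumptions - not a rigorous benchmark (a real
# test needs O_DIRECT or a raw device to bypass the OS page cache).
BLOCK = 4096          # 4 KiB per write
SPAN = 1 << 30        # 1 GiB test region
N = 10_000            # writes per pattern

def rough_write_iops(fd, offsets):
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    for off in offsets:
        os.pwrite(fd, buf, off)
    os.fsync(fd)      # flush, so sustained flash latency is counted
    return len(offsets) / (time.perf_counter() - start)

fd = os.open("testfile.bin", os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, SPAN)

# "random IOPS" pattern - scattered writes land on pre-erased blocks
scattered = [random.randrange(SPAN // BLOCK) * BLOCK for _ in range(N)]
# "play it again Sam" pattern - hammering the same few flash blocks
localized = [random.randrange(256) * BLOCK for _ in range(N)]

print("scattered:", int(rough_write_iops(fd, scattered)), "IOPS")
print("localized:", int(rough_write_iops(fd, localized)), "IOPS")
os.close(fd)
```

On a skinny flash SSD the localized figure can come out dramatically lower than the scattered one - the reverse of what the same harness shows on an HDD.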
If repeat writes to the same small address range (same flash blocks) are interspersed with read operations (the play it again Sam scenario - which occurs in database tables) the performance outcome varies significantly between fat and skinny flash SSDs. A fat flash SSD may produce results which more closely follow the behavior seen in HDDs and RAM SSDs - as predicted by the write IOPS spec. But performance in most skinny flash SSD designs will collapse. Not only will it be orders of magnitude lower than expected from the write IOPS spec - but it may only be 3x better than an HDD - instead of the 100x or 1,000x predicted by the spec for the product.
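Extending the hypothetical sketch above (same assumed constants), the interleaved hot-range pattern could be timed the same way - hot_blocks and the R/W mix are again invented numbers:

```python
# Interleaved variant of the sketch above: reads mixed with repeat
# writes to the same small hot range - the database-table pattern.
def rough_mixed_iops(fd, hot_blocks=256, n=10_000, read_fraction=0.7):
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    for _ in range(n):
        off = random.randrange(hot_blocks) * BLOCK
        if random.random() < read_fraction:
            os.pread(fd, BLOCK, off)   # read back from the hot range
        else:
            os.pwrite(fd, buf, off)    # rewrite the same flash blocks
    os.fsync(fd)
    return n / (time.perf_counter() - start)
```

A fat flash SSD should track its write IOPS spec under this pattern; a skinny design can collapse, as described above.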
The paradox for the potential user is that 2 different flash SSD designs which have apparently identical throughput, latency and random IOPS specs in their datasheets - can behave in ways that differ by orders of magnitude in real applications.
The sensitivity is due to
the type of application and the implementation of the RAM cache algorithms in
the SSD. None of these differences are explicitly stated in benchmark
performance figures.
That's because traditional storage benchmarks
and test suites don't model the type of behavior which correlates with common
applications - and instead of stressing the weakest parts of the SSD design -
they play to the tune of the SSD designer who has focused on key metrics which
make the product look good in such benchmark tests.
Conclusions

Random R/W IOPS data for flash SSDs is a poor predictor of likely application performance in some common types of enterprise server applications. Users need to proceed cautiously when shortlisting SSD products based only on paper or benchmark evaluations. New types of SSD benchmark tests are needed which stress the weakest parts of flash SSD designs - instead of recycling storage I/O tests designed for HDDs or RAM SSDs.
For a unified overview of SSD architecture see - 11 key symmetries in SSD design.
Related articles:
- Can you trust flash SSD specs & benchmarks?
- the Importance of Write Performance to RDBMS Apps
- Calling for an End to Unrealistic SSD vs HDD IOPS Comparisons
- Legacy vs New Dynasty - the new way of looking at Enterprise SSDs
"Be wary of arguments for enterprise SSD adoption which cite IOPS per dollar (or the other way round) as a justification for filling a gap in some cleverly drawn curve. I've seen this kind of thing from leading SSD companies who should know better."
- Clarifying SSD Pricing - where does all the money go?
See also:
- RAM SSDs
- the Fastest SSDs
- the Top 20 SSD companies
- Are you ready to rethink RAM?
"getting 100% of 400K IOPS is better than getting 25% of 500K IOPS. What really matters is attainable performance and realizable IOPS at given load/latency."
- IOPS schmIOPS! (pdf) (2013)
See also:
- what changed in SSD year 2015?
- RAM Cache Ratios in flash SSDs
- how fast can your SSD run backwards?
- flash wars in the enterprise SSD market
- factors which influence and limit flash SSD performance
Zsolt, re your article: the Problem with Write IOPS - in flash SSDs

comments by: Ron Bianchini, CEO, Avere Systems
This article raises an excellent point regarding SSD performance, testing and, fundamentally, SSD design.
The first SSD storage devices were very transparent in their design,
allowing for very simple tests, like Random R/W, that could clearly highlight
the differences between vendors. Modern SSD storage devices are much more
complicated and include performance enhancements, like data caches, that can
hide any performance deficiencies of the underlying media under certain bounded
workloads.
At this point, the SSD storage device must be treated as a system,
rather than as a transparent single media type.
Performance is now a
function of the underlying storage media and the algorithms used to manage the
various media in the system (including any internal caches).
At Avere Systems, we faced this exact problem when trying to evaluate
flash SSD media for use in our Avere
FXT NAS server.
Without a uniform set of benchmarks, we needed to test multiple solutions to see
how they performed under load in our system. Only then could we make an
informed decision.
We also addressed this problem in terms of the product that we delivered to market. Our Avere FXT is a tiered NAS server that stores data in one of several tiers of media, two of which are SSD - one RAM and one flash based. Our decisions about which media tier holds the data are based on application access patterns.
Most of our competitors require that the storage administrator decide when different media tiers are used for different applications.
The administrators are required to set policies that direct data to the various media. Administrator-set policies are a losing proposition because performance is completely dependent on the workload at the time of the operation, primarily due to the complexity of the design of modern SSDs.
By characterizing the performance of the SSDs in our system and making the decision internally, we can use them optimally for the application workload and outperform our competition.
In summary, as the complexity of SSD storage devices increases, so must the testing strategies used to evaluate the components. For Avere, this meant a thorough evaluation and crafting our algorithms to leverage the best of various media.
What is needed for the more general purpose user is a uniform test, or set of tests, that users can use to accurately compare products from the various vendors under typical user load. Only then can end users understand how the various components will perform in their system.
| "One million IOPS.
Yawn! Is that all you've got? In fact, I would argue if the extent of your
marketing message is your IOPS you don't have enough marketing talent." |
| Woody Hutsell
- (after more than 10 years of marketing enterprise SSDs) in his
first SSD blog
(December 20, 2010) | | |
. |
| "Its a measure of
capability, but not necessarily timeliness. You could get 4,000 MB/s delivered
after a 500 millisecond delay." |
| Jeremiah Peschka -
in his blog -
IOPS Are A
Scam (September 11, 2013) | | |