Choosing the right Solid-State Flash Disk
Beware: choosing
the right F-SSD solution isn't that simple! There's much more to it than
just choosing the right storage interface, disk capacity and performance. NAND
flash technology has inherent limitations, and because mission-critical
applications require top reliability, an F-SSD must be accompanied by special
mechanisms to overcome these potential problems.
Flash is not directly re-writable, meaning that a data bit must
first be erased before it can be written again. Flash is erased in blocks (a
typical block size is 4 to 64 Kbytes), which are much larger than disk sectors
(512 bytes). In addition, flash has a limited number of erase cycles,
about 300,000 depending on the process. Read operations, by contrast, are
not limited.
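To make the block-versus-sector mismatch concrete, the short Python sketch
below works through the figures quoted above. The rewrite rate of 10 per
second is an illustrative assumption, not a figure from this article, and the
function names are hypothetical.

```python
SECTOR_BYTES = 512            # disk sector size quoted in the text
ERASE_CYCLE_LIMIT = 300_000   # approximate per-block erase limit quoted in the text


def sectors_per_block(block_bytes: int) -> int:
    """Number of 512-byte sectors that share a single erase block."""
    return block_bytes // SECTOR_BYTES


def hours_until_worn(rewrites_per_second: float) -> float:
    """Hours until one block hits the erase limit, assuming every rewrite of
    a sector in that block costs one full block erase and no wear leveling."""
    return ERASE_CYCLE_LIMIT / rewrites_per_second / 3600


for kbytes in (4, 64):
    print(kbytes, "Kbyte block holds", sectors_per_block(kbytes * 1024), "sectors")

# Assumed workload: the same sector is rewritten 10 times per second.
print("block worn out after about", round(hours_until_worn(10), 1), "hours")
```

Under these assumptions, updating one 512-byte sector drags along every other
sector in its erase block, and a single hot block could be exhausted within
hours rather than years.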
(Later: see also the March 2007 article SSD Myths and Legends - "write endurance".)
Flash manufacturers such as Toshiba and Samsung continue to
improve their process by using less silicon to reduce costs. NAND flash media
may accumulate up to 2% of bad blocks during the manufacturing
process and additional bad blocks during operational use. Flash
manufacturers guarantee less than 0.04% of bad blocks out of the total
available. The number of accumulated bad blocks also grows over time. Flash
accumulates bad blocks during write/erase operations
as electrons are captured in an oxide layer and create internal electrical
stress due to the voltage difference between different blocks inside the
flash chip. The probability of a stress failure caused by this voltage
difference increases when only one block is accessed by erase/write
operations over and over again while the rest of the blocks are untouched. In
other words, uneven erase/write activity among the flash blocks increases
the number of bad blocks accumulated.
Enhancing Solid-State Flash Disk Endurance
Some F-SSD manufacturers incorporate methods to enhance SSD
endurance. One of the most common techniques is known as "Erase
before write" or "Counter wear leveling". This method
implements a counter for every flash block, counting the number of erase/write
cycles. Every time a new write operation is executed, the block where the
data will be stored is erased first and its
counter is updated. When a counter reaches the
flash erase cycle limit, the block is marked as a bad block. As a
result the data in that block can still be read an unlimited number of
times, but the block can no longer be used for additional write
operations.
When the application tries to execute a new write
operation to a block which has already reached its erase cycle limit,
the new data is stored in a different block, taken from a pool of spare blocks,
and a pointer is assigned to the location of the new data. The "erase
before write" algorithm wears out the flash intensively over time,
especially as the whole erasable block is erased every time, even though
only part of the data in that block is being updated. If the application
forces write operations to the same disk location over and over again, those
blocks will eventually reach the erase cycle limit, and over time the total disk
capacity available for write operations will decrease. More advanced methods
of enhancing F-SSD endurance to overcome these flash limitations are known by
the names TrueFFS® (True Flash File System), "virtual
mapping", "Dynamic Wear Leveling" and "Garbage
Collection". M-Systems uses these techniques in its Fast Flash Disks
(FFD).
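The "erase before write" scheme described above can be sketched in a few
lines of Python. This is a minimal model under stated assumptions, not any
vendor's firmware: the class name, the tiny erase limit and the block counts
are all hypothetical and chosen only to keep the example short.

```python
class CounterWearDisk:
    """Toy model of counter wear leveling with a spare-block pool."""

    def __init__(self, data_blocks: int, spare_blocks: int, erase_limit: int):
        total = data_blocks + spare_blocks
        self.erase_limit = erase_limit
        self.erase_count = [0] * total       # one counter per flash block
        self.contents = [None] * total
        self.bad = set()                     # blocks that reached the limit
        self.pointer = {b: b for b in range(data_blocks)}  # logical -> physical
        self.spares = list(range(data_blocks, total))

    def write(self, logical_block: int, data) -> None:
        phys = self.pointer[logical_block]
        if phys in self.bad:
            # Block is worn out: redirect the write to a spare block.
            if not self.spares:
                raise OSError("no spare blocks left; location is read-only")
            phys = self.spares.pop(0)
            self.pointer[logical_block] = phys
        self.erase_count[phys] += 1          # whole block erased before write
        self.contents[phys] = data
        if self.erase_count[phys] >= self.erase_limit:
            self.bad.add(phys)               # still readable, never written again

    def read(self, logical_block: int):
        return self.contents[self.pointer[logical_block]]


disk = CounterWearDisk(data_blocks=2, spare_blocks=1, erase_limit=3)
for i in range(6):
    disk.write(0, f"v{i}")                   # hammer one logical location
print(disk.erase_count, sorted(disk.bad))
```

In this run logical block 0 consumes its own block and the only spare while
block 1 is never erased, which is the weakness of the scheme: wear is
concentrated instead of shared.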
The "Dynamic Wear Leveling" algorithm
guarantees that all flash components in the disk are used at the same level of
erase cycles. It eliminates situations where the application
repeatedly writes to the same physical location until the flash blocks there
wear out and the capacity available for write operations shrinks over time.
Because all flash blocks are erased the same number of times, the capacity
available for write operations is unchanged. The TrueFFS® Dynamic Wear
Leveling algorithm is performed by a dynamic virtual mapping of logical sectors
to physical blocks, transparently to the user's application.
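As an illustration of how virtual mapping can spread wear, here is a minimal
Python sketch. It is not the TrueFFS® implementation, whose internals are not
described in this article; it assumes a simple policy of always writing to the
free physical block with the fewest erases, and all names are hypothetical.

```python
class WearLeveledDisk:
    """Toy dynamic wear leveling: logical sectors map to changing blocks."""

    def __init__(self, physical_blocks: int):
        self.erase_count = [0] * physical_blocks
        self.contents = [None] * physical_blocks
        self.free = set(range(physical_blocks))
        self.mapping = {}                    # logical sector -> physical block

    def write(self, logical: int, data) -> None:
        if not self.free:
            raise OSError("no free physical block available")
        # Pick the least-erased free block (ties broken by block number).
        target = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(target)
        self.contents[target] = data
        old = self.mapping.get(logical)
        self.mapping[logical] = target       # remap, invisible to the caller
        if old is not None:
            # The previous copy is stale: erase it and return it to the pool.
            self.contents[old] = None
            self.erase_count[old] += 1
            self.free.add(old)

    def read(self, logical: int):
        return self.contents[self.mapping[logical]]


disk = WearLeveledDisk(physical_blocks=4)
for i in range(12):
    disk.write(0, f"v{i}")                   # same logical sector every time
print(disk.erase_count)
```

Even though the application writes to one logical sector only, the erases are
shared among all four physical blocks, so no single block races ahead to the
erase limit. A production design would also have to relocate rarely changed
data, which this sketch does not attempt.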
The "Garbage Collection" process eliminates
the need to erase the whole block prior to every write. It
accumulates data marked for erase as "Garbage"
and performs a whole-block erase as space reclamation in order to reuse the block.
Once a block reaches its limit of erase cycles, or indicates that a problem
has been found, the embedded
"Bad Block Mapping-Out" algorithm (BBM) marks the block as a "bad
block"; TrueFFS® does not use that block anymore and instead
replaces it with a spare block. Larger pools of spare blocks (reaching up to 4%
of the F-SSD capacity) are used to replace bad blocks and thereby increase
disk endurance. Together, the TrueFFS® "Dynamic Wear Leveling", "Garbage
Collection" and "Bad Block Mapping-Out" algorithms
optimize flash usage with a minimum of erase cycles, enhancing F-SSD endurance while
keeping the media size available for write operations without a decrease in capacity
over time.
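The saving from garbage collection can be illustrated with a small Python
sketch. It assumes a block holds four independently writable pages and that
only blocks consisting entirely of garbage are reclaimed; a real collector
would also relocate live data, and bad block mapping is left out. The names
and sizes are hypothetical.

```python
PAGES_PER_BLOCK = 4   # assumed for illustration


class GcFlash:
    """Toy flash that marks stale pages as garbage and erases lazily."""

    def __init__(self, blocks: int):
        self.state = [["free"] * PAGES_PER_BLOCK for _ in range(blocks)]
        self.data = [[None] * PAGES_PER_BLOCK for _ in range(blocks)]
        self.where = {}                      # logical sector -> (block, page)
        self.erases = 0

    def _find_free(self):
        for b, pages in enumerate(self.state):
            for p, s in enumerate(pages):
                if s == "free":
                    return b, p
        return None

    def _collect(self) -> None:
        # Reclaim every block that holds garbage and no live data.
        for b, pages in enumerate(self.state):
            if "garbage" in pages and "live" not in pages:
                self.state[b] = ["free"] * PAGES_PER_BLOCK
                self.data[b] = [None] * PAGES_PER_BLOCK
                self.erases += 1

    def write(self, sector: int, value) -> None:
        slot = self._find_free()
        if slot is None:
            self._collect()                  # erase only when space runs out
            slot = self._find_free()
            if slot is None:
                raise OSError("flash full: no reclaimable block")
        b, p = slot
        self.data[b][p] = value
        self.state[b][p] = "live"
        old = self.where.get(sector)
        self.where[sector] = slot
        if old is not None:
            self.state[old[0]][old[1]] = "garbage"   # stale copy, erase later

    def read(self, sector: int):
        b, p = self.where[sector]
        return self.data[b][p]


flash = GcFlash(blocks=2)
for i in range(12):
    flash.write(0, i)
print("block erases for 12 writes:", flash.erases)
```

Twelve updates of the same sector cost a single block erase in this model,
where an erase-before-write design would have spent twelve.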
Improving Solid-State Flash Disk reliability
Some F-SSD manufacturers use DRAM/SRAM data buffers in
their design to increase disk performance. As DRAM/SRAM is volatile memory,
powering down while the disk is being written, or while data resides in the
cache, may cause an incomplete write sequence. (Although that risk can be
eliminated in a good system design - editor.) A DRAM/SRAM cache buffer also
causes disk performance to decline when the cache buffer is full (during write
operations) and when the data does not reside in the cache (during read
operations). An F-SSD without volatile data caching is more reliable
under unstable power conditions and provides
sustained R/W rates that do not depend on cache status.
Power cycling
may cause data corruption even if no volatile data caching is used.
It is important to verify that the F-SSD mechanism does not tolerate "in-between"
states of data, in which a power failure interrupted a write so that only part
of the data was written to the disk, but the disk mapping indicates
otherwise. Data
reliability can be preserved during unstable power conditions using the
following scheme. During a disk write operation, the disk controller should
verify that the new data has been stored, by a transparent internal read
operation, and only then update the flash mapping information. If
the block was not completely transferred to the disk media during the write
operation then the F-SSD must not update the mapping as "block successfully
transferred". The mapping must reflect the correct status of
the disk at all times.
Error Detection Code (EDC) and Error
Correction Code (ECC) are used in mechanical disks and SSDs to detect and
correct errors that occur during read/write operations. In general the EDC/ECC
algorithms used by mechanical disks are more powerful than the ones
required by SSDs, as the probability of error with magnetic media is much
greater than with flash media due to their design. One of the most common
algorithms incorporated in F-SSDs for EDC/ECC is
Reed-Solomon, implemented in hardware and software using 24 bits/sector,
32 bits/sector or 48 bits/sector. For example, a 48-bit Reed-Solomon algorithm
provides a read bit error rate of 10^-14.
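The verify-then-update scheme for surviving power failures, described earlier
in this section, can be sketched as follows. This is a simplified Python model
rather than controller firmware: the media is a dictionary, and the two
`program` functions are hypothetical stand-ins for a completed write and for a
write cut short by a power failure.

```python
class VerifiedWriter:
    """Commit a logical-to-physical mapping only after read-back succeeds."""

    def __init__(self):
        self.media = {}      # physical block -> bytes actually stored
        self.mapping = {}    # logical block -> physical block (verified only)

    def write(self, logical, physical, data, program) -> bool:
        program(self.media, physical, data)       # may be interrupted
        if self.media.get(physical) != data:      # internal read-back check
            return False                          # mapping left untouched
        self.mapping[logical] = physical          # commit only after verify
        return True

    def read(self, logical):
        return self.media[self.mapping[logical]]


def full_program(media, physical, data):
    """Write completes normally."""
    media[physical] = data


def torn_program(media, physical, data):
    """Simulated power failure: only the first half reaches the media."""
    media[physical] = data[: len(data) // 2]


w = VerifiedWriter()
w.write(7, 0, b"old data", full_program)
ok = w.write(7, 1, b"new data", torn_program)
print(ok, w.read(7))
```

Because the mapping is updated last, the interrupted write leaves logical
block 7 pointing at the previous, intact copy instead of at a half-written
block.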
(Editor's note:- Since this
article was written, one manufacturer,
AMD, has also announced
error correction at the raw flash bit level. In AMD's MirrorBit cell, code or
data is stored in two discrete and independent locations. By physically
separating each bit and maintaining its individual integrity, AMD's MirrorBit
devices are inherently more stable and reliable than competing multi-level cell
(MLC) devices. This is transparent to any overlying error detection scheme.)
Some F-SSD manufacturers provide hybrid designs
by incorporating several units of Compact Flash (CF) or PC-Cards
(PCMCIA ATA Cards, PCC) to compose a flash solid-state disk. The hybrid design is
less expensive as it enables the manufacturers to use the most common CF and PCC
units available in the marketplace at very attractive prices. Some hybrid
F-SSD designs may cause reliability problems under shock and vibration
conditions, as the hybrid F-SSD is a LEGO type product based on several
sub-units of CF/PCC. As CF and PCC support the ATA/IDE protocol, a SCSI F-SSD
based on CF/PCC sub-units faces the need for protocol
conversion. This IDE to
SCSI conversion may lower disk performance.
As F-SSDs are designed to operate in mission-critical systems
for many years, and taking out the disk for status checking is unacceptable,
remote monitoring of the internal status is needed. One example of a remote
monitoring feature is "SMART" (Self-Monitoring, Analysis and
Reporting Technology). When the "SMART" software command
is activated, the disk performs internal monitoring tests and reports back the latest
results, indicating the status of the disk. The "SMART" command is
common in mechanical ATA/IDE disks, where it tests, among other things, the
mechanical disk rotation. As an F-SSD does not have moving parts, but
does accumulate bad blocks over time, some F-SSD manufacturers have used
the "SMART" command to analyse the F-SSD bad block status. The total
number of bad blocks that have accumulated since the F-SSD was
manufactured, relative to the total disk capacity, can be returned as status
information to the user. Tracking the accumulation of
bad blocks
over time gives the user an indication of the F-SSD's reliability and
expected life span in that system.
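How a host might turn reported bad-block counts into a life span indication
can be sketched in Python. The article does not specify the format of the
status data, so the sample list, the function names and the straight-line
extrapolation below are all assumptions; the sketch also assumes that every
bad block consumes exactly one spare block.

```python
def bad_block_percent(bad_blocks: int, total_blocks: int) -> float:
    """Bad blocks accumulated since manufacture, relative to total capacity."""
    return 100.0 * bad_blocks / total_blocks


def days_until_spares_exhausted(samples, spare_blocks: int):
    """Estimate the days left before the spare pool runs out.

    samples: list of (day, cumulative_bad_blocks) readings, oldest first.
    Returns None when the readings show no growth to extrapolate from.
    """
    (d0, b0), (d1, b1) = samples[0], samples[-1]
    if d1 == d0 or b1 <= b0:
        return None
    remaining = max(spare_blocks - b1, 0)
    # Linear extrapolation: remaining spares divided by bad blocks per day.
    return remaining * (d1 - d0) / (b1 - b0)


readings = [(0, 10), (100, 20)]              # hypothetical status readings
print(bad_block_percent(20, 1000), "% of blocks are bad")
print(days_until_spares_exhausted(readings, spare_blocks=60), "days of spares left")
```

A straight-line estimate is only a first approximation, since wear-related
failures tend to cluster as blocks approach their erase limit.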
Summary
Mechanical disks continue to be the
bottleneck in total system reliability under harsh environmental conditions.
Solid-state flash disks are designed as a true "drop-in replacement"
data storage solution for mission-critical applications. Solid-state flash
disks provide data integrity under harsh environmental conditions of extreme
shock, vibration and humidity, operating over the industrial temperature range.
Although the process of ruggedizing mechanical disks can improve their
environmental figures, this is at the expense of larger unit size, excessive
weight and increased manufacturing cost. On the other hand, solid-state flash
disk technology provides a smaller casing with minimal power consumption, and its
Fast/Quick security erase feature enables a solid-state flash disk to erase the
entire disk in 5 seconds. Cost is no longer a major barrier to using F-SSDs in
mission-critical applications, as flash prices have declined dramatically in the
last two years and experts expect the decline to continue. Higher
capacities and faster performance will also come from process enhancements
which reduce geometries, as with traditional RAM technologies.
But it's
the superior reliability which makes the solid-state flash disk today
an ideal solution for ground, sea and air military applications.