- RAID
Posted by : Unknown
Friday, July 26, 2013
1. Introduction
2. About RAID
3. Why is RAID important
INTRODUCTION
RAID is an acronym for Redundant Array of Independent Disks:
Redundant means that part of the disks' storage capacity is used to store check data that can be used to recover user data if a disk containing it should fail.
Array means that a collection of disks is managed by control software that presents their capacity to applications as a set of coordinated virtual disks. In host-based arrays, the control software runs in a host computer. In controller-based arrays, the control software runs in a disk controller.
Independent means that the disks are perfectly normal disks that could function independently of each other.
Disks means that the storage devices comprising the array are on-line storage. In particular, unlike most tapes, disk write operations specify precisely which blocks are to be written, so that a write operation can be repeated if it fails.
ABOUT RAID
The basic idea of RAID was to combine multiple small, inexpensive disk
drives into an array of disk drives which yields performance exceeding that of
a Single Large Expensive Drive (SLED). Additionally, this array
of drives appears to the computer as a single logical storage unit or
drive.
The Mean Time Between Failures (MTBF) of the array will be approximately equal to the
MTBF of an individual drive, divided by the number of drives in the array, assuming the drives fail independently.
Because of this, the MTBF of an array of drives would be too low for many
application requirements. However, disk arrays can be made fault-tolerant by
redundantly storing information in various ways. Five types of array
architectures, RAID-1 through RAID-5, were defined by the Berkeley paper, each
providing disk fault-tolerance and each offering different trade-offs in
features and performance. In addition to these five redundant array
architectures, it has become popular to refer to a non-redundant array of disk
drives as a RAID-0 array.
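The MTBF rule of thumb above is easy to check with a few lines of arithmetic. The sketch below is only an illustration: the function name and the 500,000-hour drive figure are assumptions chosen for the example, and the formula assumes drives fail independently and at the same rate.

```python
HOURS_PER_YEAR = 24 * 365


def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """MTBF of a non-redundant array: drive MTBF divided by the number of drives."""
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return drive_mtbf_hours / drive_count


# Illustrative drive MTBF (an assumption, not a figure for any real drive).
DRIVE_MTBF_HOURS = 500_000

for drives in (1, 4, 8, 16):
    mtbf = array_mtbf_hours(DRIVE_MTBF_HOURS, drives)
    print(f"{drives:2d} drives: {mtbf:,.0f} hours (about {mtbf / HOURS_PER_YEAR:.1f} years)")
```

The more drives a non-redundant array contains, the sooner some drive in it is expected to fail, which is why redundancy is needed.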
Why Is RAID Important?
As the storage industry becomes increasingly independent of
the computer system industry, storage alternatives are becoming more complex.
System administrators, as well as managers who make storage purchase and
configuration decisions, need to understand on-line storage alternatives.
Awareness of what RAID can and cannot do for them helps managers make informed
decisions about on-line storage alternatives. Users of networked personal
computers may also be concerned about the quality of the storage service
provided by their data servers.
Why use RAID?
Typically, RAID is used in large file servers and
transaction or application servers, where data accessibility is critical and
fault tolerance is required. Today, RAID is also being used in desktop systems
for CAD, multimedia editing and playback, where higher transfer rates are
needed.
Disk Striping
Fundamental to RAID technology is striping.
This is a method of combining multiple drives into
one logical storage unit. Striping partitions the storage space of each drive
into stripes, which can be as small as one sector (512 bytes) or as large as several
megabytes. These stripes are then interleaved in a rotating sequence, so that
the combined space is composed alternately of stripes from each drive. The
specific type of operating environment determines whether large or small
stripes should be used. Most operating systems today support concurrent disk I/O operations across
multiple drives. However, in order to maximize throughput for the disk
subsystem, the I/O load must be balanced across all the drives so that each
drive can be kept busy as much as possible. In a multiple drive system without
striping, the disk I/O load is never perfectly balanced. Some drives will
contain data files that are frequently accessed and some drives will rarely be
accessed.
By striping the drives in the array with stripes large
enough so that each record falls entirely within one stripe, most records can
be evenly distributed across all drives. This keeps all drives in the array
busy during heavy load situations, allowing all drives to work concurrently
on different I/O operations and thus maximizing the number of simultaneous I/O
operations that can be performed by the array.
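The interleaving described above can be sketched as a simple address calculation. This is a minimal illustration, not how any particular controller lays out data: the function name and parameters are hypothetical, and the stripe size is expressed in blocks.

```python
def locate_block(logical_block: int, drive_count: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, block offset on that drive).

    Stripes are dealt out to the drives in a rotating sequence, so
    consecutive stripes land on consecutive drives.
    """
    stripe_number, offset_in_stripe = divmod(logical_block, stripe_blocks)
    drive = stripe_number % drive_count   # which drive holds this stripe
    row = stripe_number // drive_count    # how many stripes precede it on that drive
    return drive, row * stripe_blocks + offset_in_stripe


# Three drives, four blocks per stripe: show where the first 16 blocks land.
for block in range(16):
    drive, offset = locate_block(block, drive_count=3, stripe_blocks=4)
    print(f"logical block {block:2d} -> drive {drive}, offset {offset}")
```

With a large stripe, a whole record tends to fall on one drive, so different requests hit different drives. With a small stripe, one long record is spread across all drives.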
Definition of RAID Levels
RAID 0
RAID 0 is typically defined as a group of striped disk drives without parity or data
redundancy. RAID 0 arrays can be configured with large stripes for multi-user
environments or small stripes for single-user systems that access long
sequential records. RAID 0 arrays deliver the best data storage efficiency and
performance of any array type. The disadvantage is that if one drive in a RAID
0 array fails, the entire array fails.
RAID 1
RAID 1, also known as disk mirroring, is simply a pair of disk drives that store duplicate data but appear to the computer as a single drive. Although striping is not used within a single mirrored drive pair, multiple RAID 1 arrays can be striped together to create a single large array consisting of pairs of mirrored drives. All writes must go to both drives of a mirrored pair so that the information on the drives is kept identical. However, each individual drive can perform simultaneous, independent read operations. Mirroring can thus as much as double the read performance of a single non-mirrored drive, while the write performance is unchanged. RAID 1 delivers the best performance of any redundant array type. In addition, there is less performance degradation during drive failure than in RAID 5 arrays.
RAID 2
RAID 2 arrays
sector-stripe data across groups of drives, with some
drives assigned to store ECC information. Because all disk
drives today embed ECC information within each sector, RAID 2 offers no
significant advantages over other RAID architectures and is not supported by
Adaptec RAID controllers.
RAID 3
RAID 3, as with RAID 2, sector-stripes data across groups of drives, but one drive in the group is dedicated to storing parity information. RAID 3 relies on the embedded ECC in each sector for error detection. In the case of drive failure, data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the remaining drives. Records typically span all drives, which optimizes the disk transfer rate. Because each I/O request accesses every drive in the array, RAID 3 arrays can satisfy only one I/O request at a time. RAID 3 delivers the best performance for single-user, single-tasking environments with long records. Synchronized-spindle drives are required for RAID 3 arrays in order to avoid performance degradation with short records. Because RAID 5 arrays with small stripes can yield similar performance to RAID 3 arrays, RAID 3 is not supported by Adaptec RAID controllers.
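The XOR recovery described for RAID 3 can be shown in a few lines. This is a minimal sketch: short byte strings stand in for the blocks on each drive, and the helper name and sample data are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives plus one parity drive (sample contents are arbitrary).
data = [b"RAID", b"disk", b"xor!"]
parity = xor_blocks(data)

# Suppose the second data drive fails: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

The same XOR produces the parity block and rebuilds a lost block. A single parity drive can therefore cover the loss of any one drive, but not two.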
RAID 4
RAID 4 is
identical to RAID 3 except that large stripes are used, so that records can be
read from any individual drive in the array (except the parity drive). This
allows read operations to be overlapped. However, since all write operations
must update the parity drive, they cannot be overlapped. This architecture
offers no significant advantages over other RAID levels and is not supported by
Adaptec RAID controllers.
RAID 5
RAID 5, sometimes called a Rotating Parity Array, avoids the write bottleneck caused by the single dedicated parity drive of RAID 4. Under RAID 5, parity information is distributed across all the drives. Since there is no dedicated parity drive, all drives contain data and read operations can be overlapped on every drive in the array. Write operations will typically access one data drive and one parity drive. However, because different records store their parity on different drives, write operations can usually be overlapped.
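The two ideas in this section, rotating parity and a write that touches one data drive and one parity drive, can be sketched as follows. The rotation shown is just one simple scheme chosen for illustration, since real implementations use various layouts, and both function names are hypothetical.

```python
def parity_drive(stripe_row: int, drive_count: int) -> int:
    """Drive holding the parity for a stripe row (one simple rotation; layouts vary)."""
    return (drive_count - 1 - stripe_row) % drive_count


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after one data block changes.

    XOR-ing out the old data and XOR-ing in the new data avoids
    reading the other data drives in the stripe.
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Parity moves to a different drive on each row of a four-drive array.
for row in range(5):
    print(f"stripe row {row}: parity on drive {parity_drive(row, 4)}")
```

Because consecutive rows keep their parity on different drives, two writes to different rows usually involve different pairs of drives and can proceed at the same time.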
STANDARD RAID TYPES: THEIR ADVANTAGES AND DISADVANTAGES
RAID 0: Also known as 'striping', this is technically not a RAID level since it
provides no fault tolerance. Data is written in blocks across multiple drives,
so one drive can be writing (or reading) a block while the next is seeking the
next block. The advantages of striping are the higher access rate and full
utilization of the array capacity. The disadvantage is that there is no fault
tolerance: if one drive fails, the entire contents of the array become
inaccessible.
RAID 1: Mirroring provides redundancy by writing twice - once to each drive. If
one drive fails, the other contains an exact duplicate of the data and the
controller can switch to using the mirror drive with no lapse in user
accessibility. The disadvantages of mirroring are no improvement in data access
speed, and higher cost, since twice the number of drives is required (50%
capacity utilization).
RAID 3: RAID level 3 stripes data across multiple drives, with an additional
drive dedicated to parity, for error correction/recovery. RAID 3 is not found
on all controllers.
RAID 5: RAID level 5 is the most popular configuration, providing striping as
well
Which RAID level should I use?
The PAC (Performance Availability Capacity) strategy is one method of
assessing which RAID level is most appropriate. Performance is how quickly the
data can be accessed. Availability refers to fault tolerance (if a drive fails
the data is still available). Capacity refers to how efficient the data storage
is (how many drives are required for a given array size).
RAID 0 has the best performance and capacity, but the lowest availability (no
fault tolerance). If one drive fails, the entire array fails because part of
the data is missing with no way to recover it other than restoring from a
backup.
RAID 1 has the highest availability but lowest capacity, since twice the
number of drives is required. Performance is roughly the same as for a single
drive, although in some instances the dual write may be somewhat slower.
RAID 0+1 offers some performance improvements by striping,
then mirroring the striped array, but capacity is low since the mirror requires
a duplicate set of drives.
RAID 5 has moderate benefits in all three areas, so it ranks roughly in the
middle. Read performance can be as fast as RAID 0, but write performance is
slower, since the parity information must be calculated and written along with
the data. Capacity is higher than for RAID 1 but not as high as with striping,
since the array uses additional space for the parity information. Availability
is high with RAID 5 because of the fault tolerance - if a drive fails, the
missing data is recalculated from the remaining operational
drives.
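The capacity part of the PAC comparison above comes down to simple arithmetic. The sketch below assumes equal-sized drives and uses hypothetical function names. It counts how many drives' worth of space is left for user data at each level discussed here.

```python
def usable_drives(level: str, drive_count: int) -> float:
    """Number of drives' worth of capacity available for user data."""
    if level == "0":
        return float(drive_count)           # striping only: nothing held back
    if level in ("1", "0+1"):
        if drive_count < 2 or drive_count % 2:
            raise ValueError("mirroring needs an even number of drives")
        return drive_count / 2              # every drive has a duplicate
    if level == "5":
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return float(drive_count - 1)       # one drive's worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


def capacity_utilization(level: str, drive_count: int) -> float:
    """Fraction of raw capacity available for user data."""
    return usable_drives(level, drive_count) / drive_count


for level, drives in (("0", 4), ("1", 2), ("0+1", 4), ("5", 4), ("5", 8)):
    print(f"RAID {level} with {drives} drives: {capacity_utilization(level, drives):.0%} usable")
```

RAID 5 utilization improves as drives are added, because the parity overhead stays at one drive's worth regardless of array size.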
RAS (Reliability, Availability, Serviceability)
RAS Definitions
Let's examine the three main considerations for evaluating
a RAID storage solution from a data availability standpoint: reliability,
availability, and serviceability.
Reliability
Reliability describes when, or how often, you can expect the item in question to fail. It is typically expressed as Mean Time Between Failures (MTBF), a metric used to quantify hardware component failures that follow an exponential failure distribution. For instance, disk drive manufacturers claim MTBFs of 300,000 to 800,000 or more hours. Those disk drive MTBFs sound good, but that's only part of the picture. What is stated on a specification sheet may represent the average of the population, not your drive in particular. Your drive's environment may not be optimal, either because a fan in the server packaging is not running optimally, or because your system experiences a power surge that cripples your disk drive. The manufacturer may specify theoretical, not operational, MTBFs, where theoretical MTBF specifications are derived from mathematical models of empirical field data of the individual drive components.
Theoretical MTBFs do not account for failures due to drive infancy, manufacturing-induced defects, drive returns in which the failure cannot be repeated (i.e., NTFs - No Trouble Founds), and damage due to improper handling.
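To put an MTBF figure in perspective, it helps to turn it into a probability of failure over a period of service. The sketch below assumes the exponential failure model mentioned above. The function name and the sample MTBF values are illustrative only, and real drives may behave differently for the reasons just listed.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_probability(hours: float, mtbf_hours: float) -> float:
    """Probability that a component fails within `hours`, under an exponential model."""
    return 1.0 - math.exp(-hours / mtbf_hours)


for mtbf in (300_000, 500_000, 800_000):
    p = failure_probability(HOURS_PER_YEAR, mtbf)
    print(f"MTBF {mtbf:,} hours: {p:.2%} chance of failure within one year")
```

Even a small per-drive probability adds up across the many drives in a server room, which is the practical reason to plan for failures.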
Your disk drives or other components of your system will eventually fail. If a critical drive fails, such as a boot drive or a drive containing payroll information, your entire organization may be affected. Software is just as likely to fail as hardware. So, how do you protect your critical data?
Implementing RAID technology, either software or hardware-based, is a
logical first step in protecting your data from disk drive failures. RAID technology should be deployed on any
server or workstation where the cost of lost data or downtime warrants it. But that's just the beginning. There are other availability and
serviceability features that you should examine to determine the optimum RAID
solution for your environment.
Availability
Data availability is defined as having your data accessible
at all times. There are two components
to data availability: data integrity and fault tolerance.
Data integrity: Data integrity means getting the correct data, every time. Most RAID solutions offer dynamic sector repair, where sectors that are defective due to soft media errors are repaired on the fly. The real differentiating factor is the amount of error correction and error detection code provided. Software-based RAID typically relies on a standard SCSI bus for data integrity protection, where it can detect 1-bit errors but has no ability to correct any errors. Hardware-based RAID solutions usually contain more robust code. For instance, Adaptec's hardware-based RAID solutions not only detect 4-bit errors, but also correct 1-bit errors on the entire data path from the storage media to the host system bus.
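The difference between detecting and correcting errors can be illustrated with a single parity bit per byte. This is a generic sketch of the idea, not a description of any particular bus or controller, and the function names are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Odd-parity bit: chosen so the byte plus the bit contain an odd number of 1s."""
    return 1 - (bin(byte).count("1") % 2)


def passes_check(received_byte: int, received_parity: int) -> bool:
    """True if the received byte is consistent with its parity bit."""
    return parity_bit(received_byte) == received_parity


sent = 0b10110010
check = parity_bit(sent)

one_bit_error = sent ^ 0b00000001   # a single bit flipped in transit
two_bit_error = sent ^ 0b00000011   # two bits flipped in transit

print(passes_check(sent, check))            # intact data passes
print(passes_check(one_bit_error, check))   # single-bit error is caught
print(passes_check(two_bit_error, check))   # double-bit error slips through
```

A failed check only says that some bit is wrong, not which one, so the data cannot be repaired from the parity bit alone. Correcting errors requires a stronger code with more check bits.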
Fault tolerance: Fault tolerance is defined as maintaining data availability in the event of one or more failures in the system. The most common method of achieving fault tolerance on servers and workstations today is RAID technology. Each RAID level offers different tradeoffs on performance, cost, and availability, and as such, it may be appropriate to use different RAID levels for different applications - even on the same server or workstation. RAID 0 (i.e., striping) should only be used in high performance applications that can afford downtime and/or lost data. Critical files in which an outage would severely cripple business activities, such as boot drives, would best be protected using RAID 1 (i.e., mirroring), or for even better performance, RAID 0/1 (mirrored striping). Most applications can best be protected by RAID 5 (striped parity), which offers the best balance between performance, cost and availability.
Drive hot swap is defined as the ability to pull out and replace a drive while the system is running and data is being accessed. With warm swap, you must first pause activity on the SCSI bus before removing the drive. More sophisticated hardware-based RAID solutions also offer the option of either dedicating spares to each array, or using a pool of spares for all arrays to draw on. Using dedicated spares on the most critical applications eliminates contention for spares in the event of multiple drive failures. Pooling spares is a more cost-effective method of data availability that is appropriate for less critical applications.
The next level of fault tolerance protection is to add redundancy to non-disk drive components. The downside is that it significantly increases the cost of your configuration. Typical areas to add redundancy include packaging, such as extra fans, dual I/O paths from the server to the disk drive (i.e., redundant controllers), and multiple servers. Using an Uninterruptible Power Supply (UPS) is also a good
idea. Some software-based RAID solutions support disk duplexing, a form of
mirroring (RAID 1) using redundant controllers where each disk drive is
attached to a separate controller, thereby eliminating the controller as the
single point of failure. The hardware-based RAID equivalent solution is called
active-active controller failover, available only on more expensive, high-end
external RAID controllers such as Data General's CLARiiON series.
Server redundancy is most cost-effectively achieved through
clustering, such as that offered on Microsoft's Windows NT Server 4.0
Enterprise Edition or many Unix and mainframe computer systems. With clustering,
multiple servers access the same storage. In the event of a server failure,
data on the disk drives can still be accessed using other servers in the
cluster. Hardware-based external RAID
controllers are typically used to provide RAID protection for the disk drives
in a clustered environment. In very high-end mission-critical applications,
remote mirroring (RAID 1) software such as that offered by Compaq/Digital's
OpenVMS is employed to mirror data to a remote site for disaster protection. Such configurations are very expensive,
because the entire server configuration is duplicated at an offsite location.
Serviceability
Serviceability describes how quickly and easily, in the event of a failure, you can detect and isolate the failure, repair or replace the failed component, and reset the application or operating system. Serviceability also includes preventive maintenance features that help you monitor and replace marginal components before they fail. S.M.A.R.T. and SAF-TE are two standards that have emerged in recent years that should be employed on any serious RAID implementation.
Configurations supporting the S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) standard monitor disk drives and report any out-of-threshold conditions that may signify a potential failure to the array card or server management software, permitting you to replace the drive before it fails.
Configurations supporting SAF-TE (SCSI Accessed
Fault-Tolerant Enclosure) monitor and report enclosure conditions to array or
server management software, assisting in alerting and isolating
enclosure-related failures. In either case, you need to check not only that the
disk drives are S.M.A.R.T.-compliant or the enclosure is SAF-TE-compliant, but also
that the RAID card's management software and operating system support these
standards. Many software- and
hardware-based RAID solutions support S.M.A.R.T. and SAF-TE. However, just as there are many different
vendor implementations of SCSI drives, there are many different implementations
of SAF-TE enclosures, all of which need to be tested for compatibility to
ensure that enclosure-related events are properly reported and interpreted by
the card and RAID management software. With Microsoft NT software-based RAID,
drive and enclosure events are reported via SNMP to the general management log,
a log that contains storage- as well as server- and network-related
events. The system manager can then
employ a filter to view only storage-related events. Each storage installation can only be
monitored locally on each server, so the system manager must physically
"make the rounds" to monitor each RAID installation. Many
hardware-based RAID solutions offer RAID management software specifically
designed not only to configure and manage RAID arrays but also to report
storage-related events.
The more sophisticated of these RAID management software
packages categorize errors and events by severity, such as color-coded alerts
highlighted in yellow for a potential problem and red for an actual component
failure. Some even e-mail, fax or page the system manager in the event of
alerts requiring immediate attention, greatly increasing the system manager's
ability to detect problems and decrease the time it takes to bring the storage
subsystem back up to full operational capability. Others allow you to manage, monitor and in
some instances repair all hardware-based RAID installations from a single
station, even remotely.
MANAGEMENT OF DATA
With the explosion of on-line data, the cost of managing that data has escalated as well. For every dollar spent on initial storage purchase, various estimates calculate that another $5 to $7 is spent managing the storage. These figures include the cost of installing, configuring, monitoring, and optimizing the on-line storage for performance, as well as backing up, restoring, and archiving the data. For smaller businesses and IT sites that can't afford a dedicated or sophisticated IT staff but need to protect their valuable data, storage management ease of use is of paramount importance. Let's examine the manageability issue from two aspects: how easy it is to install and configure a software- or hardware-based RAID, and how easy it is to monitor and proactively manage the RAID installation.
Configuring RAID
There can be significant differences between RAID solutions
in both the ease of configuring arrays and the degree to which you can tune your
arrays for optimum performance or functionality.
Does the solution offer a streamlined configuration "wizard"
that uses default settings to help first-time users to get up and running
quickly? For more sophisticated users,
advanced features like variable stripe depths, spare
allocation (either dedicated or global) and setting drive reconstruction
priorities (low/medium/high) become important differentiators. Windows NT RAID
software uses one stripe depth - 64 kB, based on research that concludes most
applications achieve optimal performance with stripe depths between 64 kB and
128 kB. Windows NT does not offer spare allocation because failed drives are
manually replaced by the system manager.
In contrast, hardware-based RAID solutions typically offer a variety of
stripe depth
options, such as 8, 16, 32, 64 and 128 kB. More sophisticated hardware solutions also
offer spare allocation and priority settings on drive reconstruction.
Managing RAID
As discussed in the previous section on serviceability,
some of the key differences in managing software- and hardware-based RAID
solutions center on the ease of identifying and reporting errors. Hardware-based solutions typically offer more
sophisticated management software
features such as alerts color-coded by severity, e-mail,
fax or pager notification of errors, and remote management of multiple RAID
installations. But this is just the beginning.
Graphical user interfaces (GUIs) that employ a Windows-like look and
feel with pop-down menus, property tabs, physical and logical views in
drill-down Windows® Explorer-type tree structures, and detailed views can make
a huge difference in the ease of
managing your storage. Not all RAID solutions offer GUIs. Unlike software-based solutions,
hardware-based RAID solutions allow monitoring and management of RAID
configurations on multiple operating
systems such as Windows NT and Novell Netware. The ability of hardware-based solutions to
remotely manage RAID storage means that you can initialize new arrays and
reactivate offline arrays without ever leaving your desk. More sophisticated
hardware-based RAID management implementations support not only preventive maintenance
activities such as monitoring card, drive and enclosure fan and temperature
status, but also testing hot spares, verifying parity information, and
reconstructing the information on a failed drive. Some even allow you to schedule these
activities, thus eliminating the need for manual intervention and minimizing
impact on server performance. Another
distinguishing feature among
RAID management implementations is the ability to poll
servers, networks and non-RAID configurations, so that downtime conditions are
more quickly detected and isolated.
Performance Considerations
Running benchmarks in a controlled environment, such as the Ziff-Davis Winbench® 97 benchmark
whose results are contained in the AAA®-131CA PCI Array Card Series report, is a useful
method for comparing performance. This report concludes that Adaptec's
hardware-based RAID solutions demonstrate a consistent performance advantage
over NT software-based RAID. Certain
applications such as NASTRAN, Adobe AfterEffects, Adobe Photoshop and AutoCAD
may see a significant performance improvement due to card-based caching used on
AAA-131CA workstation cards because of more efficient cache flush operations,
reduced disk drive head thrashing, fewer cache misses, and more writes at
memory rather than disk speeds. For a
more comprehensive discussion on the benefits of card-based caching, see
Performance Benefits of a Caching RAID Coprocessor in PC NT Workstations. But
just as your car mileage may vary from EPA mileage ratings, performance on your
RAID storage will vary based on your system configuration and application
environment. Whether the performance differences are enough to warrant
selecting a hardware-based solution is the tricky part. However, since most applications can be
characterized as being CPU-bound, I/O-bound, or a mixture of both, an empirical
discussion of software- and hardware-based RAID solutions may be helpful in
determining which solution is best for you.
CPU Bound Applications
The argument in favor of hardware-based RAID in CPU-bound
applications is a straightforward one.
Offloading RAID 5 parity calculations and RAID 1 secondary writes to a
separate hardware-based RAID co-processor reduces CPU interrupts, freeing the
main CPU to perform other compute-intensive functions. I/O traffic on the main PCI bus is reduced,
so that other activities such as network traffic can be processed more
efficiently. The performance advantage of hardware-based RAID is especially
pronounced when RAID 5 data sets are operating in degraded mode (i.e., a drive
in the array has failed), because both read and write requests require
parity calculations, significantly increasing CPU interrupts and I/O traffic.
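The extra work in degraded mode can be made concrete by counting drive accesses for a single-block read. This is a simplified sketch with a hypothetical function name. It ignores caching and assumes the block lives on exactly one drive of a RAID 5 stripe.

```python
from typing import Optional


def drives_read(drive_count: int, target_drive: int, failed_drive: Optional[int] = None) -> int:
    """Number of drives that must be read to return one block from a RAID 5 stripe."""
    if failed_drive is None or target_drive != failed_drive:
        return 1                 # the block can be read directly
    return drive_count - 1       # rebuild it by XOR-ing every surviving drive


print(drives_read(5, target_drive=2))                  # healthy array
print(drives_read(5, target_drive=2, failed_drive=2))  # block was on the failed drive
print(drives_read(5, target_drive=1, failed_drive=2))  # block is on a surviving drive
```

A read that would normally touch one drive instead touches every surviving drive and needs an XOR pass over the results. That is the additional I/O traffic and computation the paragraph above refers to.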
I/O Bound Applications
In I/O-bound applications, the differences between
software- and hardware-based RAID are less apparent. Clearly, if disk drives are the bottleneck,
whether the parity calculations are performed in the main CPU or RAID
co-processor will make little difference in overall system performance.
However, there are some situations where hardware-based RAID may be
advantageous. You could see a significant improvement in mirrored drive (RAID
1) performance if you implement striping and mirroring (RAID 0/1), not
available on Windows NT or Netware software-based RAID implementations. With RAID 0/1, not only could your
application experience improved read and write times due to simultaneous
multiple drive accesses, but also more consistent and predictable performance
due to the load balancing effect of RAID 0. If your application is already
I/O-bound, a failed drive in a RAID 5 data set can have a paralyzing effect on
system performance. Hardware-based RAID
solutions that support automatic failed drive detection with hot spare
replacement can significantly reduce the amount of time your application is
running in degraded mode, because the application does not have to wait until
you
physically replace the failed drive. Hardware-based RAID solutions that allow you
to set the priority (low/medium/high) for array reconstruction give you control
over the tradeoffs you are willing to make between overall system performance
and availability. Hardware-based RAID solutions can improve system boot time
and operating system performance by striping the operating system files, a
feature not supported on NT's software-based RAID implementations.
COST
Clearly, the up-front costs of software-based RAID are hard
to beat. For independent software RAID
packages, there's just the cost of a software license and software
installation. There are no acquisition
costs for operating systems supporting embedded RAID, and since you're
installing the operating system software anyway, the incremental installation
costs are zero. Getting something for
free is easy to cost-justify to management, and basic RAID protection is better
than no protection at all.
Most common sources of problems with RAID?
The most common source of a wide variety of symptoms, including data
loss, is incorrect cabling and termination. Use good quality cables which are
no longer than absolutely necessary. Provide good active termination at the end
of the SCSI bus. Use a separate terminating plug rather than using the
termination on the hard drive, to avoid loss of termination if that drive fails
or is removed. Resource conflicts are another common source of problems.
Incorrect interrupts or no interrupt assigned, onboard controller chips which
have not been disabled when not in use, non-compliant motherboards, and so on,
can all result in system lockups, installation aborts, and generally erratic
behavior. Drive mismatches can cause problems with the array and its
performance. Ideally, all the drives in the array should be identical,
including the firmware version on the drive itself, since different versions of
the drive firmware code can result in differences in access speed, queuing
algorithms, and head movement optimization. Most disk manufacturers will
provide firmware updates if multiple versions have been released over the life
of a particular model.