Networks and Servers: High Availability Storage (II)

Storage Area Network

A Storage Area Network (SAN) is a dedicated, high-performance network that provides access to consolidated, block-level data storage. It is used primarily to transfer data between computer systems and storage elements, and among the storage elements themselves. A SAN makes storage devices such as disk arrays, tape libraries, and optical jukeboxes accessible to servers so that they appear to the operating system as locally attached devices.

A SAN typically has its own communication infrastructure that is generally not accessible to other devices through the local area network. A SAN moves data among storage devices, allows data to be shared between different servers, and provides a fast connection medium for backing up, restoring, archiving, and retrieving data. SAN devices are usually installed close together in a single room, but they can also be connected over long distances, which makes SANs very useful to large companies.

SAN Benefits

The primary benefits of a SAN are:

High Availability: One copy of every piece of data is always accessible to any and all hosts via multiple paths;
Reliability: Dependable data transport ensures a low error rate and provides fault tolerance;
Scalability: Servers and storage devices can be added independently of one another and of any proprietary systems;
Performance: Fibre Channel (the standard method for SAN interconnectivity) now offers more than 2000 MB/s of bandwidth with low overhead, and it separates storage I/O from network I/O.
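
The high-availability point above (data reachable via multiple paths) can be illustrated with a toy failover loop. This is only a minimal sketch, not how any real multipath driver works: the Path class, the path names, and the fake read_block method are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One hypothetical route from a host to the same storage volume."""
    name: str
    healthy: bool = True

    def read_block(self, lba: int) -> bytes:
        if not self.healthy:
            raise ConnectionError(f"path {self.name} is down")
        # Stand-in for real I/O: return a fake 512-byte block.
        return lba.to_bytes(8, "big").rjust(512, b"\x00")


def multipath_read(paths: list[Path], lba: int) -> tuple[str, bytes]:
    """Try each path in order and return the first successful read."""
    for path in paths:
        try:
            return path.name, path.read_block(lba)
        except ConnectionError:
            continue  # this path failed, fall through to the next one
    raise ConnectionError("all paths to the volume are down")


paths = [Path("hba0-switchA"), Path("hba1-switchB")]
paths[0].healthy = False  # simulate a failed cable or switch
used, data = multipath_read(paths, lba=42)
print(used, len(data))  # prints: hba1-switchB 512
```

The host keeps reading as long as at least one path survives, which is why SAN designs usually duplicate adapters, cables, and switches.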

SAN versus LAN

The original TCP/IP network protocols used in LANs were developed to move and share files, so they had no built-in way to access disk drives directly. Because data is stored on a disk drive as blocks, very high-performance applications needed direct, block-level access to the drives in order to move and store data quickly.

A SAN differs from a LAN in two main ways:

Storage Protocol: A LAN uses network protocols that send small blocks of data, and the extra communication overhead from addressing and protocol encapsulation reduces data throughput. A SAN uses a storage protocol (SCSI) that sends larger blocks of data with less overhead and therefore higher data throughput;

Server Captive Storage: LAN-based systems connect servers to clients, with each server owning and controlling access to its own storage resources. Storage must be added to a server rather than directly to the LAN. A SAN allows storage resources to be added to the network, enabling any server to access them directly.
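
The storage protocol point above can be made a little more concrete with some rough per-frame arithmetic. The figures below are assumptions that do not come from this article: a standard 1500-byte Ethernet MTU carrying IPv4 and TCP headers without options, and a Fibre Channel frame with its maximum 2112-byte data field. The sketch ignores preambles, inter-frame gaps, line encoding, and upper-layer headers, so treat the result as an illustration rather than a benchmark.

```python
def payload_efficiency(payload: int, overhead: int) -> float:
    """Fraction of the bytes in a frame that carry user data."""
    return payload / (payload + overhead)


# TCP/IP over Ethernet: 1500-byte MTU minus IPv4 (20) and TCP (20) headers.
tcp_payload = 1500 - 20 - 20            # 1460 bytes of data per frame
tcp_overhead = 14 + 4 + 20 + 20         # Ethernet header + FCS + IPv4 + TCP

# Fibre Channel: up to 2112 bytes of data per frame.
fc_payload = 2112
fc_overhead = 4 + 24 + 4 + 4            # SOF + frame header + CRC + EOF

tcp_eff = payload_efficiency(tcp_payload, tcp_overhead)
fc_eff = payload_efficiency(fc_payload, fc_overhead)
print(f"TCP/IP over Ethernet: {tcp_eff:.1%}")
print(f"Fibre Channel:        {fc_eff:.1%}")
```

Under these assumptions the byte-level difference per frame is modest; much of the practical cost of TCP/IP lies in protocol processing on the host rather than in header bytes alone.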

SAN Protocols

There are four major protocols used in SANs:

Fibre Channel Protocol

The fibre spelling (reversing the er to re) is reserved for the Fibre Channel standard itself. Fibre Channel originally ran only over fiber-optic cable; when support for copper cabling was added, the industry adopted the British spelling to give the standard a distinct name and to avoid confusion with the optical fiber used in other networks (such as TCP/IP networks). This is why the main protocol used in a SAN is called Fibre Channel.

Fibre Channel is the language used by the Host Bus Adapters (HBAs), hubs, switches, and storage controllers to talk to each other in a SAN. It is a low-level protocol: it’s the means of communication between the actual hardware components, not between the applications that run on the hardware. Fibre Channel is the building block of the SAN; imagine Fibre Channel as the road and SCSI as the truck that moves the data cargo down the road.

SCSI Protocol

SCSI stands for Small Computer System Interface, a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, and electrical and optical interfaces. They are mostly used for hard disks and tape drives, but they can connect a wide range of other devices, including scanners and CD drives.

This is the language used by SAN-attached server applications to talk to the disk drives. This protocol lies on top of the Fibre Channel protocol. Even though most storage array manufacturers now use Fibre Channel disks in their storage arrays, the disks themselves still use the legacy SCSI protocol to communicate with applications over the Fibre Channel network. All the SCSI messages are encapsulated (packaged) into the Fibre Channel protocol.
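
To illustrate the idea of a SCSI command riding inside a lower-level envelope, here is a minimal sketch that packs a standard 10-byte SCSI READ(10) command and wraps it in a toy frame. The READ(10) field layout follows the SCSI block command set, but wrap_in_toy_frame is a hypothetical stand-in: real Fibre Channel frames carry far more than a length prefix.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Pack a 10-byte SCSI READ(10) command descriptor block (CDB)."""
    return struct.pack(
        ">BBIBHB",
        READ_10,     # byte 0: operation code
        0,           # byte 1: flags (all clear)
        lba,         # bytes 2-5: logical block address, big-endian
        0,           # byte 6: group number
        num_blocks,  # bytes 7-8: transfer length in blocks
        0,           # byte 9: control
    )


def wrap_in_toy_frame(payload: bytes) -> bytes:
    """Hypothetical transport frame: 4-byte length prefix plus payload.

    This only shows that the SCSI command travels unchanged inside
    whatever envelope the transport provides.
    """
    return struct.pack(">I", len(payload)) + payload


cdb = build_read10_cdb(lba=2048, num_blocks=8)
frame = wrap_in_toy_frame(cdb)
print(cdb.hex())   # prints: 28000000080000000800
print(len(frame))  # prints: 14
```

The receiving side strips the envelope and hands the untouched command to the disk, which is the sense in which SCSI lies on top of Fibre Channel.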

iSCSI Protocol

The Internet SCSI (iSCSI) protocol is a low-cost alternative to Fibre Channel that's considered easier to manage and connect because it uses the common TCP/IP protocol and common Ethernet switches. Another benefit of iSCSI is that, because it uses TCP/IP, it can be routed across different subnets, which means it can be used over a wide area network for data mirroring and disaster recovery.

The downside of iSCSI is that it is computationally expensive at high storage throughput because it has to encapsulate the SCSI protocol into TCP packets. This means that it either incurs high CPU utilization (not much of a problem with modern multicore processors) or requires an expensive network card with TOE (TCP offload engine) capability in hardware.

AoE Protocol

The ATA over Ethernet (AoE) protocol was the most recent SAN technology to emerge, created as an even lower-cost alternative to iSCSI. AoE encapsulates ATA commands into low-level Ethernet frames and avoids using TCP/IP. That means it neither overloads the CPU nor requires high-end TOE-capable Ethernet adapters to support high storage throughput. This makes AoE a high-performance, very low-cost alternative to either Fibre Channel or iSCSI.

Because AoE doesn't use TCP/IP, it isn't a routable technology -- but then again, neither are Fibre Channel SANs. Most SAN implementations don't require routability, and the fact that you might use AoE on a particular initiator or target doesn't prohibit you from using iSCSI.

SAN Storage

The storage layer is where all data resides on the SAN. This is the layer that contains all the disk drives, tape drives, and other storage devices, like optical storage drives. The storage layer’s devices include some intelligence, such as RAID or other data-replication technologies to help protect data in the event of a failure.

An array of disks without any special connection between them, all located in the same place, is Just a Bunch Of Disks (also called a JBOD). A storage array adds extra intelligence to the controllers, which allows you to do cool stuff like RAID, so it’s no longer just a bunch of stupid disks. That intelligence is implemented in firmware, the code that runs on the storage controllers in the array.
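
As a small illustration of the kind of protection this controller intelligence provides, the sketch below uses XOR parity, the idea behind parity-based RAID levels such as RAID 5. The article does not say which RAID level an array uses, so this is an assumption chosen for illustration; the disk contents and the tiny 4-byte blocks are made up.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks", each holding one 4-byte block (toy sizes).
disks = [b"SAN!", b"data", b"safe"]
parity = xor_blocks(disks)  # would be stored on a fourth disk

# Simulate losing disk 1, then rebuild it from the survivors plus parity.
survivors = [disks[0], disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # prints: b'data'
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining blocks and the parity block.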

Modular arrays

This type of array has fewer port connections than a monolithic array, because it usually stores less data and connects to fewer servers. Modular arrays are designed so that you can start small, with only a few disk drives, and add more drives as storage needs grow. They come with shelves that hold the disk drives; each shelf can hold between 10 and 16 drives and fits into an industry-standard 19" rack, making it possible to have all the servers and the SAN disks in the same rack.

Modular arrays are perfect for smaller companies looking to install a SAN on a limited budget. They’re also good for large companies with many remote offices: because they are much cheaper and smaller than big monolithic arrays, they can be placed in smaller offices. Modular arrays almost always use two controllers with separate cache memory in each controller, and mirror the cache between the controllers to prevent data loss. Most modern modular arrays have between 16 and 32 GB of cache memory.
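
The cache mirroring described above can be sketched as a toy model. This is only an illustration under stated assumptions: the Controller class, its write and flush methods, and the dictionary standing in for a disk are hypothetical, and real arrays mirror cache over dedicated hardware links.

```python
from typing import Dict, Optional


class Controller:
    """Hypothetical array controller with a write-back cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[int, bytes] = {}  # lba -> data not yet on disk
        self.peer: Optional["Controller"] = None

    def write(self, lba: int, data: bytes) -> str:
        """Cache the write locally, mirror it to the peer, then acknowledge."""
        self.cache[lba] = data
        if self.peer is not None:
            self.peer.cache[lba] = data  # mirror before acknowledging
        return "ack"

    def flush(self, disk: Dict[int, bytes]) -> None:
        """Destage every cached write to disk and clear the cache."""
        disk.update(self.cache)
        self.cache.clear()


a, b = Controller("A"), Controller("B")
a.peer, b.peer = b, a

disk: Dict[int, bytes] = {}
a.write(100, b"payroll")
# Controller A fails before flushing; B still holds the mirrored write.
b.flush(disk)
print(disk[100])  # prints: b'payroll'
```

The write is acknowledged only after both caches hold it, so losing one controller does not lose data that the host believes is safely written.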

Monolithic arrays

Monolithic arrays are those big, refrigerator-size collections of disk drives that you can normally find next to mainframes in a data center. Monolithic arrays can accommodate hundreds of disk drives, can store data for a lot more servers than a modular array can, and usually connect to mainframes.

These disk arrays are loaded with advanced technology to provide bulletproof fault tolerance. Monolithic arrays have many controllers, and those controllers can share direct access to a global cache of fast memory (up to hundreds of gigabytes). This method of sharing access to a large global, or monolithic, cache is why these arrays are called monolithic.

Network Attached Storage

Network-attached storage (NAS) is file-level computer data storage connected to a computer network, providing data access to heterogeneous clients. A NAS is a single storage device that operates on data files, while a SAN is a local network of multiple devices that operate on disk blocks. A NAS system has its own LAN IP address and its own software for configuring and mapping file locations to the network-attached devices. NAS systems contain one or more hard disks, usually arranged into redundant RAID arrays, and use file-based protocols such as NFS (popular on UNIX systems), SMB/CIFS (Server Message Block/Common Internet File System, used with MS Windows systems), or AFP (used with Apple Macintosh computers).

In short, NAS is another server on the network and can, in many cases, entirely replace a conventional file server with a Direct Attached Storage (DAS) system, particularly if the server’s only role has been to share files. NAS not only operates as a file server, but is specialized for this task by its hardware, its software, or the configuration of those elements. NAS is often built as a computer appliance (a specialized computer designed from the ground up for storing and serving files) rather than as a general-purpose computer pressed into the role. For this reason, NAS units usually do not have a keyboard or display, and are controlled and configured over the network, often using a browser.

SAN vs NAS

At first glance, NAS and SAN might seem almost identical, and in many situations either will work. However, there are differences that can seriously affect the way your data is used. While a SAN deals in blocks of data, a NAS operates at the file level and is accessible to anyone with access rights, so it also needs to manage user privileges, file locking, and other security measures. In large enterprise systems, the processing and control of this data is performed by a NAS head that is physically separate from the storage system.

In contrast, SANs allow multiple servers to share a pool of storage, making the storage appear to each server as if it were local or direct-attached; individual users cannot access it directly. A SAN commonly uses Fibre Channel connections with encapsulated SCSI as its protocol, while a NAS typically uses Ethernet links and TCP/IP with NFS, CIFS, or AFP as its communication protocols.
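
The block-versus-file distinction can be shown with a small sketch. Both halves are toy stand-ins under stated assumptions: a bytes object plays the role of a SAN volume with an assumed 512-byte block size, and a temporary local directory plays the role of a NAS share. The function names read_block and read_file are hypothetical.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this toy example


def read_block(device: bytes, lba: int) -> bytes:
    """SAN-style access: address raw storage by block number."""
    start = lba * BLOCK_SIZE
    return device[start:start + BLOCK_SIZE]


def read_file(root: str, name: str) -> bytes:
    """NAS-style access: ask the file server for a named file."""
    with open(os.path.join(root, name), "rb") as fh:
        return fh.read()


# Block level: the volume is just numbered blocks; the host's own
# file system decides what they mean.
lun = bytes(BLOCK_SIZE) + b"B" * BLOCK_SIZE + bytes(BLOCK_SIZE)
block = read_block(lun, lba=1)

# File level: the server owns the file system and hands back whole files.
with tempfile.TemporaryDirectory() as share:
    with open(os.path.join(share, "report.txt"), "wb") as fh:
        fh.write(b"quarterly numbers")
    content = read_file(share, "report.txt")

print(block[:4], content)  # prints: b'BBBB' b'quarterly numbers'
```

With block access the server must bring its own file system and access control; with file access those responsibilities live on the NAS device.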
