Networks and Servers: Virtualization (I)

Virtualization is a rapidly growing aspect of computing and IT, creating "virtually" an unlimited number of possibilities for system administrators. It has been around for many years in one form or another; the trouble is understanding the different types of virtualization, what they offer, and how they can help us.

Virtualization has been defined as the abstraction of computer resources or as a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. In fact, virtualization always means abstraction. We make something transparent by adding layers that handle translations, causing previously important aspects of a system to become moot.

Storage Virtualization

The amount of data organizations create and store is rapidly increasing due to the shift of business processes to Web-based digital applications, and this huge amount of data is causing problems for many of them. First, many applications generate more data than can be stored physically on a single server. Second, many applications, particularly Internet-based ones, have multiple machines that need to access the same data. Having all of the data sitting on one machine can create a bottleneck, and it also creates the risk that many machines become inoperable if the single machine containing all the application’s data crashes. Finally, the increase in the number of machines causes backup problems, because creating safe copies of data is a tremendous task when hundreds or even thousands of machines need data backup.

For these reasons, data storage has moved toward virtualization. Companies use centralized storage (virtualized storage) as a way of avoiding data access problems. Furthermore, moving to centralized data storage can help IT organizations reduce costs and improve data management efficiency. The basic premise of storage virtualization solutions is not new. Disk storage has long relied on partitioning to organize physical disk tracks and sectors into clusters, and then to abstract clusters into logical drive partitions (e.g., the C: drive). This allows the operating system to read and write data to the local disks without regard to the physical location of individual bytes on the disk platters.

Storage virtualization creates a layer of abstraction between the operating system and the physical disks used for data storage. The virtualized storage is then location-independent, which can enable more efficient use and better storage management. For example, the storage virtualization software or device creates a logical space, and then manages metadata that establishes a map between the logical space and the physical disk space. The creation of logical space allows a virtualization platform to present storage volumes that can be created and changed with little regard for the underlying disks.
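
The map between logical space and physical disk space can be pictured with a small Python sketch. This is only an illustration under assumed names (LogicalVolume, PhysicalExtent, and the disk identifiers are hypothetical), not how any particular product stores its metadata:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalExtent:
    disk: str    # hypothetical disk identifier
    offset: int  # first physical block of the extent on that disk


class LogicalVolume:
    """Maps fixed-size logical extents onto extents of physical disks."""

    def __init__(self, extent_blocks):
        self.extent_blocks = extent_blocks
        self.extent_map = {}  # the metadata: logical extent -> PhysicalExtent

    def map_extent(self, logical_extent, disk, offset):
        self.extent_map[logical_extent] = PhysicalExtent(disk, offset)

    def resolve(self, logical_block):
        """Translate a logical block number into (disk, physical block)."""
        extent, within = divmod(logical_block, self.extent_blocks)
        target = self.extent_map[extent]
        return target.disk, target.offset + within


# One volume whose two extents live on two different disks.
volume = LogicalVolume(extent_blocks=1000)
volume.map_extent(0, "disk-a", 5000)
volume.map_extent(1, "disk-b", 0)
```

The host only ever asks for logical blocks; which disk actually answers is decided by the map, so the map can change without the host noticing.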

The storage virtualization layer is where the resources of many different storage devices are pooled so that they look like one big container of storage. This pool is then managed by a central system that makes it all look much simpler to the network administrators. It is also a great way to monitor resources, as you can see exactly how much capacity you have left at any given time, which means much less hassle when it comes to tasks such as backups.

In most data centers, only a small percentage of storage is used because, even with a SAN, one has to allocate a full disk logical unit number (LUN) to the server (or servers) attaching to that LUN. Imagine that a LUN fills up but there is disk space available on another LUN. It is very difficult to take disk space away from one LUN and give it to another LUN. Plus, it is very difficult to mix and match storage and make it appear all as one.

Storage virtualization works great for mirroring traffic across a WAN and for migrating LUNs from one disk array to another without downtime. With some types of storage virtualization, for instance, you can forget about where you have allocated data, because migrating it somewhere else is much simpler. In fact, many systems will migrate data based on utilization to optimize performance.

Storage Virtualization Types

There are two kinds of virtualization in the storage world:

File Level Virtualization

File services requirements are steadily growing due to a combination of several factors: growing client populations, new file-based applications such as content distribution, and a trend towards the use of file servers and NAS as a storage consolidation platform behind web and application servers. Combined, they raise the requirements for file services to new levels. In the face of this demand, processing power within those file servers is increasingly becoming a bottleneck.

File virtualization serves applications that need to access data in the form of entire files rather than block-by-block and these files will typically reside in file systems located on Network Attached Storage (NAS) devices. Therefore, file virtualization deals mainly with the virtualization of the files stored on NAS boxes, storage servers and file servers. File virtualization is also known under the term File Area Network or FAN.

A FAN is a way to aggregate file systems so that they can be moved more easily and managed centrally via a logical layer known as a global namespace. The benefits are easier server administration, file reorganization and consolidation: files can be moved without the user being aware that they may now physically reside in a completely different location.
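
A global namespace can be thought of as a lookup table from the path users see to the place where the file really lives. The following Python sketch shows the idea; the class, the server names and the paths are all made up for illustration, and a real FAN product would also copy the data before updating the mapping:

```python
class GlobalNamespace:
    """Maps logical paths seen by users to (server, physical path) pairs."""

    def __init__(self):
        self._locations = {}

    def publish(self, logical_path, server, physical_path):
        self._locations[logical_path] = (server, physical_path)

    def resolve(self, logical_path):
        return self._locations[logical_path]

    def migrate(self, logical_path, new_server, new_physical_path):
        # Only the mapping changes here; the logical path stays the same.
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = (new_server, new_physical_path)


namespace = GlobalNamespace()
namespace.publish("/projects/report.doc", "nas-01", "/vol1/projects/report.doc")
before = namespace.resolve("/projects/report.doc")

# The file moves to another NAS box; users keep using the same logical path.
namespace.migrate("/projects/report.doc", "nas-02", "/vol7/archive/report.doc")
after = namespace.resolve("/projects/report.doc")
```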

File virtualization addresses the NAS challenges by eliminating the dependencies between the data accessed at the file level and the location where the files are physically stored. This provides opportunities to optimize storage use and server consolidation and to perform non-disruptive file migrations. It requires some software to be installed on the server that uses the storage, but enables such things as file-level determination of usage, which is used to determine which data is less accessed and can be moved to slower storage.
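
As a rough illustration of file-level determination of usage, the sketch below picks the files that have not been accessed for a given period, which makes them candidates for slower storage. The function name, the file paths and the 90-day limit are assumptions chosen for the example, not values from any product:

```python
from datetime import datetime, timedelta


def select_cold_files(last_access, now, max_idle):
    """Return the paths whose last access is older than max_idle, sorted."""
    return sorted(
        path for path, accessed in last_access.items()
        if now - accessed > max_idle
    )


now = datetime(2024, 6, 1)
last_access = {
    "/data/q1-report.xls": datetime(2024, 1, 15),
    "/data/current-orders.db": datetime(2024, 5, 31),
    "/data/old-logo.png": datetime(2023, 11, 2),
}

# Files idle for more than 90 days are candidates for the slower tier.
cold = select_cold_files(last_access, now, timedelta(days=90))
```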

Block Level Virtualization

The need for this kind of virtualization arose because SAN users found that a lot of important storage management services were restricted to the disks in a particular array and couldn't be expanded beyond that. Once all the disks in that array were filled up, you had to get another array, and that meant a new thing to manage. If the new array was from a different vendor, a new problem arose from the fact that each vendor has a closed architecture. You couldn’t manage them from the same console, it was hard to replicate data between them, and so forth. Storage virtualization tries to remedy these problems by moving key management functions off the storage array's internal controller and out into the network.

Block virtualization refers to the abstraction (separation) of logical storage from physical storage so that it may be accessed without regard to physical storage or heterogeneous structure. Block level virtualization takes over before the file system even exists: it’s replacing or augmenting existing controllers and taking over at the disk level.

This type of virtualization sits in the midst of the SAN (in front of the storage arrays) and masks the physical location of data from the device accessing this data. Block-level virtualization is usually just called storage virtualization, and serves applications such as database software that need block-level access to data. The disks will typically (but not always) reside in SANs.

Storage Virtualization Implementation Methods

Host-based

Host-based volume managers were in use long before the term storage virtualization had been coined. This method uses a driver of some kind, installed on the host operating system, that intercepts and possibly redirects I/O requests as a privileged task or process. Typically a software layer (the volume manager) resides above the disk device driver, intercepts the I/O requests, and provides the meta-data lookup and I/O mapping.

The FAN concept uses this method, but software RAID and volume managers are also examples of host-based storage virtualization. Most modern operating systems have a built-in logical volume manager (LVM in UNIX/Linux or Logical Disk Manager in Windows) that performs virtualization tasks. Volumes (LUNs) presented to the host system are handled by a traditional physical device driver.

Network-based

Network-based storage virtualization is true virtualization operating on a network-based device (typically a standard server or smart switch) and using iSCSI or Fibre Channel networks to connect as a SAN. The switch that sits between the host and storage will actually virtualize all requests, redirecting I/O unbeknownst to the user. This method doesn’t rely on the operating system; in fact, the operating system on the host doesn’t even know what is happening. The virtualization device sits in the SAN and provides the layer of abstraction between the hosts performing the I/O and the storage controllers providing the storage capacity. These types of devices are the most commonly available and implemented form of virtualization.

Array-based

In array-based virtualization, a primary storage controller provides the virtualization services and allows the direct attachment of other storage controllers. In other words, the “master” array takes over all I/O for all other arrays. The primary controller provides the pooling and meta-data management services. It may also provide replication and migration services across the controllers it is virtualizing. This primary controller must be fast enough to handle all the aggregate storage traffic, and it must also interoperate with all the existing disk arrays.

Storage Virtualization Benefits

The biggest benefits of storage virtualization are:

Increased utilization

The most immediate benefit of storage virtualization is increased utilization of the available capacity, which reduces wasted storage space. Inefficient storage utilization and unnecessary storage purchases have led to the adoption of storage virtualization solutions, which use abstraction to separate physical disk space from the logical assignment of that space. By migrating data to cheaper storage and spreading out the workloads more appropriately, one can increase both utilization and visibility into the overall workload across all the storage arrays.

For example, a LUN provisioned on a SAN may allocate space that may not be used, or disks may be left unallocated -- lost and forgotten on storage arrays scattered across the data center. With virtualization, otherwise-unused storage can be cobbled together into viable LUNs and allocated to applications. Virtualization allows the storage available on multiple systems (often SAN storage subsystems) to be aggregated so that disk space forms a single logical resource. The virtualized storage pool is then provisioned for use by users, servers and applications.
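
The way leftover space on several arrays can be combined into one usable LUN can be sketched in a few lines of Python. The array names and sizes are invented, and the simple first-come allocation order is only an assumption for the example; real products use their own placement policies:

```python
def allocate_from_pool(free_gb, size_gb):
    """Carve a LUN of size_gb out of the free space on several arrays.

    free_gb maps an array name to its unused capacity in GB and is
    updated in place. Returns how many GB were taken from each array.
    """
    if size_gb > sum(free_gb.values()):
        raise ValueError("pool has insufficient free capacity")
    allocation = {}
    remaining = size_gb
    for array, free in free_gb.items():
        take = min(free, remaining)
        if take > 0:
            allocation[array] = take
            free_gb[array] -= take
            remaining -= take
    return allocation


# No single array can hold an 80 GB LUN, but the pool as a whole can.
pool = {"array-a": 40, "array-b": 25, "array-c": 60}
lun = allocate_from_pool(pool, 80)
```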

Thin Provisioning

Thin provisioning is a method of optimizing the efficiency with which the available storage space is utilized, using virtualization technology to give the illusion of more physical resources than are actually available. It works in combination with storage virtualization, which is essentially a prerequisite for using the technology effectively. It allows space to be easily allocated to servers on a just-enough and just-in-time basis, and it operates by allocating disk storage space flexibly among multiple users, based on the minimum space each of them requires at any given time.

In the conventional storage provisioning model, also known as fat provisioning, storage space is allocated beyond current needs, in anticipation of growing need and increased data complexity. As a result, the utilization rate is low. Large amounts of storage space are paid for but may never be used. Thin provisioning software allows higher storage utilization by eliminating the need to install physical disk capacity that goes unused.

In most implementations, thin provisioning provides storage to applications from a common pool of storage on an as-required basis. With thin provisioning, a storage administrator allocates logical storage to an application as usual, but the system releases physical capacity only when it is required. When utilization of that storage approaches a predetermined threshold (e.g. 90%), the array automatically provides capacity from a virtual storage pool, which expands the volume without involving the storage administrator.

For example, a 500 GB LUN can be created with 100 GB of storage space to start. As the associated application uses the disk space, more disk space can be allocated periodically (up to the assigned amount) without having to recreate the LUN.
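
The 500 GB example can be modelled with a short Python sketch. The ThinLun class, the 50 GB growth step and the way the 90% threshold is applied are assumptions made for illustration; the shared pool that would supply the extra capacity is left out to keep the example small:

```python
class ThinLun:
    """A LUN that advertises logical_gb but is backed by less physical space."""

    def __init__(self, logical_gb, initial_gb, step_gb, threshold=0.9):
        self.logical_gb = logical_gb    # size the application sees
        self.allocated_gb = initial_gb  # physical space actually backing it
        self.step_gb = step_gb          # growth increment (assumed value)
        self.threshold = threshold      # utilization that triggers growth
        self.used_gb = 0

    def write(self, gb):
        if self.used_gb + gb > self.logical_gb:
            raise ValueError("write exceeds the LUN's logical size")
        self.used_gb += gb
        # Grow the backing space whenever utilization passes the threshold,
        # but never beyond the logical size assigned to the LUN.
        while (self.used_gb > self.allocated_gb * self.threshold
               and self.allocated_gb < self.logical_gb):
            self.allocated_gb = min(self.allocated_gb + self.step_gb,
                                    self.logical_gb)


lun = ThinLun(logical_gb=500, initial_gb=100, step_gb=50)
lun.write(95)   # 95 GB used exceeds 90% of 100 GB, so the LUN grows
first_growth = lun.allocated_gb
lun.write(100)  # 195 GB used, so the LUN grows again
second_growth = lun.allocated_gb
```

The application sees a 500 GB LUN the whole time; only the physical space behind it changes.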

Dynamic Provisioning

Dynamic provisioning is similar, allowing the size of a LUN to be grown or shrunk as needed, ensuring the size of a LUN is always appropriate for each application.

Non-disruptive data migration

Moving data around should not require downtime. Data storage virtualization supports migration and replication of LUNs between storage systems. This is particularly useful when one storage system must be taken offline for maintenance or replacement. With virtualized storage, data migrations can be seamless and even automated. By simply changing the mapping scheme, virtualization can move the location of data without disrupting disk I/O, allowing for efficient and non-disruptive data movement.

Once storage space is decoupled from a physical disk or storage array, it's simple to migrate and copy that virtual storage between systems or geographic locations. For example, a virtual LUN can be migrated from an older storage system to a newer one without making any adjustments to the applications. Similarly, a virtual LUN can be copied to another local storage system for backup purposes, or replicated to an off-site storage location for disaster recovery purposes.
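
Moving data by changing the mapping can be sketched as follows. This is a simplified Python model with made-up array names: the data is copied first, the map is switched second, and the old copy is removed last, so a read through the map finds the data at every step. A real system would also have to handle writes that arrive during the copy, which this sketch ignores:

```python
def read(extent_map, disks, logical_extent):
    """Read an extent through the map, wherever it currently lives."""
    disk = extent_map[logical_extent]
    return disks[disk][logical_extent]


def migrate(extent_map, disks, logical_extent, target_disk):
    """Move one extent to target_disk without changing its logical address."""
    source_disk = extent_map[logical_extent]
    if source_disk == target_disk:
        return
    # 1. Copy the data to the new location.
    disks[target_disk][logical_extent] = disks[source_disk][logical_extent]
    # 2. Switch the mapping so new reads go to the new location.
    extent_map[logical_extent] = target_disk
    # 3. Reclaim the space on the old system.
    del disks[source_disk][logical_extent]


disks = {"old-array": {0: b"payroll", 1: b"orders"}, "new-array": {}}
extent_map = {0: "old-array", 1: "old-array"}

before = read(extent_map, disks, 0)
migrate(extent_map, disks, 0, "new-array")
after = read(extent_map, disks, 0)
```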

Centralized management

Another big advantage of virtualization is that storage management can be greatly simplified. One can manage multiple (often heterogeneous) storage subsystems through a single mechanism: the virtualizing controller, which manages the virtualized storage environment. A storage administrator can see utilization trends and growth patterns more clearly and make better upgrade or capacity planning decisions. Traditional storage management (such as RAID group creation and maintenance) is still required, but now the administrator can allocate storage from a central place and view all available storage from a single interface.

Virtualization brings efficiency to the storage environment. By pooling storage resources into a single resource, administrators can manage all of the space included in the pool regardless of its location. This allows for much better storage utilization, often reaching 80% or better.

Storage Virtualization Problems

Virtualizing means extra dependence on the controllers that do the virtualizing. Care must be taken to ensure that the virtualizing controller can handle the load of all the other disk systems that it virtualizes. In essence, it is mandatory to get an extremely fast, redundant and memory-rich device to benefit from storage virtualization.

Higher storage utilization often means higher storage traffic from users and higher migration traffic between storage systems, so ensuring network readiness is always the first consideration for a solution provider.

With all types of virtualization one must pay attention to the fact that more single points of failure are introduced, which must be accounted for, and resource utilization is always more than we estimate. It’s interesting that the main benefit of virtualization is higher utilization, but once you start scaling up a bit more, you quickly run out of resources.

Solution providers must also consider the interoperability of virtualization products within the customer's storage environment. While there can be significant benefits to virtualizing a heterogeneous storage environment, the truth is that it is not always possible.
