Every minute of every day

 

– YouTube receives 48 hours of uploaded video

– Over 2 million search queries hit Google

– Twitter users post 100,000 tweets

– 571 new websites are created

– Over 200,000,000 emails are created and sent

 

 

The total amount of digital data worldwide is estimated to be nearly 1.2 zettabytes; that’s about 1.2 trillion gigabytes!

In 1997 Michael Lesk (of the original Unix team) estimated that around 12,000 petabytes of data were stored globally, and the web was thought to be growing ten-fold annually.  Estimates today reckon that 2.5 quintillion bytes of data are written daily and that 90% of the world’s data has been created in the last two years!  Some 14.7 exabytes of new data are expected this year.  The more data we generate, the more we preserve and protect it with backup and replication, driving demand for IT storage media even higher.  One of the biggest challenges remains our inability to predict how much extra storage we will need and when we will need it.
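
For scale, the unit arithmetic behind these figures is straightforward; the short Python sketch below is purely illustrative and assumes decimal SI units:

```python
# Unit arithmetic behind the figures above (decimal SI units assumed).
GB, PB, ZB = 10**9, 10**15, 10**21

digital_universe = 1.2 * ZB             # estimated size of all digital data
print(digital_universe / GB)            # ~1.2e12, i.e. about 1.2 trillion gigabytes

lesk_1997 = 12_000 * PB                 # Michael Lesk's 1997 estimate
print(digital_universe / lesk_1997)     # ~100, a hundred-fold increase since then
```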

Traditionally, before virtualisation, storage behaviour was predictable.  Administrators knew the input/output (I/O) patterns between systems, and these were largely sequential.  With the heavy, mixed workloads of virtual machines, I/O becomes random (the requested data could be anywhere on the disk).  The disk heads have to move far more for each read and write, which increases latency (the time taken to locate the data).
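
The effect is easy to demonstrate.  The sketch below is purely illustrative (the scratch file name and sizes are arbitrary): it times ordered 4 KiB reads against the same reads issued in random order, which is roughly what many virtual machines sharing one disk force it to do.  On a spinning disk with a cold cache the random pass is markedly slower; the operating system's page cache and SSDs can mask the gap.

```python
import os
import random
import time

PATH = "io_test.bin"              # hypothetical scratch file
SIZE = 256 * 1024 * 1024          # 256 MiB test file
BLOCK = 4096                      # 4 KiB requests, a typical I/O size

# Create the test file once.
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.write(os.urandom(SIZE))

def read_blocks(offsets):
    """Read one block at each offset and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

count = SIZE // BLOCK
sequential = [i * BLOCK for i in range(count)]     # one well-behaved workload
randomised = random.sample(sequential, k=count)    # many workloads blended together

print("sequential read:", read_blocks(sequential), "s")
print("random read:    ", read_blocks(randomised), "s")
```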

Virtualisation and cloud-based applications consolidate compute and storage, but they need high-performance, high-capacity storage to handle the transaction volumes generated by large numbers of concurrent users.

Storage systems store data and metadata together.  Metadata is ‘data about data’ and helps locate the relevant information.  As data is written to or read from disk, the storage system has to write or read the metadata as well.  Over time, as data is modified, deleted, and rewritten, the metadata becomes heavily fragmented on disk.  Storage system features such as de-duplication can cause metadata to multiply and grow rapidly, degrading system performance.
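
Every filesystem exposes this bookkeeping.  As a minimal illustration (the file name here is hypothetical), the following Python sketch writes a few bytes of data and then reads back some of the metadata that the storage layer must maintain alongside it:

```python
import os
import stat
import time

path = "example.dat"              # hypothetical file
with open(path, "wb") as f:       # writing the data also updates its metadata
    f.write(b"payload")

info = os.stat(path)              # everything below is 'data about the data'
print("size (bytes):", info.st_size)
print("owner uid   :", info.st_uid)
print("permissions :", stat.filemode(info.st_mode))
print("modified    :", time.ctime(info.st_mtime))
print("inode       :", info.st_ino)   # where the filesystem keeps this record
```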

The creation of inefficient silo systems, each processing and storing data with no interaction with other systems, adds significantly to data centre capital and operating costs.  Storage arrays are often limited in their application support and connectivity, leading to complexity and inflexibility, with resultant errors and downtime.  Separate systems often run de-duplication and compression services at extra cost.  Business continuity and disaster recovery needs are not always considered when storage is procured, so replication software is often bought at additional cost and installed as an afterthought.

Storage choice is further constrained and complicated by the protocols applications use to connect to shared storage.  Storage vendors tend to support one type of block protocol (iSCSI or Fibre Channel (FC)) and/or a subset of file protocols (e.g. NFS, CIFS and SMB 3.0), but rarely all of them.  Organisations can find that their preferred storage vendor does not adequately support the protocols in use within the organisation.
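
The difference between the two protocol families is visible even from application code.  The sketch below is only illustrative and assumes a Linux host with a hypothetical NFS or SMB share mounted at /mnt/nfs_share and a hypothetical iSCSI-attached LUN at /dev/sdb: with a file protocol the client works in terms of paths and files, while with a block protocol it addresses a raw device by offset and must bring its own filesystem or database on top.

```python
import os

# File protocol (NFS/CIFS/SMB): the client sees paths and files;
# the storage system owns the filesystem layout.
NFS_PATH = "/mnt/nfs_share/report.csv"        # hypothetical mount point and file
with open(NFS_PATH, "rb") as f:
    header = f.read(128)

# Block protocol (iSCSI/FC): the client sees a raw device and addresses it
# by offset; the client's own filesystem or database owns the layout.
BLOCK_DEVICE = "/dev/sdb"                     # hypothetical iSCSI-attached LUN
fd = os.open(BLOCK_DEVICE, os.O_RDONLY)       # typically requires root privileges
os.lseek(fd, 1024 * 1024, os.SEEK_SET)        # jump to a 1 MiB offset
block = os.read(fd, 4096)                     # read one raw 4 KiB block
os.close(fd)
```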

IT executives are increasingly looking for an efficient, non-silo’d IT infrastructure that supports critical business operations and applications.  Striking the right balance between capacity, performance, and cost is becoming essential in the modern data centre.

Data protection (technologies such as snapshots and replication) is also becoming an important factor, since the need to safeguard data assets and comply with regulations is universal.  However, these features tend to affect performance and command a price premium with traditional and hybrid arrays.  Some vendors offer compression to reduce the overall storage capacity requirement.  De-duplication, which increases efficiency and reduces capacity further, is available from fewer vendors, again usually at a price premium.
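
To show why de-duplication saves capacity but also multiplies metadata (as noted earlier), here is a minimal content-hash sketch, not any vendor’s implementation: identical blocks are stored only once, yet every logical block still needs an index entry.

```python
import hashlib

BLOCK = 4096   # fixed 4 KiB de-duplication block size

def dedupe(data: bytes):
    """Split data into fixed-size blocks, storing each unique block only once."""
    store = {}    # hash -> block contents: what actually lands on disk
    index = []    # per-block metadata needed to rebuild the original stream
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # duplicate blocks are stored once...
        index.append(digest)              # ...but every block adds metadata
    return store, index

# Ten copies of the same 4 MiB payload: the stored data shrinks, the index does not.
payload = b"\xab" * (4 * 1024 * 1024)
store, index = dedupe(payload * 10)
print("logical blocks:", len(index))   # 10240
print("unique blocks :", len(store))   # 1
print("index entries :", len(index))   # metadata grows with the logical size
```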

In summary, architectural complexity, non-integrated products, expensive proprietary networking protocols, cumbersome administration and per-module software licensing are the norm.  They burden storage consumers with high prices and high maintenance, making it incredibly difficult to balance performance and capacity requirements against price.

Below is a high-level outline of the types of storage available today:

Hard Disk Drives (HDDs) = Traditional storage – mechanical & magnetic parts

·           Performance impacted by latency (disk & head movement)

·           Cost per terabyte reduced over the years

·           Slower than SSDs

·           Trade-off between speed and capacity

·           Better for sequential I/O

·           Arraying HDDs together gives improved performance

·           Arraying HDDs can result in under-utilisation of disk capacity

 

Solid State Drives (SSDs) = Electronic with no moving parts, using flash technology

·           Latency reduced to microseconds

·           Improved performance over HDDs

·           More expensive than HDDs

·           Good where speed is paramount

·           Lower power consumption than HDDs

·           Better for random I/O

·           Faster, as there is no mechanical limitation on file access

 

Hybrid array = Mix of SSDs & HDDs

·           Fast SSDs and/or DRAM to cache in-demand data (see the caching sketch after this list)

·           Less expensive hard drives for low cost capacity

·           Balance performance, capacity & price
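
The caching idea behind a hybrid array can be sketched in a few lines of illustrative Python: keep recently used blocks in a small, fast tier and fall back to the large, slow tier on a miss.  This is a toy model of the principle, not any vendor’s algorithm.

```python
import random
from collections import OrderedDict

class HybridTier:
    """Toy model of a hybrid array: a small LRU 'SSD' cache over a large 'HDD' store."""

    def __init__(self, hdd_blocks, cache_blocks):
        self.hdd = hdd_blocks          # block number -> data: the big, slow tier
        self.cache = OrderedDict()     # LRU cache standing in for the SSD/DRAM tier
        self.cache_blocks = cache_blocks
        self.hits = self.misses = 0

    def read(self, block_no):
        if block_no in self.cache:                # hot data served from the fast tier
            self.cache.move_to_end(block_no)
            self.hits += 1
            return self.cache[block_no]
        self.misses += 1                          # cold data fetched from the slow tier...
        data = self.hdd[block_no]
        self.cache[block_no] = data               # ...and promoted into the cache
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)        # evict the least recently used block
        return data

# Skewed access pattern: most reads hit a small working set, as in real workloads.
tier = HybridTier({n: bytes(8) for n in range(10_000)}, cache_blocks=500)
for _ in range(50_000):
    block = random.randint(0, 399) if random.random() < 0.9 else random.randint(0, 9_999)
    tier.read(block)
print("cache hit rate:", tier.hits / (tier.hits + tier.misses))
```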

 

Note: A recent survey found that SME IT professionals are looking at hybrid storage for their next storage purchase, as it brings low latency and high capacity at a sensible price.  Many virtual environments are reported to have implemented, or to be implementing, hybrid arrays.