Thursday 19 December 2013

Object Storage - unsung hero of Cloud and Big Data

They say that information is power, but in a world where the majority of information is digitised and stored electronically, that power can only be realised if the information can be found and placed in the context of what you need to know. Digital information needs to be contextual and retrievable.

This has long been solved in the case of structured data which resides within databases. These provide relational context and indexes for search and retrieval. But what about unstructured file data which is where the vast majority of information now lives? The current term for this challenge is Big Data. 

One of the key technologies which will help with this challenge is little known and does not come with a fancy title – it is called object storage.

To best explain what object storage is we need to start with the ones and zeros of data, then work our way towards the information that we intend to work with.

Hard drives are at the heart of most storage systems today. Data is retained in blocks of ones and zeros. SAN storage systems expose these blocks to the applications running on connected servers. This can be highly efficient, for example with databases which require fast granular access to the data.

Users, however, typically need to work with files such as spreadsheets, Word documents, slide-decks, images and emails. Each file comprises a sequence of data blocks which together make up the file. Furthermore, the file must reside in a file system which provides a nested folder structure so its location can be indexed. This functionality is provided by an operating system which sits logically above the hard drives. This can be integrated into the storage system (making it a NAS device) or can be external in the form of a server (File Server) or dedicated appliance (NAS gateway or head).

SAN and NAS respectively provide block and file access to data and this has historically catered for most organisations’ needs. However, there are some shortcomings. The fastest growing form of data is unstructured or file data. Files are getting bigger and more numerous, and they rarely get deleted. As file systems grow, they slow down, and they have hard limits on their scalability. They are also restrictive in the way that information can be searched, because the nested folder structure (which is determined largely by individual users) carries no context. This system works OK at an individual level because we all apply some logic to how we organise our files. However, at an organisational level, this logic is unknown. So, for instance, finding all files which contain “confidential information” relating to a particular customer might be nigh on impossible.

With object storage, we have the ability to build context and structure right into the file itself. To achieve this, we wrap the file with “metadata”. This is information about the file. The combination of data and metadata is called an object.
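As a rough illustration of the idea, an object can be pictured as the file's bytes bundled with a set of descriptive fields. The sketch below is only a toy model: the class name `StoredObject`, the helper `wrap_file` and the metadata field names are all hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict
import uuid


@dataclass(frozen=True)
class StoredObject:
    """An object = the file's bytes plus descriptive metadata (hypothetical structure)."""
    data: bytes
    metadata: Dict[str, str]
    # Objects are addressed by a flat identifier rather than by a folder path.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def wrap_file(data: bytes, **metadata: str) -> StoredObject:
    """Wrap raw file content with metadata to form an object."""
    return StoredObject(data=data, metadata=dict(metadata))


# Example: the context travels with the file instead of living in a folder name.
report = wrap_file(
    b"Q3 figures ...",
    filename="q3-report.xlsx",
    customer="Acme Ltd",
    classification="confidential",
)
```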

This sounds simple, but the implications are enormous.

Scalability and Performance. Object stores remove the nested folder structure which is the barrier to very large datasets. Information is found instead by searching through the metadata for what you need. It is similar to the way search engines find information on the internet. If you like your files structured by date, by customer, by whatever – not a problem. This can be added into the metadata. You can even simulate a whole nested folder system by including the path in the metadata for each object.
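To make the search idea concrete, here is a minimal sketch of a flat store queried by metadata, including a simulated folder listing based on a path field. The store contents, the field names and the functions `find` and `list_folder` are assumptions made for illustration. A real object store would maintain indexes over the metadata rather than scanning every object as this toy does.

```python
from typing import Dict, List

# A flat store: object ID -> metadata. No folder hierarchy is involved.
store: Dict[str, Dict[str, str]] = {
    "a1": {"customer": "Acme Ltd", "year": "2013", "path": "/sales/acme/q3.xlsx"},
    "b2": {"customer": "Acme Ltd", "year": "2012", "path": "/sales/acme/q3-2012.xlsx"},
    "c3": {"customer": "Globex", "year": "2013", "path": "/legal/globex/nda.docx"},
}


def find(**criteria: str) -> List[str]:
    """Return the IDs of objects whose metadata matches every given field."""
    return sorted(
        oid for oid, meta in store.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


def list_folder(prefix: str) -> List[str]:
    """Simulate a folder listing by matching on the 'path' metadata field."""
    return sorted(
        oid for oid, meta in store.items()
        if meta.get("path", "").startswith(prefix)
    )
```

For example, `find(customer="Acme Ltd", year="2013")` selects by context, while `list_folder("/sales/acme/")` reproduces a traditional folder view from the same metadata.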

Big Data. Because objects have contextual information included, organisations can search across datasets and extract just the information they need. A great example would be in a hospital environment. It is quite possible to store patient records such that personally identifiable information is held in metadata and access to it is restricted. Analytics could then be run across patient data with anonymity retained.
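The hospital example might look something like the sketch below, where an analyst only ever sees metadata with the restricted fields removed. The field names, the sample records and the functions `analyst_view` and `diagnosis_counts` are invented for illustration. Simply hiding named fields is also a simplification of what real anonymisation involves.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical metadata fields treated as personally identifiable.
RESTRICTED_FIELDS = {"patient_name", "date_of_birth"}

records: List[Dict[str, str]] = [
    {"patient_name": "A. Smith", "date_of_birth": "1970-01-01", "diagnosis": "asthma"},
    {"patient_name": "B. Jones", "date_of_birth": "1982-05-12", "diagnosis": "asthma"},
    {"patient_name": "C. Brown", "date_of_birth": "1965-09-30", "diagnosis": "diabetes"},
]


def analyst_view(metadata: Dict[str, str]) -> Dict[str, str]:
    """Return the metadata with restricted fields stripped out."""
    return {k: v for k, v in metadata.items() if k not in RESTRICTED_FIELDS}


def diagnosis_counts(all_metadata: List[Dict[str, str]]) -> Counter:
    """Count diagnoses using only the fields an analyst is allowed to see."""
    return Counter(analyst_view(m)["diagnosis"] for m in all_metadata)
```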

Cloud. Massive scale plus the ability to apply security policies to data based on fields within the metadata makes object storage a great solution for service providers who store data on behalf of customers. Storage can be carved up into virtual containers based on who owns the data (multi-tenancy). This is exactly what the likes of Amazon, Google and many others are basing their businesses on.
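A minimal sketch of both ideas, grouping objects into per-tenant containers and deciding access from metadata fields, could look like this. The `tenant` and `classification` fields, the sample objects and the policy in `can_read` are assumptions for illustration rather than any provider's actual scheme.

```python
from collections import defaultdict
from typing import Dict, List

objects: List[Dict[str, str]] = [
    {"id": "a1", "tenant": "customer-a", "classification": "public"},
    {"id": "a2", "tenant": "customer-a", "classification": "confidential"},
    {"id": "b1", "tenant": "customer-b", "classification": "public"},
]


def containers(objs: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group object IDs into one virtual container per owning tenant."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for obj in objs:
        grouped[obj["tenant"]].append(obj["id"])
    return dict(grouped)


def can_read(requester_tenant: str, cleared: bool, metadata: Dict[str, str]) -> bool:
    """Toy policy: tenants only see their own data; confidential objects need clearance."""
    if metadata["tenant"] != requester_tenant:
        return False
    return cleared or metadata["classification"] != "confidential"
```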

Archive. One of the key uses of metadata is to incorporate a checksum. This essential piece of information tells the system if the object is valid. In other words, the system will know if the object has been changed by a user, corrupted or even deleted. In fact, object stores are typically set up as WORM (“write once read many”). This means that objects are not actually changed; rather, a new version is created. This provides the ability to roll back to a previous version, or to the state prior to a deletion.
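The checksum and versioning behaviour can be sketched as follows. This is a toy in-memory model under stated assumptions: SHA-256 stands in for whatever checksum a given store uses, and the class `WormStore` and its methods are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def checksum(data: bytes) -> str:
    """Assumed checksum function; real systems may use a different algorithm."""
    return hashlib.sha256(data).hexdigest()


class WormStore:
    """Toy write-once store: each write appends a new version, never overwrites."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[bytes, str]]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Store a new version and return its version number (starting at 1)."""
        versions = self._versions.setdefault(key, [])
        versions.append((data, checksum(data)))
        return len(versions)

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        """Return the requested version (latest by default) after verifying it."""
        versions = self._versions[key]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key!r} has no version {version}")
        data, stored_sum = versions[version - 1]
        if checksum(data) != stored_sum:
            raise ValueError("object failed its integrity check")
        return data
```

Writing the same key twice leaves both versions readable, so `get("doc", 1)` still returns the original content after a later `put`.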

The combination of checksums and WORM functionality makes object stores ideal for long term archival of data.

Firstly, by automatically keeping a copy of each object, the system can be made self-healing. If an object is corrupted or lost, the system knows and can recreate the original from the copy. Object stores are mostly designed with a scale-out architecture. If the store is spread across two geographic locations with a copy in each, then arguably the data no longer needs backing up to tape. This in itself can dramatically reduce operational costs.

Secondly, each object carries its own integrity information, meaning that objects can be migrated from one store to another whilst retaining proof of their original content. This also means that the underlying hardware can be upgraded as new technology is developed and the objects remain intact indefinitely (with full chain of custody). The data can now outlive the infrastructure and the applications which created it.

Thirdly, the metadata can include information pertaining to confidentiality and retention period. Object stores can act upon these details, making them ideal for enforcing legal compliance and governance policies.
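Two of these points, self-healing from a second copy and acting on a retention period, can be sketched in a few lines. Everything here is illustrative: the dictionaries stand in for two storage locations, SHA-256 is an assumed checksum, and `retain_until` is a hypothetical metadata field holding an ISO date.

```python
import hashlib
from datetime import date
from typing import Dict, Optional


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_with_repair(key: str, primary: Dict[str, bytes],
                     replica: Dict[str, bytes], checksums: Dict[str, str]) -> bytes:
    """Return the object, restoring the primary copy if it is missing or corrupt."""
    data: Optional[bytes] = primary.get(key)
    if data is None or sha256(data) != checksums[key]:
        good = replica[key]
        if sha256(good) != checksums[key]:
            raise IOError("both copies failed the integrity check")
        primary[key] = good  # self-heal: rewrite the damaged or lost copy
        data = good
    return data


def may_delete(metadata: Dict[str, str], today: date) -> bool:
    """Allow deletion only once the 'retain_until' date has passed."""
    return today > date.fromisoformat(metadata["retain_until"])
```

The same checksum comparison used in `read_with_repair` is what would let a migrated object prove that its content is unchanged on arrival in a new store.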

So you may ask, why are object stores not taking over the world of storage?

Well, in one respect, they already are. Many cloud providers have built infrastructures on home-grown object storage technology. This has given them a leading-edge advantage over traditional solutions, which is a strong reason why they have not advertised the fact too loudly. Vast quantities of data are sitting in object stores. IDC forecast that worldwide revenue for file-based and object-based storage will reach $38 billion by 2017, a huge jump from the market's estimated $23-billion-plus revenue in 2013.

As organisations increasingly look to the benefits of cloud-type infrastructure, whether public (outsourced), private (in-house) or hybrid (a combination), object storage will become a key consideration. Traditional file systems will become restrictive and impact the business, but the point at which this happens is a matter of scale. The cloud service providers are leading the charge, but many others may well follow.

Object storage does require change in terms of technology and, significantly, in terms of processes. These may be reasons for slower adoption. Another concern might be that object stores have not yet become standards-based, meaning there is potential for vendor lock-in. Some solutions are more open than others.

Is object storage right for you?

This will of course depend on your situation. If any of the following apply to you, then object storage could be a consideration:

       Massive growth in unstructured data which is straining traditional storage systems.

       Long term retention or specified retention term for data especially where chain of custody is a requirement (data compliance and governance).

       Archive economics – when the savings associated with archiving data away from primary storage and out of the tape backup cycle can outweigh the costs of implementing an active archive with object storage, creating a business case for change.

       Big Data – when silos of data, by application or physical storage, are impeding the ability to find and retrieve information for analysis or decision support.

       Cloud services – when you need to provide a multi-tenanted data service to your customers, whether internal or external to your organisation.

Object storage will not replace traditional SAN and NAS solutions; these have advantages of their own for many applications. However, as unstructured data continues to grow, object storage will become a complementary and commonplace addition to storage infrastructures everywhere.

Unclear about SAN and NAS - check out