They say that information is power, but in a world where the majority of information is digitised and stored electronically, that power can only be realised if the information can be found, and found in the context of what you need to know. Digital information needs to be contextual and retrievable.
This has long been solved for structured data, which resides within databases. These provide relational context and indexes for search and retrieval. But what about unstructured file data, which is where the vast majority of information now lives? The current term for this challenge is Big Data.
One of the key technologies which will help with this
challenge is little known and does not come with a fancy title – it is called
object storage.
To best explain what object storage is, we need to start with the ones and zeros of data, then work our way towards the information that we intend to work with.
Hard drives are at the heart of most storage systems today. Data is retained in blocks of ones and zeros. SAN storage systems expose these blocks to the applications running on connected servers. This can be highly efficient, for example with databases, which require fast, granular access to the data.
Users, however, typically need to work with files: spreadsheets, Word documents, slide decks, images, emails and so on. Each file comprises a sequence of data blocks which together make up the file. Furthermore, the file must reside in a file system, which provides a nested folder structure so its location can be indexed. This functionality is provided by an operating system which sits logically above the hard drives. It can be integrated into the storage system (making it a NAS device) or can be external, in the form of a server (file server) or dedicated appliance (NAS gateway or head).
SAN and NAS respectively provide block and file access to data, and this has historically catered for most organisations’ needs. However, there are some shortcomings. The fastest growing form of data is unstructured or file data. Files are getting bigger, more numerous and rarely get deleted. As file systems get larger, they slow down, and in fact they have hard limits to their scalability. They are also restrictive in the way that information can be searched, due to the non-contextual nature of the nested folder structure (which is determined largely by individual users). This system works well enough at an individual level because we all apply some logic to how we organise our files. However, at an organisational level, this logic is unknown. So, for instance, finding all files which contain “confidential information” relating to a particular customer might be nigh on impossible.
With object storage, we have the ability to build context
and structure right into the file itself. To achieve this, we wrap the file
with “metadata”. This is information about the file. The combination of data
and metadata is called an object.
This sounds simple, but the implications are enormous.
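To make the idea concrete, here is a rough, vendor-neutral sketch in Python (purely for illustration – not any particular product’s API) of what an object amounts to: the file’s bytes, a free-form set of metadata, and a unique identifier in place of a folder path.

    # A rough sketch of an "object": the file's data plus descriptive metadata,
    # addressed by a unique identifier rather than by a folder path.
    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class StorageObject:
        data: bytes                                   # the file content itself
        metadata: dict = field(default_factory=dict)  # information about the file
        object_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    contract = StorageObject(
        data=b"...file contents...",
        metadata={"customer": "ACME Ltd", "type": "contract", "confidential": True},
    )
    print(contract.object_id, contract.metadata["customer"])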
Scalability and Performance. Object stores remove the nested folder structure which is the barrier to very large datasets. Information is found instead by searching the metadata for what you need, much as search engines find information on the internet. If you like your files structured by date, by customer, by whatever – not a problem: this can be added to the metadata. You can even simulate a whole nested folder system by including the path in the metadata for each object.
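As a toy illustration of that search-driven approach (real object stores index metadata far more efficiently than a linear scan), retrieval becomes a query over metadata fields, and a simulated folder path is just another field:

    # Toy flat "store": no folders, objects are found by matching metadata fields.
    objects = [
        {"id": "a1", "metadata": {"customer": "ACME Ltd", "year": 2013,
                                  "path": "/sales/acme/q3-forecast.xlsx"}},
        {"id": "b2", "metadata": {"customer": "Globex", "year": 2013,
                                  "path": "/sales/globex/contract.docx"}},
    ]

    def find(store, **criteria):
        """Return every object whose metadata matches all of the given fields."""
        return [o for o in store
                if all(o["metadata"].get(k) == v for k, v in criteria.items())]

    print(find(objects, customer="ACME Ltd"))                  # search, not browse
    print(find(objects, path="/sales/globex/contract.docx"))   # a simulated folder path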
Big Data. Because objects have contextual information
included, organisations can search across datasets and extract just the
information they need. A great example would be in a hospital environment. It is quite possible to store patient records such that personally identifiable information is held in the metadata, with access to it restricted. Analytics could then be run across patient data with total anonymity retained.
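Sketched very loosely (the field names below are invented for the example), that separation might look like this: identity lives in the metadata, and the analytics view never touches it.

    # Identity lives in metadata; the analytics view only ever sees the
    # anonymised clinical payload. All field names are invented for the example.
    records = [
        {"metadata": {"patient_name": "J. Smith", "nhs_number": "123 456 7890"},
         "data": {"age_band": "60-69", "diagnosis": "type 2 diabetes"}},
        {"metadata": {"patient_name": "A. Jones", "nhs_number": "987 654 3210"},
         "data": {"age_band": "40-49", "diagnosis": "hypertension"}},
    ]

    def analytics_view(store):
        """Return clinical data only; identifying metadata is never exposed."""
        return [record["data"] for record in store]

    for row in analytics_view(records):
        print(row)   # no names or numbers ever reach the analyst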
Cloud. Massive scale plus the ability to apply security
policies to data based on fields within the metadata makes object storage a
great solution for service providers who store data on behalf of customers.
Storage can be carved up into virtual containers based on who owns the data (multi-tenancy).
This is exactly what the likes of Amazon, Google and many others are basing
their businesses on.
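One way to picture that carving-up – again just a sketch, not any provider’s actual mechanism – is a per-tenant ownership check on every container before an object is handed back:

    # Every container is owned by a tenant; access is granted only when the
    # requesting tenant matches the owner. Names are invented for illustration.
    containers = {
        "acme-bucket":   {"owner": "tenant-acme",   "objects": {"invoice-001": b"..."}},
        "globex-bucket": {"owner": "tenant-globex", "objects": {"contract-17": b"..."}},
    }

    def get_object(tenant, container, key):
        entry = containers[container]
        if entry["owner"] != tenant:
            raise PermissionError(f"{tenant} may not read from {container}")
        return entry["objects"][key]

    print(get_object("tenant-acme", "acme-bucket", "invoice-001"))
    # get_object("tenant-acme", "globex-bucket", "contract-17")  -> PermissionError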
Archive. One of the key uses of metadata is to incorporate a checksum. This essential piece of information tells the system whether the object is still valid. In other words, the system will know if the object has been changed by a user, corrupted or even deleted. In fact, object stores are typically set up as WORM (“write once, read many”). This means that objects are never actually changed; rather, a new version is created. This provides the ability to roll back to previous versions, or to a point prior to deletion.
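A minimal sketch of that behaviour (not a production design) might record a checksum with every write and append a new version rather than ever overwriting:

    # Minimal WORM sketch: every write appends a new version with its own
    # checksum; nothing is modified in place, so older versions stay available.
    import hashlib

    class WormStore:
        def __init__(self):
            self.versions = {}   # key -> list of {"data": ..., "checksum": ...}

        def put(self, key, data: bytes):
            entry = {"data": data, "checksum": hashlib.sha256(data).hexdigest()}
            self.versions.setdefault(key, []).append(entry)   # append, never overwrite

        def get(self, key, version=-1):
            entry = self.versions[key][version]
            if hashlib.sha256(entry["data"]).hexdigest() != entry["checksum"]:
                raise ValueError(f"{key} failed its integrity check")
            return entry["data"]

    store = WormStore()
    store.put("policy.docx", b"draft one")
    store.put("policy.docx", b"final text")       # a new version, not an edit
    print(store.get("policy.docx"))               # latest version
    print(store.get("policy.docx", version=0))    # roll back to the original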
The combination of checksums and WORM functionality makes object stores ideal for the long-term archival of data.
Firstly, by automatically keeping a copy of each object, the
system can be made self-healing. If an object is corrupted or lost, the system
knows and can recreate the original from the copy. Object stores are mostly
designed with a scale-out architecture. If the store is spread across two
geographic locations with a copy in each, then arguably the data no longer
needs backing up to tape. This in itself can dramatically reduce operational costs.
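The self-healing idea can be sketched in the same spirit: keep a copy at each site, compare checksums on a scrub, and rebuild a bad copy from its healthy twin (purely illustrative):

    # Scrub-and-repair: if one site's copy no longer matches the recorded
    # checksum, it is rewritten from the healthy copy at the other site.
    import hashlib

    def checksum(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    site_a = {"scan-42": b"original content"}
    site_b = {"scan-42": b"original content"}
    expected = {"scan-42": checksum(b"original content")}

    site_b["scan-42"] = b"corrupted!"   # simulate silent corruption at one site

    for key, digest in expected.items():
        for damaged, healthy in ((site_a, site_b), (site_b, site_a)):
            if checksum(damaged[key]) != digest and checksum(healthy[key]) == digest:
                damaged[key] = healthy[key]   # self-heal from the good copy

    print(site_b["scan-42"])   # back to b'original content'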
Secondly, each object carries its own integrity, meaning that objects can be migrated from one store to another whilst retaining proof of their original content. This also means that the underlying hardware can be upgraded as new technology is developed, and the objects remain intact indefinitely (with a full chain of custody). Now the data can outlive the infrastructure and the applications which created it.
Thirdly, the metadata can include information pertaining to confidentiality and retention period. Object stores can act upon these details, making them ideal for enforcing legal compliance and governance policies.
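A simple sketch of that enforcement, driven purely by metadata (again with invented field names): the store refuses to delete anything whose declared retention date has not yet passed.

    # Retention enforced from metadata alone: deletion is refused until the
    # object's declared retention date has passed. Field names are invented.
    from datetime import date

    catalogue = {
        "board-minutes-2013": {"retain_until": date(2020, 12, 31), "confidential": True},
        "press-release-2013": {"retain_until": date(2014, 6, 30),  "confidential": False},
    }

    def delete_object(key, today=None):
        today = today or date.today()
        meta = catalogue[key]
        if today < meta["retain_until"]:
            raise PermissionError(f"{key} is under retention until {meta['retain_until']}")
        del catalogue[key]

    delete_object("press-release-2013", today=date(2015, 1, 1))    # allowed
    # delete_object("board-minutes-2013", today=date(2015, 1, 1))  # -> PermissionError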
So you may ask, why are object stores not taking over the
world of storage?
Well, in one respect, they already are. Many cloud providers have built their infrastructures on home-grown object storage technology. This has given them a leading-edge advantage over traditional solutions, which is a strong reason why they have not been advertising the fact too loudly. Vast quantities of data are already sitting in object stores. IDC forecast that worldwide revenue for file-based and object-based storage will reach $38 billion by 2017, a huge jump from the market's estimated $23-billion-plus revenue in 2013.
As organisations increasingly look to the benefits of cloud-type infrastructure, whether public (outsourced), private (in-house) or hybrid (a combination of the two), object storage will become a key consideration. Traditional file systems will become restrictive and impact the business, but it’s a matter of scale. The cloud service providers are leading the charge, but many others may well follow.
Object storage does require change, in terms of technology and, more significantly, in terms of processes. These may be reasons for slower adoption. Another concern might be that object stores are not yet standards-based, meaning there is potential for vendor lock-in; some solutions are more open than others.
Is object storage right for you?
This will of course depend on your situation. If any of the
following apply to you, then object storage could be a consideration:
• Massive growth in unstructured data which is straining traditional storage systems.
• Long term retention, or a specified retention term for data, especially where chain of custody is a requirement (data compliance and governance).
• Archive economics – when the savings associated with archiving data away from primary storage and out of the tape backup cycle can outweigh the costs of implementing an active archive with object storage, creating a business case for change.
• Big Data – when silos of data, by application or physical storage, are impeding the ability to find and retrieve information for analysis or decision support.
• Cloud services – when you need to provide a multi-tenanted data service to your customers, whether internal or external to your organisation.
Object storage will not replace traditional SAN and NAS solutions; these have advantages of their own for many applications. However, as unstructured data continues to grow, object storage will become a complementary and commonplace addition to storage infrastructures everywhere.
Unclear about SAN and NAS? Check out www.2decipher.com