Friday 22 February 2019

3 ways data analytics can fail

Whether you are using a spreadsheet or advanced data analytics with machine learning or AI - unless you are vigilant, your foraging for insights from data can so easily go wrong.

A big part of the problem is that ultimately, data is always viewed through a human lens. Unfortunately, our brains did not evolve to process complex numerical data and our instincts, biases and desires can significantly alter our understanding of the data being presented to us. We have no way of knowing if data is right or wrong simply by looking at it. So we have to be very careful when making decisions based on it.

This short video highlights 3 causes of insight failure, each compounding the risk that our decision making is flawed.

  • GIGO - Garbage In Garbage Out
  • People see what they want to see
  • Lies, damned lies and statistics
I also include some hints at ways to avoid these problems.
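
As a rough illustration of the GIGO point, here is a minimal Python sketch. The sales figures, the `clean_readings` helper and the plausible range of 0 to 1000 are all made up for the example. It simply shows how one mistyped cell can wreck an average unless the data is checked before it is trusted.

```python
from statistics import mean


def clean_readings(rows, low, high):
    """Split raw cells into usable numbers and rejected cells."""
    good, rejected = [], []
    for row in rows:
        try:
            value = float(row)
        except (TypeError, ValueError):
            # Blank cells, None and text that is not a number end up here.
            rejected.append(row)
            continue
        if low <= value <= high:
            good.append(value)
        else:
            # Numeric, but outside the range we consider plausible.
            rejected.append(row)
    return good, rejected


# Hypothetical monthly sales figures, including a typo and two empty cells.
raw = ["120", "135", "", "128", "12800", None, "131"]

good, rejected = clean_readings(raw, low=0, high=1000)

naive_average = mean(float(r) for r in raw if r)  # garbage in ...
checked_average = mean(good)                      # ... versus checked data

print(f"naive: {naive_average:.1f}  checked: {checked_average:.1f}")
print(f"rejected cells: {rejected}")
```

The rejected cells are returned rather than silently dropped, so a person can decide whether 12800 was a typo or a genuinely exceptional month.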

Tuesday 12 February 2019

Data Analytics - past, present and future

The desire to look into the future is as old as humanity itself. From wanting to know where the next meal is coming from to placing bets on the Grand National - a view of the future has an obvious attraction.

Nowadays, businesses are turning from witchcraft and wizardry and looking to something a little more scientific. They want to figure out what is to come and give themselves an edge. 

These days, it's all about data science.

New technologies like artificial intelligence, machine learning and cognitive systems, coupled with the promise of Big Data are creating the illusion that all questions can be answered - even those which pertain to the future.

The challenge, of course, is that the only data we have is from the past. And whilst you may subscribe to the view that past performance predicts future performance, sooner or later you will be disappointed. It's just a question of when.

Of course, data science can, if wielded correctly, sustain a competitive advantage. It can swing the odds in your favour as you navigate an uncertain future. If your predictions are 10% more accurate than your competitors - well that is an edge. 

It's like the adage about outrunning a bear. You don't have to outrun the bear; you just need to run faster than the person next to you.

To find out more about using data from the past to foresee the future and the role of data scientists - watch this video on Data Analytics. This is Part 1 - from descriptive to prescriptive.

Tuesday 22 August 2017

Artificial Intelligence: what will they think of next?

Back in the 1980s, computers started to hit the masses with products like the Sinclair ZX80, the Commodore 64 and the BBC Micro. Many kids at school got the computing bug. This was nothing to do with gaming (that did not exist yet); it was all to do with coding.

There was a saying which neatly summed up the interactions with those early computers:

Computers don't always do what you want. They always do what you tell them.

The apparent "intelligence" of a computer was not derived from the computer itself but from the intelligence of the coders who wrote the programmes which it ran. So if you wrote bad code, the computer appeared to misbehave. My computer spent a lot of time on the naughty step.

I remember being impressed and amused when I came across an early computer running a programme called Eliza (originally developed in the mid-1960s). Communicating via the keyboard, you could actually converse with an imaginary person within the computer. The responses implied some form of intelligence, but after a short time you realised Eliza was pretty dumb, if nevertheless likeable. Eliza was simply a set of rules which matched patterns in your text in order to mimic intelligent conversation.
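
To give a flavour of how little is going on under the bonnet, here is a minimal Python sketch of the pattern-and-response idea. The three rules and the default reply are my own illustrations, not the original Eliza script, and the sketch skips refinements such as swapping "my" for "your" in the echoed text.

```python
import re

# Illustrative rules only: (pattern to look for, reply template).
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(text):
    """Return the reply for the first rule whose pattern matches the text."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            # Echo the captured fragment of the user's own words back at them.
            return template.format(*match.groups())
    return DEFAULT_REPLY


for line in ["I need a holiday.", "I am tired of coding", "Nice weather"]:
    print(f"> {line}\n{respond(line)}")
```

There is no understanding here at all. The programme reflects your own words back at you, which is why the illusion wears off after a few exchanges.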

One of the most significant definitions of thinking machines came from Alan Turing (best known for code breaking in the Second World War and for being the main character in the film The Imitation Game). He proposed a test in which a machine would be set up to converse with a human remotely via a screen and text only. If the human could not detect that they were conversing with a machine, then the so-called Turing Test would be passed.

Eliza is often cited as an early example of a machine that could, for a short while, fool people in this way, but it was certainly not an example of artificial intelligence.

There is no universal definition of artificial intelligence, but here are some characteristics which commonly apply:

A machine, computer system or software which can:
  • carry out tasks normally requiring human intelligence
  • think for itself and make informed decisions
  • learn for itself, adding to its pool of knowledge beyond that of the original programme
  • generate original ideas
  • make predictions based on analysis of past (or perhaps simulated) experiences

There have been philosophical debates about whether man-made machines can truly "think" going back to Socrates and Plato. Questions of sentience, feelings and the possession of a soul by machines have been explored by science fiction writers like Philip K Dick in "Do Androids Dream of Electric Sheep?" through to the adventures of Data in Star Trek: The Next Generation.

The challenge we are about to face is more down to earth. Philosophy aside, if a machine can accurately mimic the behaviour of a human, think and carry out tasks for itself, then why not have it do those jobs which humans are doing today?

We are living in an information age. The vast majority of us are "information workers". Robots in this context are not restricted by physical form, and artificial intelligence can already carry out information worker tasks as well as or better than humans, and certainly more cheaply. Eliza is arguably coming back to haunt us in the guise of the latest generation of chatbots. These 21st century editions are not constrained by keyboards and text (although you will find this communication mode used on websites). Nowadays, they can use natural speech to converse with customers and even translate between languages. We are even seeing 24-hour shopkeepers and helper systems being put into homes with names like Alexa, Cortana and, unimaginatively, Google Home.

These forms of artificial intelligence are still very basic. They certainly have no feelings and will quite happily make you redundant. Predictions about how many jobs are truly at risk vary enormously but have one thing in common - big numbers. I am heartened by one of the lessons of history. Back in the 60s, around the time Eliza and I were being conceived, household appliances like washing machines, blenders and toasters were set to make life easy. Similar advances at work, including the rise of computers, were going to lead to three day weeks and a leisurely life of luxury. This did not come to pass. Instead, ever increasing demand for more machines, computers, blenders and all manner of other things has led to shops open on Sundays, 60 hour weeks and a whole lot more stress in our lives. My point is that demand for human labour does not diminish, but the roles in demand will change.

Artificial intelligence is already moving beyond the bounds of the digital world and into the physical one. Driverless cars are a reality and if you are thinking, well I haven't seen one, I can assure you it won't be too long. Southern Rail staff in the UK have been going through a long painful battle against the introduction of trains with driver operated doors on the grounds that not having a guard do this reduces passenger safety. A time will come when driverless trains will prove to be safer than human operated ones and unfortunately, train driving as a career will be heading to a siding.

If you are thinking this topic is getting a bit dark, you ain't seen nothing yet. In 2014, one of the greatest thinkers alive today had this to say about AI:

"The primitive forms of artificial intelligence we already have, have proved very useful. But I think the development of full artificial intelligence could spell the end of the human race,"  Stephen Hawking in an interview with the BBC.

The book by Nick Bostrom, "Superintelligence: Paths, Dangers, Strategies" has become a bestseller. Bostrom is the founding director of Oxford University’s Future of Humanity Institute and so spends a lot of time thinking about this stuff. Not everyone agrees on when computer intelligence will match that of a human, but sometime well before the end of this century seems to be the consensus. Bostrom does not predict it will all go wrong but highlights a number of possible ways that it could.

In the meantime, trying not to have nightmares about Terminator coming true, artificial intelligence is on the cusp of making a real difference to our lives both at home and in our work. Computers may start doing more than what we tell them and actually start showing some initiative. And that has surely got to be a good thing.

Gartner's latest view on AI - here

Wednesday 9 August 2017

What is Docker containerisation

Docker is an open source platform which encapsulates applications into highly efficient “containers”. Many containers can run on a host without interfering with each other. The community edition is available for free.

3 minute video version of this post is here.

Docker Inc is the commercial organisation which sells Docker Enterprise Edition to organisations that need tools which are fully tested, validated and supported for use in an enterprise setting.

You may be familiar with server virtualisation from vendors like VMware and Microsoft. It enables the creation of virtual machines. Many virtual machines can run on a single host without interfering with each other.

It sounds a bit like Docker containers but the origin of server virtualisation is different to that of containers and on the inside they are different too.

The adoption of server virtualisation has been driven by the needs of IT departments to drastically reduce IT infrastructure running costs. Physical servers historically ran a single application each to ensure no conflicts occurred between applications. With requirements for lots of applications, that meant lots of expensive servers, all needing space, power and cooling. Advances in computing power and the development of virtualisation technology mean that a single host server can replace many physical servers and contain many virtual machines - each with its own operating system and application.

Containerisation, on the other hand, has been driven by the needs of application developers. As businesses increasingly use digital technology to become more competitive, so the need to update and add new applications faster has become critical. It is a technology which is linked to the rise in DevOps, whereby businesses are putting dedicated teams in place to develop new applications. These projects must easily migrate from test and development phase into production so that they can run reliably and at scale. Ongoing application development continues to require the code to move between environments seamlessly.

A big overhead when developing applications turns out to be the IT infrastructure itself. Software is typically developed by teams using a range of platforms like PCs, laptops, servers and even cloud-based systems. These may not be compatible. Each platform move requires customisation and testing for reliable operations. This really slows down progress and hinders collaboration across developer teams.

Containerisation starts with the application. An application does not need an entire dedicated operating system; it can make do with a cut down set of files called “bins and libraries”. Together, the application and these files can be put in a container. The container needs access to a shared operating system plus the Docker software. The operating system is typically Linux, with Microsoft Windows a more recent option.

Containers don’t each hold an operating system so they are much smaller than virtual machines and many more can fit into a single host. A container is much faster to start up and easier to maintain than a virtual machine, helping developers be more productive.

Although virtual machines can be easily moved around between hosts or even out to the cloud – this is typically under the control of IT departments. This functionality often comes at a cost. With containers, the developers don’t need to ask IT for help. Sharing containers, jointly developing applications and moving them from test to production to the cloud – and back – is quick and easy.

So when comparing containers with virtual machines – it's less to do with what they are, and more to do with why they came about in the first place.

 - Server virtualisation is helping IT departments do more for less.
 - Containerisation, led by Docker, is helping developers do more, faster.

Luckily they can also co-exist. Containers can run inside virtual machines, so organisations can benefit from the best of both.

Although Docker is not the only container technology, it is the most widely adopted and the most talked about. More information is available at

Sunday 19 February 2017

Internet of Things: the WHAT and WHY

There is a lot of hype about the Internet of Things. There must be something in it - but what?

50 billion devices connected to the Internet. 10 trillion Dollar opportunity. 17 zettabytes of data storage. 

I can read (or make up) statistics as well as the next person, but do these figures actually mean anything? I suspect they are somewhat missing the point.

In order to make some sense of IoT, we need to look at it from two different but interdependent perspectives:

  • What it is
  • Why it is

Watch the video version of this post here

The WHY needs to be understood for each and every IoT project. Without some sort of business return, projects will never get off the ground. However, without looking at the WHAT, it's very hard to start imagining how it can help us in the first place. So let's start there.

Most definitions of IoT start with something like "the interconnection via the Internet of devices (things), enabling them to send and receive data." This is not the whole picture. We need to include in our definition what can be done with this new data through analytics or data visualisation. We can potentially discover new business insights to help us reduce costs, provide new services or revenue streams, or improve our competitive edge.

This all sounds a bit fluffy until you apply it to an example:

  • Car insurance is highly competitive. Premiums are affected by a driver's history with factors like length of driving experience, no claims periods and previous accidents. IoT has enabled insurers to modify premiums based on actual driving behaviours. The addition of a "black box" to the insured car enables tracking of acceleration, speed and position. These devices are connected to the internet via the mobile network. Not only can the insurer monitor individual drivers, they can also use data collected across all their customers to start building a much richer picture of risks, for example by identifying high risk areas for collisions. This data could potentially be shared in anonymous form with external parties for a fee. Examples could be the police who may have an interest in areas where speeding is rife, or the Highways Agency interested in accident blackspots. With a better understanding of risks, the insurer can potentially undercut their competition or devise new policies which are tailored for specific demographics.
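
To make the black box example a little more concrete, here is a minimal Python sketch of turning readings into a premium. Everything in it is an assumption for illustration: the `Reading` fields, the harsh acceleration threshold, the discount and loading limits and the linear formula. Real insurers' models are not public and will be far richer.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    speed_mph: float   # speed reported by the black box
    limit_mph: float   # speed limit at the reported position
    accel_ms2: float   # acceleration in m/s^2; negative values are braking


HARSH_ACCEL_MS2 = 3.0  # assumed threshold for a harsh acceleration or braking event


def risk_score(readings):
    """Fraction of readings that show speeding or harsh acceleration/braking."""
    if not readings:
        return 0.0
    risky = sum(
        1 for r in readings
        if r.speed_mph > r.limit_mph or abs(r.accel_ms2) > HARSH_ACCEL_MS2
    )
    return risky / len(readings)


def adjusted_premium(base_premium, score, max_discount=0.2, max_loading=0.3):
    """Scale the premium linearly from a discount (score 0) to a loading (score 1)."""
    factor = (1 - max_discount) + score * (max_discount + max_loading)
    return round(base_premium * factor, 2)


# Hypothetical trip: one reading over the limit and one harsh braking event.
trip = [
    Reading(28, 30, 0.5),
    Reading(35, 30, 0.4),
    Reading(30, 30, -4.2),
    Reading(29, 30, 1.0),
]
score = risk_score(trip)
print(f"risk score {score:.2f}, premium {adjusted_premium(600, score):.2f}")
```

The same readings, aggregated by position across every customer, are what would feed the richer picture of high risk areas described above.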

There are some technology trends which are combining to enable organisations to realise value from IoT projects which would not have been feasible in the past. 

These are why IoT has become such a hot topic.
  • Sensors can be wirelessly connected to the internet within small battery powered modules which can last for years without replacement.
  • Mobile carriers are using IP based protocols and developing low cost narrowband connections. These support use cases which are becoming commercially feasible for the first time.
  • Real-time and historical analytics capability has become affordable and readily available to a wide audience and can be consumed in multiple ways including on-premise or cloud.
  • A number of vendors are productising these solutions and making it easier and quicker for customers to leverage the Internet of Things.
The two main challenges for anyone wanting to cash in on IoT are directly related to our two perspectives:
  • WHAT is it
    Projects span a huge range of technologies from "things" through to data analytics. It will be hard for a single supplier to provide expertise and services capability across the whole spectrum. A partnership approach will often be the only workable option.
  • WHY is it
    Even with expertise across the IoT spectrum, projects will never fly without a strong business case. Suppliers will need to couple their IoT skills with a deep understanding of their customer's business and engage using a consultative approach.
Finally, IoT projects can be bespoke and costly. The secret to success will be the identification of verticals where solutions can be developed with a wider market appeal.

Tuesday 25 October 2016

Exploring hyper-converged without the hype

Hyper-converged solutions from a number of vendors (new and old) have grabbed the headlines and become popular with organisations which need a simple, small footprint package including hypervisor, compute, networking and storage which can scale-out to meet their needs. This architecture is being touted as the "software defined" future of the data centre which can eliminate IT silos and fit into any size of organisation.
By using software to both glue the components together and to provide a simple management interface, these systems can be deployed quickly and are space efficient with a low management overhead.

Hyper-converged infrastructure (or HCI) also has the advantage that all the hardware components are housed in a standard x86 server chassis (often with more than one server "node" per box), which makes it high density and low cost.


Watch this short video to see how HCI compares to traditional and converged solutions.



However, there are a few limitations to watch out for:

  1. You cannot grow storage capacity without investing in compute too (which you may not need). 
  2. Performance will be restricted by the network connecting the system nodes – the more you add, the greater the inefficiencies.
  3. The virtual SAN within the solution can only be accessed by virtual machines. So any non-virtualised database applications may not be able to use the storage. Note that some solutions may allow an iSCSI connection to the virtual SAN, but may not provide the low latency and performance needed.
  4. Although it uses relatively cheap hardware, the software to run the solution is not. Experience shows that in larger systems, traditional or converged solutions may be lower cost.
  5. Although designed to eliminate silos by providing a single scalable infrastructure, hyper converged can become its own silo when it cannot meet the requirements for all applications.
  6. Database applications are usually licensed by the processor cores that they "touch". In a hyper-converged solution, this could be the entire cluster, meaning high cost or breaches of licence agreements as nodes are added.
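
The licensing point is easiest to see with some arithmetic. The Python sketch below uses made-up numbers: 24 cores per node and a price of 5,000 per core are hypothetical, not real vendor figures. It also assumes, purely for comparison, that your licence terms allow the database to be pinned to a subset of nodes, which is something to check with the vendor.

```python
def licensed_cores(nodes, cores_per_node, db_nodes=None):
    """Cores to license: the whole cluster unless the database is pinned to db_nodes."""
    counted_nodes = nodes if db_nodes is None else min(db_nodes, nodes)
    return counted_nodes * cores_per_node


def licence_cost(cores, price_per_core):
    """Total licence cost for a given number of cores."""
    return cores * price_per_core


CORES_PER_NODE = 24      # hypothetical node specification
PRICE_PER_CORE = 5_000   # hypothetical price, not a real vendor figure

for nodes in (4, 8, 16):
    whole = licence_cost(licensed_cores(nodes, CORES_PER_NODE), PRICE_PER_CORE)
    pinned = licence_cost(
        licensed_cores(nodes, CORES_PER_NODE, db_nodes=2), PRICE_PER_CORE
    )
    print(f"{nodes:>2} nodes: whole cluster {whole:>9,}   pinned to 2 nodes {pinned:>7,}")
```

On these assumptions the whole-cluster bill doubles every time the cluster doubles, even though the database itself has not grown at all.
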
Some other considerations:
  • It's worth checking out a number of vendors as there are differences between them and specifications are also evolving very fast.
  • Consider whether you want an appliance type solution or a software only approach (meaning you can build your own solution).
  • Think about your hypervisor choice and how that fits with the rest of your estate. You will probably want to be able to migrate virtual machines across datacentres and maybe out to the cloud. So don't let the hypervisor choice stand in the way of this.
So in summary, hyper-converged or HCI can bring huge benefits in terms of flexibility, operational costs and the speed of deploying new applications. However, there are some limitations which need to be understood. In other words, go in with your eyes wide open.

Thursday 19 December 2013

Object Storage - unsung hero of Cloud and Big Data

They say that information is power, but in a world where the majority of information is digitised and stored electronically, that power can only be realised if the information can be found, and found within the context of what you need to know. Digital information needs to be contextual and retrievable.

This has long been solved in the case of structured data which resides within databases. These provide relational context and indexes for search and retrieval. But what about unstructured file data which is where the vast majority of information now lives? The current term for this challenge is Big Data. 

One of the key technologies which will help with this challenge is little known and does not come with a fancy title – it is called object storage.

To best explain what object storage is we need to start with the ones and zeros of data, then work our way towards the information that we intend to work with.

Hard drives are at the heart of most storage systems today. Data is retained in blocks of ones and zeros. SAN storage systems expose these blocks to the applications running on connected servers. This can be highly efficient, for example with databases which require fast granular access to the data.

Users, however, typically need to work with files like spreadsheets, Word documents, slide-decks, images, emails etc. Each file comprises a sequence of blocks of data which together make up the file. Furthermore, the file must reside in a file system which provides a nested folder structure so its location can be indexed. This functionality is provided by an operating system which sits logically above the hard drives. This can be integrated into the storage system (making it a NAS device) or can be external in the form of a server (File Server) or dedicated appliance (NAS gateway or head).

SAN and NAS respectively provide block and file access to data and this has historically catered for most organisations’ needs. However, there are some shortcomings. The fastest growing form of data is unstructured or file data. Files are getting bigger, more numerous and rarely get deleted. As file systems get larger, so they slow down and in fact have hard limits to their scalability. They are also restrictive in the way that information can be searched due to the non-contextual nature of the nested folder structure (which is determined largely by individual users). This system works OK at an individual level because we all apply some logic to how we organise our files. However, at an organisational level, this logic is unknown. So for instance, finding all files which contain “confidential information” relating to a particular customer, might be nigh on impossible. 

With object storage, we have the ability to build context and structure right into the file itself. To achieve this, we wrap the file with “metadata”. This is information about the file. The combination of data and metadata is called an object.

This sounds simple, but the implications are enormous.

Scalability and Performance. Object stores remove the nested folder structure which is the barrier to very large datasets. Information is found instead by searching through the metadata for what you need. It is similar to the way search engines find information on the internet. If you like your files structured by date, by customer, by whatever – not a problem. This can be added into the metadata. You can even simulate a whole nested folder system by including the path in the metadata for each object.
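
Here is a minimal Python sketch of that idea: a flat store where each object is data wrapped with metadata, and where things are found by searching the metadata instead of walking folders. The `ObjectStore` class and the metadata field names are hypothetical, and a real object store would index the metadata instead of scanning every object as this toy does.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store: a flat namespace of data plus metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        """Store data with its metadata and return the new object's ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = {"data": data, "metadata": dict(metadata)}
        return object_id

    def get(self, object_id):
        """Fetch an object's data directly by ID; there are no folders to walk."""
        return self._objects[object_id]["data"]

    def search(self, **criteria):
        """Return the IDs of objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj["metadata"].get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
contract = store.put(b"contract text", customer="Acme",
                     classification="confidential",
                     path="/sales/2013/contract.docx")
brochure = store.put(b"brochure text", customer="Acme", classification="public")
invoice = store.put(b"invoice text", customer="Globex",
                    classification="confidential")

# "All confidential files relating to a particular customer" becomes one query.
confidential_acme = store.search(customer="Acme", classification="confidential")
print(confidential_acme == [contract])
```

Note the `path` field on the first object. That is all it takes to simulate a nested folder for anyone who still wants one.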

Big Data. Because objects have contextual information included, organisations can search across datasets and extract just the information they need. A great example would be in a hospital environment. It is quite possible to store patient records such that personally identifiable information is held in the metadata and access to it is restricted. Analytics could then be run across patient data with anonymity retained.

Cloud. Massive scale plus the ability to apply security policies to data based on fields within the metadata makes object storage a great solution for service providers who store data on behalf of customers. Storage can be carved up into virtual containers based on who owns the data (multi-tenancy). This is exactly what the likes of Amazon, Google and many others are basing their businesses on.

Archive. One of the key uses of metadata is to incorporate a checksum. This essential piece of information tells the system if the object is valid. In other words, the system will know if the object has been changed by a user, corrupted or even deleted. In fact, object stores are typically set up as WORM (“write once read many”). This means that objects are not actually changed; rather, a new version is created. This provides the ability to roll back to a previous version or to the state prior to deletion.

The combination of checksums and WORM functionality makes object stores ideal for long term archival of data.
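
A minimal Python sketch of checksums plus write-once versioning might look like the following. The `WormStore` class is hypothetical and SHA-256 is simply my choice of checksum for the example. Real products differ in the hash they use and in how versions are kept.

```python
import hashlib


class WormStore:
    """Toy write-once store: every write adds a version, nothing is overwritten."""

    def __init__(self):
        self._versions = {}  # name -> list of (checksum, data), oldest first

    def put(self, name, data):
        """Add a new version and return its version number (starting at 1)."""
        checksum = hashlib.sha256(data).hexdigest()
        self._versions.setdefault(name, []).append((checksum, data))
        return len(self._versions[name])

    def get(self, name, version=None):
        """Return the latest version, or an earlier one, after verifying it."""
        versions = self._versions[name]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name}: no version {version}")
        checksum, data = versions[version - 1]
        if hashlib.sha256(data).hexdigest() != checksum:
            raise ValueError(f"{name}: stored data does not match its checksum")
        return data


archive = WormStore()
archive.put("report", b"first draft")
archive.put("report", b"final copy")

print(archive.get("report"))             # the latest version
print(archive.get("report", version=1))  # rolled back to the original
```

Because the checksum travels with the object, any later change to the stored bytes shows up the next time the object is read.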

Firstly, by automatically keeping a copy of each object, the system can be made self-healing. If an object is corrupted or lost, the system knows and can recreate the original from the copy. Object stores are mostly designed with a scale-out architecture. If the store is spread across two geographic locations with a copy in each, then arguably the data no longer needs backing up to tape. This in itself can dramatically reduce operational costs.
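
The self-healing idea can be sketched in a few lines of Python. This is an illustration under simple assumptions: two sites modelled as dictionaries, a separate record of the expected SHA-256 checksums, and a hypothetical `heal` function. A real system would run this kind of check continuously in the background.

```python
import hashlib


def sha256(data):
    """Hex digest used as the object's checksum in this sketch."""
    return hashlib.sha256(data).hexdigest()


def heal(site_a, site_b, checksums):
    """Repair objects missing or corrupt at one site from the good copy at the other."""
    repaired = []
    for name, expected in checksums.items():
        ok_a = name in site_a and sha256(site_a[name]) == expected
        ok_b = name in site_b and sha256(site_b[name]) == expected
        if ok_a and not ok_b:
            site_b[name] = site_a[name]
            repaired.append((name, "b"))
        elif ok_b and not ok_a:
            site_a[name] = site_b[name]
            repaired.append((name, "a"))
        elif not ok_a and not ok_b:
            raise RuntimeError(f"{name}: no valid copy left at either site")
    return repaired


# Two sites start with identical copies of two hypothetical objects.
site_a = {"scan": b"x-ray image", "note": b"discharge summary"}
site_b = dict(site_a)
checksums = {name: sha256(data) for name, data in site_a.items()}

site_a["scan"] = b"bit rot"  # simulate corruption at site A
del site_b["note"]           # simulate a lost object at site B

repairs = heal(site_a, site_b, checksums)
print(repairs)
print(site_a == site_b)
```

The checksum is what tells the system which of the two copies to trust, which is why the copy alone would not be enough.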

Secondly, each object carries its own integrity, meaning that objects can be migrated from one store to another whilst retaining proof of their original content. This also means that the underlying hardware can be upgraded as new technology is developed and the objects remain intact indefinitely (with full chain of custody). Now, the data can outlive the infrastructure and the applications which created it.

Thirdly, the metadata can include information pertaining to confidentiality and retention period. Object stores can act upon these details making them ideal for enforcing legal compliance and governance policies.

So you may ask, why are object stores not taking over the world of storage?

Well in one respect, they already are. Many cloud providers have built infrastructures on home grown object storage technology. This has given them leading edge advantages over traditional solutions which is a strong reason why they have not been advertising these facts too loudly. Vast quantities of data are sitting in object stores. IDC forecast that worldwide revenue for file-based and object-based storage will reach $38 billion by 2017, a huge jump from the market's estimated $23-billion-plus revenue in 2013.

As organisations increasingly look to the benefits of cloud type infrastructure, whether public (outsourced) or private (in-house) or hybrid (combination), so object storage will become a key consideration. Traditional file systems will become restrictive and impact business, but it’s a matter of scale. The cloud service providers are leading the charge, but many others may well follow. 

Object storage does require change in terms of technology and significantly in terms of processes. These may be reasons for slower adoption. Another concern might be that object stores have not yet become standards based meaning there is potential for vendor lock-in. Some solutions are more open than others.

Is object storage right for you?

This will of course depend on your situation. If any of the following apply to you, then object storage could be a consideration:

  • Massive growth in unstructured data which is straining traditional storage systems.
  • Long term retention or specified retention term for data, especially where chain of custody is a requirement (data compliance and governance).
  • Archive economics – when the savings associated with archiving data away from primary storage and out of the tape backup cycle can outweigh the costs of implementing an active archive with object storage, creating a business case for change.
  • Big Data – when silos of data, by application or physical storage, are impeding the ability to find and retrieve information for analysis or decision support.
  • Cloud services – when you need to provide a multi-tenanted data service to your customers, whether internal or external to your organisation.

Object storage will not replace traditional SAN and NAS solutions, which have advantages of their own for many applications. However, as unstructured data continues to grow, object storage will become a complementary and commonplace addition to storage infrastructures everywhere.

Unclear about SAN and NAS - check out