Article updated June 10th, 2014 – scroll to the Update section below for updated comments.

 

Software defined seems to be the latest buzz phrase doing the rounds: software-defined storage, networking, and datacentres are hitting the marketing feeds and opinion pieces now that terms like Cloud are considered mainstream and no longer leading edge enough for the technology writers and vendors looking for the next paradigm.

Because Nutanix supply their hyperconverged compute solution as software bundled with hardware, there have been many comments that their product isn’t truly software defined. It is, despite the hardware, and this is why.

In everything that they do, Nutanix are a software company. Their product is the Nutanix Operating System (NOS), which forms part of the Virtual Computing Platform. They do not produce any custom hardware: everything that NOS runs on is commodity x86 hardware, with no custom ASICs, drives, NICs, boards, etc. The reason they provide hardware with their software solution is very simple – supportability and customer service.

I run a modest hosting company and, being extremely budget conscious (as in, I didn’t have any!), I looked for the cheapest route to market that I could while still feeling somewhat secure about the service I provide. The problem is that this is a lot harder than you may think: in the complex world of virtualisation, hardware compatibility is still very much an issue. It may be abstracted away from the guest VMs, but the poor old infrastructure manager has it in spades.

Last year I had two problems that showed this in high relief:

The first was a BIOS issue we encountered soon after buying four identically specified Dell PowerEdge 715s in May 2013. Not long after they entered production we began seeing random PSoDs (the good old VMware PURPLE Screen of Death when it kernel panics) on these servers. Multiple calls to both Dell and VMware resulted in attempts at hardware replacement, before it surfaced that a bug had been introduced into the BIOS at version 3.0.4.

This took three months and two attempts before it was finally fixed in BIOS 3.1.1, but even then we had PSoDs (different ones this time) occurring at random. After another two months these were eventually traced to the SD cards we used to boot ESXi and an interaction with an SD card on the DRAC remote management card. Disabling the DRAC SD card stabilised the system, some seven months after purchase.

The second was the need to go to SSD storage for workloads which were soaking our poor EqualLogic and other HDD SANs. Again, budget was so limited that we turned to building a SuperMicro box, filling it with Samsung 840 Pro drives (not Enterprise-class at the time, though they have since proved themselves in use because they are so brilliant at what they do), and putting StarWind (on Win2K8R2) across the top. This solution worked brilliantly and has never failed, but it is, at the end of the day, single controller by its nature, so it was always at the back of my mind that a nasty shock was waiting for me if a DIMM popped or even if Windows crashed (hardly unlikely in my experience).

Before you ask, no, we couldn’t afford HA at that time: all those extra disks, the duplicate chassis, and the increased StarWind license fees made it eye-watering. We have since created just such a beast in our new facility, and its HA nature now goes some way to assuaging the fact that it is still a home-grown all-flash SAN solution – just one with better uptime potential.

So when the opportunity arose to create a second site that our production services could move to, it was time to bite the bullet. After a number of investigations, including Maxta and SimpliVity (VSAN was still in beta at this time), I decided on Nutanix to deliver the new platform.

Now, nobody can accuse Nutanix of being cheap: it is targeting the Enterprise space and I am definitely not Enterprise. In fact, I may have one of the smallest companies on Earth with a production implementation of this; it took a significant bite out of my turnover, let alone my IT budget. However, what attracted me was that the hardware was fully tested and compliant, such that should a problem occur there would be no finger pointing from the “software” vendor and no wasted days before support was actually engaged on the problem.

This, in a nutshell, is where the value of the Nutanix appliance lies. It is fully tested and certified to work with their software, so should you call support they get straight onto what software problem it could be and don’t start blaming the BIOS, firmware, RAID card types, NICs, drives, etc., ad nauseam.

It may be that one day Nutanix offers a software-only SKU (technically it can be done today, but it isn’t on the price list), but they would have to enforce a rigorous HCL to make sure their incredible support levels weren’t diluted. In that case the hardware may not be as cheap to source independently as people reckon, you would need third-party hardware support on top (included for both hardware and software with the appliance), and of course if Nutanix aren’t selling hardware they either take a hit on their bottom line or increase their software prices. Nobody is in this for kudos; it’s business!

In summary, I chose Nutanix for one reason above the obvious raft of Enterprise features I was gaining: to be able to sleep at night knowing my home-grown, loosely cobbled-together hardware solution wasn’t going to go bump in the middle of the night.


 

Update:

A number of comments and articles have subsequently appeared (this being a prime example: https://www.theregister.co.uk/2014/06/09/hyper_converged_kit_what_for/) which this blog entry seems almost to have been pre-written to debunk. However, one point I didn’t make, because my tests came after this entry was posted, was the time to deploy.

I took a bare, out-of-the-box Nutanix block, racked it, connected it to a 10G switch, and powered it up. I already had their Foundation software running on my laptop, so when the nodes came online they appeared in the Foundation console through automatic IPv6 link-local scanning.
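For anyone wondering what IPv6 link-local scanning means in practice: I don’t know Foundation’s internals, so the port, payload, and protocol below are purely illustrative assumptions, but the general technique of probing the all-nodes link-local multicast address (ff02::1) on the local segment and collecting whichever nodes answer looks roughly like this Python sketch:

```python
import socket

# NOTE: the discovery port, payload and protocol here are assumptions made up
# for illustration -- Foundation's real mechanism is not documented in this post.
DISCOVERY_PORT = 13000   # hypothetical port
PROBE = b"DISCOVER"      # hypothetical payload
IFACE = "eth0"           # the laptop interface on the same L2 segment as the nodes


def discover_nodes(timeout=5.0):
    """Probe the all-nodes link-local multicast group (ff02::1) and collect
    the link-local addresses of anything on the segment that replies."""
    scope = socket.if_nametoindex(IFACE)
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, scope)
    sock.settimeout(timeout)

    # ff02::1 is the well-known "all nodes on this link" IPv6 multicast address,
    # so no DHCP or static addressing has to exist yet for this to work.
    sock.sendto(PROBE, ("ff02::1", DISCOVERY_PORT, 0, scope))

    found = set()
    try:
        while True:
            _, addr = sock.recvfrom(1024)   # addr = (host, port, flowinfo, scope_id)
            found.add(addr[0])              # e.g. fe80::... link-local address
    except socket.timeout:
        pass
    return sorted(found)


if __name__ == "__main__":
    for node in discover_nodes():
        print("node answered from", node)
```

The point is that the bare nodes need no addressing, DHCP, or configuration at all; being on the same switch as the laptop is enough for them to be found.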

I then hit the install button and over the course of the next 40 minutes all I had to do was watch a blue progress bar run from 0% to 100%. At 100% I had a fully configured Nutanix cluster with running, base-configured (as in accessible) ESXi 5.5 hosts.

In the following 20 minutes I installed a vCenter 5.5 appliance on one of the hosts, did the base configuration on port 5480 once it powered up, logged into vCenter with the VMware client, and added the hosts, so within an hour I had a functioning ESXi 5.5 cluster.
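Adding the hosts took only a few minutes of clicking in the client, but that step is easy to script too. Below is a rough pyVmomi (VMware’s Python SDK) sketch of what it could look like; the vCenter address, credentials, and cluster name are placeholders rather than anything I actually used:

```python
# A rough sketch of scripting the "add the hosts to vCenter" step with pyVmomi,
# VMware's Python SDK. Addresses, credentials, and the cluster name are
# placeholders; I did this step by hand in the client at the time.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.example.local"                     # hypothetical vCenter FQDN
HOSTS = ["esx1.example.local", "esx2.example.local",
         "esx3.example.local", "esx4.example.local"]  # the four Nutanix nodes

ctx = ssl._create_unverified_context()  # lab only: skip certificate validation
si = SmartConnect(host=VCENTER, user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Find the target cluster (assumed to already exist in the inventory).
cluster = None
for dc in content.rootFolder.childEntity:
    if not isinstance(dc, vim.Datacenter):
        continue
    for entity in dc.hostFolder.childEntity:
        if isinstance(entity, vim.ClusterComputeResource) and entity.name == "Nutanix":
            cluster = entity

for host in HOSTS:
    spec = vim.host.ConnectSpec(hostName=host, userName="root",
                                password="password", force=True)
    # AddHost_Task connects the ESXi host and places it in the cluster.
    # In practice vCenter may reject the connection until you supply the host's
    # SSL thumbprint (catch vim.fault.SSLVerifyFault and retry with it set).
    cluster.AddHost_Task(spec=spec, asConnected=True)

Disconnect(si)
```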

The Foundation software can do this for five blocks (20 nodes) simultaneously! So it’s reasonable to assume that even pretty large VMware clusters can be built, out of the box, in a day.

Can anybody else really do that?

There won’t be many.

Also, although this was VMware, the same can be done with Hyper-V, with Foundation installing and base-configuring Windows Server 2012 R2 in parallel on the nodes.

For KVM it’s even quicker, as the nodes come pre-installed from the factory and all you need to do is build the Nutanix cluster (again in parallel), so theoretically you are looking at around 20 minutes.