Virtualization has been at the core of production environments for decades, but in an ever-changing form. The latest of these variations are cloud and hyperconvergence; this article explains the latter.
What is hyperconverged infrastructure?
In a technical context, hyperconverged infrastructure (HCI) refers to a deployment that offers software-defined networking, storage and virtualization on a cluster of nodes. The important distinction from cloud deployments is that all of these services are hosted on every node in the cluster, instead of being split across separate clusters for storage or compute.
Hyperconverged deployments are heavily opinionated to offer easier setup and high-availability failover of workloads, at the cost of flexibility.
Comparing hyperconverged and cloud stacks
At first glance, cloud and HCI deployments seem very similar: they both offer high availability, durable storage, software-defined networking and virtualized workloads, but the way they reach this goal differs. Cloud uses clusters of separate machines for each component or service.
HCI instead runs all necessary services on every cluster node.
The most important differences when comparing them are:
Cloud stacks use separate internal services to provide networking, storage and compute, whereas HCI uses every node to store data, run VMs and connect them.
Cloud offers high flexibility at any scale, allowing operators to pick the best fit for every component, such as storage. HCI offers only one option for each task, but with higher automation.
Cloud infrastructure requires careful planning and maintenance, which slows down operations. HCI leans heavily into automation and ease of use, cutting down the time and expertise needed to run and maintain a cluster.
Cloud stacks prioritize reliability at scale, while HCI focuses on ease of use and making the most of available hardware.
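The first difference in the list above can be made concrete with a small model. The following is only a sketch: the role names, the `Node` class and both layout functions are made up for illustration and do not correspond to any real product's API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical service roles, named after the three services discussed above.
ROLES = frozenset({"compute", "network", "storage"})


@dataclass(frozen=True)
class Node:
    name: str
    roles: FrozenSet[str]


def cloud_layout(nodes_per_role: int) -> List[Node]:
    """Cloud style: a dedicated group of machines for each service."""
    return [
        Node(f"{role}-{i}", frozenset({role}))
        for role in sorted(ROLES)
        for i in range(nodes_per_role)
    ]


def hci_layout(node_count: int) -> List[Node]:
    """HCI style: every node runs every service."""
    return [Node(f"hci-{i}", ROLES) for i in range(node_count)]


def nodes_serving(layout: List[Node], role: str) -> List[str]:
    """Names of all nodes in the layout that provide the given service."""
    return [node.name for node in layout if role in node.roles]


print(nodes_serving(cloud_layout(2), "storage"))  # ['storage-0', 'storage-1']
print(nodes_serving(hci_layout(3), "storage"))    # ['hci-0', 'hci-1', 'hci-2']
```

In the cloud layout, six machines are needed before each service has two nodes behind it. In the HCI layout, three machines already give every service three nodes, which is the "making the most of available hardware" trade-off in miniature.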
Open source HCI products
Most people think of VMware or Nutanix when talking about HCI, but open source has mature and practical alternatives that are worth mentioning. There are many good contenders for HCI products in the open source world, and they fall into two categories:
The first are "true" HCI products, with opinionated components and high ease of use. These support only one specific option each for storage, networking and so on, which allows better automation and easier setup. Adding a node to the cluster immediately makes its disks available for storage, its CPU for VM workloads and its NICs for networking - no further setup required. Proxmox VE is the most popular in this category, with SUSE Virtualization (formerly Harvester) quickly gaining ground as an alternative.
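To illustrate what "no further setup required" means in practice, here is a minimal sketch of a cluster that pools resources as soon as a node joins. The `HciNode` and `HciCluster` classes are hypothetical and heavily simplified; real products also handle replication, quorum and network configuration behind the scenes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HciNode:
    name: str
    cpu_cores: int
    disks_tb: List[float]  # raw capacity of each disk in TB
    nics: int


@dataclass
class HciCluster:
    nodes: List[HciNode] = field(default_factory=list)

    def join(self, node: HciNode) -> None:
        """Joining is the only step: all of the node's resources are pooled."""
        self.nodes.append(node)

    @property
    def cpu_cores(self) -> int:
        return sum(node.cpu_cores for node in self.nodes)

    @property
    def raw_storage_tb(self) -> float:
        return sum(sum(node.disks_tb) for node in self.nodes)

    @property
    def nics(self) -> int:
        return sum(node.nics for node in self.nodes)


cluster = HciCluster()
cluster.join(HciNode("node-1", cpu_cores=16, disks_tb=[2.0, 2.0], nics=2))
cluster.join(HciNode("node-2", cpu_cores=32, disks_tb=[4.0], nics=4))

print(cluster.cpu_cores)       # 48
print(cluster.raw_storage_tb)  # 8.0
print(cluster.nics)            # 6
```

Note that the storage figure is raw capacity. Replicated storage will expose less than this to VMs, depending on how many copies of the data the cluster keeps.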
The second type is software that offers multiple options for each deployment component, with the correct combination resulting in a hyperconverged cluster. These products require more careful planning up front, but offer more flexibility in the long run, as they may be able to adapt to changing needs down the road. They are more complex to operate and skirt the line between cloud stacks and HCI solutions. Popular options include XCP-ng and oVirt.
It should be noted that the second type is slowly losing ground, as decision makers either favor the complete ease of use and automation of "true" HCI, or jump directly to extremely scalable cloud deployments. Taking a step in between is not economical for many businesses.
When to use cloud vs HCI
You should view HCI as a "step towards cloud stacks". While this view is oversimplified, it brings the core idea across quite well. HCI is a step up from traditional VM hosts, offering high availability and cutting down on operations and maintenance costs through automation. This frees up resources for small to medium businesses to scale and reinvest while making the most of the hardware they purchase.
When a business keeps growing beyond a few hundred machines, HCI eventually offers diminishing returns. The CPU overhead of disk read/write operations and networking adds up at scale, increasingly eating into the resources available to VMs. Networking eventually becomes congested with disk replication, VM migrations and normal VM traffic. In addition, a single VM cannot use more resources than the largest node in the cluster offers. These factors will eventually force a large business to leave HCI and traditional virtualization behind and invest in cloud stacks, despite their high initial cost and increased operational complexity.
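Two of these limits can be shown with simple arithmetic. The sketch below assumes a flat share of each node's CPU is consumed by storage and networking services; the 25% figure and the node sizes are invented purely for illustration and are not measurements of any product.

```python
def usable_cores(node_cores, overhead_fraction):
    """Cores left for VMs on each node after the assumed service overhead."""
    return [cores * (1 - overhead_fraction) for cores in node_cores]


def largest_vm(node_cores, overhead_fraction):
    """A VM runs on a single node, so the biggest node sets the upper bound."""
    return max(usable_cores(node_cores, overhead_fraction))


def fits_on_one_node(vm_cores, node_cores, overhead_fraction):
    return vm_cores <= largest_vm(node_cores, overhead_fraction)


nodes = [32, 32, 64]  # hypothetical core counts per node
overhead = 0.25       # assumed share of CPU spent on storage and networking

print(sum(usable_cores(nodes, overhead)))     # 96.0 of 128 raw cores
print(largest_vm(nodes, overhead))            # 48.0
print(fits_on_one_node(56, nodes, overhead))  # False
```

Even though 96 cores are free across the cluster in this example, a 56-core VM cannot be placed, because no single node has that many cores left for workloads.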