About container managers and runtimes
When working with containers, most people immediately think of the tools used to interact with them: docker, podman, kubernetes, etc. But those tools are only part of the infrastructure. At the heart lies a runtime, which actually sets up the container, including its isolation mechanisms and networking. Building on that is a container manager, which handles external tasks like restarting failed containers or starting containers at boot. Finally, a user interface (including CLI tools) sits on top of both, giving humans simplified control over a container's lifecycle.
Think of it as a car: the container runtime is like the engine, the heart of the vehicle but useless on its own. The container manager is like the parts that make it functional as a car, such as the gears and wheels. The user interface covers everything that lets a human control the vehicle: the steering wheel, the gas and brake pedals, and so on.
You can use any of these runtimes with your favorite management tools like docker, podman or k8s, shifting the speed and security tradeoffs of your containers according to your preferences.
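As a sketch of how that swap works in practice: with docker, additional OCI runtimes can be registered in /etc/docker/daemon.json and then selected per container. The runtime name and path below are placeholders, assuming the alternative runtime binary is installed at that location:

```json
{
  "runtimes": {
    "alt-runtime": {
      "path": "/usr/local/bin/alt-runtime"
    }
  }
}
```

After restarting the docker daemon, a container can opt in with something like `docker run --runtime=alt-runtime …`; podman accepts a similar `--runtime` flag pointing directly at a runtime binary.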
While there are many different kinds of container environments, we are focusing on runtimes that are fully compatible with the standards defined by the Open Container Initiative (OCI), also called OCI-compliant runtimes. This excludes runtimes that are not (yet) fully compatible with the broader container ecosystem, like systemd-nspawn, LXC and Firecracker.
runc
The most popular and oldest runtime is runc, initially part of the docker ecosystem. It has since been split out of docker itself so that other container managers can interface with it, sparking the creation of new container managers like CRI-O, which focuses on implementing only the Container Runtime Interface (CRI) of kubernetes (i.e. only the features kubernetes needs to run).
It is widely used, for example as the default runtime in docker (through containerd), podman and CRI-O. However, this runtime shares the host kernel with running containers. Syscalls are filtered with seccomp and AppArmor, but some attack surface remains, potentially allowing applications to compromise the host through kernel-level exploits. This makes it unsuitable for systems with high security requirements.
Best used for: general-purpose containers where ease of use and compatibility are preferred.
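To make the OCI interface concrete: an OCI bundle is just a directory containing a root filesystem and a config.json describing the container. Below is a heavily trimmed sketch of such a config; real files (e.g. those generated by `runc spec`) contain many more fields for mounts, capabilities and cgroup limits:

```json
{
  "ociVersion": "1.0.2",
  "process": {
    "args": ["sh"],
    "cwd": "/",
    "user": { "uid": 0, "gid": 0 }
  },
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "mount" },
      { "type": "network" }
    ]
  }
}
```

Every OCI-compliant runtime discussed here consumes this same bundle format, which is exactly what makes them interchangeable under docker, podman and CRI-O.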
crun
Developed by Red Hat as an alternative to runc, this runtime is implemented in C (runc is written in Go). This results in significant speed improvements (especially startup times, benefiting short-lived containers) and reduced memory overhead (good for cheap or embedded systems). It is less compatible with older operating systems and tooling, relying instead on modern kernel features like cgroups v2 for cutting-edge speed and feature support.
Just like runc, it shares the host kernel with containers (with the same syscall filters as security mechanisms), which leaves the kernel exposed as potential attack surface, in the worst case allowing host compromise. This makes it unsuitable for systems with high security standards.
Best used for: container workloads with short-lived containers (where startup speed really matters) or low-powered/embedded systems (benefiting from the reduced resource footprint).
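For podman, the default runtime can be switched in containers.conf; the sketch below assumes crun is installed at a standard path (on many recent distributions it is already the default):

```toml
# /etc/containers/containers.conf (or ~/.config/containers/containers.conf)
[engine]
# Select crun as the default OCI runtime for all containers.
runtime = "crun"
```

A single container can also override this on the command line with `podman run --runtime crun …`.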
gVisor
Google developed gVisor as an alternative to the common container runtimes of the time, which shared the host kernel and relied on syscall filtering for isolation. It instead intercepts and handles syscalls in userspace (a pseudo-emulation of the kernel), separately for each running container. This isolates containers much better from the underlying host and heavily reduces the risk of kernel-level exploits, although not completely, and it introduces some new attack surface (the interception layer itself).
gVisor's userspace kernel doesn't implement the full Linux syscall API, only the portions necessary to run most containers. Emulating syscalls per container in userspace costs much more CPU than traditional container runtimes, but is significantly less memory-intensive than full virtual machines.
Best used for: security-focused deployments that run on hardware without virtualization support, or memory-constrained workloads with CPU resources to spare.
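gVisor ships its OCI runtime as a binary called runsc, which is registered with docker the same way as any other runtime. A sketch following gVisor's documented docker setup, assuming runsc is installed at the path shown:

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

Containers then opt in with `docker run --runtime=runsc …`. runsc also accepts flags that tune how syscalls are intercepted (its "platform"), which can be passed via the daemon's runtimeArgs field; check the documentation of your installed version for the available platforms.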
Kata containers
This runtime is more of a translation layer: it uses full KVM-based virtual machines under the hood, managed through the OCI runtime interface. This hardware-based virtualization provides best-in-class isolation between containers and the host system, while leveraging the KVM infrastructure to largely mitigate the performance penalty of the virtualized kernel. That said, running VMs introduces significant memory and networking overhead, drastically amplified on hardware without modern virtualization features.
Best used for: security-focused deployments that depend on CPU speed and have memory to spare, or that need best-in-class isolation between containers and the host system.
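Kata plugs into the same interfaces as the other runtimes: with containerd, it is typically registered as an additional runtime handler through its v2 shim. A sketch, assuming the Kata shim is installed and following containerd's documented config layout:

```toml
# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
  # The Kata shim implements containerd's runtime v2 API and
  # boots a lightweight KVM virtual machine for the container.
  runtime_type = "io.containerd.kata.v2"
```

On kubernetes, workloads then select it through a RuntimeClass whose handler matches the name registered here ("kata" in this sketch).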