A practical guide to filesystems

Filesystems define how data is laid out and organized on storage partitions. Each filesystem comes with its own set of advantages and drawbacks, which makes choosing one difficult for operators who aren't familiar with the topic. This article is a guide to the most common filesystems: why and when to use them, and what to watch out for.

FAT32: Universal compatibility

FAT32 is the most widely compatible filesystem still in use. Its simple design has made it a prime target for support from virtually every kind of device: new and old computers, smartphones, cameras. Just about anything will recognize and understand a FAT32-formatted storage device.

This unrivaled level of compatibility makes FAT32 an easy choice for shared flash storage, like USB drives or SD cards.

FAT32's age is both a blessing and a curse: while it made FAT32 compatible with nearly every system, it also carries size restrictions that seemed reasonable decades ago. In practice, a FAT32 partition is limited to 32GB (the format itself allows up to 2TB, but many tools, including Windows' built-in formatter, refuse to create larger FAT32 partitions), and no single file on it can be larger than 4GB. It also lacks support for file permissions, encryption and journaling, making it unsuitable for long-lived storage such as internal hard drives.
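
To make both limits concrete, here is a minimal Python sketch that checks a planned FAT32 volume before formatting it or copying files onto it. The helper `fat32_problems` and the example file names are hypothetical. The 4GB file limit follows from FAT32 storing each file's size in a 32-bit field (2^32 - 1 bytes); treating the 32GB volume guideline as 32 GiB is an assumption of this sketch.

```python
# FAT32 records each file's size in a 32-bit field, so no file can exceed
# 2**32 - 1 bytes (4 GiB minus one byte).
FAT32_MAX_FILE_SIZE = 2**32 - 1
# Practical volume ceiling discussed above; interpreting "32GB" as 32 GiB
# is an assumption made for this sketch.
FAT32_PRACTICAL_MAX_VOLUME = 32 * 1024**3


def fat32_problems(volume_bytes: int, file_sizes: dict[str, int]) -> list[str]:
    """List reasons why a planned FAT32 volume would not work.

    `file_sizes` maps file names to sizes in bytes. An empty result means
    the plan fits within FAT32's limits.
    """
    problems = []
    if volume_bytes > FAT32_PRACTICAL_MAX_VOLUME:
        problems.append(f"volume of {volume_bytes} bytes exceeds the 32GB guideline")
    for name, size in sorted(file_sizes.items()):
        if size > FAT32_MAX_FILE_SIZE:
            problems.append(f"{name} ({size} bytes) exceeds the 4GB file limit")
    return problems


if __name__ == "__main__":
    files = {"holiday.mp4": 5 * 1024**3, "notes.txt": 12_000}
    # A 64 GiB stick holding a 5 GiB video trips both limits.
    for problem in fat32_problems(64 * 1024**3, files):
        print(problem)
```

If the list is not empty, exFAT (below) is the usual fallback for removable media.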

exFAT: Modern compatibility

Designed to replace FAT32 as the go-to compatibility filesystem, exFAT raises the maximum volume and file size limits to 128PB while keeping a relatively simple design.

It is a great choice for a filesystem shared between different modern operating systems, for example an external hard drive used by Windows, macOS and Linux machines.

While it fixes the most important shortcomings of FAT32, it gives up FAT32's backwards compatibility with older devices. exFAT is a solid choice for portable storage (USB drives, external hard drives) shared between different modern operating systems, but it cannot compete with more mature filesystems for home computing or server storage: it is slower than ext4 and NTFS, and it lacks journaling to prevent data loss during power outages.

ext4: Default for Linux

The fourth iteration of the extended filesystem and the successor to ext2 and ext3, ext4 is a mature and fast filesystem and the default choice for most Linux installations. It is heavily relied upon in many scenarios, from handheld devices to personal computers and enterprise servers.

Key features include its journaling system, which provides some protection against data loss and corruption, good performance for most computing workloads, and wide support from the Linux community.
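
The idea behind journaling can be shown with a toy model: changes are first logged and marked committed, then copied to their final location, and after a crash only committed transactions are replayed. This Python sketch is a simplification and not ext4's on-disk format (by default ext4 journals only metadata, in `data=ordered` mode); the `JournaledStore` class and its crash flags are hypothetical.

```python
class JournaledStore:
    """Toy write-ahead journal: changes are logged before they are applied."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}  # main data area: block number -> bytes
        self.journal: list[dict] = []       # logged transactions not yet applied

    def write(self, changes: dict[int, bytes], crash_before_commit: bool = False,
              crash_before_apply: bool = False) -> None:
        tx = {"changes": dict(changes), "committed": False}
        self.journal.append(tx)             # 1. log the intended changes
        if crash_before_commit:
            return                          # power lost before the commit record
        tx["committed"] = True              # 2. write the commit record
        if crash_before_apply:
            return                          # power lost after commit, before apply
        self.recover()                      # 3. checkpoint: same routine as replay

    def recover(self) -> None:
        """Replay committed transactions in order; drop incomplete ones."""
        for tx in self.journal:
            if tx["committed"]:
                self.blocks.update(tx["changes"])
        self.journal.clear()


store = JournaledStore()
store.write({0: b"old"})
store.write({0: b"new", 1: b"meta"}, crash_before_apply=True)
store.recover()                             # replay after "reboot"
print(store.blocks)                         # {0: b'new', 1: b'meta'}
store.write({0: b"half"}, crash_before_commit=True)
store.recover()
print(store.blocks[0])                      # b'new'
```

A transaction is either fully applied or not at all, which is the "some protection" journaling offers; it does not help if the disk silently corrupts data that was already written.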

While ext4 is a reliable and fast choice for most computing tasks, it may fall short in specific edge cases: performance starts to degrade noticeably beyond 50TB of storage or when a directory exceeds about 10k subdirectories. Advanced features like transparent compression and deduplication aren't natively supported, and while the filesystem can be grown while mounted, shrinking requires unmounting the partition first.
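
To check whether an existing tree is anywhere near the directory sizes mentioned above, a short Python script can walk it and report crowded directories. This is a sketch: the 10k threshold is taken from the text as a rough guideline rather than a hard ext4 limit, and `crowded_directories` is a hypothetical helper built on the standard `os.walk`.

```python
import os

# Rough guideline taken from the text above, not a hard ext4 limit.
ENTRY_WARNING_THRESHOLD = 10_000


def crowded_directories(root: str, threshold: int = ENTRY_WARNING_THRESHOLD):
    """Yield (directory, entry_count) for directories above `threshold` entries."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Count both subdirectories and files directly inside dirpath.
        count = len(dirnames) + len(filenames)
        if count > threshold:
            yield dirpath, count
```

For example, `for path, count in crowded_directories("/srv/data"): print(count, path)` lists the offenders under a hypothetical `/srv/data` mount. By default `os.walk` does not descend into symlinked directories, so linked trees are not scanned twice.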

These limitations are rarely an issue outside of large-scale storage setups, making ext4 a solid choice when a reliable filesystem with good performance is needed.

NTFS: Default for Windows

As the long-standing default for Windows operating systems, NTFS is still widely used on personal computers and workstations. It uses journaling to protect the stored data, has reasonable performance and supports compression out of the box.

While Windows prefers this filesystem, other operating systems support it only partially: macOS can read NTFS volumes but not write to them, and Linux has traditionally relied on extra software such as the ntfs-3g driver for write support (newer kernels also include the ntfs3 driver). It is slightly slower than ext4 and suffers from data fragmentation, requiring regular, resource-heavy defragmentation passes.

It remains a solid choice for Windows computers, but cannot compete with other filesystem options outside of this use case.

XFS: Optimized for large files

Similar to ext4, XFS is a reliable and fast filesystem, but it is optimized for large files. It combines a robust journaling approach for data protection with improved parallel I/O (the filesystem is split into independently managed allocation groups, and B+ trees speed up directory and extent lookups) and a low-fragmentation allocation strategy for writes, so files are written as large contiguous chunks and disks need fewer seek operations. It scales well with higher numbers of CPU cores and offers best-in-class performance for files larger than 4GB.
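
Extent-based filesystems such as XFS (and ext4) describe a file as a list of (start block, length) runs rather than individual blocks, so a contiguously written file collapses into very few extents. The Python sketch below, using made-up block numbers listed in file order, shows how fragmentation appears as a growing extent count; on a spinning disk, each extra extent roughly means another seek. `blocks_to_extents` is a hypothetical helper.

```python
def blocks_to_extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Collapse a file's block numbers (in file order) into (start, length) extents."""
    extents: list[tuple[int, int]] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))         # gap: start a new extent
    return extents


contiguous = list(range(1000, 1008))
scattered = [1000, 1001, 2050, 2051, 2052, 37, 38, 900]
print(blocks_to_extents(contiguous))  # [(1000, 8)]
print(blocks_to_extents(scattered))   # [(1000, 2), (2050, 3), (37, 2), (900, 1)]
```

Both files hold eight blocks, but the scattered one needs four extents (and up to four seeks) where the contiguous one needs one.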

However, the optimization towards large files has drawbacks too: small files, especially those under 4KB, incur a much higher overhead than on ext4; the filesystem cannot be shrunk, only grown; and it is not as widely supported as other options (for example, fewer recovery tools are available).
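
The resizing rules for ext4 and XFS are easy to mix up, so here is a hedged Python sketch that encodes them as a lookup table. The table reflects the statements in this article plus the documented requirement that `xfs_growfs` works on a mounted filesystem; `can_resize` and `RESIZE_RULES` are hypothetical names, and other filesystems are deliberately left out.

```python
# (filesystem, operation) -> mount states in which the operation is possible.
RESIZE_RULES: dict[tuple[str, str], set[str]] = {
    ("ext4", "grow"): {"mounted", "unmounted"},
    ("ext4", "shrink"): {"unmounted"},      # shrinking needs an unmounted fs
    ("xfs", "grow"): {"mounted"},           # xfs_growfs operates on a mounted fs
    ("xfs", "shrink"): set(),               # XFS cannot be shrunk at all
}


def can_resize(fs: str, operation: str, mounted: bool) -> bool:
    """Return True if `operation` ("grow"/"shrink") is possible in this state."""
    state = "mounted" if mounted else "unmounted"
    try:
        return state in RESIZE_RULES[(fs, operation)]
    except KeyError:
        raise ValueError(f"no rule recorded for {fs!r} / {operation!r}") from None
```

For instance, `can_resize("ext4", "shrink", mounted=True)` returns `False`: the partition has to be unmounted before shrinking it with `resize2fs`, while an XFS volume that is too large can only be replaced, not shrunk.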

It is best used for systems that need to deal with many large files, like download mirrors or some types of databases.

ZFS: Enterprise-grade filesystem

Comparing ZFS to other filesystems is a little complicated, because the ZFS suite also includes volume management (think LVM), snapshots and redundancy (RAID) out of the box. This comparison ignores those features, since other filesystems can get similar capabilities on Linux through LVM.

The ZFS filesystem offers strong data integrity through checksums and a copy-on-write (CoW) approach, avoiding the overhead incurred by journaling systems. It is highly scalable (up to 256 quadrillion zettabytes!), supports extensive performance tuning per use case, advanced caching, encryption, compression and deduplication.
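
Copy-on-write can be illustrated with a toy store: new data always goes to a fresh block, and a write only becomes visible when a single root pointer is switched. If power fails before that switch, the old root still points at intact data, which is why no journal replay is needed. This Python sketch is conceptual and not ZFS's on-disk layout; `CowStore` is a hypothetical class.

```python
class CowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}  # block id -> contents
        self.next_id = 0
        self.root: dict[str, int] = {}      # committed view: file name -> block id

    def _alloc(self, data: bytes) -> int:
        block_id = self.next_id             # always hand out a fresh block
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write(self, name: str, data: bytes, crash_before_commit: bool = False) -> None:
        new_root = dict(self.root)          # build the new view next to the old one
        new_root[name] = self._alloc(data)
        if crash_before_commit:
            # Power loss: the new block is orphaned, the old root is untouched.
            return
        self.root = new_root                # commit = one pointer switch

    def read(self, name: str) -> bytes:
        return self.blocks[self.root[name]]


store = CowStore()
store.write("a.txt", b"v1")
store.write("a.txt", b"v2", crash_before_commit=True)
print(store.read("a.txt"))  # b'v1'
store.write("a.txt", b"v2")
print(store.read("a.txt"))  # b'v2'
```

Because old blocks are left untouched until they are freed, the same mechanism is what makes ZFS snapshots cheap.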

ZFS can efficiently handle large sequential reads (e.g. streaming large files) and has built-in self-healing capabilities with scrubbing to protect against data corruption.
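
Self-healing relies on redundancy plus checksums: a scrub reads every copy of every block, compares it to the stored checksum, and rewrites bad copies from a good one. The toy Python model below assumes a two-way mirror and uses SHA-256 from the standard library (ZFS defaults to the fletcher4 checksum and also offers SHA-256); `MirroredPool` is a hypothetical name.

```python
import hashlib


class MirroredPool:
    """Toy two-way mirror in which every block has a stored checksum."""

    def __init__(self) -> None:
        self.disks: list[dict[int, bytes]] = [{}, {}]  # two mirrored "disks"
        self.checksums: dict[int, str] = {}            # kept apart from the data

    @staticmethod
    def _digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def write(self, block_id: int, data: bytes) -> None:
        self.checksums[block_id] = self._digest(data)
        for disk in self.disks:
            disk[block_id] = data

    def scrub(self) -> int:
        """Check every copy of every block, repair bad copies, return repair count."""
        repaired = 0
        for block_id, expected in self.checksums.items():
            good = next((disk[block_id] for disk in self.disks
                         if self._digest(disk[block_id]) == expected), None)
            if good is None:
                raise OSError(f"block {block_id}: no valid copy left")
            for disk in self.disks:
                if self._digest(disk[block_id]) != expected:
                    disk[block_id] = good      # self-heal from the good copy
                    repaired += 1
        return repaired


pool = MirroredPool()
pool.write(0, b"important data")
pool.disks[1][0] = b"important dat\x00"        # simulate silent corruption
print(pool.scrub())                             # 1
print(pool.disks[1][0])                         # b'important data'
```

Without the checksum, a mirror could only tell that its two copies disagree, not which one is correct.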

All those shiny features come at a cost: ZFS needs much more hardware to run, with a commonly cited recommendation of 1GB of memory per 1TB of storage, and its compression and deduplication computations need significantly more CPU cycles than simpler filesystems without those features. While its many configuration options make it flexible enough to fit many enterprise use cases, it can also be too complex to set up and run for less experienced operators, requiring specially skilled employees to manage it effectively.

ZFS is best suited for large-scale storage systems that have the CPU and memory resources to run it, that benefit from its advanced deduplication, compression and encryption capabilities, and where data integrity is critical.

Btrfs: The problem child

Btrfs was meant to be a competitor to ZFS, offering largely similar features, but it has so far failed to reach that goal. From the beginning, it was riddled with issues: slow bug fixes and an unreliable implementation of critical features like RAID 5/6 support made many professional users lose confidence, and the growing popularity and adoption of ZFS has shrunk the Btrfs community, further amplifying the problem.

It remains under active (albeit slow) development by a small but determined community, with some commercial backing. For now it is interesting to experiment with, but a poor choice for enterprise or production settings, as it offers no benefits over ZFS there. That said, its GPL license allows it to be built directly into the Linux kernel (unlike ZFS, whose CDDL license keeps it out of the mainline kernel), so it has some applications in scenarios that don't need enterprise features, like desktops and less critical servers.

Honorable mentions

While the list above covers the most common filesystems in use today, some older ones have carved out niches that their modern replacements cannot quite fill.

One such older filesystem is ext2. Feature-wise it cannot keep up with ext4, but the missing features (most notably journaling) also mean very little overhead for the system running it. Setups that need to squeeze every last bit of I/O capacity out of their filesystem still consider ext2 a viable option, especially when data integrity is not a major concern (e.g. when running on hardware RAID).

Another such filesystem is ReiserFS, whose approach to storing metadata lets it perform well with large numbers of small or tiny files. Where other filesystems like ext4 struggle with many files smaller than 4KB, ReiserFS was specifically optimized for this workload and can still outperform more modern filesystems at it today. It remains efficient even when a single directory contains more than 100k files, whereas ext4 starts to struggle beyond 10k items per directory.
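
Much of the small-file problem is block-size arithmetic: if every file occupies whole blocks, a 100-byte file still consumes a full block. This Python sketch assumes a 4KiB block size (a common ext4 default) and ignores metadata as well as features such as ext4's `inline_data` or ReiserFS's tail packing, which exist precisely to reduce this waste. `allocated_bytes` is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch (a common ext4 default)


def allocated_bytes(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Space a file occupies when it is stored in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


sizes = [100] * 100_000   # 100,000 files of 100 bytes each
used = sum(sizes)
allocated = sum(allocated_bytes(s) for s in sizes)
print(used, allocated)    # 10000000 409600000
```

Here roughly 10MB of actual data occupies about 410MB of disk, a 40x overhead before any metadata is counted.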

Note that these filesystems are a poor choice outside their specific niches and are not as actively maintained as the options above; ReiserFS in particular has been deprecated and removed from recent mainline Linux kernels.

