An introduction to structured logging

Almost every piece of software generates logs throughout its operation. While this insight into running services is important for debugging and fixing errors, little attention was typically given to the format of those log messages until structured logging started gaining popularity.

The problem with traditional logging

The common approach for logging is to simply output free-form text messages with contents and format decided by the developer of the software. A simple log line may look like this:

24-mar-2020 WARN high memory usage

This line already includes a lot of valuable information, but it lacks a clear structure. A human may quickly figure out the format, but an automated system such as alerting or remote log storage has to spend significant time on text processing to parse each line. Another problem is the lack of context: while the developers may know where that log line came from or what may have caused it, the operators and administrators running the software in production likely won't.
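
To illustrate the text processing involved, here is a minimal Python sketch that parses the example line with a regular expression. The pattern and the name parse_line are assumptions made for this illustration; they only fit this one layout, which is exactly the problem.

```python
import re

# Hypothetical pattern written for this one layout: "<date> <LEVEL> <message>".
LINE_PATTERN = re.compile(r"^(?P<date>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def parse_line(line):
    """Split a free-form log line into fields, or return None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


# The layout the pattern was written for is parsed into three fields.
print(parse_line("24-mar-2020 WARN high memory usage"))

# The same information in a different order no longer matches and yields None.
print(parse_line("WARN 24-mar-2020 high memory usage"))
```

Every application that picks its own layout needs its own pattern like this, and any change to the message text risks breaking it.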

Benefits of structured logging

Structured logging simply refers to using logging formats that isolate the components of a log message into key-value pairs and format their output in a way that is readable for both humans and machines. A common approach is to use an equals sign between key and value, or to encode logs as JSON. The log message from earlier might look like this in either format:

date=24-mar-2020 level=WARN msg="high memory usage" request=c19172gsiw user=a-52
{"date": "24-mar-2020", "level": "WARN", "msg": "high memory usage", "request": "c19172gsiw", "user": "a-52"}

The immediate benefit is that this logging format allows for as many attributes as needed while remaining easy for machines to parse, and it lets humans quickly skip to the piece of information they are looking for within a message. With traditional logging, adding the two new fields to the message would have made it long and difficult to parse on the fly; adding even more might have made it too complex to be useful. Structured logging circumvents this problem entirely, allowing developers to attach as much context to log messages as they need, which makes debugging significantly easier.
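
As a rough sketch of how such lines could be produced, the following Python snippet renders one set of fields in both styles. The helper names to_logfmt and to_json are hypothetical, and the quoting rule for the key=value style is an assumption, since there is no single formal standard for it.

```python
import json


def to_logfmt(fields):
    """Render a dict as key=value pairs, quoting values that need it."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        # Assumed rule: quote values containing spaces, quotes or equals signs.
        if " " in text or '"' in text or "=" in text:
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def to_json(fields):
    """Render a dict as a single-line JSON object."""
    return json.dumps(fields)


event = {
    "date": "24-mar-2020",
    "level": "WARN",
    "msg": "high memory usage",
    "request": "c19172gsiw",
    "user": "a-52",
}

print(to_logfmt(event))
print(to_json(event))
```

Adding another attribute is a matter of adding another key to the dictionary; neither the output format nor any parser on the receiving side has to change.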

This predefined formatting also helps when integrating applications into monitoring and alerting systems: the individual parts of the message are already separated, so writing custom parsers and filtering rules will often be unnecessary. Most log storage systems support querying for specific fields or values, allowing the team operating the software to quickly figure out what happened before an error occurred, how to reproduce it and ultimately how to resolve it.
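
The kind of field-based query described here can be sketched in a few lines of Python, assuming JSON-encoded log lines. The function filter_events and the sample lines are made up for this illustration and do not represent the query interface of any particular log storage system.

```python
import json


def filter_events(lines, **criteria):
    """Yield decoded events whose fields equal all of the given criteria."""
    for line in lines:
        event = json.loads(line)
        if all(event.get(key) == value for key, value in criteria.items()):
            yield event


# Made-up sample lines for illustration only.
log_lines = [
    '{"level": "INFO", "msg": "request started", "request": "c19172gsiw", "user": "a-52"}',
    '{"level": "WARN", "msg": "high memory usage", "request": "c19172gsiw", "user": "a-52"}',
    '{"level": "INFO", "msg": "request started", "request": "hypothetical-id", "user": "b-17"}',
]

# Everything that was logged for one request, in order.
for event in filter_events(log_lines, request="c19172gsiw"):
    print(event["level"], event["msg"])
```

No format-specific parsing is needed: the same function works for any field the application chooses to log.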

Tradeoffs and drawbacks

While structured logging brings significant advantages over the chaotic free-form text logs used traditionally, it is not without drawbacks. Depending on the chosen format, emitting logs may need more resources than before. With JSON, for example, the message first needs to be encoded as a JSON object instead of simply being written to the log output as a string. This performance overhead may also be replicated on dependent systems like alerting services, where structured logs need to be parsed again before their faster and more precise querying advantage can take effect.
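
Whether this overhead matters depends on the application, so it is best measured rather than assumed. The following Python sketch compares building a plain string with encoding the same fields as JSON using timeit. The function names are hypothetical, and the numbers will vary between machines, so no results are given here.

```python
import json
import timeit

fields = {"date": "24-mar-2020", "level": "WARN", "msg": "high memory usage"}


def emit_plain():
    """Build the free-form variant of the message."""
    return f"{fields['date']} {fields['level']} {fields['msg']}"


def emit_json():
    """Build the JSON variant of the same message."""
    return json.dumps(fields)


def measure(func, runs=100_000):
    """Return the number of seconds taken by `runs` calls of func."""
    return timeit.timeit(func, number=runs)


if __name__ == "__main__":
    print(f"plain: {measure(emit_plain):.4f}s")
    print(f"json:  {measure(emit_json):.4f}s")
```

This only measures the encoding step; writing to disk or the network is not included and often dominates the total cost of logging.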

Depending on the desired format of the log messages, storage size may also increase. As a rough estimate, a structured log message will take around 10-30% more storage space than a traditional unstructured one.

Finally, for existing software it may not be feasible to migrate to a new logging approach, as logging may well be spread throughout all parts of the software, requiring a large-scale refactor of the codebase when migrating logging commands.

While structured logging has benefits for new applications using modern approaches to monitoring and alerting based on log messages, it can be a bad tradeoff to migrate existing software to this new approach. Performance overhead must also be considered, as systems emitting large amounts of logs may see an increase in computational expense spent on logging with a more structured format. These issues should be rare in modern applications, which typically distribute the application logic across multiple smaller modules or microservices, allowing them to scale independently as needed.
