Finding common bugs with valgrind


Among the oldest security vulnerabilities haunting software to this day are the classic buffer overflow/underflow exploits, caused by mismanaged memory and missing bounds checks. And almost as old as the problem itself is valgrind - a program on a mission to catch such common bugs automatically.

Installing valgrind

Valgrind is commonly viewed as a tool for finding memory leaks, but it is actually a suite of tools that detect a wide range of common software issues in compiled code, from memory errors to possible deadlocks. It works best on C/C++ applications, but can be used on almost anything (garbage-collected languages like Go or Java tend to produce many false positives, so prefer language-specific tools for those). Be aware that catching these bugs is about more than software quality: some of them can be exploited to damage the host system, compromise data confidentiality and so on.

On Debian-based systems, the entire suite can be installed through apt:

sudo apt install valgrind

Basic usage

Let's start with a simple memory leaking program:

#include <stdlib.h>

int main() {
   int *arr = malloc(10 * sizeof(int));
   return 0; // allocated memory is not freed before exit
}

Using valgrind is incredibly easy, but it benefits from executables compiled with debugging symbols. For gcc, that is as simple as adding the -g flag:

gcc -g -o main main.c

Now simply run valgrind and pass it the command to run the executable:

valgrind ./main

The output will look similar to this:

==4136328== Memcheck, a memory error detector
==4136328== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==4136328== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==4136328== Command: ./main
==4136328== 
==4136328== 
==4136328== HEAP SUMMARY:
==4136328==    in use at exit: 40 bytes in 1 blocks
==4136328==  total heap usage: 1 allocs, 0 frees, 40 bytes allocated
==4136328== 
==4136328== LEAK SUMMARY:
==4136328==   definitely lost: 40 bytes in 1 blocks
==4136328==   indirectly lost: 0 bytes in 0 blocks
==4136328==     possibly lost: 0 bytes in 0 blocks
==4136328==   still reachable: 0 bytes in 0 blocks
==4136328==        suppressed: 0 bytes in 0 blocks
==4136328== Rerun with --leak-check=full to see details of leaked memory

The number on the left is the process ID (PID) of the program tested by valgrind. On the right, information about the identified memory leak is displayed, such as the number of bytes lost and how many allocations contributed to it.

While this output is enough to tell that the application leaks memory, it doesn't help much in identifying the cause. But as the output suggests, adding --leak-check=full changes this:

valgrind --leak-check=full ./main

Now a few more lines appear within the previous output:

==4137928== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==4137928==   at 0x48417B4: malloc (vg_replace_malloc.c:381)
==4137928==   by 0x10914A: main (main.c:4)

As you can see, valgrind now pinpoints the root cause of the leak as main.c:4, meaning file main.c, line 4 - exactly where the unfreed memory allocation happens.

Finding memory issues

Now that we have seen valgrind correctly spot a memory leak, we can discuss the other memory issues valgrind can find, which are often overlooked.

One of the most important ones is usage of uninitialized variables:

#include <stdio.h>

int main() {
   int x;
   printf("%d\n", x); // Using x without initialization
   return 0;
}

Note how the variable x is printed but never assigned a value. Valgrind will easily detect this issue as well:

==4146349== Conditional jump or move depends on uninitialised value(s)
==4146349==   at 0x48D0027: __vfprintf_internal (vfprintf-process-arg.c:58)
==4146349==   by 0x48C565A: printf (printf.c:33)
==4146349==   by 0x109159: main (main.c:5)

Reading the output, it is obvious that the problem appears in main.c at line 5, where the printf() function is called.


Similarly, valgrind's memory checker can also find out-of-bounds reads and writes. Take this sample program:

#include <stdlib.h>

int main() {
   int *arr = malloc(5 * sizeof(int));
   arr[10] = 10; // out-of-bounds write
   free(arr);
   return 0;
}

Running valgrind, you will get a message like this:

==4148594== Invalid write of size 4
==4148594==   at 0x109167: main (main.c:5)
==4148594== Address 0x4a57068 is 20 bytes after a block of size 20 alloc'd
==4148594==   at 0x48417B4: malloc (vg_replace_malloc.c:381)
==4148594==   by 0x10915A: main (main.c:4)

The last type of memory issue is rarer but very problematic if left unchecked: double freeing. Calling free() on a pointer once releases the memory; calling it again on the same pointer results in undefined behavior - anything from exploitable vulnerabilities to an immediate crash can happen. Here is a sample program:

#include <stdlib.h>

int main() {
   int *ptr = malloc(sizeof(int));
   free(ptr);
   free(ptr); // freeing pointer twice
   return 0;
}

Valgrind correctly finds and reports this problem as well:

==4151889== Invalid free() / delete / delete[] / realloc()
==4151889==   at 0x484417B: free (vg_replace_malloc.c:872)
==4151889==   by 0x109176: main (main.c:6)
==4151889== Address 0x4a57040 is 0 bytes inside a block of size 4 free'd
==4151889==   at 0x484417B: free (vg_replace_malloc.c:872)
==4151889==   by 0x10916A: main (main.c:5)
==4151889== Block was alloc'd at
==4151889==   at 0x48417B4: malloc (vg_replace_malloc.c:381)
==4151889==   by 0x10915A: main (main.c:4)

As you can see, even the default memcheck tool from the valgrind suite does much more than finding simple memory leaks.

Detecting memory race conditions

Many modern applications make use of threads to spread their workload across multiple CPU cores effectively. These threads often need to interact with one another or the main program, which they do by sharing memory. This introduces a new class of problems: unsafe memory reads and writes, where two threads compete over the same memory, also known as a "race condition".

Valgrind includes the helgrind tool specifically to find race conditions around shared memory. Take this example program:

#include <pthread.h>
#include <stdio.h>

int counter = 0;

void* increment(void* arg) {
   for (int i = 0; i < 10000; ++i) {
       counter++;
   }
   return NULL;
}

int main() {
   pthread_t t1, t2;
   pthread_create(&t1, NULL, increment, NULL);
   pthread_create(&t2, NULL, increment, NULL);
   pthread_join(t1, NULL);
   pthread_join(t2, NULL);
   printf("Counter: %d\n", counter);
   return 0;
}

Running the helgrind tool on it:

valgrind --tool=helgrind ./main

will report the race condition in incrementing the shared counter variable:

==4163103== Possible data race during read of size 4 at 0x10C02C by thread #3
==4163103== Locks held: none
==4163103==   at 0x10916A: increment (main.c:8)
==4163103==   by 0x484C7D6: mythread_wrapper (hg_intercepts.c:406)
==4163103==   by 0x49051F4: start_thread (pthread_create.c:442)
==4163103==   by 0x4984AFF: clone (clone.S:100)
==4163103== 
==4163103== This conflicts with a previous write of size 4 by thread #2
==4163103== Locks held: none
==4163103==   at 0x109173: increment (main.c:8)
==4163103==   by 0x484C7D6: mythread_wrapper (hg_intercepts.c:406)
==4163103==   by 0x49051F4: start_thread (pthread_create.c:442)
==4163103==   by 0x4984AFF: clone (clone.S:100)
==4163103== Address 0x10c02c is 0 bytes inside data symbol "counter"

While this would have been difficult for many humans to spot, the automated tool makes finding it very easy.

Finding other threading issues

There is a lot more that can go wrong with multithreaded programs than race conditions on memory, which is where drd comes in. It is a more advanced thread checking tool in many ways, but not a complete replacement for helgrind - there are still edge cases where a memory race condition can go unnoticed by drd but would be spotted by helgrind. That said, drd can spot many more thread-related issues than helgrind, from deadlocks and mutex misuse like double locking to unlocked access to shared memory.

Here is a simple double-locking issue:

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void* routine(void* arg) {
   pthread_mutex_lock(&lock);
   pthread_mutex_lock(&lock); // locked twice
   pthread_mutex_unlock(&lock);
   return NULL;
}

int main() {
   pthread_t t1, t2;
   pthread_create(&t1, NULL, routine, NULL);
   pthread_create(&t2, NULL, routine, NULL);
   pthread_join(t1, NULL);
   pthread_join(t2, NULL);
   return 0;
}

Running drd against this program:

valgrind --tool=drd ./main

quickly identifies the issue:

==4173746== Recursive locking not allowed: mutex 0x10c060, recursion count 1, owner 2.
==4173746==   at 0x48545B4: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:932)
==4173746==   by 0x48545B4: pthread_mutex_lock@* (drd_pthread_intercepts.c:945)
==4173746==   by 0x109192: routine (main.c:8)

drd catches problems like this that can slip past helgrind, which is why it is worth running. Remember to use both tools together when testing multithreaded programs for the best possible coverage.

Performance profiling

While valgrind is mostly used to catch bugs and common errors, it also includes a few tools useful for debugging the performance of a program.

Callgrind computes a call graph of all functions within a program, showing how much time and how many resources were spent in each per run. Identifying slow or resource-heavy functions can be done quickly with this tool.

Cachegrind profiles the cache usage of CPU instructions, to better understand why code is slow despite no obvious issues with the source code itself. This is often related to inefficient memory usage within loops, where a slight adjustment of how the loop iterates over data can make a huge difference in execution speed.

Massif is a heap profiler: it tracks heap usage over the program's lifetime, helping you understand memory spikes and which allocations caused them, with support for showing usage trends.

Note that these tools are quite old and declining in popularity. Newer contenders like pprof and perf have quickly gained traction for the same use cases, thanks to easier usage and more reliable CPU-timing measurements.
