Navigating code with grep

Developer tooling has expanded drastically since the early days of computing, but some software engineers still swear by older minimal tools. One of these tools is grep, used to search and navigate text input.

Why grep?

The primary reasons for grep's sustained popularity in software development are simplicity and availability. It is available on virtually any Unix-like platform by default, and useful for any text search task. Administrators, DevOps engineers, programmers and even power users eventually use grep for one task or another, making it the Swiss Army knife of searching. Having a general-purpose tool that works everywhere often beats separate tools optimized per task.

There are modern grep alternatives like ag (Silver Searcher) or rg (ripgrep), which are not only significantly faster (up to 10x!), but also add more features and filtering options.

But unless you are working on very large monorepos (hundreds of MB of source code, excluding git history), grep will be near-instant, and the difference between 50ms and 5ms of execution time is barely noticeable by terminal users anyway.

More features sound good on paper, but they also require learning and remembering more flags and options to utilize them, making search much more complex than it should be. For the same reason, even regular expressions in grep are often ignored for code development, since writing two simple plaintext matches is often easier and faster than coming up with a regex.

It is also noteworthy that interactive searching will usually get by with imperfect search queries. Having one or two bad results won't hurt developers much, so searches typically start out very broad and are only narrowed down with more filters as needed, if at all.

Developers using grep-based workflows often have an advantage in unfamiliar codebases, as they do not need to care as much about directory structure or filenames, and instead navigate through the contained code directly. They are also free of vendor lock-in and do not rely on any one IDE, as the terminal is effectively their development environment, with free choice of and easy switching between editors, linters and language runtimes.

Grep for code

Searching with grep is made much easier by output coloring. You should use --color=auto to let grep enable it automatically when used in a terminal. For bash shells, simply create an alias so you don't have to pass the flag every time:

alias grep='grep --color=auto'

The most important options for grep-based programming boil down to fewer than 10 flags.

To search a directory and all files recursively, use -R:

grep -R "todo" .

The . denotes the current directory, the most common usage for this kind of search.

The command prints every line containing the string "todo", prefixed with the path of the file it was found in. Results from different files appear in no guaranteed order.

You would typically add the flag -i to make matching case-insensitive:

grep -Ri "todo" .

Note that single-character flags can be combined, making -R -i the same as -Ri. This does not work for options starting with a double dash like --include!
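To make the difference between the two searches concrete, here is a small sketch that builds a throwaway directory and runs both commands against it. The file names and contents are made up for the demonstration, and sort is only there to keep the output order stable.

```shell
# Hypothetical sample tree, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src/util"
printf 'int main(void) {\n    /* TODO: parse arguments */\n    return 0;\n}\n' > "$demo/src/main.c"
printf '// todo: handle empty input\n' > "$demo/src/util/str.c"

# Case-sensitive: only the lowercase "todo" in str.c matches.
sensitive=$(cd "$demo" && grep -R "todo" .)
# Case-insensitive: the uppercase "TODO" in main.c matches as well.
insensitive=$(cd "$demo" && grep -Ri "todo" . | sort)

printf '%s\n' "$sensitive"
# ./src/util/str.c:// todo: handle empty input
printf '%s\n' "$insensitive"
# ./src/main.c:    /* TODO: parse arguments */
# ./src/util/str.c:// todo: handle empty input

rm -rf "$demo"
```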

Programming requires context around matches, so you will often add the -n flag to display line numbers. For even more context, you can display a few lines before and after each match with -C <n>:

grep -RniC 5 "todo" .

Now you see the 5 lines before and after a match as well, providing important information about the business logic.
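The output format with line numbers and context is easier to read once you have seen it. The following sketch uses a made-up file and a context of 1 line instead of 5 to keep the output short. Matching lines use ":" as the separator, context lines use "-".

```shell
# Hypothetical sample file, created only for this demonstration.
demo=$(mktemp -d)
cat > "$demo/app.py" <<'EOF'
def load(path):
    # TODO: validate path
    return open(path).read()

def save(path, data):
    pass
EOF

# -n adds line numbers, -C 1 adds one line of context on each side.
context=$(cd "$demo" && grep -RniC 1 "todo" .)
printf '%s\n' "$context"
# ./app.py-1-def load(path):
# ./app.py:2:    # TODO: validate path
# ./app.py-3-    return open(path).read()

rm -rf "$demo"
```

When several matches are far apart, grep separates the groups of context lines with a line containing only "--".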

If you want to prevent partial matching, for example to find only the word "init" without matching ones like "definition", you can supply -w:

grep -Rniw "init" .
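As a rough illustration of what counts as a word, the sketch below compares a plain search with a -w search on a made-up file. Underscores count as word characters, so -w also skips identifiers like __init__, which may or may not be what you want.

```shell
# Hypothetical sample file, created only for this demonstration.
demo=$(mktemp -d)
cat > "$demo/widget.py" <<'EOF'
def init(config):
    definition = config["definition"]
def __init__(self):
EOF

# Plain search: all three lines contain the substring "init".
loose=$(cd "$demo" && grep -Rn "init" .)
# Whole-word search: "definition" and "__init__" no longer match.
whole=$(cd "$demo" && grep -Rnw "init" .)

printf '%s\n' "$whole"
# ./widget.py:1:def init(config):

rm -rf "$demo"
```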

A search can be inverted with -v, which prints only the lines that do not match the search pattern. You wouldn't use this on files directly, but rather to filter output, which we will look at later.

When working in directories that contain binary files, such as git repositories, you can exclude non-text files with -I (uppercase "i"):

grep -IRni "todo" .
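A minimal sketch of the effect, using a made-up text file and a made-up binary file that both contain the search string. Without -I, grep would additionally report that the binary file matches, and the exact wording of that message differs between grep versions.

```shell
# Hypothetical sample files, created only for this demonstration.
demo=$(mktemp -d)
printf 'todo: write docs\n' > "$demo/notes.txt"
# The NUL byte makes grep classify this file as binary.
printf 'todo\000\001\002' > "$demo/blob.bin"

# -I treats binary files as if they contained no match at all.
text_only=$(cd "$demo" && grep -RIn "todo" .)
printf '%s\n' "$text_only"
# ./notes.txt:1:todo: write docs

rm -rf "$demo"
```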

Finally, if you want to find files instead of lines inside files, you can use -l to only print the names of files whose content matches a search:

grep -Ril "unsafe" .

or -L to print the ones that don't have any matching lines:

grep -RiL "todo" .
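The two flags are mirror images of each other, which the following sketch shows on two made-up files: one that contains "todo" and one that does not.

```shell
# Hypothetical sample files, created only for this demonstration.
demo=$(mktemp -d)
printf 'unsafe { ptr.read() } // TODO: audit\n' > "$demo/raw.rs"
printf 'fn add(a: i32, b: i32) -> i32 { a + b }\n' > "$demo/math.rs"

# -l: names of files with at least one matching line.
with_todo=$(cd "$demo" && grep -Ril "todo" .)
# -L: names of files without any matching line.
without_todo=$(cd "$demo" && grep -RiL "todo" .)

printf '%s\n' "$with_todo"      # ./raw.rs
printf '%s\n' "$without_todo"   # ./math.rs

rm -rf "$demo"
```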

Command chaining for advanced searches

What grep is missing in complexity is added by shells like bash. For example, grep has no builtin way to find lines matching "dog" but not if they also contain the word "cat". But chaining two grep commands solves this:

grep -R "dog" . --color=yes | grep -v "cat"

Note that chaining disables colored output for the first grep command, since the pipe | isn't a terminal output. To retain colored output, we force-enable it by adding --color=yes to the command.
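The sketch below runs the chained search on made-up files, without coloring so the output can be compared as plain text. It also shows one caveat: the second grep filters whole output lines, including the file name prefix, so a file that happens to be called catalog.txt is dropped as well.

```shell
# Hypothetical sample files, created only for this demonstration.
demo=$(mktemp -d)
printf 'the dog barks\nthe dog chases the cat\nthe cat sleeps\n' > "$demo/pets.txt"
printf 'the dog is listed here\n' > "$demo/catalog.txt"

# "dog" but not "cat". The catalog.txt line disappears too, because
# "./catalog.txt:" is part of the line the second grep sees.
dog_not_cat=$(cd "$demo" && grep -R "dog" . | grep -v "cat")
# Two plain matches chained together act as a logical AND, no regex needed.
dog_and_chases=$(cd "$demo" && grep -R "dog" . | grep "chases")

printf '%s\n' "$dog_not_cat"      # ./pets.txt:the dog barks
printf '%s\n' "$dog_and_chases"   # ./pets.txt:the dog chases the cat

rm -rf "$demo"
```

For interactive use, an occasional dropped or extra line like this rarely matters, but it is worth knowing when a result seems to be missing.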

If you get a lot of output, you can pipe it into less for interactive paging and search:

grep -R "todo" . --color=yes | less -R

Again, we need to force-enable colored output here, and also pass -R to the less command so that it passes the color escape sequences through to the terminal - otherwise, the coloring sequences will show up as garbage.

While you could involve more standard shell tooling like find, xargs, sed or awk in advanced search queries, it is rare to do so in practice. Developers typically prefer speed and simplicity when searching, leaving more complex command chains for scripts in build workflows or CI/CD pipelines.

This is also the reason why flags like -E (extended regex), -P (PCRE regex) and -F (literal string matcher) aren't widely used for software development. Sticking to the simple default matching behavior is preferable for tasks that should remain short and simple, like navigating code.

Restricting input files

To make searches more precise, grep can apply rules to what files will be considered when searching.

You can either use --include=<pattern> as a whitelist to only search in matching filenames, for example to ignore .h files in C code:

grep --include="*.c" -R "todo" .

Or instead search in all files with exceptions using --exclude, e.g. to skip tests:

grep --exclude="*_test.py*" -R "todo" .

Note that --include and --exclude only consider the basename of each file for matching, not its path.
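Here is a short sketch of both filters on a made-up source directory. The file names are assumptions chosen to mirror the examples above, and sort is only used to keep the output order stable.

```shell
# Hypothetical sample tree, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf '/* todo: free buffer */\n' > "$demo/src/buf.c"
printf '/* todo: document API */\n' > "$demo/src/buf.h"
printf '# todo: add edge cases\n' > "$demo/src/buf_test.py"

# Whitelist: only files whose basename matches *.c are searched.
only_c=$(cd "$demo" && grep --include="*.c" -R "todo" .)
# Blacklist: everything except files matching *_test.py* is searched.
no_tests=$(cd "$demo" && grep --exclude="*_test.py*" -R "todo" . | sort)

printf '%s\n' "$only_c"
# ./src/buf.c:/* todo: free buffer */
printf '%s\n' "$no_tests"
# ./src/buf.c:/* todo: free buffer */
# ./src/buf.h:/* todo: document API */

rm -rf "$demo"
```

Always quote the patterns, otherwise the shell may expand them against the current directory before grep sees them.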

To exclude entire directories and all their contents, for example documentation, use --exclude-dir=<pattern> instead:

grep --exclude-dir=docs -R "todo" .

When working for a long time on the same codebase, it may be worthwhile to create project-specific grep aliases for include/exclude options.
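One possible shape for such a shortcut is sketched below as a shell function, which behaves like an alias in interactive use and also works inside scripts. The name cgrep and the chosen filters are assumptions for a C project with a docs directory; adjust them to the project at hand.

```shell
# Hypothetical helper: the name "cgrep" and the filters are assumptions.
# It searches only .c and .h files, skips binaries and any "docs" directory,
# and always starts from the current directory.
cgrep() {
    grep -RIn --include="*.c" --include="*.h" --exclude-dir=docs "$@" .
}

# Hypothetical sample tree, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/docs"
printf '/* TODO: free buffer */\n' > "$demo/src/buf.c"
printf 'TODO: describe buf.c\n' > "$demo/docs/guide.md"
printf 'TODO: mention buf.h\n' > "$demo/docs/notes.h"   # skipped along with docs

found=$(cd "$demo" && cgrep -i "todo")
printf '%s\n' "$found"
# ./src/buf.c:1:/* TODO: free buffer */

rm -rf "$demo"
```

Extra flags such as -i or -w can still be passed per search, since the function forwards all of its arguments to grep.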

Keeping a codebase greppable

Just about any codebase can be navigated with grep, but adherence to some simple rules makes it much more efficient. Maintaining these baseline rules is what keeps a codebase "greppable", meaning it can easily be searched and navigated with grep alone, without invoking any other tools.

Keeping the source code in a separate directory from documentation and tests helps avoid exclude flags entirely. A common layout would be to have source code in a src/ subdirectory, with documentation in docs/, so grep can be executed only within src/.

Having many abstractions makes using grep cumbersome, as you may need to look up multiple nested abstractions or inheritance layers before finding the underlying business logic. These problems are especially pronounced in languages like Java or C#, but much less common in C or C++. Some languages are simply a better intuitive fit for grep than others.

Finally, relying on generated code is problematic for grep - what doesn't exist on disk can't be easily searched by it. Projects using code generators should ensure the generated code is also committed to the repository and available as text files, or grep will not be able to search it.