Combining multiple commands using pipes and stream redirection is a common task on the Linux command line. While shells like bash already offer the most common tools to manage streams, some advanced features are missing. One of these features is copying an input stream to multiple output targets, which is what tee does.
How the tee command works
The tee command is similar to a pipe or output redirection operator in bash, with one difference: it can copy stdin to more than one output. stdin is always copied to stdout, and it can additionally be copied to one or more files on the fly.
The tee command takes one or more files as arguments, and can be placed between other commands in a pipeline to copy the stream flowing between them.
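As a minimal sketch of this chaining, the snippet below feeds three lines through tee into two files and on to wc. The file names copy1.txt and copy2.txt and the throwaway working directory are assumptions made for illustration only.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# tee writes its stdin to both files and also passes it on to wc via stdout.
count=$(printf 'one\ntwo\nthree\n' | tee copy1.txt copy2.txt | wc -l)

echo "lines seen by wc: $count"

# cmp is silent and succeeds only when both copies are byte-identical.
cmp copy1.txt copy2.txt && echo "both files hold the same data"
```

The command after tee sees exactly the same data that lands in the files, which is what makes tee usable as a tap in the middle of a pipeline.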
Note that by default tee will overwrite the content of the files given to it. If you want to append to them instead, use the -a flag:
echo "text" | tee -a log.txt | wc
Now the file log.txt gets appended to with each run, rather than overwritten.
Debugging complex pipes
One common use of the tee command is to debug commands that consist of many individual commands chained together. Consider this example:
sample_command | grep "error" | wc -l
This is a relatively simple command chain, but it may be annoying to debug: if the output of wc -l at the end is 0, it could either mean that there were no errors, or that the grep pattern didn't catch the error messages. Maybe the log messages use words like "warn" or "fatal" to log errors, and not the literal word "error".
In this case it helps to see the intermediate output of the first command, which tee can copy to a file:
sample_command | tee log.txt | grep "error" | wc -l
Now the raw output of sample_command is available in the file log.txt for debugging. This is obviously silly for such a small command chain, but for pipes chaining 10 or more commands together it becomes a valuable tool.
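In longer chains the same idea extends to tapping several stages at once. The sketch below is one way to do that; the sample_command function is a hypothetical stand-in that deliberately never prints the word "error", and the stage1.txt and stage2.txt file names are likewise assumptions.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for sample_command: it reports problems as
# "fatal" and "warn", never as the literal word "error".
sample_command() {
    printf 'info: started\nfatal: disk full\nwarn: retrying\n'
}

# Tap the stream before and after the filter. grep exits non-zero when
# nothing matches; `|| true` keeps the sketch usable under `set -o pipefail`.
matches=$(sample_command | tee stage1.txt | { grep "error" || true; } | tee stage2.txt | wc -l)

echo "matches: $matches"
echo "lines produced by sample_command: $(wc -l < stage1.txt)"
echo "lines that survived grep: $(wc -l < stage2.txt)"
```

Comparing the two files shows at which stage the data disappeared: here stage1.txt holds all three lines while stage2.txt is empty, which points at the grep pattern rather than at the command producing the output.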
Watching live log output
When following a log file in real time, plain shell redirection leaves you stuck between a rock and a hard place. You could of course redirect the filtered output into a file:
tail -f access.log | grep --line-buffered "/api/v1/auth" > auth.log
But this comes with a problem: you can't see what's happening. How many lines were written to auth.log? Is it enough data for troubleshooting? How long should the command run? You would need to start a second terminal session and check the contents of auth.log manually to know, or use tee:
tail -f access.log | grep --line-buffered "/api/v1/auth" | tee auth.log
Since tee always copies to stdout, you can now see what gets written to auth.log in real time.
Writing to restricted files
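Sometimes you may want to write output to a file owned by root. The straightforward solution is to run the command generating the output, together with the redirection, in a root shell. (A plain sudo my_command > /root/log.txt would fail, because the redirection is performed by your own unprivileged shell before sudo even starts.)
sudo sh -c 'my_command > /root/log.txt'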
But this may have a lot of side effects: everything that my_command does now has root privileges. For long-running services or software that interacts with user input, this is often ill-advised, as a single bug in the software may give an attacker immediate root access to the server.
To write the output of a program that shouldn't have root privileges (or that you don't trust), you can use tee to give only the writing process permission to access the file:
my_command | sudo tee /root/log.txt
Now my_command does not have any elevated privileges; only tee runs under sudo, so that it can write to the protected file.
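Two refinements are common with this pattern: adding -a so the protected file is appended to rather than overwritten, and redirecting tee's stdout to /dev/null when the copy on the terminal is not wanted. A minimal sketch follows. The my_command function and the temporary target file are stand-ins invented for illustration, and sudo is left out so the snippet runs without privileges; against a real root-owned file the call would be sudo tee -a.

```shell
# Hypothetical stand-in for my_command.
my_command() {
    echo "status: ok"
}

# Stand-in for the root-owned file; a real target would need `sudo tee`.
target=$(mktemp)

# Two runs: -a appends instead of overwriting, and tee's own stdout
# is discarded so nothing is echoed to the terminal.
my_command | tee -a "$target" > /dev/null
my_command | tee -a "$target" > /dev/null

echo "lines in target: $(wc -l < "$target")"
```

Because the redirection to /dev/null applies to tee's stdout and not to the file argument, it needs no special privileges and does not interfere with the write to the protected file.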
Fixing buffering issues
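When dealing with real-time streams, you may occasionally see tee lagging behind its input. This is usually caused not by tee or the shell, but by the output buffering of the command feeding it: many programs collect their output into larger chunks when stdout is a pipe rather than a terminal. For programs that rely on the C library's default buffering, this behavior can be altered through the stdbuf command from GNU coreutils.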
Buffer full lines:
stdbuf -oL ping example.com | tee log.txt
Disable buffering:
stdbuf -o0 ping example.com | tee log.txt
Which buffering mode works for you depends on the command you are running, but be warned that disabling buffering altogether may have a negative performance impact for large output streams.