Writing practical bash functions

Functions are an essential component of almost all programming languages. But functions in bash differ quite a bit from those in other common scripting languages, from how they take arguments to how they return values.

A basic bash function

Simple functions look similar to function definitions in other languages:

function greet(){
   echo "Hello world!"
}

To run the greet function, invoke it like a normal command within the script:

greet
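Besides the function greet(){ form used throughout this article, bash accepts other ways to define a function. The sketch below shows the portable POSIX form and the bash-only form without parentheses. greet_bash is a hypothetical name chosen for illustration.

```bash
#!/usr/bin/env bash

# POSIX-compatible form: no "function" keyword, also works in sh
greet() {
   echo "Hello world!"
}

# bash-only form: "function" keyword, parentheses omitted
function greet_bash {
   echo "Hello world!"
}

# A function must be defined before the line that calls it runs
greet        # prints: Hello world!
greet_bash   # prints: Hello world!
```

The POSIX form is the safer choice when a script may also run under sh. In a bash-only script, the forms are interchangeable.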

One pitfall to be aware of: what if a command named greet already exists in your PATH, say /usr/local/bin/greet, and you also define a function named greet? In this case, the function takes precedence. To execute the command from your PATH instead, use bash's builtin command utility:

command greet

Invoking it this way will explicitly execute a command named greet, ignoring functions by the same name.
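Because /usr/local/bin/greet may not exist on your system, this sketch uses the echo builtin as a stand-in to show the same precedence rule: a function named echo wraps the builtin, and command bypasses the function. The [log] prefix is a hypothetical example.

```bash
#!/usr/bin/env bash

# A function named "echo" takes precedence over the echo builtin
function echo(){
   # "command echo" runs the real builtin; a plain "echo" here
   # would call this function again and recurse endlessly
   command echo "[log] $*"
}

echo "hello"           # prints: [log] hello
command echo "hello"   # prints: hello
type -t echo           # prints: function
```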

Passing arguments

When passing arguments to functions, the differences from other scripting languages become apparent. Commonly, arguments are passed to functions by declaring them in the parentheses, as in this pseudo-code:

function greet(name){...}

This does not work in bash. The parentheses in a bash function definition only mark it as a function definition; they cannot contain parameter names. The function can still receive arguments, but there is no way to declare which ones or how many. Inside the function, the arguments are available in a few ways:

Through the positional parameters $1, $2 and so on, where $1 holds the first argument, $2 the second, etc.
As separate words through "$@" (when quoted)
As a single string through "$*" (when quoted), joined with the first character of IFS (a space by default)
The special parameter $# contains the number of arguments passed to the function
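The following sketch, using a hypothetical show_args function, prints what each of these forms holds when called with two arguments, one of which contains a space. The join_by_comma helper, also hypothetical, shows that "$*" joins with the first character of IFS.

```bash
#!/usr/bin/env bash

function show_args(){
   echo "count: $#"        # number of arguments
   echo "first: $1"        # first positional parameter
   echo "second: $2"       # second positional parameter
   local arg
   for arg in "$@"; do     # "$@": each argument as a separate word
      echo "arg: $arg"
   done
   echo "joined: $*"       # "$*": all arguments joined into one string
}

function join_by_comma(){
   local IFS=','           # only affects this function
   echo "$*"
}

show_args "one" "two three"
join_by_comma "a" "b" "c"   # prints: a,b,c
```

Called this way, show_args reports a count of 2, and "two three" stays a single argument in both $2 and the "$@" loop.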

In order to replicate the pseudo-code function above in bash, you need to access the first argument using the $1 variable:

function greet(){
   local name="$1" 
   echo "Hello $name"
}

Note that the function definition doesn't indicate whether the function accepts arguments, or which ones, so documenting them with a comment is often necessary:

# Prints a greeting with the provided name.
# Usage: greet "Name"
# Arguments:
#   $1 - Name to greet
function greet() {
   local name="$1"
   echo "Hello $name"
}

Passing arguments to a function when calling it works just like passing them to any other bash command:

greet "bob"

This passes the string bob to the greet function, making it print Hello bob.
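As with any other command, quoting determines how many arguments the function receives. In this sketch, an unquoted name containing a space arrives as two separate arguments, and greet only reads the first one.

```bash
#!/usr/bin/env bash

function greet(){
   local name="$1"
   echo "Hello $name"
}

greet "bob smith"   # one argument: prints "Hello bob smith"
greet bob smith     # two arguments: prints "Hello bob"; "smith" lands in $2 and is ignored
```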

Handling optional and required function arguments

Relying solely on arguments being passed correctly is likely to introduce errors when parameters are forgotten. Function authors may want to raise an error for missing required arguments, or substitute a default value for missing optional ones. Both can be achieved with bash parameter expansion within the function body.

Making an argument required:

function greet(){
   # ':?' raises the error if $1 is unset or empty
   local name="${1:?Missing first argument 'name'}"
   echo "Hello $name"
}

Making an argument optional with default fallback value:

function greet(){
   local name="${1:-unknown}" 
   echo "Hello $name"
}

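With the optional version, calling greet without an argument prints Hello unknown. With the required version, bash instead writes an error message containing Missing first argument 'name' to stderr. Be aware that in a non-interactive shell, such as a running script, this error exits the entire script, not just the function.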
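Both expansions can be combined in one function. The sketch below is a hypothetical variant of greet with a required name and an optional greeting. It also shows one way to call a function that uses :? without letting the error end the whole script: running it in a subshell.

```bash
#!/usr/bin/env bash

# Hypothetical variant: the name is required, the greeting is optional
function greet(){
   local name="${1:?Missing first argument 'name'}"
   local greeting="${2:-Hello}"
   echo "$greeting $name"
}

greet "bob"          # prints: Hello bob
greet "bob" "Hi"     # prints: Hi bob

# In a subshell (...), the ':?' failure only exits the subshell,
# so the script can react instead of terminating
if ! (greet); then
   echo "greet failed: missing name" >&2
fi
```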

Dynamic function arguments

Since arguments do not need to be specified when defining functions, they can take an arbitrary number of arguments at runtime. For example, the greet function could process any number of names:

function greet(){
   for name in "$@"; do 
      echo "Hello $name" 
   done
}

Now when calling it with multiple names as arguments:

greet "bob" "john" "jane"

will output one message per name:

Hello bob
Hello john
Hello jane

By looping over all arguments, the function works with any number of them. If you want more control over the arguments passed, you can check their count with the $# variable, for example when at least one argument is required:

function greet(){
   if [[ $# -eq 0 ]]; then
      echo "Error: At least one argument is required." >&2
      return 1
   fi
   local name
   for name in "$@"; do
      echo "Hello $name"
   done
}

Robust functions use these features where necessary. Keep in mind, though, that overly complex function usage can be a sign of design issues; splitting a large function into several smaller ones often improves readability and reduces complexity.
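When a function takes a fixed leading argument followed by a variable number of others, the shift builtin can drop the leading argument so that "$@" holds only the rest. This is a sketch with a hypothetical greet_all function that expects a greeting followed by at least one name.

```bash
#!/usr/bin/env bash

# Hypothetical: the first argument is the greeting, the rest are names
function greet_all(){
   if (( $# < 2 )); then
      echo "Usage: greet_all GREETING NAME..." >&2
      return 1
   fi
   local greeting="$1"
   shift                  # drop $1; "$@" now contains only the names
   local name
   for name in "$@"; do
      echo "$greeting $name"
   done
}

greet_all "Hi" "bob" "jane"   # prints "Hi bob", then "Hi jane"
```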

Catching a single return value

Commonly, functions return the result of an operation. In most scripting languages, this happens by passing the value(s) to a return statement. In bash, however, return only ends the function and sets its exit status, an integer from 0 to 255 that signals success (0) or failure (non-zero) rather than carrying data; it does not exit the script. Returning values from a bash function instead works by capturing its output with command substitution.
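Since return only sets an exit status, it is best used to report success or failure. This sketch with a hypothetical has_value function shows how the caller reads that status through $? or directly in an if statement, and that the script keeps running after the function returns.

```bash
#!/usr/bin/env bash

# Hypothetical helper: the exit status reports whether $1 is non-empty
function has_value(){
   if [[ -n "$1" ]]; then
      return 0   # success
   fi
   return 1      # failure
}

has_value "bob"
echo "status: $?"    # prints: status: 0
has_value ""
echo "status: $?"    # prints: status: 1

# if, while, && and || all test the exit status
if has_value "bob"; then
   echo "bob is non-empty"
fi
echo "still running"  # return ended the function, not the script
```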

Given this function:

function greet(){
   echo "Hello world!"
}

When called on its own, it simply prints Hello world! to stdout:

greet

However, you can capture the function's output in a variable by wrapping the call in $(...):

message=$(greet)

Now the variable $message contains the output of the function, and nothing is written to stdout.

Command substitution is the simplest way to capture a single return value from a function, but it does not lend itself to returning multiple values.
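Command substitution and exit statuses can work together: the function writes its value to stdout, sends error messages to stderr (which $(...) does not capture), and reports success through its exit status. The sketch below uses hypothetical get_greeting and main functions. Note the separate declaration in main: writing local message=$(...) on one line would make the line report the exit status of local, which is 0, hiding a failure.

```bash
#!/usr/bin/env bash

# Hypothetical: value on stdout, errors on stderr, success via exit status
function get_greeting(){
   if [[ -z "$1" ]]; then
      echo "get_greeting: missing name" >&2   # not captured by $(...)
      return 1
   fi
   echo "Hello $1"
}

function main(){
   local message   # declared separately so the status below is not masked
   message=$(get_greeting "bob") || return 1
   echo "captured: $message"

   # An assignment from $(...) has the exit status of the substitution
   if ! message=$(get_greeting ""); then
      echo "no greeting was produced"
   fi
}

main
```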

Side-effect functions

Another way to get data out of a function is through side effects. A side effect is anything a function changes besides producing its output, such as declaring or modifying variables outside of its own body, that isn't obvious from the call alone.

Here is an example function that changes an existing variable:

function my_side_effect_function(){
   fruit="banana"
}
fruit="apple"
my_side_effect_function
echo "$fruit"

When executing this script, it prints banana, because the function changed the value of $fruit. This change is not obvious from the function call alone, so it is considered a side effect.

Such effects can happen unintentionally, because variables in bash are global by default: any variable a function assigns is available outside of the function, and assigning to a name that already exists outside the function overwrites that variable.

To prevent this, declare all variables within a function as local, forcing bash to create a new local variable even if a global one by the same name already exists:

function my_side_effect_function(){
   local fruit="banana"
}
fruit="apple"
my_side_effect_function
echo "$fruit"

This time, the script prints apple, because the $fruit variable within my_side_effect_function is declared local, so it does not access or change the global $fruit variable.
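A global side effect only reaches the caller when the function runs in the current shell. The sketch below, with a hypothetical set_fruit function, shows that the same assignment is lost when the function is called through command substitution, because $(...) runs it in a subshell.

```bash
#!/usr/bin/env bash

function set_fruit(){
   fruit="banana"   # global assignment: a side effect
   echo "done"
}

fruit="apple"
result=$(set_fruit)     # runs set_fruit in a subshell
echo "$result $fruit"   # prints: done apple (the change stayed in the subshell)

set_fruit > /dev/null   # runs set_fruit in the current shell
echo "$fruit"           # prints: banana
```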

You can use such side effects to return multiple values from a function:

function kernel_info(){
   kernel_name=$(uname -s)
   kernel_version=$(uname -v)
   kernel_release=$(uname -r)
}
kernel_info
echo "Kernel name: $kernel_name"
echo "Kernel version: $kernel_version"
echo "Kernel release: $kernel_release"

Usage of this technique is somewhat debated in the bash community, mainly because it is not obvious from the function call alone where new variables like $kernel_name come from, but also because such functions claim global variable names, which can conflict with variables used elsewhere in the script.

It is generally considered best practice to avoid side effects and declare all variables within functions as local, to avoid conflicts and ambiguity.

User-defined variables for return values

A more expressive and flexible way to return multiple values from a function is to let the caller choose the names of the variables that receive them. This works by passing the name(s) of the variable(s) to receive the return value(s) as arguments to the function:

function kern_info(){
   declare -n kernel_name="$1" 
   declare -n kernel_version="$2" 
   declare -n kernel_release="$3"
   kernel_name=$(uname -s)
   kernel_version=$(uname -v)
   kernel_release=$(uname -r)
}

kern_info "name" "version" "release"

echo "Kernel $name version $version ($release)"

The above script lets callers pass their own names for the return value variables, which are then set from within the function body. Inside a function, declare -n creates a local nameref: a variable that acts as a reference to the variable named by its value (namerefs require bash 4.3 or newer). This lets the function work with one local variable per return value, bound to the variable name the caller provided, without accidentally changing or creating other variables in the global namespace.

This approach is slightly more complex to write, but a much cleaner and less error-prone solution to returning multiple values from a function in bash.
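Namerefs also make it easy to write small functions that fill in several caller-chosen variables. The sketch below is a hypothetical split_version function that assumes bash 4.3 or newer. Its internal variables use a _sv_ prefix, since a caller-provided name that matches one of the function's own local names would resolve to that local variable instead of the caller's.

```bash
#!/usr/bin/env bash
# Requires bash 4.3 or newer for local -n / declare -n

# Hypothetical: splits a version string like 6.8.12 into major and
# minor numbers, writing them to variables named by the caller
function split_version(){
   # Prefixed names make a clash with caller-chosen names unlikely
   local -n _sv_major="$1" _sv_minor="$2"
   local _sv_version="$3"
   _sv_major="${_sv_version%%.*}"   # everything before the first dot
   _sv_minor="${_sv_version#*.}"    # drop everything up to the first dot
   _sv_minor="${_sv_minor%%.*}"     # then cut at the next dot
}

split_version major minor "6.8.12"
echo "major=$major minor=$minor"   # prints: major=6 minor=8
```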

You may come across another method involving eval. Be very careful with that variant, because practically all versions of it allow command injection exploits through variable names or values! Using declare -n avoids these issues for the values, although the variable names passed in should still come from your own code rather than from untrusted input.
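To illustrate the eval warning, this sketch compares a hypothetical eval-based setter with a nameref-based one (bash 4.3 or newer). The payload is harmless, since it only writes the word INJECTED to stderr, but with eval it runs as a command, while the nameref version stores it as plain text.

```bash
#!/usr/bin/env bash

# UNSAFE, for illustration only: the value is parsed as shell code
function unsafe_set(){
   eval "$1=\"$2\""
}

# Nameref-based setter: the value is stored as data, never executed
function safe_set(){
   local -n _ss_ref="$1"
   _ss_ref="$2"
}

payload='$(echo INJECTED >&2)'
unsafe_set greeting "$payload"   # runs the echo: INJECTED appears on stderr
safe_set greeting "$payload"     # stores the text literally, runs nothing
echo "$greeting"                 # prints: $(echo INJECTED >&2)
```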