Backups are an essential part of every production deployment, but how they are created, stored, and restored often differs heavily between systems. Even worse, the administrator who set up the backups may not be the one who has to restore them in an emergency. Because of this, streamlining the entire process starts looking like a good choice - which is where Ansible comes into play.
A sample scenario
Let's create a specific scenario to see how Ansible can be used for the backup workflow, and what problems it solves along the way. We assume two servers: a backup server and a database server running PostgreSQL. The backup server has access to the database server and should connect, make a backup, download it, and store it locally. We also need some form of retention so the backup server's disk does not fill up with old backup files.
We also assume that you have already set up an inventory and SSH authentication between the backup and database servers.
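If you still need an inventory, a minimal one could look like this (the host name, address, and user below are placeholders; adjust them to your environment):

```
[database_server]
db1 ansible_host=198.51.100.10 ansible_user=root
```

The group name database_server matches the hosts: key used in the playbooks below.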
Making backups
Making a backup of the database is fairly simple: call pg_dumpall and download the resulting .sql file:
backup_database.yml
- name: Backup PostgreSQL database
  hosts: database_server
  become: true
  vars:
    local_backup_dir: /path/to/local/backup/dir
  tasks:
    - name: Create backup file on the server
      shell: "pg_dumpall --clean --if-exists -U postgres > /tmp/postgres_backup.sql"
      environment:
        PGHOST: localhost
        PGUSER: postgres
    - name: Compress the backup file
      command: "gzip /tmp/postgres_backup.sql"
    - name: Remove temporary sql file from server
      file:
        path: /tmp/postgres_backup.sql
        state: absent
    - name: Copy compressed backup file to the local machine
      fetch:
        src: /tmp/postgres_backup.sql.gz
        dest: "{{ local_backup_dir }}/postgres_backup_{{ ansible_date_time.iso8601 }}.sql.gz"
        flat: yes
    - name: Remove compressed backup file from server
      file:
        path: /tmp/postgres_backup.sql.gz
        state: absent
    - name: Remove backups older than 7 days from the local machine
      shell: |
        find {{ local_backup_dir }} -name 'postgres_backup_*.sql.gz' -mtime +7 -delete
      delegate_to: localhost
      become: false # no privilege escalation needed on the backup server itself
The playbook first runs pg_dumpall
to safely write all database contents to a text file in SQL format, then compresses the file to save disk space and network bandwidth. The compressed backup is downloaded to the machine running the playbook, and the temporary backup files are removed from the server. The last task enforces a 7-day retention by deleting all local backup files older than 7 days. This ensures that the backup server's disk is not needlessly filled with unused backups, while keeping enough backups to recover from issues that were discovered a little late. Adjust the retention period as needed.
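If you prefer to avoid shell commands, the retention step can also be expressed with Ansible's built-in find and file modules. This is a sketch equivalent to the find command above, and it makes the deleted files visible in the task output:

```yaml
- name: Find backups older than 7 days on the local machine
  find:
    paths: "{{ local_backup_dir }}"
    patterns: "postgres_backup_*.sql.gz"
    age: 7d # matches files whose modification time is older than 7 days
  register: old_backups
  delegate_to: localhost

- name: Remove old backups
  file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ old_backups.files }}"
  delegate_to: localhost
```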
To make a backup, run this command from the backup server:
ansible-playbook backup_database.yml
That's all you need to do to make reliable backups whenever you like. In order to automate backup creation, you could set up a cron job to call the playbook at 2am every night:
0 2 * * * cd /my_backup_dir && ansible-playbook backup_database.yml
Make sure to use crontab -e
when creating cron jobs, to ensure you don't accidentally write invalid syntax to the file and prevent the backup from running.
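Alternatively, the cron job itself can be managed with Ansible's cron module, so even the scheduling is captured in a playbook rather than edited by hand (a sketch; the working directory is a placeholder):

```yaml
- name: Schedule nightly database backups
  hosts: localhost
  tasks:
    - name: Create cron entry for the backup playbook
      cron:
        name: "nightly postgres backup"
        minute: "0"
        hour: "2"
        job: "cd /my_backup_dir && ansible-playbook backup_database.yml"
```

The cron module also validates the entry, so a typo fails the playbook run instead of silently breaking the schedule.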
(Note the --clean
and --if-exists
flags to pg_dumpall,
which include commands to drop existing database objects before restoring the backup, avoiding potential conflicts during recovery and removing the need to manually clean the database beforehand.)
Restoring a backup
Since backups are useless unless you can restore one, this portion also needs to be handled. Many backup strategies omit this step, leaving operators scrambling for manual commands and looking up documentation when the need to restore a backup arises. Streamlining the entire process into a simple playbook results in faster incident resolution and less stress for everyone involved.
restore_database.yml
- name: List and restore PostgreSQL backup dynamically with validation
  hosts: database_server
  become: true
  vars:
    local_backup_dir: /path/to/local/backup/dir
  tasks:
    - name: List available backup files
      find:
        paths: "{{ local_backup_dir }}"
        patterns: "*.sql.gz"
      register: backup_files
      delegate_to: localhost
      become: false # no privilege escalation needed on the backup server
    - name: Ensure backups are available
      fail:
        msg: "No backups found in {{ local_backup_dir }}!"
      when: backup_files.files | length == 0
      delegate_to: localhost
    - name: Show available backups
      debug:
        msg: |
          Available backups:
          {% for file in backup_files.files %}
          {{ loop.index }}: {{ file.path }}
          {% endfor %}
      delegate_to: localhost
    - name: Prompt for backup file selection
      pause:
        prompt: |
          Enter the number corresponding to the backup file you want to restore:
          {% for file in backup_files.files %}
          {{ loop.index }}: {{ file.path }}
          {% endfor %}
      register: user_input
      delegate_to: localhost
    - name: Validate user input
      fail:
        msg: "Invalid selection! Please choose a number between 1 and {{ backup_files.files | length }}."
      when: >
        (user_input.user_input | int) < 1 or
        (user_input.user_input | int) > (backup_files.files | length)
      delegate_to: localhost
    - name: Set the selected backup file
      set_fact:
        selected_backup: "{{ backup_files.files[(user_input.user_input | int) - 1].path }}"
      delegate_to: localhost
    - name: Copy selected backup file to the server
      copy:
        # the fact is set on the inventory host even when delegated,
        # so it can be referenced directly
        src: "{{ selected_backup }}"
        dest: /tmp/postgres_restore.sql.gz
    - name: Decompress the backup file on the server
      command: "gunzip /tmp/postgres_restore.sql.gz"
    - name: Restore the database
      command: "psql -U postgres -f /tmp/postgres_restore.sql"
      environment:
        PGHOST: localhost
        PGUSER: postgres
    - name: Remove backup file from server
      file:
        path: /tmp/postgres_restore.sql
        state: absent
    - name: Remove temporary compressed backup file from server
      file:
        path: /tmp/postgres_restore.sql.gz
        state: absent
As with the previous playbook, execution is just a single command:
ansible-playbook restore_database.yml
This time, the playbook will interactively guide the user through the recovery process, listing all locally available backups and prompting them to pick one, validating the selection to ensure the process can continue properly.
Pay attention to the delegate_to:
keys in most of the tasks, used to run the first tasks on localhost
and only the last five on the database server. This is necessary because the backup files are only available on the backup server (which runs the playbook, i.e. localhost), while the selected file then needs to be uploaded to and restored on the remote server.
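For unattended recovery, for example from a runbook script, the interactive prompt can be skipped. A possible adaptation (a sketch, not part of the playbook above) is to guard the prompt, validation, and selection tasks so they only run when no backup was preselected:

```yaml
- name: Prompt for backup file selection
  pause:
    prompt: "Enter the number of the backup to restore"
  register: user_input
  delegate_to: localhost
  when: selected_backup is not defined
```

With that condition on the interactive tasks, running ansible-playbook restore_database.yml -e selected_backup=/path/to/backup.sql.gz restores a known file without any prompts.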
Automating the complete backup process, from creation and retention to recovery, ensures that the process is streamlined and reliable across all steps, and that all the information around it is stored in a single place. While writing playbooks is a higher time investment than setting up a bash script and a cron job, it gives operators the peace of mind that everything is planned out and tested when the need to restore a backup arises, significantly cutting down on stress and unnecessary delays when it truly matters.