Backups are an essential part of every production deployment, but how they are created, stored, and restored often differs heavily between systems. Even worse, the administrator who set up the backups may not be the one who has to restore them in an emergency. Because of this, streamlining the entire process starts looking like a good choice - which is where Ansible comes into play.
A sample scenario
Let's create a specific scenario to see how Ansible can be used for the backup workflow, and what problems it solves along the way. We assume two servers: a backup server and a database server running PostgreSQL. The backup server has access to the database server and should connect, make a backup, download it, and store it locally. We also need some form of retention so the backup server's disk does not fill up with old backup files.
We also assume that you have already set up an inventory and SSH authentication between the backup and database servers.
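If you still need an inventory, a minimal one could look like this (the host name, address, and user below are placeholders; adjust them to your environment):

```
[database_server]
db1 ansible_host=198.51.100.10 ansible_user=root
```

The group name database_server matches the hosts: key used in the playbooks below.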
Making backups
Making a backup of the database is fairly simple: call pg_dumpall and download the resulting .sql file:
backup_database.yml
- name: Backup PostgreSQL database
  hosts: database_server
  become: true
  vars:
    local_backup_dir: /path/to/local/backup/dir
  tasks:
    - name: Create backup file on the server
      shell: "pg_dumpall --clean --if-exists -U postgres > /tmp/postgres_backup.sql"
      environment:
        PGHOST: localhost
        PGUSER: postgres
    - name: Compress the backup file
      command: "gzip /tmp/postgres_backup.sql"
    - name: Remove temporary sql file from server
      file:
        path: /tmp/postgres_backup.sql
        state: absent
    - name: Copy compressed backup file to the local machine
      fetch:
        src: /tmp/postgres_backup.sql.gz
        dest: "{{ local_backup_dir }}/postgres_backup_{{ ansible_date_time.iso8601 }}.sql.gz"
        flat: yes
    - name: Remove compressed backup file from server
      file:
        path: /tmp/postgres_backup.sql.gz
        state: absent
    - name: Remove backups older than 7 days from the local machine
      shell: |
        find {{ local_backup_dir }} -name 'postgres_backup_*.sql.gz' -mtime +7 -delete
      delegate_to: localhost
      become: false # no privilege escalation needed on the backup server itself
The playbook first runs pg_dumpall
to safely write all database contents to a text file in SQL format, then compresses the file to save disk space and network bandwidth. The compressed backup is downloaded to the machine running the playbook, and the temporary backup files are removed from the server. The last task enforces a 7-day retention by deleting all local backup files older than 7 days. This ensures that the backup server's disk is not needlessly filled with unused backups, while keeping enough backups to recover from issues that were discovered a little late. Adjust the retention period as needed.
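If you prefer to avoid shell commands, the retention step can also be expressed with Ansible's built-in find and file modules. This is a sketch equivalent to the find command above, and it makes the deleted files visible in the task output:

```yaml
- name: Find backups older than 7 days on the local machine
  find:
    paths: "{{ local_backup_dir }}"
    patterns: "postgres_backup_*.sql.gz"
    age: 7d # matches files whose modification time is older than 7 days
  register: old_backups
  delegate_to: localhost

- name: Remove old backups
  file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ old_backups.files }}"
  delegate_to: localhost
```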
To make a backup, run this command from the backup server:
ansible-playbook backup_database.yml
That's all you need to do to make reliable backups whenever you like. In order to automate backup creation, you could set up a cron job to call the playbook at 2am every night:
0 2 * * * cd /my_backup_dir && ansible-playbook backup_database.yml
Make sure to use crontab -e
when creating cron jobs, to ensure you don't accidentally write invalid syntax to the file and prevent the backup from running.
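Alternatively, the cron job itself can be managed with Ansible's cron module, so even the scheduling is captured in a playbook rather than edited by hand (a sketch; the working directory is a placeholder):

```yaml
- name: Schedule nightly database backups
  hosts: localhost
  tasks:
    - name: Create cron entry for the backup playbook
      cron:
        name: "nightly postgres backup"
        minute: "0"
        hour: "2"
        job: "cd /my_backup_dir && ansible-playbook backup_database.yml"
```

The cron module also validates the entry, so a typo fails the playbook run instead of silently breaking the schedule.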
(Note the --clean
and --if-exists
flags to pg_dumpall,
which include commands to drop existing database objects before restoring the backup, avoiding potential conflicts during recovery and removing the need to manually clean the database beforehand.)
Restoring a backup
Since backups are useless unless you can restore one, this portion also needs to be handled. Many backup strategies omit this step, leaving operators scrambling for manual commands and looking up documentation when the need to restore a backup arises. Streamlining the entire process into a simple playbook results in faster incident resolution and less stress for everyone involved.
restore_database.yml
- name: List and restore PostgreSQL backup dynamically with validation
  hosts: database_server
  become: true
  vars:
    local_backup_dir: /path/to/local/backup/dir
  tasks:
    - name: List available backup files
      find:
        paths: "{{ local_backup_dir }}"
        patterns: "*.sql.gz"
      register: backup_files
      delegate_to: localhost
      become: false # no privilege escalation needed on the backup server
    - name: Ensure backups are available
      fail:
        msg: "No backups found in {{ local_backup_dir }}!"
      when: backup_files.files | length == 0
      delegate_to: localhost
    - name: Show available backups
      debug:
        msg: |
          Available backups:
          {% for file in backup_files.files %}
          {{ loop.index }}: {{ file.path }}
          {% endfor %}
      delegate_to: localhost
    - name: Prompt for backup file selection
      pause:
        prompt: |
          Enter the number corresponding to the backup file you want to restore:
          {% for file in backup_files.files %}
          {{ loop.index }}: {{ file.path }}
          {% endfor %}
      register: user_input
      delegate_to: localhost
    - name: Validate user input
      fail:
        msg: "Invalid selection! Please choose a number between 1 and {{ backup_files.files | length }}."
      when: >
        (user_input.user_input | int) < 1 or
        (user_input.user_input | int) > (backup_files.files | length)
      delegate_to: localhost
    - name: Set the selected backup file
      set_fact:
        selected_backup: "{{ backup_files.files[(user_input.user_input | int) - 1].path }}"
      delegate_to: localhost
    - name: Copy selected backup file to the server
      copy:
        # the fact is set on the inventory host even when delegated,
        # so it can be referenced directly
        src: "{{ selected_backup }}"
        dest: /tmp/postgres_restore.sql.gz
    - name: Decompress the backup file on the server
      command: "gunzip /tmp/postgres_restore.sql.gz"
    - name: Restore the database
      command: "psql -U postgres -f /tmp/postgres_restore.sql"
      environment:
        PGHOST: localhost
        PGUSER: postgres
    - name: Remove backup file from server
      file:
        path: /tmp/postgres_restore.sql
        state: absent
    - name: Remove temporary compressed backup file from server
      file:
        path: /tmp/postgres_restore.sql.gz
        state: absent
As with the previous playbook, execution is just a single command:
ansible-playbook restore_database.yml
This time, the playbook will interactively guide the user through the recovery process, listing all locally available backups and prompting them to pick one, validating the selection to ensure the process can continue properly.
Pay attention to the delegate_to:
keys in most of the tasks, used to run the first tasks on localhost
and only the last five on the database server. This is necessary because the backup files are only available on the backup server (which runs the playbook, i.e. localhost), while the selected file then needs to be uploaded to and restored on the remote server.
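For unattended recovery, for example from a runbook script, the interactive prompt can be skipped. A possible adaptation (a sketch, not part of the playbook above) is to guard the prompt, validation, and selection tasks so they only run when no backup was preselected:

```yaml
- name: Prompt for backup file selection
  pause:
    prompt: "Enter the number of the backup to restore"
  register: user_input
  delegate_to: localhost
  when: selected_backup is not defined
```

With that condition on the interactive tasks, running ansible-playbook restore_database.yml -e selected_backup=/path/to/backup.sql.gz restores a known file without any prompts.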
Automating the complete backup process, from creation and retention to recovery, ensures that the process is streamlined and reliable across all steps, and that all the information around it is stored in a single place. While writing playbooks is a higher time investment than setting up a bash script and a cron job, it gives operators the peace of mind that everything is planned out and tested when the need to restore a backup arises, significantly cutting down on stress and unnecessary delays when it truly matters.