Automating Palo Alto HA Firewall Upgrades with Ansible

In this blog post, we will cover upgrading Palo Alto firewalls in HA using Ansible. The playbook only covers minor version upgrades, so it won't work if you are going from 10.x to 11.x, for example. It also only supports an HA pair, not standalone firewalls.

This playbook is based on the repo from Palo Alto itself. There are many playbooks there covering scenarios like upgrading the major version, upgrading the content, and so on, but we will only focus on one specific playbook for HA, which I tweaked a little bit to suit my own setup.

Manual Upgrade Process

This post assumes you already know how to upgrade the firewalls manually. In case you don't, here are the steps. Palo Alto recommends upgrading the active unit first and then the passive. You download the image to the active unit and tick the box to sync it to the peer, then suspend the active unit to trigger a failover so the passive takes over. Install the image on the suspended unit, reboot it, and wait for it to come back online so the HA pair syncs again. Once it is back, suspend the current active unit (the original passive) to fail traffic back to the upgraded unit, then install the image on it and reboot. After it comes back online and the HA pair syncs, the upgrade is complete.

Initial state: FW-A Active · FW-B Passive
Step 1 · Download image to active, sync to peer: FW-A Active (image synced) · FW-B Passive (image received)
Step 2 · Suspend active, failover to passive: FW-A Suspended · FW-B Active
Step 3 · Install image on FW-A and reboot: FW-A Rebooting (installing) · FW-B Active
Step 4 · FW-A back online, HA syncs: FW-A Passive · FW-B Active
Step 5 · Suspend FW-B, fail back to FW-A: FW-A Active · FW-B Suspended
Step 6 · Install image on FW-B and reboot: FW-A Active · FW-B Rebooting (installing)
Done · FW-B back online, HA syncs, upgrade complete: FW-A Active · FW-B Passive

Getting the Ansible Environment Ready

First, create a project directory and the required subdirectories and files inside it. You will need a backups directory where the config backups are stored, along with the playbook, inventory, and requirements files. Please make sure you also have Python installed.

palo-upgrade-ansible/
├── backups/
├── inventory.yml
├── requirements.yml
└── upgrade.yml

The backups directory is where the config backups from the firewalls are saved. When you run the playbook, it exports the running config from both firewalls and stores them here as XML files, named with the firewall's IP address or hostname (whichever you put in the inventory) and the date. This happens before anything is upgraded, so you always have a copy of the config to fall back on if something goes wrong.

Next, install the Python dependencies. You need Ansible plus a few Python libraries that the Palo Alto modules rely on: pan-os-python, pan-python, pandevice, and xmltodict. If you use uv, you can add them to your project like this.

uv add ansible pan-os-python pan-python pandevice xmltodict

If you prefer pip, install the same packages with the following command.

pip install ansible pan-os-python pan-python pandevice xmltodict

Once Ansible is installed, pull down the Palo Alto collection using the requirements.yml file.

ansible-galaxy collection install -r requirements.yml

This installs the paloaltonetworks.panos collection, which provides all the modules we use in the playbook.
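
The contents of requirements.yml are not shown above. A minimal version, assuming you only need this one collection and are happy to take the latest release rather than pin a version, might look like this.

```yaml
---
# requirements.yml - collections for ansible-galaxy to install
collections:
  - name: paloaltonetworks.panos
```

If you want repeatable installs, you could add a version key under the collection entry, but which release to pin is up to you.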

Inventory

Let's look at the inventory file next. We define a single host called ha_pair and point it at both firewalls using primary_ip_address and secondary_ip_address. The connection is set to local because the Palo Alto modules run on the Ansible control node and talk to the firewalls over their API, rather than connecting over SSH.

all:
  hosts:
    ha_pair:
      ansible_connection: local
      primary_ip_address: 'fw1.company.local'
      secondary_ip_address: 'fw2.company.local'

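We need to specify which firewall is currently the primary and which is the secondary. The playbook suspends the unit behind primary_ip_address first and expects the unit behind secondary_ip_address to take over, so primary_ip_address has to point at the firewall that is active right now. We could have worked this out dynamically within the playbook by querying the HA state, but that would complicate things, so we keep it simple and define them statically in the inventory.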
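
Because the roles are static, an out-of-date inventory would make the playbook suspend the wrong unit. A small guard at the top of the tasks list could catch that. This is only a sketch and not part of the original playbook: it reuses the primary provider variable and the same JSON path the playbook uses for its HA checks, and the register name preflight_ha is made up for illustration.

```yaml
# Sketch only: place before the backup tasks in upgrade.yml.
- name: Read HA state from the unit listed as primary
  paloaltonetworks.panos.panos_op:
    provider: '{{ primary }}'
    cmd: 'show high-availability state'
  register: preflight_ha   # hypothetical name
  changed_when: false      # read-only command, never report a change

- name: Stop if the inventory does not match the live HA roles
  ansible.builtin.assert:
    that:
      - (preflight_ha.stdout | from_json).response.result.group["local-info"].state == 'active'
      - (preflight_ha.stdout | from_json).response.result.group["peer-info"].state == 'passive'
    fail_msg: 'primary_ip_address is not the active unit, or its peer is not passive. Fix the inventory or the HA pair first.'
```

If the assertion fails, the play ends before anything is downloaded or suspended, so a wrong inventory does no harm.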

The Playbook

Now let's look at the playbook itself. I won't go through it line by line, but I'll walk you through what it does at a high level so you understand the flow before you run it.

---
- hosts: ha_pair

  vars_prompt:
    - name: username
      prompt: "Enter username"
      private: false
    - name: password
      prompt: "Enter Palo Alto password"
      private: true

  vars:
    primary:
      ip_address: '{{ primary_ip_address }}'
      username: '{{ username | default(omit) }}'
      password: '{{ password | default(omit) }}'

    secondary:
      ip_address: '{{ secondary_ip_address }}'
      username: '{{ username | default(omit) }}'
      password: '{{ password | default(omit) }}'

    # backup_config - Create a backup of the currently running config before upgrading on both devices.
    backup_config: true

    # pause_mid_upgrade - Optionally pause for additional verification during upgrade.  This playbook will perform
    #                     basic checks for HA status and session sync, but this will wait for manual verification before
    #                     upgrading the secondary firewall.
    pause_mid_upgrade: true

  tasks:
    - name: Backup device config (primary)
      paloaltonetworks.panos.panos_export:
        provider: '{{ primary }}'
        category: 'configuration'
        filename: 'backups/{{ primary_ip_address }}-{{ ansible_facts["date_time"]["date"] }}.xml'
      when: backup_config|bool

    - name: Backup device config (secondary)
      paloaltonetworks.panos.panos_export:
        provider: '{{ secondary }}'
        category: 'configuration'
        filename: 'backups/{{ secondary_ip_address }}-{{ ansible_facts["date_time"]["date"] }}.xml'
      when: backup_config|bool

    - name: Download target PAN-OS version
      paloaltonetworks.panos.panos_software:
        provider: '{{ primary }}'
        version: '{{ version }}'
        download: true
        sync_to_peer: true
        install: false

    - name: Suspend primary device
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'request high-availability state suspend'

    - name: Check that secondary is now active
      paloaltonetworks.panos.panos_op:
        provider: '{{ secondary }}'
        cmd: 'show high-availability state'
      register: secondary_active
      retries: 10
      delay: 30
      until: ( secondary_active.stdout | from_json).response.result.group["local-info"].state == 'active' and
             ( secondary_active.stdout | from_json).response.result.group["peer-info"].state == 'suspended' and
             ( secondary_active.stdout | from_json).response.result.group["peer-info"]["state-reason"] == 'User requested' # yamllint disable-line

    - name: Install target PAN-OS version and restart (primary)
      paloaltonetworks.panos.panos_software:
        provider: '{{ primary }}'
        version: '{{ version }}'
        download: false
        restart: true

    - name: Pause for restart
      ansible.builtin.pause:
        seconds: 30

    - name: Chassis ready (primary)
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'show chassis-ready'
      changed_when: false
      register: result
      until: result is not failed and (result.stdout | from_json).response.result == 'yes'
      retries: 30
      delay: 60

    - name: State sync check (primary)
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'show high-availability state'
      register: primary_state_sync
      retries: 10
      delay: 30
      until: '"state" in ( primary_state_sync.stdout | from_json).response.result.group["local-info"] and
             ( primary_state_sync.stdout | from_json).response.result.group["local-info"].state == "passive" and
             ( primary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == "Complete"'

    - name: Pause for verification
      ansible.builtin.pause:
        prompt: 'Primary upgrade complete.  Pausing for verification.'
      when: pause_mid_upgrade | bool

    - name: Suspend secondary device
      paloaltonetworks.panos.panos_op:
        provider: '{{ secondary }}'
        cmd: 'request high-availability state suspend'

    - name: Check that primary is now active
      paloaltonetworks.panos.panos_op:
        provider: '{{ primary }}'
        cmd: 'show high-availability state'
      register: primary_active
      retries: 10
      delay: 30
      until: ( primary_active.stdout | from_json).response.result.group["local-info"].state == 'active' and
             ( primary_active.stdout | from_json).response.result.group["peer-info"].state == 'suspended' and
             ( primary_active.stdout | from_json).response.result.group["peer-info"]["state-reason"] == 'User requested'

    - name: Install target PAN-OS version and restart (secondary)
      paloaltonetworks.panos.panos_software:
        provider: '{{ secondary }}'
        version: '{{ version }}'
        download: false
        restart: true

    - name: Pause for restart
      ansible.builtin.pause:
        seconds: 30

    - name: Chassis ready (secondary)
      paloaltonetworks.panos.panos_op:
        provider: '{{ secondary }}'
        cmd: 'show chassis-ready'
      changed_when: false
      register: result
      until: result is not failed and (result.stdout | from_json).response.result == 'yes'
      retries: 30
      delay: 60

    - name: State sync check (secondary)
      paloaltonetworks.panos.panos_op:
        provider: '{{ secondary }}'
        cmd: 'show high-availability state'
      register: secondary_state_sync
      retries: 10
      delay: 30
      until: '"state" in ( secondary_state_sync.stdout | from_json).response.result.group["local-info"] and
             ( secondary_state_sync.stdout | from_json).response.result.group["local-info"].state == "passive" and
             ( secondary_state_sync.stdout | from_json).response.result.group["local-info"]["state-sync"] == "Complete"'

The playbook runs against the ha_pair host we defined in the inventory. When you run it, it prompts you for the username and password so you don't have to store credentials in plain text. These are then passed to both firewalls using the primary and secondary provider variables. At the top, there are a couple of variables.

  • The backup_config variable controls whether a backup of the running config is taken before the upgrade. It is set to true by default.
  • The pause_mid_upgrade variable tells the playbook to stop and wait for manual verification after the primary is upgraded, before moving on to the secondary. This gives you a chance to confirm everything looks healthy before continuing.

From there, the tasks follow the same manual process we covered earlier. The playbook backs up the config on both firewalls, then downloads the target PAN-OS version to the primary and syncs it to the peer, so both units have the image. It then suspends the primary, which triggers a failover, and waits until it confirms the secondary has become active and the primary is suspended.

Once the failover is confirmed, it installs the image on the primary and reboots it. The playbook waits for the chassis to come back online and checks that the primary has rejoined the HA pair as passive with its state sync complete. At this point, if pause_mid_upgrade is enabled, the playbook pauses for you to verify everything before it touches the secondary.

After you continue by pressing 'Enter', it repeats the same steps on the secondary. It suspends the secondary to fail traffic back to the upgraded primary, confirms the primary is active again, then installs the image on the secondary and reboots it. Finally, it waits for the secondary to come back online and confirms it has rejoined as passive with its state sync complete. Once that finishes, both firewalls are running the new version and the HA pair is back to normal.
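
The playbook stops once the secondary has rejoined the pair. It does not itself confirm the software version. If you want that last bit of assurance, a couple of extra tasks at the end could do it. This is a sketch and not part of the original playbook: it assumes the JSON returned for 'show system info' exposes the running version at response.result.system["sw-version"], and the register names are made up for illustration.

```yaml
# Sketch only: append after the final state sync check.
- name: Read system info (primary)
  paloaltonetworks.panos.panos_op:
    provider: '{{ primary }}'
    cmd: 'show system info'
  register: primary_info     # hypothetical name
  changed_when: false

- name: Read system info (secondary)
  paloaltonetworks.panos.panos_op:
    provider: '{{ secondary }}'
    cmd: 'show system info'
  register: secondary_info   # hypothetical name
  changed_when: false

- name: Confirm both units report the target version
  ansible.builtin.assert:
    that:
      # Assumed JSON path, see the note above.
      - (primary_info.stdout | from_json).response.result.system["sw-version"] == version
      - (secondary_info.stdout | from_json).response.result.system["sw-version"] == version
    fail_msg: 'At least one firewall is not running {{ version }}.'
```

Here version is the same variable the playbook already uses for the download and install tasks.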

Running the Playbook

You can run the playbook with the following command.

ansible-playbook -i inventory.yml upgrade.yml -e "version=11.1.4-h7"

The version variable is the target PAN-OS version you want to upgrade to, passed in at runtime with -e. Since this isn't defined in the playbook or inventory, you need to provide it each time you run it. When the playbook starts, it will prompt you for the username and password before it begins.
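
If you would rather not type the variables each time, one option is to keep them in a small YAML file and pass it with -e @upgrade_vars.yml instead of the key=value string. The file name is hypothetical. The values below simply reuse the version from the command above and the two toggles from the playbook.

```yaml
---
# upgrade_vars.yml (hypothetical file name)
# Extra vars take precedence over the vars block in the playbook.
version: "11.1.4-h7"
backup_config: true
pause_mid_upgrade: false   # skip the manual pause between the two units
```

A YAML file also keeps true and false as real booleans, whereas values passed as key=value on the command line arrive as strings.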

Improvements and More Options

I only covered the HA minor version upgrade here, but there is a lot more you can do. Palo Alto maintains a repo of Ansible playbooks, which has playbooks for upgrading the major version, upgrading a standalone firewall, downloading content, and more. It is worth a look if you want to go further or adapt something for your own setup.

There are also a few ways you could improve this playbook. You could dynamically work out which firewall is active and passive instead of defining them statically in the inventory. You could add pre and post-upgrade checks to capture the state of the firewalls before and after, then compare them to make sure nothing changed unexpectedly. You could also add a check to confirm both units are healthy and fully synced before the upgrade even starts.
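
As a starting point for the first idea, the tasks below sketch how the active and passive units could be worked out at runtime. This is not from the Palo Alto repo. The names ha_probe, active_ip and passive_ip are invented for illustration, and the sketch assumes the pair is in a normal active/passive state when the play starts.

```yaml
# Sketch only: query one unit and derive both roles from its answer.
- name: Read HA state from the first firewall in the inventory
  paloaltonetworks.panos.panos_op:
    provider: '{{ primary }}'
    cmd: 'show high-availability state'
  register: ha_probe        # hypothetical name
  changed_when: false

- name: Refuse to continue unless the unit is active or passive
  ansible.builtin.assert:
    that:
      - (ha_probe.stdout | from_json).response.result.group["local-info"].state in ['active', 'passive']
    fail_msg: 'The HA pair is not in a normal active/passive state.'

- name: Work out which unit is active right now
  ansible.builtin.set_fact:
    active_ip: "{{ primary_ip_address if probe_state == 'active' else secondary_ip_address }}"
    passive_ip: "{{ secondary_ip_address if probe_state == 'active' else primary_ip_address }}"
  vars:
    probe_state: '{{ (ha_probe.stdout | from_json).response.result.group["local-info"].state }}'
```

The rest of the playbook would then need its provider variables built from active_ip and passive_ip instead of the two inventory values, which is the extra complexity the static inventory avoids.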

Written by
Suresh Vinasiththamby