RSS

Archive for the ‘Ansible’ Category

Repeatable Deployments 3 – Adding NVMe Drive Automatically

Monday, July 1st, 2024

Repeatable Deployments Part 3 - Adding NVMe Drive Automatically Banner

This latest chapter in the Repeatable Deployments series looks at automating the steps discussed in the Raspberry Pi and NVMe Base post. The previous post investigated the manual steps necessary to add a PCIe drive, here we will look at automating the setup process.

TL;DR The scripts and instructions for running them can be found in the AnsibleNVMe GitHub repository.

Automation Options

The two main (obvious) options for automation in this case would be:

  • Shell scripts
  • Ansible

In this post we will look at using Ansible as it allows remote installation and configuration without having to install scripts on the target machine.

The Hardware

The installation hardware will be based around the Raspberry Pi 5 as the PCIe bus is required for the NVMe base and associated SSD:

  • Raspberry Pi (3, 4 or 5)
  • 256 GByte SATA SSD
  • SATA to USB adapter
  • Cooling fan (for the Raspberry Pi 5)
  • NVMe Base and 500GByte M2 drive
  • Power Supply
  • Ethernet cable
  • 3D printed mounts to bring everything together

For the purpose of this post we will be configuring the system with the following credentials:

  • Hostname: TestServer500
  • User: clusteruser

The password will be stored in an environment variable and on a Mac this is setup by executing the following command:

export CLUSTER_PASSWORD=your-password

The remainder of this post will assume the above names and credentials but feel free to change these as desired.

Step 1 – Install the Base Operating System

The operating system is installed following the same method as described in part 1 of this series. Simply ensure that the hostname, user name and password parameters are set to those noted above (or with your substitutions).

Step 2 – Ensure SSH Works

The next thing we need to do is to check that the Raspberry Pi boots and that we can log into the system. So apply power to the board and wait for the board to boot. This normally takes a minute or two as the system will boot initially and then expand the file system before booting two more times.

Time for the first log on to the Raspberry Pi with the command:

ssh testserver500.local

If this is the first time this device has been setup with the server name then you will be asked to accept the certificates for the host along with the password for the Raspberry Pi.

The authenticity of host 'testserver500.local (fe80::67b6:b7f4:b285:2599%en17)' can't be established.
ED25519 key fingerprint is SHA256:gfttQ9vr7CeWfjyPLUdf5h2Satxr/pRrP2EjbmW2BKA.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'testserver500.local' (ED25519) to the list of known hosts.
clusteruser@testserver500.local's password:
Linux TestServer500 6.6.20+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
clusteruser@TestServer500:~ $

Answer yes to accept the certificates and then enter the password at the following prompt.

If the machine name has been used before, or if you are trying repeat deployments then you will receive a message about certificate errors:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:<sha256-key>
Please contact your system administrator.
Add correct host key in /Users/username/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /Users/username/.ssh/known_hosts:35
Host key for testserver500.local has changed and you have requested strict checking.
Host key verification failed.

This can be resolved by editing the ~/.ssh/known_hosts> file and removing the entries for testserver500.local, saving the file and retrying.

A final step is to copy the local machine ssh keys to the Raspberry Pi. This can be done with the command:

ssh-copy-id testserver500.local

This command provides access to the Raspberry Pi, from the current machine, without needing to enter the password so keep in mind the security implications. Executing the above command will result in output something like:

/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
clusteruser@testserver500.local's password: <enter you password here>

Number of key(s) added:        1

Now try logging into the machine, with:   "ssh 'clusteruser@testserver500.local'"

Where the password for the Raspberry Pi user was entered when prompted. These two steps are required for Ansible to work correctly (at least from a Mac).

Step 3 – Update the System

At this point the Raspberry Pi has the base operating system installed and we have confirmed that the system can be accessed from the local host computer. The next step is to ensure that the operating system is updated with the current patches and the EEPROM is updated to the latest to allow access to the NVMe Base.

From the previous post, Raspberry Pi and NVMe Base, we know that we can do this with the commands:

sudo apt get update -y
sudo apt get dist-upgrade -y
sudo raspi-config nonint do_boot_rom E1 1
sudo reboot now

The first thing to note is that in the Ansible scripts we will be using privilege elevation to execute command with root privilege. This means that we do not need to use sudo to execute the commands in the Ansible scripts. So we start by creating a YAML file with the following contents:

---
- name: Update the Raspberry Pi 5 OS and reboot
  hosts: raspberrypi
  become: true
  tasks:
    - name: Update apt caches and the distribution
      apt:
        update_cache: yes
        upgrade: dist
        cache_valid_time: 3600
        autoclean: yes
        autoremove: yes
        
    - name: Update the EEPROM
      command: raspi-config nonint do_boot_rom E1 1

    - name: Reboot the Raspberry Pi
      reboot:
        msg: "Immediate reboot initiated by Ansible"
        reboot_timeout: 600
        pre_reboot_delay: 0
        post_reboot_delay: 0

There are a number of websites discussing Ansible scripts including the Ansible Documentation site so we will just look at the pertinent elements of the script.

hosts: raspberrypi defines the host names / entries that this section of the script applies to. We will define this later in the inventory.yml file.

become: true is the entry that tells Ansible that we want to execute the tasks with elevated privileges.

tasks defines a group of tasks to be executed on the Raspberry Pi. These tasks will be a combination of actions that Ansible is aware of as well as commands to be executed on the Raspberry Pi. In this script the tasks are:

  • Use apt to update the distribution
  • Execute the command to update the Raspberry Pi EEPROM
  • Reboot the Raspberry Pi to ensure the updates are applied

Now we have the definition of the tasks we want to execute we need to define the systems we want to run the script against. This can be done using an inventory script. For the single Raspberry Pi this is a simple file and looks like this:

[raspberrypi]
testserver500.local ansible_user=clusteruser ansible_ssh_pass=$CLUSTER_PASSWORD ansible_python_interpreter=/usr/bin/python3

The above contains a number of familiar entries, the server and user names. ansible_ssh_pass references the environment variable set earlier. The final entry, ansible_python_interpreter=/usr/bin/python3, prevents a warning from Ansible about the Python version deployed on the Raspberry Pi. This warning looks something like:

[WARNING]: Platform linux on host testserver500.local is using the discovered Python interpreter at /usr/bin/python3.11, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more
information.

We are now ready to use Ansible to update the Raspberry Pi using the following command:

ansible-playbook -i inventory.yml UpdateAndRebootRaspberryPi.yml

If all goes well, then you should see something like the following:

PLAY [Update the Raspberry Pi 5 OS and reboot] ******************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************************
ok: [testserver500.local]

TASK [Update apt caches and the distribution] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Update the EEPROM] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Reboot the Raspberry Pi] ******************************************************************************************************************************
changed: [testserver500.local]

PLAY RECAP *******************************************************************************************************************
testserver500.local        : ok=4    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Step 4 – Format and Configure the Drives

We can now move on to configuring the system to access the NVMe drives.

Configure PCIe Gen 3 support

The NVMe Base can be run using PCIe Gen 3 support. This is experimental and not guaranteed to work although in my experience there are no issues with the drive supplied with the NVMe Base. The following code adds the appropriate entried in the /boot/firmware/config.txt file.

- name: Ensure pciex1_gen3 is enabled in /boot/firmware/config.txt
  blockinfile:
  path: /boot/firmware/config.txt
  marker: "# {mark} ANSIBLE MANAGED BLOCK"
  block: |
      dtparam=pciex1_gen=3
  insertafter: '^\[all\]$'
  create: yes

Format the Drive and Mount the File System

The next few steps executes the command necessary to format the drive and mount the formatted drive ensuring that the cluseruser can access the drive:

- name: Format the NVMe drive nvme0n1
  command: mkfs.ext4 /dev/nvme0n1 -L Data

- name: Make the mount point for the NVMe drive
  command: mkdir /mnt/nvme0

- name: Mount the newly formatted drive
  command: mount /dev/nvme0n1 /mnt/nvme0

- name: Make sure that the user can read and write to the mount point
  command: chown -R {{ ansible_user }}:{{ ansible_user }} /mnt/nvme0

Make the Drive Accessible Through Reboots

At this mount the drive will be available in the /mnt directory and the clusteruser is able to access the drive. If we were to reboot now then the drive will still be available and formatted but it will not be mounted following the reboot. The final step is to update the /etc/fstab file to mount the drive automatically at startup.

- name: Get the UUID of the device
  command: blkid /dev/nvme0n1
  register: blkid_output

- name: Extract UUID from blkid output
  set_fact:
    device_uuid: "{{ blkid_output.stdout | regex_search('UUID=\"([^\"]+)\"', '\\1') }}"

- name: Clean the extracted UUID
  set_fact:
    clean_uuid: "{{ device_uuid | regex_replace('\\[', '') | regex_replace(']', '') |  regex_replace(\"'\" '') }}"

- name: Add UUID entry to /etc/fstab
  lineinfile:
    path: /etc/fstab
    line: "UUID={{ clean_uuid }} /mnt/nvme0 ext4 defaults,auto,users,rw,nofail,noatime 0 0"
    state: present
    create: yes

There is a small complication as the UUID in device_uuid is surrounded by [‘ and ‘] characters. These delimiters need to be removed and the clean steps do this before adding the entry into the /etc/fstab file.

The only thing left to do is to run the playbook with the command:

ansible-playbook -i inventory.yml ConfigureNVMeBase.yml

If all goes well then we should see something similar to:

PLAY [Configure Raspberry Pi 5 to use the drive attached to the NVMe Base.] ******************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************************
ok: [testserver500.local]

TASK [Ensure pciex1_gen3 is enabled in /boot/firmware/config.txt] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Format the NVMe drive nvme0n1] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Make the mount point for the NVMe drive] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Make sure that the user can read and write to the mount point] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Mount the newly formatted drive] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Get the UUID of the device] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Extract UUID from blkid output] ******************************************************************************************************************************
ok: [testserver500.local]

TASK [Clean the extracted UUID] ******************************************************************************************************************************
ok: [testserver500.local]

TASK [Add UUID entry to /etc/fstab] ******************************************************************************************************************************
changed: [testserver500.local]

TASK [Reboot the Raspberry Pi] ******************************************************************************************************************************
changed: [testserver500.local]

PLAY RECAP ******************************************************************************************************************************
testserver500.local        : ok=11   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Now to see if it has worked.

Step 5 – Test the Deployment

There are a few things we can check to verify that the system is configured correctly:

  • Check the drive appears in /dev
  • Ensure the drive has been mounted correctly in/mnt
  • Check that the clusteruser can create files and directories

Starting a ssh session on the Raspberry Pi we can manually check the system:

Linux TestServer500 6.6.31+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Jun 30 10:15:00 2024 from fe80::1013:a383:fe75:54e6%eth0
clusteruser@TestServer500:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.8G     0  3.8G   0% /dev
tmpfs           806M  5.3M  800M   1% /run
/dev/sda2       229G  2.3G  215G   2% /
tmpfs           4.0G     0  4.0G   0% /dev/shm
tmpfs           5.0M   48K  5.0M   1% /run/lock
/dev/nvme0n1    469G   28K  445G   1% /mnt/nvme0
/dev/sda1       510M   64M  447M  13% /boot/firmware
tmpfs           806M     0  806M   0% /run/user/1000
clusteruser@TestServer500:~ $ cd /mnt
clusteruser@TestServer500:/mnt $ ls -l
total 4
drwxr-xr-x 3 clusteruser clusteruser 4096 Jun 30 10:14 nvme0
clusteruser@TestServer500:/mnt $ cd nvme0
clusteruser@TestServer500:/mnt/nvme0 $ mkdir Test
clusteruser@TestServer500:/mnt/nvme0 $ echo "Hello, world" > hello.txt
clusteruser@TestServer500:/mnt/nvme0 $ cat < hello.txt
Hello, world
clusteruser@TestServer500:/mnt/nvme0 $ ls -l
total 24
-rw-r--r-- 1 clusteruser clusteruser    13 Jun 30 10:16 hello.txt
drwx------ 2 clusteruser clusteruser 16384 Jun 30 10:14 lost+found
drwxr-xr-x 2 clusteruser clusteruser  4096 Jun 30 10:15 Test

Looking good.

Lesson Learned

There were a few things that caused issues along the way.

Raspberry Pi – Access Denied

As mentioned in Step 2 – Ensure SSH Works, we need to log in to the Raspberry Pi in order for Ansible to be able to connect to the Raspberry Pi and run the playbook. Missing the first step, logging on to the Raspberry Pi will result in the following error:

TASK [Gathering Facts] ******************************************************************************************************************************
fatal: [testserver500.local]: FAILED! => {"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."}

Missing the second step, copying the SSH ID will result in the following error:

TASK [Gathering Facts] ******************************************************************************************************************************
fatal: [testserver500.local]: UNREACHABLE! => {"changed": false, "msg": "Invalid/incorrect password: Permission denied, please try again.", "unreachable": true}

Permission Denied

In the Step 4 – Format and Configure the Drives script we have the following:

- name: Mount the newly formatted drive
  command: mount /dev/nvme0n1 /mnt/nvme0

- name: Make sure that the user can read and write to the mount point
  command: chown -R {{ ansible_user }}:{{ ansible_user }} /mnt/nvme0

Switching these two lines result in the user not being able to write to the file system on the /mnt/nvme0 drive. Read and execute access are allowed but write access is denied.

Conclusion

The scripts presented here allow for a new Raspberry Pi to be configured with a newly formatted NVMe SSD drive in only a few minutes. This method does present a small issue in that the NVMe drive will be formatted as part of the set up process which does mean that the data on the drive will be lost. Something that is easy to resolve.

Repeatable Deployments (Part 1)

Tuesday, March 19th, 2024

Repeatable Deployment Banner

A common problem in the IT world is to create a consistent environment in a repeatable manner. This is important in a number of use cases:

  • Development
  • Testing
  • Training

This series of posts will investigate using Ansible to create a consistent test environment, one that can be setup and torn down quickly and easily.

The starting point is setting up the hardware and installing the operating system (OS) which will be covered here. Subsequent posts will use Ansible to configure the system and deploy additional tools.

The Hardware

The test environment will be based around the Raspberry Pi 5 (although any version of the Pi hardware could be used). The system will be built around the following components:

  • Raspberry Pi (3, 4 or 5)
  • 256 GByte SATA SSD
  • SATA to USB adapter
  • Cooling fan (for the Raspberry Pi 5)
  • Power Supply
  • Ethernet cable
  • 3D printed mounts to bring everything together

Grabbing a Raspberry Pi 5 and putting all of this together yields something like this:

Raspberry Pi Setup

Raspberry Pi Setup

SATA SSDs have been chosen for the OS and data storage as they are both faster and more reliable than SD cards. From a cost perspective they are not too much more expensive than a quality SD card. It should be noted that recent third party addon boards are becoming available that add one or two NVMe drives to be added to the the Raspberry Pi 5 using the PCIe bus.

Write OS Image

The easiest way to create a bootable Raspberry Pi system is to use the Raspberry Pi Imager. This is a free tool that allows the selection of one of the many operating systems available for the Raspberry Pi and it can then be used to write the operating system to a SD card or HDD/SSD

The process starts by connecting the SATA to USB adapter the the SSD and then connecting the drive to the host computer. This makes the drive appear as an external USB drive.

Now start Raspberry Pi Imager:

Raspberry Pi Imager

Raspberry Pi Imager

Select the device we are going to create the image for, in this case this is the Raspberry Pi 5:

Select Device

Select Device

The next step is to decide which operating system should be installed on the SSD. There are a large number of options and the selection will depend upon what you want to achieve. In this case we can use a basic system such as Raspberry Pi OS Lite. Firstly, select the Raspberry Pi (64-bit) operating system:

Select Operating System

Select Operating System

Now refine this selection and select the Raspberry Pi OS Lite (64-bit):

Select Raspberry Pi Lite

Select Raspberry Pi Lite

A basic system will be adequate as the device is intended to be run headless and so the desktop environment and applications are not required.

Next step is to select the storage device that the image will be written to. Once this is done we can move on to providing some configuration options for the operating system.

Ready For Configuration

Ready For Configuration

Click the Next button to move on to the next step, editing the configuration.

Edit Settings

Edit Settings

Clicking Edit setting starts the editing process. The General options are presented first, here we can set the following:

  • Hostname
  • User name and password
  • WiFi access point details
Customise General Settings

Customise General Settings

SSH should be enabled in order to run the system headless. This is enabled on the Services tab:

Customise Services

Customise Services

Clicking on Save now gives the option of applying the settings and start writing the image to the SSD:

Apply Settings

Apply Settings

The final step is to verify that the SSD can be erased:

Confirm Media Erase

Confirm Media Erase

Control now passes back to the main window where the write and verification progress can be monitored:

Writing OS

Writing OS

After a short while the the process will complete and Raspberry Pi Imager wil conform that the image has been written successfully and the drive can now be disconnected from the host computer and connected to the Raspberry Pi 5:

OS Write Successful

OS Write Successful

Conclusion

The whole process of creating the image is straightforward and only takes a few minutes. At the end of the process the Raspberry Pi is ready to boot.

The next step will be to start the installation and configuration of additional software tools and components. Something for the next post in this series.

Getting Started with Ansible

Monday, August 28th, 2023

20x4 LCD Display

Recent work has involved reviewing some test environments for an IoT development board. The aim is to improve some of the components used for testing as well as adding new functionality. The requirements are:

  • Provide an updated version of existing functionality
  • Single board environment with all functionality deployed for quick testing
  • Cluster distributing the test environment for load testing

The most cost effective way to do this is to use a number of Raspberry Pi single board computers. These boards are now becoming available in quantities after several years of limited availability.

The Problem

How to setup the environment in such a way that will allow a fresh environment to be created reliably.

Enter ansible.

Ping

First step, try to contact a board and this is where ping comes in. This command will verify that ansible can connect to a board. The following command will test the connection to each board:

ansible cluster -m ping -i hosts

This command requires a text file hosts containing the list of boards to the contacted. The file is simple and may only contact two lines:

[cluster]
node

In the above example, the file defines a group of machines to be contacted and this is named cluster and in this case the group contains only one machine and this is named node. The name cluster is also mentioned in the ansible command above.

Additional machines can also be named under the cluster entry by simply placing additional entries on a new line in the file.

So far this is nothing new and it is covered in the Ansible documentation.

What Happened

The first step was to use the Raspberry Pi Imager application to create a new image on a new SSD. Nothing complex:

  • Raspberry Pi 64-bit Lite OS
  • Set the machine name to be node
  • Enable SSH
  • Set the user name to clusteruser and give the user a secure password

The password was then stored on the local machine in an environment variable CLUSTER_PASSWORD to allow the scripts to be stored in source control without giving away any secrets.

Time to test the connection with the following command:

ansible cluster -m ping -i hosts --extra-vars "ansible_user=clusteruser ansible_password=$CLUSTER_PASSWORD"

Breaking this down, we want to ping all of the machines defined in the cluster group. The group is defined in the file hosts and we are going to log on to the machines with the user name clusteruser and with the password contained in the CLUSTER_PASSWORD environment variable.

Now running the above command results in the following:

node | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

Conclusion

A good start to the project, now on to something more complex, time to install and configure some software.

And I can’t believe I’ve missed Ansible for so long.