- The container is ephemeral so that if the DOCA/MOFED container restarts and/or gets updated I have to install the MFT tools all over again.
- I need to stage the packages in the DOCA/MOFED container and the required kernel-devel dependencies.
Before we begin let's first explain what the MFT package of firmware management tools is used for:
- Generate a standard or customized NVIDIA firmware image querying for firmware information
- Burn a firmware image
- Make configuration changes to the firmware settings
The following is a list of the available tools in MFT, together with a brief description of what each tool performs.
Tool | Description/Function |
---|---|
mst | Starts/stops the register access driver Lists the available mst devices |
mlxburn | Generation of a standard or customized NVIDIA firmware image for burning (.bin or .mlx)to the Flash/EEPROM attached to a NVIDIA HCA or switch device |
flint | This tool burns/query a firmware binary image or an expansion ROM image to the Flash device of a NVIDIA network adapter/gateway/switch device |
debug utilities | A set of debug utilities (e.g., itrace, fwtrace, mlxtrace, mlxdump, mstdump, mlxmcg, wqdump, mcra, mlxi2c, i2c, mget_temp, and pckt_drop) |
mlxup | The utility enables discovery of available NVIDIA adapters and indicates whether firmware update is required for each adapter |
mlnx-tools | Mellanox userland tools and scripts |
Sources: Mlnx-tools Repo MFT Tools Mlxup
Prerequisites
Before we can build the container we need to setup the directory structure, gather a few packages and create the dockerfile and entrypoint script. First let's create the directory structure. I am using root in this example but it could be a regular user.
$ mkdir -p /root/mft/rpms
$ cd /root/mft
Next we need to download the following rpms from Red Hat Package Downloads and place them into the rpms directory. The first is the kernel-devel package for the kernel of the OpenShift node this container will run on. To obtain the kernel version we can run the following oc
command on our cluster.
$ oc debug node/nvd-srv-29.nvidia.eng.rdu2.dc.redhat.com
Starting pod/nvd-srv-29nvidiaengrdu2dcredhatcom-debug-rhlgs ...
To use host binaries, run `chroot /host`
Pod IP: 10.6.135.8
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# uname -r
5.14.0-427.47.1.el9_4.x86_64
sh-5.1#
Now that we have our kernel version we can download the two packages into our /root/mft/rpms
directory.
- kernel-devel-5.14.0-427.47.1.el9_4.x86_64.rpm
- usbutils-017-1.el9.x86_64.rpm
Next we need to create the dockerfile.mft which will build the container.
$ cat <<EOF > dockerfile.mft
# Start from UBI9 image
FROM registry.access.redhat.com/ubi9/ubi:latest
# Set work directory
WORKDIR /root/mft
# Copy in packages not available in UBI repo
COPY ./rpms/*.rpm /root/rpms/
RUN dnf install /root/rpms/usbutils*.rpm -y
# DNF install packages either from repo or locally
RUN dnf install wget procps-ng pciutils yum jq iputils ethtool net-tools kmod systemd-udev rpm-build gcc make -y
# Cleanup
WORKDIR /root
RUN dnf clean all
# Run container entrypoint
COPY entrypoint.sh /root/entrypoint.sh
ENTRYPOINT ["/bin/bash", "/root/entrypoint.sh"]
EOF
The docker container file references a entrypoint.sh script so we need to create that under /root/mft/
.
$ cat <<EOF > entrypoint.sh
#!/bin/bash
# Set working dir
cd /root
# Set tool versions
MLNXTOOLVER=23.07-1.el9
MFTTOOLVER=4.30.0-139
MLXUPVER=4.30.0
# Set architecture
ARCH=`uname -m`
# Pull mlnx-tools from EPEL
wget https://dl.fedoraproject.org/pub/epel/9/Everything/$ARCH/Packages/m/mlnx-tools-$MLNXTOOLVER.noarch.rpm
# Arm architecture fixup for mft-tools
if [ "$ARCH" == "aarch64" ]; then export ARCH="arm64"; fi
# Pull mft-tools
wget https://www.mellanox.com/downloads/MFT/mft-$MFTTOOLVER-$ARCH-rpm.tgz
# Install mlnx-tools into container
dnf install mlnx-tools-$MLNXTOOLVER.noarch.rpm
# Install kernel-devel package supplied in container
rpm -ivh /root/rpms/kernel-devel-*.rpm --nodeps
mkdir /lib/modules/$(uname -r)/
ln -s /usr/src/kernels/$(uname -r) /lib/modules/$(uname -r)/build
# Install mft-tools into container
tar -xzf mft-$MFTTOOLVER-$ARCH-rpm.tgz
cd /root/mft-$MFTTOOLVER-$ARCH-rpm
#./install.sh --without-kernel
./install.sh
# Change back to root workdir
cd /root
# x86 fixup for mlxup binary
if [ "$ARCH" == "x86_64" ]; then export ARCH="x64"; fi
# Pull and place mlxup binary
wget https://www.mellanox.com/downloads/firmware/mlxup/$MLXUPVER/SFX/linux_$ARCH/mlxup
mv mlxup /usr/local/bin
chmod +x /usr/local/bin/mlxup
sleep infinity & wait
EOF
Now we should have all the prerequisites and we can move onto building the container.
Building The Container
To build the container run the podman build
command on a Red Hat Enterprise Linux 9.x system using the Dockerfile.mft provided in this repository.
$ podman build . -f dockerfile.mft -t quay.io/redhat_emp1/ecosys-nvidia/mfttools:1.0.0
STEP 1/9: FROM registry.access.redhat.com/ubi9/ubi:latest
STEP 2/9: WORKDIR /root/mft
--> 6e6c9f1636c7
STEP 3/9: COPY ./rpms/*.rpm /root/rpms/
--> 30a022291bd9
STEP 4/9: RUN dnf install /root/rpms/usbutils*.rpm -y
Updating Subscription Management repositories.
subscription-manager is operating in container mode.
Red Hat Enterprise Linux 9 for x86_64 - BaseOS 9.2 MB/s | 41 MB 00:04
Red Hat Enterprise Linux 9 for x86_64 - AppStre 9.4 MB/s | 48 MB 00:05
Red Hat Universal Base Image 9 (RPMs) - BaseOS 2.2 MB/s | 525 kB 00:00
Red Hat Universal Base Image 9 (RPMs) - AppStre 5.2 MB/s | 2.3 MB 00:00
Red Hat Universal Base Image 9 (RPMs) - CodeRea 1.7 MB/s | 281 kB 00:00
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
usbutils x86_64 017-1.el9 @commandline 120 k
Installing dependencies:
hwdata noarch 0.348-9.15.el9 rhel-9-for-x86_64-baseos-rpms 1.6 M
libusbx x86_64 1.0.26-1.el9 rhel-9-for-x86_64-baseos-rpms 78 k
Transaction Summary
================================================================================
Install 3 Packages
Total size: 1.8 M
Total download size: 1.7 M
Installed size: 9.8 M
Downloading Packages:
(1/2): libusbx-1.0.26-1.el9.x86_64.rpm 327 kB/s | 78 kB 00:00
(2/2): hwdata-0.348-9.15.el9.noarch.rpm 3.3 MB/s | 1.6 MB 00:00
--------------------------------------------------------------------------------
Total 3.4 MB/s | 1.7 MB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : hwdata-0.348-9.15.el9.noarch 1/3
Installing : libusbx-1.0.26-1.el9.x86_64 2/3
Installing : usbutils-017-1.el9.x86_64 3/3
Running scriptlet: usbutils-017-1.el9.x86_64 3/3
Verifying : libusbx-1.0.26-1.el9.x86_64 1/3
Verifying : hwdata-0.348-9.15.el9.noarch 2/3
Verifying : usbutils-017-1.el9.x86_64 3/3
Installed products updated.
Installed:
hwdata-0.348-9.15.el9.noarch libusbx-1.0.26-1.el9.x86_64
usbutils-017-1.el9.x86_64
Complete!
--> 7c16c8d84152
STEP 5/9: RUN dnf install wget procps-ng pciutils yum jq iputils ethtool net-tools kmod systemd-udev rpm-build gcc make -y
Updating Subscription Management repositories.
subscription-manager is operating in container mode.
Last metadata expiration check: 0:00:08 ago on Thu Jan 9 18:32:20 2025.
Package yum-4.14.0-17.el9.noarch is already installed.
Dependencies resolved.
======================================================================================================
Package Arch Version Repository Size
======================================================================================================
Installing:
ethtool x86_64 2:6.2-1.el9 rhel-9-for-x86_64-baseos-rpms 234 k
gcc x86_64 11.5.0-2.el9 rhel-9-for-x86_64-appstream-rpms 32 M
iputils x86_64 20210202-10.el9_5 rhel-9-for-x86_64-baseos-rpms 179 k
(...)
unzip-6.0-57.el9.x86_64
wget-1.21.1-8.el9_4.x86_64
xz-5.2.5-8.el9_0.x86_64
zip-3.0-35.el9.x86_64
zstd-1.5.1-2.el9.x86_64
Complete!
--> 862d0e2c9c6f
STEP 6/9: WORKDIR /root
--> 5b3ec62db585
STEP 7/9: RUN dnf clean all
Updating Subscription Management repositories.
subscription-manager is operating in container mode.
43 files removed
--> c14c44f59e9e
STEP 8/9: COPY entrypoint.sh /root/entrypoint.sh
--> d2d5192c3c57
STEP 9/9: ENTRYPOINT ["/bin/bash", "/root/entrypoint.sh"]
COMMIT quay.io/redhat_emp1/ecosys-nvidia/mfttools:1.0.0
--> 1873a4483236
Successfully tagged quay.io/redhat_emp1/ecosys-nvidia/mfttools:1.0.0
1873a448323610f369a8565182a2914675f16d735ffe07f92258df89cd439224
Once the image has been built push the image up to the registry that the Openshift cluster can access.
$ podman push quay.io/redhat_emp1/ecosys-nvidia/mfttools:1.0.0
Getting image source signatures
Copying blob e5df12622381 done |
Copying blob 97c1462e7c7b done |
Copying blob facf1e7dd3e0 skipped: already exists
Copying blob 2dca7d5c2bb7 done |
Copying blob 6f64cedd7423 done |
Copying blob ec465ce79861 skipped: already exists
Copying blob 121c270794cd done |
Copying config 1873a44832 done |
Writing manifest to image destination
Running The Container
The container will need to run priviledged so we can access the hardware devices. To do this we will create a ServiceAccount
and Namespace
for it to run in.
$ cat <<EOF > mfttool-project.yaml
apiVersion: v1
kind: Namespace
metadata:
name: mfttool
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: mfttool
namespace: mfttool
EOF
Once the resource file is generated create it on the cluster.
$ oc create -f mfttool-project.yaml
namespace/mfttool created
serviceaccount/mfttoolcreated
Now that the project has been created assign the appropriate privileges to the service account.
$ oc -n mfttool adm policy add-scc-to-user privileged -z mfttool
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "mfttool"
Next we will create a pod yaml for each of our baremetal nodes that will run under the mfttool namespace and leverage the MFT tooling.
$ cat <<EOF > mfttool-pod-nvd-srv-29.yaml
apiVersion: v1
kind: Pod
metadata:
name: mfttool-pod-nvd-srv-29
namespace: mfttool
spec:
nodeSelector:
kubernetes.io/hostname: nvd-srv-29.nvidia.eng.rdu2.dc.redhat.com
hostNetwork: true
serviceAccountName: mfttool
containers:
- image: quay.io/redhat_emp1/ecosys-nvidia/mfttools:1.0.0
name: mfttool-pod-nvd-srv-29
securityContext:
privileged: true
EOF
Once the custom resource file has been generated, create the resource on the cluster.
oc create -f mfttool-pod-nvd-srv-29.yaml
pod/mfttool-pod-nvd-srv-29 created
Validate that the pod is up and running.
$ oc get pods -n mfttool
NAME READY STATUS RESTARTS AGE
mfttool-pod-nvd-srv-29 1/1 Running 0 28s
Next we can rsh into the pod.
$ oc rsh -n mfttool mfttool-pod-nvd-srv-29
sh-5.1#
Once inside the pod we can run an mst start
and then an mst status
to see the devices.
$ oc rsh -n mfttool mfttool-pod-nvd-srv-29
sh-5.1# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success
sh-5.1# mst status
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4129_pciconf0 - PCI configuration cycles access.
domain:bus:dev.fn=0000:0d:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
/dev/mst/mt4129_pciconf1 - PCI configuration cycles access.
domain:bus:dev.fn=0000:37:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
Chip revision is: 00
sh-5.1#
One of the things we can do with this container is query the devices and their settings with mlxconfig
. We can also change those settings like when we need to change a port from ethernet mode to infiniband mode.
mlxconfig -d /dev/mst/mt4129_pciconf0 query
Device #1:
----------
Device type: ConnectX7
Name: MCX715105AS-WEAT_Ax
Description: NVIDIA ConnectX-7 HHHL Adapter Card; 400GbE (default mode) / NDR IB; Single-port QSFP112; Port Split Capable; PCIe 5.0 x16 with x16 PCIe extension option; Crypto Disabled; Secure Boot Enabled
Device: /dev/mst/mt4129_pciconf0
Configurations: Next Boot
MODULE_SPLIT_M0 Array[0..15]
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
(...)
ADVANCED_PCI_SETTINGS False(0)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
Another tool in the container is flint
which allows us to see the firmware, product version and PSID for the device. This is useful for preparing for a firmware update.
flint -d /dev/mst/mt4129_pciconf0 query
Image type: FS4
FW Version: 28.42.1000
FW Release Date: 8.8.2024
Product Version: 28.42.1000
Rom Info: type=UEFI version=14.35.15 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Base GUID: e09d730300126474 16
Base MAC: e09d73126474 16
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000001244
Security Attributes: secure-fw
Another tool in the container is mlxup
which is an automated way to update the firmware. When we run mlxup
it queries all devices on the system and reports back the current firmware and what available firmware there is for the device. We can then decide to update the cards or skip for now. In the example below I have two single port CX-7 cards in the node my pod is running on and I will upgrade their firmware.
$ mlxup
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX7
Part Number: MCX715105AS-WEAT_Ax
Description: NVIDIA ConnectX-7 HHHL Adapter Card; 400GbE (default mode) / NDR IB; Single-port QSFP112; Port Split Capable; PCIe 5.0 x16 with x16 PCIe extension option; Crypto Disabled; Secure Boot Enabled
PSID: MT_0000001244
PCI Device Name: /dev/mst/mt4129_pciconf1
Base MAC: e09d73125fc4
Versions: Current Available
FW 28.42.1000 28.43.1014
PXE 3.7.0500 N/A
UEFI 14.35.0015 N/A
Status: Update required
Device #2:
----------
Device Type: ConnectX7
Part Number: MCX715105AS-WEAT_Ax
Description: NVIDIA ConnectX-7 HHHL Adapter Card; 400GbE (default mode) / NDR IB; Single-port QSFP112; Port Split Capable; PCIe 5.0 x16 with x16 PCIe extension option; Crypto Disabled; Secure Boot Enabled
PSID: MT_0000001244
PCI Device Name: /dev/mst/mt4129_pciconf0
Base MAC: e09d73126474
Versions: Current Available
FW 28.42.1000 28.43.1014
PXE 3.7.0500 N/A
UEFI 14.35.0015 N/A
Status: Update required
---------
Found 2 device(s) requiring firmware update...
Perform FW update? [y/N]: y
Device #1: Updating FW ...
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Done
Device #2: Updating FW ...
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Done
Restart needed for updates to take effect.
Log File: /tmp/mlxup_workdir/mlxup-20250109_190606_17886.log
Notice the firmware upgrade completed but we need to restart the cards for the changes to take effect. We can use the mlxfwreset
command to do this and then validate with the flint
command that the card is running the new firmware.
sh-5.1# mlxfwreset -d /dev/mst/mt4129_pciconf0 reset -y
The reset level for device, /dev/mst/mt4129_pciconf0 is:
3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
-I- Stopping Driver -Done
-I- Resetting PCI -Done
-I- Starting Driver -Done
-I- Restarting MST -Done
-I- FW was loaded successfully.
sh-5.1# flint -d /dev/mst/mt4129_pciconf0 query
Image type: FS4
FW Version: 28.43.1014
FW Release Date: 7.11.2024
Product Version: 28.43.1014
Rom Info: type=UEFI version=14.36.16 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Base GUID: e09d730300126474 16
Base MAC: e09d73126474 16
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000001244
Security Attributes: secure-fw
We can repeat the same steps on the second card in the system to complete the firmware update.
sh-5.1# mlxfwreset -d /dev/mst/mt4129_pciconf1 reset -y
The reset level for device, /dev/mst/mt4129_pciconf1 is:
3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw -Done
-I- Stopping Driver -Done
-I- Resetting PCI -Done
-I- Starting Driver -Done
-I- Restarting MST -Done
-I- FW was loaded successfully.
sh-5.1# flint -d /dev/mst/mt4129_pciconf1 query
Image type: FS4
FW Version: 28.43.1014
FW Release Date: 7.11.2024
Product Version: 28.43.1014
Rom Info: type=UEFI version=14.36.16 cpu=AMD64,AARCH64
type=PXE version=3.7.500 cpu=AMD64
Description: UID GuidsNumber
Base GUID: e09d730300125fc4 16
Base MAC: e09d73125fc4 16
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000001244
Security Attributes: secure-fw
Once the firmware update has been completed and validate we can remove the container as this completes the firmware update example.
Hopefully this gives an idea of what is required to use this container method which hopes to simplify the ability of upgrading Mellanox/NVIDIA firmware in a image based operating system like Red Hat CoreOS in OpenShift Container Platform.