Saturday, February 28, 2026

OpenShift Passthrough For Some


I wanted to provide a simple mechanism to configure vfio-pci devices of a certain device type when some devices of that type are in use by the base operating system. For example, on some Grace Hopper nodes the only network devices might be BlueField-3 interfaces. If I want one BlueField-3 to provide networking access to the base operating system, I need to leave the kernel driver in place. However, I might want to take the additional BlueField-3 devices and use them in passthrough mode, which requires them to be unbound from the mlx5 driver and bound to vfio-pci. The following writeup provides a working example, first done manually and then automatically in the context of OpenShift.

Why

There are going to be use cases where workloads running in virtual machines on OpenShift worker nodes need their network devices in passthrough mode. This is not a problem when the OpenShift worker node's cluster interface is on a different network card type than the ones that need to be passed to the virtual machine. It does become an issue on systems that are outfitted with all the same network interface types, because the device id for all the network cards is then the same. It also means I cannot use the traditional method of enabling passthrough for the network cards, which involves blacklisting the network kernel driver from loading and then configuring the device ids to attach to the vfio-pci driver. If we implemented that on a system where all the network cards are the same, then when the system rebooted to apply the MachineConfig, the node would come up without any networking and show as NotReady. That is why the rest of this document demonstrates a different, practical approach to this problem.
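For reference, the traditional blanket approach is usually implemented with kernel arguments in a MachineConfig along these lines. This is only a sketch to show what we are avoiding; the name and role label are illustrative, and applying it on a node whose cluster interface shares the device id would break that node's networking.

```yaml
# Sketch of the traditional blanket approach: blacklist the NIC driver and
# hand every device matching the id to vfio-pci at boot. Do NOT apply this
# on nodes where the cluster interface uses the same device id.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 100-worker-vfio-all-nics     # illustrative name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - vfio-pci.ids=15b3:a2dc
    - modprobe.blacklist=mlx5_core
```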

Manually Configure

Kernel driver unbinding and binding was introduced back in kernel 2.6.13 in 2005, so it's a technology that has been around for quite some time. This is the exact feature we will use to bind only some of our network cards to vfio-pci. To begin, let's take a look at our network interfaces via lspci, filtering by the device id 15b3:a2dc. We can see from a debug pod that this OpenShift node has 4 network card ports.

sh-5.2# lspci -nn |grep 15b3:a2dc
0000:01:00.0 Ethernet controller [0200]: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller [15b3:a2dc] (rev 01)
0000:01:00.1 Ethernet controller [0200]: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller [15b3:a2dc] (rev 01)
0002:01:00.0 Ethernet controller [0200]: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller [15b3:a2dc] (rev 01)
0002:01:00.1 Ethernet controller [0200]: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller [15b3:a2dc] (rev 01)

Now let's examine the physical interface names for these 4 ports.

sh-5.2# grep PCI_SLOT_NAME /sys/class/net/*/device/uevent
/sys/class/net/enP2s2f0np0/device/uevent:PCI_SLOT_NAME=0002:01:00.0
/sys/class/net/enP2s2f1np1/device/uevent:PCI_SLOT_NAME=0002:01:00.1
/sys/class/net/enp1s0f0np0/device/uevent:PCI_SLOT_NAME=0000:01:00.0
/sys/class/net/enp1s0f1np1/device/uevent:PCI_SLOT_NAME=0000:01:00.1

Now we have to see which one is already in use by OpenShift so we do not inadvertently work with the wrong card. This will always be the interface that OVS has claimed as a system (physical) interface for the cluster bridge.

sh-5.2# ovs-vsctl --no-heading --format=table --columns=name,type find Interface type=system| awk '{print $1}'
enp1s0f0np0

We can see enp1s0f0np0, which correlates to the 0000:01:00.0 card, so we will focus on 0002:01:00.0 and 0002:01:00.1.
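This name-to-bus-address mapping can also be scripted. Here is a minimal sketch; the helper name is ours, and the optional second argument exists only so the logic can be exercised without real hardware:

```shell
#!/bin/bash
# Sketch: resolve an interface name to its PCI bus address via sysfs.
# pci_addr_of <ifname> [uevent-file] -- the second argument is only a
# test hook; on a real node the default sysfs path is used.
pci_addr_of() {
  local ifname="$1"
  local uevent="${2:-/sys/class/net/${ifname}/device/uevent}"
  awk -F= '/^PCI_SLOT_NAME=/{print $2}' "$uevent"
}

# On a live node, combine it with the OVS lookup shown above, e.g.:
#   br_int=$(ovs-vsctl --no-heading --format=table --columns=name,type \
#              find Interface type=system | awk '{print $1}')
#   pci_addr_of "$br_int"
```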

Now that we have determined which cards we can use, we will begin by unbinding them from their current driver, mlx5_core.

echo -n "0002:01:00.0" > /sys/bus/pci/drivers/mlx5_core/unbind
echo -n "0002:01:00.1" > /sys/bus/pci/drivers/mlx5_core/unbind

At this point, if we look at the lspci output for the two unbound devices, we see they no longer have a "Kernel driver in use" line; only the available kernel module, mlx5_core, remains listed.

sh-5.2# lspci -k -s 0002:01:00.0
0002:01:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel modules: mlx5_core
sh-5.2# lspci -k -s 0002:01:00.1
0002:01:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel modules: mlx5_core

We are now ready for them to use the vfio-pci driver, but first we may need to load that driver.

modprobe vfio-pci

We can validate that the vfio-pci driver is loaded with lsmod.

sh-5.2# lsmod|grep vfio
vfio_pci               16384  0
vfio_pci_core          90112  1 vfio_pci
vfio_iommu_type1       49152  0
vfio                   73728  3 vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd               131072  1 vfio

Now that we have unbound the two devices from their original driver, let's override the kernel driver they should use with vfio-pci.

sh-5.2# echo vfio-pci > /sys/bus/pci/devices/0002:01:00.0/driver_override
sh-5.2# echo vfio-pci > /sys/bus/pci/devices/0002:01:00.1/driver_override

With the vfio-pci driver override in place we can now bind our two devices to that driver.

sh-5.2# echo "0002:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
sh-5.2# echo "0002:01:00.1" > /sys/bus/pci/drivers/vfio-pci/bind

And finally we can validate that those devices are now using the vfio-pci driver.

sh-5.2# lspci -k -s 0002:01:00.0
0002:01:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core
sh-5.2# lspci -k -s 0002:01:00.1
0002:01:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core
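For completeness, the change can be reversed the same way if a device ever needs to go back to the kernel driver. A sketch, run as root on the node; the bus address is illustrative, and the guard simply keeps it harmless on machines without the device:

```shell
# Sketch: return a device from vfio-pci to mlx5_core (the reverse of the
# steps above). The guard keeps this harmless where the device is absent.
DEV="0002:01:00.0"   # illustrative bus address
if [ -e "/sys/bus/pci/drivers/vfio-pci/$DEV" ]; then
  echo -n "$DEV" > /sys/bus/pci/drivers/vfio-pci/unbind
  echo > "/sys/bus/pci/devices/$DEV/driver_override"   # clear the override
  echo -n "$DEV" > /sys/bus/pci/drivers/mlx5_core/bind
fi
```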

Automatically Configure

While one can manually configure vfio-pci passthrough as we did above, this will not scale in a large cluster, especially across OpenShift upgrades, so we need something more automatic. The answer is twofold: first a script that automates the process above, and then a mechanism for running that script on OpenShift nodes.

For the automation script we can use the example code in this repository here. This script will identify all the interfaces of a certain device type and then determine which ones can be used as passthrough devices. The factor that prohibits a device from being used for passthrough is having an OVS bridge associated with it. Once the script has identified the list, it unbinds the kernel driver in use on each device, applies a driver override, and binds the device to vfio-pci so it is available for passthrough.

Here is a manual run on the system we used for testing.

sh-5.2# ./passthrough-some-nics.sh -n 15b3:a2dc

 NIC Name     NIC Bus ID       Kernel Driver  OCP BR NIC     PassThru Eligible
====================================================================================================
 enp1s0f0np0    0000:01:00.0   mlx5_core      Yes            No
 enp1s0f1np1    0000:01:00.1   mlx5_core      Yes            No
 enP2s2f0np0    0002:01:00.0   mlx5_core      No             Yes
 enP2s2f1np1    0002:01:00.1   mlx5_core      No             Yes

Loading vfio-pci......Done!

Unbinding device 0002:01:00.0 from mlx5_core kernel driver...
Applying driver override to device 0002:01:00.0...
Binding device 0002:01:00.0 to vfio-pci...
Device kernel driver validation...
0002:01:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core

Unbinding device 0002:01:00.1 from mlx5_core kernel driver...
Applying driver override to device 0002:01:00.1...
Binding device 0002:01:00.1 to vfio-pci...
Device kernel driver validation...
0002:01:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core

Notice the script changed the kernel driver in use for the two devices. If we run the script again, we should see that no changes are made because there are no remaining eligible passthrough devices.

sh-5.2# ./passthrough-some-nics.sh -n 15b3:a2dc

 NIC Name     NIC Bus ID       Kernel Driver  OCP BR NIC     PassThru Eligible
====================================================================================================
 enp1s0f0np0    0000:01:00.0   mlx5_core      Yes            No
 enp1s0f1np1    0000:01:00.1   mlx5_core      Yes            No
 NA             0002:01:00.0   vfio-pci       No             Complete
 NA             0002:01:00.1   vfio-pci       No             Complete
vfio_pci 16384 0 - Live 0xffffb968aee88000

Now that we have seen the script work, let's apply it to OpenShift. First we base64 encode the script so it can be embedded in a MachineConfig.

$ BASE64_SCRIPT=$(cat passthrough-some-nics.sh | base64 -w 0)
$ echo $BASE64_SCRIPT
IyEvYmluL2Jhc2gKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjCiMgVGhpcyBzY3JpcHQgcGFzc2VzIHRocm91Z2ggc29tZSBvZiB0aGUgTklDcyB3aGVuIGFsbCB0aGUgTklDcyBhcmUgdGhlIHNhbWUgZGV2aWNlIHR5cGUgICAgICAgICAgICAgICAgICAgIwojIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMKCiMgSG93IHRvIHVzZSB0aGUgc2NyaXB0IGlmIHVzZXIgZG9lcyBub3Qga25vdyBob3cKaG93dG8oKXsKICBlY2hvICJVc2FnZTogcGFzc3Rocm91Z2gtc29tZS1uaWNzLnNoIC1uIDxuaWMtZGV2aWNlLWlkPiIKICBlY2hvICJFeGFtcGxlIFNpbmdsZSBEZXZpY2UgSUQ6IHBhc3N0aHJvdWdoLXNvbWUtbmljcy5zaCAtbiAxNWIzOmEyZGMiCiAgZWNobyAiRXhhbXBsZSBNdWx0aSBEZXZpY2UgSUQ6IHBhc3N0aHJvdWdoLXNvbWUtbmljcy5zaCAtbiAxZGQ4OjEwMDJ8MTViMzoxMDIxIgp9CgojIEdldG9wdHMgc2V0dXAgZm9yIHZhcmlhYmxlcyB0byBwYXNzIGZyb20gb3B0aW9ucwp3aGlsZSBnZXRvcHRzIGc6bjp1OnI6aCBvcHRpb24KZG8KY2FzZSAiJHtvcHRpb259IgppbgpuKSBuaWNpZD0ke09QVEFSR307OwpoKSBob3d0bzsgZXhpdCAwOzsKXD8pIGhvd3RvOyBleGl0IDE7Owplc2FjCmRvbmUKCiMgTWFrZSBzdXJlIHRoZSB2YXJpYWJsZXMgYXJlIHBvcHVsYXRlZCB3aXRoIHZhbHVlcyBvdGhlcndpc2Ugc2hvdyBob3d0bwppZiAoWyAteiAiJG5pY2lkIiBdKSB0aGVuCiAgIGhvd3RvCiAgIGV4aXQgMQpmaQoKIyBTZXQgdGFibGUgaGVhZGVyIGZvcm1hdCAKZGl2aWRlcj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09CmRpdmlkZXI9JGRpdmlkZXIkZGl2aWRlciRkaXZpZGVyCmhlYWRlcj0iXG4gJS0xMnMgJS0xNnMgJS0xNHMgJS0xNHMgJS0xNHNcbiIKZm9ybWF0PSIgJS0xNHMgJS0xNHMgJS0xNHMgJS0xNHMgJS0xNHNcbiIKd2lkdGg9MTAwCgojIFNsdXJwIGluIG5pYyBkZXZpY2UgdHlwZSBpZHMgZnJvbSBsc3BjaQpuaWNpZD1gZWNobyAkbmljaWQgfHNlZCAncy8sL1x8L2cnYAptYXBmaWxlIC10IG15X25pY3MgPCA8KGxzcGNpIC1ufGdyZXAgLUUgJG5pY2lkKQoKIyBQcmludCBvdXQgaGVhZGVycyAKcHJpbnRmICIkaGVhZGVyIiAiTklDIE5hbWUiICJOSUMgQnVzIElEIiAiS2VybmVsIERyaXZlciIgIk9DUCBCUiBOSUMiICJQYXNzVGhydSBFbGlnaWJsZSIKcHJpbnRmICIlJHdpZHRoLiR7d2lkdGh9c1xuIiAiJGRpdmlkZXIiCgojIEdyYWIgaW50ZXJmYWNlIGFzc29jaWF0ZWQgdG8g
b3ZzLXN5c3RlbSBicmlkZ2UuICBCb25kcyBkbyBub3Qgd29yayBoZXJlIHlldApicnBoeWludD1gb3ZzLXZzY3RsIC0tbm8taGVhZGluZyAtLWZvcm1hdD10YWJsZSAtLWNvbHVtbnM9bmFtZSx0eXBlIGZpbmQgSW50ZXJmYWNlIHR5cGU9c3lzdGVtfCBhd2sgJ3twcmludCAkMX0nYApicnBoeWJ1cz1gZ3JlcCBQQ0lfU0xPVF9OQU1FIC9zeXMvY2xhc3MvbmV0LyovZGV2aWNlL3VldmVudHxncmVwICRicnBoeWludHwgYXdrIC1GICI9IiAne3ByaW50ICQyfSdgCgojIERlY2xhcmUgZW1wdHkgYXJyYXkgdG8gc3RvcmUgbmljIGRldGFpbHMgb24gdGhvc2UgdGhhdCBjYW4gYmUgdW5ib3VuZApkZWNsYXJlIC1hIHBhc3N0aHJvdWdoPSgpCgpmb3IgKCggbmljPTA7IG5pYzwkeyNteV9uaWNzW0BdfTsgbmljKysgKSkKZG8KICAgbmljYnVzaWQ9YGVjaG8gJHtteV9uaWNzWyRuaWNdfSB8IGF3ayAne3ByaW50ICQxfSdgCiAgIG5pY2tkcnY9YGxzcGNpIC1rbiAtcyAkbmljYnVzaWQgfCBncmVwICJLZXJuZWwgZHJpdmVyIGluIHVzZToifCBhd2sgLUYgIjogIiAne3ByaW50ICQyfSdgCiAgIG5pY25hbWU9YGdyZXAgUENJX1NMT1RfTkFNRSAvc3lzL2NsYXNzL25ldC8qL2RldmljZS91ZXZlbnR8Z3JlcCAkbmljYnVzaWR8IGF3ayAtRiAnLycgJ3twcmludCAkNX0nYAogICBpZiBbICIkbmljbmFtZSIgPSAiIiBdOyB0aGVuCiAgICAgIG5pY25hbWU9Ik5BIgogICBmaQoKICAgIyBPYnRhaW4gZmlyc3QgMTEgY2hhcmFjdGVycyBvZiBlYWNoIHZhcmlhYmxlIHN0cmluZyB0byB1c2UgZm9yIGNvbXBhcmUKICAgc3VibmljYnVzaWQ9IiR7bmljYnVzaWQ6MDoxMX0iCiAgIHN1YmJycGh5YnVzPSIke2JycGh5YnVzOjA6MTF9IgoKICAgIyBDb21wYXJlIHRoZSBzdWJzdHJpbmdzCiAgIGlmIFtbICIkc3VibmljYnVzaWQiID09ICIkc3ViYnJwaHlidXMiIF1dOyB0aGVuCiAgICAgIHN5c25pYz0iWWVzIgogICAgICBwYXNzdGhydT0iTm8iCiAgICAgICMgRGlzcGxheSB0byBjb25zb2xlIHRoZSBkZXRhaWxzCiAgICAgIHByaW50ZiAiJGZvcm1hdCIgJG5pY25hbWUgJG5pY2J1c2lkICRuaWNrZHJ2ICRzeXNuaWMgJHBhc3N0aHJ1CiAgIGVsc2UKICAgICAgc3lzbmljPSJObyIKICAgICAgaWYgWyAiJG5pY2tkcnYiID0gInZmaW8tcGNpIiBdOyB0aGVuCiAgICAgICAgIHBhc3N0aHJ1PSJDb21wbGV0ZSIKICAgICAgZWxzZQogICAgICAgICBwYXNzdGhydT0iWWVzIgogICAgICAgICBwYXNzdGhyb3VnaCs9KCIkbmljYnVzaWR8JG5pY2tkcnYiKQogICAgICBmaQogICAgICAjIERpc3BsYXkgdG8gY29uc29sZSB0aGUgZGV0YWlscwogICAgICBwcmludGYgIiRmb3JtYXQiICRuaWNuYW1lICRuaWNidXNpZCAkbmlja2RydiAkc3lzbmljICRwYXNzdGhydQogICBmaQpkb25lCgppZiAhIGdyZXAgLUUgIl52ZmlvX3BjaSAiIC9wcm9jL21vZHVsZXM7IHRoZW4KICBlY2hvICIgIgogIGVjaG8gLW4gIkxvYWRpbmcgdmZpby1wY2kuLi4iCiAgbW9kcHJvYmUgdmZpby1w
Y2kKICBlY2hvICIuLi5Eb25lISIKICBlY2hvICIgIgpmaQoKCmZvciAoKCBwYXNzPTA7IHBhc3M8JHsjcGFzc3Rocm91Z2hbQF19OyBwYXNzKysgKSkKZG8KICAgbmljYnVzaWQ9YGVjaG8gJHtwYXNzdGhyb3VnaFskcGFzc119IHwgYXdrIC1GICJ8IiAne3ByaW50ICQxfSdgCiAgIG5pY2tkcnY9YGVjaG8gJHtwYXNzdGhyb3VnaFskcGFzc119IHwgYXdrIC1GICJ8IiAne3ByaW50ICQyfSdgCiAgIGVjaG8gIiAiCiAgIGVjaG8gIlVuYmluZGluZyBkZXZpY2UgJG5pY2J1c2lkIGZyb20gJG5pY2tkcnYga2VybmVsIGRyaXZlci4uLiIKICAgZWNobyAtbiAiJG5pY2J1c2lkIiA+IC9zeXMvYnVzL3BjaS9kcml2ZXJzL21seDVfY29yZS91bmJpbmQKICAgZWNobyAiQXBwbHlpbmcgZHJpdmVyIG92ZXJyaWRlIHRvIGRldmljZSAkbmljYnVzaWQuLi4iCiAgIGVjaG8gdmZpby1wY2kgPiAvc3lzL2J1cy9wY2kvZGV2aWNlcy8kbmljYnVzaWQvZHJpdmVyX292ZXJyaWRlCiAgIGVjaG8gIkJpbmRpbmcgZGV2aWNlICRuaWNidXNpZCB0byB2ZmlvLXBjaS4uLiIKICAgZWNobyAiJG5pY2J1c2lkIiA+IC9zeXMvYnVzL3BjaS9kcml2ZXJzL3ZmaW8tcGNpL2JpbmQKICAgZWNobyAiRGV2aWNlIGtlcm5lbCBkcml2ZXIgdmFsaWRhdGlvbi4uLiIKICAgbHNwY2kgLWsgLXMgJG5pY2J1c2lkCmRvbmUKZXhpdCAwCg==

We will also set a device id variable that will get embedded in the MachineConfig as the argument for the script. Note that if we wanted to use multiple device ids, we would pipe-delimit them.

$ DEVICEID="15b3:a2dc"              # Single device id
$ DEVICEID="1dd8:1002|15b3:1021"    # Multiple device ids

We also have to set the length of time to wait for the system to come up; 120 seconds is a good rule of thumb.

$ SLP="120"

Then we configure a MachineConfig that places the base64 encoded script on the system and establishes a systemd service to run the script every time the node boots.

$ cat > passthrough-for-some-machineconfig.yaml << EOF
kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  name: passthrough-for-some-systemd-service
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: passthrough-for-some.service
          enabled: true
          contents: |
            [Unit]
            Description=Identifies and enables passthrough on select network interfaces
            After=NetworkManager-wait-online.service openvswitch.service
            Wants=NetworkManager-wait-online.service openvswitch.service

            [Service]
            RemainAfterExit=yes
            ExecStart=/etc/scripts/passthrough-some-nics.sh -n $DEVICEID -s $SLP
            Type=oneshot

            [Install]
            WantedBy=multi-user.target
    storage:
      files:
        - filesystem: root
          path: "/etc/scripts/passthrough-some-nics.sh"
          contents:
            source: data:text/plain;charset=utf-8;base64,$BASE64_SCRIPT
            verification: {}
          mode: 0755
          overwrite: true
EOF

Now let's create the MachineConfig on the cluster.

$ oc create -f passthrough-for-some-machineconfig.yaml
machineconfig.machineconfiguration.openshift.io/passthrough-for-some-systemd-service created

We need to wait for the node to reboot. Once oc get mcp is responsive and confirms the node is updated we can start to validate.
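The wait itself can also be scripted instead of re-running oc get mcp by hand. Here is a small sketch; the helper is ours, and on most clusters something like `oc wait machineconfigpool/master --for=condition=Updated --timeout=30m` alone should also work:

```shell
#!/bin/bash
# Sketch: poll a command until it succeeds or a deadline (seconds) passes.
# POLL_INTERVAL (default 5s) controls how often we retry.
wait_until() {
  local deadline=$1; shift
  local interval="${POLL_INTERVAL:-5}"
  local elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$deadline" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example usage against the cluster (short oc timeout per attempt):
#   wait_until 1800 oc wait machineconfigpool/master \
#     --for=condition=Updated --timeout=10s
```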

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-c88d4164a5bd26edb3d4025d24a5d2f8   True      False      False      1              1                   1                     0                      6d7h
worker   rendered-worker-9890b2fbe760e8e731e68bf217b87278   True      False      False      0              0                   0                     0                      6d7h

Let's check the status of the service on the node. We can see from the output below that it has already identified the interfaces that can be made passthrough.

# systemctl status passthrough-for-some.service
● passthrough-for-some.service - Identifies and enables passthrough on select network interfaces
     Loaded: loaded (/etc/systemd/system/passthrough-for-some.service; enabled; preset: disabled)
     Active: activating (start) since Thu 2026-02-19 22:27:01 UTC; 5min ago
        Job: 408
 Invocation: 29eaf89183be4424a9f2fb4a2bd249a4
   Main PID: 4282 (passthrough-som)
      Tasks: 1 (limit: 3084134)
     Memory: 1.5M (peak: 10.8M)
        CPU: 213ms
     CGroup: /system.slice/passthrough-for-some.service
             └─4282 /bin/bash /etc/scripts/passthrough-some-nics.sh -n 15b3:a2dc

Feb 19 22:32:01 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: ====================================================================================================
Feb 19 22:32:01 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: enp1s0f0np0    0000:01:00.0   mlx5_core      Yes            No
Feb 19 22:32:01 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: enp1s0f1np1    0000:01:00.1   mlx5_core      Yes            No
Feb 19 22:32:01 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: enP2s2f0np0    0002:01:00.0   mlx5_core      No             Yes
Feb 19 22:32:01 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: enP2s2f1np1    0002:01:00.1   mlx5_core      No             Yes
Feb 19 22:32:01 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]:
Feb 19 22:32:02 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: Loading vfio-pci......Done!
Feb 19 22:32:02 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]:
Feb 19 22:32:02 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]:
Feb 19 22:32:02 nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com passthrough-some-nics.sh[4282]: Unbinding device 0002:01:00.0 from mlx5_core kernel driver...

Let's look at the lspci output for the devices we saw in the logs. We can see the first two interfaces stayed bound to mlx5_core because those ports are on the card associated with the OVS bridge. The last two ports, though, were unbound from mlx5_core and bound to vfio-pci to enable passthrough.

# lspci -k -s 0000:01:00.0
0000:01:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
# lspci -k -s 0000:01:00.1
0000:01:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
# lspci -k -s 0002:01:00.0
0002:01:00.0 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core
# lspci -k -s 0002:01:00.1
0002:01:00.1 Ethernet controller: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller (rev 01)
        Subsystem: Mellanox Technologies Device 0009
        Kernel driver in use: vfio-pci
        Kernel modules: mlx5_core

One final thing we can do is run the script manually on the node again to also confirm our findings.

# /etc/scripts/passthrough-some-nics.sh -n 15b3:a2dc

 NIC Name     NIC Bus ID       Kernel Driver  OCP BR NIC     PassThru Eligible
====================================================================================================
 enp1s0f0np0    0000:01:00.0   mlx5_core      Yes            No
 enp1s0f1np1    0000:01:00.1   mlx5_core      Yes            No
 NA             0002:01:00.0   vfio-pci       No             Complete
 NA             0002:01:00.1   vfio-pci       No             Complete
vfio_pci 16384 0 - Live 0xffffd5d69072b000

OpenShift Virtualization Passthrough

Now that our devices are set to passthrough, we can configure OpenShift Virtualization to see them as an available resource. We will need to edit the HyperConverged resource on our OpenShift cluster and add the following section.

permittedHostDevices:
  pciHostDevices:
    - pciDeviceSelector: 15b3:a2dc
      resourceName: nvidia.com/BF3_CX7
resourceRequirements:

We can make the edit by doing the following and inserting the section above right before the resourceRequirements section of the spec file.

$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged edited
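If you prefer a non-interactive change, the same stanza can be applied with a merge patch. A sketch; it assumes no permittedHostDevices section exists yet and reuses our resource name:

```shell
# Sketch: add the permittedHostDevices stanza with "oc patch" instead of an
# interactive "oc edit". The guard keeps this harmless without an oc client.
PATCH='{"spec":{"permittedHostDevices":{"pciHostDevices":[{"pciDeviceSelector":"15b3:a2dc","resourceName":"nvidia.com/BF3_CX7"}]}}}'
if command -v oc >/dev/null 2>&1; then
  oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
    --type=merge -p "$PATCH"
fi
```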

Then we can confirm the resources are exposed by the OpenShift node using oc describe node.

$ oc describe node | grep -E 'Capacity:|Allocatable:' -A12
Capacity:
  cpu:                            72
  devices.kubevirt.io/kvm:        1k
  devices.kubevirt.io/tun:        1k
  devices.kubevirt.io/vhost-net:  1k
  ephemeral-storage:              936709572Ki
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  hugepages-32Mi:                 0
  hugepages-64Ki:                 0
  memory:                         493510268Ki
  nvidia.com/BF3_CX7:             2
  pods:                           250
Allocatable:
  cpu:                            71500m
  devices.kubevirt.io/kvm:        1k
  devices.kubevirt.io/tun:        1k
  devices.kubevirt.io/vhost-net:  1k
  ephemeral-storage:              862197798302
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  hugepages-32Mi:                 0
  hugepages-64Ki:                 0
  memory:                         492359292Ki
  nvidia.com/BF3_CX7:             2
  pods:                          250
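The same check can be narrowed with jsonpath rather than grepping describe output. A sketch; note that the dots in the resource name must be escaped inside the jsonpath expression:

```shell
# Sketch: print just the allocatable BF3_CX7 count for every node. The guard
# keeps this harmless on machines without an oc client.
JP='{range .items[*]}{.metadata.name}{": "}{.status.allocatable.nvidia\.com/BF3_CX7}{"\n"}{end}'
if command -v oc >/dev/null 2>&1; then
  oc get nodes -o jsonpath="$JP"
fi
```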

Now when we go to launch a virtual machine in OpenShift, we will want to include the following section in our virtual machine spec, nested under spec->domain->devices.

hostDevices:
  - deviceName: nvidia.com/BF3_CX7
    name: hostDevices-turquoise-hornet-42

And if all goes well, once we launch our virtual machine and it is running, we should be able to see the passthrough Ethernet interface.

$ oc get vmi -n openshift-cnv
NAMESPACE       NAME                  AGE   PHASE     IP            NODENAME                                   READY
openshift-cnv   rhel9-red-locust-96   10m   Running   10.128.0.49   nvd-srv-36.nvidia.eng.rdu2.dc.redhat.com   True

$ virtctl console rhel9-red-locust-96 -n openshift-cnv
Successfully connected to rhel9-red-locust-96 console. The escape sequence is ^]

rhel9-red-locust-96 login: cloud-user
Password:
Last login: Fri Feb 20 08:08:53 on tty1
[cloud-user@rhel9-red-locust-96 ~]$ sudo bash
[root@rhel9-red-locust-96 cloud-user]# lspci -nn|grep Mellanox
0a:00.0 Ethernet controller [0200]: Mellanox Technologies MT43244 BlueField-3 integrated ConnectX-7 network controller [15b3:a2dc] (rev 01)

Hopefully this provides a decent example of enabling passthrough for a subset of devices on a server where all the devices are the same but not all can be passed through due to the need for base networking at the OS level.