Monday, November 24, 2025

NVIDIA OVS-DOCA on OpenShift

Open vSwitch (OVS) is a software-based network technology that enhances virtual machine (VM) communication within internal and external networks. Typically deployed in the hypervisor, OVS employs a software-based approach for packet switching, which can strain CPU resources, impacting system performance and network bandwidth utilization. Addressing this, NVIDIA's Accelerated Switching and Packet Processing (ASAP2) technology offloads OVS data-plane tasks to specialized hardware, like the embedded switch (eSwitch) within the NIC subsystem, while maintaining an unmodified OVS control-plane. This results in notably improved OVS performance without burdening the CPU.

NVIDIA's OVS-DOCA extends the traditional OVS-DPDK and OVS-Kernel data-path offload interfaces (DPIF), introducing OVS-DOCA as an additional DPIF implementation. OVS-DOCA, built upon NVIDIA's networking API, preserves the same interfaces as OVS-DPDK and OVS-Kernel while utilizing the DOCA Flow library with the additional OVS-DOCA DPIF. Unlike the other DPIFs (DPDK, Kernel), the OVS-DOCA DPIF exploits unique hardware offload mechanisms and application techniques, maximizing performance and features for NVIDIA NICs and DPUs. This mode is especially efficient due to its architecture and DOCA library integration, enhancing e-switch configuration and accelerating hardware offloads beyond what the other modes can achieve.

Workflow

The following experiment, which is not supported, was done in multiple OpenShift 4.18.18 environments on x86 server hardware. I tried this first on a bare metal single node OpenShift (SNO) node and then on a multi-node cluster; conceptually the steps are the same, it's just a matter of where OVS-DOCA should run. This document is broken down into four primary sections covering building the image, applying it, validating it and rolling it back.

  • Build Image Layer
  • Apply Image Layer
  • Validate Image Layer
  • Rolling Back Image Layer

Build Image Layer

Because OpenShift uses RHCOS, an image-based operating system, we first need to create an image overlay. The first step is to get the current rhel-coreos image from the cluster where we will be applying the overlay. This image will differ depending on the version of OpenShift.

$ oc adm release info --image-for rhel-coreos
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064

Next we need to create a dockerfile that contains the steps to generate the image overlay. This requires adding some additional dependency packages, upgrading a few and removing openvswitch. We will also install the DOCA packages using doca-all. Below is the dockerfile used in my example environment.

$ cat <<EOF > dockerfile.ovs-doca
### Grab oc adm release info --image-for rhel-coreos
### This example was done with 4.18.18
FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064
### Copy in the rpm packages from the host. Need to download them first since they are not part of this repo.
COPY *.rpm /root/
### Install the dependencies that were copied into image
RUN rpm-ostree install /root/libunwind-1.6.2-1.el9.x86_64.rpm
RUN rpm-ostree install /root/libzip-devel-1.7.3-8.el9.x86_64.rpm
RUN rpm-ostree install /root/libpcap-devel-1.10.0-4.el9.x86_64.rpm
RUN rpm-ostree install /root/jsoncpp-1.9.5-1.el9.x86_64.rpm
RUN rpm-ostree install /root/libyaml-devel-0.2.5-7.el9.x86_64.rpm
RUN rpm-ostree install /root/openssl-devel-3.0.7-29.el9_4.x86_64.rpm
### These packages need to replace existing ones with packages copied into image
RUN rpm-ostree override replace /root/unbound-libs-1.16.2-18.el9_6.x86_64.rpm
RUN rpm-ostree override replace /root/unbound-1.16.2-18.el9_6.x86_64.rpm
RUN rpm-ostree override replace /root/bzip2-libs-1.0.8-10.el9_5.x86_64.rpm
RUN rpm-ostree override replace /root/bzip2-devel-1.0.8-10.el9_5.x86_64.rpm
### Remove current openvswitch from RHCOS image
RUN rpm-ostree override remove openvswitch3.4
### Install Doca Repo Local
RUN rpm-ostree install /root/doca-host-3.0.0-058000_25.04_rhel94.x86_64.rpm
### Replace packages that come from doca repo
RUN rpm-ostree override replace libibverbs rdma-core --experimental --from repo='doca'
### Create this directory otherwise the doca-all will fail halfway through
RUN mkdir /var/opt
### Install the doca-all which includes Openvswitch and all the drivers etc. Maybe heavy handed but this is test.
RUN rpm-ostree install doca-all
### Remove the Doca Repo Local
RUN rpm-ostree override remove doca-host
### Remove the repos in image
RUN rm -r -f /etc/yum.repos.d/*
### Remove the installation rpms
RUN rm -r -f /root/*.rpm
### Create the commit
RUN ostree container commit
EOF

Notice inside the dockerfile we reference some packages that get copied into the image and then installed or upgraded. The Red Hat packages I grabbed from the entitled RHEL 9 system where I was building the image, using a simple dnf download <packagename>. The libunwind package came from the EPEL repository. Finally, the DOCA 3.0 host package came from NVIDIA.
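The download step can be sketched as a small loop. This is only a sketch: the package list mirrors the dockerfile above, but the exact versions dnf resolves will vary by system, and libunwind (EPEL) plus the doca-host rpm must be fetched separately.

```shell
# Sketch: fetch the dependency RPMs into a local directory next to the
# dockerfile. dnf download needs an entitled RHEL 9 host with
# dnf-plugins-core; libunwind (EPEL) and doca-host (NVIDIA) come separately.
PKGS="libzip-devel libpcap-devel jsoncpp libyaml-devel openssl-devel \
unbound-libs unbound bzip2-libs bzip2-devel"
mkdir -p rpms
for pkg in $PKGS; do
  if command -v dnf >/dev/null 2>&1; then
    dnf download --downloaddir=rpms "$pkg"
  fi
done
```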

Once all the required packages are downloaded into the same directory as the dockerfile we can go ahead and build the image.

$ podman build -t quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18 -f dockerfile.ovs-doca
STEP 1/21: FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064
STEP 2/21: COPY *.rpm /root/
--> Using cache 0338d4eddd5279dfb287838e9a5c654f597a27fbe1024abc33c65c362e7b27d3
--> 0338d4eddd52
STEP 3/21: RUN rpm-ostree install /root/libunwind-1.6.2-1.el9.x86_64.rpm
--> Using cache e769aad5beeadb1fb71e599f91e312ea4db57b266aec4a5a2b09737dbe0aa3ef
--> e769aad5beea
STEP 4/21: RUN rpm-ostree install /root/libzip-devel-1.7.3-8.el9.x86_64.rpm
--> Using cache ace83c90c17411afbc294eedb695e981d14beb15695bde824d67117b1cc076a9
--> ace83c90c174
STEP 5/21: RUN rpm-ostree install /root/libpcap-devel-1.10.0-4.el9.x86_64.rpm
--> Using cache 56ffa9d4e4aeeb8b6286caa99ac528e97f740c200abc1cdc6e9dfa7cc78ff8ae
--> 56ffa9d4e4ae
STEP 6/21: RUN rpm-ostree install /root/jsoncpp-1.9.5-1.el9.x86_64.rpm
--> Using cache 3ab7f72e664c4f0a10da85dffbc0b15e16bb248a60e323de380860fa989401e1
--> 3ab7f72e664c
STEP 7/21: RUN rpm-ostree install /root/libyaml-devel-0.2.5-7.el9.x86_64.rpm
--> Using cache ff99f8e9c0e563b30b947fa1b12dd01d8296681ed4731b18f2f091c3449ca324
--> ff99f8e9c0e5
STEP 8/21: RUN rpm-ostree install /root/openssl-devel-3.0.7-29.el9_4.x86_64.rpm
--> Using cache 0b8775c2baec4a9cea8af488ab3926184fdae74bfaf61308c622a088358260d1
--> 0b8775c2baec
STEP 9/21: RUN rpm-ostree override replace /root/unbound-libs-1.16.2-18.el9_6.x86_64.rpm
--> Using cache 74189b3a141874ad92c514f40d31a3c31136c9600eeaef809bc0b029a67dcc05
--> 74189b3a1418
STEP 10/21: RUN rpm-ostree override replace /root/unbound-1.16.2-18.el9_6.x86_64.rpm
--> Using cache 45a3d5a2c905fc5d6718af925f7d9e362450c81f92b97fe3312cded64831c6fd
--> 45a3d5a2c905
STEP 11/21: RUN rpm-ostree override replace /root/bzip2-libs-1.0.8-10.el9_5.x86_64.rpm
--> Using cache b6535e27d919fbbd0b39e9c319e30dbc05f2b2c304aa6836d3e748f83b7ef521
--> b6535e27d919
STEP 12/21: RUN rpm-ostree override replace /root/bzip2-devel-1.0.8-10.el9_5.x86_64.rpm
--> Using cache c35bdde7a95ae7d83bb53d0503dfa9fb1743bbe8db0f03e24d770d49d35997e1
--> c35bdde7a95a
STEP 13/21: RUN rpm-ostree override remove openvswitch3.4
--> Using cache 2e486cfa147a2339e3d91881938686509f73efea1621759758ab8a9cc7839f31
--> 2e486cfa147a
STEP 14/21: RUN rpm-ostree install /root/doca-host-3.0.0-058000_25.04_rhel94.x86_64.rpm
--> Using cache f71acdfb1a5d823858a6d56f32734e1dc552cd5048457fc54b6b60a29e09aaa9
--> f71acdfb1a5d
STEP 15/21: RUN rpm-ostree override replace libibverbs rdma-core --experimental --from repo='doca'
--> Using cache 5281c478b6930c903e2fc3988e2d7f05e0a365e992240594684a7c1d28c90981
--> 5281c478b693
STEP 16/21: RUN mkdir /var/opt
--> Using cache dc9c0e3fa4c63f0370826ac8bfb75ae5e4c25cce39082239f37f5a8e029bd636
--> dc9c0e3fa4c6
STEP 17/21: RUN rpm-ostree install doca-all
--> Using cache 2938b9c1b177984f200e7a694a4fd7a1072b301ff5f1436a28363940dfb700e7
--> 2938b9c1b177
STEP 18/21: RUN rpm-ostree override remove doca-host
--> Using cache e013cc9d87d83eedf30962f6849bd69f08999f5576c21ed456c567b2de406d49
--> e013cc9d87d8
STEP 19/21: RUN rm -r -f /etc/yum.repos.d/*
--> Using cache 395dce56b87285d6b76c08bd0fef64bc0c03058b2b132de7175ef4d5b9c91f11
--> 395dce56b872
STEP 20/21: RUN rm -r -f /root/*.rpm
--> Using cache 3b71079501cea2600d2068237fc60213d5a92f16bbc2e800b92bd0040fea4dea
--> 3b71079501ce
STEP 21/21: RUN ostree container commit
--> Using cache f51e6f47355f0eac8eb68a7ab4f001c1d71787b9dde6dc06307af01e338d524e
COMMIT quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18
--> f51e6f47355f
Successfully tagged quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18
Successfully tagged quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18-new
f51e6f47355f0eac8eb68a7ab4f001c1d71787b9dde6dc06307af01e338d524e

Once the image is created we can push it to a registry that our OpenShift cluster can access.

$ podman push quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18

If everything went well we can move onto applying the image to our OpenShift cluster.

Apply Image Layer

Now that the image has been created and pushed to a registry we can apply it to an OpenShift cluster. Remember we derived the image from an OpenShift 4.18.18 cluster, so make sure that is the version in use. If the version is different, go back and generate a new image with the correct base RHCOS image for that version of OpenShift. To apply the image we have to generate a machineconfig, like the example below, that references the image we built.
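A quick version sanity check before creating the machineconfig can avoid applying a mismatched layer. This is a sketch; the expected version string comes from this post and should be changed to match your own build.

```shell
# Sketch: confirm the cluster is actually at the version the layer was
# built from (4.18.18 in this post) before applying it.
WANT_VERSION="4.18.18"
if command -v oc >/dev/null 2>&1; then
  CUR_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
  [ "$CUR_VERSION" = "$WANT_VERSION" ] || echo "version mismatch: cluster is $CUR_VERSION" >&2
fi
```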

$ cat <<EOF > doca-ovs-machineconfig.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: doca-ovs-layer-machineconfig
spec:
  osImageURL: quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18
EOF

I found by accident that, for worker nodes only, I needed to have hugepages configured. The Rolling Back Image Layer section later in this blog shows how to roll off the image in the event hugepages were not enabled. We will need to create the following hugepage machineconfig and apply it to our cluster.

$ cat 50-kargs-1g-hugepages.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-kargs-1g-hugepages
spec:
  kernelArguments:
  - default_hugepagesz=1G
  - hugepagesz=1G
  - hugepages=16

$ oc create -f 50-kargs-1g-hugepages.yaml
machineconfig.machineconfiguration.openshift.io/50-kargs-1g-hugepages created

Once the nodes reboot for the hugepage machineconfig we can then create the machineconfig resource that applies the OVS-DOCA image to the cluster.

$ oc create -f doca-ovs-machineconfig.yaml
machineconfig.machineconfiguration.openshift.io/doca-ovs-layer-machineconfig created

Once the machineconfig is created, oc get mcp will show that the node (since this is a SNO example) is updating. This will take a while and the node will reboot.

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-a202f531d9140f133bed92703e0f6757   False     True       False      1              0                   0                     0                      173m
worker   rendered-worker-7c2aa2d41cd8936b50979161c38c5eb8   True      False      False      0              0                   0                     0                      173m

Once the node reboots it should come all the way back up to the point where all services are accessible. If the node does not come back, something went wrong or a step was missed.
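Rather than repeatedly polling oc get mcp, the rollout can be gated on the pool's Updated condition. A sketch, assuming the master pool as in this SNO example:

```shell
# Sketch: block until the pool reports Updated instead of polling oc get mcp.
POOL=master
if command -v oc >/dev/null 2>&1; then
  oc wait mcp/"$POOL" --for=condition=Updated=True --timeout=45m
fi
```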

Validate Image Layer

Hopefully the node came back and once it does we can do some validation. First let's open a debug pod.

$ oc debug node/nvd-srv-28.nvidia.eng.rdu2.dc.redhat.com
Starting pod/nvd-srv-28nvidiaengrdu2dcredhatcom-debug-6z55q ...
To use host binaries, run `chroot /host`
Pod IP: 10.6.135.7
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host

If we look for any openvswitch-related packages, we can see that only the doca-openvswitch build is listed.

sh-5.1# rpm -qa|grep openv
openvswitch-selinux-extra-policy-1.0-39.el9fdp.noarch
doca-openvswitch-3.0.0-0056_25.04_based_3.3.5.el9.x86_64

We can also dump the Open vSwitch configuration and see the references to DOCA. In this example the host had neither the NVIDIA Network Operator nor its NicClusterPolicy deployed.

sh-5.1# ovs-vsctl list open_vswitch
_uuid               : 16de7e20-158c-47f8-94e4-05584185e14c
bridges             : [322924a2-25fc-4d4a-945a-646a7b36f9f8, 9faee574-fa6b-41be-be92-e52c3a8d083e]
cur_cfg             : 364
datapath_types      : [doca, netdev, system]
datapaths           : {system=9feef28c-87e7-4c12-9519-9fe5946093df}
db_version          : "8.5.1"
doca_initialized    : false
doca_version        : "3.0.0058"
dpdk_initialized    : false
dpdk_version        : "MLNX_DPDK 22.11.2504.1.0"
external_ids        : {hostname=nvd-srv-28.nvidia.eng.rdu2.dc.redhat.com, ovn-bridge-mappings="physnet:br-ex", ovn-enable-lflow-cache="true", ovn-encap-ip="10.6.135.7", ovn-encap-type=geneve, ovn-is-interconn="true", ovn-memlimit-lflow-cache-kb="1048576", ovn-monitor-all="true", ovn-ofctrl-wait-before-clear="0", ovn-openflow-probe-interval="180", ovn-remote="unix:/var/run/ovn/ovnsb_db.sock", ovn-remote-probe-interval="180000", ovn-set-local-ip="true", rundir="/var/run/openvswitch", system-id="28e16fc6-eca6-4165-bd1c-235bf5884961"}
iface_types         : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 364
other_config        : {bundle-idle-timeout="180", ovn-chassis-idx-28e16fc6-eca6-4165-bd1c-235bf5884961="", vlan-limit="0"}
ovs_version         : "3.0.0-0056-25.04-based-3.3.5"
ssl                 : []
statistics          : {}
system_type         : rhcos
system_version      : "4.18"

In comparison, an OpenShift worker node that does not have OVS-DOCA installed looks like this.

sh-5.1# ovs-vsctl list open_vswitch
_uuid               : 1bd70b6e-60d2-45b5-9c66-8b535aa5b8ff
bridges             : [5fa50e55-6306-4e3b-aee4-68e13364f861, 846a2c77-92d5-479a-8531-a1c9955c3934]
cur_cfg             : 4019
datapath_types      : [netdev, system]
datapaths           : {system=3ec8dbee-824f-4e1b-999d-9ed77bcccee7}
db_version          : "8.8.0"
dpdk_initialized    : false
dpdk_version        : "DPDK 23.11.3"
external_ids        : {hostname=nvd-srv-29.nvidia.eng.rdu2.dc.redhat.com, ovn-bridge-mappings="physnet:br-ex", ovn-enable-lflow-cache="true", ovn-encap-ip="10.6.135.8", ovn-encap-type=geneve, ovn-is-interconn="true", ovn-memlimit-lflow-cache-kb="1048576", ovn-monitor-all="true", ovn-ofctrl-wait-before-clear="0", ovn-openflow-probe-interval="180", ovn-remote="unix:/var/run/ovn/ovnsb_db.sock", ovn-remote-probe-interval="180000", ovn-set-local-ip="true", rundir="/var/run/openvswitch", system-id="512f7a47-01d9-42fd-bdf9-906ecda172bb"}
iface_types         : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 4019
other_config        : {bundle-idle-timeout="180", doca-init="true", hw-offload="true", ovn-chassis-idx-512f7a47-01d9-42fd-bdf9-906ecda172bb="", vlan-limit="0"}
ovs_version         : "3.4.3-66.el9fdp"
ssl                 : []
statistics          : {}
system_type         : rhcos
system_version      : "4.18"

Here is an example on a Dell R760xa with the Network Operator and its NicClusterPolicy also deployed. We can see that both doca_initialized and dpdk_initialized are set to true. Further, in the other_config options we can see that doca-init and hw-offload are also set to true.

sh-5.1# ovs-vsctl list open_vswitch
_uuid               : 901442c2-069e-424a-92b0-40d5dd785ba2
bridges             : [0c21ea64-1bb4-48b5-a9c3-39f9b08bb41a, bc402e73-8dd0-494b-83fc-06b7de9ce13e]
cur_cfg             : 3959
datapath_types      : [doca, netdev, system]
datapaths           : {system=f2f9d8f2-be83-4d04-aaab-73d9b83d3765}
db_version          : "8.5.1"
doca_initialized    : true
doca_version        : "3.0.0058"
dpdk_initialized    : true
dpdk_version        : "MLNX_DPDK 22.11.2504.1.0"
external_ids        : {hostname=nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com, ovn-bridge-mappings="physnet:br-ex", ovn-enable-lflow-cache="true", ovn-encap-ip="10.6.135.9", ovn-encap-type=geneve, ovn-is-interconn="true", ovn-memlimit-lflow-cache-kb="1048576", ovn-monitor-all="true", ovn-ofctrl-wait-before-clear="0", ovn-openflow-probe-interval="180", ovn-remote="unix:/var/run/ovn/ovnsb_db.sock", ovn-remote-probe-interval="180000", ovn-set-local-ip="true", rundir="/var/run/openvswitch", system-id="bfaff038-def5-433b-ac43-1cc421728f88"}
iface_types         : [bareudp, doca, docavdpa, docavhostuser, docavhostuserclient, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 3959
other_config        : {bundle-idle-timeout="180", doca-init="true", hw-offload="true", ovn-chassis-idx-bfaff038-def5-433b-ac43-1cc421728f88="", vlan-limit="0"}
ovs_version         : "3.0.0-0056-25.04-based-3.3.5"
ssl                 : []
statistics          : {}
system_type         : rhcos
system_version      : "4.18"
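Instead of scanning the full table dump, the interesting columns can be queried directly from the node's debug shell. A sketch, assuming the doca-openvswitch build exposes the doca_initialized column as shown above:

```shell
# Sketch: pull just the DOCA/offload indicators from the Open_vSwitch table.
COLS="doca_initialized dpdk_initialized"
if command -v ovs-vsctl >/dev/null 2>&1; then
  ovs-vsctl get Open_vSwitch . $COLS
  ovs-vsctl get Open_vSwitch . other_config:doca-init other_config:hw-offload
fi
```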

Now let's see if GPUDirect RDMA works any differently with OVS-DOCA. I used the standard test from my previous blogs but got the same results on similar hardware, so for brevity I am not going to show those test steps or results here.

Rolling Back Image Layer

Sometimes things do not go the way we expect. When using bare metal workers I found that the image layer will fail to come up if hugepages were not enabled. This is what I saw from Open vSwitch when the OVS-DOCA layer was applied to worker nodes.

2025-07-10T14:53:22.723Z|00014|dpdk|INFO|Using MLNX_DPDK 22.11.2504.1.0
2025-07-10T14:53:22.723Z|00015|dpdk|INFO|DPDK Enabled - initializing...
2025-07-10T14:53:22.723Z|00016|dpdk|INFO|Setting max memzones to 10000
2025-07-10T14:53:22.723Z|00017|dpdk|INFO|EAL ARGS: ovs-vswitchd -a 0000:00:00.0 --file-prefix=ovs-5338 --in-memory -l 0.
2025-07-10T14:53:22.726Z|00018|dpdk|INFO|EAL: Detected CPU lcores: 128
2025-07-10T14:53:22.726Z|00019|dpdk|INFO|EAL: Detected NUMA nodes: 2
2025-07-10T14:53:22.726Z|00020|dpdk|INFO|EAL: Detected static linkage of DPDK
2025-07-10T14:53:22.727Z|00021|dpdk|INFO|EAL: rte_mem_virt2phy(): cannot open /proc/self/pagemap: Permission denied
2025-07-10T14:53:22.727Z|00022|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2025-07-10T14:53:22.727Z|00023|dpdk|WARN|EAL: No free 2048 kB hugepages reported on node 0
2025-07-10T14:53:22.727Z|00024|dpdk|WARN|EAL: No free 2048 kB hugepages reported on node 1
2025-07-10T14:53:22.727Z|00025|dpdk|WARN|EAL: No free 1048576 kB hugepages reported on node 0
2025-07-10T14:53:22.727Z|00026|dpdk|WARN|EAL: No free 1048576 kB hugepages reported on node 1
2025-07-10T14:53:22.727Z|00027|dpdk|ERR|EAL: Cannot get hugepage information.
2025-07-10T14:53:22.727Z|00028|dpdk|EMER|Unable to initialize DPDK: Permission denied
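The "No free ... hugepages" warnings above can be checked for directly on the node before (or after) applying the layer. A sketch reading the standard kernel counters:

```shell
# Sketch: confirm hugepages are actually allocated on the node before
# OVS-DOCA's DPDK initialization needs them.
grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo
TOTAL=$(awk '/^HugePages_Total/ {print $2}' /proc/meminfo)
echo "hugepages allocated: ${TOTAL:-0}"
```

With the 50-kargs-1g-hugepages machineconfig applied, HugePages_Total should report 16 with a 1 GiB Hugepagesize.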

We did not see this behavior on a bare metal single node OpenShift node, only on workers. The first step was to roll back my change, but just deleting the machineconfig was not enough because the worker node never got back to a Ready state. First I ssh'd into the node as the core user, since the node did still have network connectivity.

$ ssh core@nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com
Red Hat Enterprise Linux CoreOS 418.94.202506121335-0
Part of OpenShift 4.18, RHCOS is a Kubernetes-native operating system managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead, make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.18/architecture/architecture-rhcos.html

---
Last login: Thu Jul 10 14:51:08 2025 from 10.22.89.172
[systemd]
Failed Units: 2
  NetworkManager-wait-online.service
  ovs-vswitchd.service
[core@nvd-srv-30 ~]$ sudo bash
[systemd]
Failed Units: 2
  NetworkManager-wait-online.service
  ovs-vswitchd.service

Once I became root I could see that Open vSwitch had not started, and as shown above the logs pointed to hugepages not being configured. Next I looked at rpm-ostree status, which shows both our running image and our previous image.

[root@nvd-srv-30 core]# rpm-ostree status
State: idle
Deployments:
● ostree-unverified-registry:quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18
    Digest: sha256:9fabd9c17f9124b443aa5d43d67a7b118ef510ee938aa7970ae41bd4d8d7697e
    Version: 418.94.202506121335-0 (2025-07-09T13:28:35Z)
  ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064
    Digest: sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064
    Version: 418.94.202506121335-0 (2025-06-12T13:39:57Z)

I opted to roll back to the previous image, which I knew I could do.

[root@nvd-srv-30 core]# rpm-ostree rollback
Moving 'd1fb888c12bc35c6d59679aa521c0f950013f80bce397ed0af73181c33305679.0' to be first deployment
Transaction complete; bootconfig swap: no; bootversion: boot.0.0, deployment count change: 0
Downgraded:
  bzip2-libs 1.0.8-10.el9_5 -> 1.0.8-8.el9_4.1
  libibverbs 2501mlnx56-1.2504061 -> 48.0-1.el9
  rdma-core 2501mlnx56-1.2504061 -> 48.0-1.el9
  unbound-libs 1.16.2-18.el9_6 -> 1.16.2-8.el9_4.1
Removed:
  bzip2-devel-1.0.8-10.el9_5.x86_64 clusterkit-1.15.470-1.2504061.20250428.80af081.x86_64
  cmake-filesystem-3.26.5-2.el9.x86_64 collectx-clxapi-1.21.1-1.x86_64 collectx-clxapidev-1.21.1-1.x86_64
  doca-all-3.0.0-058000.x86_64 doca-apsh-config-3.0.0058-1.el9.x86_64 doca-bench-3.0.0058-1.el9.x86_64
  doca-caps-3.0.0058-1.el9.x86_64 doca-comm-channel-admin-3.0.0058-1.el9.x86_64
  doca-devel-3.0.0-058000.x86_64 doca-dms-3.0.0058-1.el9.x86_64 doca-flow-tune-3.0.0058-1.el9.x86_64
  doca-ofed-3.0.0-058000.x86_64 doca-openvswitch-3.0.0-0056_25.04_based_3.3.5.el9.x86_64
  doca-pcc-counters-3.0.0058-1.el9.x86_64 doca-perftest-1.0.1-1.el9.x86_64
  doca-runtime-3.0.0-058000.x86_64 doca-samples-3.0.0058-1.el9.x86_64
  doca-sdk-aes-gcm-3.0.0058-1.el9.x86_64 doca-sdk-aes-gcm-devel-3.0.0058-1.el9.x86_64
  doca-sdk-apsh-3.0.0058-1.el9.x86_64 doca-sdk-apsh-devel-3.0.0058-1.el9.x86_64
  doca-sdk-argp-3.0.0058-1.el9.x86_64 doca-sdk-argp-devel-3.0.0058-1.el9.x86_64
  doca-sdk-comch-3.0.0058-1.el9.x86_64 doca-sdk-comch-devel-3.0.0058-1.el9.x86_64
  doca-sdk-common-3.0.0058-1.el9.x86_64 doca-sdk-common-devel-3.0.0058-1.el9.x86_64
  doca-sdk-compress-3.0.0058-1.el9.x86_64 doca-sdk-compress-devel-3.0.0058-1.el9.x86_64
  doca-sdk-devemu-3.0.0058-1.el9.x86_64 doca-sdk-devemu-devel-3.0.0058-1.el9.x86_64
  doca-sdk-dma-3.0.0058-1.el9.x86_64 doca-sdk-dma-devel-3.0.0058-1.el9.x86_64
  doca-sdk-dpa-3.0.0058-1.el9.x86_64 doca-sdk-dpa-devel-3.0.0058-1.el9.x86_64
  doca-sdk-dpdk-bridge-3.0.0058-1.el9.x86_64 doca-sdk-dpdk-bridge-devel-3.0.0058-1.el9.x86_64
  doca-sdk-erasure-coding-3.0.0058-1.el9.x86_64 doca-sdk-erasure-coding-devel-3.0.0058-1.el9.x86_64
  doca-sdk-eth-3.0.0058-1.el9.x86_64 doca-sdk-eth-devel-3.0.0058-1.el9.x86_64
  doca-sdk-flow-3.0.0058-1.el9.x86_64 doca-sdk-flow-devel-3.0.0058-1.el9.x86_64
  doca-sdk-flow-trace-3.0.0058-1.el9.x86_64 doca-sdk-pcc-3.0.0058-1.el9.x86_64
  doca-sdk-pcc-devel-3.0.0058-1.el9.x86_64 doca-sdk-rdma-3.0.0058-1.el9.x86_64
  doca-sdk-rdma-devel-3.0.0058-1.el9.x86_64 doca-sdk-sha-3.0.0058-1.el9.x86_64
  doca-sdk-sha-devel-3.0.0058-1.el9.x86_64 doca-sdk-sta-3.0.0058-1.el9.x86_64
  doca-sdk-sta-devel-3.0.0058-1.el9.x86_64 doca-sdk-telemetry-3.0.0058-1.el9.x86_64
  doca-sdk-telemetry-devel-3.0.0058-1.el9.x86_64 doca-sdk-telemetry-exporter-3.0.0058-1.el9.x86_64
  doca-sdk-telemetry-exporter-devel-3.0.0058-1.el9.x86_64 doca-sdk-urom-3.0.0058-1.el9.x86_64
  doca-sdk-urom-devel-3.0.0058-1.el9.x86_64 doca-socket-relay-3.0.0058-1.el9.x86_64
  doca-sosreport-4.9.0-1.el9.noarch doca-telemetry-utils-3.0.0058-1.el9.x86_64
  dpa-gdbserver-25.04.2725-0.el9.x86_64 dpa-resource-mgmt-25.04.0169-1.el9.x86_64
  dpa-stats-25.04.0169-0.el9.x86_64 dpacc-1.11.0.6-1.el9.x86_64 dpacc-extract-1.11.0.6-1.el9.x86_64
  flexio-samples-25.04.2725-0.el9.noarch flexio-sdk-25.04.2725-0.el9.x86_64
  glib2-devel-2.68.4-14.el9_4.1.x86_64 hcoll-4.8.3230-1.20250428.1a4e38d7.x86_64
  ibacm-2501mlnx56-1.2504061.x86_64 ibarr-0.1.3-1.2504061.x86_64 ibdump-6.0.0-1.2504061.x86_64
  ibsim-0.12-1.2504061.x86_64 ibutils2-2.1.1-0.22200.MLNX20250423.g91730569c.2504061.x86_64
  infiniband-diags-2501mlnx56-1.2504061.x86_64 jsoncpp-1.9.5-1.el9.x86_64
  kernel-headers-5.14.0-570.25.1.el9_6.x86_64 kmod-iser-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64
  kmod-isert-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64 kmod-kernel-mft-mlnx-4.32.0-1.rhel9u4.x86_64
  kmod-knem-1.1.4.90mlnx3-OFED.23.10.0.2.1.1.rhel9u4.x86_64
  kmod-mlnx-ofa_kernel-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64
  kmod-srp-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64 kmod-xpmem-2.7.4-1.2504061.rhel9u4.rhel9u4.x86_64
  libblkid-devel-2.37.4-18.el9.x86_64 libffi-devel-3.4.2-8.el9.x86_64
  libgfortran-11.5.0-5.el9_5.x86_64 libibumad-2501mlnx56-1.2504061.x86_64
  libibverbs-utils-2501mlnx56-1.2504061.x86_64 libmount-devel-2.37.4-18.el9.x86_64
  libnl3-devel-3.9.0-1.el9.x86_64 libpcap-devel-14:1.10.0-4.el9.x86_64
  libquadmath-11.5.0-5.el9_5.x86_64 librdmacm-2501mlnx56-1.2504061.x86_64
  librdmacm-utils-2501mlnx56-1.2504061.x86_64 libselinux-devel-3.6-1.el9.x86_64
  libsepol-devel-3.6-1.el9.x86_64 libunwind-1.6.2-1.el9.x86_64
  libxpmem-2.7.4-1.2504061.rhel9u4.x86_64 libxpmem-devel-2.7.4-1.2504061.rhel9u4.x86_64
  libyaml-devel-0.2.5-7.el9.x86_64 libzip-1.7.3-8.el9.x86_64 libzip-devel-1.7.3-8.el9.x86_64
  meson-0.61.2-1.el9.noarch mft-4.32.0-120.x86_64 mlnx-dpdk-22.11.0-2504.1.0.2504061.x86_64
  mlnx-dpdk-devel-22.11.0-2504.1.0.2504061.x86_64 mlnx-ethtool-6.11-1.2504061.x86_64
  mlnx-iproute2-6.12.0-1.2504061.x86_64 mlnx-ofa_kernel-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64
  mlnx-ofa_kernel-devel-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64
  mlnx-ofa_kernel-source-25.04-OFED.25.04.0.6.1.1.rhel9u4.x86_64 mlnx-tools-25.01-0.2504061.x86_64
  mpitests_openmpi-3.2.24-2ffc2d6.2504061.x86_64 ninja-build-1.10.2-3.el9~bootstrap.x86_64
  nvhws-25.04-1.el9.x86_64 nvhws-devel-25.04-1.el9.x86_64 ofed-scripts-25.04-OFED.25.04.0.6.1.x86_64
  openmpi-3:4.1.7rc1-1.2504061.20250428.6d9519e4c3.x86_64
  opensm-5.23.00.MLNX20250423.ac516692-0.1.2504061.x86_64
  opensm-devel-5.23.00.MLNX20250423.ac516692-0.1.2504061.x86_64
  opensm-libs-5.23.00.MLNX20250423.ac516692-0.1.2504061.x86_64
  opensm-static-5.23.00.MLNX20250423.ac516692-0.1.2504061.x86_64
  openssl-devel-1:3.0.7-29.el9_4.x86_64 pcre-cpp-8.44-3.el9.3.x86_64 pcre-devel-8.44-3.el9.3.x86_64
  pcre-utf16-8.44-3.el9.3.x86_64 pcre-utf32-8.44-3.el9.3.x86_64 pcre2-devel-10.40-5.el9.x86_64
  pcre2-utf16-10.40-5.el9.x86_64 pcre2-utf32-10.40-5.el9.x86_64
  perftest-25.04.0-0.84.g97da83e.2504061.x86_64 python3-file-magic-5.39-16.el9.noarch
  python3-pexpect-4.8.0-7.el9.noarch python3-ptyprocess-0.6.0-12.el9.noarch
  python3-pyverbs-2501mlnx56-1.2504061.x86_64 rdma-core-devel-2501mlnx56-1.2504061.x86_64
  rshim-2.3.8-0.geaa5c03.x86_64 sharp-3.11.0.MLNX20250423.66d243a0-1.2504061.x86_64
  srp_daemon-2501mlnx56-1.2504061.x86_64 sysprof-capture-devel-3.40.1-3.el9.x86_64
  ucx-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64 ucx-cma-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64
  ucx-devel-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64 ucx-ib-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64
  ucx-ib-mlx5-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64 ucx-knem-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64
  ucx-rdmacm-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64 ucx-xpmem-1.19.0-1.2504061.20250428.6ecd4e5ae.x86_64
  unbound-1.16.2-18.el9_6.x86_64 xpmem-2.7.4-1.2504061.rhel9u4.x86_64 xz-devel-5.2.5-8.el9_0.x86_64
  zlib-devel-1.2.11-40.el9.x86_64
Added:
  openvswitch3.4-3.4.2-66.el9fdp.x86_64
Changes queued for next boot. Run "systemctl reboot" to start a reboot

Next I checked rpm-ostree status again to confirm the original deployment was at the top of the list. A reboot was needed next.

[root@nvd-srv-30 core]# rpm-ostree status
State: idle
Deployments:
  ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064
    Digest: sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064
    Version: 418.94.202506121335-0 (2025-06-12T13:39:57Z)
    Diff: 4 downgraded, 156 removed, 1 added
  ostree-unverified-registry:quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18
    Digest: sha256:9fabd9c17f9124b443aa5d43d67a7b118ef510ee938aa7970ae41bd4d8d7697e
    Version: 418.94.202506121335-0 (2025-07-09T13:28:35Z)
[root@nvd-srv-30 core]# reboot
Connection to nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com closed by remote host.

Now this does not completely solve my backout strategy. The machineconfig that applied the layer was gone and my worker was running the original image, but oc get mcp still showed an updating and degraded state.

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-0230114d55788bab601fd33f8c816798   True      False      False      3              3                   3                     0                      20d
worker   rendered-worker-c8361c0ad5c75212f16f53fa60772292   False     True       True       2              1                   1                     1                      20d

This was because the Machine Config Operator still thought the system should be using the layered image, as the following shows.

$ oc describe mcp worker
Name:         worker
Namespace:
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  sriovnetwork.openshift.io/state: Paused
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
(...)
    Last Transition Time:  2025-07-10T15:09:15Z
    Message:               Node nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com is reporting: "unexpected on-disk state validating against rendered-worker-0483f3f9c265f75685e1b23edf5d261d: expected target osImageURL \"quay.io/redhat_emp1/ecosys-nvidia/ocp-4.18-doca-all:4.18.18\", have \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8fea916602e93f2d504affb82cb6eceb0d45c2c80fdc26c9c363bd61ade8c064\""
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
(...)
  Degraded Machine Count:     1
  Machine Count:              2
  Observed Generation:        13
  Ready Machine Count:        1
  Unavailable Machine Count:  1
  Updated Machine Count:      1
Events:  <none>

To resolve this I needed to find my last good rendered worker config, which was rendered-worker-c8361c0ad5c75212f16f53fa60772292.

$ oc get mc|grep rendered-worker
rendered-worker-0483f3f9c265f75685e1b23edf5d261d   efe259e04ba98784102ba603941ecbbb75233c6b   3.4.0   4h26m
rendered-worker-0c4bed761bad4468c07325ab74dd8d7a   efe259e04ba98784102ba603941ecbbb75233c6b   3.4.0   3h18m
rendered-worker-1ecf444812421713d125d8c2fab0c8b5   efe259e04ba98784102ba603941ecbbb75233c6b   3.4.0   4h14m
rendered-worker-1f9846d31d24be3134459fe31a1e3eb9   00143af1a51bedf0290496a6a97e47cf60b18693   3.4.0   21d
rendered-worker-5ace9f4a37135e9a1ad365d34e36d5a4   00143af1a51bedf0290496a6a97e47cf60b18693   3.4.0   21d
rendered-worker-770c72f8be98e98497c57232e1e284f0   00143af1a51bedf0290496a6a97e47cf60b18693   3.4.0   21d
rendered-worker-c8361c0ad5c75212f16f53fa60772292   efe259e04ba98784102ba603941ecbbb75233c6b   3.4.0   25h
rendered-worker-cf0ffc3899672cb089c88c470682ea27   00143af1a51bedf0290496a6a97e47cf60b18693   3.4.0   16d
rendered-worker-e7ff5d837ac6c17c9dc7f0417eb30ce0   00143af1a51bedf0290496a6a97e47cf60b18693   3.4.0   16d
rendered-worker-efcaf1659d34673a2efabce6dc580638   00143af1a51bedf0290496a6a97e47cf60b18693   3.4.0   21d
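Another way to identify the last good config is to read it off the node object itself: the MCO records the node's current and desired rendered configs as annotations. A sketch, using the node from this post:

```shell
# Sketch: the MCO annotates each node with its current/desired rendered
# config, which helps pick the last good one for the backup yaml.
NODE=nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com   # node from this post
if command -v oc >/dev/null 2>&1; then
  oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}'; echo
  oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}'; echo
fi
```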

Then I generated a backup yaml from that rendered worker config.

$ oc get mc/rendered-worker-c8361c0ad5c75212f16f53fa60772292 -o yaml > rendered-mc-backup.yaml

Then I edited rendered-mc-backup.yaml, updated the rendered worker name to the one the node expected (rendered-worker-0483f3f9c265f75685e1b23edf5d261d) and commented out four lines. Below is just the relevant section of rendered-mc-backup.yaml that I updated.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/generated-by-controller-version: efe259e04ba98784102ba603941ecbbb75233c6b
    machineconfiguration.openshift.io/release-image-version: 4.18.18
  #creationTimestamp: "2025-07-09T19:20:50Z"
  #generation: 1
  name: rendered-worker-0483f3f9c265f75685e1b23edf5d261d
  ownerReferences:
  - apiVersion: machineconfiguration.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: MachineConfigPool
    name: worker
    uid: 1653432e-a894-4a17-92f7-d3636c82efa9
  #resourceVersion: "9616677"
  #uid: 36507fb2-934b-46ac-ae8d-8e663678ad16
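The same edit can be scripted. This is only a sketch: the sed expressions assume GNU sed and the two-space metadata indentation shown above (which leaves the more deeply indented ownerReferences uid untouched).

```shell
# Sketch: rename the rendered config and comment out the server-set
# metadata fields, matching the manual edit described above.
OLD=rendered-worker-c8361c0ad5c75212f16f53fa60772292
NEW=rendered-worker-0483f3f9c265f75685e1b23edf5d261d
if [ -f rendered-mc-backup.yaml ]; then
  sed -i \
    -e "s/${OLD}/${NEW}/g" \
    -e 's/^  \(creationTimestamp\|generation\|resourceVersion\|uid\):/  #\1:/' \
    rendered-mc-backup.yaml
fi
```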

Then I deleted the original rendered worker machineconfig, the one pointing at the overlay image.

$ oc delete mc rendered-worker-0483f3f9c265f75685e1b23edf5d261d
machineconfig.machineconfiguration.openshift.io "rendered-worker-0483f3f9c265f75685e1b23edf5d261d" deleted

Then I recreated the rendered worker with the same name as the one I just deleted, but with the known good backup content.

$ oc create -f rendered-mc-backup.yaml
machineconfig.machineconfiguration.openshift.io/rendered-worker-0483f3f9c265f75685e1b23edf5d261d created

And finally I touched the machine-config-daemon-force file on the node to force a reconciliation.

$ oc debug node/nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com -- touch /host/run/machine-config-daemon-force
Starting pod/nvd-srv-30nvidiaengrdu2dcredhatcom-debug-jbbmv ...
To use host binaries, run `chroot /host`
^C
Removing debug pod ...
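After forcing the reconciliation it's worth confirming the node and pool actually converge. A sketch, using the node from this post:

```shell
# Sketch: wait for the node to go Ready and the worker pool to clear
# its Degraded condition after the forced reconciliation.
NODE=nvd-srv-30.nvidia.eng.rdu2.dc.redhat.com   # node from this post
if command -v oc >/dev/null 2>&1; then
  oc wait node/"$NODE" --for=condition=Ready --timeout=15m
  oc wait mcp/worker --for=condition=Degraded=False --timeout=15m
fi
```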

This got me back to a known good state, and I could proceed with adding my hugepage machineconfig and then reapplying the OVS-DOCA image to my cluster.

Hopefully this write-up provided some insight into the experimental and unsupported activity of getting the OVS-DOCA version of Open vSwitch running on an OpenShift cluster.