Saturday, February 07, 2026

OpenShift Hosted Control Planes Multi-Arch

 

 A hosted control plane (HCP) is a cloud-native architecture where the management components of a Red Hat® OpenShift® cluster, specifically the control plane, are decoupled from the worker nodes and managed as a service. HCP offers a consolidated, efficient, and secure approach to managing OpenShift and other Kubernetes clusters at scale. Instead of running on dedicated infrastructure (for the masters) within each cluster, the control plane components are hosted on a separate management cluster and managed as regular OpenShift workloads. This separation offers many advantages for organizations looking to optimize their OpenShift deployments especially for cost, strong isolation, and fast cluster provisioning time.

Some of the benefits of hosted control planes are as follows:

  • Reduced Costs: A smaller resource footprint and more efficient resource utilization increase ROI.
  • Fast Provisioning: Control plane containers spin up much faster than first having to deploy RHCOS on bare metal or virtual machines.
  • Isolation: Dedicated infrastructure and security for the control plane enhance isolation, minimize attack surfaces, and improve overall security posture.
  • Scalability: The decoupled architecture enables independent scaling of control plane and worker nodes.

All of these benefits make HCP an attractive solution for businesses looking to get the most value out of their infrastructure. The rest of this blog covers the process of configuring and then deploying an HCP cluster. First, let's take a look at the environment we are working with so we understand our starting point.

Environment

The base environment starts with an x86_64 installation of OpenShift 4.20.8 deployed as a hyper-converged three-node control/worker setup on virtual machines. It already has OpenShift Data Foundation installed and configured. We will configure the MultiCluster Engine Operator and the MetalLB Operator, and then deploy a hosted cluster on Arm64 worker nodes. The environment is depicted in the following diagram:

Since we are going to be deploying an HCP cluster made up of Arm worker nodes, let's first confirm the cluster has multi-architecture enabled.

We can confirm multi-architecture is enabled on OpenShift by running the following command.

$ oc adm release info -o jsonpath="{ .metadata.metadata}"
{"url":"https://access.redhat.com/errata/RHBA-2025:23103"}

From the output above it appears we are not set up for multi-architecture. But that is an easy fix because we can enable it as a day-two operation. Running the following command should resolve our issue.

$ oc adm upgrade --to-multi-arch
Requested update to multi cluster architecture

After a few minutes we can run the release info command again to check.

$ oc adm release info -o jsonpath="{ .metadata.metadata}"
{"release.openshift.io/architecture":"multi","url":"https://access.redhat.com/errata/RHBA-2025:23103"}

Now our cluster looks good for us to move forward in our journey.
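As an extra sanity check, once the Arm workers join later we can confirm the architecture each node reports. This is just a generic oc query, not something the official procedure requires:

$ oc get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture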

Installing & Configuring the MultiCluster Engine Operator

One of the challenges of scaling OpenShift environments is managing the lifecycle of a growing fleet. To meet that challenge, we can use the Multicluster Engine Operator. The operator delivers full lifecycle capabilities for managed OpenShift Container Platform clusters and partial lifecycle management for other Kubernetes distributions. It is available in two ways:

  • As a standalone operator that is installed as part of your OpenShift Container Platform or OpenShift Kubernetes Engine subscription
  • As part of Red Hat Advanced Cluster Management for Kubernetes

For hosted control planes this operator is required, and for this demonstration we will use it in standalone mode. The first step is to install the operator with the following custom resource file.

$ cat <<EOF >multicluster-engine-operator.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: multicluster-engine
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: multicluster-engine
  namespace: multicluster-engine
spec:
  targetNamespaces:
  - multicluster-engine
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: multicluster-engine
  namespace: multicluster-engine
spec:
  channel: stable-2.10
  installPlanApproval: Automatic
  name: multicluster-engine
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

Once we have generated the custom resource file we can create it on the cluster.

$ oc create -f multicluster-engine-operator.yaml
namespace/multicluster-engine created
operatorgroup.operators.coreos.com/multicluster-engine created
subscription.operators.coreos.com/multicluster-engine created

We can verify the operator instances are up and running with the following command.

$ oc get pods -n multicluster-engine
NAME                                           READY   STATUS    RESTARTS   AGE
multicluster-engine-operator-6dd66fff8-gphcf   1/1     Running   0          9m20s
multicluster-engine-operator-6dd66fff8-tq6mx   1/1     Running   0          9m20s

Now that the operator is up and running we need to go ahead and create a multicluster engine instance. The following custom resource file contains the values to create that instance.

$ cat <<EOF >multicluster-engine-instance.yaml
apiVersion: multicluster.openshift.io/v1
kind: MultiClusterEngine
metadata:
  name: multiclusterengine
spec:
  availabilityConfig: Basic
  targetNamespace: multicluster-engine
EOF

With the custom resource file generated we can create it on the cluster.

$ oc create -f multicluster-engine-instance.yaml
multiclusterengine.multicluster.openshift.io/multiclusterengine created

Once the multicluster engine is up and running we should see the following pods under the multicluster-engine namespace.

$ oc get pods -n multicluster-engine
NAME                                                   READY   STATUS    RESTARTS   AGE
cluster-curator-controller-7c66f8b67f-hbhkr            1/1     Running   0          8m30s
cluster-image-set-controller-6879c9fdf7-vhvsp          1/1     Running   0          8m29s
cluster-manager-847d499df7-kb5bx                       1/1     Running   0          8m29s
cluster-manager-847d499df7-w2sdj                       1/1     Running   0          8m29s
cluster-manager-847d499df7-z65kp                       1/1     Running   0          8m29s
cluster-proxy-addon-manager-86484759b9-mhgpg           1/1     Running   0          6m38s
cluster-proxy-addon-user-5fff4bbf8-57r7v               2/2     Running   0          6m38s
cluster-proxy-fbf4447f4-ch8p9                          1/1     Running   0          5m
clusterclaims-controller-dfcf6dcd4-b4p44               2/2     Running   0          8m29s
clusterlifecycle-state-metrics-v2-7c66dbd6f9-pslqq     1/1     Running   0          8m30s
console-mce-console-7dbbc66784-bb292                   1/1     Running   0          8m32s
discovery-operator-7997f54695-6mdct                    1/1     Running   0          8m31s
hcp-cli-download-5c4dfbfd6c-lgdhz                      1/1     Running   0          4m59s
hive-operator-6545b5986b-6pttn                         1/1     Running   0          8m31s
hypershift-addon-manager-64797b9868-h26wg              1/1     Running   0          6m44s
infrastructure-operator-5f9d89c69-k9b82                1/1     Running   0          8m30s
managedcluster-import-controller-v2-75b55d65bd-4h8b4   1/1     Running   0          8m27s
multicluster-engine-operator-6dd66fff8-gphcf           1/1     Running   0          25m
multicluster-engine-operator-6dd66fff8-tq6mx           1/1     Running   0          25m
ocm-controller-84964b45bb-h5hvs                        1/1     Running   0          8m28s
ocm-proxyserver-8cbffb748-mj5hx                        1/1     Running   0          8m26s
ocm-webhook-7d99759b8d-5dv9j                           1/1     Running   0          8m28s
provider-credential-controller-6f54b788b5-zm9bd        2/2     Running   0          8m30s

Next we need to patch the MultiClusterEngine to enable hosted control planes (also known as HyperShift).

$ oc patch mce multiclusterengine --type=merge -p '{"spec":{"overrides":{"components":[{"name":"hypershift","enabled": true}]}}}'
multiclusterengine.multicluster.openshift.io/multiclusterengine patched

We can validate it's enabled with the following.

$ oc get managedclusteraddons -n local-cluster hypershift-addon
NAME               AVAILABLE   DEGRADED   PROGRESSING
hypershift-addon   True        False      False
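If we want to dig a little deeper, the HyperShift operator itself runs in its own namespace once the addon is enabled. As a quick optional check (assuming the default hypershift namespace name):

$ oc get pods -n hypershift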

We also need to generate a provisioning configuration to watch all namespaces.

$ cat <<EOF >provisioning-config.yaml
apiVersion: metal3.io/v1alpha1
kind: Provisioning
metadata:
  name: provisioning-configuration
spec:
  provisioningNetwork: "Disabled"
  watchAllNamespaces: true
EOF

Then create the provisioning configuration on the cluster.

$ oc create -f provisioning-config.yaml
provisioning.metal3.io/provisioning-configuration created

Now that hosted control planes are enabled we need to create an AgentServiceConfig custom resource file which sets the sizes of our database, filesystem and image storage. Since we are using ODF and have the RBD block storage class set as the default, it will automatically create the right-sized PVs.

$ cat <<EOF >agent-service-config.yaml
apiVersion: agent-install.openshift.io/v1beta1
kind: AgentServiceConfig
metadata:
  name: agent
spec:
  databaseStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 15Gi
  filesystemStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
  imageStorage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi
EOF

With the AgentServiceConfig custom resource file generated let's create it on the cluster.

$ oc create -f agent-service-config.yaml
agentserviceconfig.agent-install.openshift.io/agent created

We can validate that the agent service is running by confirming the assisted-image-service and assisted-service pods are running in the multicluster-engine namespace.

$ oc get pods -n multicluster-engine
NAME                                                   READY   STATUS    RESTARTS        AGE
agentinstalladmission-679cd54c5f-qjvfn                 1/1     Running   0               87s
agentinstalladmission-679cd54c5f-slj4s                 1/1     Running   0               87s
assisted-image-service-0                               1/1     Running   0               86s
assisted-service-587c875884-qcfb2                      2/2     Running   0               88s
cluster-curator-controller-7c66f8b67f-hbhkr            1/1     Running   0               24h
cluster-image-set-controller-6879c9fdf7-vhvsp          1/1     Running   0               24h
cluster-manager-847d499df7-kb5bx                       1/1     Running   0               24h
cluster-manager-847d499df7-w2sdj                       1/1     Running   0               24h
cluster-manager-847d499df7-z65kp                       1/1     Running   0               24h
cluster-proxy-addon-manager-86484759b9-mhgpg           1/1     Running   0               24h
cluster-proxy-addon-user-5fff4bbf8-57r7v               2/2     Running   0               24h
cluster-proxy-fbf4447f4-ch8p9                          1/1     Running   0               24h
clusterclaims-controller-dfcf6dcd4-b4p44               2/2     Running   0               24h
clusterlifecycle-state-metrics-v2-7c66dbd6f9-pslqq     1/1     Running   0               24h
console-mce-console-7dbbc66784-bb292                   1/1     Running   0               24h
discovery-operator-7997f54695-6mdct                    1/1     Running   0               24h
hcp-cli-download-5c4dfbfd6c-lgdhz                      1/1     Running   0               24h
hive-operator-6545b5986b-6pttn                         1/1     Running   0               24h
hypershift-addon-manager-64797b9868-h26wg              1/1     Running   0               24h
infrastructure-operator-5f9d89c69-k9b82                1/1     Running   1 (11h ago)     24h
managedcluster-import-controller-v2-75b55d65bd-4h8b4   1/1     Running   1 (11h ago)     24h
multicluster-engine-operator-6dd66fff8-gphcf           1/1     Running   0               24h
multicluster-engine-operator-6dd66fff8-tq6mx           1/1     Running   0               24h
ocm-controller-84964b45bb-h5hvs                        1/1     Running   0               24h
ocm-proxyserver-8cbffb748-mj5hx                        1/1     Running   0               24h
ocm-webhook-7d99759b8d-5dv9j                           1/1     Running   0               24h
provider-credential-controller-6f54b788b5-zm9bd        2/2     Running   0               24h
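Since the AgentServiceConfig requested persistent storage, we can also confirm that ODF satisfied those claims. A simple optional check of the PVCs in the multicluster-engine namespace (exact PVC names vary by release) looks like this:

$ oc get pvc -n multicluster-engine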

Now that the multicluster engine is up and running we need to create a few secrets for our hosted cluster. In this example our hosted cluster will be called hcp-adlink. The first secret is for setting the base domain, pull-secret and ssh-key.

$ cat <<EOF >credentials.yaml
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: hcp-adlink
  namespace: default
  labels:
    cluster.open-cluster-management.io/credentials: ""
    cluster.open-cluster-management.io/type: hostinventory
stringData:
  baseDomain: schmaustech.com
  pullSecret: PULL-SECRET      # Update with pull-secret
  ssh-publickey: SSH-KEY       # Update with ssh-key
EOF

Let's create the secret on the cluster.

$ oc create -f credentials.yaml
secret/hcp-adlink created

Next we need a secret for our infrastructure environment. The following is an example, again with our cluster name of hcp-adlink. Also notice that we are defining the CPU architecture as arm64 since our hosted workers will be arm64.

$ cat <<EOF >infrastructure-environment.yaml
kind: Secret
apiVersion: v1
metadata:
  name: pullsecret-hcp-adlink
  namespace: hcp-adlink
data:
  '.dockerconfigjson': 'PULL-SECRET-REDACTED'
type: 'kubernetes.io/dockerconfigjson'
---
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: hcp-adlink
  namespace: hcp-adlink
  labels:
    agentclusterinstalls.extensions.hive.openshift.io/location: Minneapolis
    networkType: dhcp
spec:
  agentLabels:
    'agentclusterinstalls.extensions.hive.openshift.io/location': Minneapolis
  pullSecretRef:
    name: pullsecret-hcp-adlink
  sshAuthorizedKey: SSH-KEY-REDACTED
  nmStateConfigLabelSelector:
    matchLabels:
      infraenvs.agent-install.openshift.io: hcp-adlink
  cpuArchitecture: arm64
status:
  agentLabelSelector:
    matchLabels:
      'agentclusterinstalls.extensions.hive.openshift.io/location': Minneapolis
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: capi-provider-role
  namespace: hcp-adlink
rules:
- verbs:
  - '*'
  apiGroups:
  - agent-install.openshift.io
  resources:
  - agents
EOF

Once we have generated the custom resource file we can create it on the cluster.

$ oc create -f infrastructure-environment.yaml
secret/pullsecret-hcp-adlink created
infraenv.agent-install.openshift.io/hcp-adlink created
role.rbac.authorization.k8s.io/capi-provider-role created

We can also validate our infrastructure environment with the following command.

$ oc get infraenv -n hcp-adlink
NAME         ISO CREATED AT
hcp-adlink   2026-02-07T15:59:38Z

This completes the initial configuration of multicluster engine.

Installing & Configuring the MetalLB Operator on the Host Cluster

Before we move forward with deploying a hosted control plane cluster we need to install the MetalLB Operator on the cluster that will host the hosted control plane. The reason is that MetalLB will provide a load balancer and VIP address for the API of our hosted cluster. The first step is to install the MetalLB Operator using the following custom resource file.

$ cat <<EOF >metallb-operator.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: metallb-operator
  namespace: metallb-system
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator-sub
  namespace: metallb-system
spec:
  channel: stable
  name: metallb-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

With the custom resource file generated we can create the resources on the cluster.

$ oc create -f metallb-operator.yaml
namespace/metallb-system created
operatorgroup.operators.coreos.com/metallb-operator created
subscription.operators.coreos.com/metallb-operator-sub created

Next we have to generate a MetalLB instance using the following custom resource file.

$ cat <<EOF >metallb-instance.yaml
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
EOF

With the custom resource file generated we can create the resource on the cluster.

$ oc create -f metallb-instance.yaml
metallb.metallb.io/metallb created

Finally we can check and see if all our MetalLB pods are up and running.

$ oc get pods -n metallb-system
NAME                                                   READY   STATUS    RESTARTS   AGE
controller-7f78f89f5f-hj4vb                            2/2     Running   0          28s
metallb-operator-controller-manager-84544fc95f-pfm89   1/1     Running   0          3m28s
metallb-operator-webhook-server-644c4c9758-5t6xm       1/1     Running   0          3m27s
speaker-55xt7                                          2/2     Running   0          28s
speaker-kclzj                                          2/2     Running   0          28s
speaker-mdjjn                                          2/2     Running   0          28s
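If the pods have not appeared yet, it can be useful to confirm the operator install itself finished. One way, assuming the default OLM behavior, is to check the ClusterServiceVersion in the metallb-system namespace and wait for a Succeeded phase:

$ oc get csv -n metallb-system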

If the pods are up and running we have two more steps to take. The first is to generate an IPAddressPool for MetalLB so it knows where to get IP addresses for resources like our hosted control plane when they are requested. We can use the following custom resource file to accomplish that.

$ cat <<EOF >metallb-ipaddresspool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hcp-network
  namespace: metallb-system
spec:
  addresses:
  - 192.168.0.170-192.168.0.172
  autoAssign: true
EOF

With the custom resource file generated we can create the resource on the cluster.

$ oc create -f metallb-ipaddresspool.yaml
ipaddresspool.metallb.io/hcp-network created

We also need to configure the L2 advertisement configuration.

$ cat <<EOF >metallb-l2advertisement.yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: advertise-hcp-network
  namespace: metallb-system
spec:
  ipAddressPools:
  - hcp-network
EOF

With the custom resource file generated we can create it on the cluster.

$ oc create -f metallb-l2advertisement.yaml
l2advertisement.metallb.io/advertise-hcp-network created
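Before moving on we can optionally confirm both MetalLB resources exist on the host cluster; this is just a quick sanity check:

$ oc get ipaddresspool,l2advertisement -n metallb-system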

This completes the steps of configuration for MetalLB on the cluster that will host our hosted control plane cluster.

Deploying a Hosted Control Plane Cluster

In the previous steps we configured the MultiCluster Engine Operator, hosted control planes and the MetalLB Operator, along with creating an infrastructure environment. At this point we are almost ready to deploy a hosted control plane cluster, but first we need to add some nodes to our infrastructure environment. To do this we will download the discovery ISO from our infrastructure environment (swapping the minimal ISO URL for the full ISO).

$ oc get infraenv -n hcp-adlink hcp-adlink -o jsonpath='{.status.isoDownloadURL}' | sed s/minimal-iso/full-iso/g | xargs curl -kLo ~/discovery-hcp-adlink.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 98.3M  100 98.3M    0     0  10.8M      0  0:00:09  0:00:09 --:--:-- 10.8M

$ ls -lh ~/discovery-hcp-adlink.iso
-rw-r--r--. 1 bschmaus bschmaus 99M Feb  7 10:08 /home/bschmaus/discovery-hcp-adlink.iso

We are now going to take that discovery-hcp-adlink.iso and boot it on a few of our Arm64 nodes. After letting the nodes boot and waiting a few minutes we can see they have shown up under our agent pool in the hcp-adlink namespace.

$ oc get agent -n hcp-adlink
NAME                                   CLUSTER   APPROVED   ROLE          STAGE
0589f6a3-fb83-4f60-b4d4-3617c7023ca7             false      auto-assign
5c9e6934-ea82-45b0-ab01-acb0626d86c5             false      auto-assign
8c2920ce-6d30-4276-b51e-04ce22dcfae6             false      auto-assign

Currently the nodes are not marked as approved, so we need to approve them to make them usable in the agent pool.

$ oc get agent -n hcp-adlink -ojson | jq -r '.items[] | select(.spec.approved==false) | .metadata.name'| xargs oc -n hcp-adlink patch -p '{"spec":{"approved":true}}' --type merge agent
agent.agent-install.openshift.io/0589f6a3-fb83-4f60-b4d4-3617c7023ca7 patched
agent.agent-install.openshift.io/5c9e6934-ea82-45b0-ab01-acb0626d86c5 patched
agent.agent-install.openshift.io/8c2920ce-6d30-4276-b51e-04ce22dcfae6 patched

$ oc get agent -n hcp-adlink
NAME                                   CLUSTER   APPROVED   ROLE          STAGE
0589f6a3-fb83-4f60-b4d4-3617c7023ca7             true       auto-assign
5c9e6934-ea82-45b0-ab01-acb0626d86c5             true       auto-assign
8c2920ce-6d30-4276-b51e-04ce22dcfae6             true       auto-assign

At this point we are ready to create our HostedCluster. The following custom resource file contains the settings for deploying a cluster called hcp-adlink, which will be deployed with OpenShift 4.20.13 and request 3 worker nodes from our agent NodePool.

$ cat <<EOF >hosted-cluster-deployment.yaml
---
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: 'hcp-adlink'
  namespace: 'hcp-adlink'
  labels:
    "cluster.open-cluster-management.io/clusterset": 'default'
spec:
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.20.13-multi
  pullSecret:
    name: pullsecret-cluster-hcp-adlink
  sshKey:
    name: sshkey-cluster-hcp-adlink
  networking:
    clusterNetwork:
    - cidr: 10.132.0.0/14
    serviceNetwork:
    - cidr: 172.31.0.0/16
    networkType: OVNKubernetes
  controllerAvailabilityPolicy: SingleReplica
  infrastructureAvailabilityPolicy: SingleReplica
  olmCatalogPlacement: management
  platform:
    type: Agent
    agent:
      agentNamespace: 'hcp-adlink'
  infraID: 'hcp-adlink'
  dns:
    baseDomain: 'schmaustech.com'
  services:
  - service: APIServer
    servicePublishingStrategy:
      type: LoadBalancer
  - service: OAuthServer
    servicePublishingStrategy:
      type: Route
  - service: OIDC
    servicePublishingStrategy:
      type: Route
  - service: Konnectivity
    servicePublishingStrategy:
      type: Route
  - service: Ignition
    servicePublishingStrategy:
      type: Route
---
apiVersion: v1
kind: Secret
metadata:
  name: pullsecret-cluster-hcp-adlink
  namespace: hcp-adlink
data:
  '.dockerconfigjson': <REDACTED PULL SECRET>
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Secret
metadata:
  name: sshkey-cluster-hcp-adlink
  namespace: 'hcp-adlink'
stringData:
  id_rsa.pub: <REDACTED SSH-KEY>
---
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: 'nodepool-hcp-adlink-1'
  namespace: 'hcp-adlink'
spec:
  clusterName: 'hcp-adlink'
  replicas: 3
  management:
    autoRepair: false
    upgradeType: InPlace
  platform:
    type: Agent
    agent:
      agentLabelSelector:
        matchLabels: {}
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.20.13-multi
---
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  annotations:
    import.open-cluster-management.io/hosting-cluster-name: local-cluster
    import.open-cluster-management.io/klusterlet-deploy-mode: Hosted
    open-cluster-management/created-via: hypershift
  labels:
    cloud: BareMetal
    vendor: OpenShift
    name: 'hcp-adlink'
    cluster.open-cluster-management.io/clusterset: 'default'
  name: 'hcp-adlink'
spec:
  hubAcceptsClient: true
---
EOF

Once we have generated the HostedCluster custom resource file we can create it on our cluster. Notice we are generating a few different resources here all related to our hcp-adlink cluster.

$ oc create -f hosted-cluster-deployment.yaml
hostedcluster.hypershift.openshift.io/hcp-adlink created
secret/pullsecret-cluster-hcp-adlink created
secret/sshkey-cluster-hcp-adlink created
nodepool.hypershift.openshift.io/nodepool-hcp-adlink-1 created
managedcluster.cluster.open-cluster-management.io/hcp-adlink created

Once the resources are created we can look at the state of the HostedCluster with the following command. As the creation progresses, the information under MESSAGE changes. If we want to watch this in real time we can pass the -w flag, which monitors the state similar to the watch command.

$ oc get hostedcluster -n hcp-adlink
NAME         VERSION   KUBECONFIG   PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
hcp-adlink                          Partial    False       False         Cluster infrastructure is still provisioning

$ oc get hostedcluster -n hcp-adlink
NAME         VERSION   KUBECONFIG   PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
hcp-adlink                          Partial    False       False         Waiting for hosted control plane kubeconfig to be created

$ oc get hostedcluster -n hcp-adlink
NAME         VERSION   KUBECONFIG                    PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
hcp-adlink             hcp-adlink-admin-kubeconfig   Partial    True        False         The hosted control plane is available
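While the HostedCluster is coming up, the control plane pods themselves (etcd, kube-apiserver and friends) land in a dedicated namespace on the management cluster. With the agent platform this namespace is typically named <hostedcluster-namespace>-<hostedcluster-name>, so in our case something like:

$ oc get pods -n hcp-adlink-hcp-adlink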

Besides monitoring the HostedCluster creation we can also observe the NodePool state as the worker nodes are being scaled up.

$ oc get nodepool nodepool-hcp-adlink-1 -n hcp-adlink
NAME                    CLUSTER      DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
nodepool-hcp-adlink-1   hcp-adlink   3                               False         False        4.20.13   False             False            Scaling up MachineSet to 3 replicas (actual 0)

We can monitor the worker node states from the agents and see their progression: being assigned to the hcp-adlink cluster, rebooting after the initial RHCOS image is laid down, joining the cluster, and finally completing the process. We can watch this either by manually running the command below multiple times or by adding -w to the end of the command to make it behave like the watch command.

$ oc get agent -n hcp-adlink
NAME                                   CLUSTER      APPROVED   ROLE          STAGE
0589f6a3-fb83-4f60-b4d4-3617c7023ca7   hcp-adlink   true       auto-assign
5c9e6934-ea82-45b0-ab01-acb0626d86c5   hcp-adlink   true       auto-assign
8c2920ce-6d30-4276-b51e-04ce22dcfae6   hcp-adlink   true       auto-assign

$ oc get agent -n hcp-adlink
NAME                                   CLUSTER      APPROVED   ROLE     STAGE
0589f6a3-fb83-4f60-b4d4-3617c7023ca7   hcp-adlink   true       worker   Rebooting
5c9e6934-ea82-45b0-ab01-acb0626d86c5   hcp-adlink   true       worker   Rebooting
8c2920ce-6d30-4276-b51e-04ce22dcfae6   hcp-adlink   true       worker   Rebooting

$ oc get agent -n hcp-adlink
NAME                                   CLUSTER      APPROVED   ROLE     STAGE
0589f6a3-fb83-4f60-b4d4-3617c7023ca7   hcp-adlink   true       worker   Joined
5c9e6934-ea82-45b0-ab01-acb0626d86c5   hcp-adlink   true       worker   Joined
8c2920ce-6d30-4276-b51e-04ce22dcfae6   hcp-adlink   true       worker   Joined

$ oc get agent -n hcp-adlink
NAME                                   CLUSTER      APPROVED   ROLE     STAGE
0589f6a3-fb83-4f60-b4d4-3617c7023ca7   hcp-adlink   true       worker   Done
5c9e6934-ea82-45b0-ab01-acb0626d86c5   hcp-adlink   true       worker   Done
8c2920ce-6d30-4276-b51e-04ce22dcfae6   hcp-adlink   true       worker   Done

Once the agent list shows the nodes in the Done stage we can go back to the NodePool command and see that the cluster has scaled to the desired number of nodes.

$ oc get nodepool nodepool-hcp-adlink-1 -n hcp-adlink
NAME                    CLUSTER      DESIRED NODES   CURRENT NODES   AUTOSCALING   AUTOREPAIR   VERSION   UPDATINGVERSION   UPDATINGCONFIG   MESSAGE
nodepool-hcp-adlink-1   hcp-adlink   3               3               False         False        4.20.13   False             False

We can then go back and look at the state of the HostedCluster. Here we can see the progress is still partial even though the worker nodes have joined and the control plane is up.

$ oc get hostedcluster -n hcp-adlink
NAME         VERSION   KUBECONFIG                    PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
hcp-adlink             hcp-adlink-admin-kubeconfig   Partial    True        False         The hosted control plane is available

To explore why the HostedCluster is still in a partial state we should extract the kubeconfig from our hosted cluster. While we are at it, let's also get the kubeadmin password.

$ oc get secret -n hcp-adlink hcp-adlink-admin-kubeconfig -ojsonpath='{.data.kubeconfig}'| base64 -d > ~/kubeconfig-hcp-adlink

$ oc get secret -n hcp-adlink hcp-adlink-kubeadmin-password -ojsonpath='{.data.password}'| base64 -d
h9DyP-tcHpQ-CxBDP-dqVt6

Now that we have the kubeconfig from our HostedCluster, let's export it to the KUBECONFIG variable and take a look at the cluster operators output. We can see that we still have a few issues with our cluster, specifically the ingress operator and the console operator.
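The export itself is just the standard environment variable, for example:

$ export KUBECONFIG=~/kubeconfig-hcp-adlink
$ oc whoami --show-server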

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.20.13   False       False         True       88m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hcp-adlink.schmaustech.com): Get "https://console-openshift-console.apps.hcp-adlink.schmaustech.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
csi-snapshot-controller                    4.20.13   True        False         False      112m
dns                                        4.20.13   True        False         False      87m
image-registry                             4.20.13   True        False         False      88m
ingress                                    4.20.13   True        False         True       111m    The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing. Last 2 error messages:...
insights                                   4.20.13   True        False         False      89m
kube-apiserver                             4.20.13   True        False         False      112m
kube-controller-manager                    4.20.13   True        False         False      112m
kube-scheduler                             4.20.13   True        False         False      112m
kube-storage-version-migrator              4.20.13   True        False         False      89m
monitoring                                 4.20.13   True        False         False      80m
network                                    4.20.13   True        False         False      90m
node-tuning                                4.20.13   True        False         False      96m
openshift-apiserver                        4.20.13   True        False         False      112m
openshift-controller-manager               4.20.13   True        False         False      112m
openshift-samples                          4.20.13   True        False         False      87m
operator-lifecycle-manager                 4.20.13   True        False         False      112m
operator-lifecycle-manager-catalog         4.20.13   True        False         False      112m
operator-lifecycle-manager-packageserver   4.20.13   True        False         False      112m
service-ca                                 4.20.13   True        False         False      89m
storage                                    4.20.13   True        False         False      112m

The reason for the issues with the operators above is that we currently do not have anything answering for our ingress virtual IP address. To enable that we need the MetalLB Operator installed on our HostedCluster.

Installing & Configuring the MetalLB Operator on the Hosted Cluster

Since we need the MetalLB Operator on our HostedCluster, let's start by creating the same custom resources we used on the host cluster.

$ cat <<EOF >metallb-operator.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: metallb-operator
  namespace: metallb-system
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator-sub
  namespace: metallb-system
spec:
  channel: stable
  name: metallb-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

Once we have generated the file we can create the operator on our cluster.

$ oc create -f metallb-operator.yaml
namespace/metallb-system created
operatorgroup.operators.coreos.com/metallb-operator created
subscription.operators.coreos.com/metallb-operator-sub created

We also need to create the MetalLB instance as well.

$ cat <<EOF >metallb-instance.yaml
apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
EOF

Once we have generated the file we can create the instance on our cluster.

$ oc create -f metallb-instance.yaml
metallb.metallb.io/metallb created

Let's do a quick spot check of the pods to ensure everything looks right.

$ oc get pods -n metallb-system
NAME                                                  READY   STATUS    RESTARTS   AGE
controller-7f78f89f5f-94m87                           2/2     Running   0          29s
metallb-operator-controller-manager-76f58797d-69jdm   1/1     Running   0          92s
metallb-operator-webhook-server-6d96484469-5z87l      1/1     Running   0          90s
speaker-72wfx                                         1/2     Running   0          29s
speaker-8h9vh                                         2/2     Running   0          29s
speaker-t455x                                         2/2     Running   0          29s

Just like on the host cluster, we also need to create an IPAddressPool for MetalLB.

$ cat <<EOF >metallb-ipaddresspool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hcp-network
  namespace: metallb-system
spec:
  addresses:
  - 192.168.0.173-192.168.0.175
  autoAssign: true
EOF

Once we have the file let's create it on the cluster.

$ oc create -f metallb-ipaddresspool.yaml
ipaddresspool.metallb.io/hcp-network created

We also need to configure the L2 advertisement.

$ cat <<EOF >metallb-l2advertisement.yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: advertise-hcp-network
  namespace: metallb-system
spec:
  ipAddressPools:
  - hcp-network
EOF

Once we have the file we can create it on the cluster.

$ oc create -f metallb-l2advertisement.yaml
l2advertisement.metallb.io/advertise-hcp-network created

We can validate the IPAddressPool is available with the following.

$ oc get ipaddresspool -n metallb-system -o yaml
apiVersion: v1
items:
- apiVersion: metallb.io/v1beta1
  kind: IPAddressPool
  metadata:
    creationTimestamp: "2026-02-07T20:03:50Z"
    generation: 1
    name: hcp-network
    namespace: metallb-system
    resourceVersion: "21540"
    uid: 6bf55fee-d634-4789-a19e-8ce505ba8efb
  spec:
    addresses:
    - 192.168.0.173-192.168.0.175
    autoAssign: true
    avoidBuggyIPs: false
  status:
    assignedIPv4: 0
    assignedIPv6: 0
    availableIPv4: 3
    availableIPv6: 0
kind: List
metadata:
  resourceVersion: ""

Now comes the most important part, which is the Service. We need to create an ingress Service for our HostedCluster which will request an IP address from MetalLB.

$ cat <<EOF >hcp-adlink-metallb-ingress.yaml
kind: Service
apiVersion: v1
metadata:
  annotations:
    metallb.io/address-pool: hcp-network
  name: metallb-ingress
  namespace: openshift-ingress
spec:
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
  selector:
    ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  type: LoadBalancer
EOF

Once we have generated the file we can create it on the cluster.

$ oc create -f hcp-adlink-metallb-ingress.yaml
service/metallb-ingress created

We can validate that the Service was created and see which external IP address was assigned; in this case 192.168.0.173 was allocated. We should ensure we update our DNS records so that *.apps.hcp-adlink.schmaustech.com resolves to that IP address.

$ oc get service -n openshift-ingress
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
metallb-ingress           LoadBalancer   172.31.54.206   192.168.0.173   80:31492/TCP,443:30929/TCP   9s
router-internal-default   ClusterIP      172.31.142.3    <none>          80/TCP,443/TCP,1936/TCP      119m
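What that wildcard DNS record looks like depends on the DNS server in use. As a sketch, in a BIND zone file for schmaustech.com it could be a single wildcard A record like this:

*.apps.hcp-adlink    IN    A    192.168.0.173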

Now if we go back and look at the cluster operators output again we can see the ingress and console operators have recovered since we added MetalLB, our Service and our DNS records.

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.20.13   True        False         False      83s
csi-snapshot-controller                    4.20.13   True        False         False      121m
dns                                        4.20.13   True        False         False      96m
image-registry                             4.20.13   True        False         False      97m
ingress                                    4.20.13   True        False         False      120m
insights                                   4.20.13   True        False         False      98m
kube-apiserver                             4.20.13   True        False         False      121m
kube-controller-manager                    4.20.13   True        False         False      121m
kube-scheduler                             4.20.13   True        False         False      121m
kube-storage-version-migrator              4.20.13   True        False         False      97m
monitoring                                 4.20.13   True        False         False      89m
network                                    4.20.13   True        False         False      98m
node-tuning                                4.20.13   True        False         False      105m
openshift-apiserver                        4.20.13   True        False         False      121m
openshift-controller-manager               4.20.13   True        False         False      121m
openshift-samples                          4.20.13   True        False         False      96m
operator-lifecycle-manager                 4.20.13   True        False         False      121m
operator-lifecycle-manager-catalog         4.20.13   True        False         False      121m
operator-lifecycle-manager-packageserver   4.20.13   True        False         False      121m
service-ca                                 4.20.13   True        False         False      98m
storage                                    4.20.13   True        False         False      121m

We can also test via the CLI to confirm that DNS resolution and the service response are working.

$ telnet console-openshift-console.apps.hcp-adlink.schmaustech.com 443
Trying 192.168.0.173...
Connected to console-openshift-console.apps.hcp-adlink.schmaustech.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

Finally if we go back to the HostedCluster output we can see that indeed our hcp-adlink cluster installation has completed.

$ oc get hostedcluster -n hcp-adlink
NAME         VERSION   KUBECONFIG                    PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
hcp-adlink   4.20.13   hcp-adlink-admin-kubeconfig   Completed   True        False         The hosted control plane is available

Hopefully this provided a detailed explanation of how to prepare a host cluster for hosted control planes and then how to deploy a hosted control plane cluster. The goal here was to tell the story in an explicit way so people can be successful when using the software.

Thursday, January 15, 2026

OpenShift On-Cluster Image Mode & Lustre Client


Image mode for OpenShift allows you to easily extend the functionality of your base RHCOS image by layering additional images onto the base image. This layering does not modify the base RHCOS image. Instead, it creates a custom layered image that includes all RHCOS functionality and adds additional functionality to specific nodes in the cluster.

There are two methods for deploying a custom layered image onto your nodes:

  • On-cluster image mode where we create a MachineOSConfig object that includes the Containerfile and other parameters. The build is performed on the cluster and the resulting custom layered image is automatically pushed to your repository and applied to the machine config pool that you specified in the MachineOSConfig object. The entire process is performed completely within your cluster.

  • Out-of-cluster image mode where we create a Containerfile that references an OpenShift Container Platform image and the RPM that we want to apply, build the layered image in your own environment, and push the image to a repository. Then, in the cluster, we create a MachineConfig object for the targeted node pool that points to the new image. The Machine Config Operator overrides the base RHCOS image, as specified by the osImageURL value in the associated machine config, and boots the new image.
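For context, the out-of-cluster variant ends with a MachineConfig that simply points the pool at the pre-built image via osImageURL. A minimal sketch, assuming the layered image was already built and pushed to a reachable registry (the image reference and name below are placeholders):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: os-layer-custom
spec:
  osImageURL: quay.io/example/custom-rhcos:latest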

While I have written about out-of-cluster image mode before, this example will focus on on-cluster image mode and specifically cover a case where I need to incorporate the Lustre client kernel drivers and packages into my OpenShift environment.

To get started we deployed a Single Node OpenShift environment running 4.20.8. Note the process will not be any different when using a multi-node cluster; there will just be more nodes to apply the updated image to.

Next we need to generate our secrets that can be used in the builder process. First let's set some environment variables for our internal registry, the user, the namespace and the token creation.

$ export REGISTRY=image-registry.openshift-image-registry.svc:5000
$ export REGISTRY_USER=builder
$ export REGISTRY_NAMESPACE=openshift-machine-config-operator
$ export TOKEN=$(oc create token $REGISTRY_USER -n $REGISTRY_NAMESPACE --duration=$((900*24))h)

Next let's create the push-secret using the variables we set in the openshift-machine-config-operator namespace.

$ oc create secret docker-registry push-secret -n openshift-machine-config-operator --docker-server=$REGISTRY --docker-username=$REGISTRY_USER --docker-password=$TOKEN
secret/push-secret created

Now we need to extract the push secret and the cluster's global pull secret.

$ oc extract secret/push-secret -n openshift-machine-config-operator --to=- > push-secret.json
# .dockerconfigjson

$ oc extract secret/pull-secret -n openshift-config --to=- > pull-secret.json
# .dockerconfigjson

We will now merge the push secret and the global pull secret into one secret.

$ jq -s '.[0] * .[1]' pull-secret.json push-secret.json > pull-and-push-secret.json
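Before creating the merged secret we can optionally sanity check that the merge picked up both sets of registry credentials; a quick jq query over the auths keys (purely a verification step, not required) would be:

$ jq '.auths | keys' pull-and-push-secret.json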

This new merged secret also needs to be created in the openshift-machine-config-operator namespace.

$ oc create secret generic pull-and-push-secret -n openshift-machine-config-operator --from-file=.dockerconfigjson=pull-and-push-secret.json --type=kubernetes.io/dockerconfigjson
secret/pull-and-push-secret created

$ oc get secrets -n openshift-machine-config-operator |grep push
pull-and-push-secret   kubernetes.io/dockerconfigjson   1   10s
push-secret            kubernetes.io/dockerconfigjson   1   114s

Now we need to create a MachineOSConfig custom resource file that defines the additional components we need to add to RHCOS. The following example shows that we will be doing the following:

  • This MachineOSConfig will be built and applied to nodes in the master machine config pool. If we had workers in the worker pool we could apply it there instead, and we can also apply it to custom pools.
  • This MachineOSConfig will install EPEL, libyaml-devel and the four Lustre-related client packages. Dnf will pull in any additional dependencies.
  • This MachineOSConfig has a renderedImagePushSpec pushing to, and pulling from, the internal registry of the OCP cluster. This could point to whichever registry you want to store the image in and then pull the image from.
  • We also have our secrets that we created before defined in this file.
$ cat <<EOF > on-cluster-rhcos-layer-mc.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: master
spec:
  machineConfigPool:
    name: master
  containerFile:
  - containerfileArch: NoArch
    content: |-
      FROM configs AS final
      RUN dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
          dnf install -y https://mirror.stream.centos.org/9-stream/CRB/x86_64/os/Packages/libyaml-devel-0.2.5-7.el9.x86_64.rpm && \
          dnf install -y https://downloads.whamcloud.com/public/lustre/lustre-2.15.7/el9.6/client/RPMS/x86_64/lustre-iokit-2.15.7-1.el9.x86_64.rpm \
          https://downloads.whamcloud.com/public/lustre/lustre-2.15.7/el9.6/client/RPMS/x86_64/lustre-client-2.15.7-1.el9.x86_64.rpm \
          https://downloads.whamcloud.com/public/lustre/lustre-2.15.7/el9.6/client/RPMS/x86_64/lustre-client-dkms-2.15.7-1.el9.noarch.rpm \
          https://downloads.whamcloud.com/public/lustre/lustre-2.15.7/el9.6/client/RPMS/x86_64/kmod-lustre-client-2.15.7-1.el9.x86_64.rpm && \
          dnf clean all && \
          ostree container commit
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: pull-and-push-secret
  renderedImagePushSecret:
    name: push-secret
  renderedImagePushSpec: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image:latest
EOF

Once the MachineOSConfig custom resource file is generated we can create it on our cluster.

$ oc create -f on-cluster-rhcos-layer-mc.yaml
machineosconfig.machineconfiguration.openshift.io/master created

Once the MachineOSConfig has been created we can monitor it via the following command.

$ oc get machineosbuild
NAME                                      PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
master-afc1942c842a324aa66271cbf5fcb0d8   False      True       False       False         False    16s

We can also observe that a build pod was created.

$ oc get pods -n openshift-machine-config-operator
NAME                                                  READY   STATUS      RESTARTS         AGE
build-master-afc1942c842a324aa66271cbf5fcb0d8-fprgj   0/1     Init:0/1    0                29s
kube-rbac-proxy-crio-sno2.schmaustech.com             1/1     Running     9                46h
machine-config-controller-78b85fcd9c-h9gmn            2/2     Running     10               45h
machine-config-daemon-tlt8g                           2/2     Running     18 (6h37m ago)   45h
machine-config-nodes-crd-cleanup-29470933-l8jz2       0/1     Completed   0                46h
machine-config-nodes-crd-cleanup-29470952-xmlvn       0/1     Completed   0                45h
machine-config-operator-658ff78994-bpzpj              2/2     Running     10               46h
machine-config-server-mpwff                           1/1     Running     5                45h
machine-os-builder-65d7b4b97-m97lw                    1/1     Running     0                41s

If we want to see more details on what is happening in the build pod we can tail the logs of the image-build container inside the pod. I am only showing the command to obtain the logs here because the output is quite long and verbose. Further, the build process takes a while to run.

$ oc logs -f -n openshift-machine-config-operator build-master-afc1942c842a324aa66271cbf5fcb0d8-fprgj -c image-build

When the build finishes the image gets pushed to the registry defined in the MachineOSConfig. The logs will have a reference like this at the end.

+ buildah push --storage-driver vfs --authfile=/tmp/final-image-push-creds/config.json --digestfile=/tmp/done/digestfile --cert-dir /var/run/secrets/kubernetes.io/serviceaccount image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image:master-afc1942c842a324aa66271cbf5fcb0d8
Getting image source signatures
Copying blob sha256:3a1265f127cd4df9ca7e05bf29ad06af47b49cff0215defce94c32eceee772bc
Copying blob sha256:d87a18a2396ee3eb656b6237ac1fa64072dd750dde5aef660aff53e52c156f56
(...)
Copying blob sha256:1d82edb13736f9bbad861d8f95cae0abfe5d572225f9d33d326e602ecc5db5fb
Copying blob sha256:eb199ffe5f75bd36c537582e9cf5fa5638d55b8145f7dcd3cfc6b28699b2568d
Copying config sha256:3d835eb02f08fe48d26d9b97ebcf0e190c401df2619d45cce1a94b0845d7f4e2
Writing manifest to image destination

At that point if we look at the machineosbuild output we will see the build has moved to succeeded.

$ oc get machineosbuild
NAME                                      PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
master-afc1942c842a324aa66271cbf5fcb0d8   False      False      True        False         False    24m

And we will see that the machine config pool is now in an updating state. At this point the image that was built is being applied to the node and it will reboot.

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-2d9e26ad2557d1859aadc76634a4f1a5   False     True       False      1              0                   0                     0                      46h
worker   rendered-worker-36ba71179b413c7b7abc3e477e7367d5   True      False      False      0              0                   0                     0                      46h

Once the node (or nodes, in a multi-node cluster) reboots, we should be able to open a debug pod and validate that our kernel modules and client packages were installed properly. First let's open a debug prompt.

$ oc debug node/sno2.schmaustech.com
Starting pod/sno2schmaustechcom-debug-xcvdh ...
To use host binaries, run `chroot /host`. Instead, if you need to access host namespaces, run `nsenter -a -t 1`.
Pod IP: 192.168.0.204
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1#
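From the debug shell we can also confirm the node actually booted the layered image rather than the stock RHCOS image. Checking the booted deployment with rpm-ostree is one way to do that (the booted entry should reference the os-image spec from our MachineOSConfig):

sh-5.1# rpm-ostree status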

Next let's confirm the Lustre rpm packages are present.

sh-5.1# rpm -qa|grep lustre
kmod-lustre-client-2.15.7-1.el9.x86_64
lustre-client-dkms-2.15.7-1.el9.noarch
lustre-client-2.15.7-1.el9.x86_64
lustre-iokit-2.15.7-1.el9.x86_64

The packages are there; now let's see if the Lustre kernel module is loaded. It might not be, because my understanding is that it requires a process to request it first. If it's not there we can manually load it.

sh-5.1# lsmod|grep lustre
sh-5.1# modprobe lustre
sh-5.1# lsmod|grep lustre
lustre    1155072  0
lmv        233472  1 lustre
mdc        315392  1 lustre
lov        385024  2 mdc,lustre
ptlrpc    1662976  7 fld,osc,fid,lov,mdc,lmv,lustre
obdclass  3571712  8 fld,osc,fid,ptlrpc,lov,mdc,lmv,lustre
lnet       884736  6 osc,obdclass,ptlrpc,ksocklnd,lmv,lustre
libcfs     262144 11 fld,lnet,osc,fid,obdclass,ptlrpc,ksocklnd,lov,mdc,lmv,lustre
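Because a manual modprobe does not persist across reboots, one option, shown here only as a sketch, is to have the Machine Config Operator drop a modules-load.d entry so lustre is loaded at boot. The base64 payload below is simply the string "lustre" plus a newline, and the MachineConfig name is arbitrary:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-master-load-lustre
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/modules-load.d/lustre.conf
        mode: 420
        overwrite: true
        contents:
          source: data:text/plain;charset=utf-8;base64,bHVzdHJlCg==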

We can see our image has been updated and contains the necessary packages. Hopefully this provides an example of how to add third-party drivers and packages to an OpenShift environment. More details can be found in the Image Mode for OpenShift documentation.