Deployment of Kubeflow on EKS with Azure AD Authentication (OIDC) and Programmatic Access - Part 1

Reading Time: 9 minutes

Scaling machine learning workflows in the AI/ML ecosystem demands seamless orchestration, reproducibility, and efficiency to drive business impact. Enter Kubeflow—an open-source ML platform built to streamline these challenges by integrating effortlessly with Kubernetes. In this guide, we’ll walk you through deploying Kubeflow v1.9.1 on an Amazon EKS cluster, leveraging OIDC authentication via Azure AD for secure access. Additionally, we will demonstrate how to access and deploy Kubeflow Pipelines using the Pipelines SDK (Python) on the Kubeflow dashboard through programmatic access in an automated CI/CD workflow.

 

What is Kubeflow?

Kubeflow is a Kubernetes-native platform designed to automate and scale machine learning (ML) workflows. Originally developed by Google, it simplifies training, deploying, and managing ML models while leveraging Kubernetes’ inherent flexibility and resource management.

 

Key features of Kubeflow:

 

  • Scalability & automation: Supports distributed training with TensorFlow, PyTorch, and GPU/TPU optimizations.
  • Reproducibility: Containerized workflows ensure consistency across environments.
  • MLOps-ready: Automates hyperparameter tuning (via Katib), model serving (KServe, formerly KFServing), and monitoring.
  • Secure multi-user access: Integrated OIDC authentication with Dex allows enterprise-grade security.

With Kubeflow, organizations can efficiently build, train, deploy, and scale ML models while following best practices for MLOps.

Prerequisites for deploying Kubeflow on EKS

Before setting up Kubeflow, it’s essential to have the right tools and configurations in place to ensure a smooth deployment. Make sure you have the following prerequisites:

 

  • AWS account with appropriate IAM permissions
  • AWS CLI installed and configured
  • eksctl (for managing EKS clusters)
  • Python 3.9+
  • Kustomize 5.2.1+ (for customizing Kubernetes manifests)
  • Kubeflow Pipelines SDK v1.9.1
  • kubectl (compatible with your Kubernetes version)
  • Domain name & TLS certificate
  • Azure AD (for authentication)

 

Step-by-step Kubeflow v1.9.1 deployment on EKS

1. Setting up the EKS cluster

To run Kubeflow efficiently, our EKS cluster needs to be configured with the following specs:

Requirements

  • Memory: 32 GB RAM
  • Compute: 16 vCPUs
  • Kubernetes Version: 1.30
  • Kustomize Version: 5.2.1+
  • kubectl: Compatible with Kubernetes 1.30

Cluster access & authentication

  • Cluster administration access → Allow IAM principals
  • Cluster auth mode → EKS API and ConfigMap
  • Cluster endpoint → Public & Private

Networking configuration

  • 3 Public subnets (spread across different Availability Zones)
  • 3 Private subnets (for isolated workloads)

Compute nodes configuration

To meet the 32 GB RAM and 16 vCPU requirement, we use:

  • Instance type: m5.xlarge
  • Minimum nodes: 3
  • Desired nodes: 3
  • Maximum nodes: 5

2. Creating the EKS cluster

You can set up the EKS cluster with the above configuration using the following eksctl command:

  eksctl create cluster \
  --name kubeflow-cluster-bd-poc \
  --region us-east-1 \
  --nodegroup-name kubeflow-workers \
  --node-type m5.xlarge \
  --nodes 3 \
  --nodes-min 3 \
  --nodes-max 5 \
  --managed
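
Once eksctl finishes, a quick sanity check confirms that kubectl is talking to the new cluster and that all worker nodes are ready (the cluster name and region match the command above):

# Point kubectl at the new cluster (eksctl usually configures this for you already)
aws eks update-kubeconfig --name kubeflow-cluster-bd-poc --region us-east-1

# All nodes should report STATUS "Ready"
kubectl get nodes -o wide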

After cluster creation, we need to perform a few steps in the cluster:

  • Essential add-ons (see the sketch after this list):
    • Amazon EKS Pod Identity Agent
    • Amazon EBS CSI Driver
  • IAM role permissions
    • IAM role for the node group:
        AmazonEKS_EBS_CSI_DriverRole
        AmazonEKSClusterPolicy
        AWSLoadBalancerControllerIAMPolicy
        AmazonEKSWorkerNodePolicy
        AmazonEKSServicePolicy
    • IAM role for the node instances:
        AmazonEKS_CNI_Policy
        AmazonEKSWorkerNodePolicy
        AmazonSSMManagedInstanceCore
        AmazonEC2ContainerRegistryReadOnly
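
As a rough sketch, both add-ons can also be installed with eksctl once the cluster is up; the role ARN below is illustrative, so substitute your own account ID and the IAM role you created for the EBS CSI driver:

# Install the EKS Pod Identity Agent add-on
eksctl create addon --name eks-pod-identity-agent --cluster kubeflow-cluster-bd-poc --region us-east-1

# Install the EBS CSI driver add-on, attaching the IAM role that carries the EBS CSI driver permissions
eksctl create addon --name aws-ebs-csi-driver --cluster kubeflow-cluster-bd-poc --region us-east-1 \
  --service-account-role-arn arn:aws:iam::<ACCOUNT_ID>:role/AmazonEKS_EBS_CSI_DriverRole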

What’s next?

Now that we have a Kubernetes cluster ready, the next step is to install and configure Kubeflow. In the next section, we’ll clone and customize Kubeflow manifests to align with our deployment needs.

Deploying Kubeflow on EKS with Azure AD Authentication

Securely authenticating Kubeflow with OIDC

When deploying Kubeflow on EKS, one of the most critical steps is implementing secure authentication. In this guide, we’ll configure OIDC authentication with Azure AD, ensuring that only authorized users can access the Kubeflow dashboard.

This involves three key steps:

 

  1. Configuring Azure AD for app registration and creating secrets.
  2. Updating oauth2-proxy.cfg to integrate Azure AD authentication.
  3. Modifying Istio’s RequestAuthentication resource to trust tokens from Azure AD.

First, we need to clone the Kubeflow manifests repository at version 1.9.1. Make sure you clone the repo at tag v1.9.1.

git clone -b v1.9.1 git@github.com:kubeflow/manifests.git

Configuring OIDC authentication

To enable authentication via Azure AD, we first need to register our application and create a client secret for it.

Step 1: Register an application in Azure AD

First, we need to register an application in Azure AD that will act as an authentication provider for Kubeflow.

 

  1. Go to the Azure Portal → Navigate to Microsoft Entra ID (formerly Azure Active Directory).
  2. Click on App Registrations.
  3. Click New Registration and provide:
    • App Name: Kubeflow Authentication
    • Supported Account Types: Choose Single Tenant (if used within your org) or Multitenant for broader access.
    • Redirect URI: the oauth2-proxy callback on your Kubeflow domain, e.g. https://kubeflow.your-domain.io/oauth2/callback (this must match the redirect_url we set later in oauth2_proxy.cfg).
  4. Click Register.

Once registered, you’ll get:

  • Application (client) ID
  • Directory (tenant) ID
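
If you prefer scripting over the portal, the same registration can be done with the Azure CLI. This is a minimal sketch (the display name simply mirrors the one used above); the redirect URI and API permissions can still be configured in the portal afterwards:

# Register the application; the JSON output contains the appId (Application/client ID)
az ad app create --display-name "Kubeflow Authentication"

# Print the Directory (tenant) ID of the signed-in account
az account show --query tenantId -o tsv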

Step 2: Configure API permissions

Next, we need to grant the right permissions so that Kubeflow can authenticate users and retrieve profile details.

 

  1. In your App Registration, go to Manage → API Permissions.
  2. Click Add a Permission → Choose Microsoft Graph.
  3. Select Delegated Permissions and add the following:
    • email – to fetch the user’s email address
    • Group.Read.All – to retrieve user groups; this permission is only required if you need to authenticate groups
    • GroupMember.Read.All – allows the application to read the group memberships of users
    • User.Read – allows the application to read the profile information of signed-in users
    • openid – to authenticate users
    • profile – to access user profile information
  4. Click Grant Admin Consent (requires admin approval).

Now, Kubeflow can authenticate users and access profile details!

Step 3: Create client secrets for authentication

To allow Kubeflow to authenticate users, we need a client secret.

 

  1. Go to App Registration → Certificates & Secrets.
  2. Click New Client Secret.
  3. Add a description (e.g., Kubeflow Secret) and set an expiry duration.
  4. Click Add, then copy the secret value (it will not be visible later).

Important: Store the Client Secret securely! You’ll need it when configuring Kubeflow.
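
The client secret can also be generated from the Azure CLI; as a sketch (the app ID placeholder is illustrative), keep in mind that here too the secret value is only displayed once:

# Create (or rotate) a client secret for the app and print its value
az ad app credential reset --id <APPLICATION_CLIENT_ID>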

Step 4: Configure user access in enterprise applications

When you register an Azure Entra ID App, it automatically creates an Enterprise Application. We need to assign users who can log in to Kubeflow.

 

  1. Go to Microsoft Entra ID → Enterprise Applications.
  2. Find the app you just registered (Kubeflow Authentication).
  3. Click Users and Groups.
  4. Click Add User → Select users who need access.

Now, only assigned users can log in to Kubeflow!

Now, we need to update both OAuth2 Proxy settings and Istio’s RequestAuthentication policy.

Update oauth2-proxy.cfg

First, modify the OAuth2 Proxy configuration to use Microsoft Entra ID (Azure AD) instead of Dex.

We have to configure this file: common/oauth2-proxy/base/oauth2_proxy.cfg

provider = "oidc"
oidc_issuer_url = "https://login.microsoftonline.com/$TENANT_ID/v2.0"
## we are changing this oidc_issuer_url to point to azure Entra ID. TENANT_ID we will get while registering the app on Entra ID. 

scope = "profile email  openid"
## We have removed groups from scope because Azure Entra ID is not returning groups in our case. Its only returning user. 

email_domains = [ "*" ]
# serve a static HTTP 200 upstream for authentication success
# we are using oauth2-proxy as an ExtAuthz to "check" each request, not pass it on
upstreams = [ "static://200" ]

# skip authentication for these paths
skip_auth_routes = [
  "^/dex/",
]
# requests to paths matching these regex patterns will receive a 401 Unauthorized response
# when not authenticated, instead of being redirected to the login page with a 302,
# this prevents background requests being redirected to the login page,
# and the accumulation of CSRF cookies
api_routes = [
  # Generic
  # NOTE: included because most background requests contain these paths
  "/api/",
  "/apis/",

 # Kubeflow Pipelines
  # NOTE: included because KFP UI makes MANY background requests to these paths but because they are
  #       not `application/json` requests, oauth2-proxy will redirect them to the login page
  "^/ml_metadata",
]

# OIDC Discovery has to be skipped and the login URL has to be provided directly
# in order to enable relative auth redirects. Using OIDC Discovery would set
# the redirect location to https://kubeflow.your-domain.io in the example
# installation. This address is usually not reachable from the web browser.
# If your setup exposes dex at a URL other than the in-cluster service,
# this is optional.
# skip_oidc_discovery = true
# login_url = "/dex/auth"
# redeem_url = "https://kubeflow.your-domain.io/dex/token"
# oidc_jwks_url = "https://kubeflow.your-domain.io/dex/keys"

set_authorization_header = true
set_xauthrequest = true
cookie_name = "oauth2_proxy_kubeflow"
cookie_expire = "24h"
cookie_refresh = "1h" # This improves the user experience a lot

redirect_url = "https://kubeflow.your-domain.io/oauth2/callback"
## this redirect url we are setting directly with our DNS. we are not using relative_redirect_url. Thats why it is set to false in below variable. 
relative_redirect_url = false

Key changes:

  • OIDC issuer: updated to Azure AD.
  • Cookie refresh: set to 1h (with a 24h cookie expiry) to avoid frequent logouts.
  • Redirect URL: points to our custom domain kubeflow.your-domain.io.
  • relative_redirect_url: set to false.

Modify Istio’s request authentication

The default Istio authentication policy uses Dex as the identity provider. We need to update it to accept Azure AD tokens.

We need to configure this file to update RequestAuthentication resource: common/oauth2-proxy/components/istio-external-auth/requestauthentication.dex-jwt.yaml

apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: dex-jwt
  namespace: istio-system
spec:
  # we only apply to the ingress-gateway because:
  #  - there is no need to verify the same tokens at each sidecar
  #  - having no selector will apply to the RequestAuthentication to ALL
  #    Pods in the mesh, even ones which are not part of Kubeflow
  #  - some Kubeflow services accept direct connections with Kubernetes JWTs,
  #    and we don't want to require that users configure Istio to verify Kubernetes JWTs
  #    as there is no method to do this which works on all distributions.
  selector:
    matchLabels:
      app: istio-ingressgateway

  jwtRules:
  - issuer: https://login.microsoftonline.com/TENANT_ID/v2.0
## Change this issuer to point to Azure Entra ID. The token issuer in our case should be Azure Entra ID, not dex.

    # `forwardOriginalToken` is not strictly required to be true.
    # there are pros and cons to each value:
    #  - true: the original token is forwarded to the destination service
    #          which raises the risk of the token leaking
    #  - false: the original token is stripped from the request
    #           which will prevent the destination service from
    #           verifying the token (possibly with its own RequestAuthentication)
    forwardOriginalToken: true

    # This will unpack the JWTs issued by dex into the expected headers.
    # It is applied to BOTH the m2m tokens from outside the cluster (which skip
    # oauth2-proxy because they already have a dex JWT), AND user requests which were
    # authenticated by oauth2-proxy (which injected a dex JWT).
    outputClaimToHeaders:
    - header: kubeflow-userid
      claim: email
    # - header: kubeflow-groups
    #   claim: groups
    ## We have commented out the groups header because we only send the user identity in the header for authentication.

    # We explicitly set `fromHeaders` to ensure that the JWT is only extracted from the `Authorization` header.
    # This is because we exclude requests that have an `Authorization` header from oauth2-proxy.
    fromHeaders:
    - name: Authorization
      prefix: "Bearer "

Key changes:

  • Issuer: Now points to Azure AD.
  • Groups claim removed: Since we are not authenticating groups.

Apply the updated configuration:

kubectl apply -f common/oauth2-proxy/components/istio-external-auth/requestauthentication.dex-jwt.yaml
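
To confirm the change has landed, you can read the resource back and check that the issuer now points at the Microsoft login endpoint (dex-jwt is the resource name used by the default manifests):

kubectl get requestauthentication dex-jwt -n istio-system -o yaml | grep issuer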

Final deployment

After modifying these two files, deploy Kubeflow:

Note: Make sure there is exactly one default StorageClass in the cluster, i.e., the StorageClass that is assigned automatically whenever a workload does not specify one.
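
If no StorageClass is currently marked as default, one way to set it (a sketch assuming the gp2 StorageClass that EKS creates out of the box) is to annotate it:

# Check which StorageClass, if any, shows "(default)"
kubectl get storageclass

# Mark gp2 as the default StorageClass
kubectl patch storageclass gp2 -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'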

while ! kustomize build example | kubectl apply --server-side --force-conflicts -f -; do echo "Retrying to apply resources"; sleep 20; done

It will take around 20-30 minutes for Kubeflow to be deployed properly. The while loop stops automatically once all Kubeflow components have been applied successfully.

After this command, check whether all the pods are running. If any pod is not running, check its logs. A few pods might crash if your Kubernetes cluster does not have enough resources; stick to the cluster configuration mentioned earlier to deploy Kubeflow smoothly.
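
A quick way to spot anything unhealthy across all namespaces is to filter out pods that are already healthy (pod and namespace names below are placeholders):

# List any pod that is not Running or Completed
kubectl get pods -A | grep -vE 'Running|Completed'

# Dig into a problematic pod
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>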

Once installed, you can access Kubeflow using port-forwarding:

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

This will expose the Kubeflow UI at http://localhost:8080.

At http://localhost:8080 the login page will be visible, but sign-in will not complete, because the redirect URL in the cfg file and the issuer trusted by the RequestAuthentication object assume access through our own domain (e.g. https://kubeflow.your-domain.io) rather than localhost. We need to set up an ingress with our DNS entry; after the ingress is in place, login will work properly.

Setting up an ingress for secure Kubeflow access

Now we are enabling external access via an Ingress Server. This will allow users to securely access Kubeflow’s UI through an external DNS with TLS encryption.

The key component here is Istio’s Ingress Gateway, which handles authentication and routing requests to Kubeflow’s central dashboard.

Configuring ingress for Kubeflow

Why do we need an ingress server?

  1. Secure access – allows external users to reach the Kubeflow dashboard over HTTPS.
  2. Handles large authentication tokens – the identity provider (IdP) sends a large authentication token, so we need to configure the buffer sizes properly.
  3. TLS encryption – ensures encrypted communication via TLS certificates.

Step 1: Creating a TLS secret for secure authentication

Before setting up the Ingress Server, we need to store the TLS certificate in a Kubernetes Secret.

Run the following command to create a Kubernetes Secret with your TLS certificate and private key:

kubectl create secret tls kubeflow-tls --cert=certificate.crt --key=server.key -n istio-system


Note: If you don’t have a TLS certificate yet, you’ll need to obtain one first (e.g., using Let’s Encrypt, AWS Certificate Manager, or Cloudflare).
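
For a quick test without a CA-issued certificate, you can generate a self-signed certificate with openssl (the CN is illustrative; use a proper certificate from Let's Encrypt or ACM for anything beyond testing). The file names match the secret command above:

# Generate a self-signed certificate and private key valid for one year
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout server.key -out certificate.crt \
  -subj "/CN=kubeflow.your-domain.com"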

Step 2: Deploying the ingress

Now, we’ll create an Ingress resource that:

  • Routes external requests to the Istio Ingress Gateway (which handles Kubeflow’s authentication).
  • Configures buffer sizes to handle large authentication tokens.
  • Uses TLS encryption for secure HTTPS access.

Ingress Configuration (ingress.yaml)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
 name: kubeflow-ingress
 namespace: istio-system
 annotations:
   kubernetes.io/ingress.class: nginx
   nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
   nginx.ingress.kubernetes.io/ssl-redirect: "true"
   nginx.ingress.kubernetes.io/proxy-ssl-secret: "istio-system/kubeflow-tls"
   nginx.ingress.kubernetes.io/large-client-header-buffers: "4 64k"
   nginx.ingress.kubernetes.io/proxy-buffer-size: "64k"
   nginx.ingress.kubernetes.io/proxy-buffers: "16 64k"
   nginx.ingress.kubernetes.io/proxy-busy-buffers-size: "128k"
   nginx.org/ssl-services: kubeflow-ingress
spec:
 ingressClassName: nginx
 rules:
 - host: kubeflow.your-domain.com
   http:
     paths:
     - path: /
       pathType: Prefix
       backend:
         service:
           name: istio-ingressgateway
           port:
             number: 80
 tls:
 - hosts:
   - kubeflow.your-domain.com
   secretName: kubeflow-tls

Apply this configuration to Kubernetes:

kubectl apply -f ingress.yaml

Once applied, Kubernetes will provision an external IP address for the Ingress.

Step 3: Creating a DNS entry

To make Kubeflow accessible via a custom domain, follow these steps:

  1. Retrieve the Ingress IP address:

     kubectl get ingress -n istio-system

     This should return an EXTERNAL-IP.

  2. Create a DNS record: go to your preferred DNS provider (e.g., Cloudflare, AWS Route 53, Google Domains) and create a DNS A record that maps your domain (e.g., kubeflow.your-domain.com) to the Ingress IP.

Verify the setup

Once the DNS propagation is complete, you should be able to access Kubeflow at:

https://kubeflow.your-domain.com
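
Before opening a browser, you can sanity-check DNS and TLS from the command line (the hostname matches the DNS record created above):

# Confirm the A record resolves to the ingress address
nslookup kubeflow.your-domain.com

# Expect an HTTP redirect into the login flow rather than a connection error
curl -I https://kubeflow.your-domain.com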

Troubleshooting tips

  • No IP assigned?

    If kubectl get ingress does not return an IP, your cloud provider might require additional whitelisting of IPs. Check your cloud networking settings.

  • TLS certificate issues?

    Ensure your TLS secret is correctly stored in Kubernetes and that the certificate matches the domain you are using.

  • Dashboard not loading?

    Check the logs of the ingress gateway:

    kubectl logs -n istio-system -l app=istio-ingressgateway
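
    If the gateway looks healthy but sign-in still fails, the oauth2-proxy logs are usually the next place to look. This assumes the default kubeflow/manifests layout, where oauth2-proxy runs in its own oauth2-proxy namespace:

    kubectl logs -n oauth2-proxy deployment/oauth2-proxy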

With this in place, we’ve:

  • Secured Kubeflow access using enterprise-grade authentication and set up the ingress server so the Kubeflow dashboard is served on our own DNS name.
  • Enabled seamless login for data scientists and MLOps teams.
  • Ensured Istio authentication trusts Azure AD-issued tokens.

Next steps:

  • Test authentication & ensure role-based access works correctly.
  • Enhance security with Istio policies.