Deployment of Kubeflow on EKS with Azure AD Authentication (OIDC) and Programmatic Access - Part 2

Reading Time: 5 minutes

Continuing from Part 1, where we explored the importance of automation in MLOps, this part delves into enabling seamless programmatic access to Kubeflow Pipelines.

 

Kubeflow’s dashboard is the gateway to managing machine learning workflows, but interacting with it manually isn’t always practical—especially when automation and CI/CD come into play.

 

To enable seamless programmatic access, we need a mechanism that allows scripts, CI/CD pipelines, or external tools to interact with the Kubeflow API. One robust solution is leveraging Kubernetes Service Accounts for authentication and authorization.

 

By assigning the right permissions to a Kubernetes Service Account within the Kubeflow namespace, we can programmatically upload and manage pipelines without manual intervention.

Setting up Kubernetes service account for Kubeflow access

Below is a step-by-step guide to configuring a Kubernetes Service Account to interact with Kubeflow’s API.

1. Creating a profile for the user

When we create a Profile resource in Kubernetes, it creates a namespace in the cluster. The value we assign to name in the metadata section becomes the name of the namespace.

 

    apiVersion: kubeflow.org/v1
    kind: Profile
    metadata:
      name: kubeflow-username
    spec:
      owner:
        kind: User
        name: [email protected]

When we create a Profile in Kubernetes, it creates not only the namespace but also a default-editor service account inside that namespace. We will use this service account to authenticate the Pipelines SDK, by granting it permissions in the Kubeflow dashboard's namespace.

 

Here, the email address we use is the user's Microsoft Entra ID (Azure AD) email address, and it must match the email the user authenticates with. When the user signs in to the Kubeflow dashboard, the dashboard opens within this user's namespace, so the resources the user creates in Kubeflow are scoped to that namespace and accessible only to that user.
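
Apply the profile manifest (assuming it is saved as profile.yaml), then confirm that the profile controller created the namespace and its default-editor service account:

  kubectl apply -f profile.yaml
  kubectl get serviceaccount default-editor -n kubeflow-username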

2. Granting the necessary permissions

Now, we need to authorize the default-editor service account to operate within the Kubeflow namespace. This is achieved using a RoleBinding that links the service account to the kubeflow-edit ClusterRole:

 

  apiVersion: rbac.authorization.k8s.io/v1
  kind: RoleBinding
  metadata:
    name: allow-namespace-kubeflow-edit
    ## this RoleBinding is in `kubeflow`, because it grants access to `kubeflow`
    namespace: kubeflow
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: kubeflow-edit
    ## grants edit-level permissions to the service account within a Kubeflow profile namespace
  subjects:
    - kind: ServiceAccount
      name: default-editor
      ## the ServiceAccount lives in the `kubeflow-username` namespace
      namespace: kubeflow-username

 

Apply this YAML configuration using:

  kubectl apply -f rolebinding.yaml
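
As a quick sanity check, we can impersonate the service account and list what it is now allowed to do in the kubeflow namespace:

  kubectl auth can-i --list -n kubeflow --as=system:serviceaccount:kubeflow-username:default-editor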

3. Updating Istio authorization policies

Kubeflow’s dashboard relies on multiple components, primarily:

 

  • ml-pipeline-ui: Handles dashboard visualization.
  • istio-ingressgateway: Routes incoming traffic, including Kubeflow API requests.

 

Ultimately, the Istio ingress gateway is what mediates API traffic in Kubeflow. The service account we have set up so far is not enough on its own to reach the Kubeflow API: we also need to tell Istio to grant the service account access to the ml-pipeline-ui service, which fronts the Kubeflow Pipelines and notebook APIs in the kubeflow namespace.

 

To enable service account-based access, we need to modify the AuthorizationPolicy for ml-pipeline-ui and include our default-editor service account in the principals list.

 

To add our service account to the policy, modify the following file:

 

apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml

 

  apiVersion: security.istio.io/v1
  kind: AuthorizationPolicy
  metadata:
    name: ml-pipeline-ui
    namespace: kubeflow
  spec:
    selector:
      matchLabels:
        app: ml-pipeline-ui
    rules:
      - from:
          - source:
              principals:
                - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
                - cluster.local/ns/kubeflow-username/sa/default-editor
                  ## grants API access to the default-editor service account
        when:
          - key: request.headers[authorization]
            values:
              - '*'
          - key: request.headers[kubeflow-userid]
            notValues:
              - '*'

 

Apply the updated policy:

  kubectl apply -f istio-authorization-config.yaml
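
To confirm the updated policy is in place:

  kubectl get authorizationpolicy ml-pipeline-ui -n kubeflow -o yaml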

Connecting the Kubeflow client with a Kubernetes service account

With the authorization configured, we can now connect to Kubeflow programmatically using the Kubeflow Pipelines SDK.

 

Wrapping the client code that connects to the Kubeflow API in a Docker image, to be used within the cluster.

 

To ensure portability and security, we containerize the client code. Here’s a simple Python script to authenticate with Kubeflow Pipelines using an in-cluster service account:

 

Connect_Client.py

 

import kfp
import os

# Read the service account token from the custom path defined in the env variable
def get_service_account_token():
    token_path = os.getenv('KF_PIPELINES_SA_TOKEN_PATH')
    try:
        with open(token_path, "r") as f:
            return f.read().strip()
    except Exception as e:
        raise Exception(f"Failed to load service account token from {token_path}: {str(e)}")

# Configure the client with authentication
PIPELINE_HOST = "http://ml-pipeline.kubeflow.svc.cluster.local:8888"

sa_token = get_service_account_token()

client = kfp.Client(
    host=PIPELINE_HOST,
    existing_token=sa_token
)

namespace = "kubeflow-username"  # the profile namespace created earlier
print(client.list_experiments(namespace=namespace))

 

Here we fetch the service account token from the path given by the environment variable, then use that token to connect to the Kubeflow API. The kfp syntax used in the code above is compatible with the Kubeflow 1.9.1 deployment from the previous section and has been tested successfully.

 

Since we are performing in-cluster authentication, we connect directly to the Pipelines API service at ml-pipeline.kubeflow.svc.cluster.local:8888 as the API host.
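
For testing from outside the cluster, one could instead port-forward the Pipelines API service locally (a debugging convenience, not part of the in-cluster flow; depending on the Istio mTLS settings, direct access may require additional configuration):

  kubectl port-forward svc/ml-pipeline -n kubeflow 8888:8888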

 

requirements.txt for this code:
kfp==1.8.22

 

Dockerfile:

 

# Use a base image with Python
FROM python:3.9-slim

# Set the working directory inside the container
WORKDIR /app

# Copy requirements.txt to the working directory
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Copy the Python script to the working directory
COPY Connect_Client.py .

# Set the default command to run the Python script
CMD ["python", "Connect_Client.py"]

 


 

Build the Docker image:

docker build -t username/kubeflow-client-test:v1 .

 

Push the Docker image:

 

docker push username/kubeflow-client-test:v1

Creating a Pod

Now we will create a Pod. The image we use for this Pod contains the Python SDK code needed to connect to the Kubeflow client. We give the Pod the command ["sleep", "infinity"] so that it keeps running and we can test the connection to the Kubeflow client at any time.

The environment variable KF_PIPELINES_SA_TOKEN_PATH is a standard variable that holds the path to the service account token file in Kubernetes. The Pipelines SDK needs this token when connecting to Kubeflow Pipelines (the Kubeflow client); the containerized code in the previous section reads it from this path.

 

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-rbac-auth
      namespace: kubeflow-username
    spec:
      serviceAccountName: default-editor
      containers:
      - name: test-kfp-auth
        image: username/kubeflow-client-test:v1
        command: ["sleep", "infinity"]
        env:
        - name: KF_PIPELINES_SA_TOKEN_PATH
          value: /var/run/secrets/kubeflow/pipelines/token
        volumeMounts:
        - name: volume-kf-pipeline-token
          mountPath: /var/run/secrets/kubeflow/pipelines
          readOnly: true
      volumes:
      - name: volume-kf-pipeline-token
        projected:
          sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org    
    
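
Apply the Pod manifest (assuming it is saved as pod.yaml):

  kubectl apply -f pod.yaml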

Verifying the connection

Once the Pod is running, check its status:

  kubectl get pods -n kubeflow-username test-rbac-auth

Because the container's startup command is overridden with sleep infinity, run the client script inside the Pod to confirm successful authentication:

  kubectl exec -it test-rbac-auth -n kubeflow-username -- python Connect_Client.py

 

If everything is working correctly, the output should list the available experiments:

 

    Example response:
    id: "12345"
    name: "Example Experiment"
    description: "A sample Kubeflow experiment."

 

Note: You can use a similar approach to deploy Kubeflow pipelines from a compiled pipeline YAML definition, as sketched below.
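
As a minimal sketch (assuming a compiled pipeline definition named pipeline.yaml and placeholder experiment and run names; create_experiment and run_pipeline are standard kfp 1.x client methods), submitting a pipeline run from inside the cluster could look like this:

import os

import kfp

# Same in-cluster setup as Connect_Client.py
PIPELINE_HOST = "http://ml-pipeline.kubeflow.svc.cluster.local:8888"

with open(os.environ["KF_PIPELINES_SA_TOKEN_PATH"]) as f:
    sa_token = f.read().strip()

client = kfp.Client(host=PIPELINE_HOST, existing_token=sa_token)

# Create (or reuse) an experiment in the user's profile namespace
experiment = client.create_experiment(
    name="example-experiment",  # placeholder name
    namespace="kubeflow-username",
)

# Submit a run from a compiled pipeline package (placeholder file name)
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="example-run",  # placeholder run name
    pipeline_package_path="pipeline.yaml",
)
print(f"Started run {run.id}")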

In Conclusion

By following this setup, we have established a scalable, secure, and automated foundation for running Kubeflow Pipelines on Amazon EKS. This ensures that machine learning workflows can be efficiently managed, deployed, and monitored within a Kubernetes environment.

 

Here’s what we’ve accomplished:

 

  • Containerized the Kubeflow Pipelines client to ensure a portable and reproducible execution environment.
  • Deployed it inside the EKS cluster, enabling seamless interaction with Kubeflow.
  • Authenticated the service account to securely access the Kubeflow Pipelines API.

 

Now, this approach can be integrated into CI/CD pipelines, allowing for fully automated ML workflows in Kubeflow!

 

With this setup, your Kubeflow deployment is now production-ready!

References:

Kubeflow Authentication using Oauth2 Proxy
