Deployment of Kubeflow on EKS with Azure AD Authentication (OIDC) and Programmatic Access- Part 2
Reading Time: 5 minutes
Continuing from Part 1, where we explored the importance of automation in MLOps, this part delves into enabling seamless programmatic access to Kubeflow Pipelines.
Kubeflow’s dashboard is the gateway to managing machine learning workflows, but interacting with it manually isn’t always practical—especially when automation and CI/CD come into play.
To enable seamless programmatic access, we need a mechanism that allows scripts, CI/CD pipelines, or external tools to interact with the Kubeflow API. One robust solution is leveraging Kubernetes Service Accounts for authentication and authorization.
By assigning the right permissions to a Kubernetes Service Account within the Kubeflow namespace, we can programmatically upload and manage pipelines without manual intervention.
Setting up Kubernetes service account for Kubeflow access
Below is a step-by-step guide to configuring a Kubernetes Service Account to interact with Kubeflow’s API.
1. Creating a profile for the user
When we create a Profile resource in Kubernetes, it creates a namespace in the cluster. The value assigned to 'name' in the metadata block becomes the name of the namespace.
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: kubeflow-username
spec:
  owner:
    kind: User
    name: [email protected]
When we create a profile, along with the namespace Kubernetes also creates a service account named 'default-editor' in that namespace automatically. We will use this service account to authenticate the Pipelines SDK by granting it permissions in the Kubeflow dashboard's namespace.
Here, the email address we are using is the Azure Entra ID credential's email address. It must match the email ID we prefer for authenticating the user via credentials. When the user signs in to the Kubeflow dashboard, the dashboard opens within this user's namespace; the resources the user creates in Kubeflow are stored there and are accessible only to that user.
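Once the Profile is applied, the namespace and the auto-created service account can be verified. The commands below are a quick sanity check, assuming kubectl is already pointed at the EKS cluster:

```shell
# The Profile controller creates a namespace named after the profile
kubectl get namespace kubeflow-username

# ...and a default-editor service account inside that namespace
kubectl get serviceaccount default-editor -n kubeflow-username
```

If the service account is missing, wait a few seconds for the profile controller to reconcile and check again.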
2. Granting the necessary permissions
Now, we need to authorize the default-editor service account to operate within the Kubeflow namespace. This is achieved using a RoleBinding that links the service account to the kubeflow-edit ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-namespace-kubeflow-edit
  ## this RoleBinding is in `kubeflow`, because it grants access to `kubeflow`
  namespace: kubeflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  ## grants edit-level permissions to the service account within a Kubeflow profile namespace
  name: kubeflow-edit
subjects:
  ## the ServiceAccount lives in the `kubeflow-username` namespace
  - kind: ServiceAccount
    name: default-editor
    namespace: kubeflow-username
Apply this YAML configuration using:
kubectl apply -f rolebinding.yaml
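To confirm the RoleBinding took effect, `kubectl auth can-i` can list what the service account is now allowed to do in the kubeflow namespace (a read-only sanity check; the exact resources shown depend on what the kubeflow-edit ClusterRole grants in your Kubeflow version):

```shell
kubectl auth can-i --list \
  --as=system:serviceaccount:kubeflow-username:default-editor \
  -n kubeflow
```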
3. Updating Istio authorization policies
Kubeflow’s dashboard relies on multiple components, primarily:
- ml-pipeline-ui: Handles dashboard visualization.
- istio-ingressgateway: Manages API handling.
Ultimately, the Istio gateway is what manages API handling in Kubeflow. Up to this stage we have set up the service account, but this is not enough to connect to the Kubeflow API. We need to tell Istio to grant the service account access to the ml-pipeline-ui service, which fronts the Kubeflow Pipelines API and the notebook pipeline APIs.
To enable service account-based access, we need to modify the AuthorizationPolicy for ml-pipeline-ui and include our default-editor service account in the principals list.
Modify the policy to add our service account by editing the following file:
apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ml-pipeline-ui
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: ml-pipeline-ui
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
              ## grants API access to the default-editor service account in the profile namespace
              - cluster.local/ns/kubeflow-username/sa/default-editor
      when:
        - key: request.headers[authorization]
          values:
            - '*'
        - key: request.headers[kubeflow-userid]
          notValues:
            - '*'
Apply the updated policy:
kubectl apply -f istio-authorization-config.yaml
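The patched policy can be read back to confirm that the new principal is in place (the grep context size is just a convenience for display):

```shell
kubectl get authorizationpolicy ml-pipeline-ui -n kubeflow -o yaml \
  | grep -A 4 principals
```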
Connecting the Kubeflow client with a Kubernetes service account
With the authorization configured, we can now connect to Kubeflow programmatically using the Kubeflow Pipelines SDK.
Wrapping the client code in a Docker image to connect to the Kubeflow API from within the cluster
To ensure portability and security, we containerize the client code. Here’s a simple Python script to authenticate with Kubeflow Pipelines using an in-cluster service account:
Connect_Client.py
import kfp
import os

# Read the service account token from the custom path defined in the env variable
def get_service_account_token():
    token_path = os.getenv("KF_PIPELINES_SA_TOKEN_PATH")
    try:
        with open(token_path, "r") as f:
            return f.read().strip()
    except Exception as e:
        raise Exception(f"Failed to load service account token from {token_path}: {str(e)}")

# Configure the client with authentication
PIPELINE_HOST = "http://ml-pipeline.kubeflow.svc.cluster.local:8888"
sa_token = get_service_account_token()

client = kfp.Client(
    host=PIPELINE_HOST,
    existing_token=sa_token,
)

namespace = "kubeflow-username"  # the profile namespace from the pod definition
print(client.list_experiments(namespace=namespace))
Here we fetch the service account token path from an environment variable, then use the token to connect to the Kubeflow API. The kfp syntax used in the above code is compatible with the Kubeflow 1.9.1 deployment from the previous section and has been tested successfully.
Since we are performing in-cluster authentication, we directly use the internal service address ml-pipeline.kubeflow.svc.cluster.local:8888 as the API host.
requirements.txt for this code :
kfp==1.8.22
Dockerfile:
# Use a base image with Python
FROM python:3.9-slim

# Set the working directory inside the container
WORKDIR /app

# Copy requirements.txt to the working directory
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Copy the Python script to the working directory
COPY Connect_Client.py .

# Set the default command to run the Python script
CMD ["python", "Connect_Client.py"]
Build the Docker image:
docker build -t username/kubeflow-client-test:v1 .
Push the Docker image:
docker push username/kubeflow-client-test:v1
Creating a pod
Now we will create a pod. The image used for this pod contains the Python SDK code needed to connect to the Kubeflow client. We give the pod the command ["sleep", "infinity"] so that it keeps running and we can test our connection to the Kubeflow client at any time.
The env variable KF_PIPELINES_SA_TOKEN_PATH stores the path to the service account token projected into the pod. The Pipelines SDK requires this token when connecting to the Kubeflow pipeline API (the Kubeflow client); the containerized code in the previous section reads it from this path.
apiVersion: v1
kind: Pod
metadata:
  name: test-rbac-auth
  namespace: kubeflow-username
spec:
  serviceAccountName: default-editor
  containers:
    - name: test-kfp-auth
      image: username/kubeflow-client-test:v1
      command: ["sleep", "infinity"]
      env:
        - name: KF_PIPELINES_SA_TOKEN_PATH
          value: /var/run/secrets/kubeflow/pipelines/token
      volumeMounts:
        - name: volume-kf-pipeline-token
          mountPath: /var/run/secrets/kubeflow/pipelines
          readOnly: true
  volumes:
    - name: volume-kf-pipeline-token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              expirationSeconds: 7200
              audience: pipelines.kubeflow.org
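Before deploying the pod, the token-reading logic can be exercised locally. The sketch below simulates the projected token file with a temporary file; the env variable name mirrors the pod spec above, but the "fake-jwt-token" value and temp file are purely illustrative (in the cluster, kubelet writes the real token to /var/run/secrets/kubeflow/pipelines/token):

```python
import os
import tempfile

def get_service_account_token():
    """Read the projected service account token from the path given
    by KF_PIPELINES_SA_TOKEN_PATH (set via the pod's env block)."""
    token_path = os.getenv("KF_PIPELINES_SA_TOKEN_PATH")
    if not token_path:
        raise RuntimeError("KF_PIPELINES_SA_TOKEN_PATH is not set")
    with open(token_path, "r") as f:
        return f.read().strip()

# Simulate the projected volume with a temporary file (illustration only)
with tempfile.NamedTemporaryFile("w", suffix="-token", delete=False) as tmp:
    tmp.write("fake-jwt-token\n")

os.environ["KF_PIPELINES_SA_TOKEN_PATH"] = tmp.name
print(get_service_account_token())  # → fake-jwt-token
```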
Verifying the connection
Once the Pod is running, check its status:
kubectl get pods -n kubeflow-username test-rbac-auth
Retrieve the logs to confirm successful authentication:
kubectl logs test-rbac-auth -n kubeflow-username
If everything is working correctly, the output should list the experiments in the user's namespace, along the lines of:
Example response:
id: "12345"
name: "Example Pipeline"
description: "A sample Kubeflow pipeline."
Note: You can use a similar approach to deploy Kubeflow pipelines from a compiled pipeline YAML.
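As the note suggests, the same authenticated client can also deploy and run a compiled pipeline. A minimal sketch using the kfp 1.8.x SDK is below; the file name my_pipeline.yaml, the pipeline, experiment, and run names are all placeholders, and the script assumes it runs inside the cluster with the projected token from the pod spec:

```python
import kfp
import os

# Reuse the same in-cluster token authentication as Connect_Client.py
def get_service_account_token():
    with open(os.environ["KF_PIPELINES_SA_TOKEN_PATH"], "r") as f:
        return f.read().strip()

client = kfp.Client(
    host="http://ml-pipeline.kubeflow.svc.cluster.local:8888",
    existing_token=get_service_account_token(),
)

namespace = "kubeflow-username"

# Upload a compiled pipeline definition (placeholder file name)
pipeline = client.upload_pipeline(
    pipeline_package_path="my_pipeline.yaml",
    pipeline_name="example-pipeline",
)

# Create an experiment in the profile namespace and start a run
experiment = client.create_experiment("demo-experiment", namespace=namespace)
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="example-run",
    pipeline_id=pipeline.id,
)
print(run.id)
```

This mirrors what a CI/CD job would do after compiling a pipeline: upload the package, then trigger a run under the user's profile namespace.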
In Conclusion
By following this setup, we have established a scalable, secure, and automated foundation for running Kubeflow Pipelines on Amazon EKS. This ensures that machine learning workflows can be efficiently managed, deployed, and monitored within a Kubernetes environment.
Here’s what we’ve accomplished:
- Containerized the Kubeflow Pipelines client to ensure a portable and reproducible execution environment.
- Deployed it inside the EKS cluster, enabling seamless interaction with Kubeflow.
- Authenticated the service account to securely access the Kubeflow Pipelines API.
Now, this approach can be integrated into CI/CD pipelines, allowing for fully automated ML workflows in Kubeflow!
With this setup, your Kubeflow deployment is now production-ready!