
MLOps with MLflow, Docker, and Kubernetes: CI/CD for Machine Learning
Machine Learning

Worksprout Team Jan 08, 2025 12 min read

How to build a production MLOps pipeline — experiment tracking with MLflow, containerised model serving with Docker, and automated deployment on Kubernetes with rolling updates.

The MLOps Gap

Most machine learning projects reach a working model and then stall. The model lives in a notebook, or at best a Python script, on a data scientist's laptop. Moving it to production is the MLOps problem: reproducible training, versioned model artefacts, automated retraining, safe deployment, and ongoing monitoring. The tooling has matured significantly. MLflow, Docker, and Kubernetes form a coherent stack for tracking, packaging, and deploying models, and Prometheus and Grafana close the loop with monitoring.

Experiment Tracking with MLflow

MLflow Tracking records the parameters, metrics, and artefacts of each training run. Point every training script at a shared tracking server so the whole team can see and compare runs:

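import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://mlflow.worksprout.internal:5000")
mlflow.set_experiment("anomaly-detection-v2")

# Synthetic stand-in data; replace with your real feature pipeline
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {"n_estimators": 200, "max_depth": 12, "min_samples_leaf": 3, "random_state": 42}

with mlflow.start_run():
    # Log exactly the parameters the model is trained with
    mlflow.log_params(params)

    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    })
    # Log the model artefact and register it as a new version of "AnomalyDetector"
    mlflow.sklearn.log_model(model, "model", registered_model_name="AnomalyDetector")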

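MLflow's Model Registry keeps a version history for every registered model and lets you move versions through stages (Staging, Production, Archived), giving you a single record of what runs in production. Since MLflow 2.9, stages are deprecated in favour of model aliases (for example, models:/AnomalyDetector@champion) and tags; stage URIs such as models:/AnomalyDetector/Production still resolve in MLflow 2.x. Open-source MLflow records these transitions but does not enforce an approval step, so gate promotion in CI or use a managed platform that provides approval workflows.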
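A promotion decision is easier to audit when CI evaluates it as code before touching the registry. The sketch below is a hypothetical gate, not an MLflow API: the function name should_promote, the F1 metric, the 0.80 floor, and the 0.01 minimum gain are all illustrative assumptions.

```python
from typing import Mapping, Optional


def should_promote(
    candidate: Mapping[str, float],
    production: Optional[Mapping[str, float]],
    metric: str = "f1",
    min_gain: float = 0.01,
    floor: float = 0.80,
) -> bool:
    """Hypothetical promotion rule for a newly registered model version.

    - The candidate must reach an absolute quality floor.
    - If a production model exists, the candidate must beat it by at least
      min_gain on the same metric, measured on the same held-out data.
    """
    score = candidate[metric]
    if score < floor:
        return False
    if production is None:  # first deployment: the floor alone decides
        return True
    return score >= production[metric] + min_gain


print(should_promote({"f1": 0.91}, {"f1": 0.88}))   # True: gain of 0.03
print(should_promote({"f1": 0.885}, {"f1": 0.88}))  # False: gain below 0.01
print(should_promote({"f1": 0.75}, None))           # False: below the floor
```

If the gate returns True, the CI job would then update the registry, for example with MlflowClient().set_registered_model_alias("AnomalyDetector", "champion", version) on a recent MLflow release. Requiring a minimum gain, rather than any improvement at all, keeps evaluation noise from churning production versions.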

Containerising Models with Docker

MLflow can build a Docker image that serves any logged model through its built-in scoring server, which accepts prediction requests on /invocations:

mlflow models build-docker \
  --model-uri "models:/AnomalyDetector/Production" \
  --name worksprout/anomaly-detector:v1.3.2

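For more control, write your own Dockerfile that serves the model as a FastAPI application. The CMD below assumes a serve.py module that defines a FastAPI instance named app (including a /health route for the Kubernetes readiness probe), with fastapi and uvicorn listed in requirements.txt: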

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8080", "--workers", "2"]

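Build multi-architecture images for mixed x86-64 and ARM64 fleets (for example, cloud nodes alongside ARM edge devices) with docker buildx build --platform linux/amd64,linux/arm64.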

Kubernetes Deployment

A production Deployment for the model server, with resource requests and limits, a readiness probe, and a rolling update strategy that never drops below the desired replica count:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anomaly-detector
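spec:
  replicas: 3
  selector:
    matchLabels:
      app: anomaly-detector
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: anomaly-detector  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: model
        image: worksprout/anomaly-detector:v1.3.2
        ports:
        - containerPort: 8080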
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2"
            memory: "2Gi"
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
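The maxSurge and maxUnavailable pair bounds how many pods exist during a rollout. The sketch below reproduces that arithmetic as the Kubernetes documentation describes it: absolute numbers are used as-is, percentages round up for maxSurge and down for maxUnavailable. The function names are ours, for illustration only.

```python
import math
from typing import Tuple, Union

Value = Union[int, str]  # an absolute count such as 1, or a percentage such as "25%"


def _resolve(value: Value, replicas: int, round_up: bool) -> int:
    """Turn an absolute or percentage maxSurge/maxUnavailable into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int, max_surge: Value, max_unavailable: Value) -> Tuple[int, int]:
    """Return (max total pods, min available pods) during a RollingUpdate."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


# The manifest above: replicas=3, maxSurge=1, maxUnavailable=0
print(rollout_bounds(3, 1, 0))          # (4, 3)
# Kubernetes defaults (25% / 25%) with the same replica count
print(rollout_bounds(3, "25%", "25%"))  # (4, 3)
```

With maxUnavailable: 0, an old pod is terminated only after its replacement passes the readiness probe on /health, so capacity never falls below three ready pods. The price is room in the cluster for one extra pod's resource requests (500m CPU, 512Mi) during each rollout.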

Automated Retraining Pipeline

Models degrade as the input data distribution shifts. Automate retraining with a pipeline that triggers when monitored performance crosses a threshold or when enough new labelled data has accumulated:

  1. Monitor model quality in production: accuracy and F1 once ground-truth labels arrive, and prediction drift in the meantime
  2. When a metric crosses its threshold, or the volume of new labelled data reaches a set size, trigger a training job (a Kubernetes Job or an Argo Workflow)
  3. Log the new model to MLflow and evaluate it on held-out validation data, alongside the current Production model
  4. If evaluation passes, promote it to Staging and run integration tests
  5. If the tests pass, promote it to Production and trigger a rolling deployment
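For the drift signal in step 1, one common choice (our assumption; the pipeline does not prescribe a metric) is the Population Stability Index (PSI) between the score distribution seen at training time and the live one. The sketch uses equal-width bins over the reference data and a hypothetical should_retrain trigger. The 0.2 PSI limit is a widely quoted rule of thumb and the 0.85 F1 floor is illustrative, so tune both for your own model.

```python
import math
from typing import List, Optional, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `actual` against `expected`.

    Bin edges are equal-width over the range of the reference data; live values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Small floor so empty bins do not cause log(0) or division by zero
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(drift: float, live_f1: Optional[float],
                   psi_limit: float = 0.2, f1_floor: float = 0.85) -> bool:
    """Hypothetical trigger: retrain when labelled F1 drops or drift is strong."""
    if live_f1 is not None and live_f1 < f1_floor:
        return True
    return drift > psi_limit


reference = [i / 100 for i in range(100)]      # scores seen at training time
similar = [i / 100 + 0.005 for i in range(100)]  # live scores, nearly unchanged
shifted = [0.5 + i / 200 for i in range(100)]  # live scores pushed upward

print(psi(reference, similar))
print(psi(reference, shifted) > 0.2)                           # True
print(should_retrain(psi(reference, similar), live_f1=None))   # False
```

Because F1 needs ground-truth labels, live_f1 is often None for recent traffic. PSI needs only model inputs or outputs, which is what makes it useful as an early warning.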

Model Monitoring with Prometheus and Grafana

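Instrument your model serving code to emit Prometheus metrics. Because the Dockerfile above starts two uvicorn workers, each worker is a separate process with its own counters; either run a single worker per pod and scale with replicas, or use prometheus_client's multiprocess mode (configured through the PROMETHEUS_MULTIPROC_DIR environment variable).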

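from prometheus_client import Counter, Histogram, start_http_server

prediction_counter = Counter("predictions_total", "Total predictions", ["outcome"])
latency_histogram = Histogram("prediction_latency_seconds", "Inference latency")

# Expose /metrics for Prometheus to scrape. Port 8000 is an arbitrary choice,
# and this assumes one worker process per pod (see multiprocess mode otherwise).
start_http_server(8000)

@latency_histogram.time()
def predict(features):
    # `model` is the loaded model, e.g. from mlflow.sklearn.load_model(...)
    result = model.predict(features)
    # Count each predicted class separately: str(result) on a whole batch would
    # create a new label value per distinct batch (unbounded label cardinality)
    for outcome in result:
        prediction_counter.labels(outcome=str(outcome)).inc()
    return result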
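A Prometheus Histogram does not export percentiles. It exports cumulative prediction_latency_seconds_bucket counters, and percentiles are estimated at query time, typically with PromQL such as histogram_quantile(0.95, sum by (le) (rate(prediction_latency_seconds_bucket[5m]))). To show what that estimate means, here is a simplified Python re-implementation of the linear interpolation histogram_quantile performs. The bucket counts are invented for illustration, and only the +Inf edge case is handled (by returning the highest finite bound, as Prometheus does).

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Finds the bucket containing the target rank, then interpolates linearly
    inside it. Buckets must be sorted by upper bound and end with +Inf.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into +Inf
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Illustrative cumulative counts for prediction_latency_seconds_bucket
latency_buckets = [(0.01, 200), (0.025, 700), (0.05, 900), (0.1, 980), (math.inf, 1000)]
for label, q in (("p50", 0.5), ("p95", 0.95), ("p99", 0.99)):
    print(f"{label}: {histogram_quantile(q, latency_buckets) * 1000:.1f} ms")
```

The p99 here lands in the +Inf bucket, so the estimate is capped at 100 ms. Choosing bucket boundaries around your latency targets matters more than the interpolation itself.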

Build a Grafana dashboard that tracks prediction volume, latency percentiles (p50, p95, p99), and model accuracy against ground-truth labels collected asynchronously.
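Ground-truth labels usually arrive well after a prediction is served, so accuracy has to be computed by joining the two streams. A minimal in-memory sketch, assuming each prediction is logged with a unique ID and labels later arrive keyed by the same ID; LabelJoiner and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LabelJoiner:
    """Joins logged predictions with ground-truth labels that arrive later."""
    pending: Dict[str, int] = field(default_factory=dict)  # prediction_id -> predicted class
    correct: int = 0
    labelled: int = 0

    def record_prediction(self, prediction_id: str, predicted: int) -> None:
        self.pending[prediction_id] = predicted

    def record_label(self, prediction_id: str, actual: int) -> None:
        predicted = self.pending.pop(prediction_id, None)
        if predicted is None:
            return  # label for an unknown or already-scored prediction
        self.labelled += 1
        self.correct += int(predicted == actual)

    def accuracy(self) -> Optional[float]:
        """Accuracy over labelled predictions so far; None until a label arrives."""
        return self.correct / self.labelled if self.labelled else None


joiner = LabelJoiner()
for pid, pred in [("a", 1), ("b", 0), ("c", 1)]:
    joiner.record_prediction(pid, pred)
joiner.record_label("a", 1)
joiner.record_label("b", 1)
print(joiner.accuracy())  # 0.5; "c" is still waiting for its label
```

In production, the pending predictions would live in a database or prediction log rather than in memory, and the resulting accuracy could be exported as a Prometheus Gauge for the Grafana dashboard.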

Conclusion

MLOps is not a luxury for large teams; it is the engineering discipline that turns a prototype model into a maintainable production system. MLflow, Docker, and Kubernetes, with Prometheus and Grafana for monitoring, provide the core primitives for a complete pipeline. Invest in it before your first production deployment, not after your first production incident.

Worksprout Team

The Worksprout engineering team specialises in embedded Linux, RDK-B broadband platforms, edge AI, and robotics systems. Based in Rajshahi, Bangladesh, we design and deploy production embedded intelligence for clients across South Asia and beyond.
