
How to build a production MLOps pipeline — experiment tracking with MLflow, containerised model serving with Docker, and automated deployment on Kubernetes with rolling updates.

The MLOps Gap

Most machine learning projects reach a functional model and then stall. The model exists in a notebook, or at best a Python script, on a data scientist's laptop. Moving it to production, with reproducible training, versioned model artefacts, automated retraining, safe deployment, and ongoing monitoring, is the MLOps problem. The tools have matured significantly: MLflow, Docker, and Kubernetes form a coherent stack for tracking, packaging, and deploying models, with Prometheus and Grafana covering monitoring.

Experiment Tracking with MLflow

MLflow Tracking records parameters, metrics, and artefacts for each training run. Set it up with a shared tracking server so the entire team shares visibility:

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://mlflow.worksprout.internal:5000")
mlflow.set_experiment("anomaly-detection-v2")

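# X_train, y_train, X_test, y_test and evaluate() are assumed to be defined earlier.
params = {"n_estimators": 200, "max_depth": 12, "min_samples_leaf": 3}

with mlflow.start_run():
    mlflow.log_params(params)

    # Build the model from the logged dict so tracked and actual hyperparameters match.
    model = RandomForestClassifier(**params)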
    model.fit(X_train, y_train)

    metrics = evaluate(model, X_test, y_test)
    mlflow.log_metrics(metrics)
    mlflow.sklearn.log_model(model, "model", registered_model_name="AnomalyDetector")
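The training snippet calls an evaluate helper that is not shown. A minimal sketch of what it might look like follows, assuming binary labels where 1 marks an anomaly. Both function names and the choice of metrics are assumptions for illustration. The helper uses plain Python so the metric definitions are explicit (sklearn.metrics offers equivalents), and it returns a flat dict of floats, which is the shape mlflow.log_metrics expects.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def evaluate(model, X_test, y_test) -> Dict[str, float]:
    """Score a fitted model on held-out data (hypothetical helper)."""
    y_pred = [int(p) for p in model.predict(X_test)]
    return binary_metrics([int(t) for t in y_test], y_pred)
```

Keeping the metric names stable across runs matters more than the exact set chosen, because the MLflow UI compares runs by metric key.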

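MLflow's Model Registry versions every registered model and lets you move a version between stages (Staging, Production, Archived), giving you governance over what runs in production. Two caveats: recent MLflow releases deprecate stages in favour of model version aliases and tags, and open-source MLflow does not ship an approval workflow, so approvals are usually enforced in CI or by a managed platform.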
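One way to enforce a promotion rule in code is a small gate that compares a candidate's metrics with the current production model before changing its stage. The sketch below is illustrative only: should_promote, promote, the f1 metric key, and the min_gain margin are assumptions, not part of MLflow. The only MLflow call used is MlflowClient.transition_model_version_stage; on newer releases that favour aliases, set_registered_model_alias would take its place.

```python
from typing import Dict, Optional


def should_promote(
    candidate: Dict[str, float],
    production: Optional[Dict[str, float]],
    metric: str = "f1",
    min_gain: float = 0.0,
) -> bool:
    """Promote when the candidate beats production by at least min_gain."""
    if production is None:
        # Nothing is in production yet, so the first passing model is promoted.
        return True
    return candidate[metric] >= production[metric] + min_gain


def promote(client, name: str, version: str, stage: str = "Production") -> None:
    """Move one model version to a stage and archive whatever held it before."""
    client.transition_model_version_stage(
        name=name,
        version=version,
        stage=stage,
        archive_existing_versions=True,
    )


# Usage against a real tracking server (version "7" is a made-up example):
#   from mlflow.tracking import MlflowClient
#   if should_promote(candidate_metrics, production_metrics, min_gain=0.01):
#       promote(MlflowClient(), "AnomalyDetector", "7")
```

Passing the client in as an argument keeps the gate testable without a running tracking server.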

Containerising Models with Docker

MLflow can generate a Docker image directly from any logged model:

mlflow models build-docker \
  --model-uri "models:/AnomalyDetector/Production" \
  --name worksprout/anomaly-detector:v1.3.2

For more control, write a Dockerfile that exposes your model as a FastAPI service:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8080", "--workers", "2"]

Build multi-architecture images for mixed x86-64/ARM fleets using docker buildx.

Kubernetes Deployment

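The Deployment below serves the model on Kubernetes with resource requests and limits, a readiness probe, and a rolling update strategy. With maxSurge: 1 and maxUnavailable: 0, Kubernetes starts one new pod at a time and removes an old pod only after the new one passes its readiness probe, so serving capacity never drops below three replicas: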

apiVersion: apps/v1
kind: Deployment
metadata:
  name: anomaly-detector
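spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels.
  selector:
    matchLabels:
      app: anomaly-detector
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: anomaly-detector
    spec: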
      containers:
      - name: model
        image: worksprout/anomaly-detector:v1.3.2
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2"
            memory: "2Gi"
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
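To see what the maxSurge and maxUnavailable settings guarantee, it can help to step through a rollout by hand. The following is a deliberately simplified simulation, not how the Deployment controller is implemented. It assumes integer values (no percentages), that a new pod passes its readiness probe one step after it is created, and that old pods terminate instantly. The function name and tuple layout are illustrative.

```python
from typing import List, Tuple


def simulate_rolling_update(
    replicas: int, max_surge: int, max_unavailable: int
) -> List[Tuple[int, int, int]]:
    """Return (old_ready, new_ready, new_pending) after each rollout step."""
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination because no progress is possible.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")

    max_total = replicas + max_surge        # ceiling on pods that exist at once
    min_ready = replicas - max_unavailable  # floor on pods able to serve traffic

    old, new_ready, new_pending = replicas, 0, 0
    steps = [(old, new_ready, new_pending)]
    while old > 0 or new_ready < replicas:
        # Pods created in the previous step now pass their readiness probe.
        new_ready += new_pending
        new_pending = 0
        # Terminate old pods only while enough ready pods remain.
        removable = min(old, old + new_ready - min_ready)
        old -= max(removable, 0)
        # Create new pods up to the surge ceiling.
        creatable = min(replicas - new_ready, max_total - (old + new_ready))
        new_pending = max(creatable, 0)
        steps.append((old, new_ready, new_pending))
    return steps


if __name__ == "__main__":
    for old, ready, pending in simulate_rolling_update(3, 1, 0):
        print(f"old={old} new_ready={ready} new_pending={pending}")
```

With the manifest's values (3 replicas, maxSurge 1, maxUnavailable 0), every step keeps at least three ready pods and never more than four pods in total. The cost is a slower rollout and the need for spare cluster capacity for one extra pod.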

Automated Retraining Pipeline

Models degrade as data distribution shifts. Automate retraining with a pipeline that triggers on data volume or performance thresholds:

1. Monitor model performance metrics in production (accuracy, F1, prediction drift).
2. When performance drops below a threshold, trigger a training job (Kubernetes Job or Argo Workflow).
3. Log the new model to MLflow and evaluate it against held-out validation data.
4. If evaluation passes, promote the model to Staging and run integration tests.
5. If the tests pass, promote the model to Production and trigger a rolling deployment.
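The first two steps above reduce to a trigger decision. One possible sketch combines a floor on live F1 with a drift check on the model's output scores, using the Population Stability Index (PSI) as the drift measure. PSI is a choice made here for illustration; the pipeline described above does not prescribe a drift metric. The function names, the assumption that scores lie in [0, 1], and both thresholds (0.85 for F1, 0.2 for PSI, a common rule of thumb) are assumptions to tune per model.

```python
import math
from typing import List, Sequence


def histogram(scores: Sequence[float], bins: int = 10) -> List[float]:
    """Share of scores per equal-width bin over [0, 1], floored to avoid zeros."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * bins
    for s in scores:
        idx = min(max(int(s * bins), 0), bins - 1)  # clamp out-of-range scores
        counts[idx] += 1
    eps = 1e-6  # keeps log() finite when a bin is empty
    return [max(c / len(scores), eps) for c in counts]


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two score samples."""
    ref, cur = histogram(reference, bins), histogram(live, bins)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(
    live_f1: float,
    reference_scores: Sequence[float],
    live_scores: Sequence[float],
    min_f1: float = 0.85,   # assumed threshold
    max_psi: float = 0.2,   # assumed threshold
) -> bool:
    """Trigger retraining on degraded F1 or on drifted prediction scores."""
    return live_f1 < min_f1 or psi(reference_scores, live_scores) > max_psi
```

A scheduled job could call should_retrain and, when it returns True, submit the Kubernetes Job or Argo Workflow that runs training. The drift check is useful because it needs no ground-truth labels, so it can fire before an F1 estimate is available.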

Model Monitoring with Prometheus and Grafana

Instrument your model serving code to emit Prometheus metrics:

from prometheus_client import Counter, Histogram, start_http_server

prediction_counter = Counter("predictions_total", "Total predictions", ["outcome"])
latency_histogram = Histogram("prediction_latency_seconds", "Inference latency")

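@latency_histogram.time()
def predict(features):
    # model.predict returns an array with one prediction per input row.
    result = model.predict(features)
    # Count each row separately; labelling with str(result) would create a new
    # label value for every distinct array and inflate label cardinality.
    for outcome in result:
        prediction_counter.labels(outcome=str(outcome)).inc()
    return result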

Build a Grafana dashboard that tracks prediction volume, latency percentiles (p50, p95, p99), and model accuracy against ground-truth labels collected asynchronously.
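Tracking accuracy against labels that arrive later means joining each label to the prediction it belongs to. A minimal in-memory sketch follows, assuming every prediction carries a request ID that the labelling process can echo back. The class and method names are hypothetical. A production version would need a persistent store and an expiry for predictions that never receive a label.

```python
from typing import Dict, Hashable, Optional


class DelayedAccuracy:
    """Join predictions with ground-truth labels that arrive later."""

    def __init__(self) -> None:
        self._pending: Dict[str, Hashable] = {}
        self.labelled = 0
        self.correct = 0

    def record_prediction(self, request_id: str, predicted: Hashable) -> None:
        """Remember what the model predicted for this request."""
        self._pending[request_id] = predicted

    def record_label(self, request_id: str, actual: Hashable) -> bool:
        """Apply a late label; return False if no matching prediction is stored."""
        if request_id not in self._pending:
            return False
        predicted = self._pending.pop(request_id)
        self.labelled += 1
        self.correct += int(predicted == actual)
        return True

    def accuracy(self) -> Optional[float]:
        """Accuracy over labelled predictions, or None before any label arrives."""
        return self.correct / self.labelled if self.labelled else None
```

Rather than exporting the ratio directly, one option is to expose labelled and correct as two Prometheus counters and let the dashboard divide their rates, which keeps the accuracy panel consistent across time windows.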

Conclusion

MLOps is not a luxury for large teams — it is the engineering discipline that turns a prototype model into a maintainable production system. MLflow, Docker, and Kubernetes provide all the primitives needed to build a complete pipeline. Invest in it before your first production deployment, not after your first production incident.

MLOps with MLflow, Docker, and Kubernetes: CI/CD for Machine Learning

Worksprout Team
