TECHNICAL BLOG
Deep Dives for Engineers
Detailed technical articles covering the real problems we solve in embedded systems, AI, and robotics engineering.
How to build a production MLOps pipeline — experiment tracking with MLflow, containerised model serving with Docker, and automated deployment on Kubernetes with rolling updates.
Most machine learning projects reach a functional model and then stall. The model exists in a notebook, or at best a Python script, on a data scientist's laptop. Moving it to production — with reproducible training, versioned model artefacts, automated retraining, safe deployment, and ongoing monitoring — is the MLOps problem. The tools have matured significantly: MLflow, Docker, and Kubernetes form a coherent stack that addresses every layer of this challenge.
MLflow Tracking records parameters, metrics, and artefacts for each training run. Set it up with a shared tracking server so the entire team shares visibility:
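import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

# Point the client at the shared tracking server and select the experiment.
mlflow.set_tracking_uri("http://mlflow.worksprout.internal:5000")
mlflow.set_experiment("anomaly-detection-v2")

# Define the hyperparameters once so the logged values match the trained model.
params = {"n_estimators": 200, "max_depth": 12, "min_samples_leaf": 3}

with mlflow.start_run():
    mlflow.log_params(params)
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)  # X_train, y_train: your training split
    metrics = evaluate(model, X_test, y_test)  # evaluate(): your own helper returning a dict of floats
    mlflow.log_metrics(metrics)
    # Log the model artefact and register it in the Model Registry in one step.
    mlflow.sklearn.log_model(model, "model", registered_model_name="AnomalyDetector")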
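MLflow's Model Registry versions every registered model and supports stage transitions (Staging, Production, Archived) that you can gate behind an approval workflow, giving you governance over what runs in production. Recent MLflow releases deprecate stages in favour of model version aliases, so check which mechanism your version supports.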
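One way to make a promotion decision auditable is to put an explicit, testable gate in front of it. The sketch below is an assumption about how such a gate might look and is not part of MLflow: `should_promote`, the metric names, and the `min_gain` margin are all hypothetical. It approves a candidate only if no tracked metric regresses against the current Production model and the primary metric improves by a margin.

```python
from typing import Dict


def should_promote(
    candidate: Dict[str, float],
    production: Dict[str, float],
    primary_metric: str = "f1",
    min_gain: float = 0.01,
) -> bool:
    """Hypothetical promotion gate; higher metric values are assumed to be better."""
    if primary_metric not in candidate:
        return False
    # Nothing is in production yet, so any evaluated candidate may go first.
    if not production:
        return True
    # The candidate must report every metric tracked for the production model.
    if primary_metric not in production or not set(production) <= set(candidate):
        return False
    # No tracked metric may regress.
    if any(candidate[name] < value for name, value in production.items()):
        return False
    # The primary metric must improve by at least min_gain.
    return candidate[primary_metric] - production[primary_metric] >= min_gain


current = {"f1": 0.90, "precision": 0.90}
print(should_promote({"f1": 0.93, "precision": 0.90}, current))  # True
print(should_promote({"f1": 0.93, "precision": 0.85}, current))  # False
```

Once the gate approves, the actual transition would go through the registry client, for example `MlflowClient.transition_model_version_stage` or, on newer releases, `MlflowClient.set_registered_model_alias`.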
MLflow can generate a Docker image directly from any logged model:
mlflow models build-docker --model-uri "models:/AnomalyDetector/Production" --name worksprout/anomaly-detector:v1.3.2
For more control, write a Dockerfile that exposes your model as a FastAPI service:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8080", "--workers", "2"]
Build multi-architecture images for mixed x86-64/ARM fleets using docker buildx.
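A production model-serving Deployment on Kubernetes with resource limits and a rolling update strategy:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anomaly-detector
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  # (the label key and value here are illustrative).
  selector:
    matchLabels:
      app: anomaly-detector
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod during a rollout
      maxUnavailable: 0  # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: anomaly-detector
    spec:
      containers:
        - name: model
          image: worksprout/anomaly-detector:v1.3.2
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5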
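To see what the rolling update settings mean in practice, it helps to work out the pod counts they allow. The helper below is an illustrative sketch (the function names are hypothetical): it computes the maximum number of pods that can exist and the minimum that must stay available during a rollout. It follows the Kubernetes rule that a percentage `maxSurge` is rounded up and a percentage `maxUnavailable` is rounded down.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def _resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str):
        scaled = int(value.rstrip("%")) * replicas
        # Integer arithmetic avoids floating-point surprises when rounding.
        return -(-scaled // 100) if round_up else scaled // 100
    return value


def rollout_bounds(
    replicas: int, max_surge: IntOrPercent, max_unavailable: IntOrPercent
) -> Tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable


print(rollout_bounds(3, 1, 0))  # (4, 3)
```

With the manifest's values, a rollout never runs more than four pods and never has fewer than three available, so the cluster needs headroom for one extra pod at the configured resource requests.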
Models degrade as the data distribution shifts. Automate retraining with a pipeline that triggers when enough new data has accumulated or when a monitored performance metric falls below a threshold.
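A trigger of this kind can be reduced to a small decision function that a scheduler calls periodically. This is a minimal sketch under stated assumptions: `RetrainPolicy`, `retrain_reason`, the choice of F1 as the monitored metric, and both threshold values are hypothetical and would need tuning for a real workload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrainPolicy:
    # Both thresholds are illustrative defaults, not recommendations.
    min_new_samples: int = 10_000
    min_f1: float = 0.85


def retrain_reason(
    new_samples: int,
    current_f1: float,
    policy: RetrainPolicy = RetrainPolicy(),
) -> Optional[str]:
    """Return why retraining should start, or None if no trigger fired."""
    # A performance drop takes priority over data volume.
    if current_f1 < policy.min_f1:
        return "performance"
    if new_samples >= policy.min_new_samples:
        return "data_volume"
    return None


print(retrain_reason(new_samples=500, current_f1=0.91))     # None
print(retrain_reason(new_samples=12_000, current_f1=0.91))  # data_volume
print(retrain_reason(new_samples=500, current_f1=0.80))     # performance
```

Returning the reason rather than a bare boolean lets the pipeline log it alongside the training run, which makes it easier to see later why a given model version was produced.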
Instrument your model serving code to emit Prometheus metrics:
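from prometheus_client import Counter, Histogram, start_http_server

prediction_counter = Counter("predictions_total", "Total predictions", ["outcome"])
latency_histogram = Histogram("prediction_latency_seconds", "Inference latency")

@latency_histogram.time()
def predict(features):
    result = model.predict(features)
    # model.predict returns one label per input row; count each label separately
    # so the "outcome" label keeps a small, fixed set of values.
    for label in result:
        prediction_counter.labels(outcome=str(label)).inc()
    return result

# Call start_http_server(<port>) once per process to expose /metrics;
# multi-worker servers need prometheus_client's multiprocess mode instead.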
Build a Grafana dashboard that tracks prediction volume, latency percentiles (p50, p95, p99), and model accuracy against ground-truth labels collected asynchronously.
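Because ground-truth labels arrive after the prediction has been served, accuracy has to be computed by matching the two later. The class below is a sketch of that bookkeeping, assuming each prediction carries a request ID that the labelling process can reference; `DelayedAccuracy` and its method names are hypothetical, and a production version would also need to expire predictions that never receive a label.

```python
from typing import Dict, Hashable, Optional


class DelayedAccuracy:
    """Match predictions with ground-truth labels that arrive later."""

    def __init__(self) -> None:
        self._pending: Dict[Hashable, int] = {}  # request_id -> predicted label
        self._correct = 0
        self._labelled = 0

    def record_prediction(self, request_id: Hashable, predicted: int) -> None:
        self._pending[request_id] = predicted

    def record_label(self, request_id: Hashable, actual: int) -> None:
        predicted = self._pending.pop(request_id, None)
        if predicted is None:
            return  # label for an unknown or already-scored request
        self._labelled += 1
        self._correct += int(predicted == actual)

    def accuracy(self) -> Optional[float]:
        # None until at least one prediction has been labelled.
        return self._correct / self._labelled if self._labelled else None


tracker = DelayedAccuracy()
tracker.record_prediction("req-1", 1)
tracker.record_prediction("req-2", 0)
tracker.record_label("req-1", 1)  # correct
tracker.record_label("req-2", 1)  # incorrect
print(tracker.accuracy())  # 0.5
```

The resulting value could be published through a `prometheus_client` `Gauge` so that it appears on the same dashboard as the latency and volume panels.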
MLOps is not a luxury for large teams — it is the engineering discipline that turns a prototype model into a maintainable production system. MLflow, Docker, and Kubernetes provide all the primitives needed to build a complete pipeline. Invest in it before your first production deployment, not after your first production incident.