Experiment Tracking

Business Value: Never lose an experiment again. Every training run, hyperparameter, metric, and artifact is automatically tracked and versioned — giving your team complete reproducibility across all ML work.

How It Works

The ML Platform includes built-in MLflow 3.x with per-workspace tracking servers. When you run ML code:

  • The MLflow tracking URI is pre-configured in your environment
  • Experiments are organized by project and run
  • Metrics are logged in real time and displayed in the MLflow UI (see the sketch after this list)
  • Artifacts are stored in high-performance storage (NFS or S3)
  • Models are registered in the model registry with versioning
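A minimal logging sketch, assuming only the pre-configured workspace tracking URI (the experiment name, parameter values, and file name are illustrative):

```python
import mlflow

# The workspace pre-configures the tracking URI, so no set_tracking_uri() call
# is needed. "demo-experiment" is a hypothetical experiment name.
mlflow.set_experiment("demo-experiment")

with mlflow.start_run(run_name="baseline"):
    # Hyperparameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Metrics, logged per step so they stream into the MLflow UI
    for epoch in range(3):
        mlflow.log_metric("loss", 1.0 / (epoch + 1), step=epoch)

    # Artifacts: any local file is uploaded to the run's artifact store (NFS/S3)
    with open("config.yaml", "w") as f:
        f.write("learning_rate: 0.01\n")
    mlflow.log_artifact("config.yaml")
```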

Technical Highlights

  • Per-workspace isolation with dedicated MLflow instances
  • Auto-logging for PyTorch, TensorFlow, HuggingFace, scikit-learn, and XGBoost
  • Model Registry with lifecycle stages (Staging, Production, Archived); see the registry sketch after this list
  • LLM tracing for capturing inputs, outputs, and latency
  • High availability backed by PostgreSQL, with artifact storage on NFS/S3
  • Full MLflow REST API for programmatic access
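A minimal registry sketch using the programmatic client, assuming the pre-configured tracking URI; the model name "demo-classifier" and the toy training data are hypothetical. It logs a small scikit-learn model, registers it, and promotes the new version to the Staging stage:

```python
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow.tracking import MlflowClient
from sklearn.linear_model import LogisticRegression

# Train and log a toy model so there is something to register.
with mlflow.start_run() as run:
    model = LogisticRegression().fit(np.array([[0.0], [1.0]]), np.array([0, 1]))
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the logged model under a hypothetical registry name, then
# promote the new version to the Staging lifecycle stage.
result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-classifier")

client = MlflowClient()
client.transition_model_version_stage(
    name="demo-classifier",
    version=result.version,
    stage="Staging",
)
```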

Auto-Logging Support

Framework            Auto-Logged
PyTorch              Loss, model architecture, optimizer config
TensorFlow/Keras     Metrics, model summary, callbacks
HuggingFace          Training args, metrics, model config
scikit-learn         Model params, metrics, artifacts
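As a minimal auto-logging sketch with scikit-learn (the run name is illustrative), enabling autolog captures parameters, metrics, and the fitted model without explicit log_* calls; mlflow.autolog() enables the same behavior across all supported frameworks at once:

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Enable framework-specific auto-logging; params, metrics, and the fitted
# model are recorded automatically for the run below.
mlflow.sklearn.autolog()

X, y = load_iris(return_X_y=True)

with mlflow.start_run(run_name="autolog-demo"):
    RandomForestClassifier(n_estimators=50).fit(X, y)
```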