# Experiment Tracking
**Business Value:** Never lose an experiment again. Every training run, hyperparameter, metric, and artifact is automatically tracked and versioned — giving your team complete reproducibility across all ML work.
## How It Works
The ML Platform includes built-in MLflow 3.x with per-workspace tracking servers. When you run ML code:
- The MLflow tracking URI is pre-configured in your environment
- Experiments are organized by project and run
- Metrics are logged in real time and displayed in the MLflow UI
- Artifacts are stored in high-performance storage (NFS or S3)
- Models are registered in the Model Registry with versioning
## Technical Highlights
- Per-workspace isolation with dedicated MLflow instances
- Auto-logging for PyTorch, TensorFlow, HuggingFace, scikit-learn, XGBoost
- Model Registry with lifecycle stages (Staging, Production, Archived)
- LLM tracing for capturing inputs, outputs, and latency
- High-availability backed by PostgreSQL with artifact storage on NFS/S3
- Full MLflow REST API for programmatic access
## Auto-Logging Support
| Framework | What's Auto-Logged |
|---|---|
| PyTorch | Loss, model architecture, optimizer config |
| TensorFlow/Keras | Metrics, model summary, callbacks |
| HuggingFace | Training args, metrics, model config |
| scikit-learn | Model params, metrics, artifacts |
| XGBoost | Model params, metrics, feature importance |