Hyperparameter Tuning in AWS SageMaker
Machine learning models have two kinds of parameters: weights learned from data, and hyperparameters set before training. Finding the right combination of the latter is what separates converging models from failing ones.
Executive Summary
Every machine learning model has two distinct types of parameters:
- Weights — learned automatically from data during training
- Hyperparameters — set manually before training begins
Hyperparameter tuning is the systematic process of finding the optimal combination of these external settings. Poorly chosen hyperparameters can cause convergence failures, extreme overfitting, or complete model instability.
Four Search Strategies
1. Grid Search
Exhaustively tests every combination across a predefined discrete parameter grid.
Pros: Guaranteed to find the best point within the defined grid. Cons: Computationally expensive, since the number of runs grows exponentially with the number of parameters. Five parameters with five candidate values each means 5^5 = 3,125 training runs.
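The exhaustive enumeration can be sketched in a few lines of plain Python. The parameter names and the scoring function below are illustrative stand-ins, not SageMaker APIs; a real run would replace `train_and_score` with an actual training job.

```python
from itertools import product

# Hypothetical discrete grid (names and values are illustrative)
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
    "dropout": [0.1, 0.3],
}

def train_and_score(params):
    # Stand-in for a real training run: the score peaks
    # near learning_rate=0.01 with low dropout
    return -abs(params["learning_rate"] - 0.01) - params["dropout"]

names = list(grid)
combos = [dict(zip(names, values)) for values in product(*grid.values())]
best = max(combos, key=train_and_score)
# 3 * 2 * 2 = 12 exhaustive runs; every combination is trained once
```

Note how even this tiny three-parameter grid already requires 12 runs; the count multiplies with every parameter and every value added.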
2. Random Search
Samples random combinations within defined continuous parameter ranges rather than testing discrete points.
Pros: Often discovers high-quality solutions faster than grid search. Cons: No guarantee of optimality; results vary between runs.
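A minimal sketch of random search over continuous ranges, again with illustrative names and a stand-in objective. Sampling the learning rate log-uniformly is a common convention, not a requirement:

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical continuous ranges (illustrative names)
ranges = {
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.0, 0.5),
}

def sample(ranges):
    # Learning rate sampled log-uniformly, dropout uniformly
    lo, hi = ranges["learning_rate"]
    lr = 10 ** random.uniform(math.log10(lo), math.log10(hi))
    dr = random.uniform(*ranges["dropout"])
    return {"learning_rate": lr, "dropout": dr}

def train_and_score(p):
    # Stand-in for a real training run
    return -abs(p["learning_rate"] - 0.01) - p["dropout"]

trials = [sample(ranges) for _ in range(20)]
best = max(trials, key=train_and_score)
```

Unlike grid search, the budget (20 trials here) is fixed up front regardless of how many parameters there are, which is why random search scales better to large spaces.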
3. Bayesian Optimization
AWS SageMaker's default tuning strategy. Builds a probabilistic model of the objective function from previous trial results and concentrates search near promising regions.
```python
import sagemaker

# `estimator` is a previously configured SageMaker Estimator;
# `hyperparameter_ranges` maps parameter names to SageMaker range objects
tuner = sagemaker.tuner.HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",
    max_jobs=20,
    max_parallel_jobs=3,
)
```
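One possible shape for the `hyperparameter_ranges` argument, using SageMaker's parameter range classes. The parameter names and bounds here are illustrative assumptions, not prescribed values:

```python
from sagemaker.tuner import ContinuousParameter, IntegerParameter

# Illustrative ranges; a log scale is typical for learning rates
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-4, 1e-1, scaling_type="Logarithmic"),
    "batch_size": IntegerParameter(32, 256),
}
```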
4. Hyperband
A resource-efficient early-stopping strategy. Starts many training runs in parallel, then progressively eliminates underperformers based on intermediate results.
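The elimination mechanic at Hyperband's core can be sketched as successive halving in plain Python: train everything briefly, keep the top half, double the budget, and repeat. The scoring function is a stand-in for intermediate validation metrics, and this sketch omits Hyperband's multiple-bracket structure:

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def score_after(config, budget):
    # Stand-in for validation accuracy observed after `budget` epochs
    return config["quality"] * min(1.0, budget / 10)

# Start many candidate configurations with a tiny training budget
configs = [{"id": i, "quality": random.random()} for i in range(16)]
all_configs = configs
budget = 1

# Successive halving: keep the top half, double the budget, repeat
while len(configs) > 1:
    configs.sort(key=lambda c: score_after(c, budget), reverse=True)
    configs = configs[: len(configs) // 2]
    budget *= 2

winner = configs[0]
```

Only the surviving configurations ever receive large budgets, which is why this family of strategies is so much cheaper than training every candidate to completion.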
Choosing a Strategy
| Strategy | Best For | Compute Cost |
|---|---|---|
| Grid Search | Small, discrete search spaces | High |
| Random Search | Large continuous spaces, quick exploration | Medium |
| Bayesian Optimization | Production tuning, limited budget | Low (efficient) |
| Hyperband | Large-scale experiments with early stopping | Very Low |
For most production ML workloads on AWS SageMaker, Bayesian Optimization delivers the best return on compute investment.
Key Takeaways
- Core Concept: machine-learning
- Difficulty: Intermediate/Advanced
- Author: Gökçe Akçıl (Senior AI/ML Engineer)