Training deep neural networks is fundamentally an optimization problem. We start with a randomly initialized model and iteratively adjust its parameters to minimize a loss function. The choice of optimization algorithm can dramatically affect both training speed and final model performance.