Parallel and Scalable Hyperparameter Optimization for Distributed Deep Learning Methods on High-Performance Computing Systems
Morris Riedel - Professor - Head of National Competence for HPC & AI.
Helmut Wolfram Neukirchen - Professor of Computer Science & Software Engineering at the University of Iceland.
Andreas Lintermann - Leader of the Simulation and Data Lab "Highly Scalable Fluids & Solids Engineering" and Coordinator of the European Center of Excellence in Exascale Computing CoE RAISE.
The design of Deep Learning (DL) models is a complex task, involving decisions on the general architecture of the model (e.g., the number of layers or neurons per layer) and on the optimizer level (e.g., learning rate or batch size). These so-called hyperparameters significantly influence the performance of the final DL model and are, therefore, of great importance. However, optimizing these hyperparameters is a resource-intensive process, because many combinations must be evaluated to identify the best-performing ones. This Ph.D. project aims at leveraging the power of High-Performance Computing (HPC) systems to perform efficient Hyperparameter Optimization (HPO) for DL models trained on massive datasets. On HPC systems equipped with a large number of Graphics Processing Units (GPUs), it becomes possible not only to evaluate multiple models with different hyperparameter combinations in parallel, but also to distribute the training of the models themselves across multiple GPUs. While state-of-the-art HPO methods, based on the concepts of early stopping or evolutionary optimization, have demonstrated significant reductions in the runtime of the HPO process, their performance has yet to be evaluated at scale. Additionally, the use of novel compute processors, such as Quantum Annealers, to efficiently perform HPO is explored. The insights of the Ph.D. research are applied to and validated on use cases from different scientific domains within the European Center of Excellence in Exascale Computing “Research on AI- and Simulation-Based Engineering at Exascale” (CoE RAISE), which connects data- and compute-driven use cases from academia and industry with scalable machine learning methods.
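To make the early-stopping idea mentioned in the abstract concrete, here is a minimal sketch of successive halving, one common early-stopping scheme for HPO. It is an illustration only and not the method used in the Ph.D. project: the function `toy_validation_loss`, the learning-rate search range, and the budget parameters `min_epochs` and `reduction` are all assumptions chosen for this example, and the toy loss stands in for actually training a DL model.

```python
import math
import random


def toy_validation_loss(learning_rate, epochs):
    """Hypothetical stand-in for training a DL model and measuring its loss.

    Assumption for this sketch: the loss is lowest near learning_rate = 1e-2
    and shrinks as more epochs are spent on training.
    """
    distance = abs(math.log10(learning_rate) + 2.0)
    return distance + 1.0 / epochs


def successive_halving(configs, min_epochs=1, reduction=2):
    """Train all configs on a small budget, keep the best fraction, repeat."""
    survivors = list(configs)
    epochs = min_epochs
    while len(survivors) > 1:
        # On an HPC system, the evaluations in this round are independent and
        # could run in parallel, each on its own GPU or group of GPUs.
        ranked = sorted(survivors, key=lambda lr: toy_validation_loss(lr, epochs))
        # Stop the worse-performing configurations early.
        survivors = ranked[: max(1, len(ranked) // reduction)]
        # Give the remaining configurations a larger training budget.
        epochs *= reduction
    return survivors[0]


rng = random.Random(0)
# Sample 16 candidate learning rates log-uniformly from [1e-5, 1).
candidates = [10 ** rng.uniform(-5, 0) for _ in range(16)]
best_lr = successive_halving(candidates)
print(f"selected learning rate: {best_lr:.2e}")
```

In this sketch only the final surviving configuration receives the largest budget, which is where the runtime savings of early-stopping methods come from. The two levels of parallelism described in the abstract map onto the loop: configurations within a round can be evaluated concurrently, and each individual training run could itself be distributed across several GPUs.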
You can download Marcel's presentation here