Two graduate students at Aarhus University in Denmark have contributed to international research by developing tools for training neural networks on LUMI.
Words: Marie Charllotte Søbye, DeiC
Andreas Larsen Engholm and Jesper Strøm can’t help smiling when it is pointed out during the interview that, as students, they have already made a significant contribution to research. Nevertheless, it is true: the back-end tools these two young men have developed to train selected sleep scoring models on the LUMI supercomputer will be freely available on GitLab in a user-friendly form. These tools will make it easier for future researchers to load more sleep data, and more data leads to better sleep scoring models and more accurate interpretations of sleep data, because the neural networks become increasingly skilled at automatically reading sleep stages correctly.
Machine Learning with Big Data in Sleep Research
In the field of sleep scoring, a ‘gold standard’ exists: a sleep expert, following a manual, determines the sleep stage of the individual sleeper for every 30-second epoch of a night’s recording. It is a task that practically begs for automation: can an analysis model be created that replicates what the sleep expert would have answered? The students’ task was to train neural networks to perform sleep scoring on 20,000 PSG (polysomnography) recordings and to see the impact of working with such a large dataset, in other words, how well the neural networks could be trained.
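To make the task concrete, the sketch below shows in broad strokes what automatic sleep scoring amounts to: a network receives one 30-second epoch of PSG signal and outputs a score for each sleep stage. The channel count, sampling rate, and tiny architecture are illustrative assumptions, not the model the students actually trained.

```python
# Minimal sketch of epoch-based sleep-stage classification (illustrative only).
import torch
import torch.nn as nn

N_CHANNELS = 2                     # e.g. two EEG derivations (assumption)
SAMPLE_RATE = 128                  # Hz (assumption)
EPOCH_SAMPLES = 30 * SAMPLE_RATE   # one 30-second scoring epoch
N_STAGES = 5                       # Wake, N1, N2, N3, REM

class EpochClassifier(nn.Module):
    """Tiny 1D CNN mapping one PSG epoch to sleep-stage scores."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(N_CHANNELS, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, N_STAGES)

    def forward(self, x):          # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = EpochClassifier()
epochs = torch.randn(8, N_CHANNELS, EPOCH_SAMPLES)  # a batch of dummy epochs
stage_logits = model(epochs)                        # shape (8, 5): one score per stage
```

Training such a model means showing it millions of expert-labelled epochs, which is exactly why the amount of data, and the machinery to load it, matters so much.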
The Work Begins: Normalising 21 Datasets Takes Time
A significant part of the work in this project involved programming the back end so that it could load 20,000 nights’ worth of data, the combined size of the 21 datasets, in a sensible way.
“The major work of normalising all the data for our models was actually what took the longest time, and that pre-processing pipeline is now accessible to other researchers and students, making it much easier to load dataset number 22. We emphasised finding a sustainable, scalable solution that could be used by others in the future,” Andreas Engholm says.
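As a rough illustration of what such a pre-processing step involves, the sketch below resamples each PSG channel to a common sampling rate, normalises it, and stores the night in a shared file format. The target rate, scaling, and HDF5 layout are assumptions made for the example; the actual pipeline is the one published on GitLab (see Resources).

```python
# Loose sketch of a dataset-normalisation step (not the GitLab pipeline itself).
import numpy as np
import h5py
from scipy.signal import resample_poly

TARGET_RATE = 128  # Hz, common sampling rate across datasets (assumption)

def normalise_channel(signal: np.ndarray, original_rate: int) -> np.ndarray:
    """Resample one PSG channel to TARGET_RATE and z-score it per recording."""
    resampled = resample_poly(signal, TARGET_RATE, original_rate)
    return (resampled - resampled.mean()) / (resampled.std() + 1e-8)

def write_night(path: str, channels: dict, original_rate: int, stages: np.ndarray):
    """Store one normalised night and its 30-second stage labels in HDF5."""
    with h5py.File(path, "w") as f:
        for name, signal in channels.items():
            f.create_dataset(name, data=normalise_channel(signal, original_rate))
        f.create_dataset("hypnogram", data=stages)
```

Repeating this kind of step consistently across 21 datasets with different formats, channel names, and sampling rates is the unglamorous work the students describe as taking the longest time.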
Without LUMI, we probably would have abandoned the project
The project utilised a total of 3500 GPU hours. Had a single GPU done the work around the clock, it would have taken roughly 145 days, longer than the entire four-month thesis period.
“In reality, we probably would have abandoned the project if we didn’t have access to LUMI. We would have had to move data back and forth because there wasn’t enough space, making it very inconvenient,” Jesper Strøm explains.
Fact Box
- When: April to June 2023
- Allocation: 5000 Terabyte hours, 3500 GPU hours on LUMI via DeiC’s “Sandbox”
- Solution: Software designed to run on parallel GPU nodes and temporary storage of up to 50 TB of data (see the multi-GPU sketch after this fact box)
- Student: Andreas Larsen Engholm, M.Sc. Computer Engineering, AU
- Student: Jesper Strøm, M.Sc. Computer Engineering, AU
- Advisor: Kaare Mikkelsen, Assistant Professor, Biomedical Technology, Department of Electrical and Computer Engineering, AU
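As an indication of what “running on parallel GPU nodes” looks like in practice, here is a generic PyTorch DistributedDataParallel sketch with a placeholder model. It is an assumption-laden illustration, not the project’s training code; on LUMI such a script would typically be launched through SLURM (see the SLURM Learning link under Resources).

```python
# Illustrative multi-GPU training step with DistributedDataParallel.
# The launcher (e.g. torchrun under a SLURM job) sets the environment
# variables that init_process_group and LOCAL_RANK rely on.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by the launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 5).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Placeholder batch; in practice a DistributedSampler would shard the
    # pre-processed sleep epochs across the participating GPUs.
    x = torch.randn(32, 128).cuda(local_rank)
    y = torch.randint(0, 5, (32,)).cuda(local_rank)

    optimiser.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()                  # gradients are averaged across GPUs
    optimiser.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```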
Resources
- LUMI supercomputer: https://www.lumi-supercomputer.eu
- Apply for resources on LUMI: https://lumi-supercomputer.eu/
- HPC/LUMI Sandbox: https://www.deic.dk/en/Supercomputing/Instructions-and-Guides/Access-to-HPC-Sandbox
- SLURM Learning: https://www.deic.dk/en/news/2022-11-21/virtual-slurm-learning-environment-ready
- Cotainr for LUMI: https://www.deic.dk/en/news/2023-9-20/cotainr-tool-should-make-it
- GitLab tools developed for pre-processing of sleep data on LUMI: https://gitlab.au.dk/tech_ear-eeg/common-sleep-data-pipeline
This article is featured in CONNECT 44, the latest issue of the GÉANT CONNECT Magazine!
Read or download the full magazine here