Jan Jona Javoršek is head of the Network Infrastructure Centre at the Jožef Stefan Institute in Ljubljana, where he is involved in infrastructure development and advanced computing for research and academia. The Institute has been connected to the Slovenian NREN, Arnes, since 1993. Jona was involved in the creation of the national supercomputing network consortium SLING and the building of the national EOSC community, took part in several AI and supercomputing endeavours, and has participated in initiatives and infrastructures such as WLCG, EGI, EuroHPC, EOSC and the establishment of the Slovenian EuroHPC node, Vega EuroHPC.
We heard you were behind the 130 Gbps data increase from May to September 2022 in one of GÉANT’s backbone trunk links between Geneva and Milan and Arnes access ports to GÉANT. Can you tell us what happened and what you were working on?
In April, Vega EuroHPC, the new Slovenian national supercomputer and also the first EuroHPC peta-scale supercomputer, was deployed at IZUM, the Institute for Information Science in Maribor. Our design for Vega was innovative, since we wanted our supercomputer to have a large storage system, a pool of support machines for user project management and a broad connection to the GÉANT network to support large data tasks and work with large research collaborations that use GÉANT and understand federated access. Vega may well have been the first HPC machine to offer this kind of environment right from the start.
Prof. Andrej Filipčič and colleagues from our institute’s ATLAS group helped to convince the vendors to accept the design, but our team also helped to improve the architecture, optimise the Ceph storage and I/O subsystems and tune the system during the early test runs. So, ATLAS computing jobs were our natural testing load and became one of the first production allocations on the system. At the time, Vega without its accelerated NVIDIA partition nearly doubled the total computing power available to ATLAS, and its design with Arnes and GÉANT’s excellent NORDUnet connectivity allowed it to fully exploit that power. So, the 130 Gbps increase was Vega streaming the ATLAS data while also happily running our users’ jobs. (See David Cameron’s slides from the LHCOPN-LHCONE meeting #49 – CERN for details.)
And the network created by Arnes, GÉANT and NORDUnet in support of facilities like Vega EuroHPC as well as experiments like the ATLAS collaboration has been described as a “rock-solid, highly reliable building block”. What is the role of the supercomputer in the wider HPC community?
All HPC systems are at the forefront of technology, and each comes with unique challenges. For us, the large but steady and well-understood ATLAS load helped us to work through the initial issues and to fine-tune the machine, the storage, our software and the network. But for ATLAS, it meant a fully refreshed run-2 dataset with all the associated simulations just before the restart of the LHC!
In the wider HPC community, Vega held a special place since it was not just a national system, but also the first EuroHPC machine. It is designed for tasks in a wide variety of domains, and it supports many workloads: computational chemistry, computer vision, language models and genomics were among our early adopters. Our colleague Barbara Krašovec, who previously set up the HPC effort at Arnes and led the grid computing team, has been instrumental in establishing high standards at Vega and ensuring that user support means that we put an effort into scaling applications and pushing to get to the results in the time allotted. In this way, we are continuing a tradition that is part of our birthright: in Slovenia, supercomputing first took off in the 1990s at the Jožef Stefan Institute, where the National Supercomputing Centre was established under the direction of my predecessor Vladimir Alkalaj right next door to what later became Arnes. The same group was responsible for Slovenia’s extremely early web penetration, which promoted academic networking and computer science in schools. So, it was natural to build a national supercomputing infrastructure with Arnes and to also involve IZUM and our universities and research institutions.
What does the collaboration between GÉANT, NORDUnet and Arnes actually mean in practice for the users on the ground?
From the point of view of the Jožef Stefan Institute and our researchers, the close cooperation with Arnes and NORDUnet, as well as the access to initiatives and infrastructures within GÉANT, has become an indispensable part of our working environment. The ability to rely on them when working with our colleagues in large collaborations and across borders is a prerequisite for participating in key scientific challenges we face today. But the ideas of collaboration, federation and interdisciplinary research that GÉANT is built on, and that universities and research institutions understand, are a cornerstone of our ability to address such challenges. The collaboration with Arnes and NORDUnet is a prime example: building on efforts within the EGEE and EGI initiatives, as well as regional and national efforts, GÉANT enabled Arnes and NORDUnet to help us participate in a pan-Nordic effort to provide a regional Tier-2 storage facility for ATLAS. The expansion to Slovenia and other countries, made possible by this collaboration, was a great demonstration of the versatility and power of the GÉANT infrastructure. Arnes and NORDUnet worked directly with us as their users. It was not only a success, but also a proving ground for dedicated scientific VLAN usage in the context of big data pipelining to computer centres. It opened the door to a new approach where our users can rely on multiple 100 Gbps lines for data transfer between storage and computing in multinational research collaborations or within national networks.
What’s next for the Slovenian supercomputer?
Vega was built as part of our national HPC RIVR project. The aim was to upgrade regional and national high-performance computing capacities, and it’s been a great success, not only with Vega, but also with HPC Maister at the University of Maribor, the HPC RIVR project coordinator. The general interest has prompted Arnes to upgrade its HPC facility so that we have a modern system that is also available for training and education. But all this interest means that all our machines, not just Vega, are in great demand, while our staff is also busy working on projects to improve software and security, working on scalability, expanding federation support for HPC centres and large data repositories etc.
I have to mention MAX (MAterials design at the eXascale), the EuroHPC centre of excellence, in which our group is directly involved and which allows us first-hand experience of running advanced software on different machines and architectures. We are also preparing the infrastructure for a successor to Vega, but at the moment we are spending more time on our work within the Leonardo consortium, supporting HPC Leonardo, one of the most powerful machines in the world.
In the long term, however, our efforts to expand the national network computing federation are probably more important. Based on our Arnes AAI federation and the initiatives of eduGAIN, eID and AARC, we are working on building a federation with EuroHPC systems. Based on our experience with GÉANT and the Nordic federations, we hope to enable a similar collaboration in supercomputing and large data repositories. But we need to address new challenges, such as advanced software support, security, hyperconnectivity and communication with user communities – and for all of these issues GÉANT is key.
Personally, I believe that GÉANT could provide us with the context in which to develop better, simpler tools for researchers and students. EuroHPC aims to build 1 Tbps hyper-connected supercomputers, but we also need to build the user communities that will be able to make use of these new systems. So, in the end, what we need to address is the challenge of academic networking in every sense of the word.