
CESNET datasets are transforming cybersecurity research

Researchers from the CESNET Association and Faculty of Information Technology CTU Prague. Credits to CESNET.

Researchers from CESNET and the Czech Technical University in Prague have released unique datasets of real network traffic. These were created within the research project “Flow-based Encrypted Traffic Analysis” (FETA), funded by the Ministry of the Interior of the Czech Republic. Publication of these datasets represents a major step forward, enabling cutting‑edge research on traffic analysis and cyberthreat detection. Two large datasets gained international recognition through articles published in Nature Scientific Data. 

Monitoring network traffic is essential for keeping the internet reliable and secure. It enables early threat detection, prevents outages, and optimizes infrastructure usage. Developing AI models for identifying threats requires high‑quality, realistic datasets. 

A year‑long view of network traffic

The CESNET‑TLS‑Year22 dataset captures a full year of anonymized activity across the CESNET infrastructure. Such an extensive, publicly accessible dataset is globally unique. Long‑term collection on high‑speed networks is technically demanding and expensive, which is why most public datasets cover only a few days. Such a short span prevents realistic testing of algorithms and can lead to overestimated accuracy.

A year‑long dataset allows researchers to study data drift—changes in traffic over time that make models obsolete. As researcher Karel Hynek explains: “Machine learning models often rely on training data from the past. For example, if traffic behavior changes due to new attacks or services, detection accuracy can degrade.” 
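The effect of data drift can be illustrated with a small synthetic sketch. It does not use the CESNET‑TLS‑Year22 data or any CESNET tool: the two traffic classes, the single numeric feature, and all the numbers are assumptions chosen only for illustration. A threshold classifier is fitted on an early period and then evaluated both on data from the same period and on a later period in which one class has changed its behavior.

```python
import random
from statistics import mean

def make_flows(rng, mean_a, mean_b, n=500, std=50.0):
    """Return (feature, label) pairs for two synthetic traffic classes.

    The feature is a hypothetical per-flow statistic (for example, mean
    packet size); the values are made up for illustration.
    """
    flows = [(rng.gauss(mean_a, std), "A") for _ in range(n)]
    flows += [(rng.gauss(mean_b, std), "B") for _ in range(n)]
    return flows

def fit_threshold(flows):
    """Learn a decision threshold halfway between the two class means."""
    mean_a = mean(x for x, label in flows if label == "A")
    mean_b = mean(x for x, label in flows if label == "B")
    return (mean_a + mean_b) / 2

def accuracy(threshold, flows):
    """Share of flows classified correctly (assumes class A lies below B)."""
    correct = sum(("A" if x < threshold else "B") == label for x, label in flows)
    return correct / len(flows)

rng = random.Random(0)                        # fixed seed for repeatability
train = make_flows(rng, 200, 800)             # early period used for training
same_period = make_flows(rng, 200, 800)       # test data from the same period
later_period = make_flows(rng, 600, 800)      # class A has drifted upward

threshold = fit_threshold(train)
acc_same = accuracy(threshold, same_period)
acc_later = accuracy(threshold, later_period)

print(f"accuracy on same period:  {acc_same:.2f}")
print(f"accuracy on later period: {acc_later:.2f}")
```

A test set drawn from the same short window as the training data hides this degradation, which is one reason a dataset spanning many months is useful for evaluation.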

Evolution of the transferred data captured in the dataset. Credits to CESNET.

The largest dataset for anomaly detection and prediction

The CESNET‑TimeSeries24 dataset was created for research in network traffic forecasting and anomaly detection. It contains over 800,000 anonymized time series from real network traffic of computers, servers, and network devices inside the CESNET infrastructure. According to researcher Josef Koumar: “The dataset reflects real traffic, making not only development but also rigorous testing of algorithms for anomaly detection possible. Such anomalies may reveal malicious actions, configuration errors, or other operational issues.” 
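As a rough illustration of what anomaly detection on a traffic time series involves, the sketch below flags outliers using the modified z-score based on the median and the median absolute deviation (MAD). This is a generic textbook technique, not the method used by the CESNET researchers, and the hourly values are invented rather than taken from CESNET‑TimeSeries24.

```python
from statistics import median

def mad_anomalies(series, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a common rule of thumb.
    """
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [
        i for i, x in enumerate(series)
        if 0.6745 * abs(x - med) / mad > threshold
    ]

# Hypothetical hourly traffic volumes with one spike and one drop to zero.
traffic = [100, 104, 98, 101, 97, 103, 99, 102,
           100, 950, 101, 96, 105, 99, 0, 100]

anomalies = mad_anomalies(traffic)
print(anomalies)  # [9, 14]
```

In this toy series the spike and the drop to zero are both flagged. In real traffic, deciding whether such a point reflects an attack, a misconfiguration, or an outage requires further context.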

Types of anomalies in network traffic captured in the dataset. Credits to CESNET.

Tools for data‑driven research

Working with massive datasets is both complex and time‑consuming. The team therefore developed the CESNET DataZoo and CESNET TS‑Zoo tools, which provide easy access to the full datasets or to properly sampled subsets. The tools also support data processing, letting researchers focus on methodology rather than technical issues. Both the datasets and the tools serve as valuable benchmarks for algorithm comparison and reproducible research.

Project results

The FETA project was carried out by CESNET, the Czech Technical University in Prague, and Brno University of Technology. It produced public datasets, open‑source tools, innovative methods for dataset evaluation, and many research papers on advances in machine learning, detection, and encrypted traffic analysis. The overall goal was to move research from the lab into real‑world settings. This aim has been successfully achieved.

Read more about the dataset

Open-source library for working with datasets: github.com/CESNET/cesnet-tszoo/tree/main and https://cesnet.github.io/cesnet-tszoo/ 


This article is featured on CONNECT50, the latest issue of the GÉANT CONNECT Magazine!
