Researchers from CESNET and the Czech Technical University in Prague have released unique datasets of real network traffic. The datasets were created within the research project “Flow-based Encrypted Traffic Analysis” (FETA), funded by the Ministry of the Interior of the Czech Republic. Their publication is a major step forward, enabling cutting‑edge research on traffic analysis and cyberthreat detection. Two of the large datasets gained international recognition through articles published in Scientific Data, a Nature Portfolio journal.
Monitoring network traffic is essential for keeping the internet reliable and secure. It enables early threat detection, prevents outages, and optimizes infrastructure usage. Developing AI models for identifying threats requires high‑quality, realistic datasets.
A year‑long view of network traffic
The CESNET‑TLS‑Year22 dataset captures a full year of anonymized traffic across the CESNET infrastructure. A publicly accessible dataset of this scale is unique worldwide. Long‑term collection on high‑speed networks is technically demanding and expensive, which is why most public datasets cover only a few days. Such a short span prevents realistic testing of algorithms and can lead to overestimated accuracy.
A year‑long dataset allows researchers to study data drift: changes in traffic over time that gradually make trained models obsolete. As researcher Karel Hynek explains: “Machine learning models often rely on training data from the past. If traffic behavior changes, for example due to new attacks or services, detection accuracy can degrade.”

The largest dataset for anomaly detection and prediction
The CESNET‑TimeSeries24 dataset was created for research on network traffic forecasting and anomaly detection. It contains over 800,000 anonymized time series from the real network traffic of computers, servers, and network devices within the CESNET infrastructure. According to researcher Josef Koumar: “The dataset reflects real traffic, which makes possible not only the development but also the rigorous testing of anomaly detection algorithms. Such anomalies may reveal malicious activity, configuration errors, or other operational issues.”

Tools for data‑driven research
Working with massive datasets is complex and time‑consuming. The team therefore developed the CESNET DataZoo and CESNET TS‑Zoo tools, which provide easy access to the full datasets or to properly sampled subsets. The tools also handle data processing, letting researchers focus on methodology rather than technical issues. Together, the datasets and tools serve as valuable benchmarks for comparing algorithms and for reproducible research.
Project results
The FETA project was carried out by CESNET, the Czech Technical University in Prague, and Brno University of Technology. It produced public datasets, open‑source tools, new methods for evaluating datasets, and many research papers advancing machine learning, detection, and encrypted traffic analysis. The overall goal was to move research from the lab into real‑world settings, and this aim has been achieved.
Open-source library for working with the datasets (source code and documentation): github.com/CESNET/cesnet-tszoo/tree/main and https://cesnet.github.io/cesnet-tszoo/
