Partner webinar: Modern research data engineering in the cloud

In this second webinar of the cloud series, Microsoft and LLPA continue their examination of data engineering in the cloud.

The second session moves from concepts into practice, focusing on the concrete engineering of research data pipelines. It covers the full journey from raw data ingestion through transformation, analysis, and handling of unstructured content — demonstrating how cloud-native tools support reproducible, auditable, and collaborative research workflows at scale.

A particular focus is on accessibility: Dataflow Gen2 enables researchers without engineering backgrounds to build automated ingestion pipelines through a visual, no-code interface, while Apache Spark notebooks provide the scalable compute environment that computational researchers will recognise. MLflow experiment tracking is presented as the digital equivalent of a lab notebook.

The session also addresses unstructured research content — PDFs, reports, and instrument logs that exist outside tidy tables — and closes with real-time data ingestion for sensor-based and IoT research scenarios.

Learning Outcomes

Build or describe a cloud-native research data ingestion pipeline using Dataflow Gen2 and Data Factory
Explain how MLflow experiment tracking supports computational reproducibility
Identify scenarios where Azure Content Understanding adds value for unstructured research documents
Describe real-time data ingestion via Eventstream for sensor-based or continuous monitoring research
Outline an end-to-end cloud architecture for a research data workflow from your own domain

Designed for academic researchers and research IT professionals, this session examines how Azure supports emerging research paradigms and how Microsoft Fabric enables the ingestion, governance, and analysis of diverse research datasets. Learn how cloud-based approaches support collaborative, transparent, and reproducible research practices.