Interviews Magazine

The importance of open science and open data in the fight against COVID-19 and the role of EMBL-EBI in supporting international research collaboration

Words: Rolf Apweiler, Director of EMBL-EBI, is interviewed by Karl Meyer, GÉANT

Rolf, thank you for your time. Could you explain the history and role of EMBL-EBI?

In the 1980s we began with the world’s first nucleotide sequence database: the EMBL Nucleotide Sequence Data Library at EMBL in Heidelberg, Germany. The original goal was to establish a central database of DNA sequences. What began as a modest task of abstracting information from scientific literature soon grew into a major database activity, with researchers submitting their data directly and an ever-increasing demand for highly skilled biologists and informaticians to manage it all. High-profile genome projects brought more attention to the work, and the commercial sector began to see the relevance of public data.

The core philosophy, then as now, was that all data should be open and accessible as long as credit is given to the originators. In a sense, this was a precursor to the concepts of Open Science and FAIR data.

In 1992 plans were agreed to establish the EMBL-European Bioinformatics Institute (EMBL-EBI) and locate it on the Wellcome Trust Genome Campus in Hinxton, UK, where it would be in close proximity to the major sequencing efforts at the Wellcome Trust Sanger Institute. In September 1994, EMBL-EBI was firmly established in the UK.

Since then, EMBL-EBI has played a major part in the bioinformatics revolution.

So, what is bioinformatics?

Life-science experiments generate a flood of data every day, which is good news for researchers but poses practical challenges. The amount of data produced is often doubling faster than computer storage and processing power, and this rate seems to be increasing. Bioinformatics makes it possible to collect, store and add value to these data so that researchers in many fields can retrieve and analyse them efficiently. EMBL-EBI is one of very few places in the world with the capacity and expertise to fulfil this important task. We now provide the world’s most comprehensive range of molecular databases and offer an extensive user training programme. We also work with our international collaborators to share data, so that data uploaded in the USA or Asia is available to European users the next day, and European data is shared overnight in the same way. Last month we served over 1 billion data requests from 5 million IP addresses, so there is clearly a huge demand for data.

Of course, biomedical research is really in the headlines at the moment. How is EMBL-EBI helping with research into COVID-19?

EMBL-EBI has developed the COVID-19 Data Portal, which provides access to a range of data covering virus sequences, gene expression data, proteins and protein structures. In total, over 13,000 data sets from around the world are already accessible to researchers. We built the portal using the systems we had already developed for our deposition databases and the knowledgebases built on them, so we were able to produce it very quickly. https://www.covid19dataportal.org/

In many ways, what you’re doing in the biomedical sphere is very similar to the core concepts behind the European Open Science Cloud (EOSC). Do you see EOSC as the future of Open Science?

Certainly, within EMBL-EBI, we have always seen academia and industry as equal partners, and we clearly recognise the benefits of openness in scientific research. However, openness brings with it the need to manage access to data. For example, organisations are collecting gene sequences from patients with severe and mild symptoms of COVID-19 in an attempt to identify any factors influencing symptoms. This personal data, of course, needs secure federated access control, and our work within ELIXIR (https://elixir-europe.org/) has shown us the benefit of federated systems in supporting open science.

Where do you see the future for EMBL-EBI and Open Science in general?

In the biomedical area, I think we need more links to front-line healthcare providers so that we can use their data to inform research. In fields such as oncology this is already happening, but pandemics such as COVID-19 demand faster access to the data.

In general, we’re seeing research accelerate in relevant areas, with five-year timeframes now shrinking to closer to two years. Of course, this is very exciting, but it will raise many challenges for organisations around the world.

It certainly seems to be a very exciting time to be involved in bioinformatics. Thank you very much!

For more information on EMBL-EBI, visit: https://www.embl.org/

The COVID-19 data portal is available at: https://www.covid19dataportal.org/