Full article originally published on Science|Business
Unlike many valuable resources, real-time data is both abundant and growing rapidly. But it also needs to be handled with great care.
That was one of the key takeaways from an online workshop produced by Science|Business’ Data Rules group, which explored what the rapid growth in real-time data means for artificial intelligence (AI). Real-time data is increasingly feeding machine learning systems that then adjust the algorithms they use to make decisions, such as which news item to display on your screen or which product to recommend.
“With AI, especially, you want to make sure that the data that you have is consistent, replicable and also valid,” noted Chris Atherton, senior research engagement officer at GÉANT, who described how data captured by the European Space Agency’s satellites is transmitted to researchers across the world via the GÉANT network. He explained that the images of Earth taken by satellites are initially processed at three levels to correct for the atmospheric conditions at the time, the angle of the viewpoint and other variables, before being made more widely available for researchers and users to process further. The satellite data is also “validated against ground-based sources…in-situ data to make sure that it is actually giving you a reliable reading,” Atherton added.
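To make that kind of validation concrete, the sketch below compares satellite-derived readings against co-located ground-station measurements and computes two standard agreement metrics, mean bias and root-mean-square error. The numbers and variable names are illustrative assumptions for this example, not part of ESA’s actual validation pipeline.

```python
import numpy as np

# Hypothetical co-located readings: satellite-derived values and
# ground-station (in-situ) measurements for the same places and times.
satellite = np.array([288.1, 290.4, 285.9, 291.2, 287.5])  # kelvin
in_situ   = np.array([287.8, 290.9, 286.3, 290.8, 287.9])  # kelvin

# Mean bias: systematic over- or under-estimation by the satellite product.
bias = np.mean(satellite - in_situ)

# RMSE: overall disagreement, penalising large individual errors.
rmse = np.sqrt(np.mean((satellite - in_situ) ** 2))

print(f"bias: {bias:+.2f} K, RMSE: {rmse:.2f} K")
```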
Depending on the orbit of the satellites and the equipment involved, this processing can take anywhere from a few hours to a few days before the data is made available to the wider public. One way to speed things up post-publication, Atherton noted, is to place the pre-processed data into so-called data cubes, which can then be integrated with AI systems. “You can send queries to the data cube itself rather than having to download the data directly to your own location to process it on your machine,” he explained.
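As an illustration of that query-in-place pattern, here is a minimal sketch using the openEO Python client, one widely used interface for querying Earth-observation data cubes. The backend URL, collection name, bands and extents are assumptions for the example; the point is that the computation (a mean NDVI composite) is expressed as a query and executed server-side, so only the small result is transferred.

```python
import openeo

# Connect to an openEO backend hosting Sentinel-2 data cubes
# (URL shown for the Copernicus Data Space; any openEO backend works).
connection = openeo.connect("https://openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

# Describe the slice of the cube we want: area, time window and bands.
cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.0, "south": 51.2, "east": 5.1, "north": 51.3},
    temporal_extent=["2023-06-01", "2023-06-30"],
    bands=["B04", "B08"],
)

# Build the computation as a query: NDVI per pixel, then a temporal mean.
red = cube.band("B04")
nir = cube.band("B08")
ndvi = (nir - red) / (nir + red)
composite = ndvi.reduce_dimension(dimension="t", reducer="mean")

# The query runs on the server; only the finished result is downloaded.
composite.download("ndvi_mean.tiff")
```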