A handbook to help research network users make an informed choice of public cloud services and avoid nasty surprises
Words: Claudia Battista, Director of GARR
In the world of research networks, connectivity and above-the-network services have always been designed, implemented, and managed together with users. One of the main purposes of research networks is to drive technological evolution while maintaining transparency, full control of the network, and the capacity to adapt to specific user requirements.
However, the range of ICT services offered by commercial providers has grown dramatically over the years, and today the adoption of applications and information services on public clouds has become common even in the scientific, academic, and cultural environment. A national research network has no choice but to acknowledge this trend and adopt policies that offer the best possible support to connected organisations choosing this option.
Yet it must be stressed that this choice has significant consequences and calls for attention to a number of technical, functional, and strategic aspects. Moving one or more services from the research domain to a public cloud can have an immediate impact on performance, as well as on our capacity to control our data and applications. Most importantly, in the medium to long term it can affect the technical and economic capacity to switch to solutions better suited to the user's needs, and ultimately limit the freedom to do so.
We are talking about digital sovereignty, which for us at GARR means having full control over the tools, technologies, and conditions for carrying out scientific research, but also understanding the context and having the technical skills needed to choose appropriately.
GARR aims to raise awareness and spread this culture as widely as possible, and to provide some key elements for making conscious decisions and obtaining the best conditions when opting for public cloud solutions. This is all the more important today, with the kick-off of dozens of projects under the EU-funded National Recovery and Resilience Plan, which will build new digital infrastructures or enhance existing ones, and whose performance, portability of data and applications, and sustainability must be guaranteed.
Public cloud providers: dangerous liaisons?
In recent years, GARR, like other research networks, has received many requests, direct or indirect, from cloud providers to configure dedicated links. Often, however, the proposed solutions are not in the user's best interest. Let's see why.
Cloud providers try to "get as close as possible" to the user's work environment (universities, research laboratories, etc.) by proposing cloud service delivery conditions that simulate a private cloud (e.g. Microsoft and AWS have "ExpressRoute" and "Direct Connect" in their portfolios). The technical configurations proposed allow users to logically extend their local network domain into the provider's data centre, e.g. through a Layer 2 network transport service, which the NREN would necessarily have to provide on its own infrastructure.
Such a request would heavily involve the NREN in the provision of the end-to-end service, but without giving it any visibility into the cloud provider's domains of competence, which are normally closed for commercial reasons. In this scenario, there is no way to set clear boundaries between the different management and control domains, and therefore no way to guarantee end-to-end service quality and reliability, from either a functional or a performance point of view.
Another caveat is that there are potentially hundreds of cloud providers, and satisfying requests from all of them would not scale. On the other hand, choosing to follow only the big players' specifications could lead to a polarisation of the market, a scenario that is far from desirable for the research community.
Asking the “right” questions
Because of these critical issues, GARR, like other research networks, believes that interconnection solutions such as those proposed by the two big players, which extend a user's domain into a public cloud outside the research network's perimeter of action, should be avoided. This does not necessarily mean that users who need to turn to public cloud providers for their research activities should give them up. However, they must do so consciously and be able to frame the scenario of these public or hybrid cloud proposals correctly, by asking some key questions, both of prospective suppliers and internally.
The first step is to know the architecture of the application that the user wishes to run in the cloud: is it a single-site or multi-site application? Which reliability mechanisms are foreseen? How much network capacity is required to access it, and what are the application's functional performance requirements? Where are the geographical sites hosting the service located? Which routing policies towards the NREN are adopted by the specific cloud provider or its connectivity supplier, and is it possible and willing to optimise or change them? Is the candidate provider willing to activate a direct peering with the NREN network?
A matter of (direct) peering
From a network point of view, direct peering is an element that guarantees the performance and interoperability of services. There are various ways to implement it, the most natural being to establish a peering connection within a NAP (Neutral Access Point). Depending on the needs and capacities involved, it is possible either to use the switching infrastructure shared by the different players present at the NAP or to create individual cross-connections. For example, GARR has had a direct 40 Gbps peering with Google for several years now, given the relevance of traffic exchange between the research community and Google.
It is therefore a good idea to include presence in one or more national NAPs among the mandatory requirements for qualifying as a cloud provider for research and universities. If this is not possible, candidate providers could declare their willingness to activate peerings at European NAPs, where the pan-European GÉANT backbone is available at very high capacities. In our opinion, the bandwidth capacity that the candidate provider has at the NAP should also be a qualifying requirement.
Whatever the preferred solution and the bandwidth available, having appropriate routing policies, or being willing to change them accordingly, should be a key requirement for any candidate provider of cloud services for the research community. It is not just a matter of ensuring performance, but also of protecting users from events beyond our control that can affect the services offered in a public or hybrid cloud. In particular, massive DDoS attacks can overload the upstream providers' links towards the global Internet, from which most attacks come. In the absence of suitable routing policies and dedicated peering, traffic between the research network and the provider shares those congested links, so DDoS attacks can seriously affect these services even when they are not the target, causing degraded performance or even making the service, or some of its functionalities, unavailable.
Avoiding vendor and data lock-in
The second aspect we need to be clear about when we opt for a public cloud solution is the data transfer model that applies if we need to switch providers. Due to contractual, economic, or technical reasons, or to the evolution of the organisation's requirements, the need may arise to move a service to a new provider without losing our data or applications. In a free market, cloud providers have no particular interest in facilitating such switches; on the contrary, without carefully formulated agreements they may make life difficult for users who want to leave them for another provider. It is therefore important to find out beforehand which guarantees are offered in this case.
Importing and exporting data to and from different public cloud providers is a cause of concern, and not only from an economic point of view. Large scientific collaborations can generate hundreds of petabytes of data per year, and many providers charge for moving data out of their cloud, a cost that can easily become unbearable. Moreover, the existing links between competing public cloud providers are often insufficient to ensure optimal conditions for massive data transfers, and there is no sign that providers will upgrade them soon. Besides the economic aspects, which are a predominant concern especially when a user wants to switch providers without losing their data assets, there are other concerns related to the efficiency and timeliness of such transfers; by far the most important, however, is interoperability.
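To give a sense of scale, here is a back-of-the-envelope estimate of how long it would take to move a dataset at the low end of "hundreds of petabytes" over a link with the same capacity as GARR's 40 Gbps peering with Google, and what a per-GB egress charge would add up to. It is a minimal sketch under stated assumptions: decimal units (1 PB = 10^15 bytes), a link that is fully and continuously available for the transfer, and a flat per-GB price. The 0.05 per GB rate and the function names are hypothetical, chosen only to illustrate the arithmetic; they do not reflect any provider's actual tariff.

```python
# Back-of-the-envelope estimate for moving a dataset out of a cloud provider.
# Assumptions (hypothetical, for illustration only): decimal units
# (1 PB = 10**15 bytes), the link is fully and continuously available for the
# transfer, and egress is billed at a flat per-GB rate. Real tariffs are
# usually tiered and differ between providers.

SECONDS_PER_DAY = 86_400


def transfer_days(dataset_pb: float, link_gbps: float) -> float:
    """Days needed to move dataset_pb petabytes over a link_gbps link at full utilisation."""
    bits = dataset_pb * 10**15 * 8          # petabytes -> bytes -> bits
    seconds = bits / (link_gbps * 10**9)    # link capacity in bits per second
    return seconds / SECONDS_PER_DAY


def egress_cost(dataset_pb: float, price_per_gb: float) -> float:
    """Total egress charge under a hypothetical flat per-GB price."""
    gigabytes = dataset_pb * 10**6          # 1 PB = 10**6 GB in decimal units
    return gigabytes * price_per_gb


if __name__ == "__main__":
    # 100 PB over a 40 Gbps link.
    print(f"{transfer_days(100, 40):.0f} days")
    # Hypothetical flat price of 0.05 per GB, used only to show the order of magnitude.
    print(f"{egress_cost(100, 0.05):,.0f} currency units")
```

Under these assumptions, a single 100 PB dataset would keep a dedicated 40 Gbps path busy for roughly 231 days, and the egress charge grows linearly with both the volume and the per-GB rate. Real transfers share links with other traffic and real tariffs vary, so actual figures will differ.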
Interoperability and collaboration
The last point we would like to draw attention to is interoperability in the framework of collaborations among organisations using different cloud infrastructures. Many international collaborations involve research infrastructures or organisations that may adopt different commercial providers for their services; in this case, ensuring that data and services are mutually accessible and usable is a key priority.
Under the pay-to-move-out model, such partnerships would end up paying two cloud service providers at the same time just to have their own data available. Frameworks such as OCRE (Open Clouds for Research Environments) include exemptions that could mitigate this problem from an economic point of view, but they are not sufficient to guarantee efficiency and interoperability, and there is a real risk that such arrangements undermine the FAIRness of scientific data, limiting their accessibility to researchers.
The international aspect is key because of the very nature of scientific endeavours, but also because the ecosystem of research networks is characterised by natively interoperable solutions in a transnational and intercontinental multi-domain environment. In addition to having greater control over the infrastructure, we, the research networks, have always agreed on and applied common policies for routing, access, traffic segregation, and quality of service (QoS) for certain types of traffic (e.g. real-time). Among NRENs, technological development is always jointly agreed and managed, so as to ensure the transparency and visibility of the network, but also to implement agreed changes when needed to optimise the performance of transnational applications and accommodate user needs.
In conclusion: even though resorting to a commercial cloud provider may look like a simple solution to our needs, all that glitters is not gold. If we plan to use these solutions for our research, a careful and informed assessment is needed, including from the point of view of our international scientific collaborations. This assessment must be made "by design", to understand whether the choices we are making are interoperable and compatible with the objectives we have set for our organisation and our collaborations.