Handling Security Incidents in eduGAIN

The Duplicate Identifier Attribute Issue and What to Learn From It

Author: Lukas Hämmerle, SWITCH

eduGAIN interconnects identity federations around the world. It is used by 40 federations to allow more than 2,000 identity providers and over 1,400 service providers to securely support authentication and authorisation decisions. As a fully distributed system, what challenges does it face in handling large-scale incidents?

On 17 November 2016, an email from ORCID to the REFEDS mailing list put federation operators around the world on alert. What followed in the next days was one of the first major incidents since eduGAIN started operating on 27 April 2011. This blog post summarises the events and also serves as a high-level debriefing. It was written with contributions from, and approved by, the involved federation operators (InCommon, SIR, SURFconext) as well as ORCID.

What is ORCID and what is its relation to eduGAIN?

ORCID is a non-profit organisation that provides researchers with a unique digital identifier that can be used to associate their work with their ORCID identity. More than 2.8 million users worldwide already have an ORCID account. Since May 2016, ORCID has also allowed users to link an organisational identity to their ORCID identity, which benefits both the ORCID user and ORCID. The ORCID linking service is thus one of more than 1,400 eduGAIN service providers (SPs) and has users from 577 identity providers (around a quarter of the IdPs in eduGAIN). ORCID identifies users by an identifier attribute: the user’s presumably unique eduPersonTargetedID, the SAML persistent NameID, or the eduPersonUniqueID.

What was the issue and how was it discovered?

During a mid-November workshop at a Spanish university, PhD students were given an overview of ORCID and were shown how to link their federated institutional account to their ORCID account. When the first users started to link their university identity to their newly created ORCID identity, they noticed that they were logged in on the ORCID platform as another user. They could not only see but potentially also modify this other user’s ORCID identity data. This is a situation that should, of course, never happen. ORCID support learned about the issue by email from at least one user who reported this strange behaviour.

What was ORCID’s reaction?

ORCID staff first tried to understand what had happened and why some users had gained the identity of a different person. They found that only users from one organisation were affected, and the cause was soon identified: users from a Spanish university apparently all had the same eduPersonTargetedID attribute value. By specification, this attribute is supposed to contain a unique value for each user of an identity provider. While investigating, ORCID discovered through another user complaint that a second SAML identity provider was affected by a similar issue. In the case of this second identity provider, which was published in eduGAIN via the US InCommon federation, it was the SAML persistent NameID that contained duplicate values for users from the same identity provider. Given how close together the two events were, and how difficult it was to discover which other identity providers might be affected, ORCID disabled the service until further investigation could be done and controls could be put in place.
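Why a duplicate identifier leads straight to someone else's account can be shown with a minimal sketch of an SP-side linking table. It assumes, hypothetically, that the SP stores links keyed by the pair (IdP entityID, identifier value); the class, method names and entityID below are illustrative and are not ORCID's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class LinkingService:
    """Hypothetical SP-side linking table, not ORCID's actual code.

    Maps (IdP entityID, identifier value) to the local account it was linked to.
    """

    links: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def link(self, idp: str, identifier: str, account: str) -> None:
        """Record that this federated identifier belongs to `account`."""
        self.links[(idp, identifier)] = account

    def sign_in(self, idp: str, identifier: str) -> Optional[str]:
        """Return the account linked to this identifier, or None if unlinked."""
        return self.links.get((idp, identifier))


if __name__ == "__main__":
    service = LinkingService()
    idp = "https://idp.example.org/idp"  # hypothetical entityID
    shared = "same-value-for-everyone"   # what a misconfigured IdP would send

    service.link(idp, shared, "alice-account")  # Alice links first
    # Bob authenticates at the same IdP and receives the same identifier,
    # so the lookup hands him Alice's account.
    print(service.sign_in(idp, shared))  # alice-account
```

The SP does nothing wrong here: it trusts that the identifier is unique per user, which is exactly the guarantee the misconfigured IdPs broke.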

What was the impact of going public?

Soon after ORCID made the issue public and announced that federated login via eduGAIN was disabled on its service, many federation operators became aware of the issue for the first time. While the public announcement helped raise awareness, it also led to parallel communication on different channels. In some cases this confused the involved parties about particular aspects of the issue, because overall coordination was lacking. Ann Harding, activity leader of Trust and Identity developments in the EU-funded GÉANT project that created eduGAIN, therefore appointed the author of this blog post to coordinate the efforts to investigate and resolve the issue. As a first step, a small group of representatives from ORCID, SIR, InCommon, SURFconext and eduGAIN was formed to gather more details, share information and deal with the after-effects.

What were the immediate measures taken to resolve this incident?

The SIR federation operator removed the metadata of the affected IdP from the eduGAIN metadata file. The InCommon IdP immediately disabled its integration with ORCID. In both cases the reaction time was less than 24 hours. These measures effectively stopped duplicate identifier attributes from spreading until further information could be gathered. Based on log files and statements from the two IdP operators, it could be determined that the InCommon IdP released persistent NameIDs only to ORCID, so ORCID was the only SP this IdP could have affected. For the SIR IdP, however, the configuration meant that SPs other than ORCID could also have received duplicate identifier values.

What was the wider impact? Were there after-effects?

In the case of the SIR IdP, logs indicated that users had accessed 14 eduGAIN SPs in the two months before the incident was discovered. However, only 3 of them requested the eduPersonTargetedID attribute, and one of these three was ORCID. The Spanish federation operator informed the other two SPs about the issue and asked them to check for signs of duplicate identifier attributes. One of these SPs reported no indication that the error had caused it any problems. The other SP had three users from the Spanish IdP, but all of them had different eduPersonTargetedID values. Therefore, it is very likely that ORCID was the only service provider affected by the issue.
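An SP asked to "check for signs of duplicate identifier attributes" could run something like the following minimal sketch over its own logs. It assumes, hypothetically, that the SP logged a second attribute per login (here `mail`) that differs between real users; one identifier seen with several mail values is a signal worth investigating, not proof, since a user's mail address can legitimately change. The log format and sample data are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log record: (IdP entityID, eduPersonTargetedID, mail).
Record = Tuple[str, str, str]


def suspicious_identifiers(records: Iterable[Record]) -> Dict[Tuple[str, str], Set[str]]:
    """Return (IdP, identifier) pairs that were seen with more than one mail value."""
    mails_per_id: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for idp, targeted_id, mail in records:
        mails_per_id[(idp, targeted_id)].add(mail)
    # Keep only identifiers shared by apparently different people.
    return {key: mails for key, mails in mails_per_id.items() if len(mails) > 1}


if __name__ == "__main__":
    log = [
        ("https://idp.example.es", "tid-1", "ana@example.es"),
        ("https://idp.example.es", "tid-2", "luis@example.es"),
        ("https://idp.example.es", "tid-3", "eva@example.es"),
    ]
    print(suspicious_identifiers(log))  # {} (three users, three distinct values)
```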

Why were the affected IdPs not announced in public?

A few operators of service providers requested on the REFEDS list that the affected IdPs be made public, because they wanted to verify for themselves whether the issue might also have affected their services. However, in the case of the US IdP it was very plausible that ORCID was the only service affected by the duplicate persistent NameID values it released. For the Spanish IdP, more SPs could potentially have received duplicate eduPersonTargetedIDs. However, because the Spanish SIR federation is a hub-and-spoke federation with a central hub that logs all transactions, the logs showed which SPs actually could have received duplicate eduPersonTargetedIDs, so SIR as the federation operator could identify and notify these SPs individually. Furthermore, publicly identifying possibly vulnerable IdPs so soon after the incident could have caused further problems and invited attackers to try to abuse the situation.
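The hub-log analysis described above boils down to a filter: of all SPs that users of the affected IdP visited, which ones actually received the faulty attribute? The following is a minimal sketch assuming a hypothetical hub log where each entry records the IdP, the SP and the names of the attributes released in that transaction; SIR's real log format is not described here.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class HubLogEntry(NamedTuple):
    """Hypothetical hub log record for one SAML transaction."""
    idp: str
    sp: str
    released: FrozenSet[str]  # names of attributes released to the SP


def exposed_sps(log: Iterable[HubLogEntry], affected_idp: str,
                attribute: str = "eduPersonTargetedID") -> Set[str]:
    """Return the SPs that actually received `attribute` from `affected_idp`."""
    return {e.sp for e in log if e.idp == affected_idp and attribute in e.released}


if __name__ == "__main__":
    idp = "https://idp.example.es"
    log = [
        HubLogEntry(idp, "https://sp1.example.org", frozenset({"eduPersonTargetedID", "mail"})),
        HubLogEntry(idp, "https://sp2.example.org", frozenset({"mail"})),
        HubLogEntry("https://other-idp.example.es", "https://sp3.example.org",
                    frozenset({"eduPersonTargetedID"})),
    ]
    print(exposed_sps(log, idp))  # {'https://sp1.example.org'}
```

Visiting an SP is not enough to be exposed; only SPs that were sent the duplicated attribute need to be contacted, which is why a central transaction log made targeted notification possible.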

What was the actual cause of the issue?

At first it was unclear whether this issue was caused by a malicious attacker (somebody impersonating another person by compromising an IdP and spoofing an identifier), a generic software problem in one of the SAML implementations, or a simple configuration error. It soon became clear that in all cases, configuration errors were responsible for the duplicate identifier attributes.
One of the affected IdPs uses IBM’s Tivoli Federated Identity Manager and the other uses adAS SSO; neither is widely used in eduGAIN. In both cases, a software upgrade combined with a lack of SAML know-how led to an unintentional misconfiguration. This misconfiguration caused the SAML identity provider to generate the eduPersonTargetedID/SAML persistent NameID values by hashing a value that was empty for all users. As a result, all users of each IdP received the same eduPersonTargetedID/SAML persistent NameID value. Testing for such an error is non-trivial, because at least two user accounts are needed to test this case properly.
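The failure mode can be reproduced in a few lines. This is a minimal sketch of a generic salted-hash identifier scheme (SHA-256 over SP entityID, a source attribute and a salt); it is an assumption for illustration and not the actual algorithm used by Tivoli Federated Identity Manager or adAS SSO. The point is that when the source attribute silently resolves to an empty string, every user gets the same hash, and a test with a single account cannot notice.

```python
import base64
import hashlib

# Hypothetical values, for illustration only.
SP_ENTITY_ID = "https://sp.example.org/shibboleth"
SALT = "per-idp-secret-salt"


def computed_id(sp_entity_id: str, source_value: str, salt: str) -> str:
    """Derive a pairwise identifier by hashing SP entityID, source value and salt."""
    material = f"{sp_entity_id}!{source_value}!{salt}".encode("utf-8")
    return base64.b64encode(hashlib.sha256(material).digest()).decode("ascii")


users = [{"uid": "alice"}, {"uid": "bob"}]

# Correct configuration: the source attribute exists, so the values differ.
good = {computed_id(SP_ENTITY_ID, u["uid"], SALT) for u in users}

# Misconfiguration: the IdP reads an attribute name that no longer exists
# (say, after an upgrade) and silently falls back to an empty string.
bad = {computed_id(SP_ENTITY_ID, u.get("employeeID", ""), SALT) for u in users}

print(len(good), len(bad))  # 2 1
```

With one test account, `good` and `bad` both contain exactly one value and look equally healthy; the collision only shows up once a second account is used.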

Once the SAML implementations of the two affected IdPs were known, the federation operators also asked other known users of these two SAML identity provider implementations to check whether the issue could also affect them. In the case of the Spanish federation, one additional affected IdP was identified. In the case of InCommon, another IdP was identified.

What else was done in the context of this issue?

After the actual root causes had been identified, developers of several SAML identity provider implementations were informed, so that they could ensure their software would not generate non-unique identifier attribute values by hashing empty values. In the case of Shibboleth, the most widely used SAML identity provider in eduGAIN, the code already prevented situations like this, according to its developers. In the case of SimpleSAMLphp, code has been added to achieve the same, and a security advisory was published at https://simplesamlphp.org/security/201612-04. However, no software can prevent all configuration errors. Even though safeguards have been or are being added against some issues, it will always be possible to misconfigure an IdP.
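The kind of safeguard described above can be sketched as a guard that refuses to derive an identifier from empty input. This is a hypothetical illustration of the idea, not the code that Shibboleth or SimpleSAMLphp actually ship; the function and exception names are invented for this example.

```python
import hashlib
from typing import Optional


class EmptySourceAttributeError(ValueError):
    """Raised instead of hashing a missing or empty source attribute."""


def safe_computed_id(sp_entity_id: str, source_value: Optional[str], salt: str) -> str:
    """Hash-based identifier that refuses to derive a value from empty input."""
    if source_value is None or not source_value.strip():
        # Failing loudly means the login errors out at the IdP instead of
        # silently sharing one identifier among all users.
        raise EmptySourceAttributeError(
            f"source attribute empty; refusing to compute identifier for {sp_entity_id}"
        )
    material = f"{sp_entity_id}!{source_value}!{salt}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


if __name__ == "__main__":
    try:
        safe_computed_id("https://sp.example.org", "", "salt")
    except EmptySourceAttributeError as err:
        print("rejected:", err)
```

Raising is one design choice; an implementation could instead omit the attribute from the assertion and log a warning. Either way, the empty value never turns into an identifier that every user shares.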

And what about ORCID?

ORCID temporarily disabled federated login immediately after making the issue public. Soon afterwards, federated login was reactivated with additional controls and logging to help detect this type of error. Linking was restricted to identity providers that could demonstrate that they do not generate non-unique identifier attributes. This is checked with a test in which at least two users must access the ORCID service with different identifier values.
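A per-IdP gate of this kind could look roughly like the following sketch. It assumes, hypothetically, that the SP knows which of its own accounts started each linking attempt (the user is already logged in to the SP before authenticating at the IdP). An IdP is allowed once at least two different accounts have presented two different identifiers, and is blocked if two different accounts ever present the same identifier. This is an illustration of the idea, not ORCID's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class IdPUniquenessCheck:
    """Hypothetical per-IdP gate for account linking (not ORCID's code)."""

    min_distinct: int = 2
    # idp -> {identifier value -> local account that presented it}
    owners: Dict[str, Dict[str, str]] = field(default_factory=dict)
    failed: Set[str] = field(default_factory=set)

    def observe(self, idp: str, identifier: str, account: str) -> None:
        """Record that `account` authenticated at `idp` and received `identifier`."""
        seen = self.owners.setdefault(idp, {})
        previous = seen.get(identifier)
        if previous is not None and previous != account:
            # Two different accounts presented the same identifier.
            self.failed.add(idp)
        elif previous is None:
            seen[identifier] = account

    def linking_allowed(self, idp: str) -> bool:
        """Allow linking only after enough distinct accounts sent distinct identifiers."""
        if idp in self.failed:
            return False
        seen = self.owners.get(idp, {})
        return (len(seen) >= self.min_distinct
                and len(set(seen.values())) >= self.min_distinct)


if __name__ == "__main__":
    check = IdPUniquenessCheck()
    check.observe("https://idp.example.edu", "nameid-1", "account-a")
    check.observe("https://idp.example.edu", "nameid-1", "account-b")
    print(check.linking_allowed("https://idp.example.edu"))  # False
```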

Were there other IdPs identified which released duplicate identifiers?

Yes. Three weeks after the issue went public, ORCID informed InCommon about another IdP that had failed its test and released duplicate persistent SAML NameID values. This IdP was identified thanks to the linking checks that ORCID had added to its service. After being contacted by InCommon, this third affected IdP disabled its integration with all eduGAIN SPs that requested a persistent NameID until the issue could be corrected. This happened in less than six hours. The cause in this third case was also a wrong identity provider configuration (SimpleSAMLphp). A full fix for this third IdP was applied and verified with ORCID’s help in less than two days. Log analysis showed that no SP other than ORCID was affected by the faulty configuration.

What went well dealing with this issue?

Even though no formal rules or policies were established at the time of the incident (none of the affected IdPs or SPs claimed to support SIRTFI), the reaction time of the federation operators can be considered quite good. For reference, the NIST 800-63 Electronic Authentication Guideline states that credential tokens should be revoked within 72 hours of being notified that they have been compromised. The affected IdPs were stopped from spreading duplicate identifiers in less than a day. The Spanish and the US federation operators cooperated well to resolve the issue by providing log data and other information. Each of them also held debriefing meetings with the administrators of the affected IdPs and actively identified other IdPs that could potentially have been affected.

What were the gaps and shortcomings?

Dealing with an incident of this kind was new for the community, partly because eduGAIN had been spared such incidents in previous years. Had the federation operators not responded as quickly and cooperatively as the Spanish and US operators did, it is unspecified what the eduGAIN community could have done. More generally, the responsibilities of the involved parties (SPs, IdPs, federation operators, eduGAIN operators) were unclear, because they had not been consistently defined or documented as best practices at this level of detail.

Federation operators apparently found it difficult to get in touch with the right person at the identity providers to resolve the issue, and ORCID staff confirmed the same experience. A security contact for identity providers published in the metadata would have helped. For ORCID, determining the relevant contacts at the federation operators was even more difficult.

What should be done in the future?

At an eduGAIN steering group meeting on 14 December, it was proposed to have a security contact (a role-based email address) per federation. This would make it faster to get in touch with the security experts who know best how to deal with various incidents in the federation context.

GÉANT is currently working to establish an enhanced eduGAIN support team, which will also assist with eduGAIN-related incidents like the one above. However, the primary role of this team will be to coordinate incident management at the eduGAIN/interfederation level, similar to the coordination carried out by the author of this blog post. In this area, eduGAIN continues to depend on the cooperation of its member federations and of the administrators of the entities in these federations, since they have the most detailed access to information and systems.

What can federation operators do?

There are a few things that federation operators can do in order to facilitate incident management:

Prominently publish up-to-date contact and helpdesk information for your federation. Because eduGAIN participants generally do not have direct relationships with each other, having the right contacts to escalate to helps greatly when dealing with issues like the one above.
Become familiar with, support and advertise SIRTFI, or at least publish security contacts for IdPs.
Recommend that organisations use widely used and well-documented SAML software where possible.
Provide good documentation and testing instructions for identity provider administrators to help them provide a good service.
Use proactive measures and tools to discover issues that could cause security problems before they manifest themselves in production.
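As one example of a proactive check, a federation operator could regularly scan its metadata aggregate for IdPs that publish no security contact. The sketch below assumes the REFEDS convention used by SIRTFI, as we understand it: a `ContactPerson` with `contactType="other"` and a `remd:contactType` of `http://refeds.org/metadata/contactType/security`. It uses only Python's standard `xml.etree.ElementTree`, does not verify the metadata signature, and the sample entityIDs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
REMD = "http://refeds.org/metadata"
SECURITY = "http://refeds.org/metadata/contactType/security"


def idps_without_security_contact(metadata_xml: str) -> List[str]:
    """Return entityIDs of IdPs that have no REFEDS security ContactPerson."""
    root = ET.fromstring(metadata_xml)
    missing = []
    for entity in root.iter(f"{{{MD}}}EntityDescriptor"):
        if entity.find(f"{{{MD}}}IDPSSODescriptor") is None:
            continue  # only audit identity providers
        has_security = any(
            contact.get(f"{{{REMD}}}contactType") == SECURITY
            for contact in entity.findall(f"{{{MD}}}ContactPerson")
        )
        if not has_security:
            missing.append(entity.get("entityID"))
    return missing


SAMPLE = """<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    xmlns:remd="http://refeds.org/metadata">
  <md:EntityDescriptor entityID="https://idp1.example.org/idp">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
    <md:ContactPerson contactType="other"
        remd:contactType="http://refeds.org/metadata/contactType/security">
      <md:EmailAddress>mailto:security@example.org</md:EmailAddress>
    </md:ContactPerson>
  </md:EntityDescriptor>
  <md:EntityDescriptor entityID="https://idp2.example.org/idp">
    <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol"/>
  </md:EntityDescriptor>
</md:EntitiesDescriptor>"""

if __name__ == "__main__":
    print(idps_without_security_contact(SAMPLE))  # ['https://idp2.example.org/idp']
```

The resulting list gives operators a concrete set of IdP administrators to approach before an incident, rather than during one.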