It’s a problem facing librarians around the globe: How do you ensure someone entitled to use your content can access it whenever, and wherever, they need it? With the growth of mobile devices and remote working, the task of identifying legitimate users has become increasingly complex.
RA21 (Resource Access for the 21st Century) aims to find a solution. This joint initiative by STM, the International Association of Scientific, Technical and Medical Publishers, and NISO, the US-based National Information Standards Organization, is currently exploring alternatives to traditional access and identity management tools.
For this article, Library Connect Editor Colleen DeLory interviewed Todd Carpenter, Executive Director of NISO. Todd helped us to understand the rationale behind the project so we could present these questions and answers.
How does access work right now and why is it a problem?
On campus: Every device connected to your institution’s computer network is assigned a number known as an IP (internet protocol) address. Sets of these addresses are passed on to a content provider whenever you subscribe to its content. When someone requests access, the provider checks to see if it recognizes the IP address. If it does, the request is approved. However, this isn’t always foolproof, as Todd explains: “You could be a student on campus on your iPad. You could suddenly move from using the campus wifi to a cellular provider because the wifi signal drops. As a result, the content provider no longer recognizes your IP address and access is broken.”
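The IP check described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual system: the address ranges are reserved documentation blocks standing in for a campus network, and the names `REGISTERED_RANGES` and `is_recognized` are made up for the example.

```python
import ipaddress

# Hypothetical ranges a library might register with a content provider.
# These are reserved documentation blocks, not real campus networks.
REGISTERED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_recognized(client_ip: str) -> bool:
    """Return True if the requesting IP address falls inside a registered range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in REGISTERED_RANGES)


# The same student, first on campus wifi and then on a cellular connection.
on_campus_wifi = is_recognized("192.0.2.57")     # inside a registered range
cellular_network = is_recognized("203.0.113.9")  # outside every registered range
```

The check knows nothing about the person making the request, only the network they happen to be on, which is why access breaks the moment the device changes networks.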
Off campus: This is where it gets really tricky: users aren’t using an on-campus device, so they don’t have a recognized IP address. To get around this problem, many institutions use a proxy server, which checks the user’s credentials on the content provider’s behalf. If the proxy server can confirm you are from University X, then it will display an IP address from University X to the content provider and your access request will be approved. However, the proxy server can only recognize that you are from University X if you are clicking on a “proxy-enabled” URL, i.e., a link that contains a string of characters unique to your institution. This works fine if you click on a link on your institutional website, as these are usually proxy-enabled. But what if you click on a Google search result, or a link someone has emailed you? This is known as the “where are you from” problem, or WAYF for short.
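To make the idea of a “proxy-enabled” URL concrete, here is a rough sketch in Python. The proxy address and the `login?url=` pattern are assumptions chosen for illustration; real proxy products and institutions use their own formats, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlparse

# Hypothetical prefix unique to University X's proxy server.
PROXY_PREFIX = "https://proxy.university-x.example/login?url="


def proxy_enable(target_url: str) -> str:
    """Wrap a publisher link so that it is routed through the institution's proxy."""
    return PROXY_PREFIX + quote(target_url, safe="")


def is_proxy_enabled(url: str) -> bool:
    """Return True only for links that carry the institution's proxy prefix."""
    return url.startswith(PROXY_PREFIX)


def original_target(proxied_url: str) -> str:
    """Recover the publisher link that the proxy should fetch for the user."""
    return parse_qs(urlparse(proxied_url).query)["url"][0]


article = "https://publisher.example/article/123"
library_link = proxy_enable(article)  # what the library website would publish
emailed_link = article                # what a colleague might send you
```

A link taken from the library website passes `is_proxy_enabled`, while the same article reached through a search result or an email does not, so the proxy never gets the chance to vouch for the user.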
According to Todd, proxy servers bring other disadvantages: “If someone takes an action that the server considers threatening, such as downloading an unusually large volume of content, or using compromised credentials, the content provider can’t identify that individual, so it removes access for everyone by shutting down the proxy server’s access until the problem is solved. Users are often confused about how or why they need to use a proxy server for off-campus access. Another big concern for the librarian community is that users who are unable to access content via their institutions’ systems often turn to other channels instead. As a result, libraries aren’t serving the needs of their patrons, and they can’t accurately track usage of their own content or even assess the size of the problem.”
Where does RA21 come in?
In a nutshell, RA21 wants to provide users with a simple, seamless, customizable and secure way to access scholarly information resources. Most of us log into multiple websites in our daily lives, and publishers and libraries should aim to make the login systems as similar to those experiences as possible. Signing onto library resources should be as easy as logging into Facebook or Google. Todd and his colleagues believe this means finding an alternative to IP-based authentication.
They have chosen to build upon the widespread adoption of federated authentication systems by institutions. In order to pool resources, many institutions use identity federations to manage the exchange of credentials for their thousands of users. These federations provide a single point of contact for the hundreds of publishers that each one deals with, which allows a user to have a single sign-on (or login) that accesses those hundreds of services. Instead of having a separate user name and password for each site, the user has a single ID and password.
The federated authentication system RA21 is proposing is based on an open standard called SAML (security assertion markup language) that has existed for around 20 years.
Todd explains: “Essentially SAML is a structure for describing how information is exchanged about the rights that allow someone to access something. It’s a simple messaging protocol. We have chosen it because it has the ability to protect privacy and allow the user and their institution to decide what personal information, if any, is released to the content provider.
“SAML works well when there is a one-on-one relationship in terms of access, i.e., one page and access to one service. The problem in the library context is that users are coming from multiple locations — for example, Google, citation links or CrossRef — and they can go anywhere that offers published content. That many-to-many relationship is where the system breaks down. This is the ‘where are you from’ or WAYF problem, and it’s the problem we are trying to help the user navigate. We are asking users to provide their institution’s identity provider, which is something they probably don’t know, not their institution, which they do know.”
RA21 is currently running three pilots. The first focuses on the pharmaceutical industry and its corporate libraries. Todd says: “The RA21 project actually started with that community. In 2016, a group of pharmaceutical librarians approached STM about the problems they were having with authenticated access, as they have patrons and staff all over the world. This pilot explores creating an identity provider specifically for their industry.”
The other two pilots are exploring different ways in which a user’s institutional login credentials can be used to solve the WAYF problem. Todd explains: “It is looking at how information can be stored, both in the network (better information about the institutions and what domain names might they be using) and on the user’s computer. For example, ‘I’ve used MIT in the past, therefore take me to the MIT login page.’ It’s like a cookie, but not using that technology.”
The third pilot is a web-based system that stores information in the cloud about institutions and the identity providers they use, as well as information about devices the patron has used in the past, while also using a cloud-based system to pass people from one system to the next.
Todd says: “We have another group looking at how we can improve the user experience on the pilots and with single sign-on systems. So, while there is a technological component to RA21, we also need user interface design guidance to navigate people through these systems.”
What RA21 will not do is build a specific technical solution or an industry-wide authentication platform; its goal is to use these pilots to test various alternatives and then make a recommendation to the industry. And this recommendation will not take the form of a single portal, says Todd. “If you are trying to get access to Elsevier ScienceDirect, for example, you will still need to log in via your institution. The goal is that you won’t have to log in 25 times, just once and the system will know and retain your credentials. It will work in a similar way to the apps on your phone or when logging into Facebook; as long as you don’t delete the app or reset your browser, the system will know who you are. Every once in a while — say, if you update the app — the connection is broken and you have to log in again. That said, those settings will be controlled by the institution. Some institutions, e.g., the Department of Defense, might require you to log in every half hour. Other institutions might set up an access control window of maybe a semester.”
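The institution-controlled login window Todd describes could be modeled roughly as follows. This is only a sketch: the institution names are placeholders, and 120 days is an assumed stand-in for “a semester,” not a figure from RA21.

```python
from datetime import datetime, timedelta

# Hypothetical access control windows, each chosen by the institution itself.
ACCESS_WINDOWS = {
    "high-security-agency.example": timedelta(minutes=30),
    "university-x.example": timedelta(days=120),  # assumed length of a semester
}


def must_log_in_again(institution: str, last_login: datetime, now: datetime) -> bool:
    """Return True once the institution's access control window has elapsed."""
    return now - last_login >= ACCESS_WINDOWS[institution]


last_login = datetime(2018, 1, 15, 9, 0)
one_hour_later = last_login + timedelta(hours=1)
```

With these settings, a user from the high-security institution is asked to log in again after an hour away, while the university user is not.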
What will be the time investment for librarians and how will they benefit?
According to Todd, most institutions already have SAML-based access control for other systems or services, such as Shibboleth, Dropbox or OpenAthens, just not for access to library materials. “Institutional IT departments are probably very familiar with it. One of the challenges is libraries will need to improve communications between the library and the institution’s IT department.”
He adds: “Is RA21 going to take a lot of library resources? Probably not. Will it take time and cultural shifts? Yes. Historically, IP authentication has been the best way the library has avoided talking to the IT department.”
On the plus side, the RA21 website lists a series of benefits for librarians, including:
- No more time-consuming monitoring and updating of IP address information to multiple resource providers
- Reduced barriers to off-site access for users, allowing you to maximize the use of the resources your library has purchased
- Improved granular reporting of usage to libraries, while allowing you to protect the identities of your users
- The ability to work with publishers to easily and more quickly identify instances of illegal or fraudulent activity and undertake targeted resolution
- Reduced exposure to network perimeter intrusion and man-in-the-middle attacks on your IP addresses
What does it mean for the walk-in user at a library?
This will be up to the library, which may choose to use smart cards, one-time access codes or certificates installed on library workstations, or continue to use the IP address authorization for on-site usage. Libraries may also consider transitioning to a guest account service.
What about data privacy?
According to the RA21 website, the SAML technology used in the pilots has built-in mechanisms for preserving privacy, and users will have control over what information they share.
Existing SAML identifiers are typically unique and persistent, but opaque, so content providers can personalize services for users without knowing the actual identity of the individual.
“Most of RA21 technology is really flexible in terms of control by the institution,” says Todd. “Essentially, you are passing a token to the publisher that says this person has access. You can pass as much information with that token as you want. You could say, ‘Here is Todd Carpenter, he was born in July and he’s a PhD candidate in biochemistry.’ Or you could just say, ‘Here’s a token, let this person in.’ As an institution, this allows you to create usage profiles, for example, if you want to know how the faculty are using information resources or how freshmen use systems differently than seniors. You could include information about the user’s status on the token, but not their name.”
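The choice between a bare token and one that carries extra attributes can be illustrated with a small Python sketch. It is not real SAML, which is an XML-based standard, and it is not RA21's design. The secret, the field names and the keyed-hash approach to producing an opaque identifier are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret known only to the institution's identity provider.
INSTITUTION_SECRET = b"hypothetical-secret-held-by-the-institution"


def opaque_id(username: str, provider: str) -> str:
    """Derive an identifier that is stable per user and provider but hides the name."""
    message = f"{username}|{provider}".encode("utf-8")
    return hmac.new(INSTITUTION_SECRET, message, hashlib.sha256).hexdigest()


def build_token(user: dict, provider: str, release: tuple = ()) -> dict:
    """Build a token that says "let this person in" plus any attributes released."""
    token = {"entitled": True, "id": opaque_id(user["username"], provider)}
    for field in release:
        token[field] = user[field]
    return token


user = {
    "username": "tcarpenter",
    "name": "Todd Carpenter",
    "status": "PhD candidate",
    "department": "biochemistry",
}

minimal = build_token(user, "publisher.example")
profiled = build_token(user, "publisher.example", release=("status",))
```

Here `minimal` carries only the entitlement and an opaque identifier, while `profiled` also carries the user's status but still not their name. Because the identifier is the same on every visit, the publisher can personalize services without learning who the person is.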
What is the timeline for implementation?
Todd explains: “We are working on the pilots right now and hope to have the technology done by the end of the first quarter. We will do user testing and make a recommendation towards the third quarter (late summer). Then we’ll move into more of an adoption phase. If it’s going to be successful, you need hundreds and hundreds of publishers to set it up, thousands of libraries to start using it, and millions of users understanding what needs to happen. This is going to be a multi-year rollout. Once we have the standard in place, it will probably go through 12 to 18 months of tweaking and improvement and then five to 10 years of rollout.”
You can find out more about the pilots and sign up to keep in touch with developments on the RA21 website.
IP (internet protocol) address is a numerical label assigned to each device connected to a computer network on the Internet. IP addresses identify the device on the network and provide a communication location for interacting with the network. IP addresses are usually written and displayed in human-readable notation: most commonly as four numbers separated by dots (IPv4), such as 126.96.36.199, or, in the newer IPv6 format, as eight groups of hexadecimal digits separated by colons.
Proxy server is a computer system or application that acts as an intermediary for requests from clients (devices requesting information) seeking resources from other servers, such as a file, a connection, a webpage or other resource. A client device connects to the proxy server, requesting some service, and the proxy server evaluates the request, validates it and then manages the interaction. In libraries, proxy servers, such as EZproxy, are used to provide validated access to subscribed resources.
WAYF (where are you from) services are used in online identity management to guide a user to their identity provider. A WAYF service presents the user with a list of identity providers to which identity credentials can be sent, redirects the user's web browser to the selected identity provider, and then returns the user to the subscribed content.
Federated authentication system provides a single access point to multiple systems across different organizations that provide access control via a shared infrastructure. These systems gather identity data and then validate requests for access based on information about the user and the rights that an organization has to access materials.
SAML (security assertion markup language) is an open standard for exchanging authentication and authorization data between parties — in particular, between an identity provider (e.g., an identity federation, a library or institution) and a service provider (e.g., a publisher, software provider or website).
CrossRef is a digital object identifier (DOI) registration agency launched in early 2000 as a cooperative effort among publishers to enable persistent cross-publisher citation linking in online academic journals. CrossRef provides a registration and resolution system for identification of scholarly content.
Shibboleth is a single sign-on (login) system for computer networks and the internet. It allows people to sign in using just one identity to various systems run by federations of different organizations or institutions. It is one application of SAML technology for providing federated access to content among the members of a federation. The project is managed by the Shibboleth Consortium (www.shibboleth.net).
OpenAthens develops and supports identity and access management software for institutions and provides access to more than 2 million end users of library resources worldwide. OpenAthens is part of Eduserv (www.eduserv.org.uk/), a UK-based nonprofit organization.