Authoritative researcher metadata in one place via VIVO

Published Mar 18, 2013 in the Newsletter Issue: Research Data Management — 2013

Paul Albert of Weill Cornell Medical College (WCMC) describes his institution’s experience in implementing VIVO. Albert joined WCMC in 2007 as Digital Services Librarian and is currently the Assistant Director of Research and Digital Services. As the VIVO product manager, he assigns and prioritizes tasks, acquires data, promotes VIVO usage, and communicates with stakeholders.

Library Connect: Would you tell us about VIVO?

Paul Albert: VIVO is an institution-driven, semantic tool for accessing authoritative data about faculty and researchers. VIVO is open source with its own ontology. When institutions map their data in a semantic way, it enables coordination without cooperation.

Data in VIVO is not necessarily authoritative, but often is ingested from authoritative internal sources such as human resources, grants, demographics, classes, and faculty affairs, or authoritative external sources such as PubMed or NIH RePORTER. When data is updated in those sources, VIVO presents the most current data.

How did WCMC become involved in VIVO?

The National Institutes of Health put out a request for proposal for a researcher profile system to reduce redundancies and improve cooperation among researchers. Working in collaboration with our New York City-based Clinical and Translational Science Center (CTSC), WCMC was asked to participate in this grant. Our existing researcher profile system was becoming a bit long in the tooth, so we enthusiastically agreed.

What role has the WCMC library played in implementing VIVO?

At WCMC, the library is part of Information Technologies & Services, though in the case of larger institutions, the library may have its own IT department. Libraries have a history of caring about data quality as well as the needs of their users. Because the library can play the role of visionary in how the data is used, it was natural in our case for us to take a leadership role.

What kind of human resources did it take to implement VIVO?

At any time, we average the equivalent of approximately three full-time positions working on the project. The most frequent contributors are a full-time programmer and a half-time product manager. Others involved include an information/data architect, principal investigator, project executive (someone who gets the landscape of all the source systems — he’d be able to answer “What system has a middle initial?” in a nuanced way), a server/maintenance person, and a user interface specialist.

What are your data sources?

That’s a question best answered visually. The flowchart in Figure 1 depicts our data sources.

Figure 1: Data Sources for the CTSC’s VIVO

In addition to data from WCMC, we’re negotiating for access to data from other institutions associated with the CTSC, such as Memorial Sloan-Kettering Cancer Center and Hospital for Special Surgery.

How has WCMC benefited from implementing VIVO?

Our top administrators expect to make decisions based on evidence. Because VIVO aggregates data from across institutional silos, we can easily access and parse that data. For example, we can provide a monthly report of Weill Cornell-authored articles appearing in journals of a certain ranking, or determine which researchers have published the most in a given list of journals in the last five years.
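
As a rough illustration of the second kind of report, the Python sketch below counts publications per researcher in a given journal list over the last five years. The record fields (`author`, `journal`, `year`), the function name, and the sample data are hypothetical. They do not reflect VIVO's ontology or WCMC's actual data, and in practice the records would be drawn from VIVO itself.

```python
from collections import Counter


def top_publishers(publications, journals, current_year, window=5):
    """Count publications per author in `journals` within the last `window` years.

    Returns (author, count) pairs, most prolific first.
    """
    earliest = current_year - window + 1
    wanted = {j.lower() for j in journals}
    counts = Counter(
        pub["author"]
        for pub in publications
        if pub["journal"].lower() in wanted
        and earliest <= pub["year"] <= current_year
    )
    return counts.most_common()


# Hypothetical sample records, for illustration only.
sample = [
    {"author": "Researcher A", "journal": "Journal X", "year": 2012},
    {"author": "Researcher A", "journal": "Journal X", "year": 2010},
    {"author": "Researcher B", "journal": "Journal Y", "year": 2011},
    {"author": "Researcher B", "journal": "Journal Z", "year": 2012},  # journal not in the list
    {"author": "Researcher A", "journal": "Journal Y", "year": 2005},  # outside the window
]

ranking = top_publishers(sample, ["Journal X", "Journal Y"], current_year=2013)
print(ranking)
```

Because every source feeds the same aggregated records, a report like this needs only a filter and a count rather than a separate request to each silo.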

We aim to provide a 100 percent complete, accurate, and automatically populated list of each researcher’s journal articles from PubMed. Feedback from end users on that effort has been very positive so far.

Also, we take data from VIVO, particularly publications, and ingest it into a custom Drupal installation, which allows users to perform advanced visualizations.

What lessons did you learn from implementing VIVO at WCMC?

As early as possible, think about how you might take advantage of the data in VIVO to address your institution’s real-world needs: generating biosketches, making grant recommendations, reporting on faculty publication patterns, automatically filling out forms. Often, your best customers don’t articulate what they want. There was very little clamouring for iPhones before they came out, because people didn’t know they wanted them! Using your VIVO data creates a feedback loop that improves its accuracy and stokes enthusiasm among your key stakeholders.

It makes sense to have someone administering your project, such as a Chief Information Officer or Vice Chancellor of Research, who can ask for data and get a timely response. This is a person who can also allocate the staff resources necessary to tackle deep-rooted data problems. At WCMC, a given faculty member might have a phone number listed in up to seven sources!
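
To make the phone number problem concrete, here is a minimal Python sketch that groups the values listed in several sources by their normalized digits, so that disagreements stand out. The source names, the function names, and the numbers are all hypothetical; this is not how WCMC reconciles its data.

```python
import re
from collections import defaultdict


def normalize_phone(raw):
    """Strip everything except digits so formatting differences are ignored."""
    return re.sub(r"\D", "", raw)


def group_by_number(values_by_source):
    """Map each distinct normalized number to the sources that list it."""
    by_number = defaultdict(list)
    for source, raw in values_by_source.items():
        by_number[normalize_phone(raw)].append(source)
    return dict(by_number)


# Hypothetical listings for one faculty member, for illustration only.
listed = {
    "hr": "(212) 555-0100",
    "directory": "212-555-0100",
    "faculty_affairs": "212.555.0199",
}

groups = group_by_number(listed)
has_conflict = len(groups) > 1  # more than one distinct number means the sources disagree
print(groups, has_conflict)
```

A check like this only surfaces the disagreement. Deciding which source is authoritative still takes someone with the standing to ask for the data and get a response.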

You need to be able to sum up VIVO in a compelling elevator pitch such as “authoritative researcher metadata in one place.” Finally, I have found that strength in numbers can be important, e.g., walking into a meeting with three to four colleagues on my side of the table. Make sure to have your colleagues nod vigorously and say, “I totally agree” at key points in the presentation.