This article is the first in a three-part series. Sign up for the Library Connect RSS to be alerted immediately when parts 2 (bottlenecks) and 3 (ways forward) are published.
Introduction
Research data has always been at the core of much scientific research, though the primary conduit of scientific communication has been the peer-reviewed journal article. The article summarizes, synthesizes and interprets the raw data, and places the data in the context of theory, hypotheses and mechanisms. However, in its current form, the article alone does not provide sufficient detail about the data to facilitate integration within larger data contexts, or to allow for reconstruction of the experiment or alternative analyses, syntheses or interpretations.
The era of Big Data, launched by advances in computing power and analytic software (1), has propelled a fast-growing trend toward data-intensive research, creating strong demand for open data programs (2) and prompting influential studies that highlight the problems and challenges of current informal data practices (3,4). Many have argued that the value of the journal article will decrease (4) as the value of available research data increases over the next few years, and recommendations abound for what needs to be done (3,5,6,7,8).
As a result, there is increased pressure in the research community to make research data (raw and summarized) available, both linked to publications and deposited directly in open repositories, for preservation and use by other researchers. At present, in most scientific disciplines (genomics, astronomy and physics are notable exceptions), little research data is made available to other scientists. Reasons for the low participation by researchers include a fear of being scooped and a sense of a lack of rewards for storing and sharing data. In the big picture, many researchers feel they do not currently have proper incentives for sharing their research data compared with the long-term, career-related incentives (i.e., tenure) of having articles published. In the day-to-day picture, many fear that sharing data will be a time sink and are not clear on what funding body mandates require. Presently many systems and tools are in place to store research data in domain-specific, institutional, local and global repositories. However, no coordinated set of practices or even instructions exists to enable the majority of researchers to incorporate effective modes of research data management into their workflow.
Funding agencies are increasingly concerned with improving the reproducibility of research and allowing the public to hold scientists accountable for the results of their experiments. They are implementing policy statements to improve data storage, curation and sharing (9,10). It is not clear yet where the burden for compliance will ultimately land. At many research institutions, libraries, IT departments, and offices of research are increasingly preparing to meet that obligation, on behalf of and in collaboration with the institution’s scientists, engineers and scholars.
The goal of this article is to sketch a view of the various aspects involved in managing, describing, preserving and making research data available and accessible to appropriate audiences, and to propose a series of projects to tackle issues preventing their effective implementation. After sharing our views of the current state of research data management inside institutions, we propose pilot projects with a number of institutions to explore how to provide research data management designed for the needs of each specific institution. Each engagement will be unique, but together these projects can paint a landscape of needs and solutions. Each party can reference this landscape in determining how best to contribute value to the most effective and efficient solution, and how to jointly move forward.
A. Research data management in institutions: Stakeholders and information flows
Figure 1: Overview of the parties involved in RDM within an institution and the research data information flow among parties
Source: Image created by Victor Henning and Anita de Waard, © 2013 Elsevier.
Technical and policy changes herald a brave new world of linked research data, and the different participants in the research data management workflow each feel the pressure to bring about this change. Figure 1 sketches the various stakeholders within the institution and the flows of information involved. In particular:
- Researchers have to conform to reproducibility requirements for their data, and need safe, efficient and policy-compliant tools and processes for storing and annotating their research data.
- Data Repositories are asked to deliver more cost-effective ways to dramatically increase the volumes of data they curate and store. Though usually separately funded, these repositories are technically located inside an institution, and share physical and technical infrastructures with the campus.
- Libraries run the risk of being disintermediated in an open access world, and are looking for ways to use their skills and systems to connect research data to the repositories and knowledge management systems they curate.
- Offices of Research Administration are anticipating the need to track the full set of digital artifacts created inside the institution to ensure compliance with contractual data sharing policies.
Several types of information flows connect these parties:
- The data flow: As data is created by researchers, it is deposited and curated in one or more of a multitude of possible repositories: the institutional repository (IR); external research databases, whether domain-specific (e.g., Protein Data Bank, PetDB) or domain-agnostic (e.g., DataDryad, Figshare); or cloud-based storage facilities such as Dropbox.
- The indexing flow: To allow cross-repository search, these data must be indexed.
- Usage reporting: For compliance and merit assessment purposes, Research Offices are interested in usage and viewing statistics for the deposited research data.
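To make the three flows concrete, they can be sketched as a small Python model. This is only an illustration under stated assumptions, not a description of any existing system: the `Dataset` fields, the `CrossRepositoryIndex` class and its method names are all hypothetical, and counting each search hit as one "view" is a deliberate simplification of real usage tracking.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A deposited dataset. Every field is an illustrative assumption."""
    dataset_id: str
    title: str
    repository: str  # e.g. "IR", "Figshare", "PetDB"
    keywords: list[str] = field(default_factory=list)
    views: int = 0


class CrossRepositoryIndex:
    """Hypothetical index spanning several repositories."""

    def __init__(self) -> None:
        self._datasets: list[Dataset] = []
        self._by_keyword: dict[str, list[Dataset]] = defaultdict(list)

    def deposit(self, dataset: Dataset) -> None:
        # Data flow: a researcher deposits a dataset in some repository.
        self._datasets.append(dataset)
        # Indexing flow: register keywords so search can cross repositories.
        for keyword in dataset.keywords:
            self._by_keyword[keyword.lower()].append(dataset)

    def search(self, keyword: str) -> list[Dataset]:
        # Case-insensitive lookup; each hit is counted as one view
        # (a simplifying assumption for this sketch).
        hits = self._by_keyword.get(keyword.lower(), [])
        for dataset in hits:
            dataset.views += 1
        return list(hits)

    def usage_report(self) -> dict[str, int]:
        # Usage reporting: total views per repository, the kind of summary
        # a Research Office might request.
        report: dict[str, int] = defaultdict(int)
        for dataset in self._datasets:
            report[dataset.repository] += dataset.views
        return dict(report)


# Example usage with made-up records.
index = CrossRepositoryIndex()
index.deposit(Dataset("d1", "Basalt geochemistry", "PetDB", ["geochemistry"]))
index.deposit(Dataset("d2", "Lab notebook export", "IR", ["geochemistry", "notebook"]))
hits = index.search("Geochemistry")
report = index.usage_report()
```

In this sketch a single search reaches datasets held in both a domain-specific repository and the institutional repository, and the usage report aggregates views by repository regardless of where the data physically resides.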
1. Hey, T., Tansley, S. & Tolle, K. (Eds.). (2010). The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, WA: Microsoft Research.
5. The Royal Society (2012). Science as an Open Enterprise. The Royal Society Science Policy Centre Report 02/12. London: The Royal Society.
6. OECD (2007). Principles and Guidelines for Access to Research Data from Public Funding. Paris: OECD Publications.
7. National Academy of Sciences (2009). Ensuring the Integrity, Accessibility and Stewardship of Research Data in the Digital Age. Washington: National Academy of Sciences.