Building on the authoritative scholarship of the past is a critical component of progress in academic study. How can researchers identify authoritative, trustworthy sources for their research?
CrossRef, the not-for-profit organization of publishers that makes reference linking in scholarly content possible, is creating tools to help researchers identify what content can be trusted. Two programs, CrossCheck Plagiarism Screening and the soon-to-be-piloted CrossMark program, address this need from different angles.
Comparing duplicate documents: The first step in plagiarism detection
CrossCheck, powered by iThenticate, protects scholarly authors from unauthorized copying and acts as a deterrent to those few authors submitting manuscripts that are not original. CrossCheck includes two major pieces:
- A database of published scholarly content against which publishers can check submissions, and
- A software system, created by the company behind Turnitin, that compares documents for similarity.
Since its launch in 2008, CrossCheck has grown to include more than 65 publishers, including Elsevier. The CrossCheck database now includes 24 million articles or other content items representing 40,000 journals, books and conference proceedings.
How does CrossCheck work?
First, the iThenticate crawler indexes published scholarly content from publishers’ websites. These articles are added to the CrossCheck database. CrossCheck publishers may display a “CrossCheck Deposited” logo on their content indicating that it is deposited. This logo serves as a deterrent to would-be plagiarists.
When an author submits a manuscript for publication, the publisher can then run it through iThenticate, which compares the document against the CrossCheck database and against content from other major data providers and documents on the open Web. The publisher receives a report indicating what percentage of similarity to other documents has been detected and offering the option to view the full text of any matching documents.
CrossCheck alone cannot detect plagiarism, partly because plagiarism includes the concept of intent, which machines cannot reliably infer. Instead, CrossCheck allows publishers to efficiently screen submissions to identify manuscripts of concern. Participating publishers can then use their publication ethics guidelines and procedures to determine whether particular manuscripts raise concerns and whether subsequent investigations are necessary.
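To make the idea of a similarity percentage concrete, here is a minimal sketch in Python. It is not iThenticate's algorithm, which this article does not describe. It assumes a simple technique, overlapping three-word "shingles", and the function names and the in-memory database are hypothetical.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_percent(submission, source, k=3):
    """Percentage of the submission's shingles that also occur in source."""
    sub = shingles(submission, k)
    if not sub:
        return 0.0  # submission too short to form a single shingle
    return 100.0 * len(sub & shingles(source, k)) / len(sub)


def similarity_report(submission, database, k=3):
    """Return (document id, overlap percentage) pairs, highest overlap first.

    `database` is a hypothetical dict mapping document ids to full text.
    """
    scores = {
        doc_id: similarity_percent(submission, text, k)
        for doc_id, text in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    database = {
        "doc1": "the quick brown fox jumps",
        "doc2": "unrelated text entirely here",
    }
    for doc_id, score in similarity_report("the quick brown fox sleeps", database):
        print(f"{doc_id}: {score:.1f}% similar")
```

A score like this only ranks candidate matches. As noted above, deciding whether a match amounts to plagiarism remains an editorial judgment.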
Identifying trustworthy research content: Ensuring research builds on authoritative scholarship of the past
Scholars also face a growing volume of literature, dwindling reading time per article and a proliferation of versions of the same scholarly content on the Web. As information professionals well know, the major Internet search engines are often the first stop on a research quest. Search results may include uncorrected author preprints from an author’s website, an institutional repository, a government repository or a subject-specific archive. The same scholarly content may also be held in aggregated databases of articles as well as at publishers’ websites. How can researchers intelligently choose among these options and ensure that they are basing their research on accurate and current articles?
How will CrossMark work?
CrossRef is launching a pilot of CrossMark, which will provide researchers and librarians with information about the stewardship of a document. Users will see a CrossMark logo on the document or its abstract page. When they click on the logo, they will be taken to a screen where they can view metadata about that document from the publisher. This metadata will include the NISO version of record status, the CrossRef DOI (the permanent URL) to the content, and an indication of its status (examples might be current, enhanced, corrected, retracted or withdrawn). CrossMark metadata might also indicate that a document has been deposited in CrossCheck.
CrossMark publishers may also choose to include publisher-specific metadata that is important to them and to their readers. That data might include information about the publisher’s peer-review process, organizations that funded the research, whether associated data has been deposited in an approved repository or other critical information.
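As an illustration only, the metadata described above could be modeled as a small record. The sketch below is not a CrossMark schema: the class, the field names and the rule in `has_update` are assumptions made for this example, and the DOI shown is a placeholder. Only the status values come from the examples given above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Example status values mentioned in the article."""
    CURRENT = "current"
    ENHANCED = "enhanced"
    CORRECTED = "corrected"
    RETRACTED = "retracted"
    WITHDRAWN = "withdrawn"


@dataclass
class CrossMarkRecord:
    """Hypothetical container for the metadata shown behind the logo."""
    doi: str
    status: Status
    is_version_of_record: bool
    crosscheck_deposited: bool = False
    # Publisher-specific extras, e.g. peer-review process or funders.
    publisher_metadata: dict = field(default_factory=dict)

    def doi_url(self):
        """Build the permanent link by prefixing the DOI resolver address."""
        return "https://doi.org/" + self.doi

    def has_update(self):
        """Assumed rule: these statuses mean a reader should look closer."""
        return self.status in {
            Status.CORRECTED, Status.RETRACTED, Status.WITHDRAWN
        }


if __name__ == "__main__":
    record = CrossMarkRecord(
        doi="10.5555/12345678",  # placeholder DOI for illustration
        status=Status.CORRECTED,
        is_version_of_record=True,
        crosscheck_deposited=True,
        publisher_metadata={"funder": "Example Foundation"},
    )
    print(record.doi_url(), record.status.value, record.has_update())
```

Keeping the publisher-specific fields in a separate dictionary mirrors the distinction drawn above between the core metadata and the optional data each publisher chooses to add.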
The CrossMark pilot is launching in early 2010 with a small number of participants. The pilot will demonstrate how the linked logo described above would work in practice. CrossMark will work for material whether it is available by subscription or freely through open access. In both cases, researchers have a need to identify the most up-to-date versions of particular articles or other types of content.
Assuring the trustworthiness of the scholarly record
An important part of CrossRef’s mission is enabling easy identification and use of trustworthy electronic content. CrossCheck and CrossMark both serve that goal for researchers and librarians. We look forward to working with librarians in spreading the word about these important initiatives.