Introducing your researchers to text mining: 5 first steps

By Rachel Martin, Elsevier | Nov 29, 2016

Text mining makes it possible for researchers to analyze vast data sources, extract answers and develop new concepts more quickly and efficiently than ever before. However, a recent survey by the Publishing Research Consortium found that awareness of text mining techniques is still relatively low. Three out of four respondents had not used these techniques, and two-thirds of that group had not heard of text mining before the survey.


Librarians play a key role in raising awareness of text mining’s potential and helping to facilitate its use. Here are five key pointers to help introduce your researchers to text mining, including videos and information on how to use Elsevier's APIs. 



Text mining can be a powerful technique to help in your next research project. 


There are millions of articles and book chapters out there, packed with information that might help answer your research questions. So what exactly is text mining and how can it help? Text mining uses computerized tools to automatically search, extract and analyze large amounts of text from source documents. Data mining applies equivalent techniques to databases and statistics. Together they are known as text and data mining (TDM).
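To make the search, extract and analyze steps concrete, here is a minimal Python sketch that runs them over three invented sentences standing in for source documents. The corpus, the function names and the simple word-matching rules are all assumptions made for illustration; real text mining tools are far more sophisticated.

```python
import re
from collections import Counter

# Toy corpus standing in for downloaded articles (invented text).
corpus = {
    "doc1": "Aspirin reduces inflammation. Aspirin may also reduce fever.",
    "doc2": "Ibuprofen reduces inflammation and pain.",
    "doc3": "Fever is a common symptom of infection.",
}


def search(corpus, term):
    """Search: return the ids of documents that mention the term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sorted(doc_id for doc_id, text in corpus.items() if pattern.search(text))


def extract_sentences(text, term):
    """Extract: return the sentences in one text that mention the term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]


def analyze(corpus):
    """Analyze: count how often each word occurs across the whole corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


print(search(corpus, "inflammation"))
print(extract_sentences(corpus["doc1"], "fever"))
print(analyze(corpus).most_common(3))
```

The same three steps apply whether the corpus holds three sentences or three million articles; only the tooling needs to scale.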


1. Introduce your researchers to text mining basics with this two-minute video that includes a simplified example of a research question.



Text mining is more than just a search process. 


Text mining uses natural language processing (NLP), a field of computing that often relies on machine learning, to help you detect connections and patterns at a volume and speed that would be impossible to achieve manually. This is the true power of text mining: analyzing all available resources to gain new insights into possible relationships.
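One simple way to picture "detecting connections" is to count how often two terms appear in the same sentence. The Python sketch below does this for a few invented sentences. The fixed term list is a stand-in assumption; a real project would use NLP tooling to recognize the terms, and co-occurrence counting is only one of many possible techniques.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical vocabulary of terms to track; real NLP tools would
# recognize such terms automatically rather than from a fixed list.
TERMS = {"aspirin", "ibuprofen", "inflammation", "fever", "pain"}

# Invented sentences standing in for mined text.
sentences = [
    "Aspirin reduces inflammation and fever.",
    "Ibuprofen reduces inflammation and pain.",
    "Aspirin eased inflammation in the trial.",
]


def cooccurrences(sentences, terms):
    """Count how often each pair of tracked terms shares a sentence."""
    pairs = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        found = sorted(words & terms)  # sorted so each pair has one spelling
        pairs.update(combinations(found, 2))
    return pairs


pairs = cooccurrences(sentences, TERMS)
for pair, count in pairs.most_common():
    print(pair, count)
```

Pairs that co-occur often across a large corpus are candidates for a closer look; the count alone says nothing about what kind of relationship exists.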


2. Explain how the text mining process works with the help of this short video.



Text mining is still experimental and requires specialized tools and some programming knowledge.


While its potential is very exciting, text mining is still in its early stages, particularly for scientific, technical and medical (STM) content. Typical text mining tools are designed for general internet content such as news items or social media posts. That content is very different from discipline-specific content such as STM, which has its own jargon, abbreviations and uniquely formatted references. As a researcher in a specific discipline, you will probably need customized tools.
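A small example shows why general-purpose tools can stumble on STM jargon. In the Python sketch below, a letters-only word splitter breaks gene and protein names apart, while a slightly customized pattern keeps them whole. The sample sentence and both patterns are illustrative assumptions, not the behavior of any particular tool.

```python
import re

# Invented sentence in the style of a biomedical abstract.
text = "IL-6 and TNF-alpha levels rose (p < 0.05) in BRCA1 carriers."

# General-purpose approach: keep runs of letters only.
naive_tokens = re.findall(r"[A-Za-z]+", text)

# Customized approach (an illustrative pattern, not a standard):
# letters and digits joined by internal hyphens stay together,
# so names like IL-6 and BRCA1 survive as single tokens.
domain_tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

print(naive_tokens)
print(domain_tokens)
```

Even the customized pattern still splits the number 0.05 in two, which hints at how much tuning discipline-specific content can require.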


There are three main options for text mining tools, requiring varying levels of knowledge in programming, statistics and linguistics:


  • Off-the-shelf: If you have basic technological skills, a ready-made TDM workbench provides building blocks to put together a customized tool.
  • Build your own: With more advanced programming skills, you can create your own tools.
  • Outsource to a specialist provider: Accurate text mining, in particular, requires NLP expertise, so a specialist may be the better route if you do not have that expertise in-house.


3. Review specific tools available for TDM via your library or open source providers.



Text mining requires bulk downloads of vast amounts of articles and book chapters.


TDM tools run against a working set of data and/or content known as a “corpus.” To assemble a corpus, you need to bulk download (i.e., make a copy of) the material that you wish to mine, often from publisher platforms. Application programming interfaces (APIs) are a standard way for a computer program to access and interact with that content. For text mining, APIs make it much easier to download the volume of content that you will typically want to mine, and to do so programmatically. APIs will also typically return results faster than downloading through a website, reducing the overall time needed to bulk download content.
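As a rough picture of what assembling a corpus through an API looks like, here is a Python sketch using only the standard library. The endpoint `https://api.example.org/articles/`, the `X-Api-Key` header name and the one-second pause between calls are all hypothetical placeholders; each provider documents its own URLs, authentication and usage limits.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical values: replace with the endpoint and key your provider issues.
BASE_URL = "https://api.example.org/articles/"
API_KEY = "YOUR-API-KEY"


def build_request(article_id, api_key=API_KEY):
    """Build (but do not send) an authenticated request for one article."""
    url = BASE_URL + urllib.parse.quote(article_id, safe="")
    headers = {"X-Api-Key": api_key, "Accept": "text/xml"}  # assumed header names
    return urllib.request.Request(url, headers=headers)


def fetch(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def build_corpus(article_ids, fetcher=fetch, delay_seconds=1.0):
    """Download each article once, pausing between calls (assumed courtesy delay)."""
    corpus = {}
    for article_id in article_ids:
        corpus[article_id] = fetcher(build_request(article_id))
        time.sleep(delay_seconds)
    return corpus
```

Calling `build_corpus(["id-1", "id-2"])` would return a dictionary mapping each identifier to its downloaded text. The `fetcher` argument lets you swap in a stand-in function to try the loop without touching the network.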


4. Demonstrate how to create an account and obtain an API key (see Elsevier API Key video).



Access scholarly content for text mining purposes right now!


To mine across publisher platforms, use the free Crossref TDM service and Crossref Metadata API to access the full text of content identified by Crossref DOIs across more than 4,000 participating publishers.
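The Python sketch below shows one way a script might pick full-text links out of a Crossref metadata record. It assumes the record layout returned by Crossref's public REST API at `https://api.crossref.org/works/`, where a work can carry a `link` list whose entries are marked with an `intended-application`. The sample record, DOI and publisher URLs are invented, and whether a given link is actually accessible depends on the publisher and your institution's entitlements.

```python
import urllib.parse

CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi):
    """Return the metadata URL for a DOI (slashes in the DOI are kept)."""
    return CROSSREF_WORKS + urllib.parse.quote(doi)


def text_mining_links(record):
    """Return the full-text URLs that a work record flags for text mining."""
    links = record.get("message", {}).get("link", [])
    return [
        link["URL"]
        for link in links
        if link.get("intended-application") == "text-mining"
    ]


# Invented sample shaped like a Crossref work record (not real data).
sample_record = {
    "status": "ok",
    "message": {
        "DOI": "10.1234/example.5678",
        "link": [
            {
                "URL": "https://publisher.example/fulltext/5678.xml",
                "content-type": "text/xml",
                "intended-application": "text-mining",
            },
            {
                "URL": "https://publisher.example/article/5678",
                "content-type": "text/html",
                "intended-application": "similarity-checking",
            },
        ],
    },
}

print(work_url("10.1234/example.5678"))
print(text_mining_links(sample_record))
```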


Elsevier supports researchers who want to mine text for non-commercial purposes. All of its journals and book chapters are converted into XML, a machine-readable format, and made available through an API.
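XML is convenient for text mining because a program can address titles, sections and paragraphs by name. The Python sketch below parses a small invented document with the standard library. The element names (`article`, `section`, `heading`, `para`) are assumptions for illustration and do not reflect Elsevier's actual schema, which is richer and uses namespaces.

```python
import xml.etree.ElementTree as ET

# Invented, simplified XML; real publisher schemas differ.
SAMPLE_XML = """<article>
  <title>Effects of aspirin on inflammation</title>
  <body>
    <section>
      <heading>Methods</heading>
      <para>We enrolled 120 patients.</para>
    </section>
    <section>
      <heading>Results</heading>
      <para>Inflammation markers fell.</para>
      <para>No adverse events.</para>
    </section>
  </body>
</article>"""


def extract_text(xml_string):
    """Return the title and a mapping of section heading to section text."""
    root = ET.fromstring(xml_string)
    title = root.findtext("title", default="")
    sections = {}
    for section in root.iter("section"):
        heading = section.findtext("heading", default="")
        paragraphs = ["".join(p.itertext()).strip() for p in section.findall("para")]
        sections[heading] = " ".join(paragraphs)
    return title, sections


title, sections = extract_text(SAMPLE_XML)
print(title)
print(sections["Results"])
```

Working from structured XML rather than PDF or web pages means a mining tool can, for example, restrict its analysis to the results sections of every article in the corpus.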


5. Visit the Elsevier Developers portal and help researchers register for an account and API key.