Knowledge discovery through text analytics: advances, challenges and opportunities

By Akhilesh K.S. Yadav, Tata Institute of Social Sciences | Mar 01, 2017

Text mining for patterns and trends

Text mining, or text analytics, is the analysis of unstructured data contained in natural language text using various methods, tools and techniques. It has become an important research process with applications in many different disciplines. I interviewed Gabe Ignatow to learn more about advances, challenges and opportunities in text analytics. Gabe is an Associate Professor in the Department of Sociology at the University of North Texas and co-author of Text Mining: A Guidebook for the Social Sciences (2016).

 

 

1. Could you tell us about yourself: research interests, publications, projects, teaching, etc.?

 

Born and raised in New York, I was educated in Virginia (University of Virginia) and California (Stanford University), then worked two years each in Turkey and Israel. I have been living and working in Dallas-Fort Worth for the last 10 years. 

 

My research interests have evolved over the years, as is the case for anyone. I am probably best known for a few theoretical and empirical papers on how sociology can incorporate insights from cognitive science. But, because of my graduate advisor John Meyer and my four years abroad, I have also done work in the areas of global environmentalism, public libraries in developing countries, and globalization and religious change.

 

Even before graduate school I was interested in text analysis methods and have published a few papers over the years using these methods. About five years ago I decided to make text analysis/text mining the central focus of my research.

 

These days I’m working on three book projects, including an introductory text mining textbook, a six-author book on emerging digital social research methods, and a book manuscript on Pierre Bourdieu and digital sociology.

 

What can I say about my teaching? I’ve taught on all of the above topics at every level from introductory undergraduate courses to doctoral seminars. I’ve always enjoyed teaching, particularly the classroom interaction with students.

 

 

2. Could you share your experiences (reminiscence of life, learning, career, etc.) with different organizations, associations and institutions?

 

What a great question. The three organizations that have had the greatest impact on me are: Outward Bound (a youth organization) when I was a teenager, the University of Virginia (UVA) and the Echols Scholars program at UVA, and the University of North Texas (UNT). The Echols Program at UVA allows select undergraduates to take whatever courses interest them without regard for core requirements. For me it was the ideal program at the ideal university, and it allowed me to develop as an interdisciplinary researcher. In a very real sense I owe my career to the Echols Scholars founders and to my amazing undergraduate advisor Jonathan Haidt.

 

UNT is where I have worked for the past 10 years. So much has happened here that has allowed me to learn and develop as a researcher and teacher, but also as an administrator. UNT is classified as an “emerging research university,” meaning we are transitioning from a regionally oriented teaching institution to a globally oriented research institution. The transition has not always been smooth, but it has certainly been exciting to be part of an organization that is transforming itself from stem to stern.

 

 

3. You have been involved in various other roles — administrator, teacher of sociology and psychology, researcher in text mining and many more. What do you call yourself?

 

A sociologist. I explored other career paths before choosing sociology, and sociology quickly became my home discipline and professional identity. Of course, there are different types of sociologists doing very different kinds of work. The fact that sociology has not settled into rigid theoretical or methodological orthodoxies has always appealed to me.

 

 

4. How is text mining different from data mining? What is the application of text mining in different disciplines? How do researchers currently practice text mining?

 

Text mining is a form of data mining. Where other forms of data are typically organized as matrices, raw text data is “unstructured.” Simply put, text mining involves collecting and analyzing large volumes of textual data. Typically text mining is performed to learn about the groups or communities that produced the text, but its ultimate purpose depends on the field and the interests of the researcher. Social scientists use text mining tools to learn about shifting public opinion; marketers use it to learn about consumers’ opinions of products and services; and it has even been used to predict the direction of stock markets. As we discuss in the book, text mining involves multiple different tools for collecting data, as well as multiple different approaches to analyzing the data collected. These approaches include sentiment analysis, topic modeling and metaphor analysis, among others.
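
To illustrate the contrast between "unstructured" text and data organized as a matrix, here is a minimal Python sketch that turns a few short documents into a document-term matrix, a common first step before approaches such as sentiment analysis or topic modeling. The toy documents, the simple tokenizer and the function names are assumptions made for illustration; they are not taken from the interview or the book.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct term seen in any document, sorted for stable columns.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "The library is great",
    "The service is slow",
    "Great service, great library",
]

vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Each row of the result corresponds to one document and each column to one word, so the raw text ends up in the same matrix form as other kinds of data. Real projects would typically add steps such as stop-word removal, which this sketch leaves out.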

 

 

5. How is text mining different from Natural Language Processing?

 

Natural Language Processing lies at the intersection of linguistics and computer science. In the social sciences, text mining exists at the intersection of social science and data science. Text mining often makes use of Natural Language Processing tools and techniques.

 

 

6. How has text mining become so popular among industries and educational institutions? 

 

The popularity of text mining today is driven by technology and the availability of unstructured data. I think much of the appeal of text mining is due to its low cost in comparison with other methods for gauging public opinion, and because social media and the internet generally are the central locations for all sorts of important conversations. Academic and industrial researchers are benefiting from new technologies, but our relationship to these technologies is one of being, for the most part, downstream from them. 

 

 

7. Traditionally industries and businesses have employed data and text mining, but recently data and text mining have provided great opportunities for academicians to study different aspects of human/social life by applying various techniques. Can text mining play an active role in societal improvements? What is your opinion as an academician and sociologist? 

 

Text mining techniques are powerful tools that can be used to achieve many different ends. They can certainly be used to make organizations more efficient and productive, but they can be abused as well. In our textbook we compare text mining technologies to polygraph (lie detection) technologies. It took the better part of a century for police agencies and courts to define the appropriate and ethical uses of polygraphs, and we may see a similar pattern of gradual adaptation with text mining technologies.

 

 

8. Researchers use text mining in different fields and apply different techniques to discover patterns. One subdomain of text mining is Digital Humanities, which applies tools and techniques of text mining to cultural resources to find a new pattern. In what other areas is text mining being applied?

 

Text mining is widely applied in social science studies of social media and new media. This makes sense given the ubiquity of these media technologies in everyday life, and the fact that it is not too difficult for researchers to access website data.

 

 

9. What are the latest technology trends in text mining that every researcher and academician should be aware of?

 

Everyone should know that there are many software tools available for web scraping, web crawling, cleaning data, organizing unstructured data, and analyzing data that do not require advanced programming skills. Text mining is not only for boffins.

 

More students are entering graduate programs with programming skills, for example, familiarity with Python and R. This will increasingly allow the social sciences to make positive contributions to computer science technologies rather than always being downstream of innovations.

 

 

10. Do you think it is a big opportunity for libraries and librarians?

 

Given that topic modeling emerged out of library science, it makes sense that text mining tools would be used by librarians to more efficiently evaluate their collections as well as, perhaps, analyze the attitudes and opinions of library patrons.

 

 

11. What are the main challenges for text mining?

 

The main challenges I see are institutional rather than technical. How will undergraduate and graduate academic programs incorporate text mining into curricula? How will they provide training in these methods? The institutions that develop viable models for text mining pedagogy will be in a great position.

 

A second challenge I see is a need for academic publishers, especially those that publish journals, to think about some new ways of rewarding innovative text mining research. For instance, in sociology text mining papers are not exactly methodology papers because they do not develop new methodologies so much as new procedures for using existing text mining and data mining tools. On the other hand, it can be challenging to convince reviewers and editors trained in other methods of the value of text mining papers. We need a few journals and book publishers that reward innovative uses of text mining and data mining technologies.

 

 

12. Plenty of tools and techniques are available for data mining, but not text mining. Where do you think text mining is headed within the academy?

 

We are at the start of a revolution in social research methodology. Over the past century the social sciences have developed ethnographic research methods, focus groups, social survey analysis, network analysis and now data mining/text mining. Over the next 10-20 years I expect rapid development of both text mining technologies and of social science procedures and research designs for using them.

 

The current technologies are ahead of our institutional capacities and cultures. So our universities (e.g., graduate programs, Institutional Review Boards) and private and public sector organizations need to catch up if we want our students, clients, customers, etc. to be able to take advantage of text mining technologies.

 

Text mining will become a standard tool for academic and applied social researchers, along with ethnography, interviews, focus groups, surveys and network analysis methods.

 

 

13. Do you have a final message for academics and practitioners involved in teaching and research?

 

If you’re working in social science text mining I’d love to hear from you at @gabe_ignatow.

 

 

Thank you so much for the interview.

 


 

Editor’s Note: Elsevier supports researchers who want to mine text for non-commercial purposes. All its journals and book chapters are converted into XML, a machine readable format, and available through an API. Researchers can visit the Elsevier Developers portal to register for an account and API key.

 
