Sept. 27, 2012 – University of Delaware professor Hagit Shatkay has co-authored a new book detailing systematic ways to automate the analysis of biological text.

Entitled “Mining the Biomedical Literature,” the book offers readers a concise introduction to mining biological text effectively in order to find relevant information. It also describes general techniques for retrieving and extracting information and for analyzing text.

According to Shatkay, it is the only authored book in the area of biological text mining so far.

Shatkay, a professor of computer and information sciences, explains that while medical text mining has been applied to patient records and physicians’ notes for quite some time, the tools developed in the clinical context do not necessarily apply to biology and to scientific publications.

“Biology is a data rich science, but finding and accessing existing information remains a challenge for many scientists because this information is often hidden within the millions of published articles in scientific journals,” Shatkay says.

The project evolved alongside the Human Genome Project, which was completed in 2003. It stemmed from the need to quickly access results reported in the biomedical literature in order to help understand the roles of genes and proteins. Finding such information requires tools that can effectively mine published research in medicine, chemistry, biology and other areas pertinent to human health.

“Analyzing purely biological data such as genes and proteins is something that biologists know how to do. Analyzing text, however, is something that biologists are not typically trained for and did not previously think they should care about,” she continues.

“All of a sudden, there is a whole genome and researchers move from studying a particular gene to studying complete processes and systems. In order to do that, they need to find information about the many genes and proteins comprising such systems.”

Shatkay co-authored the book with longtime colleague Mark Craven from the University of Wisconsin, Madison. She and Craven were among the first to author papers on biomedical text mining, in 2000 and 1999 respectively, before the area emerged as a prominent research field.

“We don’t expect all biologists to become tool builders. We want to provide biologists with the means to effectively mine biological text in order to further their research; the book provides biomedical researchers with sufficient background to understand how text-mining works, understanding of what biomedical text mining tools may achieve, and ways to evaluate and interpret which existing tools can help them in their work,” she says.

About the authors

Hagit Shatkay joined UD in 2010 as an associate professor of computer and information sciences. She is also an affiliated faculty member of the Center for Bioinformatics and Computational Biology (CBCB). She previously served as an associate professor at Queen’s University in Kingston, Ontario. Her research focuses on biomedical text mining, computational biomedicine and machine learning. She has served on several editorial and review boards for publications and journals involving bioinformatics, computational biology, artificial intelligence and information retrieval.

Mark Craven is a professor of biostatistics and medical informatics, and of computer science at the University of Wisconsin, Madison. His research focuses on developing and applying machine learning methods in the context of biological and medical problems.

Article by Karen B. Roberts