Big data, big answers

2:04 p.m., June 9, 2015–Big data is being used for applications ranging from analyzing the popularity of beer to mapping the world’s ecosystems. While having more information can lead to more accurate analysis of a problem, the more isn’t always the merrier when it comes to data. What happens when the needle can’t be found because the haystack is so big?

Pharmaceutical companies, for example, face the daunting task of narrowing millions of potential molecules down to a small pool of the most promising candidates for wet-lab testing.

Now, a team of computer scientists led by the University of Delaware’s Michela Taufer may have found a way to speed up the search for drugs that can treat Parkinson’s disease, HIV, schizophrenia and other diseases.

The group, which includes researchers from the University of New Mexico, the San Diego Supercomputer Center and Argonne National Laboratory, recently took first place in the Eighth IEEE International Scalable Computing Challenge (SCALE 2015) for its project, “Accurate Scoring of Drug Conformations at the Extreme Scale.”

“The experimental drug development process is slow, expensive, and error-prone,” says Taufer, the David L. and Beverly J.C. Mills Chair of Computer and Information Sciences.

“Pharmaceutical companies now have the technology to perform huge amounts of computation using large numbers of computers working independently to generate solutions. The problem comes in getting all of these computations to agree that there are only a few molecules of interest. Approaches used to address this problem are usually either accurate but not scalable or scalable but not accurate.”

Taufer and her collaborators developed a solution that involves mapping the information into metadata, or data that describes other data. Metadata reduces the quantity of data but preserves its important properties, in this case yielding accurate results while making the process 400 times faster.
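The article does not describe the team's actual algorithm, so what follows is only a minimal Python sketch of the general idea, not the team's method. It assumes, hypothetically, that each docking result is a list of (x, y, z) atom coordinates, and it uses each molecule's centroid as a crude stand-in for the metadata. Each worker shrinks its own results to three numbers per molecule and then to a small table of counts per grid cell. Because merging those tables is plain addition, the workers never exchange raw data yet still agree on the most crowded region, which here (again an assumption) plays the role of the promising candidates. The names `to_metadata`, `cell_of`, `local_counts` and `densest_cell` are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical data model: one docked pose = a list of (x, y, z) atom coordinates.
Point = Tuple[float, float, float]
Conformation = List[Point]
Cell = Tuple[int, int, int]


def to_metadata(conf: Conformation) -> Point:
    """Collapse a whole conformation into one 3-number descriptor (its centroid).

    The centroid is a deliberately simple stand-in for whatever
    property-preserving mapping a real pipeline would use.
    """
    if not conf:
        raise ValueError("conformation has no atoms")
    n = len(conf)
    return (
        sum(p[0] for p in conf) / n,
        sum(p[1] for p in conf) / n,
        sum(p[2] for p in conf) / n,
    )


def cell_of(point: Point, cell_size: float) -> Cell:
    """Snap a descriptor onto an integer grid so nearby descriptors share a cell."""
    x, y, z = (int(c // cell_size) for c in point)
    return (x, y, z)


def local_counts(confs: Iterable[Conformation], cell_size: float) -> Counter:
    """Run independently on each worker: count its descriptors per grid cell."""
    return Counter(cell_of(to_metadata(c), cell_size) for c in confs)


def densest_cell(partials: Iterable[Counter]) -> Tuple[Cell, int]:
    """Merge the per-worker tables (plain addition) and return the busiest cell."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return total.most_common(1)[0]


if __name__ == "__main__":
    # Two hypothetical workers, each holding its own docking results.
    worker_a = [
        [(1.0, 1.0, 1.0), (1.2, 0.8, 1.1)],
        [(1.1, 0.9, 1.0), (0.9, 1.1, 1.2)],
        [(8.0, 8.0, 8.0), (8.2, 7.9, 8.1)],
    ]
    worker_b = [
        [(1.05, 1.0, 0.95), (1.0, 1.1, 1.0)],
        [(5.0, 2.0, 3.0), (5.1, 2.2, 3.1)],
    ]
    partials = [local_counts(w, cell_size=2.0) for w in (worker_a, worker_b)]
    print(densest_cell(partials))  # ((0, 0, 0), 3)
```

The `cell_size` parameter trades resolution for compactness: larger cells give smaller tables to merge but a coarser notion of which molecules "agree."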

“In the rapidly growing field of structural biology, algorithms are needed to deal with big data in a way that enables the science to be extracted,” Taufer says. “I like what Carly Fiorina, the former CEO of Hewlett-Packard, said about the goal being to ‘turn data into information and information into insight.’ Data is only as valuable as the insights we can ultimately gain from it.”

About the team

In addition to Taufer, the team included:

Boyu Zhang, who recently completed her Ph.D. at UD and has accepted a position at Purdue University, where she will serve as a big data specialist;
Trilce Estrada, who earned her Ph.D. at UD in 2012 and is now an assistant professor at the University of New Mexico;
Pietro Cicotti, San Diego Supercomputer Center; and
Pavan Balaji, Argonne National Laboratory.

Taufer credits Zhang and Estrada with developing the algorithm and her colleagues at the San Diego Supercomputer Center and Argonne National Lab with providing the supercomputing infrastructure to test it.

The team was first selected as one of five semifinalists among 15 submissions. Their win was announced at the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, held May 4-7 in Shenzhen, Guangdong, China.

About the challenge

The Eighth IEEE International Scalable Computing Challenge (SCALE 2015) is sponsored by the IEEE Computer Society Technical Committee on Scalable Computing (TCSC).

The objective of the SCALE Challenge is to highlight and showcase real-world problem solving using computing that scales. Effective solutions to many scientific and engineering problems require applications that can scale.

The SCALE Challenge is concerned with advances in application development and their supporting infrastructure that enable scaling.
