"Big Data" refers to a technology phenomenon that has arisen since the mid-1980s. As computers have improved, growing storage and processing capacities have provided new and powerful ways to gain insight into the world by sifting through the vast quantities of data available. Photo credit: DARPA.
RIVERSIDE, Calif. — Scientists at the University of California, Riverside work on a variety of research topics critical to human health, such as genome biology, biomedical sciences, chemistry and computational biology. Next-generation sequencing and other high-throughput technologies routinely used in researching these topics generate vast amounts of data, increasing the need for high-performance computing.
The campus has now received funding of $600,000 from the National Institutes of Health (NIH) to support data-intensive research — also often called Big Data science.
Thomas Girke is an associate professor of bioinformatics at UC Riverside. Photo credit: Girke Lab, UC Riverside.
“Over the past five to eight years, data sizes have grown in many of our research areas by a factor of more than 1,000, which has transformed data processing into one of the most expensive research infrastructure investments,” said Thomas Girke, an associate professor of bioinformatics in the Department of Botany and Plant Sciences and the grant’s principal investigator. “Particularly data-intensive research areas that will benefit from this grant are high-throughput biology, drug discovery, and various other human health related disciplines.”
Specifically, the grant will make possible the purchase of a complex instrument: a Big Data cluster with high-performance CPU resources and data storage space equivalent to 5,000 modern laptops.
Big Data has been identified as a contributor to the growth of the U.S. economy over the next few decades. The NIH grant is expected to make UC Riverside more competitive in attracting new outstanding faculty and facilitating the research of many existing programs. The grant is expected to benefit more than 160 scientists from more than 15 departments and several colleges at UCR.
“Currently, due to the very high demand, the existing compute resources of our facility are operating at maximum capacity, impacting its ability to support data-intensive biomedical research,” said Girke. “This data overflow often results in delays in processing new research data in a time-efficient manner, which, in turn, slows down the discovery process of many projects. With the new equipment grant from NIH, the compute facility will be able to at least quadruple its current compute resources, which should greatly help resolve our shortage of Big Data compute resources to support many new research programs.”
UCR’s research compute infrastructure is provided by a central bioinformatics facility, which is part of the Institute of Integrative Genome Biology (IIGB). Girke, a member of IIGB, is currently the director of the bioinformatics facility. The new instrument will be housed in a brand-new server room in the Genomics Building, where many IIGB researchers work.
IIGB’s bioinformatics facility was formed in 2003. In 2008 it became the largest high-performance research compute facility at UCR.