The power behind big data: high-performance computing in environmental toxicogenomics


What happens when cutting-edge biology meets powerful computing? At the Department of Environmental Science, researchers use high-performance computing (HPC) to investigate how pollution leaves its mark, not only on individuals but also on future generations.

Bioinformatics Uppmax. Photo: Mikael Wallerstedt

From Sanger to next-generation sequencers: a genomic revolution

Over the past few decades, sequencing technologies have undergone a dramatic evolution, revolutionizing molecular biology and significantly contributing to biomedical and environmental research. The first major breakthrough came with the development of Sanger sequencing in the 1970s, a method that laid the foundation for the Human Genome Project. Despite its accuracy, Sanger sequencing was time-consuming and costly, making large-scale genome projects an arduous challenge.

The advent of next-generation sequencing (NGS) in the mid-2000s marked a paradigm shift. Technologies such as Illumina sequencing enabled massively parallel processing of millions of DNA fragments, drastically reducing the cost and time required to sequence entire genomes. This democratization of sequencing opened new avenues in research. In environmental toxicology, for example, NGS allowed for the comprehensive study of molecular mechanisms driving the toxicity of harmful chemicals in the environment, giving rise to the field of environmental toxicogenomics.

Enter high-performance computing

But with greater sequencing capacity came a new challenge: big data. A single modern sequencing run can now generate hundreds of gigabytes to terabytes of raw data. Managing, storing, and analyzing this information requires advanced computational infrastructure. That’s where HPC systems, cloud-based platforms, and scalable bioinformatics pipelines come into play.

As multiomics approaches—integrating genomics, epigenomics, transcriptomics, proteomics, and more—become increasingly common, the complexity of data analysis has skyrocketed. Sophisticated algorithms are needed to align sequences, call variants, quantify expression, and interpret biological meaning.

These processes are computationally intensive and memory-demanding, often requiring parallel processing and specialized software environments. To meet this demand, researchers rely on open-source, community-curated bioinformatics tools, and increasingly publish raw data and code alongside their manuscripts to promote transparency and reproducibility.

Researchers dive into toxicogenomics

At the Department of Environmental Science, high-performance computing is central to several ongoing projects evaluating the molecular effects of environmental contaminants. With support from the National Academic Infrastructure for Supercomputing in Sweden (NAISS), researchers remotely access the HPC clusters at Uppmax (Uppsala University) to process sequencing data from toxicological studies.

Former PhD student Mauricio Roza. Photo: Stella Papadopoulou

Within this research environment, PhD students Mauricio Roza and Eleftheria Theodoropoulou, together with postdoctoral researcher Andrey Höglund and Professor Oskar Karlsson, are using HPC to uncover how environmental pollutants affect health, both now and in generations to come.

Uncovering the hidden legacy of pollution

Mauricio Roza’s recent doctoral thesis explores how exposure to endocrine-disrupting pesticides during critical developmental windows can induce epigenetic modifications, alter gene expression, and produce transgenerational effects in amphibians. His research uses reduced representation bisulfite sequencing (RRBS) and RNA sequencing to study the molecular mechanisms behind these changes—even in generations that were never in contact with the toxic chemicals.

PhD student Eleftheria Theodoropoulou. Photo: Private

Eleftheria Theodoropoulou’s work follows a similar line, but in mammals. Using mice as a model for human health, she investigates how DBP—a common food and water contaminant found in plastic containers—affects the epigenome and transcriptome across multiple generations. Her research focuses on key biological systems, such as the reproductive, immune, and metabolic systems.

Meanwhile, Andrey Höglund has contributed to several projects, including a study on PFOS—one of the so-called “forever chemicals.” His work revealed cancer-related DNA methylation alterations in human breast cells, further highlighting the potential long-term risks posed by environmental pollutants.

Together, these studies illustrate the critical role of computational infrastructure in advancing environmental toxicogenomics. By leveraging HPC, researchers are able to turn massive, complex datasets into insights that deepen our understanding of pollution’s invisible legacy.
 
