Thesis defense of Sonja Haenzelmann: “Pathway-centric approaches to the analysis of high-throughput genomics data”

October 11, 2012

On Thursday, October 11, Sonja Haenzelmann, a member of the Functional Genomics Group of GRIB (IMIM-UPF), will defend her thesis at 11:30 in the Xipre Room of the PRBB.

Abstract

In the last decade, molecular biology has expanded from a reductionist view to a systems-wide view that tries to unravel the complex interactions of cellular components. Owing to the emergence of high-throughput technology, it is now possible to interrogate entire genomes at an unprecedented resolution. The dimensionality and unstructured nature of these data have made it evident that new methodologies and tools are needed to turn data into biological knowledge. To contribute to this challenge, we exploited the wealth of publicly available high-throughput genomics data and developed bioinformatics methodologies focused on extracting information at the pathway level rather than the single-gene level. First, we developed Gene Set Variation Analysis (GSVA), a method that facilitates the organization and condensation of gene expression profiles into gene sets. GSVA enables pathway-centric downstream analyses of microarray and RNA-seq gene expression data. The method estimates sample-wise pathway variation over a population and allows for the integration of heterogeneous biological data sources with pathway-level expression measurements. To illustrate the features of GSVA, we applied it to several use cases employing different data types and addressing different biological questions. GSVA is available as an R package within the Bioconductor project.
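The idea of condensing a gene-by-sample expression matrix into a gene-set-by-sample matrix can be illustrated with a minimal sketch. This is not the GSVA algorithm and not the API of the Bioconductor package; it swaps in a much simpler technique (the mean per-gene z-score within each gene set) purely to show the shape of the input and output. The function name `gene_set_scores`, the gene names, and the toy values are all hypothetical.

```python
from statistics import mean, pstdev


def gene_set_scores(expr, gene_sets):
    """Summarize gene-level expression into per-sample gene-set scores.

    expr:      dict mapping gene name -> list of expression values, one per sample
    gene_sets: dict mapping gene-set name -> list of gene names
    Returns a dict mapping gene-set name -> list of scores, one per sample.

    Simplified stand-in (mean z-score), NOT the actual GSVA statistic.
    """
    # Standardize each gene across samples so genes are on a comparable scale.
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene that is constant across samples carries no variation: score 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        # Ignore genes in the set that were not measured.
        present = [g for g in genes if g in z]
        if not present:
            continue
        # One score per sample: average standardized expression over the set.
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores


# Hypothetical toy data: four genes measured in three samples.
expr = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [2.0, 4.0, 6.0],
    "G3": [5.0, 5.0, 5.0],
    "G4": [3.0, 2.0, 1.0],
}
gene_sets = {"pathway_A": ["G1", "G2"], "pathway_B": ["G3", "G4"]}

scores = gene_set_scores(expr, gene_sets)
for name, values in scores.items():
    print(name, [round(v, 3) for v in values])
```

The result has one row per gene set and one column per sample, which is the form that lets downstream analyses operate on pathways instead of individual genes.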

Second, we developed a pathway-centric, genome-based strategy to reposition drugs in type 2 diabetes (T2D). This strategy consists of two steps: first, a regulatory network is constructed and used to identify disease-driving modules; then, these modules are searched for compounds that might target them. Our strategy is motivated by the observation that disease genes tend to group together in the same network neighborhood, forming disease modules, and that multiple genes might have to be targeted simultaneously to attain an effect on the pathophenotype. To find potential compounds, we used genomics data from compound-exposed samples deposited in public databases. We collected about 20,000 samples that had been exposed to about 1,800 compounds. Gene expression can be seen as an intermediate phenotype reflecting the underlying dysregulated pathways in a disease. Hence, compounds whose exposure elicits similar transcriptional responses among the genes contained in the disease modules are assumed to have a potential therapeutic effect. We applied the strategy to gene expression data of human islets from diabetic and healthy individuals and identified four potential compounds (methimazole, pantoprazole, bitter orange extract and torcetrapib) that might have a positive effect on insulin secretion. This is the first time a regulatory network of human islets has been used to reposition compounds for T2D.

In conclusion, this thesis contributes two pathway-centric approaches to important bioinformatics problems, namely the assessment of biological function and “in silico” drug repositioning. These contributions demonstrate the central role of pathway-based analyses in interpreting high-throughput genomics data.