General objectives

Given the dual dependence of the GRIB and the IMIM focus of the present report, only the GRIB research groups led by IMIM staff are described in detail in the following sections.

Evolutionary Genomics (Mar Albà)

Mar Albà joined the GRIB as a UPF Ramón y Cajal researcher at the beginning of 2002, and in October 2005 was appointed ICREA Research Professor at IMIM. She is one of the senior scientists of the research lab on Computational Genomics, within which she leads the research group on Evolutionary Genomics (genomics.imim.es/htdocs_eg). The group focuses on the use of comparative genomics to learn about the evolution of genes, to identify new functional elements in genomes, and to gain insights into the molecular biology underlying human disease processes. In the period 2002-2007 the group published 18 articles in ISI-indexed journals and 1 book chapter. Currently, the group includes 5 PhD students.

Its main current research lines are:

1. Evolution of gene coding sequences and implications for gene function and disease

The repertoire of human genes is a mosaic of genes of very ancient origin, present in all eukaryotes, and other genes that have a more recent origin. The group has dated the appearance of the genes in the human genome across the eukaryotic phylogeny, demonstrating that the more recent genes show accelerated evolutionary rates and thus have a greater potential to play important roles in adaptive processes. The group also studies the evolution and functional implications of amino acid repeat expansions. Among these are a number of poly-alanine and poly-glutamine repeats whose uncontrolled expansion causes developmental or neurological human diseases, respectively. The group's studies have shown that disease-associated poly-alanine repeats are remarkably well conserved across mammals, suggesting that they play important ancient roles, whereas poly-glutamine repeats are highly variable across species and show high polymorphism in humans. These properties can help identify putative novel disease loci.

Main publications:

Albà MM, Castresana J. On homology searches by protein Blast and the characterization of the age of genes. BMC Evol Biol 2007; 7:53.
Mularoni L, Veitia RA, Albà MM. Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. Genomics 2007; 89:316-25.
Mularoni L, Guigó R, Albà MM. Mutation patterns of amino acid tandem repeats in the human proteome. Genome Biol 2006; 7:R33.
Albà MM, Castresana J. Inverse Relationship between Evolutionary Rate and Age of Mammalian Genes. Mol Biol Evol 2005; 22:598-606.
Gibbs RA et al. (including Albà MM). Genome Sequence of the Brown Norway Rat Yields Insights into Mammalian Evolution. Nature 2004; 428:493-521.
Huang H et al. (including Albà MM). Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes. Genome Biol 2004; 5:R47.
Castresana J, Guigó R, Albà MM. Clustering of genes coding for DNA binding proteins in a region of atypical evolution of the human genome. J Mol Evol 2004; 59:72-9.
Albà MM, Guigó R. Comparative analysis of amino acid repeats in rodents and humans. Genome Res 2004; 14:549-54.

2. Gene transcription regulatory motifs and networks.

Gene upstream regions are rich in motifs involved in the regulation of transcription. The Evolutionary Genomics Group has developed methods to predict regulatory motifs (PROMO, PEAKS), using pathway information and the conservation of motif arrangements across many different human and mouse genes. In addition, using large-scale human and mouse gene expression data, the group has recently shown that housekeeping genes show less conserved promoters than genes with tissue-specific expression, which indicates that genes with constitutive expression require shorter functional promoters. In these studies, the group has also identified subsets of regulatory motifs that are specifically over-represented in housekeeping or tissue-specific promoters. These studies enhance our understanding of gene regulatory networks and provide the scientific community with tools for the analysis of gene regulatory sequences.

Main publications:

Farré D, Bellora N, Mularoni L, Messeguer X, Albà MM. Housekeeping genes tend to show reduced upstream sequence conservation. Genome Biol 2007; 8:R140.
Bellora N, Farré D, Albà MM. PEAKS: Identification of regulatory motifs by their position in DNA sequences. Bioinformatics 2007; 23:243-4.
Farré D, Roset R, Huerta M, Adsuara JE, Roselló L, Albà MM, Messeguer X. Identification of patterns in biological sequences at the ALGGEN server: PROMO and MALGEN. Nucl Acids Res 2003; 31:3651-3.
Messeguer X, Escudero R, Farré D, Núñez O, Martínez J, Albà MM. PROMO: detection of known transcription regulatory elements using species-tailored searches. Bioinformatics 2002; 18:333-4.

3. Comparative analysis of virus genomes and prediction of immune evasion virus genes.

The group studies the evolution of virus gene families and the identification of horizontal gene transfer between virus and host genomes, especially in the herpesvirus family, including human and rodent cytomegalovirus. The group has developed new protocols using protein-domain based searches to increase the sensitivity of detection of remote homology between virus and host genes, leading to the identification of new putative viral genes that interact with the immune response. The compilation of virus protein families in the Virus Database VIDA serves as a basis for these bioinformatics studies. The group has recently established a number of collaborations with experimental groups, with the aim of combining bioinformatics and wet lab studies to improve our understanding of virus immune evasion pathways.

Main publications:

Holzerlandt R, Orengo C, Kellam P, Albà MM. Identification of new herpesvirus gene homologues in the human genome. Genome Res 2002; 12:1739-48.
Albà MM, Lee D, Pearl FMG, Shepherd AJ, Martin N, Orengo CA, Kellam P. VIDA: A virus database system for the organization of virus genome open reading frames. Nucl Acids Res 2001; 29:133-6. (update 2002 at http://217.169.56.209/nar/database/summary/201).

Chemogenomics (Jordi Mestres)

The Chemogenomics Laboratory was established in September 2003 with the incorporation of Dr. Jordi Mestres into the GRIB after seven years of experience in the pharmaceutical industry (1 year at Pharmacia & Upjohn in the USA and 6 years at Organon in the Netherlands and the UK). Today, apart from its head, the lab is composed of one post-doctoral researcher and 5 PhD students, with a current vacancy for a second post-doctoral researcher. In terms of scientific production, within the period 2004-2007 the lab published 17 articles in ISI journals and 2 book chapters. Research in the lab is done at the interface between chemistry, biology, and informatics, which requires a multidisciplinary team of individuals skilled in different fields. The ultimate aim is to develop and apply novel integrative biochemoinformatics tools for the systematic annotation of molecules to entire target families of therapeutic relevance. This information can then be used either upstream to identify chemical probes for target validation or downstream to identify novel hits for lead generation in the drug discovery process.

Main Research Lines

The Chemogenomics Lab has two main methodological research lines, namely, research on the Chemome and research on the Proteome. In addition to these methodological lines, we have recently initiated four therapeutic research lines directed at the areas of cardiovascular disease, obesity, pain, and oncology. A brief description of each line is provided next:

1. Chemome Research

Generating Chemical Graph Identifiers. Fast and robust algorithms for indexing molecules have historically been considered strategic tools for the management and storage of large chemical libraries. This lab has developed a modified and further extended version of the MEQNUM naming adaptation of the Morgan algorithm for the generation of a chemical graph identifier (CGI). This new version corrects for the collisions recognized in the original adaptation and includes the ability to deal with tautomerism, regioisomerism, optical isomerism, and geometrical isomerism. Publications: (i) Garriga R, Gregori-Puigjané E, Mestres J. Indexing Molecules with Chemical Graph Identifiers. J Chem Inf Model (submitted).
Defining Biologically-relevant Molecular Descriptors. A novel set of molecular descriptors called SHED (SHannon Entropy Descriptors) has been developed. They are derived from distributions of atom-centered feature pairs extracted directly from the topology of molecules. Similarity between pairs of molecules is then assessed by calculating the Euclidean distance of their SHED profiles, under the assumption that molecules having similar pharmacological profiles should contain similar features distributed in a similar manner. Publications: (i) Gregori-Puigjané E, Mestres J. SHED: Shannon Entropy Descriptors from Topological Feature Distributions. J Chem Inf Model 2006; 46:1615-22; (ii) Mestres J, Martín-Couce L, Gregori-Puigjané E, Cases M, Boyer S. Ligand-based Approach to In Silico Pharmacology: Nuclear Receptor Profiling. J Chem Inf Model 2006; 46:2725-36; (iii) Mestres J, Maggiora GM. Putting Molecular Similarity into Context: Asymmetric Indices for Field-based Similarity Measures. J Math Chem 2006; 39:107-18.
Constructing Virtual Chemical Spaces. The ability to generate novel synthetically-feasible biologically-active candidate molecules by computational means has long been pursued as a comprehensive and efficient approach to explore the vast chemical space potentially compatible with a given target space. We are developing a new ligand-based de novo design approach referred to as SHIFT (Structural Hopping by Isosteric Fragment Transformations), which is based on the concept of a bioisosteric molecular shift, defined as the exchange of a chemical structure for another of the same biological class.
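
The SHED comparison described above under "Defining Biologically-relevant Molecular Descriptors" can be illustrated with a minimal sketch. It assumes that each molecule has already been reduced to lists of topological path lengths for each feature-pair type; the feature-pair names, the input layout, and the function names below are hypothetical and chosen only for illustration. The sketch uses the plain Shannon entropy of each path-length distribution and may differ in detail from the published SHED definition.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def entropy_profile(pair_distances, pair_types, max_distance):
    """One entropy value per feature-pair type.

    pair_distances maps a feature-pair type (hypothetical labels) to the list
    of topological path lengths at which that pair occurs in the molecule.
    """
    profile = []
    for pair_type in pair_types:
        histogram = Counter(pair_distances.get(pair_type, []))
        counts = [histogram.get(d, 0) for d in range(1, max_distance + 1)]
        profile.append(shannon_entropy(counts))
    return profile


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical feature-pair types and toy input data (not real molecules).
PAIR_TYPES = ["donor-acceptor", "hydrophobic-hydrophobic"]
molecule_a = {"donor-acceptor": [2, 3, 3, 4], "hydrophobic-hydrophobic": [1, 1, 2]}
molecule_b = {"donor-acceptor": [2, 2, 2, 2], "hydrophobic-hydrophobic": [1, 2, 3]}

profile_a = entropy_profile(molecule_a, PAIR_TYPES, max_distance=5)
profile_b = entropy_profile(molecule_b, PAIR_TYPES, max_distance=5)
print(euclidean_distance(profile_a, profile_b))
```

In this sketch a smaller distance means that the two molecules distribute their feature pairs in a more similar manner, which is the working assumption behind the similarity assessment described in the text.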

2. Proteome Research

Functional coverage of the proteome by structures. Tools and resources for translating the remarkable growth witnessed in recent years in the number of protein structures determined experimentally into actual gain in the functional coverage of the proteome are increasingly necessary. The Chemogenomics Lab has developed FCP, a publicly accessible continuously updated web tool dedicated to analyzing the current state and trends of the population of structures within protein families. Publications: (i) Garcia-Serna R, Opatowski L, Mestres J. FCP: Functional Coverage of the Proteome by Structures. Bioinformatics 2006; 22:1792-3; (ii) Mestres J. Representativity of Target Families in the Protein Data Bank: Impact for Family-directed Structure-based Drug Discovery. Drug Discov Today 2005; 10:1629-37.
SuSe: Dictionary of protein surface segments. Based on the assumption that similar chemical functionalities should bind to similar protein environments, we are constructing an integrative database connecting small molecule fragments to protein surface segments on the basis of structural information present in the Protein Data Bank.

Therapeutic Research Lines

References to individual protein targets and bioactive small molecules associated with multifactorial diseases can be found scattered across multiple bibliographic sources over the years. Mining these sources, we are collecting lists of targets associated with four therapeutic areas, namely, cardiovascular disease, obesity, pain, and oncology, and organising them using functional classification schemes into the four main protein families of therapeutic relevance, namely, enzymes, G protein-coupled receptors, ion channels, and nuclear receptors. Each disease-associated target space is then used to interrogate an annotated chemical library and extract the bioactive chemical space connected to the corresponding target space. Some of these bioactive ligands were also found to have affinity for targets not directly linked to their respective diseases, thus constituting a valuable indirect source to infer the potential disease-associated off-target space. Compilation, classification, and integration of this prior knowledge provide a comprehensive perspective of the pharmacological space relevant to modern global drug discovery.

Integrative Biomedical Informatics (Ferran Sanz)

This research group constitutes the specific platform for the direct research activity carried out by the GRIB Director, Ferran Sanz, as well as for the promotion of large-scale initiatives that aim to take advantage of synergistic contributions from several GRIB labs. A post-doctoral scientist and two PhD students are directly assigned to this group.

A clear example of such large-scale initiatives has been the EU-funded INFOBIOMED Network of Excellence (www.infobiomed.org). This network of excellence has been active in the 2004-2007 period with the general objective of structuring European biomedical informatics to support individualised healthcare. It has been led by Ferran Sanz, with the active participation of the GRIB research groups led by M. Albà, B. Oliva and J. Mestres.

A similar case is the GRIB participation in the @neurIST EU-funded integrated project (www.cilab.upf.edu/aneurist1), a European initiative to integrate biomedical informatics in the management of cerebral aneurysms. In this case the GRIB group leaders with active participation are B. Oliva, J. Villà, M. Pastor and F. Sanz.

CancerGRID (www.cancergrid.eu) is another EU-funded project with the participation of scientists from different GRIB labs (J. Mestres, I. Zamora and F. Sanz). The aim of this project is to develop and refine methods for the enrichment of molecular libraries in order to facilitate the discovery of potential anti-cancer agents.

The same strategy is being followed in proposals submitted to the first calls of the EU 7th Framework Programme. A clear example is the ALERT project for the early detection of adverse drug events by integrative mining of clinical records and biomedical knowledge, which has been selected for funding in the first call and will start at the beginning of 2008. The GRIB contribution to ALERT will be carried out mainly by the groups led by J. Mestres and F. Sanz. It will focus on the automatic generation of scientifically and mechanistically sound explanations for the signals obtained from the mining of clinical records, by means of a combination of analyses of biomedical literature and databases, in silico predictions and pathway mapping.

At the Spanish level, the same integrative philosophy inspired GRIB participation in the INBIOMED thematic network funded by the ISCIII (www.inbiomed.retics.net), which lasted from 2003 to 2006. A new thematic network proposal (COMBIOMED) has been submitted to the 2007 call of the ISCIII.

The participation of the GRIB in the Spanish National Institute of Bioinformatics is another example of the joint participation of several GRIB labs in strategic initiatives.

In addition, F. Sanz is particularly interested in the application of biomedical informatics in the pharmaceutical domain. In this field, he is currently interested in integrative knowledge management and exploitation. His participation in the European (www.imi-europe.org) and Spanish (www.medicamentos-innovadores.org) Technology Platforms on Innovative Medicines is a key activity in this domain. In particular, he is acting as an expert by invitation of the European Commission and EFPIA, and he is coordinating the Spanish Platform.

Another collaborative initiative in the pharmaceutical field that is being promoted is the ChemBioBank, a joint initiative of the Barcelona Scientific Park, the Pharmacological Screening Platform of the University of Santiago de Compostela and the GRIB (Jordi Mestres and Ferran Sanz). The aim is the collection, in vitro and in silico annotation, and the dissemination and exploitation of the publicly-generated Spanish chemical diversity.

A selection of recent scientific publications of F. Sanz is:

Gutiérrez-de-Terán H, Centeno NB, Pastor M, Sanz F. Novel approaches for modeling of the A1 adenosine receptor and its agonist binding site. Proteins 2004; 54:705-15.
Fontaine F, Pastor M, Sanz F. Incorporating Molecular Shape into the Alignment-free GRid-INdependent Descriptors (GRIND). J Med Chem 2004; 47: 2805-15.
Fontaine F, Pastor M, Zamora I, Sanz F. Anchor-GRIND: Filling the gap between standard 3D-QSAR and the GRid-INdependent Descriptors. J Med Chem 2005; 48: 2687-94.
Bonis J, Furlong LI, Sanz F. OSIRIS: a tool for retrieving literature about sequence variants. Bioinformatics 2006; 22: 2567-9.
Dezi C, Brea J, Alvarado M, Raviña E, Masaguer CF, Loza MI, Sanz F, Pastor M. Multi-structure 3D-QSAR studies on a series of conformationally constrained butyrophenones docked into a new homology model of the 5-HT2A receptor. J Med Chem 2007; 50: 3242-55.
Klinger R, Furlong LI, Friedrich CM, Mevissen HT, Fluck J, Sanz F, Hofmann-Apitius M. Identifying Gene Specific Variations in Biomedical Text. J Bioinform Comput Biol (in press).