A closer look at the research: bio- and chemoinformatics
Anders Backlund
Div. of Pharmacognosy, Dept. of Medicinal Chemistry
2015.10.01

Bioinformatics
Handling of large databases
Analysis of DNA sequence data
DNA alignments
Gene identification
Genome assembly
Protein structure determination
Analysis of protein expression
Analysis of protein-protein interactions
Comparative genome analysis
Modelling of evolution
Modelling of population and systems biology

What do we want answered?
Which genes?
Which species? (a drug for a species, or against a species?)
Which data are we interested in?

Sequence analysis of a gene
Databases (GenBank, SWISSPROT, EMBL)
Homology searches (BLAST)
Gene prediction (glimmer, GeneMark, ORF finder)
Statistical analysis of DNA or AA content (Biopython, bioPERL)

Sequence analysis of a gene… or a whole genome!
Sequence processing and genome assembly
Genome structure
-Open Reading Frames (ORF)
-Gene prediction
-Prediction of other genetic elements
Gene annotation
-Homology searches
-Motif searches

Strese et al. BMC Evolutionary Biology 2014, 14:119
http://www.biomedcentral.com/1471-2148/14/119
RESEARCH ARTICLE, Open Access

A recently transferred cluster of bacterial genes in Trichomonas vaginalis - lateral gene transfer and the fate of acquired genes
Åke Strese1, Anders Backlund1 and Cecilia Alsmark1,2*

Abstract
Background: Lateral Gene Transfer (LGT) has recently gained recognition as an important contributor to some eukaryote proteomes, but the mechanisms of acquisition and fixation in eukaryotic genomes are still uncertain.
A previously defined norm for LGTs in microbial eukaryotes states that the majority are genes involved in metabolism, the LGTs are typically localized one by one, surrounded by vertically inherited genes on the chromosome, and phylogenetics shows that a broad collection of bacterial lineages have contributed to the transferome.
Results: A unique 34 kbp long fragment with 27 clustered genes (TvLF) of prokaryote origin was identified in the sequenced genome of the protozoan parasite Trichomonas vaginalis. Using a PCR based approach we confirmed the presence of the orthologous fragment in four additional T. vaginalis strains. Detailed sequence analyses unambiguously suggest that TvLF is the result of one single, recent LGT event. The proposed donor is a close relative to the firmicute bacterium Peptoniphilus harei. High nucleotide sequence similarity between T. vaginalis strains, as well as to P. harei, and the absence of homologs in other Trichomonas species, suggests that the transfer event took place after the radiation of the genus Trichomonas. Some genes have undergone pseudogenization and degradation, indicating that they may not be retained in the future. Functional annotations reveal that genes involved in informational processes are particularly prone to degradation.
Conclusions: We conclude that, although the majority of eukaryote LGTs are single gene occurrences, they may be acquired in clusters of several genes that are subsequently cleansed of evolutionarily less advantageous genes.
Keywords: Lateral gene transfer (LGT), Trichomonas, Peptoniphilus, Phylogeny

Background
The protozoan parasite Trichomonas vaginalis is a human pathogen that causes the most common, non-viral, sexually transmitted disease in the world, infecting 248 million people yearly according to WHO estimates [1].
Men are often asymptomatic carriers of the parasite, while symptoms in women range from malodorous vaginal discharge, inflammation and swelling of the urogenital tract to increased risk for cervical cancer, adverse pregnancy outcomes and an increased susceptibility to HIV-1 infection [2-4]. Treatment today is limited to two nitroimidazole derivatives, tinidazole and metronidazole, although failure of treatment due to resistance has been reported [5]. A draft genome sequence of T. vaginalis G3 was accomplished in 2007 [6], revealing an unusually large genome of more than 160 Mbp, encoding up to 60,000 genes in addition to numerous and diverse repeated regions.
LGT is the acquisition and fixation in the recipient genome of genetic material from a foreign donor organism without sexual transfer. It offers a rapid retrieval of new capabilities such as the ability to utilize new metabolites [7], degradation of chemicals such as pesticides [8] or the deployment of drug resistance genes [9]. The bacterial routes for uptake of foreign DNA are well described by features such as transformation, conjugation and transduction, or by the activities of “gene transfer agents” such as transposable elements. The mechanisms for eukaryotic gene acquisition are less well described [10], although one of the favored hypotheses suggests that the transfer is …

* Correspondence: cecilia.alsmark@fkog.uu.se
1 Division of Pharmacognosy, Department of Medicinal Chemistry, Uppsala University, Uppsala, Sweden
2 Department of Virology, Immunobiology and Parasitology, National Veterinary Institute, Uppsala, Sweden

© 2014 Strese et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Figure 1 Gene map overview of the genes of TvLF. Overview of the genes of TvLF in the five investigated strains of T. vaginalis, and the corresponding region in P. harei. Genes are numbered in order of appearance so that all orthologs have the same number. All details are listed in Additional file 1: Table S5. Note that P. harei contains eight genes without homologs in any strain of T. vaginalis (genes abbreviated 2, 4, 6, 10–11, 20, and 22–23) and T. vaginalis possesses three genes absent in P. harei (genes denoted with asterisk, abbreviated 3, 5, and 19). Additionally, three genes (abbreviated 14, 16 and 29) are unique for T. vaginalis G3, Pinna and Moz-4, and are caused by stop codons in these strains. Gene classifications that are denoted by the different colors are according to Kyoto Encyclopedia of Genes and Genomes pathway (KEGG pathway). Genes without suitable KEGG-classification are categorized as “other function”. The majority of the primer-pairs used for amplifying and sequencing the genes of TvLF are visualized along with the primer-pair abbreviation found in Additional file 1: Table S8.

… P. harei. In previous studies T. gallinae and T. tenax have been verified to be the two most closely related species to T. vaginalis within the class of Trichomonadea [26]. This indicates that the transfer has occurred after the divergence of T. vaginalis from the remainder of the genus. A recent acquisition would be in agreement with the unusually high nucleotide sequence similarity to orthologs of the putative bacterial donor (Table 2).
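The nucleotide sequence similarity referred to in the excerpt above can be illustrated with a minimal sketch. It computes percent identity between two sequences that are assumed to be already aligned (equal length, gaps written as "-"). The function name `percent_identity` and the two toy fragments are hypothetical and are not taken from TvLF or P. harei; the article's own similarity measure may have been calculated differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences.

    Assumption: both strings come from the same alignment, so they have
    equal length. Columns where either sequence has a gap ("-") are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments (18 aligned positions, 2 mismatches).
eukaryote_fragment = "ATGGCTAAAGGTTTAGAA"
bacterial_fragment = "ATGGCAAAAGGTCTAGAA"
print(round(percent_identity(eukaryote_fragment, bacterial_fragment), 1))
```

In practice the aligned sequences would come from an alignment program such as those listed in the slides (Clustal, Muscle), and the calculation would be repeated per gene.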
The genomic architecture of TvLF
The genes on TvLF encompass a stretch of 27 consecutive genes of bacterial origin, TVAG_243570-TVAG_243830, spanning more than 34 kbp of the 52 kbp long contig DS113827 in the T. vaginalis G3 genome (Figure 1, Table 2 and Additional file 1: Table S4 and Additional file 1: Table S5). Although absent from the sequenced eukaryote gene-pool, a homologous region was detected in the firmicute bacterium Peptoniphilus harei (contig 0004, positions 22397–56995, HMPREF9286_0330-HMPREF9286_0294, reverse direction). The TvLF stands in contrast to other LGTs detected in parasite genomes that typically are singletons embedded among vertically inherited genes [17,27].
A comprehensive comparative sequence analysis of the TvLF in T. vaginalis G3 and the putative bacterial donor reveals an unusually high degree of nucleotide sequence similarity (79-98%), compared to that of typical prokaryote …

Table 1 Identification of T. vaginalis strains used in this study
Strain            Isolated   Location
G3 (PRA98)¹       1973       Beckham, United Kingdom
Casu2 (SS-22)¹    2008       Sardinia, Italy
Moz-4 (MPM4)¹     1997       Mozambique
Pinna (SS-28)¹    1998       Sardinia, Italy
Tor-A (TO-01)¹    2010       Turin, Italy
T1²               1993       Taipei, Taiwan
P9²                          Prague, Czech Republic
¹ Strains used in this study to investigate TvLF.
² Strains tested positive for the presence of three randomly selected TvLF genes.

Phylogenetic analysis of sequence data
DNA / AA sequences (GenBank, SWISSPROT, EMBL)
Sequence alignment (Clustal W, X, Muscle, Geneious…)
Tree-building algorithms (parsimony, maximum likelihood, Bayesian inference…)
Tree support (jackknife, bootstrap, Bremer support…)
Tree interpretation (gain/loss, insertion, deletion, traits & characters, trends)

Photosynthesizing organisms…
RuBisCO: L2, L8, L10, L8S8

Docking analysis
Protein-protein binding
Protein-DNA binding
Statistical modelling
The CAPRI initiative: Critical Assessment of PRediction of Interactions
http://www.ebi.ac.uk/msd-srv/capri/

E. Svangård et al. / Phytochemistry 64 (2003) 135–142
Fig. 1. A molecular surface plot of vodo M (1), vodo N (2) and kalata B1, coloured by polarity (red indicating negatively charged residues, blue indicating positively charged residues, white indicating hydrophobic residues, and yellow indicating hydrophilic residues). The molecular surfaces were calculated in QUANTA (Accelrys Inc, San Diego). Each peptide is presented in four molecular surface plots, representing rotations of 90° around the vertical axis, as indicated at the top of the figure. The modelled structures of vodo M and vodo N show amphipathic structures with hydrophobic amino acids presented on the surface (even in a polar environment).

APG & APG II (III)
An ordinal classification for the families of flowering plants. The Angiosperm Phylogeny Group (29 authors). Annals of the Missouri Botanical Garden 1998, 85, pp. 531-553.
An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. The Angiosperm Phylogeny Group (27 authors). Botanical Journal of the Linnean Society 2003, 141, pp. 399-436.
[Figure: APG tree. Terminal labels: Amborellaceae, Nymphaeaceae, Austrobaileyales, Chloranthaceae, Canellales, Piperales, Laurales, Magnoliales, Acorales, Alismatales, Asparagales, Dioscoreales, Liliales, Pandanales, Arecales, Poales, Commelinales, Zingiberales, Ceratophyllales, Ranunculales, Proteales, Gunnerales, Caryophyllales, Santalales, Saxifragales, Crossosomatales, Geraniales, Myrtales, Celastrales, Malpighiales, Oxalidales, Fabales, Rosales, Cucurbitales, Fagales, Brassicales, Malvales, Sapindales, Cornales, Ericales, Garryales, Gentianales, Lamiales, Solanales, Aquifoliales, Asterales, Apiales, Dipsacales]

Genetics, mutations
Cancer research
Genetic diseases
Allelic variation
The 1000 Genomes Project
Mapping all SNPs

Protein structure modelling
Experimental determination, by NMR or X-ray crystallography: slow and expensive methods
Homology-based modelling: can be misleading
Algorithms for modelling protein folding: extremely complex

Lipinski's rule of five
Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. C. A. Lipinski, et al. Advanced Drug Delivery Reviews 23:3, p. 3-25, 1997.
How many conceivable 'drug-like' substances are there, then?

ChemGPS-NP dimensions
A global, 8D, map of NP chemical space
1 size, shape, polarizability
2 aromaticity & conjugation
3 lipophilicity, polarity & hydrogen-bonding capacity
4 flexibility & rigidity
5 electronegativity, number of nitrogens, halogens & amides
6 number of rings, rotatable bonds, amides & OH
7 number of double bonds, oxygens & nitrogens
8 aromatic & aliphatic OH, unsaturation, LAI
Larsson, J., Gottfries, J., Muresan, S., and Backlund, A. ChemGPS-NP: tuned for navigation in biologically relevant chemical space. Journal of Natural Products, 2007, Vol. 70 (5) pp 789-794.

What are we looking at?...
[Figure: drawn chemical structures]

…formalization…
SMILES = Simplified Molecular Input Line Entry System
http://www.daylight.com/smiles/
OC1=C(C=O)C=C(O)C=C1
CC1=CC(OC)=CC(OC)=C1
OC1=C(C=O)C(C2=C(OC)C=C(OC)C=C2C)=C(O)C=C1

...and then what?
DragonX calculates the 35 molecular descriptors; these are descriptive values for 35 different aspects of the molecule.
SIMCA then uses these 35 descriptors to predict where on the map of chemical space this particular molecule belongs.
This determines the molecule's position in 8D (eight dimensions), and that position will always be the same.

Which structures are most similar?
[Figure: structures 1-4 and 5 (barettin)]

Which structures are most similar? Euclidean distances over 8D!
[Figure: pairwise Euclidean distances between structures 1-5 (5 = barettin); values shown: 10.9, 11.9, 5.8, 8.7, 8.4, 3.9, 4.6, 11.8, 12.4, 11.7]

Structures and/or descriptors?
Different sides of the same coin, just like systematics and ecology.
For a structure-based approach:
+ direct link to biosynthesis
For a descriptor-based approach:
+ fewer ad hoc assumptions
+ easy to scale up to large datasets
+ physico-chemical reality
c.f. Schuffenhauer et al., J. Chem. Inf. Model. 2007. 47: 47-58.

Classifying new cytostatics…
[Figure: classes shown: alkylators, proteasome, tyrosine kinase, tubulin-active, topoisomerase I, topoisomerase II, antimetabolites]
Rosén, J., Rickardson, L., Backlund, A., Gullbo, J., Bohlin, L., Larsson, R., Gottfries, J. (2009) ChemGPS-NP mapping of chemical compounds for prediction of anticancer mode of action. QSAR Comb. Sci. 28: 436-446.

Tracing chemical synthesis...
Starting compounds: • + • give one series, • + • the other.

...prediction against the MOA dataset.
[Figure: classes shown: alkylators, antimetabolites, tubulin-active, topoisomerase-I inhibitors, topoisomerase-II inhibitors]
...fits well with Topo-II inhibitors; more details can be obtained from OPLS-DA* analysis.
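The comparison by Euclidean distance over eight dimensions described in the slides above can be sketched as follows. This is only an illustration under stated assumptions: the function names `euclidean_distance` and `rank_by_distance` are hypothetical, the coordinates are invented placeholders rather than real ChemGPS-NP positions, and the sketch is not part of the ChemGPS-NP, DragonX or SIMCA tooling.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_by_distance(query, references):
    """Return (name, distance) pairs sorted from nearest to farthest.

    `references` maps a compound name to its 8D position.
    """
    distances = [
        (name, euclidean_distance(query, position))
        for name, position in references.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical 8D positions, invented for illustration only.
query_position = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
reference_positions = {
    "compound_A": (3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    "compound_B": (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
}

for name, distance in rank_by_distance(query_position, reference_positions):
    print(name, round(distance, 2))
```

Because each molecule has a fixed position in the 8D space, distances computed this way are reproducible and can be compared across data sets, which is what makes a nearest-neighbour comparison meaningful.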
Data from: Synthesis and biological evaluation of phenanthrene derivatives as cytotoxic agents, by Lee, C.-L. et al., in prep for JMC.
*OPLS-DA = orthogonal partial least squares discriminant analysis

Evaluating unknown substances…
OPLS-DA showing... descriptors that are strongly correlated.
nAT = number of atoms
nBT = number of bonds
ARR = aromatic ratio
nAB = number of aromatic bonds
Se = sum of Sanderson atomic electronegativity
Sp = sum of atoms polarizability
Ui = unsaturation index
nCar = number of aromatic carbons

Charting biological activity in chemical property space using ChemGPS-NP
Anders Backlund, Rosa Buonfiglio, Astrid Henz, Elisabet Vikeved, Kuei-Hung Lai & Thierry Kogej
[presented in] Budapest, 2015.08.23-27

Predicting the substances…
Comparing campaigns…
Probing & expanding model!...

Investigating pharmacological similarity by charting chemical space
Rosa Buonfiglio, Ola Engkvist, Péter Várkonyi, Astrid Henz, Elisabet Vikeved, Anders Backlund, and Thierry Kogej. Journal of Chemical Information and Modeling – Under revision. ID: ci-2015-00375m
ChEMBL collection
Activity cutoff = 10 µM
Human target annotation (EGID)
514,257 active compounds divided in 909 protein target sets (Enzyme, GPCR, Ion Channel, NHR, Kinase, Transporter)

ChemGPS-NP v/s ECFP_4…
[Figure: comparison of ChemGPS-NP and the ECFP_4 fingerprint; values shown: 263; 116, 667, 7]

Mapping…
[Figure: map with place names: Kärven, Stora Salsta, Sågmyra, Sundersknallen, Nylunda, Backavattnet, Jobbsbol, Paris, Berga, Trollemölla, Flyemyra]

Mapping…
First stage of activity-mapping:
80 defined activities
21669 compounds retrieved & mapped
19508 with unique activities
2161 exhibiting 'polypharmacology'

69 compounds with experimentally demonstrated activity against Leishmania, at concentrations <10 µM, obtained from the literature and databases.
The corresponding chemical property space defined by this set, represented by 183779 computational nodes (in the first 3D).

An example from 63rd GA 2015
anti-Plasmodium, 2262 compounds
anti-Leishmania, 69 compounds
opioid receptor µ, 648 compounds
5-LOX, 1427 compounds
12-LOX, 98 compounds
15-LOX, 102 compounds

An example from 63rd GA 2015
Amorfrutin B
”Natural PPARγ agonist with potent glucose-lowering properties.”
SMILES: O=C(O)C1=C(O)C(C/C=C(CC/C=C(C)/C)\C)=C(OC)C=C1CCC2=CC=CC=C2
ChemGPS-NP position: 0.417125 1.629921 1.938134 0.357786 0.624139 -1.511832 -0.382870 0.014741

An example from 63rd GA 2015
Amorfrutin B
693 PPARγ agonists
The 49 closest compounds in ChemGPS-NP chemical property space are coded for activity on PPARγ or PPARα. Numbers 50, 51 and 52 are known as AChE inhibitors:
[Figure: structures of the three AChE inhibitors]

Research group for Molecular Pharmacognosy – 2015.08.26
Anders Backlund, Prof. - chemography & phylogeny
Cecilia Alsmark, Assist.Prof. - bioinformatics, LGT
Christina Wedén, Dr. - fungi & fungal compounds
Anna Koptina, Dr. - biological testing
Muaaz Alajlani, Dr. - TB & ChemGPS-NP
Kuei-Hung ’Momo’ Lai - bioactive compounds from fungi
Astrid Henz - chemography & phylogeny
Åke Strese - LGT, Trichomonas
Elisabet Vikeved - LGT, Leishmania
Josefin Rosén, Dr. - chemography, 2009
Catarina Ekenäs, Dr. - ethnobotany, 2008
Sonny Larsson, Dr. - phylogenies, 2007
Petra Lindholm, Dr. - screening, 2005

Thierry Kogej, Dr. - AstraZeneca R&D
Rosa Buonfiglio, Dr. - AstraZeneca R&D
Johan Gottfries, Prof. - Gothenburg University

Support gratefully acknowledged!