Maulik R. Kamdar and Michel Dumontier
(maulikrk@stanford.edu, michel.dumontier@stanford.edu)
Introduction & Objectives
Ebola virus (EBOV; formerly designated Zaire ebolavirus) is a lethal Category A human pathogen of the family Filoviridae and is responsible for
Ebola virus disease (EVD). EVD can cause severe hemorrhagic fever and has an average case fatality rate of 71%.
The ongoing EVD epidemic, which began in Guinea in February 2014, has spread exponentially across 5 other countries in Western Africa and
has infected at least 9380 people (as of February 15, 2015). (http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/index.html)
The Viral Hemorrhagic Fever Consortium sequenced a set of 99 EBOV genomes from 78 confirmed patients in Sierra Leone to 2000x
coverage. (BioProject: http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA257197)
There is a dire need to consolidate and integrate all available knowledge on the EBOV genome, whether already in hand or retrievable from
open-access knowledge bases and the available literature.
This could lead to a better understanding of the underlying mechanisms of EVD, determination of the druggability of target domains in EBOV
and identification of small molecules which could show positive binding affinity.
The information required to pursue these goals is not yet available in a single, aggregated source. As a consequence, the biomedical
researcher has to traverse several data portals to retrieve relevant knowledge before using it to formulate hypotheses.
Ebola-KB Vocabulary
Availability
http://ebola.semanticscience.org
Linked Data
Use Cases
Retrieve knowledge from KEGG and DrugBank on small molecule ligands which bind to the Ebola protein ‘[AIG96339.1] Polymerase’
Identify the biological activities and associated scientific publications of the EBOV Gene ‘[AIG96339.1] Polymerase’
Obtain scientific publications that provide evidence for the binding of the ligands ‘[RFP] Rifampicin’, ‘[RBT] Rifabutin’ and ‘[RPT] Rifapentine’
Reference: http://lod-cloud.net/
Bio2RDF Release 3 (http://bio2rdf.org) has 11B triples from 35 biomedical sources.
Cross-linked with several data sources, like Chem2Bio2RDF and MGD.
Develop Bio-mashups and Linked Biomedical Dataspaces facilitating in silico drug discovery
Rifabutin [DB00615]
  Antibacterial [antimycobacterial]; Rifamycins
  DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
  Packagers: Pharmacia Inc.; Kaiser Foundation Hospital; Pfizer Inc.
Rifampicin [DB01045]
  Antibacterial; Ribosome; Rifamycins
  DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
  23S rRNA of 50S ribosomal subunit, protein synthesis inhibitor
  Packagers: Novartis AG; Eon Labs; Sanofi-Aventis Inc.; UDL Laboratories; Mckesson Corp.; Ciba Geigy Ltd.; and 30 others
Rifapentine [DB01201]
  Antibacterial; Rifamycins
  DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
  Packagers: Sanofi-Aventis Inc.; Gruppo Lepetit SPA
# PREFIX declarations (ebola:, graph:, drugbank:, dc:, rdfs:, xsd:) are not shown in this listing.
CONSTRUCT {
  ?bioUri ebola:chemicalName ?title ;
          ebola:molecularWeight ?molWeight ;
          ebola:molecularFormula ?formula ;
          graph:pdb-page ?pdbInfo ;
          graph:drugbank-label ?drugbankLabel ;
          graph:packager ?packagerTitle ;
          graph:mechanism-of-action ?mechAction ;
          graph:pharmacology ?pharmacologyDesc
} WHERE {
  # Ligands of PDB structures matched by the keywords of the EBOV protein
  <http://bio2rdf.org/genbank:AIG96639.1> ebola:hasKeyword ?keyword .
  ?keyword ebola:x-pdb ?pdbUri .
  ?pdbUri ebola:hasLigand ?bioUri ;
          ebola:pdbPage ?pdbInfo .
  ?bioUri ebola:chemicalName ?title ;
          ebola:molecularWeight ?molWeight ;
          ebola:molecularFormula ?formula ;
          ebola:x-drugbank ?drugbankUri .
  ?drugbankUri rdfs:label ?drugbankLabel .
  # Keep only small molecules (molecular weight below 500)
  FILTER( xsd:double( ?molWeight ) < 500 )
  {
    # Federated subquery against the Bio2RDF DrugBank endpoint;
    # DISTINCT yields one row per unique combination of projected values
    SELECT DISTINCT ?drugbankUri ?mechAction ?packagerTitle ?pharmacologyDesc
    WHERE {
      SERVICE <http://cu.drugbank.bio2rdf.org/sparql> {
        ?drugbankUri drugbank:mechanism-of-action ?action ;
                     drugbank:packager ?packager ;
                     drugbank:pharmacology ?pharmacology .
        ?action dc:description ?mechAction .
        ?packager dc:title ?packagerTitle .
        ?pharmacology dc:description ?pharmacologyDesc
      }
    }
  }
}
Listing: SPARQL CONSTRUCT Query for Use Case 2
Table: Knowledge on potential EBOV ‘[AIG96639.1] Polymerase’ Protein-binding Ligands retrieved from KEGG and DrugBank using the Ebola-KB endpoint
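As a rough illustration of how a CONSTRUCT query like the one in the listing might be submitted programmatically, the following Python sketch prepares an HTTP GET request following the SPARQL 1.1 Protocol convention (URL-encoded `query` parameter, RDF media type in the `Accept` header). The endpoint path `/sparql` is an assumption: only the host http://ebola.semanticscience.org is given on the poster. The short query is a placeholder, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Assumption: the "/sparql" path is hypothetical; the poster only names the host.
ENDPOINT = "http://ebola.semanticscience.org/sparql"


def build_construct_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # CONSTRUCT queries return an RDF graph, so ask for an RDF serialization.
    return Request(url, headers={"Accept": "text/turtle"})


# Placeholder query; the full CONSTRUCT listing could be passed in the same way.
query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 5"
request = build_construct_request(query)

# Round-trip check: decode the query string back out of the prepared URL.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute it, provided the endpoint is reachable at the assumed path.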
Ebola-KB Dashboard
EBOV Genomic Wheel
System Architecture
Data sources:
NCBI Gene
PubMed
InterPro
PDB
Gene Ontology
DrugBank
KEGG
Future Work
Include PubChem information on BioAssays and activities of small
molecules which bind potential virus targets in the Ebola-KB, by querying
the NCBI E-utilities with specific EBOV keywords.
Delve into methods which predict small molecule binding sites on
proteins with a known or unknown structure, given a protein sequence.
Use Mouse Model Phenotypes to study the binding profiles of the
aggregated molecules against the EBOV targets.
Conduct a subjectivist, user-driven study that evaluates the Ebola-KB
and the Ebola-KB Dashboard in a clinical setting.
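The first future-work item, querying the NCBI E-utilities with EBOV keywords, could start from something like the sketch below. It builds ESearch URLs against the Entrez PubChem BioAssay database (`pcassay`) without sending them. The two keywords are hypothetical examples, not terms taken from the Ebola-KB, and the helper name `esearch_url` is introduced here only for illustration.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(keyword: str, db: str = "pcassay", retmax: int = 20) -> str:
    """Build an ESearch URL for one keyword (no network access here)."""
    params = {"db": db, "term": keyword, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


# Hypothetical EBOV keywords, used purely as examples.
keywords = ["Ebola polymerase", "Ebola glycoprotein"]
urls = [esearch_url(k) for k in keywords]

for url in urls:
    print(url)
```

Fetching each URL would return a JSON list of matching BioAssay identifiers, which could then be resolved into assay details and linked to the corresponding EBOV targets.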
Discussion: Challenges and Limitations
Because sequencing of the 2014 strain of Zaire ebolavirus was completed only recently, there is a lack of up-to-date, integrated knowledge
pertaining to gene functions, protein interactions and activities of binding ligands.
Some popular knowledge bases, such as STITCH (a resource for chemical-protein interaction networks), did not provide any information on the small
molecules which bind the EBOV protein sequences, or on those binding other similar proteins.
Very few EBOV InterPro domains had actually been annotated with Gene Ontology Terms.
Our approach of generating EBOV keywords and using them as search terms for PDB and PubMed has introduced some ‘noise’ into the Ebola-KB, for example
information on ligands binding ‘DNA Polymerase’ in other species, which may not be useful. More rigorous protein and domain-similarity features could be used.
This work was funded by a Stanford University start-up
fund to Michel Dumontier.
Querying requires that users know SPARQL and depends on the availability and uptime of SPARQL endpoints, which cannot always be guaranteed.
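To make the suggestion of similarity-based filtering of keyword ‘noise’ more concrete, the sketch below drops keyword hits whose protein sequence shares too few k-mers with the query protein. This is a crude stand-in, not the method used for the Ebola-KB: k-mer Jaccard similarity is far weaker than alignment-based or domain-based comparison. The sequences, identifiers and threshold are made-up toy values chosen only to show the mechanics.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity between the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def filter_hits(query_seq: str, hits: list, threshold: float = 0.3, k: int = 3) -> list:
    """Keep only hits whose sequence is similar enough to the query."""
    return [h for h in hits if jaccard(query_seq, h["sequence"], k) >= threshold]


# Toy data: these are NOT real EBOV or PDB sequences.
query_seq = "MKTLLVAGG"
hits = [
    {"id": "hit-close", "sequence": "MKTLLVAGA"},
    {"id": "hit-far", "sequence": "QQPPWWYYH"},
]

kept = filter_hits(query_seq, hits)
print([h["id"] for h in kept])
```

In practice the similarity score would more plausibly come from a sequence-alignment or domain-annotation tool, with the threshold tuned against known EBOV-relevant structures.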