Yuling Li¹, Hans-Michael Müller¹, Paul Sternberg¹,²
¹ Biology, California Institute of Technology, Pasadena, CA, USA
² Howard Hughes Medical Institute, Pasadena, CA, USA
Biocuration 2015, April 26th, 2015

Textpresso is an information extraction and processing package for biological literature.
http://www.textpresso.org
• Full-text literature searches of model organism research and subject-specific articles at individual sites
• Corpora: C. elegans, nematode, mouse, D. melanogaster, Arabidopsis, neuroscience, cancer
• Full-paper text search
• Keyword search
• Category word search (genes, cells, biological processes, diseases, etc.)

TextpressoCentral: successor to the current Textpresso system, built from scratch with an emphasis on a “one-stop” search, view, and curation experience for curators. Bigger, faster, more functionality, easier to use.

The site currently contains approximately 880,000 full-text articles (NXML or PDF format) from the PMC Open Archive.
26 sub-corpora (experimental division): Agriculture, Clinical, Health, Nutrition, Protein, Animal, Crystallography, Immunology, Oncology, Psychology, Biology, Disease, Medicine, Pediatrics, Review, Cardiology, Genetics, Methodology, Pharmacology, Unclassified, Chemistry, Genomics, Neuroscience, Physiology, Virology, and C. elegans

Full-text paper search and text mining:
◦ Document level and sentence level
◦ Search with pre-loaded category terms
◦ Text mining as a natural part of curation: curation results become part of the training sets for text mining

Full-text paper viewer: view papers in full text with highlighted keywords and terms

Curate papers directly in the paper viewer:
◦ Curate directly on papers (from search results)
◦ Save curations to the Textpresso database or post them to external databases

DOCUMENT LEVEL vs SENTENCE LEVEL search
• You want to find out whether smf-1 is expressed in dopaminergic neurons: run a document-level search OR a sentence-level search?
DOCUMENT LEVEL: no direct association between the terms; you need to browse and read many articles.
The words smf-1 and "dopaminergic neuron" are not tightly associated.
SENTENCE LEVEL: the first hit already gives you the result you are looking for.
Sentence-level search may return more accurate results.

Search options:
◦ Search scope (document or sentence level)
◦ Filters (author, journal, year, etc.)
◦ Keyword to search
◦ Categories to search

Refined lists of terms from new sources, such as:
◦ Sequence Ontology
◦ Chemical Entities of Biological Interest (ChEBI)
◦ Phenotypic Quality Ontology (PATO)
◦ etc.

Search using categories only
Use category search to find papers. Examples:
◦ organic substance biosynthetic process (GO:1901576) => find papers about collagen biosynthesis
◦ catalytic activity (GO:0003824) AND gene (SO:0000704) => find papers about catalytic activities and related genes
◦ ciliary part (GO:0044441) AND deletion (SO:0000159) => find papers about deletion mutations and cilia

Check and view papers in curation
◦ Paper loaded from search results
◦ Pick a category of terms to highlight
◦ Type in a word to highlight
◦ Sentence selected to curate
◦ Select a sentence by clicking on its first word and last word
◦ Save curation to external databases
◦ Save curation to local database
◦ Edit curation entries in the database

• Natural language processing: incorporate machine learning pipelines (such as SVM, a popular paper classifier for triage) into TextpressoCentral.
• Users can upload and manage their own corpora and their own categories of words, making them searchable.
• Workflow: automated curation pipelines can be managed from a control panel in TextpressoCentral.

Old Textpresso: www.textpresso.org
New TextpressoCentral (under construction): send email to textpresso@caltech.edu for announcements. Estimated release: mid-fall 2015.
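
The difference between document-level and sentence-level search in the smf-1 example can be illustrated with a small sketch. This is not the Textpresso implementation. The toy corpus text is invented, and the function names, the naive sentence splitter, and the substring matching are all assumptions chosen to show why requiring both terms in the same sentence narrows the hits.

```python
import re

# Toy corpus: invented text for illustration only, not real article content.
CORPUS = {
    "paper_A": (
        "We characterized smf-1 mutants under manganese stress. "
        "Separately, dopaminergic neurons were counted in aged animals."
    ),
    "paper_B": (
        "A reporter shows that smf-1 is expressed in dopaminergic neurons. "
        "Expression was also seen in the intestine."
    ),
}


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def contains_all(text, terms):
    """True if every term occurs in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


def document_level_search(corpus, terms):
    """Return IDs of documents that mention all terms anywhere in the text."""
    return [doc_id for doc_id, text in corpus.items() if contains_all(text, terms)]


def sentence_level_search(corpus, terms):
    """Return (doc_id, sentence) pairs where all terms share one sentence."""
    hits = []
    for doc_id, text in corpus.items():
        for sentence in split_sentences(text):
            if contains_all(sentence, terms):
                hits.append((doc_id, sentence))
    return hits


terms = ["smf-1", "dopaminergic neuron"]
doc_hits = document_level_search(CORPUS, terms)       # both papers match
sentence_hits = sentence_level_search(CORPUS, terms)  # only paper_B matches
```

In this toy corpus, paper_A mentions both terms but in unrelated sentences, so it appears only in the document-level results. The sentence-level search returns just the one sentence that states the association directly.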
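
The category-only searches shown in the examples (for instance, catalytic activity AND gene) can be thought of as set intersections over an annotation index. The sketch below assumes a hypothetical index from category identifier to the set of papers containing a term from that category. The paper IDs and the function name are invented, and nothing here reflects how TextpressoCentral actually stores its annotations.

```python
# Hypothetical annotation index: category ID -> IDs of papers that contain
# at least one term from that category. Paper IDs are invented.
CATEGORY_INDEX = {
    "GO:0003824": {"paper_1", "paper_2", "paper_4"},  # catalytic activity
    "SO:0000704": {"paper_2", "paper_3", "paper_4"},  # gene
    "GO:0044441": {"paper_3", "paper_5"},             # ciliary part
    "SO:0000159": {"paper_5", "paper_6"},             # deletion
}


def category_and_search(index, category_ids):
    """Return sorted IDs of papers annotated with every given category."""
    if not category_ids:
        return []
    # Unknown categories contribute an empty set, so the AND yields no hits.
    paper_sets = [index.get(cid, set()) for cid in category_ids]
    return sorted(set.intersection(*paper_sets))


# catalytic activity (GO:0003824) AND gene (SO:0000704)
enzymes_and_genes = category_and_search(CATEGORY_INDEX, ["GO:0003824", "SO:0000704"])

# ciliary part (GO:0044441) AND deletion (SO:0000159)
cilia_deletions = category_and_search(CATEGORY_INDEX, ["GO:0044441", "SO:0000159"])
```

With this toy index, the first query returns paper_2 and paper_4, and the second returns paper_5. A sentence-level variant would key the index by sentence rather than by paper.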