Insight Driven Health
Natural Language Processing in the EHR Lifecycle
Cecil O. Lynch, MD, MS
cecil.o.lynch@accenture.com
Health & Public Service

Outline
• Medical Data Landscape
• Value Proposition of NLP
• Strategies for voice and text processing
• Tooling options
• Integration with the EMR lifecycle

Copyright © 2010 Accenture All Rights Reserved. Accenture, its logo, and High Performance Delivered are trademarks of Accenture.

Medical Data Landscape

Medical Data – Where is it?
Two Types of Content

1. Structured Content (20%) - Typically found in a database
   A. Fits a pre-defined data model
   B. Fits well into relational tables
   Examples
   • Databases
   • XML Data
   • Data warehouses
   • Enterprise systems (CRM, ERP, etc.)
   (Figure labels: UMLS, RxNorm)

2. Unstructured Content (80%) - Can be found throughout an organization
   A. Does not fit a pre-defined data model
   B. Does not fit well into relational tables
   Examples - Text-based
   • Email messages
   • Office documents
   • Web documents
   • BLOB (Binary Large Object) field type (e.g. Transcribed Doctor’s Notes)
   Examples – Non-Text-based
   • Voice/Audio files (e.g. Dictated Doctor’s Notes)
   • Images
   • Video files
   (Figure label: Medical Charts)

Slide from DataSkill
NLP Value Proposition
Data from IBM study at Seton Healthcare

Case Study 5 – BJC HealthCare
Making healthcare smarter
BJC HealthCare “NLP Results”
Results: Follow-up Appointments and Diagnoses

Element                  Precision   Recall
Alcohol Use              91.8%       96.2%
Alcohol Substance        95%         74%
Alcohol Volume           96.3%       100.0%
Alcohol Duration         86.7%       93.3%
Alcohol Quit Duration    100.0%      96.1%
Alcohol Family History   95.8%       83.3%
Tobacco Use              90.0%       93.0%
Medications              90.0%       92.0%

Strategies for Voice and Text Analytics

Strategic Approach
• Voice recognition to standard EMR UI
• Voice recognition to a standard model
• Voice recognition to unstructured text document
• Content analytics on unstructured documents written to EMR fields
• Content analytics on unstructured documents written to a data warehouse
• Content analytics used at runtime and for predictive analytics and decision support

Is there a limit to Structured Data?

Tooling Options
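When tooling options are compared, precision and recall figures like those in the BJC HealthCare results table are often combined into a single F1 score (the harmonic mean of the two). The sketch below is illustrative only: the F1 values are not reported in the original slides, and the names `RESULTS` and `f1_score` are hypothetical. The precision and recall numbers are copied from the table.

```python
from typing import Dict, Tuple

# (precision, recall) in percent, copied from the BJC HealthCare results table.
RESULTS: Dict[str, Tuple[float, float]] = {
    "Alcohol Use": (91.8, 96.2),
    "Alcohol Substance": (95.0, 74.0),
    "Alcohol Volume": (96.3, 100.0),
    "Alcohol Duration": (86.7, 93.3),
    "Alcohol Quit Duration": (100.0, 96.1),
    "Alcohol Family History": (95.8, 83.3),
    "Tobacco Use": (90.0, 93.0),
    "Medications": (90.0, 92.0),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Derived values for illustration; they do not appear in the slides.
    for element, (p, r) in RESULTS.items():
        print(f"{element:<24} P={p:5.1f}  R={r:5.1f}  F1={f1_score(p, r):5.1f}")
```

Because the harmonic mean is pulled toward the lower of the two values, an element with high precision but weaker recall (such as Alcohol Substance) scores noticeably below its precision.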
NLP Pipelines - UIMA
Unstructured Information Management Architecture
• 4 Major Software Divisions
  – It specifies component interfaces in an analytics pipeline
  – It describes a set of design patterns
  – It suggests two data representations: an in-memory representation of annotations for high-performance analytics and an XML representation of annotations for integration with remote web services
  – It suggests development roles allowing tools to be used by users with diverse skills
• Is an OASIS standard
• Reference implementation donated by IBM (SourceForge)
• Maintained by the Apache Software Foundation

[Figures: Tooling; Tooling - Continued]
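The UIMA ideas listed above (components with a common interface, chained in a pipeline, sharing annotations that exist both in memory and as XML) can be sketched in a few lines. This is a plain-Python illustration of the pattern, not the UIMA API: `Annotation`, `Document`, `sentence_annotator`, `token_annotator`, `run_pipeline`, and `to_xml` are all hypothetical names, and the XML layout is made up for the example rather than UIMA's own serialization format.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Sentence" or "Token"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    """In-memory representation: the text plus stand-off annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

    def select(self, kind: str) -> List[Annotation]:
        return [a for a in self.annotations if a.kind == kind]


# Every component shares one interface: it takes a Document and adds annotations.
Annotator = Callable[[Document], None]


def sentence_annotator(doc: Document) -> None:
    # Naive splitter: a sentence runs up to the next '.', '!' or '?'.
    for m in re.finditer(r"\S[^.!?]*[.!?]?", doc.text):
        end = m.start() + len(m.group().rstrip())
        doc.annotations.append(Annotation("Sentence", m.start(), end))


def token_annotator(doc: Document) -> None:
    # Depends on Sentence annotations, so it must run after sentence_annotator.
    for sent in doc.select("Sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.covered_text(sent)):
            doc.annotations.append(
                Annotation("Token", sent.begin + m.start(), sent.begin + m.end())
            )


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    for annotate in annotators:
        annotate(doc)
    return doc


def to_xml(doc: Document) -> str:
    """XML representation of the same annotations (illustrative layout only)."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = doc.text
    for a in doc.annotations:
        ET.SubElement(root, "annotation", kind=a.kind,
                      begin=str(a.begin), end=str(a.end))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    note = Document("Patient denies alcohol use. Smokes daily.")
    run_pipeline(note, [sentence_annotator, token_annotator])
    print([note.covered_text(t) for t in note.select("Token")])
    print(to_xml(note))
```

Keeping annotations as offsets into the original text, rather than modifying the text, is what lets later components build on earlier ones without knowing how those annotations were produced.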
cTAKES
• Clinical Text Analysis and Knowledge Extraction System (Mayo Clinic, Children's Hospital Boston)
  – http://sourceforge.net/projects/ohnlp/files/cTAKES/
• Components
  – Sentence boundary detector (OpenNLP)
  – Rule-based tokenizer to separate punctuation from words
  – Normalizer (NLM’s NORM)
  – Part-of-speech tagger (OpenNLP)
  – Phrasal chunker (OpenNLP)
  – Dictionary lookup annotator
  – Context annotator
  – Negation detector (NegEx)
  – Dependency parser
  – Module for the identification of patient smoking status
  – Drug mention annotator
  – Context dependent tokenizer

cTAKES Derivation
[Figure label: cTAKES]

Refined Lucene OWL Code Annotation

ClearTK
• ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA.
  – http://code.google.com/p/cleartk/ (UCB)
  – A common interface and wrappers for popular machine learning libraries such as SVMlight, LIBSVM, OpenNLP MaxEnt, and Mallet.
  – A rich feature extraction library that can be used with any of the machine learning classifiers. Under the covers, ClearTK understands each of the native machine learning libraries and translates your features into a format appropriate to whatever model you're using.
  – Infrastructure for creating NLP components for specific tasks such as part-of-speech tagging, BIO-style chunking, named entity recognition, semantic role labeling, temporal relation tagging, etc.
  – Wrappers for common NLP tools such as the Snowball stemmer, the OpenNLP tools, the MaltParser dependency parser, and the Stanford CoreNLP tools.
  – Corpus readers for collections like the Penn Treebank, ACE 2005, CoNLL 2003, Genia, TimeBank and TempEval.

EMR Integration Options

Optimal Goal
• Goal is:
  – Convert unstructured to structured data
  – Code this data into standard Meaningful Use terminologies
  – Write the data to standard information models for health care data elements in standard ISO Healthcare datatypes

City of Hope – A Proposed Architecture
[Diagram labels: Reporting and Business Intelligence; Content Analytics; Natural Language Processing; Allscripts Healthcare Accelerator; Staging - Triplestore; HL7 RIM V3; Logical Layer Connection; Physical Layer; Staging - Relational; Allscripts Database; EMR; OLTP; EDW and Datamarts; OLAP; Analytics; Predictive Analytics; Statistics; Datamining; RDF Datamining; Triplestore Datamart; High Performance Analytics; ETL links between components]
• Risk stratification
• Treatment/Protocol evaluations
• Research cohort comparisons
• Real-time clinical decision support
• Disease management
• Population health management
• Personalized medicine / genomics
• Performance assessment
• Patient profiling
• Treatment cost calculations

Tool Examples: SPARQL, OWL, IBM SLRP, IBM IODT, OntoBroker, Sesame, Jena
RDF – Resource Description Framework
OWL – Web Ontology Language
SPARQL – SPARQL Protocol and RDF Query Language
IBM SLRP – IBM Semantic Layer Research Platform
IBM IODT – IBM’s toolkit for ontology-driven development
OntoBroker – Semantic web middleware
Sesame – Framework for querying and analyzing RDF data
Jena – Semantic Web Framework for Java

WATSON for Healthcare
[Diagram labels: WEA Advisor Framework; Utilization Management Advisor; Tools; APIs; Methods; Data Platform; Diagnosis and Treatment Advisor; Massively Parallel Infrastructure]

Wrap Up – Questions?
cecil.o.lynch@accenture.com

Thank You - Credits
• IBM jStart Team
  – Randall Wilcox, Kevin Conroy
• Dataskill
  – Victor Bagwell - CIO
• City of Hope
  – Naveen Raja, D.O. – CMIO
  – Ying Liu, Ph.D. Bioinformatics Group
• Accenture
  – German Acuna
  – Suniti Ponkshe
  – Jim Traficant