Abstract
We investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using a minimal feature engineering approach.
We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task.
In short, DNNs are used twice: once to generate word embeddings, and once for entity recognition.
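To make the two-stage idea concrete, here is a minimal sketch of how an embedding table could feed a separate tagging network. It is not the paper's implementation. The toy vocabulary, tag set, window size, layer sizes, the use of single characters as tokens, and the randomly initialized weights (standing in for embeddings learned from unlabeled text) are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary and tag set (assumptions, not from the paper).
VOCAB = ["<PAD>", "患", "者", "头", "痛", "三", "天"]
TAGS = ["O", "B-SYMPTOM", "I-SYMPTOM"]
EMB_DIM, WINDOW, HIDDEN = 4, 3, 5


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Stage 1 stand-in: these vectors would come from unsupervised training on a
# large unlabeled corpus. Here they are random placeholders.
embeddings = {tok: [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for tok in VOCAB}

# Stage 2: a small feed-forward tagger over a window of tokens (untrained).
W1 = rand_matrix(HIDDEN, EMB_DIM * WINDOW)
W2 = rand_matrix(len(TAGS), HIDDEN)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def window_features(tokens, i):
    """Concatenate the embeddings of the tokens in a window centered on i."""
    half = WINDOW // 2
    feats = []
    for j in range(i - half, i + half + 1):
        tok = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.extend(embeddings.get(tok, embeddings["<PAD>"]))
    return feats


def tag_scores(tokens, i):
    """Score every tag for the token at position i."""
    hidden = [math.tanh(h) for h in matvec(W1, window_features(tokens, i))]
    return matvec(W2, hidden)


def predict(tokens):
    """Pick the highest-scoring tag for each token independently."""
    result = []
    for i in range(len(tokens)):
        scores = tag_scores(tokens, i)
        result.append(TAGS[scores.index(max(scores))])
    return result


sentence = list("患者头痛三天")
print(predict(sentence))
```

Because the weights are untrained, the printed tags are arbitrary. The point is only the data flow: token, embedding lookup, window concatenation, hidden layer, tag scores. In a real setup the embedding table would be pretrained and the tagger trained on annotated clinical text.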
Introduction
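The introduction describes the application value of electronic health records (EHRs) and the entity recognition problems that arise when working with them.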
Many existing clinical NLP systems, such as MedLEE, MetaMap, and cTAKES, use dictionaries and rule-based methods to identify clinical concepts.
More recently, a number of shared-task challenges on NER in clinical text have been organized, including the 2009 i2b2, the 2010 i2b2, the 2013 ShARe/CLEF challenge and the 2014 Semantic Evaluation (SemEval) challenge. (Note to self: look into these in more detail when time permits.)
Conventional ML-based methods have been applied to Chinese clinical NER tasks.
In summary, current efforts on NER in Chinese clinical text primarily focus on investigating different machine learning algorithms or optimizing combinations of different types of features via human engineering.
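Recently, there has been growing interest in deep-learning-based NLP systems, which can learn useful feature representations from large-scale unlabeled corpora through unsupervised methods. Deep learning is a research area of machine learning that learns high-level feature representations through deep neural networks. It has achieved state-of-the-art performance in image processing, automatic speech recognition, and machine translation. NLP researchers have developed DNNs that learn useful features from large amounts of unlabeled data, so that large amounts of time no longer need to be spent searching for task-specific features. Dr. Ronan Collobert's system achieved state-of-the-art performance on many NLP tasks using a single deep neural network.
This paper is the first to apply DNNs to NER in Chinese clinical records, and it compares them against the conventional CRF method.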