publication . Article . Other literature type . 2019

Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.

Michel Oleynik; Markus Kreuzthaler;
Open Access
  • Published: 01 Nov 2019
  • Journal: Journal of the American Medical Informatics Association, volume 26, pages 1247-1254 (issn: 1067-5027, eissn: 1527-974X)
  • Publisher: Oxford University Press (OUP)
Abstract
Objective: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which the small, privacy-restricted, training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings in a small clinical dataset.

Materials and Methods: We participated in the 2018 National NL...
Subjects
free text keywords: Health Informatics, Research and Applications, natural language processing, data mining, machine learning, deep learning