Clinical sublanguages
Vocabulary structure and its impact on term weighting
Due to its specific linguistic properties, the language found in clinical records has been characterized as a distinct sublanguage. Even within the clinical domain, though, there are major differences in language use, which have led to more fine-grained distinctions based on medical fields and document types. However, previous work has mostly neglected the influence of term variation. By contrast, we propose to integrate the potential for term variation into the characterization of clinical sublanguages. By analyzing a corpus of clinical records, we show that the different sections of these records vary systematically with regard to their lexical, terminological and semantic composition, as well as their potential for term variation. These properties have implications for automatic term recognition, as they influence the performance of frequency-based term weighting.
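To make the notion of frequency-based term weighting concrete, the following is a minimal sketch of one common family of such measures: a token is weighted by the ratio of its relative frequency in a record section to its relative frequency in a general reference corpus. The function name `frequency_ratio_weights`, the `smoothing` parameter and the toy token lists are assumptions made for illustration only; the sketch is not taken from the article and does not claim to reproduce the weighting scheme evaluated in it.

```python
from collections import Counter


def frequency_ratio_weights(section_tokens, reference_tokens, smoothing=1.0):
    """Weight each token of a section by a smoothed frequency ratio.

    Hypothetical helper: the weight is the token's relative frequency in the
    section divided by its relative frequency in a reference corpus, with
    add-k smoothing so that tokens unseen in the reference corpus do not
    cause a division by zero.
    """
    section_counts = Counter(section_tokens)
    reference_counts = Counter(reference_tokens)
    vocabulary = set(section_counts) | set(reference_counts)

    # Smoothed corpus sizes: observed tokens plus k per vocabulary entry.
    section_total = sum(section_counts.values()) + smoothing * len(vocabulary)
    reference_total = sum(reference_counts.values()) + smoothing * len(vocabulary)

    weights = {}
    for token, count in section_counts.items():
        p_section = (count + smoothing) / section_total
        p_reference = (reference_counts[token] + smoothing) / reference_total
        weights[token] = p_section / p_reference
    return weights


if __name__ == "__main__":
    # Toy data (assumed): a short clinical-style section and a general corpus.
    section = ["stenosis", "of", "the", "aorta", "stenosis"]
    reference = ["the", "of", "the", "a", "house", "of", "the"]
    ranked = sorted(frequency_ratio_weights(section, reference).items(),
                    key=lambda item: item[1], reverse=True)
    for token, weight in ranked:
        print(f"{token}\t{weight:.2f}")
```

Under a measure of this kind, a section whose vocabulary repeats a few standardized terms concentrates weight on them, whereas a section with many rare or variant forms spreads frequency mass thinly. That is one way in which vocabulary structure could plausibly affect frequency-based ranking.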
Article outline
- 1. Background
- 2. Related research
- 3. Sublanguages, semantic classes and variation types
- 3.1 Sublanguages
- 3.2 Classes of medical concepts
- 3.3 Types of variation
- 4. Corpus study 1: Characterization of sublanguages across sections
- 4.1 Corpus characteristics
- 4.2 Preprocessing
- 4.3 Annotation procedure and feature set
- 4.4 Research questions of corpus study 1
- 4.5 Results of corpus study 1
- 4.5.1 Global lexical structure
- 4.5.2 Distribution of semantic types across sections
- 4.5.3 Distribution of term types across sections
- 4.6 Discussion of corpus study 1
- 5. Corpus study 2: Impact of vocabulary structure on frequency-based term weighting
- 5.1 Research questions of corpus study 2
- 5.2 Corpus and preprocessing
- 5.3 Term filtering
- 5.4 Results of corpus study 2
- 5.4.1 Precision
- 5.4.2 Recall
- 5.5 Discussion of corpus study 2
- 6. Conclusion
- Notes
- References
Cited by
Cited by 3 other publications
Chai, Christine P. 2023. Comparison of text preprocessing methods. Natural Language Engineering 29:3, pp. 509 ff.
Vezzani, Federica & Giorgio Maria Di Nunzio. 2019. Computational Terminology in eHealth. In Digital Libraries: Supporting Open Science [Communications in Computer and Information Science, 988], pp. 72 ff.
This list is based on CrossRef data as of 8 March 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.