Modeling Diagnostic Label Correlation for Automatic ICD Coding

Jun 1, 2021

Shang-Chi Tsai

Chao-Wei Huang

Yun-Nung Chen

Abstract

Given the clinical notes written in electronic health records (EHRs), predicting the diagnostic codes is challenging; the task is formulated as multi-label classification. The large label set, the hierarchical dependency between codes, and the imbalanced data make this prediction task extremely hard. Most existing work builds a binary predictor for each label independently, ignoring the dependencies between labels. To address this problem, we propose a two-stage framework that improves automatic ICD coding by capturing label correlation. Specifically, we train a label set distribution estimator to rescore the probability of each label set candidate generated by a base predictor. This paper is the first attempt at learning the label set distribution as a reranking module for medical code prediction. In experiments on the benchmark MIMIC datasets, our proposed framework improves upon the best-performing predictors.
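The abstract describes the second stage only at a high level: a base predictor proposes label set candidates, and a separate estimator rescores each whole set. The sketch below illustrates that reranking idea under loudly stated assumptions; it is not the authors' method. Candidates are assumed to come from thresholding the base predictor's per-label probabilities at a few hypothetical thresholds, and a toy pairwise co-occurrence table stands in for the learned label set distribution estimator. All function names, labels, and numbers here are hypothetical.

```python
import math
from itertools import combinations


def set_log_prob(probs, label_set):
    """Log-probability of a label set if each label is an independent Bernoulli.

    Assumes every probability lies strictly between 0 and 1.
    """
    total = 0.0
    for label, p in probs.items():
        total += math.log(p) if label in label_set else math.log(1.0 - p)
    return total


def threshold_candidates(probs, thresholds):
    """Hypothetical candidate generator: one label set per threshold, deduplicated."""
    candidates = []
    for t in thresholds:
        s = frozenset(label for label, p in probs.items() if p >= t)
        if s not in candidates:
            candidates.append(s)
    return candidates


def cooccurrence_score(label_set, pair_counts):
    """Toy stand-in for a learned set scorer: rewards label pairs seen together in training."""
    return sum(
        math.log1p(pair_counts.get(frozenset(pair), 0))
        for pair in combinations(sorted(label_set), 2)
    )


def rerank(probs, thresholds, pair_counts, weight=1.0):
    """Pick the candidate maximizing base log-probability plus weighted set score."""
    candidates = threshold_candidates(probs, thresholds)
    scored = [
        (set_log_prob(probs, s) + weight * cooccurrence_score(s, pair_counts), s)
        for s in candidates
    ]
    return max(scored, key=lambda item: item[0])[1]


# Illustrative inputs: per-label probabilities from a hypothetical base predictor
# and hypothetical training-set co-occurrence counts.
probs = {"A": 0.9, "B": 0.45, "C": 0.4}
pair_counts = {frozenset({"A", "B"}): 20}
thresholds = [0.5, 0.42, 0.3]

print(rerank(probs, thresholds, pair_counts, weight=0.0))  # base predictor alone
print(rerank(probs, thresholds, pair_counts))              # with set-level rescoring
```

With `weight=0.0` the independent base scores alone prefer `{"A"}`, because label B falls below 0.5. Adding the set-level term promotes `{"A", "B"}`, since the toy table says A and B frequently co-occur. This is the kind of label correlation that per-label binary prediction cannot express.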

Type

Conference paper

Publication

2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2021)

Last updated on Jun 1, 2021