Extracting Suicide Causes and Discovering Annotation Inconsistencies in Death Investigation Notes through NLP Approaches
Yifan Peng, PhD
Assistant Professor, Division of Health Sciences, Department of Population Health Sciences, Weill Cornell Medicine
Abstract: Suicide presents a major public health challenge worldwide, affecting people across the lifespan and demanding immediate attention and a comprehensive understanding of its underlying causes. The National Violent Death Reporting System (NVDRS) is a population-based active surveillance system that collects information on violent deaths that occurred among both residents and non-residents in 50 U.S. states, the District of Columbia, and Puerto Rico. It serves as a valuable repository of death investigation notes, offering crucial information and context surrounding suicide deaths, which are essential for developing NLP systems that can enhance our understanding of suicide causes. In this talk, I will first discuss our work on effectively extracting suicide causes from death investigation notes using NLP approaches. Through analysis, we noticed a significant performance mismatch between different states. I will then argue that inherent data annotation inconsistencies exist in NVDRS, both between different states and within a single state, and that these can be one of the main causes of the observed performance gap. This reveals an unmet need for approaches to identifying and addressing annotation inconsistencies in death investigation notes. I will then describe our NLP approach designed to explore the data annotation inconsistencies in NVDRS death investigation notes, how we identified problematic data instances that may contribute to these inconsistencies, and how we further verified the effectiveness of label correction. Finally, I will discuss the current limitations and future directions.
Bio: Yifan Peng, PhD, is an Assistant Professor in the Division of Health Sciences, Department of Population Health Sciences, at Weill Cornell Medicine. He graduated from UD in 2016, under the supervision of Dr. Cathy Wu and Dr. Vijay Shanker. His main research interests include BioNLP and medical image analysis. He has published in major AI and healthcare informatics venues, including ACL, CVPR, MICCAI, and ICHI, as well as medical venues, including Nature Medicine, Nucleic Acids Research, npj Digital Medicine, and JAMIA. His research has been funded by federal agencies, including NIH and NSF, and by industry partners such as Amazon and Google. He is an Editorial Board Member for the Journal of Biomedical Informatics.