Probabilistic medical predictions of large language models.

Abstract

Large Language Models (LLMs) have shown promise in clinical applications through prompt engineering, allowing flexible clinical predictions. However, they struggle to produce reliable prediction probabilities, which are crucial for transparency and decision-making. While explicit prompts can lead LLMs to generate probability estimates, their numerical reasoning limitations raise concerns about reliability. We compared explicit probabilities from text generation to implicit probabilities derived from the likelihood of predicting the correct label token. Across six advanced open-source LLMs and five medical datasets, explicit probabilities consistently underperformed implicit probabilities in discrimination, precision, and recall. This discrepancy is more pronounced with smaller LLMs and imbalanced datasets, highlighting the need for cautious interpretation, improved probability estimation methods, and further research for clinical use of LLMs.
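The implicit approach described in the abstract — deriving a probability from the likelihood the model assigns to the correct label token, rather than asking the model to write a number — can be sketched as follows. This is an illustrative example with made-up logits and label strings, not the paper's implementation; real usage would read next-token logits from an actual LLM.

```python
import math

def implicit_label_probability(logits, label_tokens):
    """Softmax over hypothetical next-token logits, then sum the
    probability mass assigned to the tokens that spell the label."""
    m = max(logits.values())  # subtract max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return sum(exps[tok] for tok in label_tokens if tok in exps) / z

# Hypothetical next-token logits after a prompt like
# "Does this patient have condition X? Answer Yes or No:"
logits = {"Yes": 2.0, "No": 0.5, "yes": 1.0, "Maybe": -1.0}

# Pool surface variants of the positive label ("Yes", "yes").
p_yes = implicit_label_probability(logits, {"Yes", "yes"})
```

An explicit probability, by contrast, would be parsed from generated text (e.g. the model writing "85%"), which is where the abstract reports weaker discrimination, precision, and recall.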

Year of Publication
2024
Journal
npj Digital Medicine
Volume
7
Issue
1
Pages
367
Date Published
12/2024
ISSN
2398-6352
DOI
10.1038/s41746-024-01366-4
PubMed ID
39702641