LLM-Powered Clinical Calculator

Large language models (LLMs) are a now-ubiquitous type of generative AI model trained to generate natural language. One of the most common applications of LLMs is the chatbot, such as ChatGPT, in which the model generates natural language in response to messages from a human user. While LLMs have great potential to aid clinical decision-making, they suffer from important limitations, including a tendency to hallucinate (i.e., to confabulate, confidently making factually incorrect statements without awareness) and an inability to perform mathematical calculations reliably.

Clinical calculators (a.k.a. medical calculators) are quantitative clinical decision support tools widely used in medicine. Physicians commonly access them through web services such as MDCalc.

We developed and tested a chatbot powered by an LLM that is assisted by verified software implementations of commonly used clinical calculators and by metadata about these calculators, supplied via retrieval-augmented generation (RAG). On simulated physician clinical calculator workloads, our chatbot is significantly more accurate than an unassisted off-the-shelf LLM, especially with regard to the correctness of calculated results. Our work has been submitted to the AMIA 2025 Annual Symposium. A preprint is available on arXiv, and our code is shared on GitHub.
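
The general idea of pairing retrieval with verified calculator code can be illustrated with a minimal Python sketch. This is not the project's actual implementation: the `Calculator` class, the `REGISTRY` list, and the `retrieve` function are hypothetical names invented for illustration, and simple keyword overlap stands in for whatever retrieval method a real RAG pipeline would use (typically embedding similarity). The point it shows is that the arithmetic is carried out by ordinary tested code rather than generated by the language model.

```python
from dataclasses import dataclass
from typing import Callable, List


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Common estimate: diastolic pressure plus one third of the pulse pressure."""
    return diastolic + (systolic - diastolic) / 3


@dataclass
class Calculator:
    """Hypothetical record pairing a verified function with its metadata."""
    name: str
    description: str  # metadata that the retrieval step searches over
    run: Callable[..., float]


# Hypothetical registry; a real system would hold many more calculators.
REGISTRY: List[Calculator] = [
    Calculator("bmi", "body mass index from weight and height", bmi),
    Calculator(
        "map",
        "mean arterial pressure from systolic and diastolic blood pressure",
        mean_arterial_pressure,
    ),
]


def retrieve(query: str, registry: List[Calculator]) -> Calculator:
    """Toy retrieval (an assumption, not real RAG): pick the calculator whose
    description shares the most words with the query."""
    query_words = set(query.lower().split())
    return max(
        registry,
        key=lambda c: len(query_words & set(c.description.lower().split())),
    )


if __name__ == "__main__":
    calc = retrieve("body mass index for a 70 kg patient", REGISTRY)
    # The numeric result comes from verified code, not from generated text.
    result = calc.run(weight_kg=70, height_m=1.75)
    print(calc.name, round(result, 1))
```

In a full system along these lines, the language model's role would be limited to understanding the request, extracting the inputs, and explaining the result, while the selected function produces the number.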