Columbia University, New York, New York, United States
Background: Clinicians managing type 2 diabetes face complex treatment decisions spanning three distinct evidence bases (bariatric surgery, intensive lifestyle intervention, and pharmacotherapy), each supported by landmark but disparate clinical trials. No existing tool integrates findings from these trials into a single queryable interface accessible at the point of care.
Objective: To develop and evaluate an exploratory AI-powered clinical decision support system that enables clinicians to query peer-reviewed diabetes literature using natural-language patient descriptions.
Methods: We developed EvidenceRx, a Retrieval-Augmented Generation (RAG) system indexing 160 peer-reviewed publications drawn from bariatric surgery, intensive lifestyle intervention, and pharmacotherapy trials and related literature. PDFs were processed through a custom extraction pipeline that preserves tabular data, split into 1,024-character chunks prefixed with study-context headers, and embedded using sentence-transformers (all-MiniLM-L6-v2) into a FAISS vector index. A local large language model (Llama 3 8B, served via Ollama) generates evidence-grounded responses with inline source citations. A three-signal confidence scoring system combines retrieval distance, keyword relevance overlap, and mechanistic question detection to classify each response as a HIGH, MEDIUM, or LOW evidence match. An out-of-scope detector intercepts queries about therapeutic agents not represented in the corpus. A structured Patient Profile Mode accepts clinical inputs (BMI, A1C, diabetes duration, comorbidities, prior treatments) and generates multi-query evidence summaries. The system was evaluated with an automated test suite of 20 queries across four categories (answerable, patient profile, edge case, adversarial). AI-assisted debugging tools were used during software development.
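The abstract does not specify how the out-of-scope check and the three confidence signals are combined. The following is a minimal Python sketch under stated assumptions, not EvidenceRx's actual implementation. It assumes the retrieval distance is a squared L2 distance between unit-normalized embeddings (as a FAISS IndexFlatL2 returns for normalized vectors), so cosine similarity equals 1 - d/2. The equal 0.5/0.5 weights, the 0.2 mechanistic penalty, the 0.6/0.4 thresholds, the stopword and cue lists, and the names `assess`, `Assessment`, and `out_of_scope_terms` are all hypothetical illustrations.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "or", "to", "with",
             "is", "are", "does", "do", "what", "which", "how", "why", "on", "vs"}

# Hypothetical cue phrases marking a "mechanistic" (how/why) question.
MECHANISTIC_CUES = re.compile(
    r"\b(mechanism|mechanisms|pathway|pathophysiology|how does|why does)\b",
    re.IGNORECASE,
)


@dataclass
class Assessment:
    label: str    # "OUT_OF_SCOPE", "HIGH", "MEDIUM", or "LOW"
    score: float  # combined score in this sketch's own units (0.0 if out of scope)


def content_words(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def assess(query: str, best_sq_l2: float, retrieved_chunks: list[str],
           out_of_scope_terms: set[str]) -> Assessment:
    q_words = content_words(query)

    # Out-of-scope gate (assumption: single-token agent names, matched before scoring).
    if q_words & {t.lower() for t in out_of_scope_terms}:
        return Assessment("OUT_OF_SCOPE", 0.0)

    # Signal 1: retrieval distance. For unit vectors, squared L2 = 2 - 2*cos.
    similarity = max(0.0, 1.0 - best_sq_l2 / 2.0)

    # Signal 2: fraction of query content words that appear in the retrieved chunks.
    chunk_words = set().union(*(content_words(c) for c in retrieved_chunks))
    overlap = len(q_words & chunk_words) / len(q_words) if q_words else 0.0

    # Signal 3: penalize mechanistic questions, which outcome trials rarely answer.
    penalty = 0.2 if MECHANISTIC_CUES.search(query) else 0.0

    score = 0.5 * similarity + 0.5 * overlap - penalty
    if score >= 0.6:
        label = "HIGH"
    elif score >= 0.4:
        label = "MEDIUM"
    else:
        label = "LOW"
    return Assessment(label, round(score, 3))
```

For example, `assess("bariatric surgery diabetes remission", 0.4, ["Bariatric surgery produced higher diabetes remission rates"], set())` combines a similarity of 0.8 with full keyword overlap and is labeled HIGH. Running the out-of-scope gate first means an unsupported agent is refused regardless of how close the nearest chunk happens to be.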
Results: In preliminary evaluation, the system correctly declined to answer out-of-scope or unsupported questions in approximately 70% of test cases, and HIGH evidence match responses included accurate inline citations to the primary literature. The system was demonstrated live at the DAX 2026 Data Science and AI Exchange Innovation Showcase at Columbia University, where attendees, including clinicians and researchers, interacted with the tool. User engagement centered on cross-study synthesis questions comparing surgical, lifestyle, and pharmacological treatment pathways for individual patient profiles.
Conclusion: EvidenceRx demonstrates the feasibility of a locally deployable, citation-grounded RAG system for diabetes treatment decision support. By integrating evidence from the surgical, lifestyle, and pharmacotherapy literature and communicating retrieval confidence explicitly, the system addresses a critical gap in evidence synthesis at the point of care. Future work includes prospective evaluation with clinicians, corpus expansion, and steps toward potential deployment.
*Unless otherwise noted, all abstracts presented at ENDO must not be released to the press or the public until the date and time of presentation. For oral presentations, the abstracts are embargoed until the session begins. The Endocrine Society reserves the right to lift the embargo on specific abstracts that are selected for promotion prior to or during ENDO.*