A cluster-randomized study has shown that using a large language model to automatically assess SEP-1 (Severe Sepsis and Septic Shock Management Bundle), a quality measure of the Centers for Medicare & Medicaid Services (CMS), and giving treating physicians timely feedback can significantly increase compliance with the measure. The study was published in JAMA Network Open.
The study randomized 66 emergency physicians at two hospitals of the University of California, San Diego. These physicians treated a total of 301 patients who met the inclusion criteria for the SEP-1 measure. Physicians in the intervention group received an automated, large language model-based assessment of SEP-1 compliance with targeted feedback upon discharge. Physicians in the control group followed the usual procedure.
Compliance with the SEP-1 measure was 82.9 percent in the intervention group and 70.1 percent in the control group. This represented an absolute improvement of 13.0 percentage points (odds ratio 2.10; 95% confidence interval 1.15–3.81; P = 0.02). The largest difference was observed in the completion of the 30 mL/kg fluid bolus, which was omitted less often in the intervention group.
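For readers who want to see how the two effect measures relate to the reported compliance rates, the following is a minimal Python sketch that computes a crude (unadjusted) risk difference and odds ratio from the two percentages alone. The function names are hypothetical and chosen for illustration. The crude values come out slightly different from the published 13.0 percentage points and odds ratio of 2.10, which would be consistent with the published estimates coming from a model that accounts for the clustering of patients within physicians. That explanation is an assumption here, not something stated in this summary.

```python
def risk_difference(p_intervention: float, p_control: float) -> float:
    """Absolute difference between two proportions (as fractions, not percent)."""
    return p_intervention - p_control


def odds(p: float) -> float:
    """Convert a proportion strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_intervention: float, p_control: float) -> float:
    """Crude odds ratio; ignores clustering and any covariate adjustment."""
    return odds(p_intervention) / odds(p_control)


if __name__ == "__main__":
    # Compliance rates as reported in the text above.
    p_int, p_ctl = 0.829, 0.701

    rd = risk_difference(p_int, p_ctl) * 100
    crude_or = odds_ratio(p_int, p_ctl)

    print(f"Crude risk difference: {rd:.1f} percentage points")  # 12.8
    print(f"Crude odds ratio: {crude_or:.2f}")                   # 2.07
```

The odds ratio is larger than the ratio of the two percentages because odds grow faster than proportions as compliance approaches 100 percent, which is worth keeping in mind when reading the 2.10 figure.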
The agreement between the automated LLM assessment and manual expert review was 92 percent. There were no significant differences between the groups regarding 30-day mortality or intensive care unit admissions.
The study demonstrates that AI-powered, timely quality measurement and feedback can improve adherence to complex quality measures. The authors see this as a way to overcome the limitations of manual quality reporting and to support a learning healthcare system. At the same time, it remains to be seen whether improved compliance with the measure also leads to clinically relevant improvements in patient outcomes. The study was conducted at only two centers and examined only the SEP-1 measure, so its transferability to other settings and quality indicators still needs to be investigated.
