The Four-Minute Mile of Healthcare AI Has Been Run

In May 1954, Roger Bannister did something the medical establishment had declared physiologically impossible: he ran a mile in under four minutes. Within 12 months, multiple other runners had done the same. The barrier, it turned out, was never physical. It was a belief problem. Once people saw the number was real, the ceiling disappeared.

The same dynamic has quietly played out in healthcare AI. For the better part of a decade, the promise of ambient artificial intelligence (capturing patient encounters in real time, automating clinical documentation, and meaningfully reducing the administrative burden crushing America's physician workforce) was treated with polite skepticism at best and outright dismissal at worst. Leaders wanted it to be true. Vendors claimed it was. And somewhere in between, the actual evidence never quite materialized in a form rigorous enough to compel a budget committee.

That has now changed.

KLAS Research, the independent healthcare research firm that has spent nearly three decades building credibility through unbiased, methodologically rigorous analysis, recently completed an ROI validation study across three large health systems (FMOL Health, McLeod Health, and Rush University Medical Center), all of which had deployed Suki's AI assistant. The findings don't just validate the category. They provide something the market has been waiting for: hard, independently audited numbers that hospital leadership can actually take into a budget conversation.

"We needed to pull back the curtain, look closely in a non-blinded way. These studies show that at this organization, of this size, of this caliber, these are the outcomes they've seen — and other organizations can apply like-for-like."

Mac Boyter

Research Director, KLAS Research

Watch the full webinar on demand to hear directly from KLAS Research Director Mac Boyter and Suki's Josh Margulies as they walk through the full findings, methodology, and implications of this landmark ROI validation study.

Why Methodology Is Everything

The healthcare technology market is saturated with vendor claims that evaporate under real-world clinical workloads. KLAS deliberately structured this as a non-blinded study to strengthen the credibility of the data. Rather than relying on the anonymized, aggregated data typical of most research, the firm identified all three health systems by name, which raises the accountability bar considerably for both the data and the organizations reporting it.

The imperative was to draw causative links, not merely correlative ones. That distinction matters enormously. It is relatively easy to show that clinician satisfaction went up after deploying a new tool. It is much harder to isolate that specific tool as the cause, strip out concurrent training initiatives, staffing changes, and workflow updates, and arrive at a number that can be credibly attributed to the technology itself. That forensic approach is precisely what KLAS applied here.
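To make the attribution idea concrete, here is a minimal Python sketch of a difference-in-differences comparison, one common way to separate a tool's effect from changes that would have happened anyway. It is an illustration only: the source does not say which statistical method KLAS used, and the function name, the pilot and comparison groups, and every number below are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus the change in the control group.

    The control group's change stands in for everything that shifted for
    all clinicians at once (training, staffing, workflow updates).
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical minutes of documentation per note (made-up values).
pilot_before = [12.0, 11.0, 13.0]       # clinicians who later got the tool
pilot_after = [8.0, 7.0, 9.0]
comparison_before = [12.0, 12.0, 12.0]  # clinicians who never got the tool
comparison_after = [11.0, 11.0, 11.0]

effect = difference_in_differences(
    pilot_before, pilot_after, comparison_before, comparison_after
)
print(f"Change attributable to the tool: {effect:+.1f} minutes per note")
# prints: Change attributable to the tool: -3.0 minutes per note
```

In this toy data the pilot group improved by four minutes per note, but the comparison group improved by one minute with no tool at all, so only three minutes are credited to the technology. The estimate is only as good as the assumption that both groups would otherwise have trended alike.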

"What we wanted was to go beyond clinical burnout and cognitive load," Boyter explained during a recent conversation about the study's design. "We wanted to draw a clear, causative link between ambient AI and actual financial impact — as opposed to, 'well, we feel like we're seeing more patients.'"

The Time Savings Are Real. And the Downstream Effects Are Larger Than They Look

The most visible metric in any ambient AI study is time saved on documentation. Here, the numbers are credible and consistent across all three organizations:

Reduction in time spent on notes (McLeod Health)
65% decrease in after-hours documentation (FMOL Health)
Reduction in time spent on notes (FMOL Health)
43% reduction in notes open longer than 7 days (FMOL Health)

The 65% reduction in after-hours documentation deserves particular attention. Research consistently identifies "pajama time" — the hours clinicians spend completing notes after returning home — as one of the most corrosive contributors to physician burnout. It is not merely the volume of administrative work; it is the temporal intrusion of that work into personal life that accelerates disengagement and, ultimately, departure from the profession. A two-thirds reduction in that specific behavior is not a marginal improvement. It is a structural change in how physicians experience their working lives.

The 43% reduction in notes open beyond seven days has significant implications that run well past clinician wellness. Delayed notes create clinical risk when care decisions downstream depend on documentation that hasn't been finalized. They create revenue cycle delays that can stretch reimbursement timelines for both clinicians and health systems. And they represent a patient safety exposure that risk departments monitor closely. Closing that gap by nearly half, across an entire organization, compounds into meaningful operational improvement over time.

"Our physicians are top-gun fighter pilots. Highly trained, highly skilled individuals that we've unfortunately reduced to data entry specialists. Ambient AI was designed to ameliorate that burden."

Mac Boyter

KLAS Research

The Financial Case: Beyond Feel-Good ROI

If the clinical and wellness data provides the moral argument for ambient AI, it is the financial data that converts skeptics into sponsors. The KLAS study produced figures that are conservative in their construction — deliberately so — which makes them more credible, not less.

Net monthly gain per provider, baseline (McLeod Health)
Increase in Level 4 visit capture (FMOL Health)
Total monthly impact per provider, with coding uplift (McLeod Health)
Incremental monthly revenue per provider (FMOL Health)

The mechanism behind the Level 4 visit improvement is worth understanding, because it illuminates something fundamental about how ambient AI creates value beyond time savings. It comes down to what practitioners informally call the "oh, by the way" problem.

Every physician knows the scenario: the visit is wrapping up, the assessment is complete, and as the patient reaches for the door handle they turn back and add, almost as an afterthought, "oh, by the way — I've been having some chest pressure when I go upstairs." What follows is clinically significant, potentially urgent, and completely outside the original visit scope. In the pre-ambient world, that conversation might be partially documented, imperfectly recalled eight hours later, or altogether missing from the note. The ambient AI captures it entirely. "With ambient AI now, you're documenting all of the conversation completely," Boyter noted. "Smoking cessation conversations, the 'oh, by the way' conversations — physicians are getting credit for all of the effort. And just as important, if not more important, all of that is captured for the patient record." This isn't over-coding. It is accurate coding for work that was actually performed but previously went undocumented and therefore unbilled. The distinction matters both ethically and commercially.

Organic Volume Growth: The Empowerment Effect

One of the more counterintuitive findings in the study concerns patient volume. Conventional wisdom in ambient AI deployments held that health systems would mandate additional patient visits to offset the cost of the technology. KLAS found something more interesting: when health systems deployed ambient AI as an empowerment tool rather than a productivity mandate, clinicians voluntarily chose to see more patients.

Organic increase in patient volume (FMOL Health)
Increase in patient encounters per month (McLeod Health)

The distinction between mandated and voluntary volume growth is not semantic. Health systems that made additional visits a condition of the technology's ROI justification tended to see lower utilization and worse outcomes overall. Those that deployed it purely for clinician benefit saw something different: physicians, freed from documentation burden, redirected that time toward the work they trained to do.

"The health systems that forced additional visits had much lower usage rates," noted Josh Margulies, VP of Brand Advocacy at Suki. "Those who wanted it purely for the clinician's benefit saw the clinician end up seeing more patients anyway — because that's what they want to do. They want to heal."

The Physician Shortage Context

The U.S. has not recovered to pre-pandemic physician capacity. Clinicians spend just 27% of their time on direct patient care. The remainder is consumed by administrative tasks. In that context, ambient AI isn't merely a quality-of-life improvement for physicians. It is, as KLAS researchers frame it, a structural necessity for maintaining patient access in a system already stretched past its limits.

Patient Experience: The Overlooked ROI

The study captured something that financial models rarely account for: the measurable improvement in patient experience that follows when physicians put down the keyboard and make eye contact.

McLeod Health saw a 6.3% increase in patient experience scores specifically related to provider listening and trust. FMOL Health reported that 100% of surveyed clinicians who use Suki as part of their workflow said it improved their work-life balance, a number that Margulies noted is essentially unprecedented in any survey context.

The downstream financial implications of patient experience improvements are real, if underappreciated. Research consistently shows that patients who feel genuinely heard and attended to are more likely to follow care plans, keep follow-up appointments, and ultimately pay their bills. A patient who leaves an encounter feeling dismissed doesn't just generate a poor satisfaction score — they generate a payment dispute. "When patient satisfaction increases, patient payments increase," Boyter observed. "Patients who feel they got real value from that visit are more apt to make good on their medical bills. We don't always see the knock-on effects."

What Health System Leaders Should Actually Take Away

The KLAS study is valuable as evidence. It is more valuable as a framework for how health systems should be evaluating ambient AI investments going forward. Based on the findings and the methodology behind them, four imperatives stand out for any CMO or CIO currently navigating this decision:

Measure causation, not correlation. The health systems in this study that generated the most compelling ROI narratives were the ones that did the hard forensic work of attributing specific outcomes to specific tools. That requires baseline data, deliberate pilot design, and a willingness to set aside outcomes that can't be cleanly attributed. Vendor-provided ROI estimates are a starting point, not a conclusion.
Bring revenue cycle leadership into the conversation from day one. A recurring finding across all three organizations was that ambient AI's financial impact extends well beyond clinical documentation — into coding accuracy, claim quality, and denial prevention. Health systems that treat ambient as a clinical IT purchase and exclude RCM leadership from the evaluation are leaving a significant portion of the ROI unmeasured.
Design for clinician empowerment, not productivity extraction. The data is unambiguous on this point: organizations that framed ambient AI as a tool to enable better clinical care — and resisted the temptation to use it as justification for mandating additional patient volume — achieved higher adoption rates, better clinician satisfaction outcomes, and ultimately stronger financial results.
Invest in clinician champions. Technology adoption in clinical settings does not follow the same dynamics as enterprise software. Clinicians are peer-influenced, skeptical of vendor narratives, and responsive to authentic testimony from colleagues they respect. The organizations in this study that achieved broad, sustained adoption all identified and cultivated physician champions early — and let those voices do the work that marketing never could.

"I've never seen the rate and velocity of utilization and adoption with any other technology that I've seen with ambient AI. There's no question: it's because the ROI is there, and because physicians demand it."

Mac Boyter

KLAS Research

The Baseline Problem — and Why It Matters for Competitive Evaluation

One nuance in the KLAS data deserves attention from health systems currently running competitive evaluations: the baseline from which improvement is measured matters enormously, and not all ambient tools are equal.

Rush University Medical Center's results in the study appear more modest at first glance. That is not because Suki underperformed there, but because Rush was migrating from a different ambient AI solution rather than from manual documentation, so its baseline was already elevated. The marginal improvement over a competent incumbent is structurally smaller than the improvement over no tool at all, even if the absolute quality difference is significant.

For health systems evaluating whether to switch ambient vendors, the lesson is to normalize comparisons against the same baseline. A tool that shows solid gains over an incumbent ambient solution is likely delivering stronger absolute performance than its headline number suggests, because a competitor's headline figure measured against manual documentation starts from a much lower baseline.
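A quick worked example shows how much the choice of baseline moves the headline number. The sketch below is hypothetical throughout: the helper name and the minutes-per-note figures are invented for illustration and are not taken from the KLAS study.

```python
def relative_reduction(baseline: float, observed: float) -> float:
    """Fractional reduction of `observed` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - observed) / baseline


# Hypothetical documentation time, in minutes per note (made-up values).
manual = 10.0     # manual documentation, no ambient tool
incumbent = 6.0   # ambient tool already in place
candidate = 5.0   # ambient tool under evaluation

vs_incumbent = relative_reduction(incumbent, candidate)
vs_manual = relative_reduction(manual, candidate)
incumbent_vs_manual = relative_reduction(manual, incumbent)

print(f"Candidate vs. incumbent: {vs_incumbent:.0%}")         # 17%
print(f"Candidate vs. manual:    {vs_manual:.0%}")            # 50%
print(f"Incumbent vs. manual:    {incumbent_vs_manual:.0%}")  # 40%
```

The same candidate tool reads as a 17% improvement when measured against the incumbent and a 50% improvement when measured against manual documentation. Comparing a 17% figure for one vendor with a 40% figure for another says nothing until both are restated against the same starting point.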

Looking Ahead

The four-minute mile stood unbroken for decades, not because it was impossible, but because no one had proved it wasn't. Roger Bannister's achievement in 1954 didn't just validate his own capability — it recalibrated what the entire field believed was achievable. Within a year, the record fell again.

The KLAS validation study across FMOL, McLeod, and Rush represents a similar moment for ambient AI. The numbers are independently verified, the methodology is rigorous, and the results are publicly attributable. The four-minute mile has been run. The only remaining question is who runs the next one, and how quickly.
