James Corcoran from STAC Research on performance benchmarking, AI infrastructure, and the new economics of LLMs

What does good AI performance really mean in 2025? In this episode of Data in the AI Era, James Corcoran, Head of AI at STAC, joins KX’s Peter Finter to challenge outdated assumptions about performance benchmarking and AI infrastructure strategy.

Together they take a deep dive into how enterprise AI performance should really be measured in 2025 and beyond. They unpack why traditional benchmarks are losing relevance, what’s replacing them, and how infrastructure choices can make or break AI strategy.

James shares what STAC’s latest research reveals about the hidden costs of model serving platforms, the tipping point for building your own LLM stack, and why fidelity drift in commercial APIs could be a silent risk for firms deploying GenAI at scale. They also explore how structured and unstructured data must be integrated to build trusted, efficient, and scalable AI applications, especially in regulated markets.

The episode covers:

  • Why many firms are underestimating the impact of model-serving platforms
  • When it makes economic sense to move from managed LLMs to self-hosted stacks
  • How structured data can reduce hallucinations and improve trust in GenAI
  • What ‘semantic drift’ means—and why it could quietly undermine compliance
  • Why retrieval speed, not just inference, is now the bottleneck in GenAI apps

From GPU economics to search-and-retrieval bottlenecks, this episode sheds light on what actually drives performance in production AI systems. Whether you’re scaling AI across the enterprise or still evaluating vendor options, this is a must-listen for anyone shaping infrastructure decisions in financial services.

Key takeaways and quotes

1. Benchmarks aren’t dead, but they’re no longer enough

“Benchmarks are very important for helping you select technology platforms or even downselect from a couple of competing offerings.”

Traditional performance benchmarks still help you compare vendors and technologies. But they no longer cover the full complexity of today’s AI infrastructure decisions. According to James, you now need to consider broader factors like model fidelity, interpretability, efficiency, and real business outcomes. STAC is expanding its research beyond raw speed to help you evaluate AI systems in a more nuanced and future-ready way.

2. AI stack decisions are economic as much as technical

“There’s a point at which the unit economics start to lean quite favorably towards running an open source model in your own environment.”

Starting with a managed LLM service is often cheaper and faster. But that calculus changes as AI usage scales. STAC has modeled the cost inflection point where self-hosting becomes more viable. If you are planning dozens or hundreds of AI-infused applications, infrastructure ownership can lead to significant savings and greater control. Timing that transition is now a strategic decision for AI leaders.
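To make that crossover concrete, here is a minimal Python sketch of the comparison. All prices, throughput, and overhead figures are illustrative assumptions for the sake of the example, not figures from STAC’s modelling:

```python
# Minimal sketch of the managed-vs-self-hosted crossover. Every figure below is an
# illustrative assumption, not STAC data.

API_COST_PER_M_TOKENS = 5.00          # assumed blended $/1M tokens on a managed API
GPU_HOURLY_COST = 4.00                # assumed fully loaded $/hour per self-hosted GPU node
GPU_TOKENS_PER_SECOND = 1_500         # assumed sustained throughput of an open-source model
FIXED_MONTHLY_PLATFORM_COST = 20_000  # assumed engineering/ops overhead of owning the stack

def monthly_cost_managed(tokens_per_month: float) -> float:
    """Cost of serving a monthly token volume through a managed API."""
    return tokens_per_month / 1_000_000 * API_COST_PER_M_TOKENS

def monthly_cost_self_hosted(tokens_per_month: float) -> float:
    """Fixed platform overhead plus the GPU hours needed to serve the same volume."""
    gpu_hours = tokens_per_month / GPU_TOKENS_PER_SECOND / 3600
    return FIXED_MONTHLY_PLATFORM_COST + gpu_hours * GPU_HOURLY_COST

for volume in (1e8, 1e9, 1e10):  # 100M, 1B, 10B tokens per month
    print(f"{volume:>14,.0f} tokens/month: "
          f"API ${monthly_cost_managed(volume):>9,.0f} vs "
          f"self-hosted ${monthly_cost_self_hosted(volume):>9,.0f}")
```

Under these assumed numbers, the managed API wins at low volumes and self-hosting wins once monthly volume is large enough to amortize the fixed platform cost, which is the inflection point James describes.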

3. Fidelity drift is a growing blind spot in managed LLMs

“If you change the underlying infrastructure… you can get measurably different answers.”

Many firms assume their model outputs are stable. But STAC’s research reveals that even minor infrastructure changes, such as switching cloud environments or model servers, can alter results. This fidelity drift raises red flags for regulated industries that rely on auditability and consistency. Without robust monitoring, you risk deploying AI that behaves differently under the hood without anyone noticing.
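A lightweight way to surface this kind of drift is to replay a fixed prompt set through both serving stacks and flag divergent answers. The sketch below assumes you have already captured the responses from each environment; the prompts and answers are made up for illustration:

```python
# Minimal fidelity-drift check: compare answers to the same prompts from two
# serving stacks (e.g. before and after an infrastructure change).
import difflib

prompts = [
    "What was the 10-day volatility of XYZ?",
    "Summarise the counterparty exposure report.",
]
answers_a = [  # responses captured from stack A
    "Annualised 10-day volatility was 18.2%.",
    "Exposure is concentrated in two counterparties.",
]
answers_b = [  # responses captured from stack B
    "Annualised 10-day volatility was 18.4%.",
    "Exposure is concentrated in two counterparties.",
]

for prompt, a, b in zip(prompts, answers_a, answers_b):
    if a != b:
        similarity = difflib.SequenceMatcher(None, a, b).ratio()
        print(f"DRIFT ({similarity:.2f}): {prompt}")
        print(f"  stack A: {a}")
        print(f"  stack B: {b}")
```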

4. RAG performance is bottlenecked by retrieval, not inference

“The slowest currently today is the search and retrieval part… most people focus on speeding up the model.”

Retrieval-Augmented Generation (RAG) pipelines are gaining traction, but they’re not as efficient as many teams think. While LLM inference is relatively scalable with more GPUs, retrieval remains the real performance bottleneck. STAC has shown that optimizing this layer can double end-to-end response speed. Focusing on faster and smarter retrieval engines may be the most impactful way to improve user experience.
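The kind of per-stage timing that exposes this is straightforward to add to any pipeline. In the sketch below, retrieve() and generate() are placeholders for your own search layer and model call, with sleeps standing in for their latency:

```python
# Minimal per-stage timing for a RAG pipeline: is retrieval or generation the bottleneck?
import time

def retrieve(query: str) -> list[str]:
    time.sleep(0.40)  # stand-in for vector/keyword search latency
    return ["doc snippet 1", "doc snippet 2"]

def generate(query: str, context: list[str]) -> str:
    time.sleep(0.25)  # stand-in for LLM inference latency
    return "answer grounded in the retrieved snippets"

query = "How did overnight funding costs move last week?"

t0 = time.perf_counter()
context = retrieve(query)
t1 = time.perf_counter()
answer = generate(query, context)
t2 = time.perf_counter()

print(f"retrieval:  {t1 - t0:.3f}s")
print(f"generation: {t2 - t1:.3f}s")
print(f"end-to-end: {t2 - t0:.3f}s")
```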

5. Structured and unstructured data must be combined for trustworthy AI

“It seems to me that it’s completely necessary to combine structured data and unstructured data… one without the other is just so limited.”

James makes the case for hybrid data architecture. While unstructured data provides rich context, structured data grounds GenAI outputs in verifiable facts and boosts both precision and credibility. The ability to cite specific rows, APIs, or time series gives you the confidence you need to trust LLM-generated insights. Especially in capital markets, structured grounding is essential.
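As a minimal illustration of that grounding, the sketch below hands the model citable structured rows alongside the question. The price table, prompt format, and citation scheme are purely hypothetical:

```python
# Minimal sketch of grounding a GenAI prompt in structured, citable facts.
closing_prices = {  # structured facts keyed by (symbol, date)
    ("XYZ", "2025-06-02"): 101.4,
    ("XYZ", "2025-06-03"): 103.9,
}

question = "Did XYZ rally at the start of June, and by how much?"

facts = "\n".join(
    f"[row {i}] {sym} closed at {px} on {date}"
    for i, ((sym, date), px) in enumerate(sorted(closing_prices.items()), start=1)
)

prompt = (
    "Answer using only the rows below and cite the row numbers you used.\n"
    f"{facts}\n\nQuestion: {question}"
)

print(prompt)  # pass this to your model; the row citations make the answer auditable
```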

6. Semantic drift could become a compliance and governance risk

“You can measure how… not just the output changes, but the way in which that output is delivered semantically.”

Subtle shifts in tone, phrasing, or meaning can occur as LLMs evolve. STAC is researching how these semantic changes might affect decision-making, especially in risk-sensitive or customer-facing applications. For example, a compliance report that sounds more speculative due to a model update could create unintended consequences. Tracking these changes will be critical to managing model governance at scale.
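One way to put a number on this is to score how similar the answer to a fixed prompt remains before and after a model update. The sketch below uses a simple bag-of-words cosine purely to keep the example self-contained; a real deployment would use a proper embedding model:

```python
# Minimal semantic-drift score between two versions of the "same" answer.
import math
from collections import Counter

def tokenize(text: str) -> Counter:
    return Counter(text.lower().replace(",", " ").replace(".", " ").split())

def cosine_similarity(text_a: str, text_b: str) -> float:
    a, b = tokenize(text_a), tokenize(text_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

before = "The position is within risk limits."
after = "The position appears likely to remain within risk limits, though this may change."

print(f"semantic similarity: {cosine_similarity(before, after):.2f}")
# lower scores flag answers whose tone or meaning shifted and need human review
```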

7. User experience is the real performance benchmark

“If the LLM is jittery… the usability suffers. And that’ll be a reason people might switch from one LLM to another.”

Beyond latency, the feel of a system matters. STAC is now measuring LLM ‘jitter,’ the inconsistency in token delivery that degrades the user experience. Especially in real-time or front-office applications, a jittery model erodes confidence and reduces adoption, even if it’s technically fast. You should treat usability as a first-class performance metric rather than an afterthought.
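Jitter can be quantified directly from token arrival timestamps, for example as the variability of inter-token gaps. A minimal sketch with made-up timestamps:

```python
# Minimal jitter measurement from streaming token arrival times (seconds).
import statistics

token_arrival_times = [0.00, 0.04, 0.08, 0.31, 0.35, 0.62, 0.66, 0.70]

intervals = [b - a for a, b in zip(token_arrival_times, token_arrival_times[1:])]

mean_gap = statistics.mean(intervals)
jitter = statistics.stdev(intervals)  # higher stdev = choppier, less usable streaming

print(f"tokens/sec (mean):  {1 / mean_gap:.1f}")
print(f"inter-token jitter: {jitter * 1000:.0f} ms")
```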

8. Cooling and energy efficiency are board-level concerns

“We are being increasingly asked to measure power consumption, particularly because liquid cooling is becoming so important.”

AI infrastructure is not just about software and silicon. It is also about power draw and thermals. With liquid cooling and energy budgets now influencing architecture decisions, STAC is helping firms model compute efficiency in real-world deployments. For CTOs and CFOs, understanding the power profile of your AI stack is becoming critical to long-term scalability and cost control.
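The underlying arithmetic is simple: a node’s power draw and sustained throughput translate directly into energy, and therefore cost, per million tokens. A sketch with purely illustrative figures:

```python
# Minimal power arithmetic: energy and electricity cost per million generated tokens.
NODE_POWER_KW = 10.5             # assumed draw of a liquid-cooled GPU node under load
TOKENS_PER_SECOND = 1_500        # assumed sustained generation throughput of that node
ELECTRICITY_COST_PER_KWH = 0.15  # assumed $/kWh

seconds_per_m_tokens = 1_000_000 / TOKENS_PER_SECOND
kwh_per_m_tokens = NODE_POWER_KW * seconds_per_m_tokens / 3600
cost_per_m_tokens = kwh_per_m_tokens * ELECTRICITY_COST_PER_KWH

print(f"energy per 1M tokens: {kwh_per_m_tokens:.2f} kWh")
print(f"power cost per 1M tokens: ${cost_per_m_tokens:.3f}")
```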

Further reading:
STAC Research Note: Comparing LLM Benchmarking Frameworks
STAC Research Note: Performance and Efficiency Comparison Between Self-Hosted LLMs and API Services

Follow James on LinkedIn and see the latest research from STAC on their website.

Want to learn more? Explore how KX’s high-performance analytical database for the AI era can help you ask better questions and act faster on your data.

Be first in line to hear our next podcast by subscribing on our YouTube channel, Apple Podcasts, Spotify, or your chosen podcast platform. If you have any questions you would like us to tackle in a future episode or want to provide some feedback, reach out to the team at [email protected].
