The Token Tax: What We Learned Building an AI Agent for Home Loan Verification

We built the same intelligent document verification agent three different ways – from scratch, on Kore.ai, and on Pega – and measured everything. Here’s what the data told us about token consumption, output reliability, and the real cost of ‘just vibe coding it.’

Why We Built This

Home loan processing is document-intensive by nature. A single application can include national ID documents, bank statements, payslips, company registration certificates, KYC filings, title deeds, and more — each needing to be verified not only for authenticity but for internal consistency across the application. A loan officer manually cross-checking all of this is slow, error-prone, and expensive at scale.

We saw a genuine opportunity for an AI agent to own this verification workflow end-to-end. But as we started planning the build, a question surfaced that we couldn’t ignore: how much does it actually cost to run an LLM-powered agent over a complex, multi-document process — and does that cost vary meaningfully depending on how you build it?

Token consumption is the electricity bill of AI. You can have the smartest agent in the world, but if it burns through tokens inefficiently — redundant context, bloated prompts, unpredictable retries — it becomes economically unviable at scale. We decided to treat this as a proper research exercise rather than a one-shot build.

“Token consumption is the electricity bill of AI. The smartest agent in the world becomes economically unviable if it burns tokens inefficiently.”

What the Agent Does: The RCU Agent

We called it the RCU Agent — short for Review, Check, and Underwrite. Its scope covers the full document verification layer of the home loan intake process. The agent operates across three distinct capability areas:

RCU Agent — Capability Architecture

01 Document Extraction
Extracts structured information from both templated documents (national IDs, bank statements, KYC forms) with defined schemas, and non-templated documents (payslips, company registration certificates, employer letters) that require adaptive extraction logic.

02 Application Verification
Cross-references extracted document data against the applicant’s stated details in the loan application — checking for consistency in identity, income, employment, and property details. Also verifies completeness, flagging missing fields or mismatched values.

03 Business Rules Engine
Applies a rules layer to determine whether the submitted document set is sufficient to proceed. Validates KYC document requirements, income verification documentation, and property-related documents independently — then produces a consolidated go/no-go recommendation with specific gap annotations.
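
A minimal sketch of how the rules layer in capability 03 might be expressed in code. The document categories, document names, and the `evaluate_document_set` function are hypothetical and chosen for illustration only; the actual rule set used in the builds is not described in this article.

```python
from dataclasses import dataclass, field

# Hypothetical requirements per category, for illustration only.
REQUIRED_DOCS = {
    "kyc": {"national_id", "kyc_form"},
    "income": {"payslip", "bank_statement"},
    "property": {"title_deed"},
}


@dataclass
class Recommendation:
    decision: str                              # "go" or "no-go"
    gaps: dict = field(default_factory=dict)   # category -> missing documents


def evaluate_document_set(submitted: set) -> Recommendation:
    """Check each category independently, then consolidate into one decision."""
    gaps = {}
    for category, required in REQUIRED_DOCS.items():
        missing = sorted(required - submitted)
        if missing:
            gaps[category] = missing
    return Recommendation(decision="go" if not gaps else "no-go", gaps=gaps)


result = evaluate_document_set({"national_id", "kyc_form", "payslip", "title_deed"})
print(result.decision, result.gaps)  # no-go {'income': ['bank_statement']}
```

Because the check is plain code, the same document set always yields the same recommendation and the same gap annotations.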

How We Tested: Three Builds, Same Scope

Rather than picking a platform and building once, we ran a structured comparison across three approaches using an identical set of test applications — the same documents, the same verification rules, and the same expected outcomes for each run.

Build A: Scratch (Vibe-Coded)

The first build was an intentionally unconstrained implementation — written quickly, leaning heavily on the LLM to handle logic, with minimal prompt engineering discipline. Think of it as the “move fast” prototype: system prompts written intuitively rather than precisely, context passed in bulk, and no structured orchestration layer controlling the agent’s reasoning flow. This gave us a baseline for what unoptimised AI development actually looks like in practice.

Build B: Kore.ai Orchestration Platform

The second build used Kore.ai, an enterprise conversational and agentic AI platform. Kore provides a structured environment for building agents — pre-built dialogue flows, intent management, and some token-management tooling out of the box. This represented the “platform-assisted” tier of development: more guardrails than a scratch build, but still dependent on prompt quality at the developer level.

Build C: Pega Platform

The third build used Pega’s AI-integrated process automation layer. Pega’s architecture is designed around deterministic process execution first, with AI invoked at specific, bounded decision points — rather than an AI-first design where the LLM drives the overall orchestration. This structural difference turned out to be significant.

Key Design Difference

In the Scratch and Kore builds, the LLM was doing heavy orchestration work — deciding what to do next, when to call tools, how to structure outputs. In the Pega build, the process architecture handled orchestration deterministically, and the LLM was only invoked for the cognitive tasks that genuinely require it: extraction and judgement calls.
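
The difference can be sketched in a few lines. This is an illustrative toy, not the Pega or Kore implementation: `fake_llm`, the field names, and the pipeline functions are all hypothetical. The point it shows is that the step order lives in code, and the model is called only for the extraction step.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned extraction result."""
    return "name=Asha Rao;monthly_income=85000"


def extract_fields(document_text: str, llm: Callable[[str], str]) -> dict:
    """Cognitive step: the only place the LLM is invoked."""
    raw = llm(f"Extract name and monthly_income from:\n{document_text}")
    return dict(pair.split("=", 1) for pair in raw.split(";"))


def verify_against_application(extracted: dict, application: dict) -> list:
    """Deterministic step: plain code compares values, no LLM involved."""
    return [key for key in application
            if str(application[key]) != extracted.get(key)]


def run_pipeline(document_text: str, application: dict,
                 llm: Callable[[str], str]) -> dict:
    """Step order is fixed in code, so no tokens go to deciding what to do next."""
    extracted = extract_fields(document_text, llm)
    mismatches = verify_against_application(extracted, application)
    return {"extracted": extracted, "mismatches": mismatches,
            "decision": "go" if not mismatches else "no-go"}


calls = []


def counting_llm(prompt: str) -> str:
    """Wrapper that records every model call so they can be counted."""
    calls.append(prompt)
    return fake_llm(prompt)


outcome = run_pipeline("Payslip text", {"name": "Asha Rao", "monthly_income": 85000},
                       counting_llm)
print(outcome["decision"], len(calls))  # go 1
```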

What We Observed

Most token analyses start at runtime — the cost of executing the agent. We went further, measuring across four distinct phases: build, first execute, multiple executions, and at scale. This matters because the build phase alone carries a significant token cost that rarely appears in platform comparisons, and the gap between approaches compounds differently at each stage.

Scope and Methodology

Runtime figures for Pega and Kore cover GenAI node configurations on both platforms — not the full agent frameworks of either. This was a deliberate scope decision: we tested how each platform’s AI invocation layer performs under identical workloads, not their broader agentic capabilities. Build-phase token estimates for all three approaches are reasoned estimates based on observed development patterns (prompt iteration cycles, integration testing, debugging runs) — not directly measured. They are presented transparently as such.

Phase 1: Build

Building a 9-node document verification agent requires prompt development, orchestration wiring, integration testing, and debugging — all of which consume tokens. The scratch build carries the heaviest build cost because every element is hand-crafted without platform scaffolding.

Build Activity	Scratch (Est.)	Kore.ai (Est.)	Pega (Est.)
Prompt dev per node (~12 iters scratch · ~5 Kore · ~3 Pega · avg 5,500 tokens/iter)	~594,000	~248,000	~149,000
Orchestration building — manual wiring, flow testing, partial pipeline runs	~450,000	~120,000	~80,000
End-to-end integration testing — full pipeline runs during development	~750,000	~300,000	~240,000
Bug fixing & regression — cross-node failures, logic errors, retries	~400,000	~80,000	~40,000
Final validation runs	~300,000	~52,000	~21,000
Total build (est.)	~2,494,000	~800,000	~530,000

The scratch build consumes roughly 4.7× as many tokens to build as Pega (about 2.49M versus 0.53M) and about 3.1× as many as Kore, before a single production case is processed. This overhead is largely invisible in standard platform evaluations because most teams measure running cost only, not development cost.
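
The build estimates in the table can be reproduced with simple arithmetic. The sketch below assumes the prompt-development row is nodes × iterations × average tokens per iteration, rounded half up to the nearest thousand; that rounding rule is an inference from the table, not something stated by the measurements.

```python
NODES = 9
AVG_TOKENS_PER_ITERATION = 5_500
ITERATIONS_PER_NODE = {"Scratch": 12, "Kore.ai": 5, "Pega": 3}

# Remaining build activities, copied from the estimates in the table:
# orchestration, integration testing, bug fixing, final validation.
OTHER_ACTIVITIES = {
    "Scratch": [450_000, 750_000, 400_000, 300_000],
    "Kore.ai": [120_000, 300_000, 80_000, 52_000],
    "Pega": [80_000, 240_000, 40_000, 21_000],
}


def prompt_dev_tokens(approach: str) -> int:
    """Prompt development estimate, rounded half up to the nearest thousand."""
    raw = NODES * ITERATIONS_PER_NODE[approach] * AVG_TOKENS_PER_ITERATION
    return (raw + 500) // 1000 * 1000


def build_total(approach: str) -> int:
    return prompt_dev_tokens(approach) + sum(OTHER_ACTIVITIES[approach])


for approach in ITERATIONS_PER_NODE:
    ratio = build_total("Scratch") / build_total(approach)
    print(f"{approach}: {build_total(approach):,} tokens (Scratch is {ratio:.1f}x)")
# Scratch: 2,494,000 tokens (Scratch is 1.0x)
# Kore.ai: 800,000 tokens (Scratch is 3.1x)
# Pega: 530,000 tokens (Scratch is 4.7x)
```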

Phases 2–4: Execution

Token consumption across execution phases — all three approaches

Phase	Scratch	Kore.ai	Pega	Key Driver
First execute (1 run)	103,065 measured	30,138 measured	28,063 measured	Scratch agentic loop: 72.1% of tokens consumed by orchestration overhead
Multiple executions (5 runs avg/run)	103,065 extrapolated	30,436 measured	23,970 measured	Kore completion tokens 89.6% higher than Pega; scratch overhead constant per run
At scale (100 runs total)	10,306,500 extrapolated	3,043,640 measured	2,396,960 measured	Scratch 3.4× Kore, 4.3× Pega — orchestration overhead multiplies with every run

The Hidden Tax Inside the Scratch Build

Of the 103,065 tokens consumed in a single scratch run, only 28,720 (27.9%) were actual specialist work — schema mapping, KYC validation, income validation, bank validation, final decision. The remaining 74,345 tokens (72.1%) were manager and orchestration overhead: the agentic loop planning what to do next, managing tool-call context, and carrying intermediate outputs between steps.

This is what unstructured AI invocation looks like under the hood. The model is spending the majority of its token budget deciding how to do the task, not doing it. Structured platforms eliminate this overhead by handling orchestration deterministically.
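
The split quoted above follows directly from the two measured figures. A short sketch of the arithmetic, with the 100-run projection assuming the overhead stays constant per run, as the execution table does:

```python
TOTAL_TOKENS_PER_RUN = 103_065
SPECIALIST_TOKENS_PER_RUN = 28_720


def overhead_breakdown(total: int, specialist: int) -> tuple:
    """Return (overhead tokens, overhead as a fraction of the total)."""
    overhead = total - specialist
    return overhead, overhead / total


overhead, share = overhead_breakdown(TOTAL_TOKENS_PER_RUN, SPECIALIST_TOKENS_PER_RUN)
print(f"overhead per run: {overhead:,} tokens ({share:.1%})")
print(f"overhead over 100 runs: {overhead * 100:,} tokens")
# overhead per run: 74,345 tokens (72.1%)
# overhead over 100 runs: 7,434,500 tokens
```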

Total Cost of Ownership: Build + 100 Runs

With all three approaches carrying measured or extrapolated runtime data, the full picture is stark. Scratch does not just cost more: across build plus 100 runs it consumes roughly 3.3× the tokens of Kore and 4.4× the tokens of Pega.

Approach	Total Tokens (Build + 100 Runs)	vs Scratch
Scratch	~12,800,500 tokens	Baseline
Kore.ai	~3,843,640 tokens	−70%
Pega	~2,926,960 tokens	−77%

The real divide is between unstructured agentic builds and any platform that applies architectural discipline to how the AI is invoked. Scratch is not close to Kore: it consumes about 3.4× the runtime tokens at 100 runs (about 3.3× once build is included), driven by an orchestration overhead that recurs with every single execution.
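
The totals in the table are the build estimate plus the 100-run runtime figure for each approach. The sketch below reproduces them; scaling the 100-run figure linearly to other run counts is an assumption made here for illustration, not a measured result.

```python
BUILD = {"Scratch": 2_494_000, "Kore.ai": 800_000, "Pega": 530_000}
RUNTIME_100 = {"Scratch": 10_306_500, "Kore.ai": 3_043_640, "Pega": 2_396_960}


def total_tokens(approach: str, runs: int = 100) -> int:
    """Build estimate plus runtime, scaling the 100-run figure linearly."""
    return BUILD[approach] + RUNTIME_100[approach] * runs // 100


for approach in BUILD:
    total = total_tokens(approach)
    saving = 1 - total / total_tokens("Scratch")
    print(f"{approach}: {total:,} tokens, {saving:.0%} below Scratch")
# Scratch: 12,800,500 tokens, 0% below Scratch
# Kore.ai: 3,843,640 tokens, 70% below Scratch
# Pega: 2,926,960 tokens, 77% below Scratch
```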

Output Correctness: Platform Comparison

Token consumption was only part of the story. The more revealing findings emerged when we looked at output correctness and developer experience — whether the agent was actually getting the right answers, and how much control the platform gave us in pursuing that.

Approach	Token Efficiency	Output Correctness	Error Rate	Verdict
Scratch / Vibe-coded	Poor	Variable	High — factual extraction errors, hallucinated fields	Not viable
Kore.ai	Moderate	Consistent	Moderate — complex multi-document scenarios and cross-field rule evaluation exposed correctness gaps requiring ongoing prompt engineering	Viable with caveats
Pega	Good	Deterministic	Very low — same inputs produced same outputs across all runs	Production-ready

Key Observations

Observation 1: The Scratch Build Consumed the Most — and Made Mistakes

The vibe-coded build was predictably expensive. Without structured prompt engineering or a constrained orchestration layer, the agent passed large, unfiltered context windows to the LLM repeatedly. It also made substantive errors: in some runs, it hallucinated document fields that weren’t present, mismatched applicant names across documents, and occasionally skipped entire rule checks. The high token count was partly a consequence of retry logic attempting to recover from its own inconsistencies.
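
Errors such as hallucinated or missing fields can be caught by deterministic checks that run after the model call, before any downstream step trusts the output. The sketch below is one possible shape for such a check; the schema, the field names, and `validate_extraction` are hypothetical and were not part of any of the three builds.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"applicant_name": str, "monthly_income": (int, float)}


def validate_extraction(raw_output: str) -> tuple:
    """Return (is_valid, problems) for a raw LLM response, using only code."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, ["output is not valid JSON"]
    if not isinstance(data, dict):
        return False, ["output is not a JSON object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(data[name], expected_type) or isinstance(data[name], bool):
            problems.append(f"wrong type for field: {name}")

    # Fields outside the schema are treated as possible hallucinations.
    extra = sorted(set(data) - set(REQUIRED_FIELDS))
    problems.extend(f"unexpected field: {name}" for name in extra)
    return not problems, problems


print(validate_extraction('{"applicant_name": "Asha Rao", "pan_number": "X"}'))
# (False, ['missing field: monthly_income', 'unexpected field: pan_number'])
```

A failed check can route the case to a targeted retry or to human review instead of letting a wrong value flow into the rules layer.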

Observation 2: Kore Improved Efficiency, but Correctness and Flexibility Had Limits

Kore.ai’s platform tooling meaningfully reduced token consumption compared to the scratch build. The structured flow management reduced redundant context passing, and results were consistent across runs. However, we hit two meaningful friction points in practice.

First, because Kore is fundamentally a prompt-led development and execution environment, the quality of every output depended heavily on prompt construction. In several verification scenarios the agent returned incorrect results — not inconsistent results, but confidently wrong ones. It is worth being precise here: LLMs are not deterministic by nature, and a combination of well-engineered prompts, code-based logic, and post-LLM decisioning can make the system deterministic enough for most purposes. The challenge is that reaching and sustaining that threshold requires continuous prompt engineering investment.

Second, we encountered limits in workflow customisation depth. The platform’s structural conventions imposed a ceiling on certain configuration choices that a more process-native architecture would not.

Observation 3: Pega Delivered Efficiency, Correctness, and Greater Workflow Freedom

At GenAI node level, Kore consumed about 27% more tokens than Pega at scale: 3,043,640 versus 2,396,960 for 100 applications. Put the other way, Pega consumed about 21% fewer tokens than Kore. A critical qualification: this advantage is specific to the GenAI-node-optimised configuration. The architectural discipline of GenAI nodes is what creates the widening gap at volume.
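
The size of the gap depends on which platform is taken as the baseline, so it helps to compute it both ways. A short sketch using the measured first-run and 100-run figures:

```python
def pct_more(larger: int, smaller: int) -> float:
    """How much more the larger figure is, relative to the smaller one."""
    return (larger - smaller) / smaller


def pct_fewer(larger: int, smaller: int) -> float:
    """How much less the smaller figure is, relative to the larger one."""
    return (larger - smaller) / larger


KORE_FIRST, PEGA_FIRST = 30_138, 28_063
KORE_100, PEGA_100 = 3_043_640, 2_396_960

print(f"first run: Kore uses {pct_more(KORE_FIRST, PEGA_FIRST):.1%} more than Pega")
print(f"100 runs:  Kore uses {pct_more(KORE_100, PEGA_100):.1%} more than Pega")
print(f"100 runs:  Pega uses {pct_fewer(KORE_100, PEGA_100):.1%} fewer than Kore")
# first run: Kore uses 7.4% more than Pega
# 100 runs:  Kore uses 27.0% more than Pega
# 100 runs:  Pega uses 21.2% fewer than Kore
```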

On model flexibility, Pega’s model catalogue is curated rather than fully open, and integration with external models is not yet available on the platform. The developer freedom we experienced was in the workflow and decisioning layer — the ability to design the process architecture around our exact requirements without being forced into platform defaults.

“A combination of good prompting, code-based logic, and output decisioning can make a prompt-led system deterministic enough — but sustaining that at scale across complex, multi-document workflows requires constant engineering investment.”

Why the Architectural Difference Matters

The reason Pega outperformed both alternatives on token consumption comes down to a principle we’d call bounded AI invocation. When an LLM is responsible for orchestrating its own workflow, it consumes tokens not just on the task at hand but on the meta-reasoning about the task. A deterministic process layer handles this orchestration without spending any tokens: the AI is told exactly when it is needed, for exactly what purpose, with exactly the context required.
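
One concrete piece of bounded invocation is passing each step only the context it needs. The sketch below is illustrative: the case record, field names, and `build_bounded_prompt` are hypothetical, and character count is used only as a crude stand-in for token count.

```python
def build_bounded_prompt(task: str, record: dict, needed_fields: list) -> str:
    """Include only the fields this step needs, not the whole case record."""
    context = {name: record[name] for name in needed_fields if name in record}
    lines = [f"{key}: {value}" for key, value in context.items()]
    return f"Task: {task}\n" + "\n".join(lines)


# Hypothetical case record; real documents would be far longer.
case_record = {
    "stated_income": 85000,
    "payslip_net_pay": 84200,
    "title_deed_text": "Full text of the title deed would go here",
    "bank_statement_text": "Full text of six months of statements would go here",
}

# Bulk context: every field is passed, as in an unconstrained build.
bulk_prompt = build_bounded_prompt("Validate income", case_record, list(case_record))

# Bounded context: only the two fields the income check actually uses.
bounded_prompt = build_bounded_prompt(
    "Validate income", case_record, ["stated_income", "payslip_net_pay"])

print(len(bulk_prompt), len(bounded_prompt))
```

The saving grows with document size, since the unused documents are exactly the long ones.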

Key Findings

Orchestration waste: In an unstructured scratch agent build, 72.1% of tokens are consumed by the orchestration loop — planning, tool-call context, and intermediate outputs. Only 27.9% goes toward actual specialist work. This is the cost of letting the LLM manage its own workflow.
Efficiency gap: Kore’s token consumption starts only ~7% above Pega’s on the first run but is ~27% above it at scale (equivalently, Pega uses ~21% fewer tokens than Kore), driven almost entirely by completion token verbosity in Kore’s JSON generation and validation summary nodes.
Scope of findings: These findings are scoped to GenAI node configurations on both platforms. Results may differ when running native agentic implementations. Architecture choices within a platform matter as much as the platform choice itself.
Determinism: LLMs are not deterministic by nature. Process-controlled invocation enforces reliable outputs structurally, while prompt-led approaches require continuous engineering effort to sustain.
Completion tokens: In Kore, completion tokens were 89.6% higher than in Pega across 100 runs, so investigating output verbosity in LLM nodes is likely the highest-ROI optimisation for a prompt-led build like this one.
Build-phase costs: Build-phase token costs are invisible in most analyses but real. Process-first architectures reduce this overhead by limiting LLM involvement during the development phase itself.

Where Each Platform Genuinely Shines

It would be a misreading of this research to conclude that Kore is simply the inferior choice. These findings are specific to a particular type of agent — a complex, multi-document, rules-heavy verification workflow in a regulated environment. The numbers favour Pega for this use case. They do not describe the full picture of what either platform is built for.

Kore.ai — Where It Leads

Kore’s genuine strengths are in speed, accessibility, and AI-native tooling. It is a low-code environment where teams without deep engineering depth can configure and deploy AI agents quickly. Time-to-market is materially faster than a process-first platform. Beyond the build, Kore brings a rich operational layer: AI governance, agent management, conversation analytics, and multi-agent orchestration are first-class features, not add-ons. For organisations whose primary need is deploying and managing AI agents at scale, Kore’s platform breadth is a genuine advantage.

Pega — Where It Leads

Pega’s strength in this exercise was its workflow-driven architecture — AI invoked at bounded, controlled points within a deterministic process. The industry is moving fast toward a world where AI that actually works is distinguished from AI that merely sounds convincing. Deterministic outcomes, explainable decisions, auditable reasoning trails, and governance baked into the execution layer are not optional features for regulated industries — they are the price of admission.

Beyond the AI layer, Pega is an enterprise operating system covering case management, decisioning, CRM, and operations automation. For organisations thinking about where enterprise AI is going — agents embedded in every process, every decision, every customer interaction, with full governance and traceability — Pega’s architecture is already built for that future.

The Cost Picture Is Bigger Than Tokens — and That’s Exactly the Point

Let’s address the obvious objection head-on: both Kore and Pega come with platform licensing costs — the normal cost of enterprise SaaS, no different from the CRM, the cloud infrastructure, or the data platform your organisation already runs. And yes, a team that vibe-codes an agent on a bare API call pays neither.

But token costs are not static. They scale with every application processed, every agent deployed, every workflow automated. Uber burned through its annual AI budget by April. Microsoft told engineers to stop using Claude. These are early signals from organisations that treated AI consumption as a rounding error until it wasn’t. The teams that hardcode cost efficiency into their architecture from day one are the ones that will still be running AI at scale in two years.

Platform investment reframes this entirely. When Kore costs money, it is buying you AI governance, agent lifecycle management, multi-agent orchestration, and deployment infrastructure that would take a substantial engineering team months to build. When Pega costs money, it is typically justified across the breadth of enterprise problems it can address — AI is leverage on an investment already being made.

“The teams that treat token costs as an afterthought are already building tomorrow’s tech debt. Optimisation isn’t a nice-to-have — it’s the difference between AI that scales and AI that gets shut down.”

An unoptimised AI architecture does not just cost more per token. It costs more in errors caught late, in inconsistent outputs that require human review, in retry logic that multiplies consumption, and in engineering cycles spent firefighting instead of building. The token bill is the visible tip. The operational cost underneath is what sinks teams.

With model costs under constant pressure and AI workloads growing exponentially, every architectural decision you make today is a financial commitment you will be living with at 10× the volume in 18 months. The organisations that figure this out early compound the ROI of their platform investments rather than being consumed by them.

What This Means for AI in Financial Services

Conclusion

The economics of AI agents in production environments are not just about model capability — they are about how the model is invoked. Our research suggests that teams building AI agents for regulated, high-stakes workflows should invest in architectural discipline before investing in model scale. For this specific use case — complex, multi-document verification with strict correctness requirements — a workflow-driven architecture delivered measurably better token efficiency and output reliability than a prompt-driven approach.

That is not a verdict on which platform is “better” in the abstract. Kore brings real strengths in speed of development, AI-native governance, and agent management that matter enormously in the right context. Pega’s advantage here is inseparable from its workflow-first architecture — most valuable when the problem is complex, regulated, and demands auditable outcomes.

The “just vibe code it” approach remains a useful starting point for experimentation. But at production scale, architectural choices made early compound quickly — in token costs, in correctness risk, and in the engineering effort required to fix what was built in a hurry.

Every enterprise commerce transformation starts with a gap: between where your commerce infrastructure is today and where the market requires it to be in 2027, 2028, and 2030. The research from Gartner, McKinsey, Forrester, KPMG, and Deloitte leaves no ambiguity about where that destination is: a cloud-native, API-first, AI-orchestrated commerce platform that is ready for agentic commerce, resilient under any demand scenario, compliant in every jurisdiction, and generating continuous predictive intelligence from every transaction.

The question is not whether to build toward this destination. The question is how to do it at speed, with measurable ROI at every stage, and without disrupting your live commerce operations in the process. This is the journey Novitates is designed to deliver.

KEY STATISTICS AT A GLANCE

▶ $15T agent-mediated B2B commerce market by 2028 — Gartner, October 2025

▶ $3T–$5T global agentic commerce revenue by 2030 — McKinsey, October 2025

▶ Early movers will have a decisive advantage in agentic commerce — McKinsey, October 2025

▶ 50% of world economies covered by AI regulation by 2027 — compliance-by-design is mandatory — Gartner, October 2025

The 2027–2030 Commerce Destination

Based on a synthesis of the most credible research available, the enterprise commerce platform of 2030 has five defining characteristics. First, agentic readiness: the ability to be discovered, evaluated, and transacted with by AI procurement agents operating on behalf of B2B buyers, serving the $15 trillion agent-mediated B2B market predicted by Gartner. (Source: Gartner, October 2025) Second, real-time orchestration: synchronised, sub-second inventory, pricing, and fulfilment across every channel and geography, enabling the $3–$5 trillion agentic commerce opportunity. (Source: McKinsey, October 2025) Third, predictive intelligence: AI demand forecasting, dynamic pricing, and proactive fulfilment that reduces logistics costs by 15% and optimises inventory by 35%. (Source: Microsoft/IBM, 2025) Fourth, composable flexibility: the ability to deploy new capabilities in days rather than months, maintaining 80% faster deployment cadence than legacy competitors. (Source: MACH Alliance, 2025) Fifth, compliance by design: automated regulatory documentation, data sovereignty controls, and sustainability reporting, meeting the requirements of a regulatory landscape that will cover 50% of the world’s economies with AI regulation by 2027. (Source: Gartner, October 2025)

The Novitates Four-Stage Transformation Roadmap

Novitates delivers enterprise commerce transformation through a structured four-stage roadmap. Stage 1 (months 1–3): Architecture Foundation — Commerce Architecture Assessment; integration mapping; cloud-native infrastructure deployment; core ERP, OMS, and CRM integration. Stage 2 (months 4–8): Orchestration Excellence — real-time inventory and pricing synchronisation; intelligent order routing; self-service automation for high-volume interaction types; peak demand testing and validation. Stage 3 (months 9–15): Intelligence Layer — predictive demand forecasting; dynamic pricing engine; AI personalisation; post-order automation and analytics. Stage 4 (months 16–24): Agentic Readiness — B2B API-first agent interface; agent commerce performance monitoring; sustainability reporting integration; full agentic commerce capability.

This roadmap delivers measurable outcomes at each stage — ensuring that the investment case for the next stage is established before commitment is made. Every stage is designed to operate in parallel with live commerce operations, with zero disruption to revenue.

The Research Consensus on First-Mover Advantage

Every major research firm cited in this playbook agrees on one point: the first-mover advantage in AI-native commerce is significant, durable, and compounding. McKinsey: “Early movers will have a decisive advantage” in agentic commerce. (Source: McKinsey, October 2025) Gartner: organisations using multi-agent AI for 80% of customer-facing processes will dominate by 2028. (Source: Gartner, October 2025) Forrester: enterprises that invest in composable agentlake architectures now will be positioned to scale AI agent capabilities as the market demands. (Source: Forrester, October 2025)

The window for early-mover advantage in enterprise AI commerce is not unlimited. The enterprises building agentic-ready infrastructure today are the ones that will own the market when the $15 trillion agent-mediated B2B transition accelerates.

Your Transformation Starts with One Conversation

Novitates offers a free 90-minute Commerce Transformation Strategy Session — a structured engagement with our commerce architecture and AI specialists that assesses your current state, identifies your highest-priority gaps, and maps the fastest credible path to your 2027 commerce platform target. No sales pitch. No commitment required. Just a clear, evidence-based view of where you are and where you need to be.

Book your session at novitatestech.com/contact-us. The research is clear. The timeline is defined. The question is whether your organisation will be a first-mover or a fast-follower — and that decision begins with a single conversation.
