The rise of AI-native ops

by Freddie Heygate

I recently attended AWS re:Invent 2025. Here are the headlines you may have seen:

  • Observability now includes models, prompts, retrieval, tools and token paths.
  • GPU, Trainium, Inferentia and data movement are reshaping cost curves.
  • “Reliability” now includes inference latency, RAG accuracy and agent success rates.
  • From Amazon’s own perspective: Prime Day traffic hit eye-watering levels. 200 million Prime members, 9 billion same- or next-day packages, ElastiCache at ~1.5 quadrillion requests/day, ads infrastructure at over a trillion requests/minute, CloudFront serving more than 3 trillion HTTP requests.
  • Custom silicon matters: over 40 percent of Amazon.com traffic now runs on AWS Graviton, while AWS Trainium and AWS Inferentia power Amazon Bedrock for training and inference with drastically lower cost per token.
  • Robotics & autonomy have gone mainstream: more than one million robots operate across Amazon’s networks; Zoox uses petabyte-scale S3, thousands of GPUs, Slurm scheduling, Amazon EKS and Amazon SageMaker HyperPod for simulation and model development.
  • Internal AI agents are exploding: more than 21,000 internal agents deployed across Amazon’s eCommerce Foundation, built on AgentCore and the Strands Agents SDK.
  • Enterprise customers are already seeing results, from 2% conversion lifts across 100,000 SKUs to 30% autonomous resolution of complex workflows to measurable reductions in operational overhead.

Elevator pitch concluded.

Now, let’s get into why this matters for you.

Why AI-native operations and agents matter – in detail

The sessions made one truth unavoidable: AI-native systems change how the business operates. For the better – if done well. And painfully – if not.

A strong AI operating model unlocks:

1. Faster execution and experimentation

Amazon teams are already using spec-first workflows and integrated AI tooling to achieve 4.5× higher deployment frequency. This isn’t “code generation.”

It’s AI woven into planning, testing, deployment and rollback.

When the platform supports it, teams ship more with less.

2. Measurable revenue and cost outcomes

These are outcomes, not hype:

Rufus – Amazon’s generative shopping assistant – increased purchase completion by around 60% with 4.5× lower cost thanks to Trainium/Inferentia inference, continuous batching and prompt caching. 

Marketplace optimisation agents delivered 2% absolute conversion increases over massive SKU catalogs.

Internal Amazon teams reported billions in projected cost savings from workflow automation and tool-driven execution at scale.

The pattern is clear: AI is moving real commercial metrics, not just internal enthusiasm.

3. Reliability as feature zero

Across Prime Video’s streaming stack, Zoox’s autonomy platform and Amazon’s retail backbone, reliability wasn’t a footnote – it was the strategy.

AI workloads now demand:

  • SLOs for inference latency, token streaming and RAG freshness
  • Multi-region and edge-cloud architectures for real-time loads
  • Agent success, escalation and tool-call reliability tracking
  • Observability across prompt → model → reasoning → tool

No AI system gets a pass on uptime.

4. A better experience for teams

AI agents step into the repetitive, procedural work: checking eligibility, summarising state, generating options, running validations.

Humans shift into:

  • Higher-order design
  • Debugging real issues
  • Experimenting and improving
  • Delivering customer impact 

Teams ship more. On-call pain goes down. Morale goes up. 

…but a weak operating model reverses everything

  • Pilots stall
  • Costs explode
  • Governance breaks
  • Incidents cross multiple opaque AI layers
  • Confidence erodes inside and outside the company 

Your AI operating model is now your moat. 

What makes an AI operating model actually work?

At the highest level, it must be:

Achievable

Your teams can realistically support it at 3 a.m.

Market-competitive 

You don’t need Amazon’s numbers – but you need credible performance.

Customer-centric

Agents should solve real friction: conversion loss, slow resolution, operational bottlenecks.

Transparent

Stakeholders should know what agents do, why they act and how quality is measured.

Flexible 

You can add new agents and tools without rebuilding the platform.

Governed

Guardrails, IAM integration, auditability and human oversight where needed.

The components of an AI-native stack

1. Foundation platform

The runtime for models, agents and orchestration. On AWS, the key building blocks:

  • Amazon Bedrock for foundation models
  • AWS Trainium for training
  • AWS Inferentia for inference
  • Amazon EKS and Amazon ECS for container workloads
  • Application Load Balancer (ALB) tuned for AI traffic patterns
  • Continuous batching, token streaming and cost-aware autoscaling

Teams should standardise on a tight set of building blocks.
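Token streaming, one of the building blocks above, is worth seeing in miniature: the point is that the caller gets output as it is produced, so time to first token stays low even when full generation is slow. The generator below is a stand-in for a model server, not a real Bedrock call:

```python
import time
from typing import Iterator

def generate_stream(prompt: str) -> Iterator[str]:
    # Stand-in for a model endpoint; real services stream over HTTP/SSE or gRPC
    for token in ["AI-native ", "ops ", "need ", "streaming."]:
        time.sleep(0.01)  # simulated per-token latency
        yield token

first_token_at = None
start = time.perf_counter()
out = []
for tok in generate_stream("why stream?"):
    if first_token_at is None:
        # time to first token: the metric streaming is designed to protect
        first_token_at = time.perf_counter() - start
    out.append(tok)
print("".join(out))
```

Continuous batching and cost-aware autoscaling build on the same idea: per-token work items that the scheduler can regroup between steps.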

2. Data & context layer

Agents require current, contextual and governed data. That means:

  • High-quality data contracts
  • Search and retrieval pipelines
  • Feature-store freshness SLOs
  • Strict access control

Without this, your agents operate on hope, not truth.
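A data contract can be as small as a required-field set plus a freshness budget, enforced before any record reaches an agent. The field names and the 15-minute staleness SLO below are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an agent's context feed (names are illustrative)
CONTRACT = {
    "required_fields": {"customer_id", "order_status", "updated_at"},
    "max_staleness": timedelta(minutes=15),  # freshness SLO
}

def violates_contract(record: dict) -> list[str]:
    """Return a list of contract violations (empty means the record is usable)."""
    problems = []
    missing = CONTRACT["required_fields"] - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    updated_at = record.get("updated_at")
    if updated_at is not None:
        age = datetime.now(timezone.utc) - updated_at
        if age > CONTRACT["max_staleness"]:
            problems.append(f"stale by {age - CONTRACT['max_staleness']}")
    return problems

record = {
    "customer_id": "c-123",
    "order_status": "shipped",
    "updated_at": datetime.now(timezone.utc) - timedelta(minutes=2),
}
print(violates_contract(record))
```

Rejecting or quarantining violating records at this boundary is what turns "hope" into "truth" for downstream agents.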

3. Agent & tool framework

Defines how agents reason, call tools, escalate and log actions. On AWS, that means:

  • Strands Agents SDK – open-source agent framework
  • AgentCore – the managed agent runtime in Amazon Bedrock

Provides:

  • Tool registries
  • Reasoning loops
  • Policy & guardrail enforcement
  • Human-in-the-loop patterns
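The first three of those concerns fit in a few lines: a registry of callable tools, a policy allow-list, and an audit log on every call. This is a generic sketch of the pattern only; it is not the Strands Agents SDK or AgentCore API, and the tool names are invented:

```python
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}
ALLOWED = {"lookup_order"}  # policy: tools this agent may call without approval

def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"  # stand-in for a real backend call

@tool
def refund_order(order_id: str) -> str:
    return f"order {order_id}: refunded"  # side-effecting, so gated below

def call_tool(name: str, **kwargs) -> str:
    """Enforce the allow-list and log every call (the audit trail)."""
    if name not in ALLOWED:
        return f"ESCALATE: {name} requires human approval"
    print(f"audit: {name}({kwargs})")
    return TOOLS[name](**kwargs)

print(call_tool("lookup_order", order_id="A1"))
print(call_tool("refund_order", order_id="A1"))
```

The escalation branch is the human-in-the-loop hook: side-effecting tools route to approval instead of executing.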

4. Reliability & observability

AI ops require deep visibility across:

  • Latency, throughput and token use
  • RAG retrieval success
  • Model drift and content safety
  • Tool-call chains
  • Agent decision traces
  • Escalation logic

If you can’t observe it, you can’t run it.
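One way to make agent decisions observable is a per-request trace: one record per stage of the prompt → retrieval → model → tool chain, emitted as structured logs. The stage and field names here are illustrative assumptions; in production these would feed a tracing backend:

```python
import json
import time

def traced_step(trace: list, stage: str, **fields) -> None:
    """Append one timestamped stage record to the request trace."""
    trace.append({"stage": stage, "ts": time.time(), **fields})

trace: list[dict] = []
traced_step(trace, "prompt", tokens_in=412)
traced_step(trace, "retrieval", docs=3, hit=True)
traced_step(trace, "model", tokens_out=128, latency_ms=840)
traced_step(trace, "tool_call", name="lookup_order", ok=True)

# One JSON line per request makes token use, RAG hit rates and
# tool-call reliability queryable after the fact
print(json.dumps({"stages": [s["stage"] for s in trace]}))
```

Aggregating these traces is what turns "agent success rate" from a slide claim into a dashboard.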

5. Security, compliance and risk

AI must fit into your identity, access and governance stack: 

  • IAM integration
  • Model and agent policy boundaries
  • Audit logs for all tool calls
  • Safety constraints and approvals
  • Data minimisation

6. Operating model & ownership

This is where AI becomes “how we work,” not “what we’re testing.” It requires:

  • Named owners for platform and agents
  • Change-management processes
  • Regular performance and risk reviews
  • Lifecycle management for prompts, data and tools

Six proven practices for getting value from AI & agents

1. Monitor what your customers and users really care about

Uptime, latency, agent completion and business metrics – not model stats.

2. Treat reliability as a designed feature, not an outcome

Build it into the architecture, not the incident process.

3. Insert agents into existing flows

Adoption skyrockets when agents sit inside the tools teams already use.

4. Put guardrails into the runtime

Policies, access, approvals and auditability must live in the platform.

5. Review and tune regularly

Agents are not “set and forget.” They drift. Your data drifts. Your business moves. Reviews must be scheduled and mandatory.

6. Turn wins into stories you can sell 

Every uplift in conversion, cost efficiency or resolution speed must fuel your value narrative.

How Just After Midnight helps

We help organisations move from AI slideware to AI that runs reliably, safely and cost-efficiently in production.

That includes:

  • SRE & DevOps for AI workloads – inference SLOs, guardrails, observability
  • Agentic AI enablement – design, implementation and ops for Strands & AgentCore
  • Cloud modernisation – Trainium/Inferentia optimisation, GPU strategy, FinOps for AI
  • Data foundations – retrieval pipelines, feature stores, governance
  • Edge & real-time operations – robotics, streaming, containerised inference at the edge

If you’re looking at the re:Invent announcements thinking, “This is incredible… but who keeps this running at 3 a.m.?” that’s the gap we fill.

The real question isn’t whether you’ll adopt AI – it’s how you’ll operate it in 2026.

If concerns about reliability, cost control, observability or agent governance are already surfacing, that's a signal to act early – get in touch with our team.

We help teams close that gap before it shows up as incidents and overruns.