SLM-Works

Service

Hybrid AI routing

Pain

Using one model for every request wastes money on easy work and overloads latency-sensitive paths - while using only the smallest model starves tasks that need broader reasoning.

Outcome

We help you design a routing layer that classifies incoming work, sends repetitive or narrow tasks to SLMs, escalates complex or high-stakes steps to larger private LLMs, and records what happened for tuning and compliance.

Differentiator

Policies stay explainable: thresholds, fallbacks, and human review hooks are documented for security and architecture reviews - not hidden inside a black-box router.

Layered SLM and LLM design

SLMs excel at high-volume, well-scoped patterns - classification, extraction, short answers. Larger LLMs add value when tasks involve multi-hop reasoning, rare edge cases, or wider world knowledge - still inside your boundary when deployed privately. Hybrid routing makes that split explicit instead of accidental.

Routing model

Flow diagram


Requests enter classification; policies send most traffic to an SLM for cost and speed, with escalation to a larger private LLM when rules or confidence scores require it. Arrows represent logical flow - your deployment may use a gateway, mesh, or sidecar pattern.
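The classify-then-route step above can be sketched as a small, explainable policy. This is a minimal illustration only: the intent names, model labels, and the `CONFIDENCE_FLOOR` value are assumptions for the sketch, not part of any shipped router, and real thresholds should come from your own eval data.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reason: str  # recorded for tuning and compliance review

# Hypothetical policy values; tune against your own eval set.
CONFIDENCE_FLOOR = 0.85
HIGH_STAKES_INTENTS = {"legal_advice", "account_closure"}

def route(intent: str, confidence: float) -> Route:
    """Send narrow, high-confidence work to the SLM; escalate the rest."""
    if intent in HIGH_STAKES_INTENTS:
        return Route("private-llm", "high-stakes intent rule")
    if confidence >= CONFIDENCE_FLOOR:
        return Route("slm", "classifier confident")
    return Route("private-llm", "low classifier confidence")
```

Because each `Route` carries a human-readable reason, the decision log stays reviewable in security and architecture audits rather than hidden in a black box.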

Typical sequencing

Hybrid routing often lands after you have at least one SLM or private LLM serving reliably and observability in place. It does not require waiting for every upstream project to finish - we can scope a thin routing pilot on a single use case.

Multi-step workflows

When routing spans tools, schedules, and long-running agents - not just model choice - see Agent orchestration for the AgentWorks narrative. This page stays focused on SLM versus LLM routing policy.

Request a PoC

Frequently asked questions

Practical answers for technical buyers.

How is this different from agent orchestration?
Hybrid routing focuses on choosing which model serves a request or step. Agent orchestration (AgentWorks) adds multi-step workflows, tools, and schedules on top - see the agent orchestration service page for that narrative.
Does routing add latency?
A lightweight classifier typically adds single-digit milliseconds; we size the path to your SLOs and can cache classifications for sticky sessions where policy allows.
What happens when the SLM is uncertain?
You define the fallback: escalate to a larger model, return a safe default, or route to human review. Each path is logged so you can tune thresholds without guessing in production.
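The three fallback paths can be expressed as one small, logged decision. The tiered thresholds and names below are hypothetical placeholders to show the shape; your policy defines the real boundaries.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("router")

def handle_uncertain(request_id: str, confidence: float) -> str:
    """Pick a fallback for a low-confidence SLM result; log it for tuning.

    Tiers are illustrative assumptions, not recommended values.
    """
    if confidence >= 0.6:
        action = "escalate_to_llm"   # retry on the larger private model
    elif confidence >= 0.3:
        action = "safe_default"      # canned response, no further model call
    else:
        action = "human_review"      # queue for an operator
    log.info("request=%s confidence=%.2f action=%s",
             request_id, confidence, action)
    return action
```

Logging every fallback decision is what lets you tune thresholds from evidence instead of guessing in production.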
Can we route to a public API as well as private models?
Technically yes, but many regulated teams prefer to keep traffic inside approved boundaries. We document data flows explicitly; mixing public APIs requires governance sign-off.
How do we know the router is correct?
Offline eval sets plus shadow mode: run candidate routes alongside production without serving their outputs until metrics prove the switch.
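Shadow mode is straightforward to prototype: run the candidate policy on the same traffic as production and log disagreements without ever serving the candidate's choice. Everything in this sketch (the function names and the toy length-based policies) is illustrative, not a production pattern.

```python
def shadow_compare(requests, production_route, candidate_route):
    """Run a candidate routing policy alongside production.

    The production route is the one actually served; the candidate
    route is only recorded. Returns the list of disagreements, a
    first signal before deeper cost/latency/quality comparison.
    """
    disagreements = []
    for req in requests:
        prod = production_route(req)   # served to the caller
        cand = candidate_route(req)    # logged only, never served
        if cand != prod:
            disagreements.append((req, prod, cand))
    return disagreements

# Toy example: the candidate sends longer requests to the SLM too.
prod = lambda r: "slm" if len(r) < 20 else "llm"
cand = lambda r: "slm" if len(r) < 40 else "llm"
diffs = shadow_compare(["hi", "a" * 30, "b" * 50], prod, cand)
```

Only after the disagreement set and its downstream metrics look acceptable does the candidate policy take over live traffic.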
Who deploys the router?
Usually your platform team with our reference patterns - sidecar, gateway, or service mesh integration depending on your stack. The statement of work names owners.
Where should we start a PoC?
Pick one high-volume API or workflow with clear success metrics, baseline cost and latency, then add routing rules for a subset of traffic before full rollout.

Design routes your finance and security teams can defend

Request a PoC