SLM-Works


SLM infrastructure

Pain

A trained SLM is only useful when inference is reliable: right-sized GPUs, observable latency and errors, safe promotion of new versions, and clarity on who operates the stack when something breaks at 2 a.m.

Outcome

We help you stand up serving on your own metal or cloud accounts (Path A), or source dedicated GPU capacity through SLM-Works where that fits your procurement model (Path B) - with runbooks, monitoring hooks, and explicit responsibility splits in the statement of work.

Differentiator

We document who runs hosts, networks, backups, and escalation paths before go-live. Marketing copy here is descriptive; binding SLAs and resale terms exist only in signed agreements.

Language about GPU resale, partner data centres, capacity commitments, and SLA splits on this page is draft until your counsel and ours approve it for public use. Do not treat this page as a contractual offer. Request the latest commercial pack during sales discussions.

Path A. On-premises or your cloud accounts

You retain ownership of subscriptions, regions, and access policies. We install and configure serving runtimes, wire observability into your toolchain, and hand over artifacts your platform team can repeat.

Path B. Dedicated GPU capacity via SLM-Works

When you prefer to buy capacity through SLM-Works rather than contracting hyperscaler resources directly, we can arrange dedicated GPU footprints with partner providers. Commercial terms, regions, and uptime targets are specified in contract - not on this website.

Architecture at a glance

Illustrative comparison only - your signed architecture diagram supersedes this figure.

[Figure: SLM infrastructure - on-prem footprint vs dedicated capacity]

Path A shows SLM serving deployed inside infrastructure the customer controls end-to-end. Path B shows dedicated accelerators sourced through SLM-Works with partner-operated facility responsibilities called out in commercial documents - not on this page. Arrows and boundaries in real engagements follow the architecture diagram in your statement of work.

Support and operations matrix

Typical split of responsibilities; the statement of work and runbooks are authoritative.

Support and operations responsibilities for SLM infrastructure engagements. Columns are SLM-Works, customer, and underlying provider where Path B applies.
Area | SLM-Works | Customer | Provider (Path B)
Model artifacts & version promotion | Defines promotion playbooks; assists cutover | Approves releases; owns business risk | N/A unless a hosted registry is bundled
Inference runtime & model serving config | Implements baseline; documents tuning knobs | Owns change control in prod | Host OS / hypervisor only (Path B)
GPU / accelerator hosts | Specifies sizing; may procure in Path B | Owns or approves SKUs (Path A) | Physical hardware & facility (Path B)
Monitoring & alerting | Integrates dashboards & SLO templates | Owns on-call rosters & escalation | Facility/host metrics per contract
Backups & disaster recovery (non-model data) | Documents recommended patterns | Owns policies & execution | May offer snapshots per SKU (Path B)
Incident response (P1) | Participates per support tier in SOW | Incident commander for product impact | Per facility runbooks (Path B)
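To make the "promotion playbooks" split above concrete, here is a minimal sketch of weighted canary routing between two model versions. The version names, weights, and ramp steps are hypothetical illustrations; real playbooks and approval gates live in the statement of work.

```python
import random

def choose_version(canary_weight: float,
                   stable: str = "slm-v1",
                   canary: str = "slm-v2") -> str:
    """Route a request to the canary version with probability canary_weight.

    In a playbook, canary_weight is ramped in agreed steps (e.g. 0.01 -> 0.05
    -> 0.25 -> 1.0), with the customer approving each step per the matrix above.
    """
    return canary if random.random() < canary_weight else stable

# Rough sanity check: with weight 0.1, roughly 10% of traffic hits the canary.
random.seed(0)
hits = sum(choose_version(0.1) == "slm-v2" for _ in range(10_000))
```

SLM-Works would implement and document the routing; release approval and rollback authority stay with the customer, matching the table above.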

Plan capacity, ownership, and observability with us

Start with discovery; we bring reference patterns - not a one-size-fits-all appliance.

Request a PoC

Frequently asked questions

Practical answers for technical buyers; validate resale, SLA, and capacity wording with legal and sales before public launch and paid campaigns.

When should we choose Path A versus Path B?
Path A fits when you already have cloud landing zones or data centres, mature IAM, and want SLM-Works focused on model serving patterns - not procuring metal. Path B can reduce coordination with multiple vendors when you want dedicated GPU capacity sourced and operated under a single commercial thread with SLM-Works; exact trade-offs belong in discovery.
Who owns the GPUs in Path B?
Ownership and lien terms follow the contract chain (customer ↔ SLM-Works ↔ provider). This site does not specify title or lease structure - your order form and legal schedules do.
What SLAs apply?
Published marketing pages are not SLAs. Availability, response times, and credits - if any - are defined only in signed agreements and may reference underlying provider schedules. Ask for the current SLA exhibit during procurement.
Can you run inside our existing Kubernetes platform?
Yes, when cluster policies allow the required GPU drivers, device plugins, and observability agents. We align with your platform team on namespaces, network policies, and secret management rather than imposing a greenfield cluster.
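For concreteness, a minimal sketch of what such an in-cluster deployment typically requests, expressed as a Python dict mirroring a pod spec. The namespace, image, and port are placeholders, and the `nvidia.com/gpu` resource assumes your cluster runs the NVIDIA device plugin, which your policies would need to permit.

```python
# Hypothetical pod spec fragment for an SLM serving container inside an
# existing cluster; only the GPU request and namespace placement are the point.
pod_spec = {
    "metadata": {"namespace": "slm-serving"},  # agreed with the platform team
    "spec": {
        "containers": [{
            "name": "inference",
            "image": "registry.example.com/slm-server:stable",  # placeholder image
            "resources": {
                # Extended resource exposed by the NVIDIA device plugin.
                "limits": {"nvidia.com/gpu": 1},
            },
            "ports": [{"containerPort": 8000}],  # placeholder serving port
        }],
    },
}

gpu_limit = pod_spec["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"]
```

In practice this fragment would be reviewed against your admission policies (network policies, secret mounts, node selectors) before anything is applied.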
How do we monitor latency and errors in production?
We typically wire RED-style metrics (rate, errors, duration) plus GPU utilization signals into your metrics stack, with optional synthetic probes from approved vantage points. Alert thresholds are co-owned: engineering proposes, your operations team approves.
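A minimal, stack-agnostic sketch of the RED computation described above, over a window of (status_code, latency_seconds) samples. The sample values and window size are illustrative assumptions, not defaults we ship; in a real engagement these signals come from your metrics stack.

```python
from statistics import quantiles

def red_summary(samples, window_seconds):
    """Compute RED metrics (rate, error ratio, p95 duration) for one window.

    samples: list of (status_code, latency_seconds) tuples.
    """
    n = len(samples)
    errors = sum(1 for status, _ in samples if status >= 500)
    latencies = sorted(lat for _, lat in samples)
    # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile.
    if n >= 2:
        p95 = quantiles(latencies, n=20)[18]
    else:
        p95 = latencies[0] if latencies else 0.0
    return {
        "rate_rps": n / window_seconds,
        "error_ratio": errors / n if n else 0.0,
        "p95_seconds": p95,
    }

# Illustrative window: 100 requests in 60 s, 25% server errors.
samples = [(200, 0.12), (200, 0.15), (500, 0.90), (200, 0.11)] * 25
summary = red_summary(samples, window_seconds=60.0)
```

Alert thresholds on top of these numbers would follow the co-ownership model above: engineering proposes, operations approves.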
How does this relate to custom SLM development?
Custom SLM engagements produce the model artifacts; infrastructure engagements make those artifacts reliable in your environment. Many clients combine both; some bring a model from another vendor and only need deployment hardening.
Which regions or countries are supported?
Region lists change with provider capacity and export rules. We confirm residency and data-flow diagrams during discovery - nothing on this page promises availability in a specific geography.

Ready to align infra, procurement, and on-call ownership?

Use the contact page for a PoC or a short discovery call.