// 00 Technology arsenal

The heaviest engines. Picked for the project.

Not one framework — an arsenal. 145+ technologies across 12 layers: from LLM serving and agent graphs, through vector databases and data streams, to Rust, Kubernetes and CUDA. We pick them for the problem, not for the hype.

We cover the whole stack: from inference engines (vLLM, TensorRT-LLM) and agent graphs (LangGraph, MCP), through vector search (Qdrant with BGE-M3 embeddings) and real-time analytics and streaming (ClickHouse, Kafka), to systems languages (Rust, Go), orchestration and infrastructure as code (Kubernetes, Terraform), and GPUs (NVIDIA CUDA, H100).

Flagship engines are those we use most frequently in production — the rest are chosen for the specific task, not trends. This breadth allows us to design solutions tailored to the client’s problem, rather than forcing the problem into a single, preferred tool.

We pick models by measurement, not datasheet. The OpenClaw router serves dozens of models today — DeepSeek-V4, Mistral Large 3, Qwen3.5/Coder, GLM-5, Gemma 3/4, Devstral-2 and more — each with measured TTFT, throughput and context window. Frontier models (Claude Opus 4, GPT-5, Gemini 3) are integrated when a project requires them. We mask PII before anything reaches the cloud, and compute BGE-M3 embeddings locally — sensitive data never leaves your infrastructure.
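As a rough illustration of masking PII before a prompt leaves local infrastructure, here is a minimal Python sketch. It is not the OpenClaw implementation: the function name `mask_pii`, the placeholder tokens and both regular expressions are assumptions made for this example, and they cover only e-mail addresses and phone-like numbers.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real PII detection
# needs much broader coverage: names, addresses, national IDs and so on.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


prompt = "Contact Jan at jan.kowalski@example.com or +48 601 234 567."
masked = mask_pii(prompt)
print(masked)  # only this masked string would be sent to a cloud model
```

Note that the personal name in the sample prompt is left untouched: pattern matching alone does not catch names, which is why a production masker would typically combine patterns with entity recognition.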

vLLM · TensorRT-LLM: LLM serving with maximum throughput and low latency

LangGraph · MCP: agent graphs with state control and tooling

Qdrant · BGE-M3: production-grade semantic search

ClickHouse · Kafka: real-time analytics and streaming

Rust · Go: where performance and reliability matter

Kubernetes · Terraform: infrastructure as code, built to scale

NVIDIA CUDA · B200/H200: training and inference on our own hardware

Three.js · WebGPU: in-browser visualizations and presentation

// //fleet · what we serve right now

…

// //benchmark · model tiers

OpenClaw routing tiers — GPU cost proxy and task coverage (source: routing matrix)
Model tier	GPU cost proxy	Task types	Deployment	Data handling
Flagship	3	9 (best)	cloud	masked
Mid	1.5	6	cloud	masked
Small	1	3	cloud	masked
BGE-M3 (local)	0.15 (best)	1	on-prem	stays local

// //story · why this routing

Why this model routing

Cost climbs with the model tier

The OpenClaw GPU cost proxy per routing tier. Local BGE-M3 embeddings are ~20× cheaper than flagship models — so retrieval stays local and the cloud fires only when the task needs it.

Flagship: 3.00 proxy
Mid: 1.50 proxy
Small: 1.00 proxy
BGE-M3: 0.15 proxy
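The "~20×" figure follows directly from the proxy values listed above. A small Python check, where the dictionary and function names are chosen for this example only:

```python
# GPU cost proxy per tier, copied from the list above.
COST_PROXY = {"Flagship": 3.00, "Mid": 1.50, "Small": 1.00, "BGE-M3": 0.15}


def cost_ratio(tier: str, baseline: str = "BGE-M3") -> float:
    """How many times more expensive `tier` is than `baseline`, by cost proxy."""
    return COST_PROXY[tier] / COST_PROXY[baseline]


# Ratios relative to local BGE-M3 embeddings, rounded to one decimal place.
ratios = {tier: round(cost_ratio(tier), 1) for tier in COST_PROXY}
print(ratios)
```

By the same proxy, the Mid tier comes out at about 10× and the Small tier at roughly 6.7× the cost of local embeddings.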

Task-type coverage

How many task types (chat, reasoning, code, translation, summarisation…) each tier serves. The router picks the cheapest tier that can carry the task.

Flagship: 9 types
Mid: 6 types
Small: 3 types
BGE-M3: 1 type
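The rule "pick the cheapest tier that can carry the task" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the OpenClaw router: `Tier`, `TIERS` and `pick_tier` are hypothetical names, and the task-type sets are placeholders that do not reproduce the real routing matrix or its 9/6/3/1 counts. Only the cost proxies come from the figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_proxy: float      # GPU cost proxy, taken from the tier figures above
    task_types: frozenset  # placeholder sets; the real routing matrix is not shown


# Hypothetical task assignments, for illustration only.
TIERS = [
    Tier("Flagship", 3.00, frozenset({"chat", "summarisation", "translation", "code", "reasoning"})),
    Tier("Mid", 1.50, frozenset({"chat", "summarisation", "translation", "code"})),
    Tier("Small", 1.00, frozenset({"chat", "summarisation"})),
    Tier("BGE-M3 (local)", 0.15, frozenset({"embedding"})),
]


def pick_tier(task_type: str, tiers=TIERS) -> Tier:
    """Return the cheapest tier whose task set includes `task_type`."""
    candidates = [t for t in tiers if task_type in t.task_types]
    if not candidates:
        raise ValueError(f"no tier serves task type {task_type!r}")
    return min(candidates, key=lambda t: t.cost_proxy)


for task in ("chat", "code", "reasoning", "embedding"):
    print(task, "->", pick_tier(task).name)
```

With these placeholder sets, a chat request lands on the Small tier, code on Mid, and only reasoning escalates to Flagship, which is the behaviour the paragraph above describes.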

Flagship (▣) engines are the ones we run in production most often. The full selection depends on the problem.
