cashcrown // knowledge

AI glossary

AI concepts without the jargon: RAG, embeddings, agents, GDPR and infrastructure — with definitions, relations and search. Consistent with our architecture and entities.

61 terms

Large language model (LLM) // Core

A model that predicts the next text tokens — the basis of modern AI systems.

An LLM learns the statistics of language from huge corpora and generates text token by token. On its own it does not know your data — domain knowledge is added via RAG or fine-tuning.

Token // Core

The smallest unit of text a model processes (a sub-word piece).

Models count cost and limits in tokens, not characters. As a rule of thumb, one token is about 4 characters of English text; billing and the context window are both measured in tokens.
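The arithmetic behind that rule of thumb can be sketched as follows. This is only an estimate: real tokenizers vary by model and language, and the price used in the example is a hypothetical figure, not a quoted rate.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the entry above; real tokenizers differ


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(text: str, price_per_million_tokens: float) -> float:
    """Estimated cost of processing `text` at a hypothetical per-million-token price."""
    return estimate_tokens(text) * price_per_million_tokens / 1_000_000


if __name__ == "__main__":
    prompt = "Summarise the attached contract in three bullet points."
    print(estimate_tokens(prompt), "tokens (estimate)")
```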

Context window // Core

The maximum number of tokens a model can see at once (input + output).

When a conversation or documents exceed the window you must truncate or retrieve only the most relevant chunks — one reason RAG is used instead of stuffing the whole base into the prompt.
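A minimal sketch of "retrieve only the most relevant chunks": given chunks that already carry a relevance score, keep the best ones that still fit a token budget. The function name, the scores and the 4-characters-per-token counter are illustrative assumptions.

```python
def fit_to_budget(chunks, token_budget, count_tokens=lambda s: (len(s) + 3) // 4):
    """Pick the highest-scoring chunks whose combined size fits the budget.

    chunks: list of (relevance_score, text) pairs.
    count_tokens: rough estimator (about 4 characters per token) by default.
    Returns (selected_texts, tokens_used).
    """
    selected, used = [], 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= token_budget:  # skip anything that would overflow
            selected.append(text)
            used += cost
    return selected, used
```

A greedy pass like this is simple but not optimal; it can skip a large, relevant chunk in favour of smaller ones that fit.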

Prompt // Core

The instruction and context given to a model that steer its answer.

A good prompt sets a role, rules, context (e.g. RAG sources) and an output format. Injecting a malicious instruction into the prompt is prompt injection — guardrails defend against it.

Inference // Core

Running a trained model to generate an answer.

Inference is the running cost of an AI system — measured by latency and throughput. It can run in the cloud or locally (self-hosting), which decides data residency.

Fine-tuning // Core

Continued training of a model on your own examples to change its style or behaviour.

Fine-tuning changes the model's weights and is costly; for fresh factual knowledge RAG is usually better (cheaper, updatable without retraining). The two are sometimes combined.

RAG (retrieval-augmented generation) // RAG & search

Retrieve facts from a base first, then have the model answer grounded only in them.

RAG curbs hallucination: the model is handed concrete sources and cites them. It is the basis of trustworthy support — answers are grounded, and on a weak match the system escalates to a human instead of inventing.
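The "answer from sources or escalate" behaviour can be sketched like this. The `Hit` type, the `MIN_SCORE` threshold and the `generate` callable (standing in for a model call) are hypothetical names chosen for illustration, not a description of Cashcrown's code.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # where the passage came from, used as the citation
    text: str     # the retrieved passage
    score: float  # similarity to the question, 0..1


MIN_SCORE = 0.75  # hypothetical threshold; tune it on real data


def answer_or_escalate(question, hits, generate):
    """Answer only from sufficiently strong hits; otherwise hand off to a human."""
    strong = [h for h in hits if h.score >= MIN_SCORE]
    if not strong:
        # Weak match: do not call the model at all, escalate instead.
        return {"action": "escalate", "answer": None, "citations": []}
    context = "\n".join(f"[{i + 1}] {h.text}" for i, h in enumerate(strong))
    prompt = (
        "Answer only from the numbered sources below and cite them.\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {
        "action": "answer",
        "answer": generate(prompt),
        "citations": [h.source for h in strong],
    }
```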

Embedding (vector) // RAG & search // BGE-M3

Text turned into a list of numbers where closeness = similar meaning.

Embeddings let you search by meaning, not keywords. Cashcrown computes them locally with the BGE-M3 model (1024 dimensions), so the content being embedded never leaves the infrastructure.

Vector database // RAG & search // Qdrant

A store of embeddings that finds the nearest vectors in milliseconds.

The heart of semantic search in RAG. Cashcrown runs Qdrant locally as a native service — vectors and metadata stay on its own server.

Semantic search // RAG & search

Searching by a sentence's meaning rather than literal word match.

The query and documents become embeddings and are compared as vectors, so “how to protect data” finds a GDPR text even with no shared words.
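Comparing vectors usually means cosine similarity. The sketch below ranks documents against a query using toy 3-dimensional vectors; real embeddings (for example the 1024-dimensional ones mentioned above) would come from an embedding model, which is not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, docs, top_k=3):
    """docs maps a document id to its vector; returns (id, score) pairs, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy vectors, made up for the example.
    docs = {"gdpr-guide": [0.9, 0.1, 0.0], "pricing": [0.0, 0.2, 0.9]}
    print(search([1.0, 0.0, 0.0], docs, top_k=1))
```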

Hybrid search // RAG & search

Combining semantic search with classic keyword search.

Semantics catches meaning while full-text search nails exact names and codes. Combining both (e.g. vectors + Postgres FTS) beats either alone.
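One common way to combine the two result lists is reciprocal rank fusion (RRF). The entry above does not say which fusion method is used, so treat RRF here as an assumed, illustrative choice; `k=60` is the constant commonly used with it.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]    # from semantic search
    keyword_hits = ["c", "a", "d"]   # from full-text search
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

RRF needs only ranks, not scores, which is convenient because vector similarities and full-text scores are not on the same scale.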

Reranking // RAG & search

A second pass that reorders search results by relevance.

After a fast first retrieval, a more precise model scores each candidate against the query and lifts the best to the top — improving the context handed to the LLM.

LLM router // Agents // OpenClaw

A single ingress to models: picks the model per task, masks PII, enforces limits.

All AI traffic in Cashcrown goes through the OpenClaw router — no code calls a provider directly. That makes PII masking, model fallback and telemetry enforceable in one place.

More: Model atlas

AI agent // Agents

An AI system that plans steps and uses tools to accomplish a task.

An agent doesn't just answer; it acts: searches, calls APIs, books a slot. Safety requires guardrails and confirmations on irreversible actions so the agent cannot run out of control.

Tool use // Agents

A model's ability to call functions/APIs instead of only writing text.

The model is given a tool catalog with argument schemas and decides which to call. Irreversible actions (e.g. a booking) require a server-issued confirmation token, not the model's say-so.
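One way to make a confirmation "server-issued" is to sign the exact tool call with a secret the model never sees. The sketch below assumes an HMAC over the tool name and arguments; the tool names and the `execute` dispatcher are hypothetical, and the entry above does not specify the actual mechanism.

```python
import hashlib
import hmac
import json
import secrets

SERVER_SECRET = secrets.token_bytes(32)  # lives on the server, never in a prompt
IRREVERSIBLE = {"book_slot"}             # hypothetical tool names


def _canonical(tool, args):
    """Stable byte representation of a tool call, so the signature binds to it."""
    return json.dumps({"tool": tool, "args": args}, sort_keys=True).encode()


def issue_confirmation(tool, args):
    """Called by the server after the user confirms this exact action."""
    return hmac.new(SERVER_SECRET, _canonical(tool, args), hashlib.sha256).hexdigest()


def execute(tool, args, confirmation=None):
    """Run a tool call; irreversible tools need a valid token for these arguments."""
    if tool in IRREVERSIBLE:
        expected = issue_confirmation(tool, args)
        if confirmation is None or not hmac.compare_digest(expected, confirmation):
            return "rejected: confirmation required"
    return f"executed {tool}"
```

Because the token is bound to the arguments, a confirmation for one booking cannot be replayed for a different slot. A production version would also add expiry and single use, which this sketch omits.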

Guardrails // Agents

Rules that constrain what a model may accept as input and emit as output.

On input they reject prompt injection; on output they qualify promises (e.g. price ranges, hedged deadlines). Guardrails keep the assistant from committing to things it shouldn't.

Hallucination // Agents

A confident-sounding but fabricated answer from a model.

Models fill gaps with plausible text even when they don't know the fact. RAG with citations and a confidence threshold (escalate to a human on a weak match) is the main defence.

Structured output // Agents

Forcing a model to return valid JSON that matches a schema.

Without it a model's answer is hard to parse safely. Cashcrown uses prompt-based JSON with schema validation and a single repair attempt, which is steadier than the slow native “json_schema” modes of some providers.
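The "validate, then repair once" loop might look roughly like this. The schema, field names and repair wording are assumptions for illustration, and `model` is any callable that takes a prompt and returns text.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED = {"invoice_number": str, "amount": (int, float)}


def validate(raw):
    """Return (data, None) when raw is valid JSON matching REQUIRED, else (None, reason)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for field, expected_type in REQUIRED.items():
        if field not in data:
            return None, f"missing field: {field}"
        if not isinstance(data[field], expected_type):
            return None, f"wrong type for field: {field}"
    return data, None


def structured_call(model, prompt):
    """Ask the model, validate the reply, and allow exactly one repair attempt."""
    data, error = validate(model(prompt))
    if data is None:
        repair_prompt = (
            f"{prompt}\n\nYour previous reply was rejected ({error}). "
            "Return only valid JSON with the required fields."
        )
        data, error = validate(model(repair_prompt))
    if data is None:
        raise ValueError(error)  # give up after one repair; the caller decides what next
    return data
```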

Related: Prompt, Tool use, LLM router

PII (personal data) // Privacy & GDPR

Information that identifies a person: email, phone, name, address.

Before anything goes to the cloud the router masks PII with tokens and rehydrates them in the response — the cloud model never sees real personal data.
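Masking and rehydration can be sketched with placeholder tokens and a mapping kept on the local side. The two regular expressions below are deliberately simple assumptions that catch only obvious emails and phone numbers; real PII detection also needs names, addresses and more.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d \-]{7,}\d")


def mask(text):
    """Replace emails and phone numbers with tokens; return (masked_text, mapping)."""
    mapping = {}

    def replacer(kind):
        def _sub(match):
            token = f"<{kind}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)  # the real value never leaves this dict
            return token
        return _sub

    text = EMAIL.sub(replacer("EMAIL"), text)
    text = PHONE.sub(replacer("PHONE"), text)
    return text, mapping


def rehydrate(text, mapping):
    """Put the original values back into a response that contains the tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Only the masked text would be sent to a cloud model; the mapping stays local and is applied to the model's reply.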

Related: GDPR, LLM router, Data residency

GDPR // Privacy & GDPR

EU data-protection law: consent, minimisation, the right to erasure.

In practice: consent before tracking, keeping only what's necessary, and real erasure on request. In Cashcrown these are built into the pipeline, not bolted on later.

Data residency // Privacy & GDPR

Where your data physically lives and is processed.

Local embeddings (BGE-M3) and a local Qdrant mean sensitive content never leaves the server. Only a masked prompt reaches the cloud — a deliberate residency choice.

Self-hosting // Infrastructure

Running models and services on your own infrastructure, not a vendor's.

Gives control over data and cost and independence from a single vendor. Cashcrown self-hosts embeddings, the vector DB and search, reaching the cloud only for generation — with masking.

BGE-M3 // Infrastructure // BGE-M3

A multilingual embedding model run locally (1024 dimensions).

Turns multilingual text into vectors without sending it to the cloud — the foundation of Cashcrown's private RAG.

More: BGE-M3 in the model atlas

Observability // Infrastructure

Metrics, logs and traces that show what an AI system is really doing.

You cannot govern cost or quality without measuring it. Cashcrown exposes metrics (Prometheus), correlated logs and traces, so every model call is countable and debuggable.

Related: Latency, LLM router, Inference

Latency // Infrastructure

Time from question to answer; low latency = a fluid interaction.

Streaming the answer token by token cuts perceived latency — the user sees text before the model finishes. That's why the assistant “types live”.

Throughput // Infrastructure

How many requests/tokens a system serves per unit of time.

With latency it describes model-serving performance. Concurrency limits and backpressure protect throughput from overload.

Related: Latency, Inference, Observability

Concierge (assistant) // Agents

A RAG-based assistant that answers with citations and escalates to a human.

Cashcrown's concierge combines RAG, guardrails, multilingual support and streaming — it answers live with citations and, when unsure, hands off to a human instead of inventing.

FAQ explorer // SEO & AEO

Searching questions and answers by meaning, not just words.

A semantic FAQ surfaces the best answer even when the question is phrased differently from the base — using the same embeddings as RAG.

Thinking (reasoning) mode // Core

A mode where the model reasons internally before composing an answer.

Thinking models do hidden reasoning, which is great for hard decisions but slower and costlier. Forced into ordinary chat they can return an empty answer, so we enable thinking only for reasoning tasks (via the think parameter).

More: Model atlas, Which AI model? (tree)

Model selection (routing) // Infrastructure

Picking the right model per task — the cheapest one that can carry it.

There is no single “best” model; the OpenClaw router picks one per task from measured throughput, time-to-first-token (TTFT) and context window — not the name. The model atlas shows the whole fleet with measured specs and per-task selection.
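"The cheapest model that can carry the task" can be read as a filter followed by a minimum. The sketch below assumes a small table of measured specs; the model names, numbers and the `pick_model` signature are all hypothetical and do not describe the OpenClaw router's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    cost_per_mtok: float   # price per million tokens (made-up units)
    ttft_ms: float         # measured time to first token
    tokens_per_s: float    # measured throughput
    context_window: int    # in tokens


# Hypothetical fleet with invented measurements.
FLEET = [
    ModelSpec("small-fast", 0.2, 300, 80, 8_000),
    ModelSpec("mid", 1.0, 500, 60, 32_000),
    ModelSpec("large", 5.0, 900, 30, 128_000),
]


def pick_model(fleet, needed_context, max_ttft_ms, min_tokens_per_s) -> Optional[ModelSpec]:
    """Return the cheapest model that meets every measured requirement, or None."""
    capable = [
        m for m in fleet
        if m.context_window >= needed_context
        and m.ttft_ms <= max_ttft_ms
        and m.tokens_per_s >= min_tokens_per_s
    ]
    return min(capable, key=lambda m: m.cost_per_mtok, default=None)
```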

More: Model atlas, Model comparison

TTFT (time to first token) // Infrastructure

Time from sending a question to the first token of the answer appearing.

TTFT decides how “fast” a model feels: with streaming, the user sees the first text as soon as TTFT has elapsed. We measure it live per model because names mislead (a “flash” model can be slower than a big one).
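Measuring TTFT needs nothing more than a timer around a token stream. In this sketch `fake_model` is a stand-in generator with artificial delays; a real measurement would wrap a streaming response from a model server instead.

```python
import time


def measure_stream(token_stream):
    """Consume a token iterator and report TTFT, total time and token count."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _token in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token just arrived
        count += 1
    total = time.perf_counter() - start
    return {"ttft_s": ttft, "total_s": total, "tokens": count}


def fake_model():
    """Stand-in for a streaming model: a pause before the first token, then a trickle."""
    time.sleep(0.05)
    for word in ["Hello", ",", " world"]:
        yield word
        time.sleep(0.01)


if __name__ == "__main__":
    print(measure_stream(fake_model()))
```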

More: Model atlas

AI Act // Privacy & GDPR

EU regulation that classifies AI systems by risk and imposes obligations.

The AI Act splits systems into risk levels. For limited risk (chatbots, assistants) the key duty is transparency: the user must know they're talking to AI. High risk adds human oversight, technical documentation, log records and conformity assessment. The Act entered into force in August 2024 and applies in phases: bans on prohibited practices from February 2025, while chatbot transparency (Article 50) and most high-risk obligations apply from August 2026.

DPIA (data protection impact assessment) // Privacy & GDPR

A risk assessment required when processing may cause high risk to people's rights.

A DPIA stems from GDPR and is typically required for large-scale profiling, sensitive data or automated decisions about people. An assistant that only answers from a knowledge base usually does not need one; a system that profiles or decides probably does.

Related: GDPR, AI Act, PII (personal data)

Human oversight (human-in-the-loop) // Agents

The requirement that a human supervise and confirm significant or irreversible AI decisions.

Human oversight is a pillar of compliance and safety: irreversible actions pass through a confirmation (human-gate), and the system acts autonomously only within a narrow, described scope. GDPR (Article 22) also grants a right not to be subject to solely automated decisions that have legal or similarly significant effects.

Related: AI agent, Guardrails, AI Act

Classifier // Core

A model that reads a document or message and assigns it to one of a set of categories.

Classification is one of the fastest-ROI tasks: invoice coding, ticket categorization, lead scoring. The result is measurable by definition (percentage of correct assignments) and the process usually already exists manually — which makes it a good first deployment.

Data extraction // Core

Pulling specific fields from text — invoice number, amount, tax ID, date, CV data.

Extraction turns unstructured text into system-ready fields. Together with classification it solves a large share of first AI ideas in companies. It works best with an enforced schema (structured output) that guarantees a valid format.

Human handoff // Agents

Smoothly passing a conversation from the AI assistant to a human when the case needs it.

A handoff is a sign of a mature system, not a failure: on low confidence, customer frustration, or a case that needs a decision, the assistant escalates to a human instead of guessing. It is also part of AI Act transparency — the user can always reach a human.

Pilot // Infrastructure

Deploying one narrow process at a fixed cost to measure value before scaling.

A pilot lowers risk on both sides: instead of a big contract we show a working system on one measurable process. If it delivers the numbers (hours saved, % cases closed without a human), we expand the scope; if not, it cost little.

Lead scoring // Agents

Automatically scoring and prioritizing enquiries by fit to the ideal customer profile (ICP).

Scoring is classification applied to sales: a form lead gets a score (budget, fit, readiness) and the most valuable ones reach a human first. Criteria are explicit and logged, not hidden profiling — which matters for GDPR and the AI Act.

OCR (optical character recognition) // RAG & search

Reading text from an image or scan — the first step before extracting fields from documents.

OCR turns a scanned invoice or contract into text the model can then classify and extract fields from (tax ID, amount, date). Combined with extraction and structured output it forms a full "scan → system-ready fields" pipeline.

RPA (robotic process automation) // Agents

Automating repetitive steps across existing apps; with AI it becomes "intelligent".

Classic RPA clicks and retypes by rigid rules. Combined with AI (classification, extraction, decisions) it handles variance — e.g. reads an invoice in any layout instead of requiring one template. We fill this role with agents that have a tool allow-list and a human-gate.

Prompt engineering // Core

Designing a model's instructions: role, rules, context, output format.

A good prompt is engineering, not a magic spell: a clear role, constraints, context (e.g. from RAG) and an enforced format. In production a prompt is versioned and tested like code, not guessed.

Synthetic data // Core

Artificially generated data for training or testing when real data is scarce or sensitive.

Synthetic data helps when real data is scarce, costly or GDPR-bound — e.g. for tests and edge cases. It must reflect the real distribution, or the model learns a fiction.

AI governance // Privacy & GDPR

The rules, roles and controls over how a company builds and uses AI — who's accountable, what's allowed, how it's audited.

Governance ties scattered deployments into one regime: an AI system register, owners, data rules, an audit trail and reviews. It's the basis of AI Act compliance and risk control at scale.

TCO (total cost of ownership) // Infrastructure

The full cost of an AI system: not just setup, but inference, maintenance, monitoring and updates.

The setup price is the tip of the iceberg. TCO includes inference cost (cloud vs local), maintenance, observability and updates. At scale these decide whether self-hosting beats an API.

Related: Inference, Self-hosting, Pilot

Explainability (XAI) // Privacy & GDPR

The ability to show why an AI system gave a given answer or decision — the opposite of a black box.

We build explainability in practice: source citations (RAG), a log of every step and guardrails — so you can show where an answer came from. It's a requirement for trust and accountability (GDPR/AI Act).

Chunking // RAG & search

Cutting documents into chunks that get embedded and searched in RAG.

The pipeline indexes shorter pieces, not whole files: the context window is limited and a precise chunk retrieves better than a full page. A bad boundary (a split sentence, a broken table) hurts relevance, so we cut along headings and paragraphs with light overlap, not blindly by character count.
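A minimal sketch of paragraph-aligned chunking with overlap, assuming paragraphs are separated by blank lines. The size limit and the one-paragraph overlap are illustrative defaults; heading-aware splitting, which the entry also mentions, is left out for brevity.

```python
def chunk_paragraphs(text, max_chars=800, overlap=1):
    """Group whole paragraphs into chunks of up to max_chars.

    The last `overlap` paragraphs of each chunk are repeated at the start of
    the next one, so context that straddles a boundary is not lost.
    A single paragraph longer than max_chars is kept whole rather than split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        too_long = len("\n\n".join(current + [para])) > max_chars
        if current and too_long:
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```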

MCP (Model Context Protocol) // Agents

An open standard that connects models and agents to tools and data sources through one common interface.

MCP is a shared “plugin bus”: instead of writing a separate integration per system, an MCP server exposes tools and data and the agent reaches them in a standardised way. Convenient, but it grows the attack surface — every MCP server needs an allow-list and permission control, because an exposed tool becomes a real action.

Graph RAG // RAG & search

RAG that searches over a graph of entities and their relationships, not just text chunks.

Plain RAG returns similar chunks but misses “how do these connect?” questions. Graph RAG builds a graph of entities (people, companies, documents) and their relations, so it handles multi-hop questions and context scattered across many files. It costs more to build and maintain, so we use it where the connections genuinely matter, not by default.

Agentic RAG // RAG & search

RAG where an agent plans searches, judges the results and re-queries, instead of a single shot.

Classic RAG is one retrieval and an answer. In agentic RAG the agent breaks a hard question into steps, searches repeatedly, judges whether the retrieved context is enough and, if not, re-queries or reformulates. It gives better answers on complex questions at the cost of more model calls, so guardrails and limits still govern it.

Semantic cache // Infrastructure

A cache that returns a ready answer for a question semantically similar to an earlier one.

A plain cache hits only on identical text; a semantic cache compares embeddings, so “how much does deployment cost?” and “what is the pilot price?” can land on the same stored result. It cuts latency and inference cost, but needs a similarity threshold and a short TTL so it doesn't serve a stale or over-stretched answer.
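The threshold and TTL can be sketched with an in-memory list of entries. The class name, default values and linear scan are assumptions for illustration; a real cache would store entries in a vector index and take embeddings from an embedding model.

```python
import math
import time


class SemanticCache:
    """Tiny in-memory semantic cache keyed by embedding similarity."""

    def __init__(self, threshold=0.9, ttl_s=300.0, clock=time.monotonic):
        self.threshold = threshold  # minimum cosine similarity for a hit
        self.ttl_s = ttl_s          # entries older than this are dropped
        self.clock = clock          # injectable for testing
        self.entries = []           # list of (vector, answer, stored_at)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def put(self, vector, answer):
        self.entries.append((vector, answer, self.clock()))

    def get(self, vector):
        """Return the cached answer for the most similar fresh entry, or None."""
        now = self.clock()
        self.entries = [e for e in self.entries if now - e[2] <= self.ttl_s]
        best = max(self.entries, key=lambda e: self._cosine(vector, e[0]), default=None)
        if best is not None and self._cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None
```

Setting the threshold too low is the usual failure: two questions that look similar but need different answers end up sharing one cached reply.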

Quantization // Infrastructure

Storing a model's weights at lower precision so it fits on cheaper hardware.

Quantization rounds weights (e.g. from 16 to 4 bits), so the model uses less memory and runs faster — at a small quality cost. It is the basic trick that lets useful models run locally (self-hosting) rather than only in the cloud.
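The rounding itself is easy to show on a handful of numbers. This sketch uses symmetric quantization with one scale per list, an assumed simplification; production schemes typically work per block of weights and pack the integers into bytes.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    levels = 2 ** (bits - 1) - 1              # 7 for 4 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / levels if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats; the difference from the originals is the quality cost."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [1.75, -0.5, 0.3, 0.0]
    q, scale = quantize(weights)
    print(q, scale, dequantize(q, scale))
```

In the example, 0.3 comes back as 0.25: that rounding error, spread over every weight in the model, is the small loss in quality the entry refers to.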

LoRA & QLoRA // Infrastructure

A cheap way to fine-tune a model — it trains small add-ons instead of all the weights.

LoRA bolts small trainable layers (adapters) onto the model, so fine-tuning is cheaper and faster than training the whole thing. QLoRA combines it with quantization to fit training on a single GPU. For fresh facts, RAG is usually still the better tool.

Temperature // Core

A randomness dial — low gives steady, predictable answers, high gives creative ones.

Temperature controls how much the model “gambles” when picking the next token. For support, data extraction and source-grounded answers we keep it low (repeatability), and raise it only where variety matters.
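Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. The logits below are made up for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into next-token probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]                    # invented scores for three candidate tokens
    print(softmax_with_temperature(logits, 0.2))  # low: nearly all mass on the top token
    print(softmax_with_temperature(logits, 2.0))  # high: mass spread across candidates
```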

Mixture of experts (MoE) // Infrastructure

An architecture where only part of the model fires for each token.

An MoE model splits into many “experts”, and a router activates only a few of them per token. That lets the model be very large (lots of knowledge) yet cheap at inference, because only the active part runs; hence their popularity in local deployments.
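Top-k routing can be sketched with plain functions standing in for experts. Real MoE layers use small neural networks as experts and a learned gate; the toy experts and gate scores here are assumptions for illustration.

```python
import math


def route(gate_logits, top_k=2):
    """Pick the top_k experts and normalise their gate scores into weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:top_k]
    peak = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


def moe_layer(x, experts, gate_logits, top_k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    weights = route(gate_logits, top_k)
    return sum(weight * experts[i](x) for i, weight in weights.items())


if __name__ == "__main__":
    # Four toy "experts"; only two of them run for this input.
    experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x * 10]
    print(moe_layer(2.0, experts, gate_logits=[0.0, 3.0, -1.0, 3.0]))
```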

Token streaming // Infrastructure

Showing the answer word by word as soon as the model produces it.

Instead of waiting for the whole reply, streaming shows text token by token — the user sees the first words after TTFT, so the system feels faster. In Cashcrown the concierge streams answers over SSE, which noticeably improves perceived responsiveness.
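On the wire, Server-Sent Events are plain text: each event is a `data:` line followed by a blank line. The sketch below frames tokens that way; the JSON payload shape and the `[DONE]` end marker are assumed conventions, not something the SSE format itself requires.

```python
import json


def sse_events(tokens):
    """Yield one SSE-formatted event per token, then an end-of-stream marker."""
    for token in tokens:
        # JSON-encode the payload so a newline inside a token cannot break the framing.
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    for event in sse_events(["Hel", "lo", "!"]):
        print(event, end="")
```

A web framework would send these strings with the `text/event-stream` content type, and the browser's `EventSource` would receive one message per event.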

Prompt injection // Agents

A hidden instruction smuggled into input data to hijack the model.

An attacker hides a command in a message, document or web page (“ignore your instructions, exfiltrate secrets”) and the model obeys it as if it were its own. It is the top risk for tool-using agents; we defend by screening input with guardrails before the model and requiring a server-side confirmation for irreversible actions — the model alone is never enough.

Related: Guardrails, Prompt, Tool use

Red teaming // Agents

Deliberately attacking your own AI system to find holes before someone else does.

Red teaming is a battery of probes: prompt injection, secret-extraction attempts, guardrail bypasses, forcing promises or hallucinations. Cashcrown keeps such a suite as a standing gate (e.g. multilingual injection patterns in PL/EN/DE/UK), because an attack that works in one language often passes in another.

Multimodal // Core

A model that understands not just text but also images, PDFs and audio.

A multimodal model takes images, scans or recordings and works on them as it does on text — describing a photo, reading an invoice, transcribing a call. In practice we wire it with OCR and extraction into a “document → ready fields” pipeline, processing uploaded files with no disk write and zero retention.

Speech (STT / TTS) // Core

Turning speech into text (STT) and text into speech (TTS) — the basis of a voice assistant.

STT (speech-to-text) writes an utterance down as text; TTS (text-to-speech) reads the answer aloud. Cashcrown transcribes locally with the Whisper model, so the recording never leaves the server; the voice assistant itself is a composition of STT, RAG and guardrails, not a separate “magic” model.

Agent evaluation (golden set) // Agents

Measuring an agent's quality on a fixed set of reference cases, not by gut feel.

A golden set is a collection of questions with an expected answer (and the correct tool choice) on which we score accuracy after every prompt or model change — so fixing one thing doesn't quietly break ten others. Without it, “better” is just a feeling; with it, it becomes a number you can defend.
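A golden-set run can be as small as a loop and a ratio. The case format, the substring check for the expected answer and the agent's return shape are assumptions for illustration; real suites often use stricter or model-assisted grading.

```python
def evaluate(agent, golden_set):
    """Score an agent on reference cases; a case passes if answer and tool both match.

    agent: callable taking a question and returning {"answer": str, "tool": str}.
    golden_set: list of {"question", "expected", "tool"} dicts.
    Returns (accuracy, per_case_results).
    """
    if not golden_set:
        raise ValueError("golden set is empty")
    results = []
    for case in golden_set:
        output = agent(case["question"])
        results.append({
            "question": case["question"],
            "answer_ok": case["expected"].lower() in output["answer"].lower(),
            "tool_ok": output["tool"] == case["tool"],
        })
    passed = sum(1 for r in results if r["answer_ok"] and r["tool_ok"])
    return passed / len(results), results
```

Running this after every prompt or model change and comparing the accuracy with the previous run is what turns "better" into a number.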
