cashcrown // knowledge
AI concepts without the jargon: RAG, embeddings, agents, GDPR and infrastructure — with definitions, relations and search. Consistent with our architecture and entities.
46 terms
A model that predicts the next text tokens — the basis of modern AI systems.
An LLM learns the statistics of language from huge corpora and generates text token by token. On its own it does not know your data — domain knowledge is added via RAG or fine-tuning.
Related: Token, Inference, RAG (retrieval-augmented generation), Fine-tuning
The smallest unit of text a model processes (a sub-word piece).
Models count cost and limits in tokens, not characters. As a rule of thumb for English text, 1 token is about 4 characters; billing and the context window are both measured in tokens.
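The rule of thumb above can be turned into a quick budget estimate. This is a minimal sketch, assuming the 4-characters-per-token approximation holds; `estimate_tokens`, `estimate_cost` and the price argument are hypothetical names for illustration, and a real tokenizer will give different counts depending on language and model.

```python
import math

# Rule of thumb only; real tokenizers vary by language and model.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(text: str, price_per_1k_tokens: float) -> float:
    """Rough cost of sending `text`, given a (hypothetical) price per 1,000 tokens."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens
```

For exact billing figures, count with the tokenizer of the model actually being called.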
The maximum number of tokens a model can see at once (input + output).
When a conversation or a set of documents exceeds the window, you must truncate or retrieve only the most relevant chunks. This is one reason RAG is used instead of stuffing the whole knowledge base into the prompt.
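Keeping only the most relevant chunks can be sketched as a greedy fill of a token budget. This is an illustration under stated assumptions: chunks arrive as `(relevance_score, text)` pairs from some earlier retrieval step, part of the window is reserved for the answer, and `rough_count` is a hypothetical stand-in for a real tokenizer.

```python
import math


def rough_count(text: str) -> int:
    """Hypothetical token counter using the 4-characters-per-token rule of thumb."""
    return math.ceil(len(text) / 4)


def fit_to_window(chunks, window_tokens, reserved_for_output, count_tokens=rough_count):
    """Keep the highest-scoring chunks that fit into the remaining input budget.

    chunks: list of (relevance_score, text) pairs.
    """
    budget = window_tokens - reserved_for_output
    kept, used = [], 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget:
            continue  # skip chunks that no longer fit; smaller ones may still do
        kept.append(text)
        used += cost
    return kept
```

The window covers input and output together, which is why the sketch subtracts the output reservation before filling.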
The instruction and context given to a model that steer its answer.
A good prompt sets a role, rules, context (e.g. RAG sources) and an output format. Injecting a malicious instruction into the prompt is prompt injection — guardrails defend against it.
Running a trained model to generate an answer.
Inference is the running cost of an AI system — measured by latency and throughput. It can run in the cloud or locally (self-hosting), which decides data residency.
Further-training a model on your own examples to change its style or behaviour.
Fine-tuning changes the model's weights and is costly; for fresh factual knowledge RAG is usually better (cheaper, updatable without retraining). The two are sometimes combined.
Related: Large language model (LLM), RAG (retrieval-augmented generation), Inference
Retrieve facts from a knowledge base first, then have the model answer grounded only in them.
RAG curbs hallucination: the model is handed concrete sources and cites them. It is the basis of trustworthy support — answers are grounded, and on a weak match the system escalates to a human instead of inventing.
Related: Embedding (vector), Vector database, Hybrid search, Hallucination, LLM router
Text turned into a list of numbers where closeness = similar meaning.
Embeddings let you search by meaning, not keywords. Cashcrown computes them locally with the BGE-M3 model (1024 dimensions), so the content being embedded never leaves the infrastructure.
A store of embeddings that finds the nearest vectors in milliseconds.
The heart of semantic search in RAG. Cashcrown runs Qdrant locally as a native service — vectors and metadata stay on its own server.
Searching by a sentence's meaning rather than literal word match.
The query and documents become embeddings and are compared as vectors, so “how to protect data” finds a GDPR text even with no shared words.
Related: Embedding (vector), Vector database, Hybrid search, FAQ explorer
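Comparing embeddings as vectors usually means a similarity measure such as cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors purely for illustration (real BGE-M3 embeddings have 1024 dimensions, and a vector database does this search with an index rather than a full scan); `cosine` and `nearest` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, docs, k=1):
    """Return the names of the k documents whose vectors are closest to the query."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _vec in ranked[:k]]


# Toy vectors, invented for the example (not real model output).
docs = {
    "gdpr_text": [0.9, 0.1, 0.0],
    "pricing_page": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedded question about protecting data
```

Here `nearest(query, docs)` picks `gdpr_text`, because its vector points in nearly the same direction as the query even though no words are compared.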
Combining semantic search with classic keyword search.
Semantics catches meaning while full-text search nails exact names and codes. Combining both (e.g. vectors + Postgres FTS) beats either alone.
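One common way to combine two result lists is reciprocal rank fusion (RRF). The source does not say which fusion method Cashcrown uses, so treat this as a generic sketch: each list contributes `1 / (k + rank)` per document, and `k = 60` is a conventional default rather than a tuned value.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic_hits = ["a", "b", "c"]  # e.g. from vector search
keyword_hits = ["b", "d", "a"]   # e.g. from full-text search
fused = rrf([semantic_hits, keyword_hits])  # ['b', 'a', 'd', 'c']
```

Documents found by both searches ("a" and "b") rise above those found by only one, which is the effect the combination is meant to achieve.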
A second pass that reorders search results by relevance.
After a fast first retrieval, a more precise model scores each candidate against the query and lifts the best to the top — improving the context handed to the LLM.
A single ingress to models: picks the model per task, masks PII, enforces limits.
All AI traffic in Cashcrown goes through the OpenClaw router — no code calls a provider directly. That makes PII masking, model fallback and telemetry enforceable in one place.
Related: Large language model (LLM), PII (personal data), Structured output, Observability, Model selection (routing), Thinking (reasoning) mode
More: Model atlas →
An AI system that plans steps and uses tools to accomplish a task.
An agent doesn't just answer, it acts: it searches, calls APIs, books a slot. Safety needs guardrails and confirmations on irreversible actions so the agent's behaviour stays under control.
A model's ability to call functions/APIs instead of only writing text.
The model is given a tool catalog with argument schemas and decides which to call. Irreversible actions (e.g. a booking) require a server-issued confirmation token, not the model's say-so.
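A server-issued confirmation can be sketched with an HMAC over the exact tool call. This is an assumption about how such a token might be built, not a description of Cashcrown's implementation; `SERVER_SECRET`, `IRREVERSIBLE`, `issue_confirmation` and `execute` are hypothetical names.

```python
import hashlib
import hmac
import json

SERVER_SECRET = b"demo-secret"   # hypothetical; a real key lives in server config
IRREVERSIBLE = {"book_slot"}     # hypothetical list of tools that need confirmation


def _canonical(tool, args):
    """Stable byte representation of a tool call, so the token binds to it exactly."""
    return json.dumps({"tool": tool, "args": args}, sort_keys=True).encode()


def issue_confirmation(tool, args):
    """Called by the server after the user confirms; the model never sees the secret."""
    return hmac.new(SERVER_SECRET, _canonical(tool, args), hashlib.sha256).hexdigest()


def execute(tool, args, confirmation=None):
    """Run a tool call, refusing irreversible ones without a valid token."""
    if tool in IRREVERSIBLE:
        expected = issue_confirmation(tool, args)
        if confirmation is None or not hmac.compare_digest(expected, confirmation):
            return "confirmation_required"
    return f"executed {tool}"
```

Because the token is derived from the tool name and its arguments, a token issued for one booking cannot be reused for a different slot, and the model cannot mint one by itself.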
Rules that constrain what a model may accept as input and emit as output.
On input they reject prompt injection; on output they qualify promises (e.g. price ranges, hedged deadlines). Guardrails keep the assistant from committing to things it shouldn't.
A confident-sounding but fabricated answer from a model.
Models fill gaps with plausible text even when they don't know the fact. RAG with citations and a confidence threshold (escalate to a human on a weak match) is the main defence.
Related: RAG (retrieval-augmented generation), Guardrails, Large language model (LLM)
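The confidence threshold can be sketched as a simple gate on the best retrieval score. The `0.75` cut-off and the shape of the `matches` list are assumptions made for the example; a real threshold would be tuned on measured data.

```python
def answer_or_escalate(matches, threshold=0.75):
    """Answer only when the best match is strong enough; otherwise hand off.

    matches: list of dicts like {"source": "faq#12", "score": 0.82}.
    """
    best = max(matches, key=lambda m: m["score"], default=None)
    if best is None or best["score"] < threshold:
        # Weak or missing evidence: escalate instead of letting the model guess.
        return {"action": "escalate", "sources": []}
    return {"action": "answer", "sources": [best["source"]]}
```

An empty result list is treated the same as a weak match, so the assistant never answers without a source to cite.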
Forcing a model to return valid JSON that matches a schema.
Without it a model's answer is hard to parse safely. Cashcrown uses prompt-based JSON with schema validation and one repair — steadier than the slow native “json_schema” modes of some providers.
Related: Prompt, Tool use, LLM router
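Schema validation with a single repair attempt might look like the sketch below. The two-field schema and the `repair` callable are hypothetical: in production the repair step would typically send the broken output back to the model, while here it is any function that returns a corrected string.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"intent": str, "confidence": float}


def validate(data):
    """True if data is an object with every required field of the right type."""
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(key), typ) for key, typ in REQUIRED.items())


def try_parse(raw):
    """Parse and validate; return None instead of raising on any failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if validate(data) else None


def parse_with_one_repair(raw, repair):
    """Accept valid output, allow exactly one repair attempt, then give up."""
    data = try_parse(raw)
    if data is None:
        data = try_parse(repair(raw))
    if data is None:
        raise ValueError("model output did not match the schema")
    return data
```

Capping repairs at one keeps latency bounded: a second failure is surfaced as an error rather than retried indefinitely.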
Information that identifies a person: email, phone, name, address.
Before anything goes to the cloud the router masks PII with tokens and rehydrates them in the response — the cloud model never sees real personal data.
Related: GDPR, LLM router, Data residency
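Masking and rehydration can be sketched for one PII type. This example handles email addresses only, with a deliberately simple regular expression and a hypothetical `<EMAIL_n>` placeholder format; a production masker would cover phones, names and addresses and would need far more careful detection.

```python
import re

# Simplified email pattern for illustration; not a full address validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(text):
    """Replace each email with a placeholder and remember the mapping."""
    mapping = {}

    def replace(match):
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = match.group(0)
        return token

    return EMAIL.sub(replace, text), mapping


def rehydrate(text, mapping):
    """Put the real values back into the model's response."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


masked, mapping = mask("Write to anna@example.com please")
# masked == "Write to <EMAIL_1> please"; only this string would be sent to the cloud
```

The mapping stays on the server, so the model's reply can mention `<EMAIL_1>` and still reach the user with the real address restored.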
EU data-protection law: consent, minimisation, the right to erasure.
In practice: consent before tracking, keeping only what's necessary, and real erasure on request. In Cashcrown these are built into the pipeline, not bolted on later.
Where your data physically lives and is processed.
Local embeddings (BGE-M3) and a local Qdrant mean sensitive content never leaves the server. Only a masked prompt reaches the cloud — a deliberate residency choice.
Running models and services on your own infrastructure, not a vendor's.
Gives control over data and cost and independence from a single vendor. Cashcrown self-hosts embeddings, the vector DB and search, reaching the cloud only for generation — with masking.
A multilingual embedding model run locally (1024 dimensions).
Turns multilingual text into vectors without sending it to the cloud — the foundation of Cashcrown's private RAG.
Related: Embedding (vector), Vector database, Self-hosting, Model selection (routing)
Metrics, logs and traces that show what an AI system is really doing.
You cannot govern cost or quality without measuring it. Cashcrown exposes metrics (Prometheus), correlated logs and traces, so every model call is countable and debuggable.
Related: Latency, LLM router, Inference
Time from question to answer; low latency = a fluid interaction.
Streaming the answer token by token cuts perceived latency — the user sees text before the model finishes. That's why the assistant “types live”.
Related: Inference, Throughput, Observability
How many requests/tokens a system serves per unit of time.
With latency it describes model-serving performance. Concurrency limits and backpressure protect throughput from overload.
Related: Latency, Inference, Observability
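A concurrency limit can be sketched with a semaphore: at most `max_concurrent` requests run at once and the rest wait their turn, which is a simple form of backpressure. `serve_all` and `handler` are hypothetical names, and the limit of 2 is an arbitrary example value.

```python
import asyncio


async def serve_all(requests, handler, max_concurrent=2):
    """Run handler over all requests, never more than max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(request):
        async with limit:  # waits here while the system is at capacity
            return await handler(request)

    # gather preserves the order of the input requests in its results.
    return await asyncio.gather(*(guarded(r) for r in requests))
```

A stricter variant would reject or queue with a timeout instead of waiting indefinitely, so that overload is visible to callers.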
A RAG-based assistant that answers with citations and escalates to a human.
Cashcrown's concierge combines RAG, guardrails, multilingual support and streaming — it answers live with citations and, when unsure, hands off to a human instead of inventing.
Related: RAG (retrieval-augmented generation), AI agent, Guardrails, Latency
Searching questions and answers by meaning, not just words.
A semantic FAQ surfaces the best answer even when the question is phrased differently from the entry in the knowledge base, using the same embeddings as RAG.
Related: Semantic search, RAG (retrieval-augmented generation), Concierge (assistant)
A mode where the model reasons internally before composing an answer.
Thinking models do hidden reasoning — great for hard decisions but slower and costlier. Forced into ordinary chat they can return an empty answer, so we enable it only for reasoning tasks (the think parameter).
Related: Large language model (LLM), Inference, Model selection (routing), Latency
Picking the right model per task — the cheapest one that can carry it.
There is no single “best” model; the OpenClaw router picks one per task from measured throughput, time-to-first-token (TTFT) and context window — not the name. The model atlas shows the whole fleet with measured specs and per-task selection.
Related: LLM router, Throughput, TTFT (time to first token), Thinking (reasoning) mode, Context window
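Picking the cheapest model that can carry a task can be sketched as a filter followed by a minimum. The fleet below is invented placeholder data, not measurements from the model atlas, and `ModelSpec` and `pick_model` are hypothetical names; the point is only that selection runs on measured numbers, not on model names.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    cost_per_1k_tokens: float
    ttft_ms: float          # measured time to first token
    tokens_per_s: float     # measured throughput
    context_window: int


def pick_model(fleet, max_ttft_ms, min_tokens_per_s, min_context):
    """Return the cheapest model meeting all requirements, or None if none does."""
    candidates = [
        m for m in fleet
        if m.ttft_ms <= max_ttft_ms
        and m.tokens_per_s >= min_tokens_per_s
        and m.context_window >= min_context
    ]
    return min(candidates, key=lambda m: m.cost_per_1k_tokens, default=None)


# Placeholder fleet with made-up numbers.
fleet = [
    ModelSpec("small", 0.1, 300, 80, 8_000),
    ModelSpec("medium", 0.5, 500, 60, 32_000),
    ModelSpec("large", 2.0, 900, 30, 128_000),
]
```

With these numbers a short chat task resolves to `small`, while a task needing a 100,000-token context resolves to `large` only if its latency is acceptable, and to no model otherwise.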
Time from sending a question to the first token of the answer appearing.
TTFT decides how “fast” a model feels: with streaming, the user starts seeing text as soon as the TTFT has elapsed. We measure it live per model because names mislead (a “flash” can be slower than a big model).
Related: Latency, Throughput, Model selection (routing)
More: Model atlas →
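Measuring TTFT alongside total latency can be sketched by timing a token stream. `measure_stream` is a hypothetical helper that accepts any iterable of tokens; the clock is injectable so the logic can be checked without real waiting.

```python
import time


def measure_stream(stream, clock=time.perf_counter):
    """Consume a token stream and report TTFT, total time and token count."""
    start = clock()
    ttft = None
    count = 0
    for _token in stream:
        if ttft is None:
            ttft = clock() - start  # moment the first token arrived
        count += 1
    total = clock() - start
    return {"ttft": ttft, "total": total, "tokens": count}
```

For an empty stream `ttft` stays `None`, which distinguishes "no answer" from "instant answer" in telemetry.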
EU regulation that classifies AI systems by risk and imposes obligations.
The AI Act splits systems into risk levels. For limited risk (chatbots, assistants) the key duty is transparency: the user must know they're talking to AI. High risk adds human oversight, technical documentation, log records and conformity assessment. Most obligations apply from August 2026, with some provisions phased in earlier and others later.
Related: GDPR, DPIA (data protection impact assessment), Human oversight (human-in-the-loop), Guardrails
A risk assessment required when processing may cause high risk to people's rights.
A DPIA stems from GDPR and is typically required for large-scale profiling, sensitive data or automated decisions about people. An assistant that only answers from a knowledge base usually does not need one; a system that profiles or decides probably does.
Related: GDPR, AI Act, PII (personal data)
The requirement that a human supervise and confirm significant or irreversible AI decisions.
Human oversight is a pillar of compliance and safety: irreversible actions pass through a confirmation (human-gate), and the system acts autonomously only within a narrow, described scope. GDPR also grants a right not to be subject to solely automated decisions that have legal or similarly significant effects.
Related: AI agent, Guardrails, AI Act
A model that reads a document or message and assigns it to one of a set of categories.
Classification is one of the fastest-ROI tasks: invoice coding, ticket categorization, lead scoring. The result is measurable by definition (percentage of correct assignments) and the process usually already exists manually — which makes it a good first deployment.
Related: Data extraction, Structured output, Large language model (LLM)
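The "percentage of correct assignments" is plain accuracy against a hand-labelled sample. A minimal sketch, with invented labels and a hypothetical `accuracy` helper:

```python
def accuracy(predicted, expected):
    """Share of items where the classifier's label matches the human label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one labelled example")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Invented example: 3 of 4 tickets were categorised correctly.
predicted = ["invoice", "complaint", "invoice", "lead"]
expected = ["invoice", "complaint", "lead", "lead"]
score = accuracy(predicted, expected)  # 0.75
```

Because the manual process already exists, the labelled sample for `expected` can usually be drawn from past human decisions.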
Pulling specific fields from text — invoice number, amount, tax ID, date, CV data.
Extraction turns unstructured text into system-ready fields. Together with classification it covers a large share of companies' first AI use cases. It works best with an enforced schema (structured output) that guarantees a valid format.
Related: Classifier, Structured output, RAG (retrieval-augmented generation)
Smoothly passing a conversation from the AI assistant to a human when the case needs it.
A handoff is a sign of a mature system, not a failure: on low confidence, customer frustration, or a case that needs a decision, the assistant escalates to a human instead of guessing. It is also part of AI Act transparency — the user can always reach a human.
Related: Human oversight (human-in-the-loop), Concierge (assistant), Hallucination, AI Act
Deploying one narrow process at a fixed cost to measure value before scaling.
A pilot lowers risk on both sides: instead of a big contract we show a working system on one measurable process. If it delivers the numbers (hours saved, % cases closed without a human), we expand the scope; if not, it cost little.
Related: Classifier, Observability, RAG (retrieval-augmented generation)
Automatically scoring and prioritizing enquiries by fit to the ideal customer profile (ICP).
Scoring is classification applied to sales: a form lead gets a score (budget, fit, readiness) and the most valuable ones reach a human first. Criteria are explicit and logged, not hidden profiling — which matters for GDPR and the AI Act.
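Explicit, logged criteria can be sketched as a weighted sum that returns its own breakdown. The weights and the 0-to-1 input scale are assumptions made for the example, and `score_lead` is a hypothetical name; the design point is that every score can be explained from its parts.

```python
# Hypothetical weights for the three criteria named above; they sum to 1.
WEIGHTS = {"budget": 0.4, "fit": 0.4, "readiness": 0.2}


def score_lead(lead):
    """Return (total score, per-criterion breakdown) for a lead.

    lead: dict with a value between 0 and 1 for each criterion.
    """
    breakdown = {name: WEIGHTS[name] * lead[name] for name in WEIGHTS}
    return round(sum(breakdown.values()), 3), breakdown


total, breakdown = score_lead({"budget": 1.0, "fit": 0.5, "readiness": 0.0})
# total == 0.6; the breakdown is what gets written to the log
```

Storing the breakdown next to the score is what makes the criteria auditable rather than hidden.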
Reading text from an image or scan — the first step before extracting fields from documents.
OCR turns a scanned invoice or contract into text the model can then classify and extract fields from (tax ID, amount, date). Combined with extraction and structured output it forms a full "scan → system-ready fields" pipeline.
Automating repetitive steps across existing apps; with AI it becomes "intelligent".
Classic RPA clicks and retypes by rigid rules. Combined with AI (classification, extraction, decisions) it handles variance — e.g. reads an invoice in any layout instead of requiring one template. We fill this role with agents that have a tool allow-list and a human-gate.
Related: AI agent, Tool use, Data extraction
Designing a model's instructions: role, rules, context, output format.
A good prompt is engineering, not a magic spell: a clear role, constraints, context (e.g. from RAG) and an enforced format. In production a prompt is versioned and tested like code, not guessed.
Artificially generated data for training or testing when real data is scarce or sensitive.
Synthetic data helps when real data is scarce, costly or GDPR-bound — e.g. for tests and edge cases. It must reflect the real distribution, or the model learns a fiction.
The rules, roles and controls over how a company builds and uses AI — who's accountable, what's allowed, how it's audited.
Governance ties scattered deployments into one regime: an AI system register, owners, data rules, an audit trail and reviews. It's the basis of AI Act compliance and risk control at scale.
Related: AI Act, Human oversight (human-in-the-loop), Observability, Guardrails
The full cost of an AI system: not just setup, but inference, maintenance, monitoring and updates.
The setup price is the tip of the iceberg. TCO includes inference cost (cloud vs local), maintenance, observability and updates. At scale these decide whether self-hosting beats an API.
Related: Inference, Self-hosting, Pilot
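Whether self-hosting beats an API at scale comes down to a break-even volume. The sketch below is a deliberately simplified model: it assumes a fixed monthly cost for self-hosting (hardware, maintenance, monitoring) plus an optional marginal cost per 1,000 tokens, against a flat API price. All names and figures are hypothetical.

```python
def break_even_tokens(fixed_monthly, api_price_per_1k, self_marginal_per_1k=0.0):
    """Monthly token volume above which self-hosting is cheaper than the API.

    Returns None when the API is cheaper per token at any volume.
    """
    saving_per_1k = api_price_per_1k - self_marginal_per_1k
    if saving_per_1k <= 0:
        return None
    return fixed_monthly / saving_per_1k * 1000


# Made-up figures: 500 per month fixed cost versus 0.5 per 1,000 API tokens.
volume = break_even_tokens(500.0, 0.5)  # 1,000,000 tokens per month
```

Below that volume the API is cheaper; above it the fixed cost is amortised and self-hosting wins, under these assumptions.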
The ability to show why an AI system gave a given answer or decision — the opposite of a black box.
We build explainability in practice: source citations (RAG), a log of every step and guardrails — so you can show where an answer came from. It's a requirement for trust and accountability (GDPR/AI Act).
Related: Hallucination, Guardrails, Human oversight (human-in-the-loop), AI Act