Corporate GPT: AI assistant on your knowledge base

RAG pipeline: an answer grounded in your sources, with a citation — not from the model's memory.

The customer service department at a software company spends several hours every day answering the same questions: how to reset a password, what the contract terms are, when an invoice is issued. The knowledge exists, the documents exist, but each consultant searches for answers separately—in Confluence, Notion, old email threads. This isn’t a knowledge gap problem. It’s a problem of accessing knowledge at the right moment. A corporate AI assistant based on documents solves exactly that.

How a corporate GPT differs from a regular chatbot

"Chatbot" and "RAG assistant" are two different architectures worth distinguishing before choosing a technology:

| Feature | Regular chatbot / fine-tuning | RAG assistant on a knowledge base |
| --- | --- | --- |
| Answer source | Knowledge encoded in model weights | Your documents, indexed in real time |
| Up-to-dateness | Requires retraining after changes | Just reindex the base |
| Hallucination risk | High (model interpolates what it doesn’t know) | Low with proper guardrails configuration |
| Source citation | None | Document fragment + link / page number |
| Knowledge update cost | High (fine-tuning for every change) | Low (reindex new files) |
| Scope control | Difficult | Built-in by design |

Practical rule: if your knowledge changes more often than once a quarter (and in most companies, it changes weekly), RAG is the right architecture. Save fine-tuning for models specializing in style and format, not current facts.

A ready platform or your own deployment?

Before you compare architectures, it’s worth comparing the buying alternatives. A serious buyer asks: “why build it, when Microsoft 365 Copilot, ChatGPT Enterprise, or Glean already index our SharePoint and Confluence?” The honest answer: for generic Q&A over data in a single ecosystem, a ready platform is fast and often enough. Your own deployment wins when at least one of these requirements appears:

Non-standard sources — a product database (CSV/JSON), call transcripts, email threads, data outside a single ecosystem.
Self-hosting and data residency — the whole stack (self-hosting) on your infrastructure, data never leaving the country.
Your own guardrails and scope — hard control over what the assistant answers, plus an auditable trail for the AI Act and GDPR.
No vendor lock-in and cost at scale — ready platforms bill per-seat/per-message; your own on your own router gives a predictable cost.

Most often the best route is hybrid: a ready platform where it suffices, your own assistant for the process where your data and integration matter. The full criteria and a decision table are in "your own assistant or a ready one" and the build vs buy comparison; the stack selector suggests what to choose for a specific case.

How RAG works step by step

To design the system well, it’s worth understanding each step in the processing pipeline:

Indexing (once, then incrementally): Each document is split into chunks. Each chunk goes through an embedding model—we use BGE-M3 running locally—and is converted into a numerical vector. Vectors go to a vector database. No text leaves your infrastructure at this stage.
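The indexing step can be sketched in a few lines. This is a minimal illustration, not our production pipeline: `split_into_chunks`, `toy_embed`, `index_document`, the window sizes, and the in-memory list standing in for a vector database are all assumptions made for the example. In particular, `toy_embed` is a hashed bag-of-words placeholder so the code runs without dependencies; a real deployment would call the embedding model (BGE-M3 in our case) at that point.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    position: int
    text: str
    vector: list[float]


def split_into_chunks(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Placeholder embedding (hashed bag of words, L2-normalised). NOT BGE-M3."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def index_document(doc_id: str, text: str, store: list[Chunk]) -> int:
    """Chunk a document, embed every chunk, append to the store. Returns chunks added."""
    pieces = split_into_chunks(text)
    for position, piece in enumerate(pieces):
        store.append(Chunk(doc_id, position, piece, toy_embed(piece)))
    return len(pieces)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks; the right window size depends on the documents and is worth measuring rather than guessing.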

Query (real-time): The user asks a question. The question is vectorized using the same model. Semantic search retrieves 3–8 document fragments most similar to the question. Optionally, we apply reranking, which reorders fragments by relevance before passing them to the model.
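As a rough sketch of the query step, assuming the index is a plain list of `(vector, text)` pairs and similarity is cosine: the function names, the default `k`, and the `reranker` callback are hypothetical. A real vector database does this search with an approximate index, and a real reranker is usually a separate model that scores the question against each fragment.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Vector,
    index: list[tuple[Vector, str]],
    k: int = 5,
    reranker: Optional[Callable[[str], float]] = None,
) -> list[tuple[float, str]]:
    """Return the k fragments most similar to the query as (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for vec, text in index]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    if reranker is not None:
        # Optional second pass: reorder only the shortlisted fragments.
        top = sorted(top, key=lambda pair: reranker(pair[1]), reverse=True)
    return top
```

Reranking only the shortlist keeps the expensive scoring step bounded by `k`, regardless of how large the index grows.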

Answer generation: The language model (via an LLM router) receives the question plus the retrieved fragments in its context. The answer is formulated solely from them. If the fragments don’t contain the answer, the model states directly: “I don’t have this information in the base” and suggests contacting a human.

This last point is the difference between an assistant you can trust and one that politely fabricates. Guardrails enforce admitting ignorance instead of interpolating answers.
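One way to picture the generation step is as prompt assembly plus a hard fallback. The sketch below is illustrative only: the prompt wording, `build_prompt`, `answer`, and the `call_model` callable (a stand-in for whatever the LLM router exposes) are assumptions, not a fixed API.

```python
from typing import Callable

REFUSAL = "I don't have this information in the base."


def build_prompt(question: str, fragments: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt. `fragments` holds (source_label, text) pairs."""
    numbered = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(fragments, start=1)
    )
    return (
        "Answer using ONLY the numbered fragments below and cite them like [1].\n"
        f'If they do not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Fragments:\n{numbered}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    fragments: list[tuple[str, str]],
    call_model: Callable[[str], str],
) -> str:
    """Refuse without calling the model when retrieval returned nothing."""
    if not fragments:
        return REFUSAL + " Would you like to talk to a human?"
    return call_model(build_prompt(question, fragments))
```

The empty-retrieval branch never reaches the model at all, so in that case the refusal does not depend on the model following instructions.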

What knowledge can be indexed

Almost any text-based format works well. Here’s what we handle in a typical deployment:

Word / PDF documents — procedures, regulations, product specs, sales proposals
FAQs and help centers — content exported from Zendesk, Intercom, Notion, Confluence
Product databases — descriptions, parameters, price ranges, delivery terms (JSON / CSV)
Emails and threads — customer service history as a case base (with PII anonymization)
Call transcripts — especially valuable for post-sales support
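Tabular sources such as a product database need to be turned into text before they can be embedded. A minimal sketch, assuming a CSV export with a hypothetical `sku` column as the identifier; the column names in the sample are made up for illustration.

```python
import csv
import io


def rows_to_documents(csv_text: str, id_field: str = "sku") -> list[dict]:
    """Turn each CSV row into one small text document, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        body = "; ".join(
            f"{key}: {value}"
            for key, value in row.items()
            if key != id_field and value
        )
        documents.append({"id": row[id_field], "text": body})
    return documents


sample = (
    "sku,name,price_range,delivery\n"
    "A1,Router X,100-150 EUR,3 days\n"
    "B2,Switch Y,,5 days\n"
)
docs = rows_to_documents(sample)
```

Keeping the column name next to each value ("delivery: 3 days") gives the embedding model the context a bare cell value would lack.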

What to avoid at the start: documents with many image-based tables (scanned PDFs without text layer), bases with conflicting versions of the same information without “valid from / withdrawn” labels, and entire repositories—we index documentation, not code.
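The “valid from / withdrawn” labels matter because they let the indexer drop outdated versions mechanically. A sketch under the assumption that each document version carries two hypothetical metadata fields, `valid_from` and `withdrawn`:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocVersion:
    doc_id: str
    valid_from: date
    withdrawn: Optional[date] = None  # None means the version is still in force


def currently_valid(versions: list[DocVersion], today: date) -> list[DocVersion]:
    """Keep only versions in force on `today`; the withdrawal date is exclusive."""
    return [
        v for v in versions
        if v.valid_from <= today and (v.withdrawn is None or today < v.withdrawn)
    ]
```

Without such labels the filter has nothing to work with, and two conflicting versions end up competing in retrieval.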

Good rule: before indexing a thousand files, index a hundred of the most important ones and measure answer accuracy. The quality of the knowledge base sets the ceiling for assistant quality, not the other way around.
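Measuring can start very simply: a list of real questions, each paired with the document that should answer it. The sketch below computes a retrieval hit rate, which is only a proxy for answer accuracy; `hit_rate`, the gold-set shape, and the `search` callable are assumptions for the example.

```python
from typing import Callable


def hit_rate(
    gold: list[tuple[str, str]],
    search: Callable[[str, int], list[str]],
    k: int = 5,
) -> float:
    """Share of questions whose expected document id appears in the top-k results.

    `gold` holds (question, expected_doc_id) pairs; `search(question, k)` is a
    hypothetical function returning the ids of the k retrieved documents.
    """
    if not gold:
        return 0.0
    hits = sum(1 for question, expected in gold if expected in search(question, k))
    return hits / len(gold)
```

If the right document is not even retrieved, no prompt or model change will fix the answer, so this number is a useful first check before judging generated answers.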

Security layer: not optional

A corporate assistant operates on data that has value and whose leakage is costly. That’s why we design security in from the first line rather than adding it afterward.

PII masked before the cloud. If documents contain personal data, we mask it before sending to a cloud model. Alternatively—the entire stack (embedding + model) runs locally on your infrastructure (self-hosting).
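To make the masking step concrete, here is a deliberately small sketch that covers only email addresses and phone-like digit runs. The patterns and placeholder tokens are assumptions for illustration; real PII detection also has to handle names, addresses, and national identifiers, which regular expressions alone do not cover.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Phone-like: optional "+", then at least 9 digits with spaces or dashes allowed.
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


masked = mask_pii("Contact jan.kowalski@example.com or +48 600 100 200.")
```

A pattern this broad will also mask long invoice or order numbers. Over-masking is usually the safer failure mode before a cloud call, but it should be checked against real documents.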

Guardrails enforcing scope. The system answers only questions covered by the base. Questions on topics outside the scope (e.g., a request to write code or a political opinion) are rejected with a message and an option to switch to a human.
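One simple way to approximate a scope check is to look at how well the best retrieved fragment matches the question. The threshold value, `ScopeDecision`, and `check_scope` below are hypothetical; production guardrails typically combine several signals, for example a topic classifier alongside retrieval scores.

```python
from dataclasses import dataclass


@dataclass
class ScopeDecision:
    in_scope: bool
    message: str
    offer_human: bool


def check_scope(retrieval_scores: list[float], min_score: float = 0.5) -> ScopeDecision:
    """Treat a question as in scope only if some fragment matches well enough."""
    best = max(retrieval_scores, default=0.0)
    if best >= min_score:
        return ScopeDecision(True, "", False)
    return ScopeDecision(
        False,
        "This question is outside the knowledge base I can answer from.",
        True,  # always offer a switch to a human on rejection
    )
```

The threshold is something to calibrate on logged questions: set it too high and legitimate questions get rejected, too low and off-topic ones slip through.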

Injection and prompt attacks. Guardrails filter user input before it reaches the model—blocking attempts to extract secrets from context, inject instructions, or attack the prompt.
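The simplest layer of input filtering is pattern screening. The sketch below is intentionally naive and its pattern list is an assumption made up for the example: fixed phrases are easy to rephrase around, so this can only be one layer among several, never the whole defence.

```python
import re

# Hypothetical examples of suspicious phrasing; a real list needs maintenance.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"(reveal|show|print).{0,40}(system prompt|api key|secret)", re.I),
    re.compile(r"you are now", re.I),
]


def screen_input(user_text: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_patterns) for a user message."""
    matched = [p.pattern for p in SUSPICIOUS if p.search(user_text)]
    return (not matched, matched)
```

Returning the matched patterns, not just a yes or no, makes blocked messages reviewable later, which helps tune the list against false positives.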

Human handoff for out-of-scope cases. An assistant that doesn’t know doesn’t guess—it hands off the conversation to a human with full thread context. Without this, every model error becomes a customer problem. More on this pattern in the article about AI agent security.
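“Full thread context” can be pictured as a small payload passed to the helpdesk system. The field names and `build_handoff` below are assumptions for illustration, not the schema of any particular ticketing tool.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class Handoff:
    conversation_id: str
    reason: str  # e.g. "out_of_scope" or "no_answer_in_base"
    turns: list[Turn] = field(default_factory=list)
    retrieved_sources: list[str] = field(default_factory=list)


def build_handoff(
    conversation_id: str,
    turns: list[Turn],
    reason: str,
    sources: list[str],
) -> dict:
    """Bundle the whole thread so the consultant does not have to ask again."""
    return asdict(Handoff(conversation_id, reason, list(turns), list(sources)))
```

Including the sources the assistant looked at tells the consultant what was already tried, not just what was asked.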

Logs and accountability. Every query and answer is logged without PII—not to spy on users, but to have an audit trail, quality measurement, and GDPR compliance. This trail is the foundation of the accountability GDPR requires, while formal log registers become a hard AI Act requirement once the system falls into higher risk (profiling, scoring, decisions about people)—we lay out the full classification in the article on the AI Act and GDPR.
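A log entry of this kind might look like the sketch below. It assumes the question and answer have already been masked upstream, and it replaces the user id with a salted hash; `audit_record`, its fields, and the truncated digest are hypothetical choices. A salted hash is pseudonymisation, not anonymisation, so the salt itself has to be protected.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(
    user_id: str,
    question_masked: str,
    answer_masked: str,
    sources: list[str],
    salt: str,
) -> str:
    """Build one JSON log line; the raw user id is never written out."""
    pseudonym = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": pseudonym,
            "question": question_masked,
            "answer": answer_masked,
            "sources": sources,
        },
        ensure_ascii=False,
    )
```

Because the same user always maps to the same pseudonym, quality metrics per user remain possible without storing who the user is.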

Where a corporate assistant delivers the biggest leverage

Three types of deployments we see most often and that pay off fastest:

Customer service and helpdesk. In our deployments the assistant typically handles 40–70% of repetitive questions without human intervention—the real share depends on the quality of the base and the profile of tickets. Consultants see handed-off conversations with full context—no starting from “what’s this about?” Measurable result: time to first response, percentage of cases closed without escalation.

Internal knowledge base for employees. Onboarding a new employee gets shorter by dozens of hours because questions to senior colleagues are replaced by an assistant grounded in department documents. Measurable result: number of queries to the internal team, onboarding time.

Pre-offer sales assistant. A salesperson, or a customer on the website, can ask about availability, parameters, and terms without waiting for an email response. Measurable result: time from inquiry to offer, conversion rate.

In each case, the starting point is the same: a narrow, well-described knowledge domain, a measured baseline (how much time does it take today?), and a pilot on real traffic. Run your company’s readiness assessment before planning scope.

Time and cost: what to expect

A corporate assistant is an engineering project, not a one-time platform configuration. A realistic deployment picture:

Pilot (one knowledge domain): usually a few weeks from document preparation to a working system with measured results. Detailed ranges depend on scope—calculate it in the ROI calculator.

What takes time? Not the model, not the infrastructure. Preparing and organizing the knowledge base (conflicting versions, duplicates, missing metadata) is usually 30–50% of the total pilot effort. That’s why we start with a document audit, not model configuration.

Maintenance cost. Indexing new documents is a low-cost operation. The variable cost is the number of queries to the cloud model, which you can estimate upfront in the inference calculator. For high traffic or sensitive data, a local model is often the better choice.
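The arithmetic behind such an estimate is straightforward. In the sketch below every number is a placeholder, not a real price or a measured token count; `monthly_inference_cost` and its parameters are assumptions, and actual per-token rates have to come from the provider’s current price list.

```python
def monthly_inference_cost(
    queries_per_month: int,
    input_tokens_per_query: int,
    output_tokens_per_query: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Estimate monthly cloud-model spend from query volume and per-token prices."""
    input_cost = queries_per_month * input_tokens_per_query / 1_000_000 * price_in_per_million
    output_cost = queries_per_month * output_tokens_per_query / 1_000_000 * price_out_per_million
    return input_cost + output_cost


# Placeholder figures: 10,000 queries, 3,000 input tokens (question plus
# retrieved fragments) and 300 output tokens each, hypothetical unit prices.
estimate = monthly_inference_cost(10_000, 3_000, 300, 1.0, 4.0)
```

In a RAG setup the input side usually dominates, because every query carries the retrieved fragments with it; retrieving fewer or shorter fragments is the first lever on cost.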

When it pays off (and when it doesn’t). The return is driven by one simple relationship: the volume of repetitive tickets × the cost of a consultant’s hour, minus the cost of inference. When the assistant handles 40–70% of repetitive questions (as above) from a base counted in the hundreds of tickets per month, the project usually pays back within a few months—because every hour a consultant doesn’t spend is a real saving, while the variable cost per query is small. It does NOT pay off where volume is low (a few dozen tickets/month), the questions are different every time, or the knowledge base is scattered and contradictory—in that case, first run the numbers on your own figures in the ROI calculator.
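The relationship above can be written out as a short calculation. This is a sketch with made-up inputs: the handling time per ticket, the hourly cost, the one-off pilot cost, and both function names are assumptions to be replaced with your own figures.

```python
import math


def monthly_net_saving(
    tickets_per_month: int,
    deflection_rate: float,
    minutes_per_ticket: float,
    hourly_cost: float,
    inference_cost: float,
) -> float:
    """Consultant hours no longer spent, priced per hour, minus inference cost."""
    hours_saved = tickets_per_month * deflection_rate * minutes_per_ticket / 60
    return hours_saved * hourly_cost - inference_cost


def payback_months(one_off_cost: float, net_saving: float) -> float:
    """Months until the one-off cost is recovered; infinite if there is no saving."""
    if net_saving <= 0:
        return math.inf
    return one_off_cost / net_saving


# Hypothetical inputs: 600 repetitive tickets a month, half of them handled by
# the assistant, 10 minutes per ticket, consultant hour priced at 30.
high_volume = monthly_net_saving(600, 0.5, 10, 30.0, 50.0)
low_volume = monthly_net_saving(30, 0.5, 10, 30.0, 100.0)
```

With these placeholder numbers the high-volume case recovers a one-off cost of 5,800 in four months, while the low-volume case never pays back, which is the pattern described above.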

What we DON’T promise: we don’t give fixed prices or timelines before a scope audit. Deployment scale (one domain vs. entire enterprise) changes the numbers by an order of magnitude. The entry point is always a fixed-cost pilot, so contact us with your process description.

FAQ

How does a corporate GPT differ from ChatGPT?

ChatGPT responds from general knowledge encoded in the model—it knows nothing about your documents or procedures. A corporate RAG assistant responds only from your knowledge base: every answer has a source in a specific document fragment. Outside the base’s scope, it states directly “I don’t have this information,” instead of interpolating.

Will our data go to the cloud?

Depends on the chosen architecture. With a local model (self-hosting), the entire stack runs on your infrastructure, and no text leaves the corporate network. With a cloud model, we mask PII before sending the query. The choice depends on data sensitivity and GDPR requirements—we discuss this during the pilot phase.

How large does the knowledge base need to be?

No minimum threshold. We start pilots with a few dozen well-prepared documents. Quality and consistency matter more than quantity: one well-described knowledge domain without conflicting versions delivers better results than a thousand unorganized files. The quality of the base sets the ceiling for assistant quality.

Can the assistant make mistakes?

Yes. Every RAG system has a margin of error, especially with edge-case questions and documents with ambiguous content. That’s why guardrails enforce an “I don’t know” response instead of guessing, we log every answer for quality audit, and deployments on critical paths always include human handoff. The assistant is meant to offload humans, not replace them where mistakes are costly.

How long does deployment take?

A pilot for one knowledge domain usually takes a few weeks from document delivery to a working system with first measurements. The biggest variable is preparing the knowledge base on your side. Full scale and timeline require a scope audit, so contact us to start with concrete numbers.
