KnowledgeCore — Enterprise RAG Knowledge Assistant
Built KnowledgeCore, a production-grade, on-premises RAG (Retrieval-Augmented Generation) knowledge assistant for a 1,200-person manufacturing enterprise. The system ingests 12,000+ internal documents (SOPs, SharePoint wikis, support tickets, policy PDFs) into a secured pgvector store and lets any employee query in plain English and receive cited, role-filtered answers in under 1.5 seconds. Integrated with Slack, Microsoft Teams, and the internal IT portal via SSO. Reduced internal support escalations by 62% and cut new-hire time-to-productivity by 3 weeks.
Client
KnowledgeCore Enterprise
Project Value
$22,000+
The Challenge
A 1,200-person manufacturing company had 12 years of operational knowledge buried across SharePoint folders, Confluence wikis, Zendesk tickets, and email threads. Senior engineers spent an estimated 6–9 hours per week answering repetitive questions from newer team members. Onboarding a new engineer to full productivity took 8+ weeks. Leadership needed a way to make institutional knowledge searchable and self-serve — without moving sensitive SOPs to a third-party cloud AI provider, and without retraining every time a document changed. The core requirement: an enterprise RAG system that ran entirely on-premises, enforced role-based document access, and integrated into the tools employees already used daily.
The Goal
Deploy a secure, on-premises AI knowledge assistant that ingests and indexes all internal documentation without sending data to external APIs, answers employee questions in natural language with source citations and confidence scores, enforces role-based document access so junior staff cannot retrieve executive financials, and integrates into Slack, Teams, and the internal helpdesk portal — without requiring a new login or workflow. Target: p95 query latency under 2 seconds, zero data exfiltration, ISO 27001-compliant audit logging.
Solution & Implementation
1. Analysis
Audited all 14 internal knowledge sources across SharePoint, Confluence, Google Drive, Zendesk, and email archives. Categorized 12,000+ documents by sensitivity tier (public / internal / restricted / executive) and mapped which employee roles should access each tier. Identified the 40 most-asked repeat questions from Zendesk ticket history to use as evaluation benchmarks. Found that 68% of L1 support tickets were answerable by existing documentation — they just were not findable. This discovery validated the business case: the problem was retrieval, not knowledge gaps.
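Those 40 questions doubled as a regression harness for every later pipeline change. In sketch form, the idea looks like the following; the file name, fields, and the citation check are assumptions for illustration, not the production harness:

    # Hypothetical benchmark runner: pipeline(question) -> (answer, cited_source_docs).
    # Scores the fraction of answers that cite at least one expected source document.
    import json

    def run_benchmark(pipeline, path="benchmark_questions.json"):
        # Expected format: [{"question": ..., "expected_sources": [...]}, ...]
        cases = json.load(open(path))
        hits = 0
        for case in cases:
            _, cited = pipeline(case["question"])
            if set(cited) & set(case["expected_sources"]):
                hits += 1
        return hits / len(cases)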
2. Designing the Solution
Designed a modular RAG pipeline: document ingestion (PDF, DOCX, HTML, Markdown) → chunking with context-preserving overlap → embedding via self-hosted all-MiniLM-L6-v2 sentence transformer → storage in pgvector on PostgreSQL 16. Retrieval uses hybrid search — dense vector similarity + BM25 keyword ranking, re-ranked by a cross-encoder. Each retrieved chunk carries its source document, page, access tier, and last-modified date, passed to Llama 3.1 70B (self-hosted via Ollama) as structured context with a strict citation instruction. Zero user data leaves the enterprise network.
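The retrieval stage, in sketch form. This is a minimal illustration assuming a chunks(content, embedding, ...) table in pgvector and a stock open-source cross-encoder; the candidate-pool sizes and fusion strategy shown are one reasonable choice, not the production code:

    from pgvector.psycopg2 import register_vector
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer, CrossEncoder

    embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed re-ranker

    def hybrid_retrieve(conn, query: str, k: int = 8):
        """Dense candidates from pgvector, BM25 keyword scoring, cross-encoder re-rank."""
        register_vector(conn)
        with conn.cursor() as cur:
            # Stage 1: dense retrieval (<=> is pgvector's cosine-distance operator).
            cur.execute(
                "SELECT content FROM chunks ORDER BY embedding <=> %s LIMIT 50",
                (embedder.encode(query),),
            )
            texts = [row[0] for row in cur.fetchall()]
        # Stage 2: BM25 keyword scores over the dense candidate pool.
        bm25 = BM25Okapi([t.split() for t in texts])
        kw = bm25.get_scores(query.split())
        pool = [t for _, t in sorted(zip(kw, texts), key=lambda p: -p[0])[: 3 * k]]
        # Stage 3: cross-encoder re-ranking picks the final k chunks.
        scores = reranker.predict([(query, t) for t in pool])
        ranked = sorted(zip(scores, pool), key=lambda p: -p[0])
        return [t for _, t in ranked[:k]]

The rationale for the hybrid: BM25 catches exact part numbers and SOP codes that dense embeddings blur together, while the cross-encoder provides the final precision pass.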
3. Customizing Business Logic
Built a middleware layer that reads the user's SSO identity (Azure AD / Okta) and filters vector retrieval to only return chunks within the user's access tier before the LLM sees them. Built three integration surfaces: a Slack bot (slash command + DM), a Microsoft Teams bot (via Bot Framework), and an embedded web widget inside the IT helpdesk portal. All queries and responses log to an immutable audit table with timestamp, user ID, query hash, retrieved sources, and answer — satisfying ISO 27001 and internal compliance requirements. Added a feedback loop: employees can rate answers (thumbs up/down), and low-rated answers surface in a weekly review queue for the knowledge ops team to improve source documents.
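A simplified sketch of the tier-filtering and audit path. The SSO group names, table schema, and helper names here are illustrative assumptions; the key property is that filtering happens in the retrieval query itself, before any context reaches the LLM:

    import hashlib

    TIER_RANK = {"public": 0, "internal": 1, "restricted": 2, "executive": 3}

    def max_tier_for(groups):
        """Resolve the caller's highest permitted tier from SSO group claims (assumed mapping)."""
        grants = {"Everyone": "public", "Employees": "internal",
                  "Engineering-Leads": "restricted", "Exec-Team": "executive"}
        return max((TIER_RANK[grants[g]] for g in groups if g in grants), default=0)

    def filtered_search(conn, query_vec, groups, k=8):
        """Retrieval is filtered before the LLM sees any context."""
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, content, source_doc, page FROM chunks "
                "WHERE tier_rank <= %s ORDER BY embedding <=> %s LIMIT %s",
                (max_tier_for(groups), query_vec, k),
            )
            return cur.fetchall()

    def audit(conn, user_id, query, chunk_ids, answer):
        """Append-only audit row; the raw query is stored only as a hash."""
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO audit_log (ts, user_id, query_hash, sources, answer) "
                "VALUES (now(), %s, %s, %s, %s)",
                (user_id, hashlib.sha256(query.encode()).hexdigest(), chunk_ids, answer),
            )
        conn.commit()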
4. Scale & Optimize
Built a scheduled ingestion service that polls SharePoint and Confluence change feeds every 15 minutes, re-chunks and re-embeds only modified documents, and updates the vector store incrementally — keeping answers current without full re-embedding. Deployed on a 2× A10G GPU node achieving p95 query latency of 1.4 seconds. Added a confidence score display so employees can see how certain the retrieval is before acting on an answer. The combination of hybrid retrieval, cross-encoder re-ranking, and citation enforcement reduced hallucination rate to under 3% on the benchmark question set.
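The incremental ingestion loop, in outline. Here fetch_changes stands in for the SharePoint and Confluence change-feed clients, the document fields are assumed, and the chunk sizes are illustrative; the point is that only modified documents are ever re-embedded:

    import time

    CHUNK, OVERLAP = 800, 150  # context-preserving overlap; sizes illustrative

    def chunk_text(text):
        step = CHUNK - OVERLAP
        return [text[i:i + CHUNK] for i in range(0, max(len(text) - OVERLAP, 1), step)]

    def reindex(conn, doc_id, text, tier_rank):
        """Replace one modified document's chunks in place (assumes register_vector(conn)
        from the retrieval sketch; embedder is the same all-MiniLM-L6-v2 model)."""
        chunks = chunk_text(text)
        vecs = embedder.encode(chunks)
        with conn.cursor() as cur:
            cur.execute("DELETE FROM chunks WHERE source_doc = %s", (doc_id,))
            for content, vec in zip(chunks, vecs):
                cur.execute(
                    "INSERT INTO chunks (source_doc, content, embedding, tier_rank) "
                    "VALUES (%s, %s, %s, %s)",
                    (doc_id, content, vec, tier_rank),
                )
        conn.commit()

    def poll_forever(conn, fetch_changes, interval=900):  # 15-minute cycle
        while True:
            for doc in fetch_changes():  # only documents modified since the last poll
                reindex(conn, doc.id, doc.text, doc.tier_rank)
            time.sleep(interval)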
Results & Impact
Internal Support Queries Deflected: 62%
New-Hire Time-to-Productivity: cut by 3 weeks
Documents Indexed: 12,000+
Query Latency (p95): 1.4 seconds
Monthly Active Users: 840+
Hybrid RAG retrieval — dense vector similarity + BM25 keyword ranking + cross-encoder re-ranking — delivers cited, source-attributed answers from 12,000+ enterprise documents with under 3% hallucination rate
Zero data exfiltration: Llama 3.1 70B runs fully on-premises via Ollama; no employee query or document chunk ever leaves the corporate network (see the generation sketch after this list)
Role-filtered retrieval enforces document access tiers via Azure AD / Okta SSO middleware before the LLM sees any context — junior staff cannot retrieve executive financials
Three integration surfaces — Slack bot, Microsoft Teams bot, embedded helpdesk widget — employees query in the tools they already use with no new login required
Incremental ingestion re-indexes only changed documents every 15 minutes via SharePoint and Confluence change feeds, keeping answers current without full re-embedding
Immutable audit log captures every query, source, and response with user ID and timestamp for ISO 27001 compliance and internal security reviews
Employee feedback loop surfaces low-confidence answers to the knowledge ops team weekly, creating a continuous document improvement cycle
p95 query latency of 1.4 seconds on self-hosted GPU infrastructure serving 840+ monthly active users — faster than a typical Slack search
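The generation step itself can be sketched as follows: tier-filtered chunks, each carrying its source metadata, are passed to the self-hosted model through Ollama's chat API with a strict citation instruction. The prompt wording and chunk field names below are illustrative, not the production prompt:

    import ollama

    SYSTEM = (
        "Answer only from the provided context. Cite every claim as "
        "[source_doc, p. page]. If the context does not contain the answer, say so."
    )

    def answer(query, chunks):
        # Each chunk is tagged with its provenance so citations can be enforced.
        context = "\n\n".join(
            f"[{c['source_doc']}, p. {c['page']}, modified {c['last_modified']}]\n{c['content']}"
            for c in chunks
        )
        resp = ollama.chat(
            model="llama3.1:70b",  # self-hosted; nothing leaves the network
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
            ],
        )
        return resp["message"]["content"]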
Key Technologies
PostgreSQL 16 + pgvector, all-MiniLM-L6-v2 embeddings, Llama 3.1 70B (self-hosted via Ollama), BM25 + cross-encoder hybrid retrieval, Azure AD / Okta SSO, Slack and Microsoft Teams bots (Bot Framework), SharePoint / Confluence / Zendesk ingestion
"Before KnowledgeCore, a junior engineer would ping three different seniors to find the right SOP. Now they ask the assistant, get the exact paragraph with the source document linked, and move on. We reclaimed an estimated 900 engineer-hours in the first quarter alone."
David Harrington
KnowledgeCore Enterprise
Want similar results for your business?
We build the same for teams like yours — fixed scope, fixed price, senior engineers only.
About Ortem Technologies
Ortem Technologies is a premier custom software, mobile app, and AI development company. We serve enterprise and startup clients across the USA, UK, Australia, Canada, and the Middle East. Our cross-industry expertise spans fintech, healthcare, and logistics, enabling us to deliver scalable, secure, and innovative digital solutions worldwide.
Ready to Build Something Like This?
We've built AI and enterprise solutions like this for teams across the US, UK, and Middle East. Fixed scope. Fixed price. Senior engineers from day one.
Free 30-minute scoping call — no commitment, no sales pitch.


