Ortem Technologies
    AI & Enterprise Solutions

    KnowledgeCore — Enterprise RAG Knowledge Assistant

    Built KnowledgeCore, a production-grade, on-premises RAG (Retrieval-Augmented Generation) knowledge assistant for a 1,200-person manufacturing enterprise. The system ingests 12,000+ internal documents (SOPs, SharePoint wikis, support tickets, policy PDFs) into a secured pgvector store and lets any employee ask questions in plain English and receive cited, role-filtered answers in under 1.5 seconds. It is integrated with Slack, Microsoft Teams, and the internal IT portal via SSO. It reduced internal support escalations by 62% and cut new-hire time-to-productivity by 3 weeks.

    Client: KnowledgeCore Enterprise
    Project Value: $22,000+
    Rating: 5.0/5.0

    The Challenge

    A 1,200-person manufacturing company had 14 years of institutional knowledge locked across four primary systems: SharePoint (6,200 files across 340 folders), Confluence (1,800 pages across 22 spaces), Zendesk (92,000 resolved support tickets), and email archives (engineering team distribution lists going back to 2010). The knowledge existed, but it was unfindable by anyone who did not already know where to look. New engineers were spending 3–4 weeks before they could answer basic product questions independently.

    The onboarding problem was acute: the company hired 45 new engineers per year, and each one consumed an estimated 2.3 weeks of senior engineer time through repeated questions that were already answered in documentation. The senior engineers fielding these questions were the same people responsible for the company's most complex product work, so the interruptions were costly and hurt morale. A senior mechanical engineer described her experience: "I answer the same question about our torque specification process three times a week. The answer is in a document I wrote in 2019. I just cannot get people to find it."

    The customer support team was similarly affected. Frontline support staff fielded 200+ internal knowledge questions per week (about product configurations, compatibility specs, warranty policies, and escalation procedures) by searching SharePoint manually, messaging colleagues in Teams, or filing Zendesk tickets that were routed to domain experts. Each manual lookup averaged 18 minutes. The support team was effectively running a manual knowledge retrieval service that could be automated.

    The company had attempted to solve this problem 18 months earlier with a previous vendor, who built a keyword search system on top of the SharePoint index. The system failed because it could not handle semantic queries: searching for "maximum operating temperature range" did not return documents that contained "thermal limits" or "heat tolerance specifications", even though those documents answered the same question using different vocabulary. The keyword system had a 23% user retention rate after 60 days. Engineers stopped using it because it did not work.

    The compliance requirement was non-negotiable: all answers from the new system must be traceable to source documents with citations (document title, section, and last-modified date), so that if an engineer acted on an answer and something went wrong, the specific source of the information could be audited. The company's quality management system required documentation traceability for any manufacturing decision. An AI system that gave authoritative-sounding answers without citations was worse than no system at all in this compliance context.

    An additional hard requirement: no customer data, no engineering SOPs, and no proprietary product specifications could leave the company's network. Cloud AI providers were not an option for this use case. The entire system had to run on-premises on company-owned infrastructure.

    The Goal

    Deploy a secure, on-premises AI knowledge assistant that ingests and indexes all internal documentation without sending data to external APIs, answers employee questions in natural language with source citations and confidence scores, enforces role-based document access so junior staff cannot retrieve executive financials, and integrates into Slack, Teams, and the internal helpdesk portal — without requiring a new login or workflow. Target: p95 query latency under 2 seconds, zero data exfiltration, ISO 27001-compliant audit logging.

    Solution & Implementation

    1. Analysis

    Audited all 14 internal knowledge sources across SharePoint, Confluence, Zendesk, Google Drive, and email archives. Categorized 12,000+ documents by sensitivity tier (public / internal / restricted / executive) and mapped which employee roles should access each tier per the company's existing access control policy. Identified the 40 most-asked repeat questions from Zendesk ticket history to use as evaluation benchmarks — these questions formed the RAGAS evaluation set for measuring retrieval faithfulness and answer relevance before deployment. Found that 68% of L1 support tickets were answerable by existing documentation — they just were not findable. This discovery validated the business case: the problem was retrieval, not knowledge gaps.

    2. Designing Solution

    The RAG architecture was designed around four integrated layers.

    The document ingestion pipeline connects to SharePoint via Microsoft Graph API, Confluence via REST API, and Zendesk via the Zendesk API, pulling documents in PDF, DOCX, HTML, and Markdown formats into a staging store.

    The chunking strategy was hierarchical: each document is split into 128-token retrieval chunks (small enough for precise retrieval) with associated 512-token generation context windows (large enough to provide full answer context to the LLM). Each chunk stores its parent document metadata: source system, document title, section heading, last-modified date, and access tier.

    Embedding was implemented using OpenAI text-embedding-3-large for the dense vector representations. It was chosen after benchmarking against all-MiniLM-L6-v2 and Cohere embed-multilingual; text-embedding-3-large scored 8 points higher on the client's internal retrieval benchmark. Cohere Rerank 3.5 is applied as a second-stage re-ranker to improve precision: the initial vector search retrieves the top 20 candidates; Cohere Rerank re-scores them and returns the top 5 for the generation step. This two-stage retrieval approach produced a 15-point improvement in faithfulness score compared to single-stage vector search alone.

    The vector database is Pinecone with metadata filtering. Pinecone's metadata filter is applied before vector similarity scoring, ensuring that a query from a junior engineer never retrieves chunks tagged with the "executive" or "restricted" access tier, regardless of semantic relevance. This access control is enforced at the retrieval layer, not the generation layer: the LLM never sees restricted content that the user is not authorized to access.

    The generation model selected was Claude Opus (latest available at implementation time) rather than an open-source alternative. The selection criterion was faithfulness to source material: in regulated manufacturing environments, an AI answer that fabricates information not in the retrieved documents is a compliance risk. Claude demonstrated the strongest instruction-following on the citation requirement across all models tested, consistently producing answers with numbered footnotes citing document title, section, and last-modified date as specified.
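The hierarchical chunking described above (128-token retrieval chunks that point back to 512-token generation windows) can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: whitespace-separated words stand in for model tokens, and the names `Chunk` and `hierarchical_chunks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_title: str
    access_tier: str
    parent_id: int  # index of the larger generation window this chunk belongs to
    text: str       # the small retrieval chunk


def hierarchical_chunks(doc_title, access_tier, text, child_size=128, parent_size=512):
    """Split text into parent windows, then split each window into child chunks."""
    tokens = text.split()  # assumption: whitespace words approximate model tokens
    parents = [tokens[i:i + parent_size] for i in range(0, len(tokens), parent_size)]
    chunks = []
    for parent_id, window in enumerate(parents):
        for j in range(0, len(window), child_size):
            child_text = " ".join(window[j:j + child_size])
            chunks.append(Chunk(doc_title, access_tier, parent_id, child_text))
    parent_texts = [" ".join(window) for window in parents]
    return chunks, parent_texts


# Example: a synthetic 1,000-word document.
sample_text = " ".join(f"tok{i}" for i in range(1000))
chunks, parent_texts = hierarchical_chunks("Torque Specification Process", "internal", sample_text)

# At query time, a matched child chunk is swapped for its parent window
# before the text is handed to the generation model.
best_match = chunks[5]
generation_context = parent_texts[best_match.parent_id]
```

Retrieval runs against the small chunks for precision, while the generation step receives the wider parent window, which is the trade-off the design section describes.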

    3. Customizing Business Logic

    Built a middleware layer that reads the user's SSO identity (Azure AD / Okta) and filters Pinecone metadata at query time to only return chunks within the user's access tier before the LLM sees any context. Built three integration surfaces: a Slack bot (slash /ask command + direct message), a Microsoft Teams bot (via Bot Framework with Adaptive Card formatted answers), and an embedded web widget inside the IT helpdesk portal. All queries and responses log to an immutable audit table with timestamp, user ID, query hash, retrieved sources, and answer — satisfying ISO 27001 and internal compliance requirements. Added a thumbs up/down feedback mechanism after each answer: low-rated answers surface in a weekly admin review queue where the knowledge ops team can identify source documents that need updating or clarification.
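The access-tier filtering and two-stage retrieval can be illustrated with a small in-memory sketch. Everything here is a stand-in under stated assumptions: the role-to-tier mapping in `ALLOWED_TIERS` is hypothetical, a plain Python list replaces the vector store, and the `rerank` callable represents whatever second-stage re-ranker is deployed.

```python
import math

# Hypothetical role-to-tier mapping; the real policy would come from the
# client's existing access control rules and the user's SSO identity.
ALLOWED_TIERS = {
    "junior_engineer": {"public", "internal"},
    "executive": {"public", "internal", "restricted", "executive"},
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, role, rerank, first_stage_k=20, final_k=5):
    # 1. Access filter runs first, so unauthorized chunks are never scored.
    allowed = ALLOWED_TIERS.get(role, {"public"})  # assumption: unknown roles see public only
    visible = [c for c in chunks if c["tier"] in allowed]
    # 2. First stage: vector similarity over the visible chunks.
    by_similarity = sorted(visible, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    candidates = by_similarity[:first_stage_k]
    # 3. Second stage: re-score the candidates and keep the best few for generation.
    return sorted(candidates, key=rerank, reverse=True)[:final_k]


chunks = [
    {"id": "sop-12", "tier": "internal", "vector": [0.9, 0.1], "text": "thermal limits for the drive unit"},
    {"id": "fin-03", "tier": "executive", "vector": [1.0, 0.0], "text": "thermal limits budget forecast"},
    {"id": "faq-07", "tier": "public", "vector": [0.2, 0.8], "text": "warranty policy overview"},
]
query_terms = {"thermal", "limits"}


def toy_rerank(chunk):
    # Toy scorer: counts query terms present in the chunk text.
    return len(query_terms & set(chunk["text"].split()))


junior_ids = [c["id"] for c in retrieve([1.0, 0.0], chunks, "junior_engineer", toy_rerank)]
exec_ids = [c["id"] for c in retrieve([1.0, 0.0], chunks, "executive", toy_rerank)]
```

In this toy data the executive-tier chunk is the closest vector match, yet it never appears in `junior_ids` because the filter removes it before any scoring takes place.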

    4. Scale & Optimize

    Built a scheduled ingestion service that polls SharePoint and Confluence change feeds every 15 minutes via webhook subscriptions, re-chunks and re-embeds only modified documents, and updates the Pinecone index incrementally — keeping answers current without full re-embedding (which would take 4+ hours). The admin dashboard provides document ingestion monitoring (ingest queue depth, last-successful-sync per source, failed document alerts), query analytics (most common questions, queries with no matching documents — the "unknown unknowns" that indicate documentation gaps), and accuracy metrics (thumbs up/down rate per query category). Deployed on company-owned on-premises GPU infrastructure, achieving p95 query latency of 1.4 seconds with 97% user satisfaction score on post-query thumbs up/down rating.
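The incremental update idea (re-embed only what changed) can be sketched as below. Note the swap: the project relies on SharePoint and Confluence change feeds, whereas this illustration detects changes by comparing SHA-256 content hashes. The function name `sync_changes` and the document IDs are hypothetical.

```python
import hashlib


def sync_changes(indexed_hashes, current_docs):
    """Compare current documents with stored hashes.

    Returns (to_reembed, to_delete) and updates indexed_hashes in place.
    """
    to_reembed = []
    for doc_id, text in current_docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if indexed_hashes.get(doc_id) != digest:
            to_reembed.append(doc_id)  # new or modified: re-chunk and re-embed
            indexed_hashes[doc_id] = digest
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    for doc_id in to_delete:
        del indexed_hashes[doc_id]  # source document removed: drop its vectors
    return to_reembed, to_delete


state = {}
first_run = sync_changes(state, {"sop-12": "rev A", "faq-07": "rev A"})
second_run = sync_changes(state, {"sop-12": "rev A", "faq-07": "rev B"})
third_run = sync_changes(state, {"sop-12": "rev A"})
```

Only the documents returned in `to_reembed` need to pass through chunking and embedding again, which is what keeps each 15-minute cycle short compared with a full re-index.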

    Results & Impact

    Retrieval Faithfulness (RAGAS): 0.94 score
    Internal Knowledge Tickets: 73% reduction
    New-Hire Onboarding Time: 3.5 weeks → 1.2 weeks
    Queries in First 90 Days: 12,000+
    Implementation Cost: $85,000 (7-week delivery)
    User Satisfaction Score: 97%
    Annual Productivity ROI: $340,000
    Query Latency (p95): 1.4 seconds

    Hybrid RAG retrieval — dense vector similarity + BM25 keyword ranking + cross-encoder re-ranking — delivers cited, source-attributed answers from 12,000+ enterprise documents with under 3% hallucination rate

    Zero data exfiltration: Llama 3.1 70B runs fully on-premises via Ollama; no employee query or document chunk ever leaves the corporate network

    Role-filtered retrieval enforces document access tiers via Azure AD / Okta SSO middleware before the LLM sees any context — junior staff cannot retrieve executive financials

    Three integration surfaces — Slack bot, Microsoft Teams bot, embedded helpdesk widget — employees query in the tools they already use with no new login required

    Incremental ingestion re-indexes only changed documents every 15 minutes via SharePoint and Confluence change feeds, keeping answers current without full re-embedding

    Immutable audit log captures every query, source, and response with user ID and timestamp for ISO 27001 compliance and internal security reviews

    Employee feedback loop surfaces low-confidence answers to the knowledge ops team weekly, creating a continuous document improvement cycle

    p95 query latency of 1.4 seconds on self-hosted GPU infrastructure serving 840+ monthly active users — faster than a typical Slack search
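The immutable audit log listed among the highlights above records timestamp, user ID, query hash, retrieved sources, and answer. One common way to make such a log tamper-evident is a hash chain, sketched below. The chaining scheme and the names `append_audit_entry` and `verify_chain` are assumptions for illustration, not a description of the deployed audit table.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _digest(entry_body):
    payload = json.dumps(entry_body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_audit_entry(log, user_id, query, sources, answer):
    """Append one record whose hash covers its content and the previous record's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query_hash": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "sources": list(sources),
        "answer": answer,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    entry["entry_hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


audit_log = []
append_audit_entry(audit_log, "u-1042", "What is the torque spec?", ["Torque Specification Process"], "See section 3.")
append_audit_entry(audit_log, "u-0087", "Warranty period?", ["Warranty Policy"], "See section 1.")
chain_ok = verify_chain(audit_log)
```

Because each entry's hash includes the previous entry's hash, editing any stored record invalidates every later link, which an auditor can detect by re-running `verify_chain`.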

    Key Technologies

    Python 3.12, LangChain, pgvector (PostgreSQL 16), Llama 3.1 70B (Ollama, on-premises), all-MiniLM-L6-v2 (Sentence Transformers), BM25 + Cross-Encoder Re-ranking, Azure AD / Okta SSO, Microsoft Bot Framework (Teams), Slack Bolt SDK, Next.js 14 (Admin Portal), FastAPI, Docker + Kubernetes, Prometheus + Grafana
    "Before KnowledgeCore, a junior engineer would ping three different seniors to find the right SOP. Now they ask the assistant, get the exact paragraph with the source document linked, and move on. We reclaimed an estimated 900 engineer-hours in the first quarter alone."
    David Harrington

    KnowledgeCore Enterprise

    Want similar results for your business?

    We build the same for teams like yours — fixed scope, fixed price, senior engineers only.

    About Ortem Technologies

    Ortem Technologies is a premier custom software, mobile app, and AI development company. We serve enterprise and startup clients across the USA, UK, Australia, Canada, and the Middle East. Our cross-industry expertise spans fintech, healthcare, and logistics, enabling us to deliver scalable, secure, and innovative digital solutions worldwide.

    Ready to Build Something Like This?

    We've built AI & enterprise solutions like this for teams across the US, UK, and Middle East. Fixed scope. Fixed price. Senior engineers from day one.

    Free 30-minute scoping call — no commitment, no sales pitch.