
Overview
Klebbix RAG was developed as a modular Retrieval-Augmented Generation (RAG) platform that unifies structured and unstructured enterprise data within a secure, multi-tenant environment. The project focused on building a low-latency retrieval layer capable of serving real-time contextual answers across spreadsheets, documents, and databases while meeting strict compliance and scalability requirements.
Client Context
A European SaaS provider managing internal and client documentation required an enterprise-grade AI retrieval engine. Their existing system relied on conventional keyword search, which failed to return relevant results across hybrid data formats. Each enterprise tenant stored thousands of files in different formats, creating fragmented information silos. The client needed a single RAG-based platform capable of integrating text documents, tabular datasets, and structured relational records while ensuring complete isolation of customer data.
Core Challenges
Data Fragmentation
Information existed across relational databases, PDFs, Word files, and CSV datasets. Traditional retrieval pipelines were unable to bridge structured and unstructured sources, leading to incomplete context generation and low-accuracy responses.
Tenant Isolation and Compliance
The system had to support multi-tenant deployment with full encryption and namespace separation. Each tenant's vector store and relational data had to remain completely segregated to meet GDPR and ISO/IEC 27001 compliance requirements.
Latency and Cost Control
Handling millions of documents per tenant led to significant embedding and inference costs. Existing solutions showed high token consumption and query latency exceeding acceptable enterprise thresholds. The new platform required cost-efficient retrieval with predictable performance at scale.
Solution Overview
The team designed Klebbix RAG as a containerized, self-hosted solution integrating structured and unstructured retrieval through a unified hybrid pipeline. The platform combined vector search for semantic relevance with SQL-based structured querying, ensuring comprehensive data coverage and consistent performance.
Hybrid Retrieval Pipeline
The core retrieval layer merged semantic vector matching with relational joins. Qdrant handled vector embeddings for text, while PostgreSQL managed structured fields. A middleware orchestrator coordinated both layers to deliver a single, ranked response context.
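The merge step can be sketched in a few lines of Python. This is an illustrative reduction, not the production orchestrator: it assumes the vector layer returns (document id, similarity) pairs and the SQL layer returns a list of matching document ids, and combines them with a simple weighted score.

```python
def merge_hybrid_results(vector_hits, sql_hits, vector_weight=0.7, top_k=5):
    """Merge semantic (vector) and structured (SQL) results into one ranked list.

    vector_hits: list of (doc_id, similarity) pairs, similarity in [0, 1].
    sql_hits:    list of doc_ids matched by the structured query.
    Documents found by both layers accumulate both scores, so agreement
    between the layers is rewarded; pure SQL matches receive a fixed
    structured-match score.
    """
    sql_weight = 1.0 - vector_weight
    scores = {}
    for doc_id, sim in vector_hits:
        scores[doc_id] = vector_weight * sim
    for doc_id in sql_hits:
        scores[doc_id] = scores.get(doc_id, 0.0) + sql_weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

With a 0.7/0.3 weighting, a document found by both layers with modest similarity can outrank a document found only by vector search with high similarity, which is the behavior a hybrid pipeline is after.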
Security and Multi-Tenancy
Each tenant operated in a separate namespace with AES-256 encryption at rest and mutual TLS in transit. Role-based access control (RBAC) was enforced through Azure AD SSO integration, maintaining strict authentication and access hierarchy.
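Namespace separation of this kind typically comes down to two guarantees: each tenant's vectors live in their own collection, and every query filter is forced to carry the tenant id. The sketch below shows one plausible scheme; the function names and the hashed naming convention are assumptions for illustration, not the actual Klebbix implementation.

```python
import hashlib

def tenant_collection(tenant_id):
    """Derive a per-tenant vector collection name (illustrative scheme).

    Hashing keeps raw tenant identifiers out of infrastructure-level
    names while remaining deterministic per tenant.
    """
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()[:12]
    return f"tenant_{digest}"

def scoped_filter(tenant_id, user_filter=None):
    """Force every query filter to carry the tenant_id, so a caller can
    never widen a search beyond its own namespace."""
    base = {"tenant_id": tenant_id}
    if user_filter:
        # The enforced tenant_id always wins, even if the caller
        # supplied a conflicting value.
        return {**user_filter, **base}
    return base
```

Because the scoping happens in the middleware rather than in client code, a compromised or buggy caller cannot query another tenant's namespace by crafting its own filter.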
Optimization and Cost Efficiency
Redis caching and query thresholding reduced redundant embedding requests. Cohere re-ranking optimized retrieval precision, cutting inference costs while preserving contextual accuracy.
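The caching side of this can be sketched as a content-addressed lookup in front of the embedding call. A plain dict stands in for Redis here so the example is self-contained; in the deployed system the same keying scheme would map onto Redis GET/SET with a TTL. Class and method names are illustrative.

```python
import hashlib

class EmbeddingCache:
    """Skip redundant embedding requests for previously seen text."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn   # the real embedding call (e.g. OpenAI/Cohere)
        self.store = {}            # dict stand-in for Redis
        self.hits = 0
        self.misses = 0

    def _key(self, text):
        # Content hash as cache key: identical text always maps to
        # the same entry, regardless of which request submitted it.
        return hashlib.sha256(text.encode()).hexdigest()

    def get(self, text):
        key = self._key(text)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        vec = self.embed_fn(text)
        self.store[key] = vec
        return vec
```

Every cache hit saves one embedding API round trip, which is where the reported cost reduction from caching comes from.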
Solution Description
Klebbix RAG was deployed using containerized microservices built with FastAPI. The backend interfaced directly with PostgreSQL for structured data and Qdrant for vector embeddings. The platform ran on Hetzner Cloud infrastructure configured for high-availability workloads.
Architecture Components
Backend Framework: FastAPI on Ubuntu LTS
Vector Database: Qdrant v1.8
Relational Database: PostgreSQL 15
Re-Ranker: Cohere Rerank-3
Authentication: Azure AD SSO with OIDC
Caching Layer: Redis 7
Monitoring: Grafana and Uptime Kuma
Deployment Environment: Docker Compose on Hetzner CAX31 (8 vCPU, 16 GB RAM)
Processing Flow
Uploaded documents and datasets were parsed, chunked, and embedded using OpenAI Ada and Cohere Embed.
Metadata and structured data were stored in PostgreSQL, while embeddings were indexed in Qdrant.
Queries triggered a hybrid retrieval process combining vector and SQL lookups.
Results were re-ranked by Cohere and passed to GPT-4 for contextual answer generation.
Redis caching reduced repeated query loads, ensuring consistent response times under 1.2 seconds per request.
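The parse-and-chunk step at the top of this flow can be sketched as an overlapping sliding window over the document text. The window and overlap sizes below are illustrative assumptions, not the tuned production values; overlap keeps sentences that straddle a chunk boundary retrievable from both sides.

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split a document into overlapping word-window chunks before embedding.

    chunk_size and overlap are measured in words. Each chunk shares its
    last `overlap` words with the start of the next chunk.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each resulting chunk would then be embedded and indexed in the vector store, with its source metadata written to the relational side for the structured half of the hybrid lookup.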
Operational Impact
Decreased average query latency by 68 percent, achieving sub-second hybrid retrieval.
Reduced embedding and inference costs by 35 percent through batching and caching.
Achieved complete tenant data isolation verified under internal compliance audits.
Enabled unified retrieval across structured and unstructured formats with over 93 percent relevance accuracy on benchmark tests.
Strategic Outcomes
Klebbix RAG provided the client with a single, production-ready search infrastructure capable of handling hybrid enterprise data securely and efficiently. The system allowed teams to query documents, tables, and databases simultaneously without exposing sensitive tenant information.
The deployment established a scalable foundation for multi-tenant AI products, combining retrieval precision, compliance, and modular expandability. It continues to serve as the reference architecture for future enterprise AI infrastructure projects built on retrieval-augmented systems.