
Securing RAG Systems: A Three‑Layer Permission Framework for Banking AI

This article explains why vector databases lack row‑level security and presents a three‑layer permission architecture: JWT authentication, Milvus metadata or partition filtering, and post‑retrieval validation. It also covers document security levels, PostgreSQL RLS, audit logging, caching strategies, and interview‑ready talking points.

Wu Shixiong's Large Model Academy

Mar 27, 2026


Why RAG Permission Issues Are Critical

Engineers often focus on recall, re‑ranking, and model performance while overlooking who can see which data. In regulated domains such as banking, exposing confidential documents can lead to compliance violations and severe penalties.

Three‑Layer Permission Architecture

Layer 1: Access‑Layer Identity Authentication

After login, the system issues a JWT whose payload carries the core permission attributes, such as branch, role, and security_level:

{
  "user_id": "emp_20041",
  "name": "Zhang Wei",
  "branch": "Shanghai Pudong",
  "role": "client_manager",
  "security_level": 2,
  "departments": ["Corporate Banking"],
  "exp": 1711382400
}

The API gateway validates the token and propagates the permission context to downstream services, avoiding repeated database lookups.
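The gateway step above can be sketched with the standard library alone. This is a minimal illustration, assuming HS256-signed tokens and a shared secret; the function names (sign_hs256, extract_permission_context) and the choice of algorithm are assumptions, not details from the article. A production gateway would normally rely on a maintained JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

PERMISSION_CLAIMS = ("user_id", "branch", "role", "security_level", "departments")


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Issue an HS256 token (included only so the sketch can be exercised)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(payload).encode())
    return f"{header}.{body}.{_b64url_encode(_signature(f'{header}.{body}', secret))}"


def extract_permission_context(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify signature and expiry, then return only the permission claims."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = _signature(f"{header_b64}.{body_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(body_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    return {name: claims[name] for name in PERMISSION_CLAIMS}
```

Downstream services then receive only the returned dictionary, so they never need to parse the token or query a user table themselves.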

Layer 2: Retrieval‑Layer Vector Database Filtering

This is the core layer where permissions are enforced during vector search. Two approaches are provided:

Metadata Filtering (Logical Isolation): Each document chunk is indexed with mandatory metadata fields (branch, department, security_level, creator). The search builds a filter expression from the user’s token, e.g.:

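from pymilvus import Collection, connections

connections.connect("default", host="localhost", port="19530")
collection = Collection("documents")
collection.load()  # a collection must be loaded before it can be searched

# Illustrative role-to-clearance mapping (an assumption; the original leaves
# get_security_level undefined). The level can also be read from the JWT.
ROLE_SECURITY_LEVELS = {"client_manager": 2}

def get_security_level(user_role):
    # Unknown roles fall back to level 1 (public documents only).
    return ROLE_SECURITY_LEVELS.get(user_role, 1)

def search_with_permission(query_embedding, user_branch, user_role, top_k=10):
    # user_branch must come from the verified token, never from request input,
    # because it is interpolated into the filter expression.
    filter_expr = (
        f'(branch == "{user_branch}" or branch == "public") '
        f"and security_level <= {get_security_level(user_role)}"
    )
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"nprobe": 10}},
        limit=top_k,
        expr=filter_expr,  # applied during the search, not afterwards
        output_fields=["text", "source", "branch", "security_level"],
    )
    return results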

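Because the filter is applied during the search rather than afterwards, every one of the top_k results is a document the user is allowed to see. Filtering after retrieval would instead discard unauthorized hits from an already truncated result list, which lowers recall.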
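Since the filter expression in search_with_permission is assembled by string interpolation, a branch value containing a quote character could change its meaning. One way to guard against that is to validate the inputs before building the expression. The sketch below is an illustration only: build_permission_filter and the allowed character set are assumptions, and a real deployment might instead check the branch against a list of known branches.

```python
import re

# Assumption: branch names consist of letters, digits, spaces, and hyphens.
_BRANCH_PATTERN = re.compile(r"[A-Za-z0-9 \-]+")


def build_permission_filter(branch: str, security_level: int) -> str:
    """Build the Milvus filter string, rejecting values that could alter it."""
    if not isinstance(branch, str) or not _BRANCH_PATTERN.fullmatch(branch):
        raise ValueError(f"unexpected characters in branch name: {branch!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(security_level, bool) or not isinstance(security_level, int):
        raise ValueError("security_level must be an integer")
    return (
        f'(branch == "{branch}" or branch == "public") '
        f"and security_level <= {security_level}"
    )
```

The returned string can be passed as the expr argument of the search call in place of the inline f-string.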

Partition Isolation (Physical Isolation): Each branch’s data is stored in a dedicated Milvus partition. Searches are limited to authorized partitions, which provides a hard barrier even if the filter logic is buggy.

def partition_name_for(branch):
    # Single source of truth for partition names, so that create, insert,
    # and search always agree ("Shanghai Pudong" -> "branch_Shanghai_Pudong").
    return f"branch_{branch.replace(' ', '_')}"

def create_branch_partition(collection, branch):
    partition_name = partition_name_for(branch)
    if not collection.has_partition(partition_name):
        collection.create_partition(partition_name)

def insert_document(collection, doc, branch):
    collection.insert(data=[doc], partition_name=partition_name_for(branch))

def search_authorized_partitions(collection, embedding, user_branches):
    # Restrict the search to the partitions the user is authorized for.
    partition_names = [partition_name_for(b) for b in user_branches]
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE"},
        limit=10,
        partition_names=partition_names,
    )
    return results

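Milvus caps the number of partitions per collection (4096 here; the default differs between Milvus releases, so check the limits for your version). Large banks with more branches than the cap allows can group branches by region, with one partition per region, and apply metadata filtering within each region.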
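The regional grouping can be sketched as a small planning step that turns a user's branches into the partitions to search plus a branch filter to apply inside them. The mapping BRANCH_REGION, the region names, and the function names are hypothetical; in practice the mapping would come from the bank's organisation directory.

```python
import re

# Hypothetical branch-to-region mapping used only for illustration.
BRANCH_REGION = {
    "Shanghai Pudong": "East China",
    "Shanghai Xuhui": "East China",
    "Beijing Chaoyang": "North China",
}


def region_partition(region: str) -> str:
    # Keep to letters, digits, and underscores in partition names.
    return "region_" + re.sub(r"[^0-9A-Za-z]+", "_", region).strip("_")


def plan_search(user_branches):
    """Return (partition_names, expr) for a one-partition-per-region layout."""
    partitions = sorted({region_partition(BRANCH_REGION[b]) for b in user_branches})
    quoted = ", ".join(f'"{b}"' for b in sorted(user_branches))
    # The partition narrows the search to a region; the expression then
    # narrows it to the user's own branches within that region.
    return partitions, f"branch in [{quoted}]"
```

The two return values map onto the partition_names and expr arguments of the search call, so physical isolation applies between regions and logical isolation applies within one. An unknown branch raises KeyError, which fails closed rather than searching everything.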

Layer 3: Return‑Layer Secondary Verification

After retrieval, the system performs a final permission check before returning results, acting as a safety net for any misconfiguration in the previous layers.
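A minimal sketch of such a check follows, assuming each hit has been converted to a dictionary carrying the branch and security_level output fields. UserContext and validate_hits are illustrative names. The rule deliberately mirrors the retrieval filter, and hits with missing metadata are rejected rather than passed through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    branch: str
    security_level: int


def validate_hits(hits, user):
    """Split hits into (allowed, rejected) using the same rule as the search filter."""
    allowed, rejected = [], []
    for hit in hits:
        branch_ok = hit.get("branch") in (user.branch, "public")
        level = hit.get("security_level")
        # A missing or non-integer level fails closed.
        level_ok = isinstance(level, int) and level <= user.security_level
        (allowed if branch_ok and level_ok else rejected).append(hit)
    return allowed, rejected
```

Only the allowed list should reach the prompt that is sent to the model. A non-empty rejected list means an earlier layer is misconfigured, so it is worth logging and alerting on.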

Document Security Level Taxonomy

Documents are classified into four security levels:

Level 1 – Public: Company brochures, public policies, annual reports (accessible to all employees).

Level 2 – Internal: Operation manuals, internal regulations (available to regular staff and above).

Level 3 – Sensitive: Customer risk reports, loan approval records (restricted to risk, compliance, and credit staff within the same branch).

Level 4 – Confidential: Executive compensation plans, M&A negotiations (accessible only to senior management with additional approvals).

Metadata tags are attached during ingestion, and a review workflow ensures that only authorized personnel can modify security labels.
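One way to enforce the tagging at ingestion time is to validate each chunk's metadata before it is written to the vector store. The sketch below is an assumption about how that could look: the SecurityLevel enum and validate_chunk_metadata are illustrative names, while the four levels and the mandatory fields are the ones listed in this article.

```python
from enum import IntEnum


class SecurityLevel(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3
    CONFIDENTIAL = 4


REQUIRED_FIELDS = ("branch", "department", "security_level", "creator")


def validate_chunk_metadata(metadata):
    """Reject chunks that lack mandatory fields or carry an unknown level."""
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    # SecurityLevel(...) raises ValueError for anything outside 1 to 4.
    level = SecurityLevel(metadata["security_level"])
    return {**metadata, "security_level": int(level)}
```

Rejecting untagged chunks at this point means the retrieval filter never has to decide what a missing security_level should mean.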

Row‑Level Security in PostgreSQL

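-- Enable RLS; FORCE makes it apply to the table owner as well
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
-- Branch isolation and security level in a single policy. Separate permissive
-- policies are combined with OR, which would admit a row passing either check.
CREATE POLICY branch_and_level ON documents USING (
    branch = current_setting('app.user_branch')
    AND security_level <= current_setting('app.security_level')::int
);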

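Application code sets session variables at the start of each transaction so that PostgreSQL automatically filters rows for the current user. SET LOCAL only takes effect inside a transaction block and is reset when the transaction ends:

BEGIN;
SET LOCAL app.user_branch = 'Shanghai Pudong';
SET LOCAL app.security_level = '2';
-- ... queries against documents ...
COMMIT;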
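From application code, the same settings can be applied with PostgreSQL's set_config function, which accepts bound parameters and behaves like SET LOCAL when its third argument is true. The sketch below assumes a DB-API cursor that uses the %s parameter style (as psycopg does); apply_permission_context is an illustrative name.

```python
def apply_permission_context(cursor, branch, security_level):
    """Bind the user's permissions to the current transaction.

    Values are passed as bound parameters, so nothing from the token is
    spliced into the SQL text.
    """
    cursor.execute(
        "SELECT set_config('app.user_branch', %s, true), "
        "set_config('app.security_level', %s, true)",
        (branch, str(int(security_level))),
    )
```

The call belongs inside the same transaction as the document queries, before the first of them runs.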

Security Auditing

Every query is logged with user ID, branch, role, query text, retrieved document IDs and their security levels, and a timestamp:

audit_log = {
  "timestamp": "2024-03-25T14:32:10.123Z",
  "user_id": "emp_20041",
  "user_branch": "Shanghai Pudong",
  "user_role": "client_manager",
  "query": "What is the customer's credit limit?",
  "retrieved_doc_ids": ["doc_001", "doc_045", "doc_112"],
  "retrieved_doc_security_levels": [1, 2, 2],
  "response_generated": True,
  "session_id": "sess_abc123"
}

Alert rules detect abnormal patterns such as excessive queries per hour, rapid access to high‑security documents, or off‑hours activity.
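The three rules can be sketched as a pass over audit entries shaped like the one above. The thresholds, the working-hours window, and the function name are illustrative assumptions, not values from the article. Hours are evaluated in the timestamp's own timezone (UTC in the sample entry), so a real rule would first convert to the branch's local time.

```python
from collections import Counter
from datetime import datetime

# Illustrative thresholds; tune them to the bank's own baseline.
MAX_QUERIES_PER_HOUR = 100
MAX_HIGH_LEVEL_DOCS_PER_HOUR = 20
HIGH_SECURITY_LEVEL = 3
WORK_HOURS = range(8, 20)  # 08:00 to 19:59


def detect_alerts(entries):
    """Return a set of (user_id, rule_name) pairs triggered by the entries."""
    queries = Counter()
    high_level_docs = Counter()
    alerts = set()
    for entry in entries:
        ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        user = entry["user_id"]
        bucket = (user, ts.strftime("%Y-%m-%dT%H"))  # per user, per clock hour
        queries[bucket] += 1
        high_level_docs[bucket] += sum(
            1 for level in entry["retrieved_doc_security_levels"]
            if level >= HIGH_SECURITY_LEVEL
        )
        if ts.hour not in WORK_HOURS:
            alerts.add((user, "off_hours"))
    alerts.update(
        (user, "excessive_queries")
        for (user, _), count in queries.items() if count > MAX_QUERIES_PER_HOUR
    )
    alerts.update(
        (user, "high_security_burst")
        for (user, _), count in high_level_docs.items()
        if count > MAX_HIGH_LEVEL_DOCS_PER_HOUR
    )
    return alerts
```

Fixed clock-hour buckets keep the sketch short; a sliding window would catch bursts that straddle an hour boundary.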

Permission Caching for Performance

To avoid the latency of repeated permission lookups, the full permission set is carried inside the JWT, which is valid for 5 minutes. Expiry forces a refresh that picks up any permission changes, while a blacklist can revoke the tokens of downgraded users immediately instead of waiting for expiry.
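The expiry-plus-blacklist idea can be sketched as follows. This is an in-memory illustration under two assumptions that are not in the article: tokens carry an iat (issued-at) claim, and revocation is tracked per user rather than per token. A shared store such as a cache service would replace the dictionary in a multi-instance deployment.

```python
import time

TOKEN_TTL_SECONDS = 5 * 60  # the 5-minute validity described above


class TokenBlacklist:
    def __init__(self):
        self._revoked_at = {}  # user_id -> time of the permission change

    def revoke_user(self, user_id, now=None):
        self._revoked_at[user_id] = time.time() if now is None else now

    def is_active(self, claims, now=None):
        """True if the token is unexpired and was issued after any revocation."""
        now = time.time() if now is None else now
        if claims["exp"] <= now:
            return False
        revoked_at = self._revoked_at.get(claims["user_id"])
        return revoked_at is None or claims["iat"] > revoked_at

    def purge(self, now=None):
        # A token issued before a revocation expires at most TOKEN_TTL_SECONDS
        # after it, so older entries can no longer match a live token.
        now = time.time() if now is None else now
        self._revoked_at = {
            user: ts for user, ts in self._revoked_at.items()
            if ts + TOKEN_TTL_SECONDS > now
        }
```

Because every token expires within five minutes, the blacklist stays small: an entry only needs to live for one token lifetime.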

Interview Guide for RAG Permission Management

When asked in an interview, structure the answer in five parts:

Explain the problem: vector databases lack row‑level security, which is a compliance risk.

Describe the three‑layer architecture (authentication, retrieval filtering, return validation).

Contrast metadata filtering vs. partition isolation, noting trade‑offs and the 4096‑partition limit.

Share a common pitfall (post‑search filtering reduces recall) and the correct use of expr in search().

Highlight extra points such as PostgreSQL RLS, audit logging, and caching strategies.

Covering these points demonstrates both conceptual understanding and practical experience.

Conclusion

RAG permission management is a matter of engineering completeness rather than a purely algorithmic challenge. Vector databases provide no built‑in row‑level access control, so a three‑layer defense must be implemented: JWT‑based identity, Milvus filtering (metadata or partition), and post‑retrieval checks. Combining logical and physical isolation with PostgreSQL row‑level security, comprehensive audit logs, and token caching yields a robust, compliant solution that can be reused across domains.
