Securing RAG Systems: A Three‑Layer Permission Framework for Banking AI
This article explains why vector databases lack row-level security and presents a three-layer permission architecture: JWT authentication, Milvus metadata or partition filtering, and post-retrieval validation. It also covers document security levels, PostgreSQL RLS, audit logging, caching strategies, and interview-ready talking points.
Why RAG Permission Issues Are Critical
Engineers often focus on recall, re‑ranking, and model performance while overlooking who can see which data. In regulated domains such as banking, exposing confidential documents can lead to compliance violations and severe penalties.
Three‑Layer Permission Architecture
Layer 1: Access‑Layer Identity Authentication
After login, the system issues a JWT token whose payload carries core permission attributes such as branch, role, and security_level:
{
    "user_id": "emp_20041",
    "name": "Zhang Wei",
    "branch": "Shanghai Pudong",
    "role": "client_manager",
    "security_level": 2,
    "departments": ["Corporate Banking"],
    "exp": 1711382400
}
The API gateway validates the token and propagates the permission context to downstream services, avoiding repeated database lookups.
Layer 2: Retrieval‑Layer Vector Database Filtering
This is the core layer where permissions are enforced during vector search. Two approaches are provided:
Metadata Filtering (Logical Isolation): Each document chunk is indexed with mandatory metadata fields (branch, department, security_level, creator). The search query builds a filter expression from the user's token, e.g.:
from pymilvus import Collection, connections

connections.connect("default", host="localhost", port="19530")
collection = Collection("documents")

def search_with_permission(query_embedding, user_branch, user_role, top_k=10):
    # get_security_level() is assumed to be defined elsewhere; it maps a role to
    # the highest security level that role may read.
    # user_branch must come from the verified token, never from request input,
    # because it is interpolated into the filter expression.
    filter_expr = (
        f'(branch == "{user_branch}" or branch == "public") '
        f'and security_level <= {get_security_level(user_role)}'
    )
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"nprobe": 10}},
        limit=top_k,
        expr=filter_expr,  # applied during the search, not afterwards
        output_fields=["text", "source", "branch", "security_level"],
    )
    return results
Because the filter is applied during the search, only documents the user is allowed to see are retrieved. This avoids the low recall of post-filtering, where top-k slots are spent on documents that are then discarded.
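Because the filter expression is assembled by string interpolation, it is worth validating the claim values first. The sketch below is one hedged way to do that; `build_filter_expr` and the branch-name pattern are assumptions for illustration, and it reads `security_level` straight from the token rather than deriving it from the role.

```python
import re

# Assumption: branch names contain only letters, digits, spaces, and hyphens.
_BRANCH_RE = re.compile(r"[A-Za-z0-9 \-]{1,64}")


def build_filter_expr(branch: str, security_level: int) -> str:
    """Build the Milvus boolean expression from verified token claims."""
    if not isinstance(branch, str) or not _BRANCH_RE.fullmatch(branch):
        raise ValueError(f"unexpected branch value: {branch!r}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(security_level, bool) or not isinstance(security_level, int):
        raise ValueError("security_level must be an integer")
    if not 1 <= security_level <= 4:
        raise ValueError("security_level must be between 1 and 4")
    return (
        f'(branch == "{branch}" or branch == "public") '
        f"and security_level <= {security_level}"
    )
```

A value containing a quote character is rejected before it can change the meaning of the expression.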
Partition Isolation (Physical Isolation): Each branch's data is stored in a dedicated Milvus partition. Searches are limited to authorized partitions, providing a hard barrier even if the filter logic is buggy.
def partition_name_for(branch):
    # One naming rule shared by create, insert, and search, so the names always
    # match (partition names cannot contain spaces).
    return f"branch_{branch.replace(' ', '_')}"

def create_branch_partition(collection, branch):
    partition_name = partition_name_for(branch)
    if not collection.has_partition(partition_name):
        collection.create_partition(partition_name)

def insert_document(collection, doc, branch):
    collection.insert(data=[doc], partition_name=partition_name_for(branch))

def search_authorized_partitions(collection, embedding, user_branches):
    partition_names = [partition_name_for(b) for b in user_branches]
    results = collection.search(
        data=[embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE"},
        limit=10,
        partition_names=partition_names,
    )
    return results
Milvus caps the number of partitions per collection (4096 in the version assumed here; the limit depends on version and configuration), so large banks can group branches by region and apply metadata filtering within each region.
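The region-grouping idea can be sketched as a small planning function that returns both the partitions to search and the metadata filter to apply inside them. The mapping `BRANCH_REGION`, the `region_` naming scheme, and `plan_search` are hypothetical; a real system would load the mapping from configuration.

```python
from typing import List, Tuple

# Hypothetical branch-to-region mapping, normally loaded from configuration.
BRANCH_REGION = {
    "Shanghai Pudong": "east_china",
    "Shanghai Xuhui": "east_china",
    "Beijing Chaoyang": "north_china",
}


def plan_search(user_branches: List[str], security_level: int) -> Tuple[List[str], str]:
    """Return (partition_names, filter_expr) for a user's authorized branches."""
    unknown = [b for b in user_branches if b not in BRANCH_REGION]
    if unknown:
        # Unknown branches are rejected, so only allow-listed names reach the expression.
        raise ValueError(f"unknown branches: {unknown}")
    # One partition per region; several branches may share it.
    partitions = sorted({f"region_{BRANCH_REGION[b]}" for b in user_branches})
    # Inside a shared partition, the metadata filter narrows results to the user's branches.
    branch_list = ", ".join(f'"{b}"' for b in user_branches)
    expr = f"branch in [{branch_list}] and security_level <= {int(security_level)}"
    return partitions, expr
```

The returned values map onto the `partition_names` and `expr` arguments of `collection.search()`, so physical isolation applies between regions and logical isolation within one.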
Layer 3: Return‑Layer Secondary Verification
After retrieval, the system performs a final permission check before returning results, acting as a safety net for any misconfiguration in the previous layers.
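A minimal sketch of this safety net follows, assuming each retrieved hit has been converted to a dictionary carrying its `branch` and `security_level` metadata. The function names and the logger name are illustrative.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("rag.permission")


def is_allowed(doc: Dict, ctx: Dict) -> bool:
    """Re-apply the rule that the retrieval filter is supposed to enforce."""
    branch_ok = doc.get("branch") in (ctx["branch"], "public")
    level = doc.get("security_level")
    # Missing or malformed metadata is treated as a denial, not as public.
    level_ok = isinstance(level, int) and level <= ctx["security_level"]
    return branch_ok and level_ok


def filter_results(docs: List[Dict], ctx: Dict) -> List[Dict]:
    """Drop any hit the user may not see and log it for investigation."""
    allowed = []
    for doc in docs:
        if is_allowed(doc, ctx):
            allowed.append(doc)
        else:
            # A hit here means an earlier layer was misconfigured.
            logger.warning(
                "dropped doc %s for user %s", doc.get("id"), ctx.get("user_id")
            )
    return allowed
```

In a healthy system this function drops nothing, so every warning it logs points at a bug in the ingestion metadata or the retrieval filter.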
Document Security Level Taxonomy
Documents are classified into four security levels:
Level 1 – Public: Company brochures, public policies, annual reports (accessible to all employees).
Level 2 – Internal: Operation manuals, internal regulations (available to regular staff only).
Level 3 – Sensitive: Customer risk reports, loan approval records (restricted to risk, compliance, and credit staff within the same branch).
Level 4 – Confidential: Executive compensation plans, M&A negotiations (accessible only to senior management with additional approvals).
Metadata tags are attached during ingestion, and a review workflow ensures that only authorized personnel can modify security labels.
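As an illustration, the four levels and the mandatory metadata fields can be enforced at ingestion time. The sketch below assumes chunk metadata arrives as a dictionary; `SecurityLevel` and `validate_chunk_metadata` are hypothetical names.

```python
from enum import IntEnum
from typing import Dict


class SecurityLevel(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3
    CONFIDENTIAL = 4


# The mandatory metadata fields listed for each document chunk.
REQUIRED_FIELDS = ("branch", "department", "security_level", "creator")


def validate_chunk_metadata(metadata: Dict) -> Dict:
    """Reject chunks that lack permission metadata or carry an invalid level."""
    missing = [f for f in REQUIRED_FIELDS if f not in metadata]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    # SecurityLevel(...) raises ValueError for anything outside 1..4.
    level = SecurityLevel(metadata["security_level"])
    return {**metadata, "security_level": int(level)}
```

Rejecting unlabeled chunks at ingestion means the retrieval filter never has to guess a default level.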
Row‑Level Security in PostgreSQL
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
-- Branch isolation policy (permissive)
CREATE POLICY branch_isolation ON documents
    USING (branch = current_setting('app.user_branch'));
-- Security level filter. It is declared AS RESTRICTIVE so that it is ANDed with
-- the branch policy; two permissive policies would be combined with OR.
CREATE POLICY security_level_check ON documents AS RESTRICTIVE
    USING (security_level <= current_setting('app.security_level')::int);
Application code sets session variables inside each transaction (e.g., SET LOCAL app.user_branch = 'Shanghai Pudong'; SET LOCAL app.security_level = '2';) so that PostgreSQL automatically filters rows based on the current user.
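On the application side, one way to set these variables safely is PostgreSQL's `set_config()` function, which accepts bind parameters (plain `SET LOCAL` does not). The sketch below assumes `conn` is a DB-API 2.0 connection, such as one from psycopg, with autocommit disabled; `run_with_permissions` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence


def run_with_permissions(
    conn: Any, ctx: Dict, sql: str, params: Sequence = ()
) -> List:
    """Run one query in a transaction with the caller's permission context set."""
    cur = conn.cursor()
    try:
        # The third argument (is_local = true) scopes the value to the current
        # transaction, which matches the behavior of SET LOCAL.
        cur.execute(
            "SELECT set_config('app.user_branch', %s, true)", (ctx["branch"],)
        )
        cur.execute(
            "SELECT set_config('app.security_level', %s, true)",
            (str(ctx["security_level"]),),
        )
        cur.execute(sql, params)
        rows = cur.fetchall()
        conn.commit()
        return rows
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

The connection should use a role that is neither a superuser nor the table owner, since those bypass row-level security by default.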
Security Auditing
Every query is logged with user ID, branch, role, query text, retrieved document IDs and their security levels, and a timestamp:
audit_log = {
    "timestamp": "2024-03-25T14:32:10.123Z",
    "user_id": "emp_20041",
    "user_branch": "Shanghai Pudong",
    "user_role": "client_manager",
    "query": "What is the customer's credit limit?",
    "retrieved_doc_ids": ["doc_001", "doc_045", "doc_112"],
    "retrieved_doc_security_levels": [1, 2, 2],
    "response_generated": True,
    "session_id": "sess_abc123"
}
Alert rules detect abnormal patterns such as excessive queries per hour, rapid access to high-security documents, or off-hours activity.
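The three alert rules can be sketched as a per-user sliding window over audit entries. All thresholds below are hypothetical defaults, "high-security" is assumed to mean level 3 or above, and hours are compared in UTC for simplicity.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Dict, List


class AuditMonitor:
    """Evaluate alert rules over a one-hour sliding window per user."""

    def __init__(self, max_queries_per_hour: int = 100,
                 max_sensitive_docs_per_hour: int = 20,
                 business_hours: range = range(8, 20)):
        self.max_queries = max_queries_per_hour
        self.max_sensitive = max_sensitive_docs_per_hour
        self.business_hours = business_hours
        self._events: Dict[str, deque] = {}  # user_id -> (timestamp, sensitive_count)

    def check(self, entry: Dict) -> List[str]:
        # Normalize the trailing "Z" so fromisoformat() accepts the timestamp.
        ts = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))
        sensitive = sum(1 for lvl in entry["retrieved_doc_security_levels"] if lvl >= 3)
        window = self._events.setdefault(entry["user_id"], deque())
        window.append((ts, sensitive))
        # Drop events that are more than one hour older than the current one.
        while ts - window[0][0] > timedelta(hours=1):
            window.popleft()
        alerts = []
        if len(window) > self.max_queries:
            alerts.append("excessive_queries")
        if sum(n for _, n in window) > self.max_sensitive:
            alerts.append("high_security_burst")
        if ts.hour not in self.business_hours:
            alerts.append("off_hours")
        return alerts
```

Each audit entry is passed to `check()` as it is written, and any returned alert names are forwarded to whatever alerting channel the bank already uses.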
Permission Caching for Performance
To avoid latency from repeated permission lookups, the full permission set is cached inside the JWT token (valid for 5 minutes). Token expiration forces a refresh, while a blacklist can immediately revoke tokens for downgraded users.
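A revocation list for downgraded users can stay small because tokens live for only five minutes. The sketch below assumes the issue time can be derived as `exp` minus the token lifetime, since the example payload carries no issued-at claim; `RevocationList` is a hypothetical in-memory structure, and a shared store would be needed across gateway instances.

```python
import time
from typing import Dict, Optional

TOKEN_TTL_SECONDS = 300  # the 5-minute token validity


class RevocationList:
    """Reject tokens issued before a user's permissions were downgraded."""

    def __init__(self) -> None:
        self._revoked_at: Dict[str, float] = {}  # user_id -> epoch seconds

    def revoke_user(self, user_id: str, now: Optional[float] = None) -> None:
        self._revoked_at[user_id] = time.time() if now is None else now

    def is_revoked(self, claims: Dict, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Entries older than one token lifetime are dropped: every token they
        # could affect has expired by then.
        self._revoked_at = {
            u: t for u, t in self._revoked_at.items() if now - t < TOKEN_TTL_SECONDS
        }
        revoked_at = self._revoked_at.get(claims["user_id"])
        if revoked_at is None:
            return False
        issued_at = claims["exp"] - TOKEN_TTL_SECONDS
        return issued_at <= revoked_at
```

The gateway would call `is_revoked()` after verifying the signature and expiry, so a downgraded user must obtain a fresh token that carries the reduced permissions.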
Interview Guide for RAG Permission Management
When asked in an interview, structure the answer in five parts:
Explain the problem: vector databases lack row‑level security, which is a compliance risk.
Describe the three‑layer architecture (authentication, retrieval filtering, return validation).
Contrast metadata filtering vs. partition isolation, noting trade‑offs and the 4096‑partition limit.
Share a common pitfall (post‑search filtering reduces recall) and the correct use of expr in search().
Highlight extra points such as PostgreSQL RLS, audit logging, and caching strategies.
Covering these points demonstrates both conceptual understanding and practical experience.
Conclusion
RAG permission management is an engineering completeness issue rather than a pure algorithmic challenge. Vector databases lack row-level access control, so a three-layer defense must be implemented: JWT-based identity, Milvus filtering (metadata or partition), and post-retrieval checks. Combining logical and physical isolation, PostgreSQL row-level security, comprehensive audit logs, and token caching yields a robust, compliant solution that can be reused across domains.
Wu Shixiong's Large Model Academy
