How ComRAG Revolutionizes Real‑Time Community QA with Dynamic Vector Stores
ComRAG tackles the static‑knowledge gaps, uneven answer quality, and storage explosion of community question‑answering platforms by pairing a static documentation vector store with dual dynamic CQA stores managed through a centroid‑based memory, delivering higher accuracy, lower latency, and bounded storage growth for industrial retrieval‑augmented generation.
Problem Statement
Community Question Answering (CQA) platforms such as Stack Overflow contain large amounts of expert knowledge but face three industrial challenges:
Static knowledge gaps – official documentation alone cannot cover many real‑world edge cases.
Uneven historical QA quality – community answers vary widely in quality, and older answers may be outdated or superseded by higher‑scoring ones.
Real‑time retrieval under storage growth – new questions arrive continuously, demanding fast lookup while the vector store must not grow without bound.
ComRAG Framework
Core idea: combine authoritative official documentation, community experience, and a time‑aware forgetting mechanism for low‑quality content.
Static Knowledge Vector Store
Chunk official documents, embed each chunk, build an index, and perform vector similarity search.
Serves as a fallback when no sufficiently similar community QA pair is found (a minimal sketch follows).
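A minimal sketch of such a store, assuming sentence-transformers for embeddings and FAISS for the index; the paper does not prescribe a particular stack, and the chunk size and model name below are placeholders.

```python
# Static documentation store: chunk -> embed -> index -> search.
# sentence-transformers + FAISS are assumptions, not the paper's stack.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedder

def chunk(text: str, size: int = 512) -> list[str]:
    """Naive fixed-size chunking of official documentation."""
    return [text[i:i + size] for i in range(0, len(text), size)]

docs = ["...official documentation text..."]
chunks = [c for d in docs for c in chunk(d)]

# Embed and L2-normalize so inner product equals cosine similarity.
emb = model.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

def search_static(query: str, k: int = 5):
    q = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(q, k)
    return [(chunks[i], float(s)) for i, s in zip(ids[0], scores[0])]
```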
Dynamic CQA Vector Stores
High‑Quality Store: holds QA pairs whose quality score is ≥ γ. When a new high‑scoring QA pair arrives, centroid‑based clustering replaces older low‑scoring entries, keeping the store compact.
Low‑Quality Store: holds QA pairs with score < γ. It is updated by the same centroid clustering but is used only to supply negative examples that steer the LLM away from repeating poor answers (a sketch of the centroid update follows).
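One plausible reading of the centroid‑based memory is sketched below: each incoming QA pair joins its nearest cluster or opens a new one, and a full cluster forgets its lowest‑scoring member. The join threshold and per‑cluster capacity are assumptions, not values from the paper.

```python
# Hedged sketch of centroid-based memory for a dynamic CQA store.
import numpy as np

class CentroidMemory:
    """Each cluster: a unit-norm centroid plus (embedding, text, score) members."""

    def __init__(self, max_per_cluster: int = 10, join_sim: float = 0.6):
        self.centroids: list[np.ndarray] = []
        self.clusters: list[list[tuple[np.ndarray, str, float]]] = []
        self.max_per_cluster = max_per_cluster   # assumed capacity
        self.join_sim = join_sim                 # assumed join threshold

    def add(self, emb: np.ndarray, qa_text: str, score: float) -> None:
        emb = emb / np.linalg.norm(emb)          # cosine via dot product
        if self.centroids:
            sims = [float(emb @ c) for c in self.centroids]
            best = int(np.argmax(sims))
            if sims[best] >= self.join_sim:
                cluster = self.clusters[best]
                cluster.append((emb, qa_text, score))
                if len(cluster) > self.max_per_cluster:
                    # "Forgetting": evict the lowest-scoring member.
                    worst = min(range(len(cluster)), key=lambda i: cluster[i][2])
                    cluster.pop(worst)
                # Keep the centroid in step with the surviving members.
                c = np.mean([e[0] for e in cluster], axis=0)
                self.centroids[best] = c / np.linalg.norm(c)
                return
        # No sufficiently close centroid: open a new cluster.
        self.centroids.append(emb)
        self.clusters.append([(emb, qa_text, score)])
```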
Three‑Path Query Strategy
Direct reuse: if the similarity between the query and a high‑quality QA pair is ≥ δ, return the stored answer verbatim.
Reference generation: if the similarity falls in τ ≤ sim < δ, feed the retrieved QA pair to the LLM as context and let it rewrite the answer.
Avoid‑pitfall generation: when no high‑quality match exists, combine low‑quality QA pairs (as negative examples) with official documentation to prompt the LLM for a reliable answer (a routing sketch follows this list).
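The routing reduces to a short dispatcher. Everything below is illustrative scaffolding: `high_quality_store`, `low_quality_store`, `llm_answer`, and the threshold values are hypothetical stand‑ins, since the paper does not publish this interface.

```python
# Sketch of the three-path dispatch. All helpers are hypothetical.
DELTA, TAU = 0.9, 0.7  # assumed threshold values, with τ < δ

def answer(query: str) -> str:
    hit = high_quality_store.most_similar(query)   # hypothetical API
    if hit is not None and hit.sim >= DELTA:
        # Path 1: direct reuse of the stored community answer.
        return hit.answer
    if hit is not None and TAU <= hit.sim < DELTA:
        # Path 2: reference generation — the LLM rewrites the answer
        # with the retrieved QA pair as context.
        return llm_answer(query, context=[hit.qa_text])
    # Path 3: avoid-pitfall generation — low-quality QA pairs serve as
    # negative examples alongside official documentation.
    negatives = low_quality_store.most_similar(query, k=3)
    docs = search_static(query, k=5)
    return llm_answer(query, context=docs, negative_examples=negatives)
```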
Adaptive Temperature
Compute the variance Δ of the quality scores of the retrieved answers.
If Δ is small (answers are consistent), increase the LLM temperature to encourage diversity.
If Δ is large (answers diverge), lower the temperature to favor reliability (see the sketch below).
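A minimal sketch of this rule, assuming a linear interpolation between temperature bounds; the paper specifies only the direction of the mapping, so `T_MIN`, `T_MAX`, and `VAR_CAP` are illustrative choices.

```python
# Map the variance Δ of retrieved answers' quality scores to a
# sampling temperature: small Δ -> diverse, large Δ -> reliable.
import numpy as np

T_MIN, T_MAX = 0.2, 1.0   # assumed temperature bounds
VAR_CAP = 0.1             # assumed variance at which T reaches T_MIN

def adaptive_temperature(scores: list[float]) -> float:
    delta = float(np.var(scores))           # Δ: variance of quality scores
    ratio = min(delta / VAR_CAP, 1.0)       # normalize to [0, 1]
    return T_MAX - ratio * (T_MAX - T_MIN)  # linear interpolation (assumed)

# e.g. adaptive_temperature([0.80, 0.81, 0.79]) -> near T_MAX (consistent)
#      adaptive_temperature([0.20, 0.90, 0.50]) -> near T_MIN (divergent)
```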
Experimental Validation
Datasets & Metrics
MSQA – Microsoft technical domain; 557k knowledge‑base chunks; 9,518 initial QA pairs; 571 test questions.
ProCQA – Lisp programming; 14k knowledge‑base chunks; 3,107 initial QA pairs; 346 test questions.
PolarDBQA – PolarDB database; 1.4k knowledge‑base chunks; 1,395 initial QA pairs; 153 test questions.
Evaluation metrics:
Semantic similarity: BERTScore F1 and embedding cosine similarity.
Lexical overlap: BLEU and ROUGE‑L.
Efficiency: average latency per query, in seconds (a metric‑computation sketch follows).
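These metrics can be reproduced with off‑the‑shelf libraries; the choices below (bert-score and Hugging Face evaluate) are assumptions, not the paper's tooling.

```python
# Hedged sketch of the evaluation metrics on a toy prediction/reference pair.
from bert_score import score as bertscore
import evaluate

preds = ["a generated answer about configuring the service"]
refs = ["a reference answer about configuring the service"]

_, _, f1 = bertscore(preds, refs, lang="en")   # BERTScore F1 per pair
bleu = evaluate.load("bleu").compute(predictions=preds, references=refs)
rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)

print(float(f1.mean()), bleu["bleu"], rouge["rougeL"])
```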
Main Results
Semantic similarity improved by 2.1 %–25.9 % over strong baselines (DPR, BM25, RAG).
Average latency reduced by 8.7 %–23.3 % compared with the second‑best method.
After 10 iterative rounds on ProCQA, the chunk‑growth rate dropped from 20.23 % to 2.06 %.
Ablation studies: removing any component (high‑quality store, centroid memory, or adaptive temperature) degrades BERT‑Score or increases latency.
Key Conclusions & Industrial Implications
Effectiveness: up to +25.9 % semantic similarity versus DPR/BM25/RAG.
Efficiency: latency can be further cut by up to 52 % in high‑concurrency scenarios.
Storage: chunk growth suppressed below 2.1 %, saving roughly 90 % of storage.
Modularity: the LLM, embedding model, scorer, and vector store are interchangeable.
“The core value of ComRAG lies not in the model itself but in explicitly modeling time and quality through a centroid‑memory mechanism within the retrieval‑generation pipeline.”
Paper: https://arxiv.org/abs/2506.21098