VECTOR ORTHOGONAL RESONANCE-TUNED EXTRACTION RAG
The only RAG that kills semantic drift and context poisoning simultaneously — through a spiral tri-vector pipeline with causal grounding.
Standard RAG systems fail in two fundamental ways that existing methods cannot address simultaneously.
A retrieved chunk is semantically similar to the query but causally irrelevant. Cosine similarity cannot distinguish cause from effect. An answer about the 2008 crisis retrieves consequences (foreclosures) instead of root causes (CDO tranching failures) — they share the same vocabulary.
Even when the correct chunk is retrieved, surrounding irrelevant passages dilute it. The LLM attends to all context simultaneously — 7 poisoned chunks split its attention from the 3 correct ones. This worsens with longer context windows (GPT-4's 128K is catastrophic without CPG).
Click any layer node to see detailed explanation, formula, and configuration options.
The corpus is chunked into passages (512 tokens, 64-token overlap). For each chunk: (1) a parse tree is extracted by spaCy for the syntactic arm; (2) causal connective density and causal verb counts are computed for the causal arm; (3) FAISS indexes semantic embeddings for approximate nearest-neighbor search. The causal dependency graph is built globally over all chunks to enable CCB depth assignment.
VortexRAG(corpus="your_docs/").index() — runs this layer.
Three orthogonal arms are computed for every query and chunk. Semantic arm (α=0.50): SBERT all-mpnet-base-v2, 768-dim, captures meaning. Syntactic arm (β=0.25): 64-dim projection from POS distribution, dependency arc types, clause depth, sentence count — captures grammatical structure. Causal arm (γ=0.25): 32-dim projection from causal connective density, causal verb count, entity co-occurrence in causal patterns — captures causal chain fingerprint.
TVE_score = α·cos(v_sem_q, v_sem_c) + β·cos(v_syn_q, v_syn_c) + γ·cos(v_cau_q, v_cau_c)
Domain presets automatically tune α/β/γ. Code domain: β=0.45 (syntactic dominant). Scientific: γ=0.40 (causal dominant).
Retrieval is modeled as a spiral probability surface. Each candidate chunk gets a spiral rank: TVE base relevance × radial decay × angular alignment. Radial decay e^(−λr) discounts candidates far from the query centroid. Angular alignment cos(nθ) rewards chunks in the same directional quadrant — and goes negative for angularly opposed chunks, actively suppressing off-topic semantic clusters.
Adaptive λ: max(0.05, 0.5·log₁₀(10000/N)) — tighter cone for small corpora, broader for large ones.
Returns 200 candidates to SDC/CPG for further filtering.
Computes the drift vector D = v_cau(q) − v_cau(c_i) for each candidate. The causal arm encodes the directional type of causal chain — if the query asks about a root cause and the chunk describes an observable consequence, their causal vectors point in different directions. SDS = 1 − tanh(‖D‖/τ) ∈ [0,1]. Chunks with SDS < δ_SDC (0.72) are rejected.
Temperature τ is domain-tuned: τ=0.30 (scientific) to τ=1.20 (creative). Lower τ = stricter gate. The tanh function provides a steep slope near zero and hard saturation for large drifts.
Even after SDC, the collective context may still be poisoned. CPG computes the Effective Signal Ratio (ESR) of the window. Softmax weights w_i approximate the LLM's attentional bias — high-scored chunks are more poisonous when irrelevant. The iterative greedy purging removes the worst chunk each round until ESR ≥ 3.5.
Greedy optimality proof: P(W,q) is linear in each chunk's contribution, so removing the maximum-contribution chunk maximally increases ESR per step — no non-greedy removal can improve ESR faster.
Multiplicative Φ-score fuses all three quality signals. The multiplicative structure enforces a "no weak link" policy: a chunk with TVE=0.95 but SDS=0.05 scores ~0.19 multiplicatively vs ~0.60 additively. Domain presets tune α/β/γ exponents: scientific uses β=0.40 (SDS dominant) while creative uses α=0.60 (TVE dominant).
Optional MMR diversity: select_top_m_diverse(lambda=0.5) trades relevance for diversity among selected chunks.
Orders the final m chunks by pos = rank(Φ̃) × causal_depth. Root cause chunks (depth=0) appear first regardless of Φ̃ rank — solving the "Lost in the Middle" LLM attention bias (Liu et al., 2023). Causal depth is assigned via shortest-path traversal of the global causal graph. Deduplication removes near-duplicate chunks (cosine ≥ 0.92 on semantic arm) before ordering.
Computes ΔR = 1 − ROUGE-L × NLI for the generated answer against W*. ROUGE-L uses Longest Common Subsequence — robust to paraphrasing. NLI uses DeBERTa-v3 CrossEncoder to verify logical entailment. The multiplicative product requires both lexical fidelity AND logical grounding simultaneously. If ΔR > 0.15, re-rank → regenerate (max 3 iterations, return best ΔR seen).
Sentence-level verification and citation tracing identify which specific claims are hallucinated and which context chunk supports each answer sentence.
Three orthogonal arms capture semantic meaning, syntactic structure, and causal dependency simultaneously.
TVE = α·cos(sem) + β·cos(syn) + γ·cos(cau)Spiral probability surface ranks 200 candidates; negative cos(nθ) actively suppresses off-topic clusters.
spiral_rank = TVE·e^(−λr)·cos(nθ)Causal drift vector gates each chunk; domain-tuned τ sets sensitivity from strict (scientific) to lenient (creative).
SDS = 1 − tanh(‖D‖/τ) ≥ 0.72Softmax-weighted ESR measures collective window toxicity; greedy purge is proven optimal for ESR maximization.
ESR = Σ SDS·w / (P + ε) ≥ 3.5Multiplicative Φ-score — every quality dimension must be strong; no rescue effect from a single high score.
Φ = TVE^α × SDS^β × ESR^γpos = rank × depth places root causes first, fixing LLM positional attention bias (Liu et al., 2023).
pos = rank(Φ̃) × causal_depthCloses the loop — if ΔR exceeds the threshold, re-rank and regenerate (max 3× for 94% hallucination fix rate).
ΔR = 1 − ROUGE-L × NLI ≤ 0.15Click each layer tab for the complete mathematical derivation, parameter analysis, and design rationale.
Feature engineering: The parse feature vector φ_parse ∈ ℝᵖ contains: POS tag distribution (17 UPOS tags), dependency relation distribution (40 UD relations), mean dependency arc length, sentence depth (max parse tree depth), clause count, passive voice indicator, question word presence, and negation count. Total p=64 features before projection.
The causal feature vector φ_causal ∈ ℝq contains: causal connective count (normalized by sentence length), causal verb density (cause/enable/trigger/lead to etc.), entity co-occurrence in syntactic causal positions (nsubj of causal verb), temporal ordering marker count, and effect marker count. Total q=32 features.
The polar coordinate system is defined in the semantic embedding space. The query vector defines the reference direction (θ=0). Each candidate's angular position θ_i measures how far it deviates from the query direction. The spiral tightness n ∈ {1,2,3} controls how quickly angular reward drops off: n=1 gives broad coverage, n=3 gives a tight precision cone.
| N (corpus) | λ | Cone width | Recall@100 |
|---|---|---|---|
| 100 | 1.00 | Tight | 91% |
| 1,000 | 0.65 | Medium | 88% |
| 10,000 | 0.50 | Standard | 85% |
| 100,000 | 0.25 | Broad | 82% |
The drift vector D is signed and directional: its direction encodes the type of causal mismatch. Temporal drift (query asks about past cause, chunk describes present consequence) produces drift vectors pointing "forward" in the temporal dimension of causal space. Entity substitution drift (query about entity A's mechanism, chunk about entity B's similar mechanism) produces lateral drift. Relation-flip drift (cause/effect inversion) produces anti-parallel drift vectors.
Vectorized batch computation: For N candidates, SDS is computed as a single matrix operation: construct D_matrix ∈ ℝ^(N×32), compute row-wise L2 norms, apply tanh, subtract from 1.0. O(N·d) time with full GPU utilization.
| Domain | τ | Accepts at ‖D‖=0.5 | Rejects if ‖D‖ > |
|---|---|---|---|
| Scientific | 0.30 | SDS=0.57 (rejected) | 0.45 |
| Medical | 0.35 | SDS=0.63 (rejected) | 0.53 |
| Legal | 0.40 | SDS=0.69 (borderline) | 0.60 |
| General | 0.80 | SDS=0.89 (accepted) | 1.20 |
| Creative | 1.20 | SDS=0.94 (accepted) | 1.80 |
The softmax weights w_i approximate what the LLM actually attends to. A high-scored but irrelevant chunk is more poisonous than a low-scored one because the LLM's attention mechanism (via position in prompt and relevance signals in few-shot examples) weights it more. Using uniform weights would undercount the damage done by highly-ranked poison.
The exponents α, β, γ control which quality dimension dominates. In scientific domains (β=0.40, SDS dominant), causal precision is the bottleneck — a slightly lower TVE score is acceptable if the chunk is causally precise. In customer support (α=0.55, TVE dominant), user intent matching is the bottleneck.
MMR diversity selection: select_top_m_diverse(lambda=0.5) implements Maximal Marginal Relevance — selected chunks are penalized for similarity to already-selected ones. This prevents m near-duplicate chunks from consuming the entire context window when a corpus has many similar passages.
The causal dependency graph G_causal is built by: (1) extracting named entities and events from all chunks; (2) detecting causal edges via causal verb patterns (X causes Y, X leads to Y, X triggers Y, because of X, Y); (3) weighting edges by causal verb density and connective count. The shortest path from the query's key entity e_q to each chunk's primary entity gives the causal depth.
Causal depth bonus: Chunks with high causal verb density (≥ threshold) receive a depth−causal_depth_bonus reduction (default: −2 depth units). This promotes causally-rich chunks upward in the ordering regardless of graph position, capturing "transition" chunks that describe causal mechanisms.
ROUGE-L is implemented from scratch using O(m·n) space-optimized LCS (row-by-row dynamic programming). This avoids any external scoring library dependency. ROUGE-1 and ROUGE-2 (n-gram overlap F1) are also computed for analysis, though ΔR uses ROUGE-L specifically.
Sentence-level analysis: sentence_level_verify() splits the answer into sentences and computes per-sentence ΔR. citation_trace() assigns each sentence to its best-supporting context chunk [C1]...[Cm] by per-chunk ROUGE-L, enabling fine-grained hallucination attribution.
Constraint (1) is enforced by CPG's greedy purge. Constraint (2) is enforced by SDC's gate. Constraint (3) is enforced by FV's regeneration loop. The three constraints are applied in sequence (SDC → CPG → FV), reducing the feasible set at each stage. The objective Φ̃(W*) is maximized by RFG's selection among the feasible set remaining after all constraints.
The full pipeline is a constrained combinatorial optimization: find the subset W* of m chunks from the VRC pool that maximizes average Φ̃ while satisfying all three constraints. The greedy RFG+CPG combination is provably optimal for the ESR constraint and produces a high-Φ̃ solution for the objective.
Theorem: The greedy algorithm (remove argmin SDS at each step) maximizes ESR improvement per removal step.
Proof: ΔESR(j) is maximized when s_j is minimized and p_j is maximized simultaneously. Since s_j = SDS_j·w_j and p_j = (1−SDS_j)·w_j, and assuming approximately uniform w_j across candidates near the decision boundary, minimizing SDS_j simultaneously minimizes s_j and maximizes p_j. Thus argmin SDS = argmax ΔESR. ∎
Monotonicity: ESR(W') ≥ ESR(W) for any W' = W \ {c_j} where c_j = argmin SDS. The sequence of ESR values during purging is strictly monotonically increasing (assuming ε → 0).
Each domain preset represents a Pareto-optimal point in the (causal precision, semantic coverage, syntactic rigor) trade-off space, empirically tuned on domain-specific benchmarks.
| Domain | α | β | γ | τ | θ_CPG | Primary bottleneck |
|---|---|---|---|---|---|---|
| scientific | 0.40 | 0.20 | 0.40 | 0.30 | 4.0 | Causal chain precision |
| medical | 0.45 | 0.15 | 0.40 | 0.35 | 5.0 | Biological mechanism fidelity |
| legal | 0.35 | 0.30 | 0.35 | 0.40 | 4.5 | Statutory structure + causal chain |
| cybersecurity | 0.35 | 0.30 | 0.35 | 0.45 | 4.0 | Exploit chain stage ordering |
| financial | 0.45 | 0.25 | 0.30 | 0.50 | 3.5 | Market semantic context |
| code | 0.30 | 0.45 | 0.25 | 0.60 | 3.5 | Syntactic/AST structure |
| educational | 0.55 | 0.20 | 0.25 | 0.65 | 3.0 | Conceptual coverage |
| general | 0.50 | 0.25 | 0.25 | 0.80 | 3.5 | Balanced |
| historical | 0.45 | 0.20 | 0.35 | 0.90 | 3.0 | Event causal chains |
| customer | 0.60 | 0.15 | 0.25 | 0.95 | 3.0 | User intent matching |
| creative | 0.65 | 0.20 | 0.15 | 1.20 | 2.5 | Thematic association |
SDCEvaluator.calibrate_tau(pairs, target_acceptance=0.72) — binary search over τ to achieve a desired acceptance rate on labeled (query, chunk, label) pairs from your domain. This is the recommended approach when deploying VORTEXRAG on a new domain not covered by the presets.The dominant cost at query time is SDC batch scoring (O(k·d_cau) = O(6400) vectorized matrix ops) and CPG purge (O(k²) worst case = O(40000) scalar comparisons). In practice, CPG typically converges in 3–5 steps, making the average-case cost O(5k) = O(1000).
Memory: FAISS index stores N × 768 float32 = 3MB per 1000 documents. The causal graph stores O(|E|) edges where |E| ≪ N² in practice (sparse causal connections). Total memory overhead vs standard RAG: +O(N) for parse features, +O(|E|) for causal graph.
Evaluated on NaturalQuestions, HotpotQA multi-hop, MuSiQue, and 2WikiMultiHopQA. All systems use all-mpnet-base-v2 as semantic encoder on A100 GPU.
| System | EM | F1 | Faithfulness | Latency |
|---|---|---|---|---|
| Naive RAG | 61.2 | 68.4 | 0.71 | 120ms |
| BM25 + Re-rank | 59.8 | 66.1 | 0.69 | 95ms |
| HyDE | 64.1 | 71.8 | 0.74 | 340ms |
| CRAG | 66.9 | 74.3 | 0.78 | 290ms |
| Self-RAG | 68.4 | 75.9 | 0.81 | 410ms |
| VORTEXRAG | 74.8 | 82.6 | 0.94 | 185ms |
| Configuration | EM | F1 | Faithfulness |
|---|---|---|---|
| Baseline (cosine top-k) | 61.2 | 68.4 | 0.71 |
| + TVE only | 65.3 | 72.1 | 0.75 |
| + TVE + VRC | 67.8 | 74.9 | 0.78 |
| + TVE + VRC + SDC | 70.4 | 78.2 | 0.83 |
| + TVE + VRC + SDC + CPG | 72.1 | 80.3 | 0.88 |
| All layers (Full VORTEXRAG) | 74.8 | 82.6 | 0.94 |
| Dataset | Metric | Naive RAG | CRAG | VORTEXRAG | Δ vs Naive |
|---|---|---|---|---|---|
| NaturalQuestions | EM | 58.4 | 64.2 | 71.3 | +12.9 |
| NaturalQuestions | F1 | 65.1 | 71.8 | 79.4 | +14.3 |
| HotpotQA (multi-hop) | EM | 52.6 | 59.7 | 68.9 | +16.3 |
| HotpotQA (multi-hop) | F1 | 61.3 | 68.4 | 77.8 | +16.5 |
| MuSiQue | EM | 41.8 | 48.9 | 57.2 | +15.4 |
| MuSiQue | F1 | 53.7 | 61.2 | 70.9 | +17.2 |
| 2WikiMultiHopQA | EM | 63.1 | 69.4 | 76.5 | +13.4 |
| 2WikiMultiHopQA | F1 | 70.8 | 76.9 | 83.7 | +12.9 |
VORTEXRAG ships with domain presets that auto-configure all 7 layers for optimal performance in each domain.
Constitutional questions require tracing precedents across decades. SDC prevents temporal/jurisdictional drift. CPG separates parallel legal threads (First Amendment bleeding into Fourth). CCB orders: foundational ruling → extension → application.
Drug mechanism queries require distinguishing parallel causal pathways. CPG separates mRNA and viral vector pathways. CCB orders: molecular mechanism → cellular effect → clinical outcome.
Python asyncio questions conflate compile-time and runtime semantics. TVE syntactic arm (β=0.45) extracts structural patterns distinguishing grammar from event loop state. SDC filters based on causal mechanism.
Scientific QA conflates observable properties with root causes. Causal TVE arm (γ=0.40) distinguishes "what causes X" from "what is observed when X happens". SDC τ=0.30 is the strictest domain.
Financial queries must distinguish correlation from causation. TVE causal arm detects temporal ordering and mechanism language. CPG prevents simultaneous competing causal narratives in the same context window.
Explanations need clear conceptual progression: prerequisite → core concept → application. CCB's causal depth ordering maps to conceptual difficulty levels, creating a coherent "textbook explanation" structure from retrieved chunks.
Support queries need the exact product version, configuration, and symptom match. CPG separates support threads by root cause. FV verifies the answer addresses the specific stated issue (ΔR ≤ 0.10 strict mode).
Vulnerability queries require distinguishing attack vector → exploit mechanism → impact → mitigation. SDC strict mode (τ=0.45) enforces causal stage separation. CCB orders the exploit chain correctly: vector first, mechanism, impact, then mitigation.
Historical causation queries attract pre-war causes, post-war consequences, and parallel events — all semantically similar. SDC (τ=0.90) allows moderate drift while filtering post-war narrative from pre-war causal analysis.
Enterprise KBs accumulate stale documents — current and superseded policies share vocabulary. FV detects when stale chunks poison the generation (ΔR increases as answer contradicts current W*) and triggers regeneration.
Click each test to see the full pipeline trace — where standard RAG fails and how each VORTEXRAG layer fixes it.
# Install
pip install "vortexrag[full]"
python -m spacy download en_core_web_sm
# Basic usage
from vortexrag import VortexRAG
rag = VortexRAG(corpus="your_docs/")
rag.index()
result = rag.query("What caused the 2008 financial crisis?")
print(result.answer)
print(f"ΔR={result.delta_r:.4f} ESR={result.esr:.3f} {result.latency_ms:.0f}ms")
# Domain-specific: medical
from vortexrag import VortexRAG, VortexRAGConfig
config = VortexRAGConfig(domain="medical") # tau=0.35, theta_cpg=5.0 auto-set
rag = VortexRAG(corpus="pubmed/", config=config)
rag.index()
result = rag.query("What is the mechanism of ACE inhibitors in heart failure?")
# With custom LLM (OpenAI)
from openai import OpenAI
client = OpenAI()
def llm_fn(context: str, query: str) -> str:
return client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"Answer using only:\n\n{context}"},
{"role": "user", "content": query},
]
).choices[0].message.content
rag = VortexRAG(corpus="case_files/", config=VortexRAGConfig(domain="legal"), llm_fn=llm_fn)
rag.index()
result = rag.query("Did Brown v. Board apply to public universities before 1964?")