# 🧠 Cognitive Cache Engineering: Weaponizing LLM Memory for RF Scythe

**Status:** ✅ Phase 1 Implementation Complete  
**Architecture:** Multi-Tier Semantic Memory + Attention-Aware Eviction

---

## The Vision

The techniques used to optimize Large Language Model (LLM) inference—specifically **KV Cache compression**—map directly onto the problem of **longitudinal actor tracking** in cyber-physical environments. 

Instead of treating observations as disposable telemetry, we treat them as **tokens in a behavioral context window**. This allows SCYTHE to evolve from tracking "devices" to tracking **"behavioral entities"**.

---

## 🏛️ Multi-Tier Semantic Memory

We have moved away from an ephemeral, flat memory model toward a three-tier hierarchy:

| Tier | Type | Purpose | Persistence |
|------|------|---------|-------------|
| **HOT** | Active Clusters | High-frequency updates, live trajectories. | In-memory (`MacClusterEngine.clusters`) |
| **WARM** | Recent History | Summarized continuity, recent behavioral signatures. | Swapped-out clusters in `CognitiveCacheEngine.warm_clusters` |
| **COLD** | Archival | Compressed trajectory primitives, longitudinal actor memory. | Disk-persisted (`CognitiveCacheEngine.cold_archive`) |
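
The tier containers above can be sketched minimally as follows. The field names beyond `warm_clusters` and `cold_archive` (e.g., `hot_clusters`, the `demote`/`tier_of` helpers, and the in-memory dict standing in for disk persistence) are illustrative assumptions, not SCYTHE's actual internals:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Tier(Enum):
    HOT = "hot"    # active clusters, high-frequency updates
    WARM = "warm"  # summarized recent history
    COLD = "cold"  # archival trajectory primitives

@dataclass
class CognitiveCacheEngine:
    """Minimal sketch of the three tier containers described above."""
    hot_clusters: dict = field(default_factory=dict)   # cluster_id -> live state
    warm_clusters: dict = field(default_factory=dict)  # cluster_id -> summary
    cold_archive: dict = field(default_factory=dict)   # stands in for disk persistence

    def demote(self, cluster_id: str) -> None:
        """Move a cluster one tier down: HOT -> WARM -> COLD."""
        if cluster_id in self.hot_clusters:
            self.warm_clusters[cluster_id] = self.hot_clusters.pop(cluster_id)
        elif cluster_id in self.warm_clusters:
            self.cold_archive[cluster_id] = self.warm_clusters.pop(cluster_id)

    def tier_of(self, cluster_id: str) -> Optional[Tier]:
        """Report which tier currently holds a cluster, if any."""
        if cluster_id in self.hot_clusters:
            return Tier.HOT
        if cluster_id in self.warm_clusters:
            return Tier.WARM
        if cluster_id in self.cold_archive:
            return Tier.COLD
        return None
```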

---

## ✂️ Semantic Eviction (Attention-Aware Pruning)

We no longer evict data purely by age. Instead, we use a **retention_score** analogous to attention salience in KV caches:

```python
retention_score = (
    0.35 * confidence +
    0.15 * novelty +
    0.20 * recurrence +
    0.20 * threat_weight +
    0.10 * motion_consistency
)
```

**Heavy Hitters (H2) for Actors:**
- High-confidence, high-threat, or recurring actors remain in the **HOT** cache longer, regardless of age.
- "Boring" stationary entities with low novelty decay rapidly to the **WARM** or **COLD** tiers.
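
A runnable sketch of this eviction policy, using the weights above. The threshold values and the assumption that each feature is normalized to [0, 1] are illustrative, not SCYTHE's actual tuning:

```python
# Thresholds are illustrative assumptions, not production values.
HOT_THRESHOLD = 0.55
WARM_THRESHOLD = 0.25

def retention_score(actor: dict) -> float:
    """Attention-style salience; each feature is assumed normalized to [0, 1]."""
    return (
        0.35 * actor["confidence"] +
        0.15 * actor["novelty"] +
        0.20 * actor["recurrence"] +
        0.20 * actor["threat_weight"] +
        0.10 * actor["motion_consistency"]
    )

def target_tier(actor: dict) -> str:
    """Heavy hitters stay HOT regardless of age; low-salience actors decay."""
    score = retention_score(actor)
    if score >= HOT_THRESHOLD:
        return "HOT"
    if score >= WARM_THRESHOLD:
        return "WARM"
    return "COLD"
```

Note that age never appears in the score: a recurring high-threat actor stays HOT indefinitely, while a stationary low-novelty one falls through to COLD.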

---

## 📉 Low-Rank Actor Compression (Trajectory LoRA)

Instead of storing every GPS point, we compress histories into **motion primitives**:

- **Motion Basis Vectors**: "stationary-periodic", "linear-transit", "vehicular-high-speed".
- **Drift Tensors**: Low-rank spatial variance representations.
- **Impact**: 95%+ reduction in trajectory data size while preserving predictive value for DOMA forecasting.
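
One way to realize a rank-1 "drift tensor" is to summarize a track by its mean and the principal axis of its 2x2 spatial covariance, reducing storage from 2·N floats to a constant handful. The function names and the choice of a closed-form principal-axis fit are assumptions for illustration; SCYTHE's actual primitives may differ:

```python
import math

def compress_trajectory(points):
    """Rank-1 sketch: summarize (lat, lon) points by their mean and dominant
    motion direction (principal axis of the 2x2 covariance matrix)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # 2x2 covariance entries
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # principal-axis angle of a 2x2 symmetric matrix (closed form)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # extent of the track along the principal axis
    ts = [(p[0] - mx) * dx + (p[1] - my) * dy for p in points]
    return {"mean": (mx, my), "basis": (dx, dy), "t_min": min(ts), "t_max": max(ts)}

def reconstruct_point(primitive, t):
    """Project back from the 1-D primitive to an approximate (lat, lon)."""
    (mx, my), (dx, dy) = primitive["mean"], primitive["basis"]
    return (mx + t * dx, my + t * dy)
```

For a near-linear transit track, the primitive preserves heading and extent (what DOMA forecasting needs) while discarding the individual fixes.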

---

## 📡 Semantic Delta Streaming

Our streaming protocol has evolved to reduce bandwidth and redraw pressure:

1.  **Full State**: Sent only on basis change or large spatial jumps.
2.  **Semantic Delta**: Sent when motion is consistent with the current basis.
    -   Example: `{"op": "delta", "motion_basis": "vehicular-westbound", "delta": [0.01, 0.0, 0.0]}`
    -   **Result**: Drastic reduction in browser/mobile battery burn and `deck.gl`/`Filament` churn.
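
The encoder side of this protocol can be sketched as a small stateful class. The jump threshold, field names, and JSON shape beyond the `delta` example above are illustrative assumptions:

```python
import json

JUMP_THRESHOLD = 0.5  # degrees; an assumed bound on a "large spatial jump"

class DeltaStreamer:
    """Emit a full state on basis change or large jump, else a compact delta."""
    def __init__(self):
        self.last_pos = None
        self.last_basis = None

    def encode(self, pos, motion_basis):
        lat, lon = pos
        jump = (
            self.last_pos is None
            or abs(lat - self.last_pos[0]) > JUMP_THRESHOLD
            or abs(lon - self.last_pos[1]) > JUMP_THRESHOLD
        )
        if jump or motion_basis != self.last_basis:
            msg = {"op": "full", "pos": [lat, lon], "motion_basis": motion_basis}
        else:
            delta = [round(lat - self.last_pos[0], 6),
                     round(lon - self.last_pos[1], 6)]
            msg = {"op": "delta", "motion_basis": motion_basis, "delta": delta}
        self.last_pos, self.last_basis = (lat, lon), motion_basis
        return json.dumps(msg)
```

The receiver only re-renders the changed coordinates on a `delta`, which is where the `deck.gl`/`Filament` churn savings come from.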

---

## 🏗️ Embedding Hierarchy (Cognition Layers)

We have established a three-layer hierarchy for semantic retrieval:

1.  **Tier 1 — Reflex (384-dim)**: Fast, cheap, edge-deployable. (e.g., Granite English). Used for real-time MAC continuity and spatial stitching.
2.  **Tier 2 — Analytical (768-dim)**: Richer context. (e.g., EmbeddingGemma). Used for actor attribution and campaign similarity.
3.  **Tier 3 — Strategic (LLM)**: Reasoning. (e.g., Llama 3.2). Used for operator narratives and hypothesis generation.
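
A dispatch over these layers might look like the sketch below. The task names and the escalate-on-unknown policy are assumptions for illustration, not SCYTHE's actual routing:

```python
from enum import Enum

class CognitionTier(Enum):
    REFLEX = 1      # 384-dim, fast, edge-deployable
    ANALYTICAL = 2  # 768-dim, richer context
    STRATEGIC = 3   # LLM reasoning

# Illustrative routing table mapping the use cases above to tiers.
TASK_ROUTES = {
    "mac_continuity": CognitionTier.REFLEX,
    "spatial_stitching": CognitionTier.REFLEX,
    "actor_attribution": CognitionTier.ANALYTICAL,
    "campaign_similarity": CognitionTier.ANALYTICAL,
    "operator_narrative": CognitionTier.STRATEGIC,
    "hypothesis_generation": CognitionTier.STRATEGIC,
}

def route(task: str) -> CognitionTier:
    """Escalate unknown tasks to the strategic tier rather than failing cheap."""
    return TASK_ROUTES.get(task, CognitionTier.STRATEGIC)
```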

---

## Next Steps

- [ ] **Persistent World-Model Integration**: Consolidation of COLD archive into Postgres/pgvector.
- [ ] **Spectral Analysis**: Detecting graph biconnectivity (bottlenecked paths).
- [ ] **Reinforcement Learning**: Penalizing cognitive drift in the `DualAgentOrchestrator`.

---

*This document serves as the foundation for SCYTHE's transition from telemetry dashboard to a continuously learning RF/network cognition organism.*
