Layers 1, 3 (partial), and 4 ship in the community engine on GitHub — the public
three-phase dream-consolidation system that built Mazemaker's name. Layers 2, 5, and 6
ship in the full engine on the managed pod. Together they form the cognition stack the
100-iteration benchmark loop ran on. None of these layers is “optional”: each
was added because the prior one saturated.
01
Sponge ingestion
Every turn is absorbed by a background thread. Session-end fact extraction stores durable decisions and preferences, never raw transcripts. Conflict detection runs at write time.
turn -> sponge worker -> session-end extract -> conflict gate -> store
Community engine. Ships in the open-source build.
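The flow above can be sketched roughly as follows. This is a minimal illustration, not the Mazemaker implementation: `SpongeWorker`, `toy_extract`, the dict-based store, and the "newest value wins, conflict is logged" policy are all assumptions made for the example.

```python
import queue
import threading


class SpongeWorker:
    """Hypothetical background ingester: buffers turns, stores only extracted facts."""

    def __init__(self, extract_facts, store=None):
        self._extract_facts = extract_facts  # callable: list[str] -> dict[str, str]
        self.store = dict(store or {})       # durable facts only, never raw turns
        self.conflicts = []                  # (key, old_value, new_value) tuples
        self._turns = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def absorb(self, turn):
        """Called from the chat loop; returns immediately."""
        self._turns.put(turn)

    def end_session(self):
        """Signal session end and wait for extraction to finish."""
        self._turns.put(None)  # sentinel
        self._thread.join()

    def _run(self):
        transcript = []
        while (turn := self._turns.get()) is not None:
            transcript.append(turn)
        # Session-end extraction, then a conflict gate at write time.
        for key, value in self._extract_facts(transcript).items():
            previous = self.store.get(key)
            if previous is not None and previous != value:
                self.conflicts.append((key, previous, value))
            self.store[key] = value  # assumed policy: newest value wins


def toy_extract(turns):
    """Stand-in extractor: keeps only 'I prefer ...' statements."""
    facts = {}
    for turn in turns:
        if turn.lower().startswith("i prefer "):
            facts["preference"] = turn[len("i prefer "):].rstrip(".")
    return facts


worker = SpongeWorker(toy_extract, store={"preference": "coffee"})
worker.absorb("Hello there")
worker.absorb("I prefer tea.")
worker.end_session()
```

After `end_session()` returns, the transcript buffer is discarded and only the extracted fact and the recorded conflict remain.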
02
Atomic Fact Extraction (AFE)
Four-stage formation pipeline. Stage A extracts atomic facts from markdown structure. Stage B runs spaCy NER over the raw text. Stage C uses an LLM to extract user-state facts (“user owns X”, “user prefers Y”) per session. Stage S crystallizes cross-source memories during the dream cycle. The corpus the recall pipeline searches over is the output of this layer, not the raw turns.
session -> A:markdown · B:NER · C:LLM · S:synthesis -> atomic-fact corpus
Full engine only. The Stage C user-statement bake is the lever that broke the R@5 = 0.7404 retrieval-tuning ceiling in the 100-iteration loop.
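As an illustration of what a Stage A pass over markdown structure might look like, the sketch below turns bullet items into facts tagged with their heading path. The function name, the tuple output shape, and the rule "one bullet equals one atomic fact" are assumptions for the example; the full engine's actual extraction rules are not described here.

```python
import re


def extract_markdown_facts(markdown):
    """Return (heading_path, fact) pairs, one per bullet item (assumed rule)."""
    facts = []
    heading_path = []
    for line in markdown.splitlines():
        heading = re.match(r"^(#{1,6})\s+(.*)$", line)
        if heading:
            level = len(heading.group(1))
            # Truncate the path to the parent level, then append this heading.
            heading_path = heading_path[: level - 1] + [heading.group(2).strip()]
            continue
        bullet = re.match(r"^\s*[-*+]\s+(.*)$", line)
        if bullet:
            facts.append((" > ".join(heading_path), bullet.group(1).strip()))
    return facts


notes = """# Kitchen
## Gear
- owns a cast-iron pan
- prefers gas stoves
# Travel
- dislikes red-eye flights
"""
facts = extract_markdown_facts(notes)
```

Keeping the heading path alongside each fact preserves the context a bare bullet would lose once it is separated from its document.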
03
Embedding — semantic, late-interaction, graph-aware
BGE-M3 (1024d, multilingual). On top: ColBERT @ 1.5 late-interaction reranking. On top of that: DAE — Dream-Augmented Embeddings, a second embedding built during NREM consolidation that weights toward graph neighbours. Three channels fused via Reciprocal Rank Fusion.
BGE-M3 1024d -> ColBERT @ 1.5 -> DAE (NREM-weighted) -> RRF fusion
BGE-M3: community. ColBERT @ 1.5 + DAE: full engine (Pro+) only. Together they take R@5 from 0.96 to 0.98 on LongMemEval-S. On the denser, rebake-enriched corpus, the optimal ColBERT weight shifted from 2.5 to 3.0 during era 3 of the benchmark loop.
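Reciprocal Rank Fusion itself is a standard technique: each channel contributes 1 / (k + rank) for every item it returns, and items are sorted by the summed score. A minimal sketch follows. The constant `k=60` is the conventional default from the RRF literature, and the optional per-channel `weights` argument is an assumption about how a channel multiplier could enter the fusion, not a description of the engine's code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, weights=None):
    """Fuse ranked lists of ids; returns ids sorted by fused score, best first."""
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]      # e.g. a dense-embedding channel
late = ["b", "a", "d"]       # e.g. a late-interaction channel
graph = ["b", "c", "a"]      # e.g. a graph-aware channel
fused = rrf_fuse([dense, late, graph])
```

Because RRF uses only ranks, the channels do not need comparable score scales, which is why it suits fusing heterogeneous retrievers.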
04
Three-phase dream consolidation
The original Mazemaker engine. NREM strengthens or prunes connections and runs Personalized PageRank (PPR) on GPU when CUDA is available. REM discovers bridges between isolated memories via batched recall. Insight detects communities in the graph and writes cluster-summary memories. Hop-2 reasoning lift: 0.00 → 1.00 R@10. Vector DBs cannot do this.
NREM (PPR + prune) · REM (orphan bridges) · Insight (clusters)
All three phases ship in Community in their lightweight pre-iter00 form. Pro adds ColBERT-driven NREM sampling, DAE-augmented Insight adjacency, and Stage S synthesis (layer 5) on top of this loop.
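For readers unfamiliar with PPR, the sketch below shows Personalized PageRank by plain power iteration on a small weighted graph. It is a CPU-only illustration under assumed conventions (adjacency as nested dicts, restart probability `alpha=0.15`, dangling mass returned to the seeds); it says nothing about how the engine batches this on GPU or how it prunes edges afterwards.

```python
def personalized_pagerank(graph, seeds, alpha=0.15, iterations=50):
    """graph: {node: {neighbour: weight}}; every neighbour must also be a key."""
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Every step, a fraction alpha of the walk teleports back to the seeds.
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                # Dangling node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - alpha) * rank[n] / len(seeds)
                continue
            total = sum(out.values())
            for m, w in out.items():
                nxt[m] += (1 - alpha) * rank[n] * w / total
        rank = nxt
    return rank


memory_graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0},
    "d": {},  # isolated memory, unreachable from the seed
}
scores = personalized_pagerank(memory_graph, seeds={"a"})
```

Nodes the seeded walk never reaches, such as `d` here, keep a score of zero. In a consolidation pass that is one plausible signal for treating a memory as an orphan.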
05
Synthesis crystallization
Episodic → semantic distillation. Stage S runs during the dream cycle, scans recent memories for cross-source patterns, and crystallizes durable semantic memories with explicit provenance edges back to every source. This is the bottleneck for long-horizon agent memory: vector stores cannot synthesize because they cannot reason over their own graph.
recent episodics -> cross-source pattern -> semantic crystal + provenance
Full engine only.
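The shape of a crystallized memory with provenance can be illustrated with a small data-structure sketch. Everything here is hypothetical: the `Episodic` and `Crystal` types, the pre-assigned `topic` field, the "at least two distinct sources" rule, and the string-join summary stand in for pattern detection and summarization steps that the source does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episodic:
    memory_id: str
    source: str   # e.g. a session or document id
    topic: str    # assumed to be assigned upstream
    text: str


@dataclass(frozen=True)
class Crystal:
    topic: str
    summary: str
    provenance: tuple  # ids of every episodic memory that fed this crystal


def crystallize(memories, min_sources=2):
    """Emit one semantic memory per topic seen in at least min_sources sources."""
    by_topic = defaultdict(list)
    for memory in memories:
        by_topic[memory.topic].append(memory)
    crystals = []
    for topic, group in sorted(by_topic.items()):
        if len({m.source for m in group}) < min_sources:
            continue  # single-source pattern: not cross-source, skip
        summary = " / ".join(m.text for m in group)  # placeholder for real synthesis
        crystals.append(Crystal(topic, summary, tuple(m.memory_id for m in group)))
    return crystals


recent = [
    Episodic("m1", "session-1", "diet", "skipped the cheese course"),
    Episodic("m2", "session-2", "diet", "asked for oat milk"),
    Episodic("m3", "session-2", "travel", "booked a train to Lyon"),
]
crystals = crystallize(recent)
```

The `provenance` tuple is the point of the example: each semantic memory can be traced back to, and invalidated along with, the episodes that produced it.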
06
Targeted re-formation
The operator-side layer. When a deployment hits a known weak question bucket — “the agent keeps missing preferences in the cooking domain” — the targeted re-formation tool diagnoses the specific gold sessions the formation pipeline missed and runs a query-conditional Stage C re-extraction. Per-deployment fine-tuning at API-call cost. No training, no labels, no GPU hours.
weak bucket -> diagnose missed sessions -> query-conditional rebake -> insert
Full engine only. Onboarded through the managed pod — not self-installable.
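The "diagnose missed sessions" step can be pictured as a set difference between the gold sessions for a weak bucket and what retrieval actually returns. The sketch below is an assumption-laden illustration: `diagnose_missed_sessions`, the `(query, gold_sessions)` bucket format, and the `retrieve` callable are invented for the example and are not the operator tool's interface.

```python
def diagnose_missed_sessions(bucket, retrieve, k=5):
    """Map each query to the gold session ids absent from its top-k results."""
    missed = {}
    for query, gold_sessions in bucket:
        retrieved = set(retrieve(query)[:k])
        gaps = sorted(set(gold_sessions) - retrieved)
        if gaps:
            missed[query] = gaps  # candidates for a query-conditional rebake
    return missed


# Toy retrieval results standing in for a live recall pipeline.
fake_index = {
    "favourite pan?": ["s4", "s9"],
    "preferred stove?": ["s2", "s7"],
}
cooking_bucket = [
    ("favourite pan?", ["s1"]),
    ("preferred stove?", ["s2"]),
]
report = diagnose_missed_sessions(
    cooking_bucket, lambda query: fake_index.get(query, []), k=2
)
```

The sessions listed in the report are the ones a targeted re-extraction would revisit, with the failing query supplied as context.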
// the bottleneck migrated upward
The 100-iteration benchmark loop is the empirical proof of the layered architecture.
For 72 iterations the bottleneck was retrieval — we tuned channel weights, intent
boost, ColBERT and DAE multipliers, and saturated at R@5 = 0.7404. Four structurally
different stacks landed on the same value to four decimal places. Then the bottleneck moved.
The next 9 iterations were entirely about layer 2 (Stage C re-extraction). The next
19 were entirely about layer 3 (rerank weights re-optimized for the rebake-enriched
corpus). Each era only became visible once the prior one had finished its work.
That migration is structurally impossible in a vector-DB system. There's only one
layer to optimize. When it saturates, the engine saturates. This is the
architectural difference between a retrieval plugin and a memory operating
system.
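For reference, the R@5 figures quoted in this section are recall-at-k scores. One common definition counts a query as a hit when any gold item appears in the top k results, and the sketch below implements that variant. Whether the benchmark loop uses this "any" variant or a stricter "all gold items" variant is not stated here, so treat the definition as an assumption.

```python
def recall_at_k(results, gold, k=5):
    """Fraction of queries with at least one gold id in their top-k results."""
    if not results:
        raise ValueError("results must contain at least one query")
    hits = 0
    for query_id, ranked in results.items():
        if set(ranked[:k]) & set(gold[query_id]):
            hits += 1
    return hits / len(results)


results = {
    "q1": ["s3", "s1", "s8"],
    "q2": ["s9", "s8", "s7"],
}
gold = {"q1": ["s1"], "q2": ["s2"]}
score = recall_at_k(results, gold, k=2)
```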
// what ships where
The community engine at
github.com/itsXactlY/mazemaker
ships layers 1, 3-partial (BGE-M3), and 4 (the three dream phases — NREM, REM,
Insight — in their lightweight pre-iter00 form, without ColBERT-driven sampling
or DAE-augmented adjacency). It runs. It hits R@5 ≈ 0.71 on LongMemEval-oracle. It's
MIT-friendly to self-install for hobby use.
The full engine (layers 2, 5, and 6, plus the rerank-feedback knob
surface tuned by the benchmark loop) ships only through managed pod
onboarding and is not self-installable. That is not because the code is more
complicated, but because the operator-side tooling (the targeted re-formation
diagnostic, the corpus-state-aware knob defaults) is the moat.