The Six-Layer Cognition Stack.

Most memory systems are one layer — cosine similarity over a vector store. The community build of Mazemaker on GitHub adds three more: NREM, REM, and Insight, the dream phases that gave Mazemaker its name. The full engine — the one that ran the 100-iteration benchmark loop to R@5 = 0.8426 — is six layers deep: sponge ingestion, atomic fact extraction, embedding with ColBERT + DAE, three-phase dream consolidation, synthesis crystallization, and targeted re-formation. Each layer was added because the prior one saturated.

The Six Layers

Layers 1, 3 (partial), and 4 ship in the community engine on GitHub — the public three-phase dream-consolidation system that built Mazemaker's name. Layers 2, 5, and 6 ship in the full engine on the managed pod. Together they form the cognition stack the 100-iteration benchmark loop ran on. None of these layers is “optional”: each was added because the prior one saturated.

01

Sponge ingestion

Every turn is absorbed by a background thread. Session-end fact extraction stores durable decisions and preferences, never raw transcripts. Conflict detection runs at write time.

turn -> sponge worker -> session-end extract -> conflict gate -> store

Community engine. Ships in the open-source build.
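
The flow above can be sketched as a queue-fed worker thread. This is a minimal illustration, not the engine's code: `SpongeWorker`, `extract`, and `has_conflict` are hypothetical names, and the extractor and conflict gate are toy stand-ins.

```python
import queue
import threading

class SpongeWorker:
    """Absorbs turns on a background thread; extracts facts at session end."""

    def __init__(self, extract, has_conflict):
        self._turns = queue.Queue()
        self._buffer = []
        self.store = []                     # stand-in for the durable store
        self._extract = extract             # hypothetical: list[str] -> list[str]
        self._has_conflict = has_conflict   # hypothetical: (fact, store) -> bool
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def absorb(self, turn):
        self._turns.put(turn)               # returns immediately to the caller

    def _run(self):
        while True:
            turn = self._turns.get()
            if turn is None:                # sentinel: the session has ended
                break
            self._buffer.append(turn)

    def end_session(self):
        self._turns.put(None)
        self._thread.join()
        # Only extracted facts are stored; raw turns are discarded.
        for fact in self._extract(self._buffer):
            if not self._has_conflict(fact, self.store):
                self.store.append(fact)
        return self.store

def extract(turns):
    # Toy extractor: keep turns that state a preference or a decision.
    return [t for t in turns if t.startswith(("I prefer", "We decided"))]

def has_conflict(fact, store):
    # Toy gate: an exact duplicate counts as a conflict.
    return fact in store

worker = SpongeWorker(extract, has_conflict)
for turn in ["hello", "I prefer tabs", "I prefer tabs", "We decided on SQLite"]:
    worker.absorb(turn)
facts = worker.end_session()
```

The small talk and the duplicate never reach the store; only the two durable statements survive.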

02

Atomic Fact Extraction (AFE)

Four-stage formation pipeline. Stage A extracts atomic facts from markdown structure. Stage B runs spaCy NER over the raw text. Stage C uses an LLM to extract user-state facts (“user owns X”, “user prefers Y”) per session. Stage S crystallizes cross-source memories during the dream cycle. The corpus the recall pipeline searches over is the output of this layer, not the raw turns.

session -> A:markdown · B:NER · C:LLM · S:synthesis -> atomic-fact corpus

Full engine only. The Stage C user-statement bake is the lever that broke the R@5 = 0.7404 retrieval-tuning ceiling in the 100-iteration loop.
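
Stage A is the only stage that needs no model, so it is the easiest to illustrate. The sketch below assumes that "atomic facts from markdown structure" means turning each bullet into a standalone statement prefixed with its heading path; `stage_a_facts` is a hypothetical name and the real pipeline may differ.

```python
import re

def stage_a_facts(markdown):
    """Turn each bullet into a standalone fact, prefixed with its heading path."""
    facts, path = [], []
    for line in markdown.splitlines():
        heading = re.match(r"^(#+)\s+(.*)$", line)
        bullet = re.match(r"^\s*[-*]\s+(.*)$", line)
        if heading:
            level = len(heading.group(1))
            # Keep ancestors above this level, then append the new heading.
            path = path[: level - 1] + [heading.group(2).strip()]
        elif bullet:
            facts.append(" > ".join(path + [bullet.group(1).strip()]))
    return facts

doc = (
    "# Kitchen\n"
    "## Preferences\n"
    "- no cilantro\n"
    "- cast iron over non-stick\n"
    "# Work\n"
    "- standup at 09:30\n"
)
facts = stage_a_facts(doc)
```

Each fact carries enough context to be retrieved on its own, which is the property the recall pipeline depends on.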

03

Embedding — semantic, late-interaction, graph-aware

BGE-M3 (1024d, multilingual). On top: ColBERT @ 1.5 late-interaction reranking. On top of that: DAE — Dream-Augmented Embeddings, a second embedding built during NREM consolidation that weights toward graph neighbours. Three channels fused via Reciprocal Rank Fusion.

BGE-M3 1024d -> ColBERT @ 1.5 -> DAE (NREM-weighted) -> RRF fusion

BGE-M3: community. ColBERT @ 1.5 + DAE: full engine (Pro+) only — together they take R@5 from 0.96 to 0.98 on LongMemEval-S. At the rebake-enriched corpus density, the optimal ColBERT weight shifted from 2.5 to 3.0 — era 3 of the benchmark loop.
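
Late-interaction scoring is compact enough to show directly. The sketch below implements the standard ColBERT MaxSim score in plain Python. How Mazemaker combines it with the dense score is not stated here, so the additive `colbert_weight` multiplier in `rerank` is an assumption made for illustration.

```python
def maxsim(query_vecs, doc_vecs):
    """Late interaction: each query token takes its best-matching doc token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def rerank(query_vecs, candidates, dense_scores, colbert_weight=1.5):
    # Assumed combination: dense score plus weighted MaxSim.
    scored = {
        doc_id: dense_scores[doc_id] + colbert_weight * maxsim(query_vecs, vecs)
        for doc_id, vecs in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]              # two query-token vectors
candidates = {
    "a": [[1.0, 0.0], [0.5, 0.5]],            # covers both query tokens
    "b": [[0.0, 1.0], [0.0, 1.0]],            # covers only the second
}
dense = {"a": 0.2, "b": 0.9}

with_colbert = rerank(query, candidates, dense)
dense_only = rerank(query, candidates, dense, colbert_weight=0.0)
```

With the weight at zero, `b` wins on dense score alone; with late interaction switched on, `a` overtakes it because it matches both query tokens.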

04

Three-phase dream consolidation

The original Mazemaker engine. NREM strengthens or prunes connections and runs Personalized PageRank (PPR) on GPU when CUDA is available. REM discovers bridges between isolated memories via batched recall. Insight detects communities in the graph and writes cluster-summary memories. Hop-2 reasoning lift: 0.00 → 1.00 R@10. Vector DBs cannot do this.

NREM (PPR + prune) · REM (orphan bridges) · Insight (clusters)

All three phases ship in Community in their lightweight pre-iter00 form. Pro adds ColBERT-driven NREM sampling, DAE-augmented Insight adjacency, and Stage S synthesis on top — layer 5 above this loop.

05

Synthesis crystallization

Episodic → semantic distillation. Stage S runs during the dream cycle, scans recent memories for cross-source patterns, and crystallizes durable semantic memories with explicit provenance edges back to every source. The bottleneck for long-horizon agent memory; vector stores cannot synthesize because they cannot reason over their own graph.

recent episodics -> cross-source pattern -> semantic crystal + provenance

Full engine only.

06

Targeted re-formation

The operator-side layer. When a deployment hits a known weak question bucket — “the agent keeps missing preferences in the cooking domain” — the targeted re-formation tool diagnoses the specific gold sessions the formation pipeline missed and runs a query-conditional Stage C re-extraction. Per-deployment fine-tuning at API-call cost. No training, no labels, no GPU hours.

weak bucket -> diagnose missed sessions -> query-conditional rebake -> insert

Full engine only. Onboarded through the managed pod — not self-installable.

// the bottleneck migrated upward

The 100-iteration benchmark loop is the empirical proof of the layered architecture. For 72 iterations the bottleneck was retrieval — we tuned channel weights, intent boost, ColBERT and DAE multipliers, and saturated at R@5 = 0.7404. Four structurally different stacks landed at the same 4-decimal number. Then the bottleneck moved. The next 9 iterations were entirely about layer 2 (Stage C re-extraction). The next 19 were entirely about layer 3 (rerank weights re-optimized for the rebake-enriched corpus). Each era only became visible once the prior one had finished its work.

That migration is structurally impossible in a vector-DB system. There's only one layer to optimize. When it saturates, the engine saturates. This is the architectural difference between a retrieval plugin and a memory operating system.

// what ships where

The community engine at github.com/itsXactlY/mazemaker ships layers 1, 3-partial (BGE-M3), and 4 (the three dream phases — NREM, REM, Insight — in their lightweight pre-iter00 form, without ColBERT-driven sampling or DAE-augmented adjacency). It runs. It hits R@5 ≈ 0.71 on LongMemEval-oracle. It's MIT-friendly to self-install for hobby use.

The full engine (layers 2, 5, and 6, plus the rerank-feedback knob surface tuned by the benchmark loop) ships only through managed pod onboarding and is not self-installable. That is not because the code is more complicated; it is because the operator-side tooling (the targeted re-formation diagnostic, the corpus-state-aware knob defaults) is the moat.

The Controls Inside Each Layer

Six load-bearing implementation pieces, one per architectural layer. The benchmark page shows the per-iteration audit behind each.

01

Semantic storage

FastEmbed ONNX with intfloat/multilingual-e5-large writes 1024d vectors, then links nearby memories into weighted graph edges.

remember -> embed -> SQLite INSERT -> cosine kNN -> connections
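
A minimal sketch of that write path, using only the standard library. The table layout, the `remember` signature, and the `k` and `min_sim` defaults are assumptions; vectors are passed in directly and kept tiny, where the engine would call its embedder and use 1024 dimensions.

```python
import json
import math
import sqlite3

def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def remember(db, text, vector, k=2, min_sim=0.5):
    """Insert a memory, then link it to its k nearest existing neighbours."""
    rows = db.execute("SELECT id, vector FROM memories").fetchall()
    cur = db.execute(
        "INSERT INTO memories (text, vector) VALUES (?, ?)",
        (text, json.dumps(vector)),
    )
    new_id = cur.lastrowid
    sims = sorted(
        ((cosine(vector, json.loads(v)), mid) for mid, v in rows), reverse=True
    )
    for sim, mid in sims[:k]:
        if sim >= min_sim:                  # skip weak neighbours
            db.execute("INSERT INTO connections VALUES (?, ?, ?)", (new_id, mid, sim))
    return new_id

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT, vector TEXT)")
db.execute("CREATE TABLE connections (src INTEGER, dst INTEGER, weight REAL)")
remember(db, "likes espresso", [1.0, 0.0])
remember(db, "owns a bike", [0.0, 1.0])
remember(db, "drinks coffee daily", [0.9, 0.1])
edges = db.execute("SELECT src, dst FROM connections").fetchall()
```

The third memory links to the espresso memory and ignores the bike, so the graph edge carries the cosine similarity as its weight.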

02

Recall pipeline

GPU matmul when available, numpy fallback otherwise. Multi-channel retrieval fuses semantic, BM25, entity, temporal, PPR, and salience through Reciprocal Rank Fusion.

query -> FastEmbed -> GPU/CPU cosine -> RRF fusion -> ranked memory
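
Reciprocal Rank Fusion itself is a few lines. The sketch below uses the conventional constant k = 60 from the RRF literature; whether Mazemaker uses that value, or weights channels unevenly, is not stated here, so treat both as assumptions.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over channels of 1 / (k + rank)."""
    scores = {}
    for ranked in rankings.values():
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

channels = {
    "semantic": ["m1", "m2", "m3"],
    "bm25":     ["m2", "m1", "m4"],
    "ppr":      ["m2", "m3", "m1"],
}
fused = rrf(channels)
```

RRF only looks at ranks, so channels with incomparable score scales (cosine, BM25, PageRank mass) can be fused without normalisation.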

03

Spreading activation

mazemaker_think uses BFS or Personalized PageRank. PPR is the ranking-quality channel; removing it costs −0.13 MRR on the audit suite.

source memory -> graph walk -> decay -> activation trace
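
The BFS variant of that walk can be sketched as follows. The `decay` and `min_activation` values and the dictionary-of-dictionaries graph format are illustrative assumptions, not the parameters behind `mazemaker_think`.

```python
from collections import deque

def spread(graph, source, decay=0.5, min_activation=0.1):
    """graph: {node: {neighbour: edge_weight}}. Returns the activation trace."""
    activation = {source: 1.0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbour, weight in graph.get(node, {}).items():
            value = activation[node] * weight * decay
            # Keep only the strongest path into each node, above the floor.
            if value >= min_activation and value > activation.get(neighbour, 0.0):
                activation[neighbour] = value
                frontier.append(neighbour)
    return activation

graph = {
    "a": {"b": 1.0, "c": 0.4},
    "b": {"d": 0.8},
    "c": {"d": 1.0},
    "d": {},
}
trace = spread(graph, "a")
```

Node `d` is reachable through both `b` and `c`; the trace keeps the stronger of the two paths.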

04

Conflict supersession

Contradictory updates fuse or mark stale memories, preserving revision history instead of poisoning recall with duplicates. Without it, stale facts dominate winner @ 1 in 60% of cases.

new fact -> entity + semantic overlap -> supersede -> history
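
A toy version of the supersession step is shown below. The `Memory` fields, the Jaccard token overlap standing in for semantic similarity, and the 0.3 threshold are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    entity: str
    text: str
    stale: bool = False
    history: list = field(default_factory=list)

def token_overlap(a, b):
    # Jaccard overlap on lowercase tokens; a stand-in for semantic similarity.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def remember(store, entity, text, threshold=0.3):
    new = Memory(entity, text)
    for old in store:
        same_subject = old.entity == entity and not old.stale
        if same_subject and old.text != text and token_overlap(old.text, text) >= threshold:
            old.stale = True                       # superseded, not deleted
            new.history = old.history + [old.text]  # revision chain survives
    store.append(new)
    return new

store = []
remember(store, "user.city", "user lives in Berlin")
latest = remember(store, "user.city", "user lives in Lisbon")
live = [m.text for m in store if not m.stale]
```

Recall would then rank only the live memory, while the old value stays reachable through `history`.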

05

The Architect — cockpit

A wall of twelve monitors at architect.mazemaker.dev. Recall, dream replay, edge tension, top-touched, sessions, peers, MCP topology, Hermes chat. Boots only against your own pod; otherwise shows the threshold.

architect.mazemaker.dev -> 127.0.0.1:8765 -> your pod, your data

06

Compute, your choice

Pick cuda, mlx, cpu — or BYOK with sentence-transformers, FastEmbed, TF-IDF. Selection lives at mazemaker.dev; the pod adopts it within seconds via JWT compute claim → compute.toml → path-watch restart. Never silently demotes — if you pick cuda and the pod can’t see it, the pod fails loud.

mazemaker.dev pick -> JWT claim -> ~/.mazemaker/compute.toml -> engine
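
The "fails loud" rule amounts to refusing any fallback. A minimal sketch follows; `resolve_device` and `ComputeUnavailable` are hypothetical names, and device probing is replaced by a plain set of visible devices.

```python
class ComputeUnavailable(RuntimeError):
    """Raised when the selected device is not visible to the pod."""

def resolve_device(requested, visible):
    """Return the requested device or raise; never fall back silently."""
    if requested not in {"cuda", "mlx", "cpu"}:
        raise ValueError(f"unknown compute target: {requested!r}")
    if requested not in visible:
        raise ComputeUnavailable(
            f"{requested} was selected but the pod only sees {sorted(visible)}"
        )
    return requested

device = resolve_device("cpu", visible={"cpu"})
```

Calling `resolve_device("cuda", visible={"cpu"})` raises instead of quietly running on CPU, which is the behaviour the paragraph above describes.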

Layer 4 · Dream Engine Three-Phase Consolidation

Triggered after 600s idle, after 50 new memories, manually through tooling, or as a standalone daemon. Ships in the community engine. Hop-2 reasoning lift: 0.00 → 1.00 R@10 — the structural gain only autonomous consolidation can produce.
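
The trigger conditions reduce to a single predicate. The thresholds come from the text above; the function name and its arguments are illustrative.

```python
import time

IDLE_SECONDS = 600            # idle threshold from the text
NEW_MEMORY_THRESHOLD = 50     # new-memory threshold from the text

def should_dream(last_activity, new_memories, now=None, manual=False):
    """True when any trigger fires: manual, idle timeout, or memory count."""
    now = time.time() if now is None else now
    return (
        manual
        or now - last_activity >= IDLE_SECONDS
        or new_memories >= NEW_MEMORY_THRESHOLD
    )
```

A standalone daemon would simply poll this predicate on a timer and start a cycle when it returns true.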

NREM

Replay 100 recent memories

Run spreading activation, strengthen active edges by +0.05, weaken inactive edges by 0.01, prune dead edges below 0.05.
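
The edge update can be sketched as one pass over the graph. The step sizes and prune threshold are the ones quoted on this page; the cap at 1.0 and the edge-dictionary format are assumptions.

```python
def nrem_update(edges, active, strengthen=0.05, weaken=0.01, prune_below=0.05):
    """edges: {(src, dst): weight}; active: edges touched during replay."""
    updated = {}
    for edge, weight in edges.items():
        if edge in active:
            weight = min(1.0, weight + strengthen)   # assumed cap at 1.0
        else:
            weight = weight - weaken
        if weight >= prune_below:                    # dead edges are dropped
            updated[edge] = weight
    return updated

edges = {("a", "b"): 0.50, ("b", "c"): 0.30, ("c", "d"): 0.055}
after = nrem_update(edges, active={("a", "b")})
```

The replayed edge grows, the idle edge decays slightly, and the edge that decays below the threshold disappears from the graph.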

REM

Bridge isolated memories

Find 50 isolated memories, search similar unconnected nodes, create bridge connections at similarity × 0.3.
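
A small sketch of the bridging step, assuming "isolated" means a memory with no edges and that each isolated memory is bridged to its single most similar node. The `min_sim` floor is an added assumption; the 0.3 factor and the cap of 50 come from the text.

```python
import math

def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def rem_bridges(vectors, edges, max_isolated=50, min_sim=0.5, factor=0.3):
    """Return {(isolated, target): weight} for newly created bridges."""
    connected = {node for edge in edges for node in edge}
    isolated = [n for n in vectors if n not in connected][:max_isolated]
    bridges = {}
    for node in isolated:
        best = max(
            (other for other in vectors if other != node),
            key=lambda other: cosine(vectors[node], vectors[other]),
            default=None,
        )
        if best is not None:
            sim = cosine(vectors[node], vectors[best])
            if sim >= min_sim:
                bridges[(node, best)] = sim * factor   # weak, provisional edge
    return bridges

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.6]}
bridges = rem_bridges(vectors, edges={("a", "b")})
```

Bridges start at a fraction of their similarity, so they have to be reinforced by later NREM replays before they carry much weight.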

INSIGHT

Store communities

Detect connected components, identify bridge nodes, materialize dream insights and derived cluster memory.
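
Community detection by connected components is a plain BFS. The sketch below covers only that part; bridge-node detection is omitted, and the shape of the stored insight record is a guess.

```python
from collections import deque

def connected_components(nodes, edges):
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        frontier, members = deque([start]), []
        while frontier:
            node = frontier.popleft()
            members.append(node)
            for nxt in sorted(adjacency[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        components.append(sorted(members))
    return components

def dream_insights(nodes, edges, min_size=2):
    # Hypothetical record shape for a derived cluster memory.
    return [
        {"kind": "cluster", "members": c, "size": len(c)}
        for c in connected_components(nodes, edges)
        if len(c) >= min_size
    ]

nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
insights = dream_insights(nodes, edges)
```

The isolated node `f` forms no community, so it produces no insight and is left for the next REM pass to bridge.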

Dream Engine — deep dive

Triggers fan into the cycle; the cycle splits into NREM, REM, and Insight phases.

%%{init:{'flowchart':{'htmlLabels':true,'curve':'basis','padding':8}}}%%
flowchart LR
    subgraph Trigger["TRIGGER"]
      direction TB
      T1["Idle 600s"]
      T2["50 new memories"]
      T3["Manual / Cron"]
    end

    D{{"Dream Cycle"}}
    T1 --> D
    T2 --> D
    T3 --> D
    D --> NREM
    D --> REM
    D --> INSIGHT

    subgraph NREM["PHASE 1 · NREM"]
      direction TB
      N1["Replay 100 recent memories"] --> N2["Spreading activation"]
      N2 --> N3{"Connection
active?"}
      N3 -->|Yes| N4["Strengthen +0.05"]
      N3 -->|No| N5["Weaken −0.01"]
      N3 -->|Dead < 0.05| N6["Prune"]
    end

    subgraph REM["PHASE 2 · REM"]
      direction TB
      R1["Find 50 isolated memories"] --> R2["Search similar
unconnected nodes"]
      R2 --> R3["Create bridge connections"]
      R3 --> R4["weight = similarity × 0.3"]
    end

    subgraph INSIGHT["PHASE 3 · INSIGHT"]
      direction TB
      I1["BFS connected components"] --> I2["Identify communities"]
      I2 --> I3["Find bridge nodes"]
      I3 --> I4["Store dream_insights"]
    end

    classDef trigger fill:#1a140a,stroke:#fbbf24,stroke-width:1.5px,color:#fde68a;
    classDef cycle   fill:#1a0e2a,stroke:#a78bfa,stroke-width:2.5px,color:#f0a8ff,font-weight:bold;
    classDef nrem    fill:#0e1428,stroke:#60a5fa,stroke-width:1.5px,color:#dbeafe;
    classDef rem     fill:#1a0a18,stroke:#f472b6,stroke-width:1.5px,color:#fbcfe8;
    classDef insight fill:#0a1a14,stroke:#34d399,stroke-width:1.5px,color:#a7f3d0;

    class T1,T2,T3 trigger;
    class D cycle;
    class N1,N2,N3,N4,N5,N6 nrem;
    class R1,R2,R3,R4 rem;
    class I1,I2,I3,I4 insight;

Three phases, three accents: NREM blue, REM pink, Insight green — biological-sleep inspired. Layers 2, 5, and 6 of the cognition stack sit above this loop and consume its output.

The Backend Stack

One rootless Podman pod, four containers, shared netns. The backend at api.mazemaker.dev issues a 24-hour JWT and a per-tenant license — and never sees a single byte of memory content. Everything else runs on your machine, on your disk, in your kernel.

%%{init:{'flowchart':{'htmlLabels':true,'curve':'basis','padding':10}}}%%
flowchart TB
    subgraph pod["rootless podman pod   ·   your machine"]
      direction TB
      W["wonderland
AES-256-GCM vault
vault-key = HKDF(JWT, hw-fingerprint)
derived at runtime, never on disk"]
      M["mazemaker-mcp
SQLite WAL · weighted graph · dream
six retrieval modes · MCP / SSE
no outbound network during recall"]
      E["embedding-worker
FastEmbed → ST → TF-IDF → hash
1024d · optional managed proxy"]
      L["license-client
Ed25519 · 24h JWT · 7d grace
6h heartbeat — anonymous counts"]
      W --> M
      M --> E
      M --> L
    end
    L -->|"heartbeat & usage meter only"| API["api.mazemaker.dev
issues JWT · counts tool calls
never sees memory content"]

    classDef vault   fill:#1a0e2a,stroke:#a78bfa,stroke-width:2px,color:#ededf2;
    classDef engine  fill:#0e1a24,stroke:#5cd6ff,stroke-width:2px,color:#ededf2;
    classDef worker  fill:#0e1a18,stroke:#10b981,stroke-width:2px,color:#ededf2;
    classDef license fill:#241a0e,stroke:#fbbf24,stroke-width:2px,color:#ededf2;
    classDef remote  fill:#0a0a0d,stroke:#8b5cf6,stroke-width:1.5px,stroke-dasharray:4 3,color:#c4b5fd;
    class W vault;
    class M engine;
    class E worker;
    class L license;
    class API remote;

One pod, four containers, shared netns. Each layer one accent — the backend across the dashed line never sees memory content.

01

wonderland

AES-256-GCM encrypted vault. The vault key is derived at runtime via HKDF-SHA256(JWT, hardware-fingerprint) — it lives in memory only, never written to disk, and rotates every JWT refresh.

JWT + fingerprint -> HKDF -> AES-256 key -> GCM seal/open
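
The key-derivation half of that chain needs only the standard library. The sketch implements HKDF-SHA256 as specified in RFC 5869. Which input serves as key material and which as salt is not stated here, so the mapping in `derive_vault_key` (JWT as key material, fingerprint as salt) and the `info` label are assumptions. The AES-256-GCM seal/open step is left out because it requires a third-party library.

```python
import hashlib
import hmac

def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_vault_key(jwt, fingerprint):
    # Assumed mapping: JWT is the input key material, fingerprint is the salt.
    return hkdf_sha256(
        ikm=jwt.encode(), salt=fingerprint.encode(), info=b"wonderland-vault"
    )

key = derive_vault_key("header.payload.signature", "machine-fingerprint")
```

Because the JWT is an input, a refreshed token yields a different 32-byte key, which is what makes rotation on every refresh possible without storing anything on disk.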

02

mazemaker-mcp

The full retrieval engine: SQLite WAL, weighted graph, dream consolidation, six retrieval modes. Speaks MCP over SSE on the pod-internal loopback. No outbound network calls during recall.

localhost:7791  MCP/SSE  ·  ./data/  SQLite WAL

03

embedding-worker

FastEmbed ONNX as default, sentence-transformers and TF-IDF as fallbacks, plus an optional OpenRouter proxy for the free tier so users without a GPU still get hosted-quality vectors.

CPU FastEmbed -> ST GPU -> TF-IDF -> hash 1024d

04

license-client

Verifies the Ed25519-signed JWT, posts a 6-hour heartbeat with anonymous usage counts, and grants a 7-day offline grace period. The fingerprint binds the license to the machine without any account credential round-trip.

Ed25519 verify -> 24h JWT · 7d grace · 6h heartbeat
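
The timing rules can be written down without any cryptography. The sketch assumes the 7-day grace period starts when the 24-hour JWT expires; signature verification is out of scope, and the function names are illustrative.

```python
JWT_LIFETIME = 24 * 3600            # 24-hour token
GRACE_PERIOD = 7 * 24 * 3600        # 7-day offline grace
HEARTBEAT_INTERVAL = 6 * 3600       # 6-hour heartbeat

def license_state(issued_at, now):
    """Classify a token by age: valid, in offline grace, or expired."""
    age = now - issued_at
    if age < JWT_LIFETIME:
        return "valid"
    if age < JWT_LIFETIME + GRACE_PERIOD:   # assumed: grace follows expiry
        return "grace"
    return "expired"

def heartbeat_due(last_heartbeat, now):
    return now - last_heartbeat >= HEARTBEAT_INTERVAL
```

Under this reading a pod that goes offline right after a refresh keeps working for eight days in total before the license client refuses it.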

zero-knowledge

The backend ledgers that a tool was called, never what was stored. Memory content stays inside the pod boundary, encrypted at rest by wonderland.

rootless · quadlet

Pure user-namespace Podman with Quadlet .pod and .container units. No Docker daemon, no root, no VM, no SSH — just systemctl --user.

cuda · mlx · byok

Compute selector adopts your choice from mazemaker.dev within seconds. Strict mode: if the pod can’t see the device you picked, it refuses to start — never silently runs on the wrong layer.

federation by handshake

Pair two pods by pasting handshake URLs in the architect. No keys to copy by hand, no IPs to share: the rendezvous does the rest.

cloud-portable

The same pod ships to AWS, GCP, Hetzner, a dedicated server, or your local machine. Only the Cloudflare tunnel container changes when you swap providers — the four memory containers do not.

LLM-Callable Surface

Mazemaker exposes ten MCP tools. The Hermes provider surfaces the core four schemas; dream controls, prune, quota, graph stats, and chronological browse live on the Memory class and the daemon path.

mazemaker_remember

Store with conflict detection

Persist facts, decisions, code notes, labels, and metadata. Auto-embed and auto-connect into the graph.

Core MCP

mazemaker_recall

Semantic search

Search memories by meaning and fuse optional retrieval channels through Reciprocal Rank Fusion.

Core MCP

mazemaker_think

Activation traversal

Start from one memory and traverse graph neighborhoods with BFS or PPR ranking.

Core MCP

mazemaker_graph

Graph statistics

Inspect memory count, graph density, connection topology, and strongest associations.

Core MCP

mazemaker_dream

Force consolidation

Run all phases or target NREM, REM, or Insight manually when using the full Memory class surface.

Memory class

mazemaker_dream_stats

Dream telemetry

Inspect sessions, phase outcomes, insights, strengthened edges, bridges, and pruning output.

Memory class

mazemaker_stats

Engine vitals

Total memories, edges, embedding fingerprint, compute device. The pulse the architect cockpit reads every five seconds.

Core MCP

mazemaker_prune

Targeted forgetting

Drop memories by id, label glob, or age. Protected by an import-grace marker so a fresh install can’t accidentally wipe itself before the dream cycle has settled.

Memory class

mazemaker_quota

Live quota state

Calls remaining today and this month, managed-provider budget, tier; matches what the operator console shows on mazemaker.dev.

Core MCP

mazemaker_browse

Chronological browse

Walk memories by recency or label glob — no semantic query. The cockpit’s genealogy view and timeline scrub call this; you call it when you want order instead of relevance.

Memory class
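
For orientation, here is what a call to one of these tools looks like on the wire. MCP uses JSON-RPC 2.0 with the `tools/call` method; the argument names `query` and `limit` are hypothetical, since the tool schemas are not listed on this page.

```python
import json

def tool_call(request_id, name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

payload = tool_call(
    1, "mazemaker_recall", {"query": "coffee preferences", "limit": 5}
)
```

An MCP client library would normally build this message for you; the sketch only shows the envelope that reaches the pod.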

Embedding Backends Auto-Priority

The embedding worker tries the backends in order; the first that loads cleanly wins. CUDA / MLX / CPU is a separate selector at mazemaker.dev.

| Priority | Backend | Model | Speed | Requirements |
|---|---|---|---|---|
| 1st | FastEmbed | intfloat/multilingual-e5-large | ~50ms | fastembed |
| 2nd | sentence-transformers | BAAI/bge-m3 1024d | ~200ms | GPU recommended |
| 3rd | TF-IDF | local corpus | varies | numpy only |
| 4th | Hash | 1024d fallback | instant | nothing |
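
The "first that loads cleanly wins" rule, together with a last-resort hash embedding, can be sketched as below. `pick_backend` and `hash_embed` are hypothetical names, and the signed hashed bag-of-words scheme is one plausible way to build a dependency-free 1024d vector, not necessarily the one the worker uses.

```python
import hashlib

def hash_embed(text, dim=1024):
    """Deterministic fallback: signed hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0 if digest[4] % 2 == 0 else -1.0
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec

def pick_backend(loaders):
    """loaders: ordered (name, load_fn) pairs; the first clean load wins."""
    for name, load in loaders:
        try:
            return name, load()
        except Exception:
            continue            # try the next backend in priority order
    raise RuntimeError("no embedding backend could be loaded")

def load_fastembed():
    # Stand-in for a backend whose dependency is missing.
    raise ImportError("fastembed not installed")

name, embed = pick_backend([("fastembed", load_fastembed), ("hash", lambda: hash_embed)])
vector = embed("hello")
```

Note that this fallback chain concerns the embedding backend only; the compute device selector described earlier is strict and does not fall back.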

Dashboard, Architect, Pod

The console at mazemaker.dev gives you everything: API key rotation, passkey enrollment, usage + dream-cycle status, and a tool playground. The Architect at architect.mazemaker.dev renders twelve monitors over the live graph. Both surfaces reach into the local pod — neither stores memory content remotely.

%%{init:{'flowchart':{'htmlLabels':true,'curve':'basis','padding':10}}}%%
flowchart LR
    user(("you"))
    dev["mazemaker.dev
operator console
API keys, quota, billing,
compute selector"]
    arch["architect.mazemaker.dev
twelve monitors, live cockpit
recall, dream, peers, sessions"]
    user --> dev
    user --> arch

    subgraph pod["your local pod"]
      direction TB
      W["wonderland
AES-256-GCM vault"]
      M["mazemaker-mcp
SQLite · graph · dream"]
      L["license-client
compute.toml"]
      W --> M
    end

    arch -.->|"loopback :8765"| W
    dev -.->|"JWT compute claim"| L
    L -.->|"path-watch restart"| M

    classDef you      fill:#0a0a0d,stroke:#a78bfa,stroke-width:2px,color:#f0a8ff;
    classDef remote   fill:#0a0a0d,stroke:#8b5cf6,stroke-width:1.5px,stroke-dasharray:4 3,color:#c4b5fd;
    classDef vault    fill:#1a0e2a,stroke:#a78bfa,stroke-width:2px,color:#ededf2;
    classDef engine   fill:#0e1a24,stroke:#5cd6ff,stroke-width:2px,color:#ededf2;
    classDef license  fill:#241a0e,stroke:#fbbf24,stroke-width:2px,color:#ededf2;
    class user you;
    class dev,arch remote;
    class W vault;
    class M engine;
    class L license;

Two surfaces, one pod. The architect cockpit only renders against your own machine; the operator console roundtrips compute + quota to the pod over the JWT.

The Architect’s Room

Twelve monitors over the live graph: recall, dream replay, edge tension, top-touched, sessions, peers, MCP topology, Hermes chat. Boots only against your own pod.

[Figure: The Architect dashboard at architect.mazemaker.dev. Twelve-monitor wall with the EDGES panel open, showing real-time edge tension as a connection-history graph (connection_history rendered on canvas, SSE live, 5s tick).]

Go deeper.

Three companion pages cover the parts that the architecture diagram only hints at.

The Architect cockpit

Twelve monitors, the dream replay, the chrono-scrub, the Hermes skill-indexing pipeline. What lives behind architect.mazemaker.dev.

behind the door →

Onboarding flow

What the install one-liner actually does. Ten stages, one browser handoff, twelve minutes. Every guarantee, every failure mode.

from curl to cockpit →

Four-domain topology

How mazemaker.online, .dev, api., and architect. combine without ever crossing memory data. The privacy guarantee, by schema.

the four-domain map →

Build the maze.
Your agent finds the way.