v8 audited · MCP native · local-first

An operating system
for AI agents.

Your agents die every conversation. Mazemaker keeps them alive.

The persistent layer your LLMs run on top of. Memory formation, not retrieval. Background consolidation while they sleep. Conflict supersession when your mind changes. A knowledge-graph filesystem your agent walks instead of searches. Federation across machines so one brain spans your whole fleet. Every benchmark number drops on demand the moment its mechanism is removed: evidence, not vibes.

Metric                    Value           Note
LongMemEval-oracle R@5    0.8426          500q, 25k-memory haystack
R@10                      0.9000          broke the 0.90 barrier
Hop-2 R@10                0.00 → 1.00     vector DBs cannot solve
Dream synthesis           0.00 → 0.43     post-consolidation lift
gemma3:270m               18 / 20 · 90%   270M params, runs on a Pi
Iterations · eras         100 · 4         bench-driven engineering

In one minute… a lot can happen.

Even the worst kind of nightmare: an entire conversation lost to a context-window reset. Every fix, every preference, every name, every path you mentioned three weeks ago is gone in the time it takes you to refill a coffee.

Mazemaker fixes that. Plug it in, and your agent gets a brain that…

  • Remembers what you tell it. Preferences, decisions, fixes, the path of that file you mentioned three weeks ago.
  • Connects related ideas. Like a real notebook with cross-references, not a search box.
  • Reflects while you sleep. Overnight, it strengthens what matters and notices new connections.
  • Updates itself when you change your mind. Old facts get superseded, not duplicated.

That’s it — the rest of this page goes deeper the further you scroll. Don’t want to install anything? A managed hosted endpoint runs at api.mazemaker.dev — sign up, point your agent at the MCP endpoint, done.

Not memory. The kernel.

Vector search retrieves nearby text. Mazemaker manages the cognition itself — processes (your agents), memory management (consolidation + supersession), filesystem (the knowledge graph), scheduler (dream cycles), IPC (federation). The difference is not a percentage. It is a phase change — questions vector databases cannot answer by construction become routine.

Capability                                    Search memory   Mazemaker
Find a fact you told it once                  YES             YES
Follow A -> B -> C reasoning chains           NO              YES
Notice related facts should connect           NO              YES
Replace stale facts when your mind changes    NO              YES
Explain why recall happened                   NO              YES
Get sharper while idle, not noisier           NO              YES
// the labyrinth, not the cloud

Vector databases treat memory as a flat sphere of disconnected documents. Mazemaker builds a labyrinth. Every memory is a node. Every relationship is a weighted edge auto-discovered at insert time. Your agent does not search the cloud; it walks the labyrinth. Spreading activation propagates outward from a starting node and attenuates with each hop, much as human associative recall is thought to work. Hop-2 reasoning, the class of question a cosine search cannot answer by construction, goes from R@10 0.00 to 1.00. Not a thirty-percent improvement. A phase change.
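The walk described above can be sketched as spreading activation over a weighted graph. This is an illustration under stated assumptions, not Mazemaker's implementation: the `spread` function, the `decay` factor, the `max_hops` limit, the `floor` cutoff, and the toy graph are all invented for the example.

```python
from collections import defaultdict

def spread(graph, start, decay=0.5, max_hops=2, floor=0.01):
    """Propagate activation outward from `start`, attenuating at each hop.

    graph: dict mapping node -> dict of neighbour -> edge weight in (0, 1].
    Returns a dict of node -> highest activation reached (start excluded).
    decay, max_hops and floor are hypothetical tuning knobs.
    """
    activation = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                passed = energy * weight * decay  # attenuation per hop
                if passed >= floor:
                    next_frontier[neighbour] = max(next_frontier[neighbour], passed)
        # only nodes whose activation improved keep spreading
        frontier = {n: a for n, a in next_frontier.items() if a > activation.get(n, 0.0)}
        activation.update(frontier)
        if not frontier:
            break
    activation.pop(start)
    return activation

# Toy chain: C is only reachable from A through B.
graph = {"A": {"B": 0.9}, "B": {"C": 0.8}, "C": {}}
scores = spread(graph, "A")
one_hop = spread(graph, "A", max_hops=1)
```

With two hops, `C` receives activation through `B`; with `max_hops=1` it never surfaces, which is the situation a single nearest-neighbour lookup is in when the answer shares no surface similarity with the query.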

// memory that gets sharper while you sleep

The biological-sleep-inspired dream engine runs three phases overnight. NREM replays recent memories and strengthens the edges that fired together. REM bridges isolated nodes that never met but probably should. Insight detects communities and crystallises summary memories from clusters. Post-dream synthesis makes facts that are unreachable from any single memory retrievable: R@10 goes from a structural 0.00 to 0.43. Memory gets denser, not noisier, every night. No competing product runs autonomous consolidation.

// it tells you why it remembered

Every recall returns the activation trace — the path the search walked, the edge weights it followed, the confidence at each hop. Your agent can debug its own retrieval. You can see why a memory surfaced instead of trusting a black box. The graph is queryable, not just searchable. No other memory product offers this surface; the rest stop at "here are the top-k documents".
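As a rough picture of what such a trace could contain, the sketch below finds the strongest path between two memories and records every hop. The field names (`from`, `to`, `weight`, `confidence`), the `decay` factor, and the rule that confidence is the product of attenuated edge weights are assumptions for illustration; the real trace format is not documented on this page.

```python
import heapq
from itertools import count

def recall_with_trace(graph, start, target, decay=0.5):
    """Strongest path from start to target, with per-hop details.

    graph: dict node -> dict neighbour -> weight in (0, 1].
    Confidence at a hop = product of (weight * decay) along the path so far.
    """
    tie = count()  # tie-breaker so the heap never compares hop lists
    heap = [(-1.0, next(tie), start, [])]
    settled = set()
    while heap:
        neg_conf, _, node, hops = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            return {"memory": node, "confidence": -neg_conf, "trace": hops}
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour in settled:
                continue
            conf = -neg_conf * weight * decay
            hop = {"from": node, "to": neighbour, "weight": weight, "confidence": conf}
            heapq.heappush(heap, (-conf, next(tie), neighbour, hops + [hop]))
    return None  # target not reachable from start

graph = {
    "deploy failed": {"staging config": 0.9, "old runbook": 0.2},
    "staging config": {"env var renamed": 0.8},
    "old runbook": {"env var renamed": 0.9},
}
result = recall_with_trace(graph, "deploy failed", "env var renamed")
for hop in result["trace"]:
    print(f'{hop["from"]} -> {hop["to"]}  weight={hop["weight"]}  confidence={hop["confidence"]:.3f}')
```

Because confidence can only shrink along a path, a best-first search is enough to find the strongest route, and the list of hops it carries doubles as the explanation.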

// the audit nobody else has

We submitted the entire benchmark suite — including the negative controls that must fail when the relevant mechanism is removed — to GPT-5.5 via the codex CLI. Eight rounds. The first two rejected the suite outright. By round eight, every concrete objection was closed by code change, not by argument. Round eight verdict: unconditional yes — no residual caveat. Every prompt and every verdict is committed verbatim in the repository. This is not how you build a wrapper. This is how you build a category.

Negative controls. Not benchmarks.

Every row below is a knob we turn off that must collapse the result. If the number doesn’t drop on demand when the mechanism is removed, the lift was a coincidence. Most AI infra ships positive demos and cherry-picks; we ship the controls that have to fail.

“If you can’t make the number drop on demand, you don’t have evidence — you have a coincidence.”

— Mazemaker testing protocol


Hop-2 graph reasoning 0.00 -> 1.00

Answer reachable only through A -> B -> C edges. Vanilla cosine cannot solve it by construction.

Shuffled-edge control 1.00 -> 0.27

Collapse proves traversal is load-bearing, not the embedding model accidentally helping.

Post-dream synthesis 0.00 -> 0.43

Facts inferable only after consolidation become reachable after dream cycles.

Conflict supersession 0.03 -> 0.33

Newer contradictory facts supersede stale ones instead of duplicating noise.

Cross-session continuity 0.06 -> 0.62

Concept-mode distractors pile up; the graph still holds continuity.

Lean retrieval 0.60 vs 0.42

Real prose n=200: lean beats skynet by +0.18 R@5 and drops dead-weight channels.
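The shuffled-edge control above can be illustrated on synthetic data. The sketch below builds 50 chains whose answers sit two hops from the query, measures recall on the real graph, then rewires every edge at random and measures again. The function names, the corpus, and the seed are hypothetical, and the toy does not reproduce the 1.00 -> 0.27 figure; it only shows the shape of a control that has to fail.

```python
import random

def two_hop(edges, start, k=10):
    """Nodes reachable from `start` in at most two hops, in BFS order, capped at k."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, order, frontier = {start}, [], [start]
    for _ in range(2):
        nxt = []
        for node in frontier:
            for neighbour in adjacency.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    order.append(neighbour)
                    nxt.append(neighbour)
        frontier = nxt
    return order[:k]

def recall_at_k(edges, queries, k=10):
    """Fraction of (start, answer) queries whose answer is among the retrieved nodes."""
    hits = sum(1 for start, answer in queries if answer in two_hop(edges, start, k))
    return hits / len(queries)

def shuffle_edges(edges, seed=0):
    """Negative control: keep every source node, reassign targets at random."""
    targets = [dst for _, dst in edges]
    random.Random(seed).shuffle(targets)
    return [(src, dst) for (src, _), dst in zip(edges, targets)]

# 50 synthetic chains A_i -> B_i -> C_i; the answer C_i is reachable only via B_i.
edges = [(f"A{i}", f"B{i}") for i in range(50)] + [(f"B{i}", f"C{i}") for i in range(50)]
queries = [(f"A{i}", f"C{i}") for i in range(50)]

real = recall_at_k(edges, queries)
control = recall_at_k(shuffle_edges(edges), queries)
print(f"real graph R@10 = {real:.2f}, shuffled control R@10 = {control:.2f}")
```

If the control did not fall, the recall would be coming from somewhere other than the edges, which is exactly the coincidence the protocol is meant to rule out.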

Dream Engine Three-Phase Consolidation

Triggered after 600s idle, after 50 new memories, manually through tooling, or as a standalone daemon.
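The trigger conditions can be read as a single predicate. A minimal sketch, assuming the thresholds are inclusive and that a manual request always wins; `should_dream` and its parameters are hypothetical names, not part of any documented API.

```python
import time

IDLE_SECONDS = 600          # idle threshold stated in the text
NEW_MEMORY_THRESHOLD = 50   # new-memory threshold stated in the text

def should_dream(last_activity, new_memories, now=None, manual=False):
    """Return True when any of the stated triggers would start a dream cycle.

    last_activity and now are Unix timestamps in seconds.
    """
    now = time.time() if now is None else now
    if manual:
        return True
    if new_memories >= NEW_MEMORY_THRESHOLD:
        return True
    return (now - last_activity) >= IDLE_SECONDS
```

A standalone daemon would presumably call something like this on a timer; that loop is left out because its cadence is not specified here.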

NREM

Replay 100 recent memories

Run spreading activation, strengthen active edges by +0.05, weaken inactive edges by 0.01, prune dead edges below 0.05.
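The edge update in this phase is simple arithmetic. The sketch below uses the +0.05 and 0.05 values from the text and the 0.01 weaken step from the diagram further down the page; the cap at 1.0 and the `nrem_update` name are assumptions.

```python
STRENGTHEN = 0.05    # from the text
WEAKEN = 0.01        # from the diagram on this page
PRUNE_BELOW = 0.05   # from the text

def nrem_update(weights, active_edges):
    """Apply one NREM pass.

    weights: dict edge -> weight; active_edges: set of edges that fired during replay.
    Returns a new dict with pruned edges removed.
    """
    updated = {}
    for edge, weight in weights.items():
        if edge in active_edges:
            weight = min(1.0, weight + STRENGTHEN)  # cap at 1.0 is an assumption
        else:
            weight = weight - WEAKEN
        if weight >= PRUNE_BELOW:
            updated[edge] = weight
    return updated

weights = {("a", "b"): 0.50, ("b", "c"): 0.055, ("c", "d"): 0.98}
after = nrem_update(weights, active_edges={("a", "b"), ("c", "d")})
```

Here the idle edge `("b", "c")` drops to 0.045 and is pruned, while the two edges that fired are reinforced.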

REM

Bridge isolated memories

Find 50 isolated memories, search similar unconnected nodes, create bridge connections at similarity x 0.3.
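A sketch of the bridging step, assuming cosine similarity over embeddings and a hypothetical `min_similarity` cutoff (the text gives only the 50-memory batch size and the 0.3 factor). Each isolated memory is linked to its most similar neighbour at weight similarity × 0.3.

```python
import math

BRIDGE_FACTOR = 0.3  # from the text

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rem_bridges(embeddings, edges, min_similarity=0.7, limit=50):
    """Propose bridge edges for memories that currently have no connections.

    embeddings: dict node -> vector; edges: iterable of (node, node) pairs.
    min_similarity is a hypothetical threshold, not a documented setting.
    """
    connected = {n for edge in edges for n in edge}
    isolated = [n for n in embeddings if n not in connected][:limit]
    bridges = {}
    for node in isolated:
        best, best_sim = None, min_similarity
        for other, vec in embeddings.items():
            if other == node:
                continue
            sim = cosine(embeddings[node], vec)
            if sim >= best_sim:
                best, best_sim = other, sim
        if best is not None:
            bridges[(node, best)] = best_sim * BRIDGE_FACTOR
    return bridges

embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.1]}
bridges = rem_bridges(embeddings, edges=[("a", "b")])
```

The low starting weight means a bridge begins as a tentative link; under the NREM rule it would need to fire during later replays to survive pruning.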

INSIGHT

Store communities

Detect connected components, identify bridge nodes, materialize dream insights and derived cluster memory.
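Component detection is a plain breadth-first search. The sketch below also flags bridge nodes, interpreted here as nodes whose removal splits a community; that interpretation, the naive removal check, and both function names are assumptions made for the example.

```python
from collections import deque

def components(nodes, edges):
    """Connected components of an undirected graph, found by BFS."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        if a in adjacency and b in adjacency:  # ignore edges to removed nodes
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, result = set(), []
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        queue, group = deque([root]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(group)
    return result

def bridge_nodes(nodes, edges):
    """Nodes whose removal increases the component count (naive, for small graphs)."""
    baseline = len(components(nodes, edges))
    found = []
    for node in nodes:
        rest = [n for n in nodes if n != node]
        if len(components(rest, edges)) > baseline:
            found.append(node)
    return found

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
communities = components(nodes, edges)
bridging = bridge_nodes(nodes, edges)
```

Each community would then be summarised into a derived memory; that summarisation step is not sketched because the page does not describe how it works.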

Dream Engine — deep dive

Triggers fan into the cycle; the cycle splits into NREM, REM, and Insight phases.

%%{init:{'flowchart':{'htmlLabels':true,'curve':'basis','padding':8}}}%%
flowchart LR
    subgraph Trigger["TRIGGER"]
      direction TB
      T1["Idle 600s"]
      T2["50 new memories"]
      T3["Manual / Cron"]
    end

    D{{"Dream Cycle"}}
    T1 --> D
    T2 --> D
    T3 --> D
    D --> NREM
    D --> REM
    D --> INSIGHT

    subgraph NREM["PHASE 1 · NREM"]
      direction TB
      N1["Replay 100 recent memories"] --> N2["Spreading activation"]
      N2 --> N3{"Connection<br/>active?"}
      N3 -->|Yes| N4["Strengthen +0.05"]
      N3 -->|No| N5["Weaken −0.01"]
      N3 -->|Dead < 0.05| N6["Prune"]
    end

    subgraph REM["PHASE 2 · REM"]
      direction TB
      R1["Find 50 isolated memories"] --> R2["Search similar<br/>unconnected nodes"]
      R2 --> R3["Create bridge connections"]
      R3 --> R4["weight = similarity × 0.3"]
    end

    subgraph INSIGHT["PHASE 3 · INSIGHT"]
      direction TB
      I1["BFS connected components"] --> I2["Identify communities"]
      I2 --> I3["Find bridge nodes"]
      I3 --> I4["Store dream_insights"]
    end

    classDef trigger fill:#1a140a,stroke:#fbbf24,stroke-width:1.5px,color:#fde68a;
    classDef cycle fill:#1a0e2a,stroke:#a78bfa,stroke-width:2.5px,color:#f0a8ff,font-weight:bold;
    classDef nrem fill:#0e1428,stroke:#60a5fa,stroke-width:1.5px,color:#dbeafe;
    classDef rem fill:#1a0a18,stroke:#f472b6,stroke-width:1.5px,color:#fbcfe8;
    classDef insight fill:#0a1a14,stroke:#34d399,stroke-width:1.5px,color:#a7f3d0;
    class T1,T2,T3 trigger;
    class D cycle;
    class N1,N2,N3,N4,N5,N6 nrem;
    class R1,R2,R3,R4 rem;
    class I1,I2,I3,I4 insight;

Three phases, three accents: NREM blue, REM pink, Insight green — biological-sleep inspired.

Walk the Maze

Eight pages go deeper. The first two document the numbers and the engine. The next three (the cockpit, the install flow, the four-domain topology) explain what actually lives behind architect.mazemaker.dev, mazemaker.dev, and the pod on your machine. The last three cover federation, the comparison matrix, and the lab notes. Each one reproducible. Each one auditable.

Research & audit → R@5 = 0.9787 · 188/200

LongMemEval-S 500q retrieval, Comparison Bench 188/200 (94.0%, 0 errors), v2 NO → v8 UNCONDITIONAL YES. Methodology, raw numbers, repro by curl.

Architecture & pod → 6 layers · 1 pod

The six-layer cognition stack: sponge, AFE, ColBERT+DAE embedding, three-phase dream, Stage S synthesis, targeted re-formation. Rootless Podman, HKDF vault, MCP on loopback.

★ NEW · The Architect → 12 monitors · loopback only

The cockpit at architect.mazemaker.dev. Twelve panels, the dream replay, the chrono-scrub timeline, the Hermes skill-indexing pipeline. Hosted UI, local data.

★ NEW · Onboarding → 10 stages · ~12 min

From curl … | bash to a healthy pod. Pre-flight, fingerprint, browser handoff, license JWT, embedding choice, Quadlet, pod boot. Every guarantee, every failure mode.

★ NEW · Topology → 4 domains · 1 pod

How mazemaker.online, mazemaker.dev, api.mazemaker.dev, and architect.mazemaker.dev combine without ever crossing memory data. Selective AES, public-prefix gate, request-flow map.

★ NEW · Federation → 2 pods · or 2000

Pod-to-pod memory propagation over HTTP(S). Per-pair Bearer keys, public-prefix gate, five-minute tick. Tailscale pair, hub-and-spoke team, WWW-scale mesh — same model.

Comparison matrix → 4 projects · 1 harness

Hindsight, Letta, A-MEM, Cognee — same retrieval harness. Verified numbers where we have them (Hindsight 188/200 = 94.0%), QUEUED where we don’t. No fabricated numbers.

Lab notes (blog) → 5 stations · the maze, walked

Five stations from entrance to summit: memory benchmarks should measure memory, bench corpus on Postgres, formation beats retrieval-tuning, inception benchmarking, inside the 100-iteration loop.

Install One Command

One curl-bash. Browser opens itself for email verification & captcha. Comes back, builds your local pod, registers itself with every AI tool you have. Done in under three minutes. No sudo, no Docker, no API keys to copy-paste.

your terminal
curl -fsSL https://api.mazemaker.dev/install.sh | bash

That’s it. The script handles fingerprint init, opens your browser for the onboard wizard, polls for the handoff, signs the install proof, requests your license JWT, builds and starts the four containers, runs the health check, and offers to wire mazemaker into every AI tool it detects on your machine.

01 · terminal → browser
~60s

install.sh detects your hardware, generates a device fingerprint and an Ed25519 install keypair, then opens your default browser at the onboarding wizard with everything pre-filled.

02 · browser onboard
email + captcha

Verify your email (we send a 6-digit code), pass a Cloudflare Turnstile captcha, pick your tier. The wizard parks the install handoff and tells you to return to the terminal. No JWT to copy. No keys to paste.

03 · pod up + tools wired
~90s build, then ready

install.sh polls for the handoff, signs the install proof locally, fetches the license JWT, builds four rootless Podman containers, starts the pod, health-checks http://127.0.0.1:8765/sse, and offers to register mazemaker with every AI tool it detects (Claude Code, Cursor, VSCode, Cline, Roo, Continue, Goose, Codex, …).

Don’t use one of the auto-detected tools?

Point any MCP-speaking client at http://127.0.0.1:8765/sse. The integration spec at api.mazemaker.dev/integration.md documents native SSE, the mcp-remote stdio bridge, and the streamable-http transport — readable by humans and LLM agents alike, so your agent can self-wire.
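For clients that can only launch stdio servers, the mcp-remote bridge mentioned above is typically started through `npx`. The sketch below prints one possible config entry. The `mcpServers` layout is an assumption borrowed from common MCP clients, and the server label `mazemaker` is arbitrary; the integration spec at api.mazemaker.dev/integration.md remains the authoritative reference.

```python
import json

MAZEMAKER_SSE_URL = "http://127.0.0.1:8765/sse"  # endpoint stated in the text

def mcp_remote_entry(url=MAZEMAKER_SSE_URL):
    """Config entry that launches the mcp-remote stdio bridge against the local pod.

    The `mcpServers` key and the "mazemaker" label are assumptions; check your
    client's documentation for the exact file and schema it expects.
    """
    return {
        "mcpServers": {
            "mazemaker": {
                "command": "npx",
                "args": ["-y", "mcp-remote", url],
            }
        }
    }

print(json.dumps(mcp_remote_entry(), indent=2))
```

Clients that speak SSE natively should not need the bridge at all and can be pointed at the URL directly.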

any tool · any time · idempotent
curl -fsSL https://api.mazemaker.dev/wire.sh | bash

Something broken?

debug.sh runs 36 systematic checks across DNS, license, runtime staging, container images, Quadlet units, systemd state, and the host-facing endpoint. Pattern-matches every known failure mode collected during real installs. With --fix it auto-repairs everything safe to repair.

diagnose · or auto-fix
curl -fsSL https://api.mazemaker.dev/debug.sh | bash
curl -fsSL https://api.mazemaker.dev/debug.sh | bash -s -- --fix

Operator-grade pricing.

Tiered the way you actually deploy: one machine (Builder), your fleet (Pro), your org (Team), your perimeter (Enterprise). Community stays free forever for personal single-agent use. Founder rates lock in for life — the price you sign up at today is the price you pay forever, even when we raise list.

Community
Personal · single-agent · self-hosted
$0/forever

SQLite · CPU · 3-phase dream · CLI + MCP

A real pod, free forever — the same one-line installer, no payment, no build step. Or take the engine itself, AGPLv3 + PolyForm-NC source-available, and run it your way. No ColBERT, no DAE, no Stage S synthesis, no Architect UI.

  • Single agent · personal use
  • Hybrid recall (R@5 = 0.96)
  • 3-phase dream (lightweight)
  • SQLite + FastEmbed CPU
  • CLI + MCP server
  • One-line install · free pod (no build)
  • Or build from source · community support
Install free — one line

curl … | bash — stays Free until you upgrade. Or build from source.

Builder
Indie devs · MCP builders · one agent
$15/month

SQLite or Postgres · 3-phase dream · managed install

The community engine, professionally installed and license-managed. One-line installer, auto-update, email support. The right tier when you’re shipping your own agent and want the brain just-working.

  • 1 agent · 100k memories
  • Hybrid recall · 3-phase dream
  • SQLite or Postgres backend
  • One-line install · auto-update
  • BYOK or local MLX embeddings
  • Email support
  • Stripe billing · cancel any time
Start building

Founder rate: $9/mo, locked forever. Sign up before launch closes.

Team
Orgs · shared mesh · RBAC · audit
$149/month

Everything in Pro + multi-seat + audit log + SSO

For orgs running multiple agents that need to share memory. The mesh becomes a team brain — one agent learns, all of them know.

  • Everything in Pro
  • 5 seats included (add-ons available)
  • Shared memory mesh across team
  • RBAC + audit log
  • SSO (Google Workspace / Okta)
  • Priority email support
Bring your team

Enterprise
Airgap · BYOK-HSM · SLA · sovereign
Talk to us

Defense · robotics · regulated AI · research

For environments where data cannot leave the perimeter. Self-hosted license server, BYOK-HSM, airgap deploy, cross-site federation, audit log export, custom dream cadence, explainable recall on every call, SLA, dedicated support.

  • Everything in Team
  • Airgap deploy · offline license
  • BYOK + BYOK-HSM
  • Self-hosted license server
  • Cross-site federation
  • Audit log export (SIEM-friendly)
  • SLA + dedicated support
Contact sales

All paid tiers ship the full engine; tier-gated features (ColBERT, DAE, Architect, federation, audit) flip on/off via license claims at runtime — same binary, same code path. BYOK embeddings stay on your machine — we never see your provider keys. Memories you wrote stay accessible forever, even if you cancel.
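Runtime gating by license claims might look like the sketch below. The feature names come from the sentence above; the `features` claim, its list-of-strings shape, and both function names are hypothetical. The decoder skips signature verification purely to keep the example short; a real client must verify the token before trusting any claim.

```python
import base64
import json

TIER_GATED = ("colbert", "dae", "architect", "federation", "audit")  # names from the text

def jwt_claims(token):
    """Decode a JWT payload WITHOUT verifying its signature (illustration only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def enabled_features(claims):
    """Map a hypothetical `features` claim onto on/off switches."""
    granted = set(claims.get("features", []))
    return {name: name in granted for name in TIER_GATED}

# Build a throwaway unsigned token to exercise the helpers.
body = json.dumps({"features": ["colbert", "audit"]}).encode()
token = "header." + base64.urlsafe_b64encode(body).rstrip(b"=").decode() + ".signature"
flags = enabled_features(jwt_claims(token))
```

Keeping the switch table in one place is what lets the same binary serve every tier: the code path is identical and only the claims differ.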

Common questions.

The objections we hear most. Short answers; pointers to the long ones.

// is curl … | bash safe?

You don’t have to pipe. curl -fsSL https://api.mazemaker.dev/install.sh -o install.sh downloads the script; read it, then run it. The script is short, signed, and reproducible — every release ships with a SHA-256 in the changelog. The same script also ships a debug.sh twin that runs 36 systematic checks; both are linked from the onboarding page.
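Checking the downloaded script against the published hash takes a few lines. A minimal sketch using only the standard library; the function names are hypothetical, and the expected value has to be copied from the changelog yourself.

```python
import hashlib
import hmac

def sha256_of(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """True if the file's digest equals the published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

Typical use is `matches_published("install.sh", value_from_changelog)` after the `curl ... -o install.sh` step, and running the script only if it returns `True`.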

// what happens when my license expires offline?

Seven-day grace period. The pod keeps running, keeps reading, keeps writing — the license-client just stops being able to phone home. After the grace window, the pod refuses new writes until the next successful check-in. Memories you already wrote stay accessible forever; nobody gets locked out of their own data. Full detail in the architecture page.
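The policy reduces to a small rule. The sketch assumes the seven-day window is counted from license expiry (the text does not say whether it starts at expiry or at the last check-in); `pod_permissions` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=7)  # from the text

def pod_permissions(expires_at, now=None):
    """Reads always work; writes stop once the grace window after expiry has passed.

    Assumption: grace is measured from `expires_at`, not from the last check-in.
    """
    now = now or datetime.now(timezone.utc)
    return {"read": True, "write": now <= expires_at + GRACE}

expiry = datetime(2025, 1, 1, tzinfo=timezone.utc)
during_grace = pod_permissions(expiry, now=expiry + timedelta(days=6))
after_grace = pod_permissions(expiry, now=expiry + timedelta(days=8))
```

A successful check-in would move `expires_at` forward, which restores writes without touching anything already stored.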

// can I migrate from Community to Pro (or vice versa)?

Yes — the engine is the same; only the rerank / synthesis layers differ. Community uses SQLite WAL; Pro uses Postgres + pgvector. mazemaker dump exports your store; mazemaker restore imports it on the other side. No re-embedding, no re-graph-build. Vendor lock-in is the failure mode we’re explicitly designing against.

// do you (the operator) ever see my data?

No — structurally, not by promise. The license backend records that a tool was called, never what was stored. Memory content stays inside the pod, encrypted at rest by wonderland with a vault key derived from HKDF(JWT, hardware-fingerprint) at runtime — the key never touches disk. See manifesto for the philosophy and privacy for the operational policy.
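The derivation named above follows the standard HKDF construction (RFC 5869). The sketch implements HKDF-SHA256 with the standard library only. Which input serves as keying material and which as salt, the `info` label, and the 32-byte output length are assumptions; the page states only `HKDF(JWT, hardware-fingerprint)`.

```python
import hashlib
import hmac

def hkdf_sha256(ikm, salt, info=b"", length=32):
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                            # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def vault_key(license_jwt, hardware_fingerprint):
    """Derive an in-memory vault key.

    Assumption: the JWT is the input keying material and the fingerprint is the
    salt; the info label is hypothetical.
    """
    return hkdf_sha256(license_jwt.encode(), hardware_fingerprint.encode(), info=b"vault-key")

key = vault_key("example.jwt.token", "fingerprint-of-machine-a")
```

Because the key is recomputed from both inputs at runtime, copying the encrypted store to a different machine yields a different fingerprint and therefore a different key.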

// what if Mazemaker (the company) disappears?

The engine is AGPLv3 + PolyForm-NC source-available on GitHub. If the SaaS goes dark, Community + Lite users keep running on the engine they already have; Pro users lose the managed-install + Architect UI, but the underlying pod keeps working off the last-issued license until grace runs out. Anyone can fork. Architecture is the policy — including the exit policy.

// what hardware do I actually need?

8 GB RAM, x86_64 or Apple Silicon, any modern CPU. GPU optional (CUDA or MLX accelerate recall ~3×, never required). Disk: ~500 MB for the pod images + your memory store. gemma3:270m hit 18/20 on the Comparison Bench — that’s a Raspberry Pi-class model. The engine itself is heavier than the model.

// why local-first instead of cloud-native?

Because memory is the most intimate data class a coding agent will ever touch — every preference, every fix, every name, every file path you mentioned three weeks ago. Centralising it is the obvious play for a surveillance business model. We’re explicitly not that. The manifesto is the long version; the topology is the four-domain split that enforces the boundary.

// Pro vs Community in one sentence?

Community is free forever — a one-line pod (or build the open-source engine yourself), SQLite + CPU, single agent; Pro is the same engine plus ColBERT @ 1.5 reranking (R@5 0.96 → 0.98), DAE-augmented dream consolidation, Stage S synthesis, the Architect cockpit, unlimited agents, and the Postgres + pgvector backend that the 100-iteration benchmark loop ran on.

Build the maze.
Your agent finds the way.

Persistent semantic memory, graph reasoning, dream consolidation, and audited benchmark lift for agents that need continuity.

Part of a family

Mazemaker isn't a one-shot project. It shares its license-as-a-service spine with sister apps — same threat model, same install pattern, different shape on top.

sister · pulse

remainder.online

Pulse, the sister product. Same pod + license-revolver spine as mazemaker — different shape on top. Mazemaker remembers; pulse acts on the rhythm of what was remembered.

→ remainder.online