Architecture
Overview
Membrane runs as a long-lived daemon (membraned) or as an embedded Go library. Either way, the same subsystems wire together under a single Membrane struct that exposes a unified API surface.
+------------------+      +------------------+      +----------------------+
|  Ingestion Plane |----->|   Policy Plane   |----->| Storage & Retrieval  |
+------------------+      +------------------+      +----------------------+
         |                         |                           |
   CaptureMemory           Classification,           SQLCipher (encrypted),
    candidates             sensitivity,              audit trails,
                           decay profiles            trust-gated access
All write paths flow left to right: raw experience enters the Ingestion Plane, the Policy Plane stamps it with a sensitivity level and lifecycle, and the result lands in the authoritative store. Retrieval travels right to left, applying trust filters and salience ranking before returning records to the caller.
Three logical planes
Ingestion plane
The ingestion plane accepts graph-aware CaptureMemory candidates and converts them into typed MemoryRecord values. The ingestion.Service coordinates three internal components:
- Classifier — determines which memory type (episodic, working, semantic, etc.) a candidate belongs to based on its shape.
- Policy engine — assigns sensitivity, confidence, initial salience, and a type-specific decay profile. Tool outputs receive an initial confidence of 0.9; observations receive 0.7; events receive 0.8.
- Interpretation and store write — persists the primary record, optional interpretation metadata, linked entity records, graph edges, and the initial audit log entry.
Episodic records are immutable once ingested. All other types can be revised through explicit revision operations.
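The confidence defaults above can be sketched as a simple lookup. This is illustrative only: the candidate-kind strings and the 0.5 fallback are assumptions, not Membrane's actual policy-engine API.

```go
package main

import "fmt"

// initialConfidence mirrors the documented defaults: tool outputs 0.9,
// events 0.8, observations 0.7. The kind names and the 0.5 fallback
// are assumptions for illustration.
func initialConfidence(kind string) float64 {
	switch kind {
	case "tool_output":
		return 0.9
	case "event":
		return 0.8
	case "observation":
		return 0.7
	default:
		return 0.5 // hypothetical fallback, not documented
	}
}

func main() {
	fmt.Println(initialConfidence("tool_output")) // 0.9
	fmt.Println(initialConfidence("observation")) // 0.7
}
```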
Policy plane
The policy plane runs inline with ingestion and governs two cross-cutting concerns:
- Sensitivity assignment — defaults to the configured default_sensitivity ("low" out of the box) unless overridden per-record. Sensitivity is stored on the record and used by the trust filter at retrieval time.
- Decay profile assignment — each memory type gets a different exponential decay half-life: episodic records decay in ~1 hour by default; working memory in ~1 day; entity, semantic, competence, and plan-graph records in ~30 days.
Storage and retrieval plane
The storage and retrieval plane is where records live and where queries are answered. Retrieval follows a canonical layer order defined in pkg/retrieval/retrieval.go:
working → entity → semantic → competence → plan_graph → episodic
Each layer is filtered by the caller's TrustContext before results are merged. When a pgvector backend and query embedding are available, results are ranked using a hybrid score (70% vector similarity, 30% salience). Otherwise, results are ranked by salience descending. An optional Limit caps the returned set.
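The hybrid ranking rule can be sketched as follows. The 70/30 weights come from the text; the record shape and function names here are illustrative, not Membrane's actual retrieval API.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is an illustrative stand-in for a retrieval candidate.
type scored struct {
	ID        string
	VectorSim float64 // cosine similarity to the query embedding
	Salience  float64 // current salience score
}

// hybridScore applies the documented weighting: 70% vector similarity,
// 30% salience.
func hybridScore(r scored) float64 {
	return 0.7*r.VectorSim + 0.3*r.Salience
}

// rank sorts candidates by hybrid score, descending.
func rank(records []scored) {
	sort.Slice(records, func(i, j int) bool {
		return hybridScore(records[i]) > hybridScore(records[j])
	})
}

func main() {
	rs := []scored{
		{ID: "a", VectorSim: 0.2, Salience: 0.9}, // score 0.41
		{ID: "b", VectorSim: 0.8, Salience: 0.1}, // score 0.59
	}
	rank(rs)
	fmt.Println(rs[0].ID) // "b": strong similarity outweighs high salience
}
```

When no embedding backend is configured, the fallback is the salience-only ordering described above.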
Subsystems
- Ingestion — accepts graph-aware capture candidates for events, tool outputs, observations, working state, and agent turns. Classifies, validates, interprets, links, and persists records with full provenance and an initial audit entry.
- Retrieval — layered query across all memory types with trust-context filtering. Records are ranked by salience, or by a hybrid vector+salience score when Postgres + pgvector is configured (recommended). Competence and plan-graph records go through an additional multi-solution selector.
- Decay — applies exponential salience decay to all non-pinned records on a configurable schedule. Exposes Reinforce and Penalize to adjust salience explicitly in response to outcomes.
- Revision — five atomic revision operations (supersede, fork, retract, merge, contest), each producing a provenance link and an audit trail entry. Embedding vectors are updated when configured.
- Consolidation — background pipeline that promotes raw episodic traces into durable semantic facts, competence records, and plan graphs. Capture interpretation can also create entity records and graph links.
- Metrics — point-in-time snapshot collector. Reports total records, salience distribution, retrieval usefulness, competence success rate, plan reuse frequency, and revision rate.
- Embedding — HTTP client for OpenAI-compatible embedding endpoints. Generates query embeddings at retrieval time and record embeddings after ingestion or revision when a Postgres backend is configured.
- Storage — pluggable store interface backed by SQLite (default, with optional SQLCipher encryption) or Postgres + pgvector. Supports transactional updates, audit appends, and salience-only updates.
How they wire together
The membrane.New constructor in pkg/membrane/membrane.go wires all subsystems based on the provided Config:
// Membrane wires all subsystems together and exposes the unified API surface.
type Membrane struct {
config *Config
store storage.Store
ingestion *ingestion.Service
retrieval *retrieval.Service
decay *decay.Service
revision *revision.Service
consolidation *consolidation.Service
metrics *metrics.Collector
embedding *embedding.Service
decayScheduler *decay.Scheduler
consolScheduler *consolidation.Scheduler
}
Subsystems that depend on embeddings or LLM extraction are conditionally constructed: if EmbeddingEndpoint is set and the Postgres backend is selected, the embedding service is created and passed to retrieval, revision, and consolidation. If LLMEndpoint is set, consolidation is upgraded with the LLM-backed semantic extractor. If IngestLLMEnabled, IngestLLMEndpoint, and IngestLLMModel are set, capture is upgraded with ingest-side interpretation and candidate resolution.
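The conditional wiring rules reduce to a few feature gates. The sketch below uses a local stand-in Config with only the fields named above; the real struct lives in pkg/membrane, and these helper names are illustrative.

```go
package main

import "fmt"

// Config is a local stand-in listing only the fields the wiring rules
// above mention (the real Config lives in pkg/membrane).
type Config struct {
	Backend           string // "sqlite" or "postgres" (assumed values)
	EmbeddingEndpoint string
	LLMEndpoint       string
	IngestLLMEnabled  bool
	IngestLLMEndpoint string
	IngestLLMModel    string
}

// embeddingEnabled: the embedding service requires both an endpoint
// and the Postgres backend.
func embeddingEnabled(c Config) bool {
	return c.EmbeddingEndpoint != "" && c.Backend == "postgres"
}

// llmExtractorEnabled: consolidation gains the LLM-backed semantic
// extractor when an LLM endpoint is configured.
func llmExtractorEnabled(c Config) bool {
	return c.LLMEndpoint != ""
}

// ingestInterpretationEnabled: ingest-side interpretation needs all
// three ingest-LLM settings.
func ingestInterpretationEnabled(c Config) bool {
	return c.IngestLLMEnabled && c.IngestLLMEndpoint != "" && c.IngestLLMModel != ""
}

func main() {
	c := Config{Backend: "postgres", EmbeddingEndpoint: "http://localhost/v1/embeddings"}
	fmt.Println(embeddingEnabled(c), llmExtractorEnabled(c)) // true false
}
```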
Storage model
Backends
| Backend | When to use | Notes |
|---|---|---|
| SQLite (default) | Single-process deployments, local development, zero-infrastructure setups | Uses SQLCipher for encryption at rest when MEMBRANE_ENCRYPTION_KEY is set |
| Postgres | Multi-writer deployments, concurrent agents | JSONB payload storage, same retrieval semantics as SQLite |
| Postgres + pgvector | Embedding-backed retrieval and selection | Enables hybrid vector+salience ranking for all record types and embedding-backed multi-solution selection for competence and plan-graph records |
Record structure
Every MemoryRecord shares a common envelope regardless of type:
- ID — UUID generated at ingestion time
- Type — one of episodic, working, entity, semantic, competence, plan_graph
- Sensitivity — public, low, medium, high, or hyper
- Confidence — epistemic confidence score (0.0–1.0), set by the policy engine at ingestion
- Salience — current relevance score (0.0–1.0), modified by decay, reinforcement, and penalization
- Payload — type-specific JSON struct stored inside the authoritative record
- Provenance — list of source references describing where the record came from
- Relations — directed edges to other records: mentions_entity, referenced_by, derived_from, supersedes, contested_by, supports, contradicts
- AuditLog — append-only list of every action taken on the record
Relationship graph
Relations between records are stored alongside the records themselves. When a revision operation runs (e.g., Supersede), the new record gains a supersedes relation to the old one, and the old record's status is updated. This gives every record a queryable lineage without a separate graph store.
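The lineage bookkeeping can be sketched with minimal stand-in types. The Record and Relation shapes and the "superseded" status string are assumptions for illustration; only the supersedes edge and the status update are stated in the text.

```go
package main

import "fmt"

// Relation and Record are minimal stand-ins for Membrane's schema
// types (illustrative, not the actual package).
type Relation struct {
	Type   string // e.g. "supersedes"
	Target string // ID of the related record
}

type Record struct {
	ID        string
	Status    string
	Relations []Relation
}

// supersede sketches what a Supersede revision does to the graph: the
// replacement gains a supersedes edge back to the old record, and the
// old record's status is updated ("superseded" is an assumed value).
func supersede(old, replacement *Record) {
	replacement.Relations = append(replacement.Relations,
		Relation{Type: "supersedes", Target: old.ID})
	old.Status = "superseded"
}

func main() {
	old := &Record{ID: "rec-1", Status: "active"}
	next := &Record{ID: "rec-2", Status: "active"}
	supersede(old, next)
	fmt.Println(next.Relations[0].Type, next.Relations[0].Target, old.Status)
}
```

Walking supersedes edges backwards from any record recovers its full lineage without a separate graph store.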
Deployment tiers
Membrane scales from a zero-infrastructure default to a full pipeline with embedding similarity search and LLM-backed knowledge extraction.
| Tier | Backend | Embedding | LLM | Behavior |
|---|---|---|---|---|
| 1 | SQLite | — | — | Zero-infra default; confidence-based applicability fallback for competence and plan-graph selection |
| 2 | Postgres | — | — | Concurrent writers; JSONB storage; same retrieval semantics as tier 1 |
| 3 | Postgres + pgvector | Yes | — | Hybrid vector+salience ranking for all record types; embedding-backed multi-solution selection for competence and plan_graph |
| 4 | Postgres + pgvector | Yes | Yes | Full system: LLM-backed episodic-to-semantic extraction runs asynchronously during consolidation |
Tiers 3 and 4 require a Postgres database with the pgvector extension enabled. The provided docker-compose.yml starts a pgvector/pgvector:pg16 image with the correct user and database for local development.
Background jobs
Two schedulers (decay and consolidation) run as goroutines when m.Start(ctx) is called; the pruning pass piggybacks on the decay job. Both stop cleanly when the context is cancelled.
| Job | Default interval | Purpose |
|---|---|---|
| Decay | 1 hour | Applies exponential salience decay (salience × 2^(−elapsed/halfLife)) to all non-pinned records using the per-record DecayProfile |
| Pruning | Runs with decay | Deletes records whose salience has reached 0 and whose DeletionPolicy is auto_prune; pinned records are never pruned |
| Consolidation | 6 hours | Runs the full consolidation pipeline: episodic compression → structural semantic extraction → LLM semantic extraction (if configured) → competence extraction → plan-graph extraction |
Decay curve
Membrane uses exponential decay. The formula from pkg/decay/curves.go:
// Exponential computes exponential decay: salience * 2^(-elapsed/halfLife),
// floored at MinSalience.
func Exponential(currentSalience, elapsedSeconds float64, profile schema.DecayProfile) float64 {
halfLife := float64(profile.HalfLifeSeconds)
if halfLife <= 0 {
return math.Max(currentSalience, profile.MinSalience)
}
decayed := currentSalience * math.Exp(-elapsedSeconds*math.Log(2)/halfLife)
return math.Max(decayed, profile.MinSalience)
}
Default half-lives are set by the policy engine:
| Memory type | Default half-life |
|---|---|
| Episodic | 1 hour |
| Working | 1 day |
| Entity | 30 days |
| Semantic | 30 days |
| Competence | 30 days |
| Plan graph | 30 days |
Consolidation pipeline
The consolidation.Service.RunAll method runs these sub-consolidators in sequence:
- Episodic consolidator — compresses old episodic records by reducing their salience once they exceed an age threshold.
- Semantic consolidator — scans episodic records for observation-like patterns and promotes them to semantic facts; reinforces existing duplicates rather than creating new ones.
- Semantic extractor (optional, requires LLM) — sends batches of episodic records to a chat completion endpoint and stores the extracted subject-predicate-object triples as semantic memory.
- Competence consolidator — identifies repeated successful episodic patterns and promotes them to competence records with success-rate tracking.
- Plan-graph consolidator — extracts multi-step tool-call sequences from episodic tool graphs and stores them as reusable plan graphs.
Security model
Encryption at rest
When MEMBRANE_ENCRYPTION_KEY (or encryption_key in the config) is set, Membrane opens the SQLite database with PRAGMA key via SQLCipher. The database file is unreadable without the key. This setting has no effect on the Postgres backend.
Transport security
TLS is optional. Set tls_cert_file and tls_key_file in the config to enable it. Without TLS the gRPC server runs in plaintext — acceptable for loopback connections but not for networked deployments.
Authentication
The daemon enforces a bearer token check on every gRPC call when MEMBRANE_API_KEY (or api_key in the config) is non-empty. The client must send the token in the authorization metadata header. Authentication is disabled when the key is not set.
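A token check of this shape might look like the following. The "Bearer " prefix and the constant-time comparison are assumptions; the document only specifies the authorization metadata key and that auth is skipped when no key is configured.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// authorized sketches the bearer-token check described above, taking
// the request metadata as a plain map (gRPC metadata has this shape).
func authorized(md map[string][]string, apiKey string) bool {
	if apiKey == "" {
		return true // authentication disabled when no key is configured
	}
	vals := md["authorization"]
	if len(vals) == 0 {
		return false
	}
	// Constant-time compare is an assumed hardening choice, not documented.
	want := "Bearer " + apiKey
	return subtle.ConstantTimeCompare([]byte(vals[0]), []byte(want)) == 1
}

func main() {
	md := map[string][]string{"authorization": {"Bearer s3cret"}}
	fmt.Println(authorized(md, "s3cret")) // true
	fmt.Println(authorized(md, "other"))  // false
}
```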
Rate limiting
A token-bucket rate limiter is applied per connection. The default is 100 requests per second, configurable via rate_limit_per_second. Set to 0 to disable.
Trust-aware retrieval
The TrustContext passed with every retrieval request controls which records are visible:
- MaxSensitivity — records at a higher sensitivity level are excluded entirely; records exactly one level above MaxSensitivity may be returned in redacted form (metadata only, payload stripped).
- Scopes — if the trust context specifies scopes, only records whose Scope matches are returned. Records with an empty scope are unscoped and visible to all callers.
- Authenticated — carried in the trust context for policy decisions.
Sensitivity levels in ascending order: public → low → medium → high → hyper.
// sensitivityOrder maps sensitivity levels to a numeric ordering.
var sensitivityOrder = map[schema.Sensitivity]int{
schema.SensitivityPublic: 0,
schema.SensitivityLow: 1,
schema.SensitivityMedium: 2,
schema.SensitivityHigh: 3,
schema.SensitivityHyper: 4,
}
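A trust filter built on this ordering might look like the sketch below (illustrative, not the actual implementation; the three-way access result is an assumed modeling of the visible/redacted/hidden rule).

```go
package main

import "fmt"

// Sensitivity is a local stand-in for schema.Sensitivity.
type Sensitivity string

// sensitivityOrder reproduces the ordering shown above.
var sensitivityOrder = map[Sensitivity]int{
	"public": 0, "low": 1, "medium": 2, "high": 3, "hyper": 4,
}

type access int

const (
	hidden   access = iota // above the redaction window: excluded entirely
	redacted               // exactly one level above max: metadata only
	visible                // at or below max: returned in full
)

// filter applies the documented rule: records above MaxSensitivity are
// hidden, except records exactly one level above, which may come back
// redacted.
func filter(record, max Sensitivity) access {
	diff := sensitivityOrder[record] - sensitivityOrder[max]
	switch {
	case diff <= 0:
		return visible
	case diff == 1:
		return redacted
	default:
		return hidden
	}
}

func main() {
	fmt.Println(filter("low", "low") == visible)     // true
	fmt.Println(filter("medium", "low") == redacted) // true
	fmt.Println(filter("hyper", "low") == hidden)    // true
}
```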
Input validation
The policy engine validates every ingestion candidate before creating a record. Checks include: required fields per candidate kind, sensitivity value in the allowed set, and NaN/Inf rejection on numeric fields. Payload size limits, string length limits, and tag count limits are enforced at the gRPC boundary.
gRPC API surface
The daemon exposes a gRPC service with typed protobuf envelopes. Arbitrary content/object/tool fields use google.protobuf.Value; records and graph responses are first-class protobuf messages.
| Method | Plane | Description |
|---|---|---|
| CaptureMemory | Ingestion | Capture a memory candidate plus linked entity/semantic records |
| RetrieveGraph | Retrieval | Retrieve ranked root memories and a bounded graph neighborhood |
| RetrieveByID | Retrieval | Fetch a single record by ID |
| Supersede | Revision | Replace a record with a new version |
| Fork | Revision | Create a conditional variant of a record |
| Retract | Revision | Mark a record as retracted |
| Merge | Revision | Combine multiple records into one |
| Contest | Revision | Mark a record as contested by conflicting evidence |
| Reinforce | Decay | Boost a record's salience |
| Penalize | Decay | Reduce a record's salience |
| GetMetrics | Metrics | Retrieve an observability metrics snapshot |
Observability
GetMetrics returns a point-in-time snapshot from the metrics.Collector. Example response:
{
"total_records": 160,
"records_by_type": {
"episodic": 80,
"entity": 18,
"semantic": 35,
"competence": 15,
"plan_graph": 7,
"working": 5
},
"avg_salience": 0.62,
"avg_confidence": 0.78,
"active_records": 148,
"pinned_records": 3,
"total_audit_entries": 890,
"memory_growth_rate": 0.15,
"retrieval_usefulness": 0.42,
"competence_success_rate": 0.85,
"plan_reuse_frequency": 2.3,
"revision_rate": 0.08
}
| Metric | Description |
|---|---|
| memory_growth_rate | Fraction of records created in the last 24 hours |
| retrieval_usefulness | Ratio of reinforce actions to total audit entries |
| competence_success_rate | Average success rate across competence records |
| plan_reuse_frequency | Average execution count across plan-graph records |
| revision_rate | Fraction of audit entries that are revisions (supersede, fork, merge) |
Next steps
- Schemas and lifecycle rules for the five memory layers and entity graph.
- How salience decay and background consolidation keep memory lean and useful.
- Full model for sensitivity levels, trust contexts, and redacted access.
- Running Membrane in production: Postgres, TLS, authentication, and scaling.