LLM Integration
Membrane is designed to sit between your orchestration layer and model calls. Capture execution traces as typed memory, retrieve a trust-gated graph neighborhood before prompting, then reinforce the records that actually helped.
The 4-Step Pattern
Capture
Store tool outputs, observations, agent turns, and working state with captureMemory. Membrane creates a primary record and can link entities or related records.
Retrieve graph context
Before a model call, use retrieveGraph with a task descriptor and trust context. The response includes ranked root memories and connected graph nodes.
Prompt
Project the graph into concise prompt context. Include record IDs so the model can cite sources and your application can reinforce useful records later.
Reinforce or penalize
After the model output is validated, reinforce useful records or penalize misleading records. Salience changes drive future ranking.
Full TypeScript + OpenAI Example
import OpenAI from "openai";
import {
MembraneClient,
Sensitivity,
SourceKind,
type GraphNode,
} from "@bennettschwartz/membrane";
const memory = new MembraneClient("localhost:9090", {
apiKey: process.env.MEMBRANE_API_KEY,
});
const llm = new OpenAI({
apiKey: process.env.LLM_API_KEY,
// OpenAI-compatible providers are supported here, for example:
// baseURL: "https://openrouter.ai/api/v1",
});
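// Retrieve a trust-gated graph neighborhood for the task before prompting.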
const graph = await memory.retrieveGraph("plan auth service rollout", {
trust: {
max_sensitivity: Sensitivity.MEDIUM,
authenticated: true,
actor_id: "planner-agent",
scopes: ["project-auth"],
},
memoryTypes: ["entity", "semantic", "competence", "working", "plan_graph"],
rootLimit: 10,
nodeLimit: 25,
edgeLimit: 100,
maxHops: 1,
});
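// Project each node into a compact JSON line so the model can cite record ids.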
function formatNode(node: GraphNode): string {
return JSON.stringify({
id: node.record.id,
type: node.record.type,
root: node.root,
hop: node.hop,
confidence: node.record.confidence,
salience: node.record.salience,
payload: node.record.payload,
});
}
const context = graph.nodes.map(formatNode).join("\n");
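// Prompt the model with the projected memory context as evidence.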
const completion = await llm.chat.completions.create({
model: "gpt-5.5",
messages: [
{ role: "system", content: "Use memory context as evidence. Cite record ids." },
{ role: "user", content: `Task: plan auth service rollout\n\nMemory:\n${context}` },
],
});
const answer = completion.choices[0]?.message?.content ?? "";
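// Capture the generated plan so future retrievals can build on it.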
const planCapture = await memory.captureMemory(
{
task: "plan auth service rollout",
answer,
},
{
source: "planner-agent",
sourceKind: SourceKind.AGENT_TURN,
reasonToRemember: "Persist the generated rollout plan and its supporting context",
summary: answer.slice(0, 500),
tags: ["llm", "plan", "auth"],
scope: "project-auth",
sensitivity: Sensitivity.MEDIUM,
}
);
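// Reinforce the captured plan after it passes review, boosting its future ranking.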
await memory.reinforce(
planCapture.primary_record.id,
"planner-agent",
"plan was accepted after review"
);
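// Release the client connection when the workflow completes.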
memory.close();
Capture During Execution
Use the same capture method for all runtime evidence; vary sourceKind and the content shape to match what happened.
// After a tool runs
const toolCapture = await memory.captureMemory(
{
tool_name: "go test",
args: { packages: ["./pkg/auth"] },
result: { exit_code: 0, stdout: "ok ./pkg/auth" },
},
{
sourceKind: SourceKind.TOOL_OUTPUT,
source: "auth-agent",
reasonToRemember: "Successful auth package verification",
summary: "Auth package tests passed",
tags: ["auth", "tests"],
scope: "project-auth",
sensitivity: Sensitivity.LOW,
}
);
// After a useful observation
const observationCapture = await memory.captureMemory(
{
subject: "auth service",
predicate: "uses_retry_policy",
object: { max_attempts: 3, backoff: "exponential" },
},
{
sourceKind: SourceKind.OBSERVATION,
source: "auth-agent",
reasonToRemember: "Retry policy affects debugging and rollout planning",
summary: "Auth service retries requests up to three times",
tags: ["auth", "runtime"],
scope: "project-auth",
sensitivity: Sensitivity.LOW,
}
);
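Agent turns follow the same shape. A minimal sketch, reusing SourceKind.AGENT_TURN from the example above; the content fields here are illustrative, not a required schema:
// After an agent turn
const turnCapture = await memory.captureMemory(
  {
    input: "Should the rollout wait for the retry-policy fix?",
    output: "Yes. Ship the retry fix first, then roll out in two stages.",
  },
  {
    sourceKind: SourceKind.AGENT_TURN,
    source: "auth-agent",
    reasonToRemember: "Decision to sequence the retry fix before the rollout",
    summary: "Agent recommended shipping the retry fix before the rollout",
    tags: ["auth", "decision"],
    scope: "project-auth",
    sensitivity: Sensitivity.LOW,
  }
);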
Retrieve Before Prompting
Pass a concrete task descriptor and bounded graph limits. The root nodes are the highest-signal records; connected nodes add entities, facts, and provenance.
const graph = await memory.retrieveGraph("debug auth retry failures", {
trust: {
max_sensitivity: Sensitivity.MEDIUM,
authenticated: true,
actor_id: "auth-agent",
scopes: ["project-auth"],
},
memoryTypes: ["entity", "semantic", "competence", "episodic"],
rootLimit: 8,
nodeLimit: 20,
edgeLimit: 80,
maxHops: 1,
});
If you only need the most relevant records, project root nodes:
const rootRecords = graph.nodes
.filter((node) => node.root)
.map((node) => node.record);
If entity and provenance context matter, include all graph nodes and edges:
const context = JSON.stringify({
nodes: graph.nodes,
edges: graph.edges,
root_ids: graph.root_ids,
});
Build The Prompt
Use compact JSON lines when prompts need source IDs:
const memoryContext = graph.nodes
.map((node) => {
const record = node.record;
return JSON.stringify({
id: record.id,
type: record.type,
root: node.root,
hop: node.hop,
payload: record.payload,
});
})
.join("\n");
const messages = [
{ role: "system", content: "Use memory context as evidence. Cite record ids." },
{ role: "user", content: `Task: debug auth retry failures\n\nMemory:\n${memoryContext}` },
];
Reinforce On Success
Reinforce records that contributed to a useful answer:
for (const id of graph.root_ids) {
await memory.reinforce(id, "auth-agent", "used in accepted debugging plan");
}
Reduce salience when a record misled the agent, where recordId is the id of the offending record:
await memory.penalize(recordId, 0.2, "auth-agent", "not relevant to this incident");
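Putting both together, a minimal sketch of the post-validation step. It assumes answer is the model output from the prompt step and that isAccepted is your own review check; both names are hypothetical:
// Hypothetical review hook: replace with your application's validation logic.
const accepted = isAccepted(answer);
for (const id of graph.root_ids) {
  if (accepted) {
    await memory.reinforce(id, "auth-agent", "cited in accepted answer");
  } else {
    // Penalizing every root is a blunt default; prefer penalizing only the
    // records the model actually cited when you track citations.
    await memory.penalize(id, 0.2, "auth-agent", "answer rejected in review");
  }
}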
Tips For Effective Integration
Scope Records To Projects
Use scope on capture and trust.scopes on retrieval to isolate memories:
await memory.captureMemory(
{ text: "Frontend build uses pnpm" },
{
sourceKind: SourceKind.OBSERVATION,
scope: "project-frontend",
sensitivity: Sensitivity.LOW,
}
);
const graph = await memory.retrieveGraph("deploy frontend", {
trust: {
max_sensitivity: Sensitivity.LOW,
authenticated: true,
actor_id: "frontend-agent",
scopes: ["project-frontend"],
},
});
Choose Memory Types Per Task
| Task | Recommended types |
|---|---|
| Planning | entity, semantic, competence, plan_graph |
| Debugging | entity, episodic, competence, semantic |
| Context restoration | working, entity, semantic |
| Self-correction | competence, episodic |
| Preference retrieval | entity, semantic |
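One way to apply this table in code is a lookup keyed by task category. The map below is illustrative and not part of the Membrane API:
// Illustrative mapping from task category to recommended memory types.
const MEMORY_TYPES_BY_TASK: Record<string, string[]> = {
  planning: ["entity", "semantic", "competence", "plan_graph"],
  debugging: ["entity", "episodic", "competence", "semantic"],
  context_restoration: ["working", "entity", "semantic"],
  self_correction: ["competence", "episodic"],
  preference_retrieval: ["entity", "semantic"],
};
const planningGraph = await memory.retrieveGraph("plan auth service rollout", {
  trust: {
    max_sensitivity: Sensitivity.MEDIUM,
    authenticated: true,
    actor_id: "planner-agent",
    scopes: ["project-auth"],
  },
  memoryTypes: MEMORY_TYPES_BY_TASK["planning"],
  rootLimit: 8,
  nodeLimit: 20,
  maxHops: 1,
});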
Keep Prompt Context Bounded
Use rootLimit, nodeLimit, edgeLimit, and maxHops instead of retrieving a large flat list. Start with rootLimit: 8, nodeLimit: 20, and maxHops: 1, then tune by prompt size.
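A minimal sketch of tuning by prompt size, shrinking nodeLimit until the serialized context fits a rough character budget; the budget and halving strategy are illustrative:
// Retry retrieval with a smaller node limit until the serialized context
// fits a rough character budget (illustrative numbers).
const CONTEXT_BUDGET_CHARS = 12_000;
let nodeLimit = 20;
let context = "";
do {
  const graph = await memory.retrieveGraph("debug auth retry failures", {
    trust: {
      max_sensitivity: Sensitivity.MEDIUM,
      authenticated: true,
      actor_id: "auth-agent",
      scopes: ["project-auth"],
    },
    rootLimit: 8,
    nodeLimit,
    maxHops: 1,
  });
  context = graph.nodes.map((node) => JSON.stringify(node.record)).join("\n");
  nodeLimit = Math.floor(nodeLimit / 2);
} while (context.length > CONTEXT_BUDGET_CHARS && nodeLimit > 0);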
LLM-Backed Interpretation
captureMemory uses a separate ingest-side interpreter configuration. Enable it when you want capture-time summaries, mentions, relation candidates, entity resolution, and proposed linked facts.
backend: postgres
postgres_dsn: "postgres://membrane:membrane@localhost:5432/membrane?sslmode=disable"
ingest_llm_enabled: true
ingest_llm_endpoint: "https://api.openai.com/v1/chat/completions"
ingest_llm_model: "gpt-5-mini"
# ingest_llm_api_key: "" # or set MEMBRANE_INGEST_LLM_API_KEY
The background consolidation scheduler is configured separately through the llm_* settings. Set those when you want asynchronous episodic-to-semantic extraction:
llm_endpoint: "https://api.openai.com/v1/chat/completions"
llm_model: "gpt-5-mini"
# llm_api_key: "" # or set MEMBRANE_LLM_API_KEY
The consolidation scheduler promotes eligible episodic traces into semantic facts, competence records, and plan graphs over time.
Embedding-Backed Retrieval
With Postgres + pgvector and an embedding endpoint configured, graph roots are ranked with hybrid vector and salience scoring.
backend: postgres
postgres_dsn: "postgres://membrane:membrane@localhost:5432/membrane?sslmode=disable"
embedding_endpoint: "https://api.openai.com/v1/embeddings"
embedding_model: "text-embedding-3-small"
embedding_dimensions: 1536
# embedding_api_key: "" # or set MEMBRANE_EMBEDDING_API_KEY
Specific task descriptors improve retrieval. Prefer "debug auth retry failures in project-auth" over "help".