Why does a PM agent need persistent memory?

A PM agent tracks stakeholders, decisions, SLAs and risks that all outlive a single conversation. Without persistent memory, the agent re-asks the same questions every day. With it, the agent recalls what the team agreed last Tuesday and treats follow-up questions as continuations rather than fresh sessions.

How does the agent capture decisions automatically?

After each conversation turn, an extractor (Claude Haiku) reads the recent messages and proposes new memories: stakeholders, decisions, risks, milestones, SLAs. New entries are deduplicated against the existing store; contradictions trigger supersession. The user does not manually log anything.

How do I keep multiple projects isolated?

Use Alma environments: one environment per project. Each environment has its own memories, episodes, procedures and Soul blocks. Cross-project queries are not possible without an explicit switch, which is the safe default for a PM tool where leaking decisions across projects is a real risk.

What does this cost on a typical team?

Per-user marginal cost is dominated by the LLM calls themselves. The memory layer adds one cheap assemble call (~30 ms, a few KB) and one Haiku extract call per turn. For a team running 20 messages/day per user, at roughly $0.001 per extract call, the memory overhead comes to about two cents per user per day.

Building a PM Agent with Claude API and Persistent Memory

May 2026 · 11 min read · Fran Olivares, Founder of OlivaresAI

A project-management agent built on the Claude API and Alma persistent memory tracks stakeholders, decisions, SLAs and standup notes across days and weeks without losing context. The architecture has four parts: a Claude conversation loop, a memory store keyed by project entity, an extractor that pulls structured records from each chat, and a context assembler that injects the right slice into every prompt. Persistent memory is what turns Claude from a smart drafting tool into an agent that remembers what the team decided last Tuesday.

Project management is mostly memory work. Who owns this stream? What did we agree about the migration window? Why did we descope the rate limiter? When does the legal review block the release? An agent that has to re-ask these questions every morning isn't an agent — it's a slightly faster intern. The way you change that is by giving the model a persistent memory layer it can read and write between turns, populated automatically from the conversation. This guide walks through the reference architecture and the integration code, using the Anthropic Claude API as the LLM and Alma's REST API as the memory layer.

Why does a PM agent specifically need persistent memory?

Three structural reasons. First, the entities a PM tracks (people, decisions, deliverables, SLAs, risks) are themselves long-lived — they outlive any single conversation by definition. Second, the conversational style is high-frequency and low-effort: short standup messages, quick clarifications, "what did we say about X?" questions. Loading the right slice of context cheaply matters. Third, the cost of forgetting is high: a missed decision becomes a missed release, a forgotten dependency becomes a blocker.

A stateless Claude conversation handles a single planning session well. As soon as the user wants continuity ("yesterday we agreed…", "what's blocking the auth team this week?"), the conversation has to either replay full history into the context window (expensive, eventually impossible) or rely on a memory layer outside the model.

What does the reference architecture look like?

Four moving parts:

Claude API conversation loop. A standard messages.create stream from the Anthropic SDK with a project-aware system prompt. Tool use is enabled so the model can ask the memory layer for entities by name when needed.
Memory store keyed by project entity. Each stakeholder, deliverable, decision and SLA is a memory record with a category tag (stakeholder, decision, sla, risk) and an importance score. Episodes capture compressed daily standups; procedures capture recurring workflows ("the way we handle hotfixes").
Automatic extractor. After each conversation turn, a small Claude Haiku call reads the recent messages and proposes new memories (or updates to existing ones). This is what makes the agent low-friction: the user doesn't manually log decisions; the agent does it for them.
Context assembler. Before each Claude call, the memory API is asked for the entries most relevant to the user's query. The result becomes part of the Claude system prompt: think of it as a curated cheat-sheet for this specific question.
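The tool-use piece of the conversation loop can be sketched as follows. The object uses the Anthropic tool definition shape (name, description, input_schema), but the tool name lookup_entity, the sample records and the handleToolUse helper are hypothetical, and a plain array stands in for the memory API:

```javascript
// Hypothetical tool definition in the Anthropic tool-use format.
const lookupEntityTool = {
  name: 'lookup_entity', // illustrative name, not part of any SDK
  description: 'Look up a project entity (stakeholder, decision, SLA, risk) by name.',
  input_schema: {
    type: 'object',
    properties: {
      name: { type: 'string', description: 'Entity name, e.g. "Maria" or "GA release"' },
    },
    required: ['name'],
  },
};

// Stand-in for the memory layer: a plain array instead of a real API call.
const sampleStore = [
  { category: 'stakeholder', name: 'Maria', content: 'Owns the migration' },
  { category: 'decision', name: 'GA release', content: 'Moved to Friday pending legal review' },
];

// Takes a tool_use content block from the model and returns the
// tool_result block to send back in the next user message.
function handleToolUse(toolUse, store) {
  if (toolUse.name !== lookupEntityTool.name) {
    throw new Error(`Unknown tool: ${toolUse.name}`);
  }
  const wanted = String(toolUse.input.name).toLowerCase();
  const matches = store.filter((m) => m.name.toLowerCase() === wanted);
  return {
    type: 'tool_result',
    tool_use_id: toolUse.id,
    content: JSON.stringify(matches),
  };
}
```

In a real loop, lookupEntityTool would go in the tools array of the request and handleToolUse would query the memory layer instead of sampleStore.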

How do I structure memory categories for a PM agent?

Five categories cover most teams: stakeholder (people with role + responsibilities), decision (what was agreed, when, by whom, with the rationale), sla (commitments to other teams or customers), risk (open issues with owner + mitigation), milestone (target date + scope + status). Each memory carries an importance score so the assembler can prioritise high-stakes items in retrieval.
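As a rough illustration, a record with these five categories might be validated like this before it is written. The field names and the 0 to 1 importance scale are assumptions for the sketch, not the Alma schema:

```javascript
// The five categories named above.
const PM_CATEGORIES = ['stakeholder', 'decision', 'sla', 'risk', 'milestone'];

// Builds a memory record, rejecting unknown categories and bad scores.
// The 0..1 importance range is an assumption made for this sketch.
function createMemory({ category, content, importance = 0.5 }) {
  if (!PM_CATEGORIES.includes(category)) {
    throw new Error(`Unknown category: ${category}`);
  }
  if (typeof content !== 'string' || content.trim() === '') {
    throw new Error('content must be a non-empty string');
  }
  if (typeof importance !== 'number' || !(importance >= 0 && importance <= 1)) {
    throw new Error('importance must be a number between 0 and 1');
  }
  return { category, content: content.trim(), importance };
}

const gaDecision = createMemory({
  category: 'decision',
  content: 'GA release moved to Friday because legal is still reviewing the DPA',
  importance: 0.9,
});
```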

The category isn't just for organisation — it's part of the retrieval signal. When the user asks "what did we decide?", the assembler weighs decision-category memories higher. When they ask "who's blocked?", risk-category memories rank up. The Alma context assembly exposes per-category boost weights for exactly this use case.
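To make the idea concrete, here is a minimal local re-ranking sketch. It does not use the Alma boost parameters (their names are not given here); the keyword patterns, the 2.0 weight and the relevance field are all assumptions chosen for illustration:

```javascript
// Hypothetical mapping from query wording to a category boost.
const CATEGORY_HINTS = [
  { pattern: /\b(decide|decided|decision|agree|agreed)\b/i, category: 'decision', boost: 2.0 },
  { pattern: /\b(blocked|blocking|blocker|risk)\b/i, category: 'risk', boost: 2.0 },
];

// Returns { category: boost } for every hint the query matches.
function boostsForQuery(query) {
  const boosts = {};
  for (const hint of CATEGORY_HINTS) {
    if (hint.pattern.test(query)) boosts[hint.category] = hint.boost;
  }
  return boosts;
}

// Scores each candidate as relevance * importance * category boost
// and returns a new array sorted from highest to lowest score.
function rankCandidates(query, candidates) {
  const boosts = boostsForQuery(query);
  return candidates
    .map((c) => ({ ...c, score: c.relevance * c.importance * (boosts[c.category] ?? 1) }))
    .sort((a, b) => b.score - a.score);
}
```

With equal importance, a decision memory that is slightly less similar to the query than a stakeholder memory still ranks first when the user asks "what did we decide?".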

What's the integration loop in actual code?

Three phases per user message. The pseudo-code below uses Node.js with the Alma SDK and the Anthropic SDK, but the same shape works in Python or any other stack:

Phase 1 — assemble context. const { systemPrompt } = await alma.context.assemble({ query: userMessage, environmentId: projectId });
Phase 2 — call Claude. const stream = anthropic.messages.stream({ model: 'claude-opus-4-7', system: systemPrompt, messages: [{ role: 'user', content: userMessage }] });
Phase 3 — extract memories. After the stream completes, await alma.memories.extract({ text: lastTurn, environmentId: projectId });

Phase 3 runs in the background: in a real handler, start the extract call once the stream completes rather than awaiting it before responding. The user sees the streamed response immediately, and extraction finishes in the next ~1 s without blocking. New memories are deduplicated, contradictions are detected against existing entries, and the store stays clean automatically. Full SDK reference: @olivaresai/alma-sdk; HTTP equivalents in the REST API documentation.
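The three phases can be wired together roughly as below. To keep the sketch runnable without either SDK, the calls are injected: deps.assemble, deps.complete and deps.extract are hypothetical adapters that would wrap alma.context.assemble, the Claude call and alma.memories.extract respectively.

```javascript
// One user turn: assemble context, call the model, extract in the background.
async function handleTurn(deps, { userMessage, projectId }) {
  // Phase 1: assemble context for this query and project.
  const { systemPrompt } = await deps.assemble({
    query: userMessage,
    environmentId: projectId,
  });

  // Phase 2: get the model's reply (a real adapter would stream it).
  const reply = await deps.complete({ system: systemPrompt, userMessage });

  // Phase 3: start extraction without waiting for it. Failures are logged
  // so they never surface as unhandled rejections or delay the reply.
  const lastTurn = `User: ${userMessage}\nAssistant: ${reply}`;
  const extraction = Promise.resolve()
    .then(() => deps.extract({ text: lastTurn, environmentId: projectId }))
    .catch((err) => {
      console.error('memory extraction failed:', err.message);
    });

  // The extraction promise is returned only so callers (or tests) can
  // wait for it if they choose to.
  return { reply, extraction };
}
```

Returning the promise instead of awaiting it keeps the response path independent of the memory write, which matches the "no blocking" behaviour described above.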

How does the agent handle multi-project isolation?

Use Alma environments: one environment per project. Each environment has its own memories, episodes, procedures and Soul blocks, completely isolated from the others. The agent passes environmentId on every memory call; the API enforces the boundary. Cross-project queries are simply not possible without an explicit environment switch — which is the right default for a PM tool where leaking decisions from project A into project B is a real problem.
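The isolation property itself is easy to picture with an in-memory stand-in. This is not how Alma implements environments (the boundary there is enforced by the API); the class and method names below are invented for the illustration:

```javascript
// Toy store in which every read and write is scoped to one environment.
class EnvironmentScopedStore {
  constructor() {
    this.byEnvironment = new Map(); // environmentId -> array of memories
  }

  add(environmentId, memory) {
    if (!this.byEnvironment.has(environmentId)) {
      this.byEnvironment.set(environmentId, []);
    }
    this.byEnvironment.get(environmentId).push(memory);
  }

  // There is deliberately no "query all environments" method: a caller
  // has to name the environment it wants to read.
  query(environmentId, predicate = () => true) {
    const memories = this.byEnvironment.get(environmentId) ?? [];
    return memories.filter(predicate);
  }
}

const projects = new EnvironmentScopedStore();
projects.add('project-a', { category: 'decision', content: 'Descope the rate limiter' });
projects.add('project-b', { category: 'decision', content: 'Push GA to Friday' });
```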

For team-wide PM agents (multiple humans interacting with the same agent), use the Alma teams resource: each team has shared memories visible to all members, plus per-user memories for personal preferences. Role-based access controls who can write what.

What does a daily standup turn look like end-to-end?

User message: "standup: backend team, Maria is unblocking the migration today, José is on the rate limiter; we decided to push the GA release to Friday because legal is still reviewing the DPA". The agent's flow:

Assemble context: the assembler returns relevant context — the prior standup episode, the decision memory about the GA release, the stakeholder records for Maria and José, the risk memory about the legal review.
Claude responds: "Updated standup logged. Maria → migration unblocking; José → rate limiter; GA release moved to Friday pending legal review of DPA. Risk #4 (legal review) status remains open with you as owner."
Extractor runs: a new episode summarises the standup, the GA-release decision memory is updated with the new date, the legal-review risk memory's last touched timestamp refreshes (so it doesn't decay), and a procedure memory captures the workflow "when GA slips, log decision + update risk".
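The store-side effect of that extractor run can be sketched as below. The real deduplication and contradiction detection are presumably semantic; this toy version matches on a hypothetical key field and exact text, purely to show the three outcomes (add, supersede, refresh):

```javascript
// Applies extractor proposals to an in-memory store (mutating it) and
// reports what happened. Matching by `key` is a simplification.
function applyProposals(store, proposals, now) {
  const summary = { added: 0, superseded: 0, touched: 0 };
  for (const proposal of proposals) {
    const current = store.find((m) => m.key === proposal.key && m.active);
    if (current && current.content === proposal.content) {
      // Duplicate: only refresh the timestamp so the record does not decay.
      current.lastTouched = now;
      summary.touched += 1;
      continue;
    }
    if (current) {
      // Contradiction: retire the old version before adding the new one.
      current.active = false;
      summary.superseded += 1;
    }
    store.push({ ...proposal, active: true, lastTouched: now });
    summary.added += 1;
  }
  return summary;
}

// Placeholder data loosely modelled on the standup above.
const standupStore = [
  { key: 'ga-release', category: 'decision', content: 'GA release on the original date', active: true, lastTouched: 1 },
  { key: 'legal-review', category: 'risk', content: 'Legal is reviewing the DPA', active: true, lastTouched: 1 },
];
const standupSummary = applyProposals(
  standupStore,
  [
    { key: 'ga-release', category: 'decision', content: 'GA release moved to Friday pending legal review' },
    { key: 'legal-review', category: 'risk', content: 'Legal is reviewing the DPA' },
  ],
  2
);
```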

Common workflows to keep in mind

Decision archaeology. "Why did we descope the rate limiter?" The agent retrieves the decision memory plus the surrounding episode and the risk memory it referenced. It returns the answer with citations to the records, so the user can drill into the conversation if needed.
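One way to picture the citation step, assuming each decision record carries a hypothetical links array of related record ids (the field names here are illustrative, not the Alma schema):

```javascript
// Returns the decision's rationale plus citations for the decision itself
// and every linked record that can be resolved.
function explainDecision(records, decisionId) {
  const byId = new Map(records.map((r) => [r.id, r]));
  const decision = byId.get(decisionId);
  if (!decision || decision.category !== 'decision') return null;

  const linked = (decision.links ?? [])
    .map((id) => byId.get(id))
    .filter(Boolean); // silently skip links to records that no longer exist

  return {
    answer: decision.rationale,
    citations: [decision, ...linked].map((r) => ({ id: r.id, kind: r.category })),
  };
}

// Placeholder records; the rationale text is invented for the example.
const archaeologyRecords = [
  { id: 'd1', category: 'decision', rationale: 'Placeholder rationale', links: ['e1', 'r1', 'gone'] },
  { id: 'e1', category: 'episode' },
  { id: 'r1', category: 'risk' },
];
```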

Stakeholder lookup. "Who owns the migration?" — straight memory query against the stakeholder category, returns the record. If the answer is stale (the role changed last week), contradiction detection catches it on the next conversation that mentions the new owner.

Recurring report generation. "Generate a status report for the auth stream this week" — the agent assembles a context window of episodes, decisions and risks tagged for that stream, then drafts the report from that curated slice. This is significantly cheaper and more accurate than asking Claude to summarise raw chat history.
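The "curated slice" for a report might be gathered along these lines. The kind, stream and date fields are assumptions about how records could be tagged, and dates are ISO strings so plain string comparison orders them correctly:

```javascript
// Maps a record kind to the section of the report slice it belongs in.
const REPORT_BUCKETS = new Map([
  ['episode', 'episodes'],
  ['decision', 'decisions'],
  ['risk', 'risks'],
]);

// Collects the records for one stream dated on or after `since`,
// grouped by kind. Other kinds (e.g. stakeholders) are left out.
function buildReportSlice(records, stream, since) {
  const slice = { episodes: [], decisions: [], risks: [] };
  for (const record of records) {
    const bucket = REPORT_BUCKETS.get(record.kind);
    if (!bucket || record.stream !== stream || record.date < since) continue;
    slice[bucket].push(record);
  }
  return slice;
}

// Renders the slice as plain text that can be placed in the prompt.
function renderReportSlice(slice) {
  return Object.entries(slice)
    .map(([section, items]) => `${section}:\n${items.map((i) => `- ${i.content}`).join('\n')}`)
    .join('\n\n');
}
```

The drafting prompt then contains only this rendered slice rather than the raw chat history.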

How do I keep the system prompt small while still being grounded?

Default token budgets in the assembler: ~2 K tokens for memories, ~1 K for episodes, ~500 for procedures, ~500 for Soul blocks. Total ~4 K, well under any model's context budget, and prompt-cache hits on the stable part amortise its cost across the conversation. If your project is small (<100 active memories), you can lower the budget further. If it's large (10 K+ memories), the assembled context still stays at ~4 K, because retrieval caps the number of records included no matter how big the store is.
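A budget like this can be enforced with a simple greedy pack per section. The sketch below is not the Alma assembler; it assumes items arrive already sorted by relevance and uses a rough 4-characters-per-token estimate instead of a real tokenizer:

```javascript
// Per-section budgets in tokens, taken from the defaults quoted above.
const DEFAULT_BUDGETS = { memories: 2000, episodes: 1000, procedures: 500, soul: 500 };

// Crude token estimate; an assumption, not a tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keeps items in order until the next one would exceed the budget.
function packSection(items, budget) {
  const kept = [];
  let used = 0;
  for (const text of items) {
    const cost = estimateTokens(text);
    if (used + cost > budget) break;
    kept.push(text);
    used += cost;
  }
  return { kept, used };
}

// Sum of all section budgets: the ceiling for the assembled context.
const totalBudget = Object.values(DEFAULT_BUDGETS).reduce((sum, n) => sum + n, 0);
```

Because packing stops at the budget, the assembled context has the same ceiling whether the store holds a hundred records or ten thousand.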

Two things matter operationally. First, the Soul blocks (the agent's identity) should be cached as a stable system-prompt prefix so repeated calls don't re-pay full price for those input tokens. Second, the dynamic context (memories + episodes) should sit after the cache breakpoint so that only the changed portion is billed as uncached input on each call. Anthropic's prompt-caching documentation covers the breakpoint placement.
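In the Messages API, the system parameter can be an array of text blocks, and a cache_control marker on a block sets a cache breakpoint at the end of that block. A minimal sketch of the split described above follows; the function name and the sample strings are illustrative:

```javascript
// Builds the `system` parameter as two blocks: a stable, cacheable
// identity prefix followed by the per-turn dynamic context.
function buildSystemBlocks(soulText, dynamicContext) {
  return [
    // Stable prefix: the breakpoint sits at the end of this block.
    { type: 'text', text: soulText, cache_control: { type: 'ephemeral' } },
    // Changes every turn, so it goes after the breakpoint, unmarked.
    { type: 'text', text: dynamicContext },
  ];
}

const systemBlocks = buildSystemBlocks(
  'You are the PM agent for this project.',
  'Relevant memories: GA release moved to Friday pending legal review.'
);
```

Anthropic documents a minimum cacheable prefix length that varies by model, so a short identity block on its own may fall below it; it is worth checking the caching docs against the actual prefix size.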

What does this cost in practice?

On a typical PM team flow (~20 messages/day per user, mostly standups + clarifications), the marginal cost is dominated by the LLM calls themselves. The memory layer adds one assemble call (a few KB read + retrieval, ~30 ms) and one extract call (Haiku, ~$0.001 per turn). At 20 turns a day that is roughly $0.02 per active user per day, or about a tenth of a cent per turn. Compare that to the value of the PM team not losing decisions and the math is obvious.
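The arithmetic is easy to check with the two figures quoted above (20 messages per user per day, ~$0.001 per extract call). The function name is illustrative, and the assemble call is treated as free here because no price is given for it:

```javascript
// Daily extract cost for one active user, in US dollars.
function dailyExtractCostUsd(turnsPerDay, costPerExtractUsd) {
  return turnsPerDay * costPerExtractUsd;
}

const perUserPerDayUsd = dailyExtractCostUsd(20, 0.001);
console.log(perUserPerDayUsd.toFixed(3)); // prints "0.020", i.e. about 2 cents
```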

How do I start prototyping?

Alma's Starter plan ($14/mo) is the entry tier and includes the persistent memory layer. Sign up at alma.olivares.ai, generate an API key in Settings, and clone the SDK starter from the developers page. Wire the three-phase loop in your agent code, point it at a single test project, and run it for a week. The store will populate naturally from the conversations; you'll see decisions, stakeholders and risks accumulate without manual data entry. From there it's just cranking up the categories and tuning the assembler.
