Building a PM Agent with Claude API and Persistent Memory

May 2026 · 11 min read · Fran Olivares, Founder of OlivaresAI

A project-management agent built on the Claude API and Alma persistent memory tracks stakeholders, decisions, SLAs and standup notes across days and weeks without losing context. The architecture has four parts: a Claude conversation loop, a memory store keyed by project entity, an extractor that pulls structured records from each chat, and a context assembler that injects the right slice into every prompt. Persistent memory is what turns Claude from a smart drafting tool into an agent that remembers what the team decided last Tuesday.

Project management is mostly memory work. Who owns this stream? What did we agree about the migration window? Why did we descope the rate limiter? When does the legal review block the release? An agent that has to re-ask these questions every morning isn't an agent — it's a slightly faster intern. The way you change that is by giving the model a persistent memory layer it can read and write between turns, populated automatically from the conversation. This guide walks through the reference architecture and the integration code, using the Anthropic Claude API as the LLM and Alma's REST API as the memory layer.

Why does a PM agent specifically need persistent memory?

Three structural reasons. First, the entities a PM tracks (people, decisions, deliverables, SLAs, risks) are themselves long-lived — they outlive any single conversation by definition. Second, the conversational style is high-frequency and low-effort: short standup messages, quick clarifications, "what did we say about X?" questions. Loading the right slice of context cheaply matters. Third, the cost of forgetting is high: a missed decision becomes a missed release, a forgotten dependency becomes a blocker.

A stateless Claude conversation handles a single planning session well. As soon as the user wants continuity ("yesterday we agreed…", "what's blocking the auth team this week?"), the conversation has to either replay full history into the context window (expensive, eventually impossible) or rely on a memory layer outside the model.

What does the reference architecture look like?

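Four moving parts: a Claude conversation loop, a memory store keyed by project entity, an extractor that pulls structured records from each chat, and a context assembler that injects the right slice into every prompt.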

How do I structure memory categories for a PM agent?

Five categories cover most teams: stakeholder (people with role + responsibilities), decision (what was agreed, when, by whom, with the rationale), sla (commitments to other teams or customers), risk (open issues with owner + mitigation), milestone (target date + scope + status). Each memory carries an importance score so the assembler can prioritise high-stakes items in retrieval.

The category isn't just for organisation — it's part of the retrieval signal. When the user asks "what did we decide?", the assembler weighs decision-category memories higher. When they ask "who's blocked?", risk-category memories rank up. The Alma context assembly exposes per-category boost weights for exactly this use case.
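A minimal sketch of how category boosts could feed into ranking. The record shape, the keyword heuristic in `boostsFor` and the multiplier values are all assumptions made for illustration, not Alma's actual schema or scoring. A real assembler would also factor in semantic relevance to the query, which is left out here.

```typescript
// Hypothetical shapes: illustrative names, not Alma's real schema.
type Category = "stakeholder" | "decision" | "sla" | "risk" | "milestone";

interface MemoryRecord {
  id: string;
  category: Category;
  content: string;
  importance: number; // assumed range 0..1, higher means higher stakes
}

// Per-category multipliers; a category missing from the map counts as 1.
type CategoryBoosts = Partial<Record<Category, number>>;

// Assumed intent heuristic: a plain keyword match on the user's question.
function boostsFor(question: string): CategoryBoosts {
  const q = question.toLowerCase();
  if (q.indexOf("decide") !== -1 || q.indexOf("agree") !== -1) return { decision: 2 };
  if (q.indexOf("block") !== -1) return { risk: 2 };
  return {};
}

// Sort a copy of the records by importance multiplied by the category boost.
function rank(records: MemoryRecord[], boosts: CategoryBoosts): MemoryRecord[] {
  const score = (r: MemoryRecord): number => r.importance * (boosts[r.category] ?? 1);
  return records.slice().sort((a, b) => score(b) - score(a));
}
```

With this shape, a question containing "decide" lifts a mid-importance decision above a higher-importance stakeholder record, which is the behaviour the paragraph above describes.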

What's the integration loop in actual code?

Three phases per user message. The pseudo-code below uses Node.js with the Alma SDK and the Anthropic SDK, but the same shape works in Python or any other stack:
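The sketch below shows the shape of that loop in TypeScript. It does not call the real Alma or Anthropic SDKs: `ContextAssembler`, `Llm`, `Extractor` and `handleTurn` are hypothetical stand-ins so the control flow can be read (and tested) on its own. Streaming is omitted for brevity, so phase 2 returns the whole reply at once.

```typescript
// Hypothetical interfaces standing in for the memory layer and the LLM client.
interface ContextAssembler {
  assemble(environmentId: string, userMessage: string): Promise<string>;
}
interface Llm {
  reply(systemContext: string, userMessage: string): Promise<string>;
}
interface Extractor {
  extract(environmentId: string, userMessage: string, reply: string): Promise<void>;
}
interface Deps {
  assembler: ContextAssembler;
  llm: Llm;
  extractor: Extractor;
  onExtractError: (error: unknown) => void;
}

async function handleTurn(deps: Deps, environmentId: string, userMessage: string): Promise<string> {
  // Phase 1: pull the relevant slice of memory for this message.
  const context = await deps.assembler.assemble(environmentId, userMessage);

  // Phase 2: generate the reply, grounded in the assembled context.
  const reply = await deps.llm.reply(context, userMessage);

  // Phase 3: start extraction without awaiting it, so the caller is not blocked.
  // Failures are routed to a handler instead of surfacing as unhandled rejections.
  deps.extractor.extract(environmentId, userMessage, reply).catch(deps.onExtractError);

  return reply;
}
```

Injecting the three collaborators keeps the loop independent of any one SDK, which is also what makes the same shape portable to other stacks.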

Phase 3 runs in the background — the user sees the streamed response immediately, and extraction happens in the next ~1 s without blocking. New memories are deduplicated, contradictions are detected against existing entries, and the store stays clean automatically. Full SDK reference: @olivaresai/alma-sdk; HTTP equivalents in the REST API documentation.

How does the agent handle multi-project isolation?

Use Alma environments: one environment per project. Each environment has its own memories, episodes, procedures and Soul blocks, completely isolated from the others. The agent passes environmentId on every memory call; the API enforces the boundary. Cross-project queries are simply not possible without an explicit environment switch — which is the right default for a PM tool where leaking decisions from project A into project B is a real problem.
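To make the isolation property concrete, here is a small in-memory stand-in for an environment-scoped store. `ScopedStore` is hypothetical and says nothing about how Alma implements environments. It only illustrates the contract described above: every call carries an `environmentId`, and a query never sees records from another environment.

```typescript
// Hypothetical in-memory model of an environment-scoped memory store.
class ScopedStore {
  private byEnvironment = new Map<string, string[]>();

  write(environmentId: string, memory: string): void {
    const existing = this.byEnvironment.get(environmentId) ?? [];
    existing.push(memory);
    this.byEnvironment.set(environmentId, existing);
  }

  // Case-insensitive substring search, limited to one environment.
  search(environmentId: string, term: string): string[] {
    const needle = term.toLowerCase();
    const scoped = this.byEnvironment.get(environmentId) ?? [];
    return scoped.filter((m) => m.toLowerCase().indexOf(needle) !== -1);
  }
}

const store = new ScopedStore();
store.write("project-a", "Decision: push the GA release to Friday");
store.write("project-b", "Decision: keep the rate limiter in scope");

// The same search term returns different results depending on the environment.
const fromA = store.search("project-a", "decision");
const fromB = store.search("project-b", "decision");
```

There is deliberately no method that searches across environments. Leaving it out of the interface is one way to make the safe behaviour the default.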

For team-wide PM agents (multiple humans interacting with the same agent), use the Alma teams resource: each team has shared memories visible to all members, plus per-user memories for personal preferences. Role-based access controls who can write what.

What does a daily standup turn look like end-to-end?

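User message: "standup: backend team, Maria is unblocking the migration today, José is on the rate limiter; we decided to push the GA release to Friday because legal is still reviewing the DPA". The agent runs the same three phases on it: it assembles context for the project, streams the reply, and then extracts the new records in the background.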
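As an illustration only, the extraction phase might emit records along these lines for that message. The `ExtractedRecord` shape and the exact split into records are assumptions. The contents are taken directly from the standup text, using the five categories defined earlier.

```typescript
// Hypothetical record shape; the categories are the five described earlier.
type Category = "stakeholder" | "decision" | "sla" | "risk" | "milestone";

interface ExtractedRecord {
  category: Category;
  content: string;
  rationale?: string; // only meaningful for decisions
}

// One plausible extraction result for the standup message above.
const standupRecords: ExtractedRecord[] = [
  { category: "stakeholder", content: "Maria is unblocking the migration today" },
  { category: "stakeholder", content: "José is on the rate limiter" },
  {
    category: "decision",
    content: "GA release pushed to Friday",
    rationale: "Legal is still reviewing the DPA",
  },
  { category: "milestone", content: "GA release: target Friday" },
];

// Small helper to summarise what a turn added to the store.
function countByCategory(records: ExtractedRecord[]): { [category: string]: number } {
  const counts: { [category: string]: number } = {};
  for (const r of records) {
    counts[r.category] = (counts[r.category] || 0) + 1;
  }
  return counts;
}
```

A single short standup line therefore yields several typed records, which is why no manual data entry is needed to keep the store current.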

Common workflows to keep in mind

Decision archaeology. "Why did we descope the rate limiter?" — the agent retrieves the decision memory plus the surrounding episode and the risk memory it referenced. Returns the answer with citations to the records, so the user can drill into the conversation if needed.

Stakeholder lookup. "Who owns the migration?" — straight memory query against the stakeholder category, returns the record. If the answer is stale (the role changed last week), contradiction detection catches it on the next conversation that mentions the new owner.

Recurring report generation. "Generate a status report for the auth stream this week" — the agent assembles a context window of episodes, decisions and risks tagged for that stream, then drafts the report from that curated slice. This is significantly cheaper and more accurate than asking Claude to summarise raw chat history.
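The stale-owner case from the stakeholder lookup can be sketched as a simple check. This is a deliberately naive stand-in for contradiction detection, not Alma's algorithm: `OwnerRecord` and `findContradiction` are hypothetical, and the rule is just "same stream, different owner, not yet superseded".

```typescript
// Hypothetical ownership record for a work stream.
interface OwnerRecord {
  id: string;
  stream: string;
  owner: string;
  supersededBy?: string; // id of the record that replaced this one, if any
}

// Return the first active record that disagrees with the incoming one about
// who owns the same stream, or undefined when nothing conflicts.
function findContradiction(existing: OwnerRecord[], incoming: OwnerRecord): OwnerRecord | undefined {
  for (const record of existing) {
    const sameStream = record.stream === incoming.stream;
    const differentOwner = record.owner !== incoming.owner;
    const stillActive = record.supersededBy === undefined;
    if (sameStream && differentOwner && stillActive) return record;
  }
  return undefined;
}

// Mark the stale record as superseded and append the new one.
function applyUpdate(existing: OwnerRecord[], incoming: OwnerRecord): OwnerRecord[] {
  const stale = findContradiction(existing, incoming);
  const updated = existing.map((r) =>
    stale !== undefined && r.id === stale.id ? { ...r, supersededBy: incoming.id } : r
  );
  return updated.concat([incoming]);
}
```

Keeping the superseded record rather than deleting it preserves the history that decision archaeology relies on.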

How do I keep the system prompt small while still being grounded?

Default token budgets in the assembler: ~2 K tokens for memories, ~1 K for episodes, ~500 for procedures, ~500 for Soul blocks. Total ~4 K — well under any model's context budget, and the cache hits get amortised across the conversation. If your project is small (<100 active memories), you can lower the budget further. If it's large (10 K+ memories), the assembler stays at ~4 K because retrieval limits the number of records included even when the store is big.
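The budget arithmetic and the "stays at ~4 K regardless of store size" behaviour can be sketched as follows. The numbers are the defaults quoted above. The `estimateTokens` heuristic (about four characters per token) and the `fitToBudget` helper are assumptions for illustration, not the assembler's real tokenizer or trimming logic.

```typescript
interface Budgets {
  memories: number;
  episodes: number;
  procedures: number;
  soul: number;
}

// Default budgets as quoted in the text, in tokens.
const DEFAULT_BUDGETS: Budgets = { memories: 2000, episodes: 1000, procedures: 500, soul: 500 };

function totalBudget(b: Budgets): number {
  return b.memories + b.episodes + b.procedures + b.soul;
}

// Rough assumption: about 4 characters per token. A real tokenizer will differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep items in their given (already ranked) order until the next one would
// exceed the budget. Everything after that point is dropped.
function fitToBudget(rankedItems: string[], budget: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const item of rankedItems) {
    const cost = estimateTokens(item);
    if (used + cost > budget) break;
    kept.push(item);
    used += cost;
  }
  return kept;
}
```

Because the cut-off depends only on the budget, a store with 100 records and one with 10 K records produce a prompt section of the same maximum size. Only the quality of the ranking decides what makes it in.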

Two things matter operationally. The Soul blocks (the agent's identity) should be cached as a stable system-prompt prefix, so repeated calls are billed at the reduced cache-read rate instead of the full input price. The dynamic context (memories + episodes) should sit after the cache breakpoint, so that changes to it never invalidate the cached prefix. Anthropic's prompt-caching documentation covers the breakpoint placement.
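A sketch of what that layout could look like as a Messages API request body. The local types mirror the request shape as documented by Anthropic (a `system` array of text blocks, with `cache_control: { type: "ephemeral" }` marking the breakpoint), but they are hand-written here rather than imported from the SDK, so check them against the current documentation. `buildParams` is a hypothetical helper, and the model id is left to the caller.

```typescript
// Hand-written approximations of the request types, not SDK imports.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessageParams {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

function buildParams(
  model: string,
  soulBlocks: string,
  dynamicContext: string,
  userMessage: string
): MessageParams {
  return {
    model,
    max_tokens: 1024,
    system: [
      // Stable prefix: the breakpoint on this block marks the end of the cached span.
      { type: "text", text: soulBlocks, cache_control: { type: "ephemeral" } },
      // Changes every turn, so it sits after the breakpoint and is never cached.
      { type: "text", text: dynamicContext },
    ],
    messages: [{ role: "user", content: userMessage }],
  };
}
```

One caveat worth checking in the documentation: caching only applies above a minimum prefix length, so a very short identity block may need to be combined with other stable content (tool definitions, fixed instructions) before it is actually cached.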

What does this cost in practice?

On a typical PM team flow (~20 messages/day per user, mostly standups + clarifications), the marginal cost is dominated by the LLM calls themselves. The memory layer adds one assemble call (a few KB read + retrieval, ~30 ms) and one extract call (Haiku, ~$0.001 per turn) per message. At 20 turns that is about 20 × $0.001 = $0.02, so the memory overhead per active user is on the order of two cents a day. Set that against the value of the PM team not losing decisions and the math is obvious.

How do I start prototyping?

Alma's Starter plan ($14/mo) is the entry tier and includes the persistent memory layer. Sign up at alma.olivares.ai, generate an API key in Settings, and clone the SDK starter from the developers page. Wire the three-phase loop in your agent code, point it at a single test project, and run it for a week. The store will populate naturally from the conversations; you'll see decisions, stakeholders and risks accumulate without manual data entry. From there it's just cranking up the categories and tuning the assembler.

Related reading: Persistent Memory for AI: Complete 2026 Guide · How to Give AI Persistent Memory · Three-Layer Memory Architecture · Context Assembly Documentation · Environments.
