A context window is the working memory of one conversation: everything the model can "see" right now. It is large but temporary, and it resets when the conversation ends. Persistent memory sits outside the conversation. It captures the durable facts worth keeping and re-injects only the relevant ones into each new session, so the model behaves as if it remembers you without re-reading the entire history every time.
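The capture-and-re-inject idea can be illustrated with a minimal sketch. Everything named here (`MemoryStore`, `remember`, `relevant`, `start_session`) is hypothetical and is not Alma's API, and the word-overlap score is a deliberately crude stand-in for whatever relevance measure a real system would use.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MemoryStore:
    """Hypothetical persistent store; it outlives any single conversation."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        """Capture a durable fact worth keeping."""
        self.facts.append(fact)

    def relevant(self, query: str, limit: int = 2) -> list[str]:
        """Return up to `limit` facts sharing at least one word with the query."""
        q = tokens(query)
        scored = [(len(q & tokens(fact)), fact) for fact in self.facts]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]


def start_session(store: MemoryStore, first_message: str) -> list[str]:
    """Build the opening context for a new conversation.

    Only the relevant facts are injected, not the entire history.
    """
    lines = [f"Known about the user: {fact}" for fact in store.relevant(first_message)]
    lines.append(f"User: {first_message}")
    return lines


store = MemoryStore()
store.remember("User writes TypeScript, not Java")
store.remember("User prefers concise answers")
store.remember("Team chose Postgres over MySQL as the database")

# A brand-new session: the context window starts empty, the store does not.
context = start_session(store, "Which database fits the new TypeScript service?")
for line in context:
    print(line)
```

In this sketch the fact about concise answers shares no words with the question, so it is left out of the new session's context, while the two related facts are carried in.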
Retrieval-augmented generation (RAG) retrieves passages from documents you supply — a knowledge base, a set of files — and grounds answers in them. Persistent memory is about you: it captures and structures what you tell the AI over time (preferences, decisions, ongoing projects) rather than indexing a document corpus. The two are complementary, and many systems use both.
Persistent memory stores facts ("I use TypeScript, not Java"), preferences ("answer concisely"), decisions ("we chose Postgres over MySQL") and recurring patterns. In Alma this is organised into three layers: memories (discrete facts), episodes (conversation summaries) and procedures (learned workflows). Each layer is scored and retrieved by relevance so the right context surfaces at the right time.
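As a rough illustration of the three layers, the sketch below models each stored item as a record tagged with its layer and picks the best-scoring record per layer. The `Layer`, `Record`, `relevance` and `retrieve` names and the scoring formula are assumptions made for this example; Alma's actual schema and scoring are not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    MEMORY = "memory"        # discrete fact
    EPISODE = "episode"      # conversation summary
    PROCEDURE = "procedure"  # learned workflow


@dataclass(frozen=True)
class Record:
    layer: Layer
    text: str


def relevance(query: str, record: Record) -> float:
    """Illustrative score: fraction of query words that appear in the record."""
    q = set(query.lower().split())
    r = set(record.text.lower().split())
    return len(q & r) / len(q) if q else 0.0


def retrieve(
    query: str, records: list[Record], per_layer: int = 1
) -> dict[Layer, list[Record]]:
    """Return the best-scoring records in each layer, skipping zero scores."""
    result: dict[Layer, list[Record]] = {}
    for layer in Layer:
        ranked = sorted(
            (r for r in records if r.layer is layer and relevance(query, r) > 0),
            key=lambda r: relevance(query, r),
            reverse=True,
        )
        result[layer] = ranked[:per_layer]
    return result


records = [
    Record(Layer.MEMORY, "uses typescript not java"),
    Record(Layer.MEMORY, "prefers concise answers"),
    Record(Layer.EPISODE, "discussed database choice and picked postgres over mysql"),
    Record(Layer.PROCEDURE, "deploy workflow: run tests then push the release tag"),
]

hits = retrieve("set up postgres database migrations", records)
for layer, found in hits.items():
    print(layer.value, [r.text for r in found])
```

Here only the episode about the database decision overlaps with the query, so it is the single record surfaced; the unrelated memories and procedure stay out of the context.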
Alma is a persistent memory layer wrapped in a full workspace: you chat, it remembers, and the same memory is reachable from Claude Desktop, Cursor and VS Code over the Model Context Protocol (MCP). You can export it at any time. It is the memory your AI was missing, and it is not locked inside one provider.