What is the easiest way to give AI persistent memory?

Install the Alma MCP server (@olivaresai/alma-mcp) into Claude Desktop, Cursor, Windsurf, or any other MCP-compatible client. Setup takes about five minutes and requires no code: you only add the package to your client's JSON config along with your Alma API key.

Do I need an API key to use Alma persistent memory?

Yes. Sign up at alma.olivares.ai, choose a paid plan and generate an API key in Settings. The same key is used by the MCP server, the JavaScript SDK, and the REST API.

Can I use Alma memory with my own custom AI app?

Yes — use the JavaScript SDK (@olivaresai/alma-sdk) for Node.js or call the REST API from any language. Both expose 140+ endpoints covering memory CRUD, semantic and keyword search, context assembly, soul blocks, and chat streaming.

Does Alma work with Claude, ChatGPT, Gemini, or other models?

Alma is model-agnostic when used via the SDK or REST API: context assembly returns a plain string that you can pass to any LLM (Anthropic, OpenAI, Google, or others). The web app at alma.olivares.ai uses Claude Haiku, Sonnet, and Opus directly.

How to Give AI Persistent Memory

April 2026 · 10 min read · Fran Olivares, Founder of OlivaresAI

There are three ways to give any AI persistent memory: install an MCP server like @olivaresai/alma-mcp into your client config in five minutes — no code; use the JavaScript SDK to fetch context before LLM calls and extract memories after; or call the REST API directly from any language. All three connect to the same Alma memory layer.

Each AI tool keeps its own shallow memory of you, and none of them share it. Switch products and you start from zero: the new assistant knows nothing about your name, your project, or your preferences. That siloing is the fundamental limitation, and it is the single biggest reason AI feels like a tool instead of a collaborator. This guide walks you through three concrete approaches to solving it, from zero-code setup to full API integration.

Why does AI memory not follow you between tools?

When you use ChatGPT, Claude, or any AI chat, each one keeps its own walled-off memory, and it tends to be shallow. You explain the same things over and over: your tech stack, your coding style, your project architecture, your preferences. This wastes time and produces worse results because the AI never builds a deep understanding of who you are or what you are working on.

Platform-native memory features (ChatGPT Memory, Claude Projects) help, but they are limited in capacity, locked to a single platform, and offer no developer API. If you are building an AI-powered product, you need an independent memory layer.

How do I add memory via the MCP server (no code)?

The Model Context Protocol (MCP) is the fastest path. If your AI runs in Claude Desktop, Cursor, Windsurf, Claude Code, or any other MCP-compatible client, you can add persistent memory in about five minutes.

Step 1: Sign up at alma.olivares.ai and generate an API key in Settings.

Step 2: Add @olivaresai/alma-mcp to your MCP client config with your API key. For Claude Desktop, edit claude_desktop_config.json. For Cursor, use the MCP settings panel.

Step 3: Restart your client. The server exposes 35 tools: alma_remember (save a memory), alma_recall (search memories), alma_assemble (build context from all memory layers), alma_extract (extract facts from text), and more. Your AI can now read from and write to a persistent memory store that survives across every conversation.
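To make Step 2 concrete, here is a small TypeScript sketch that prints a Claude Desktop config entry. Claude Desktop reads local MCP servers from an mcpServers object in claude_desktop_config.json, where each server has a command, args, and env. The npx invocation and the ALMA_API_KEY variable name are assumptions, so check the @olivaresai/alma-mcp README for the exact values.

```typescript
// Shape of a local MCP server entry in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface ClaudeDesktopConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: builds the config object for the Alma MCP server.
export function buildAlmaMcpConfig(apiKey: string): ClaudeDesktopConfig {
  return {
    mcpServers: {
      // "alma" is only a display name; any unique key works.
      alma: {
        command: "npx", // assumed launch method
        args: ["-y", "@olivaresai/alma-mcp"],
        env: { ALMA_API_KEY: apiKey }, // assumed variable name
      },
    },
  };
}

// Print JSON you can merge into claude_desktop_config.json.
console.log(JSON.stringify(buildAlmaMcpConfig("YOUR_ALMA_API_KEY"), null, 2));
```

Merge the printed object into your existing config instead of overwriting it, so any other MCP servers you already use stay registered.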

MCP is ideal for personal workflows — Claude Desktop for general AI work, Cursor for coding, Claude Code for terminal-based development. One memory, everywhere.

How do I integrate AI memory using the JavaScript SDK?

The JavaScript SDK (@olivaresai/alma-sdk) gives you full programmatic control for custom applications. The core integration pattern has three steps:

Before the LLM call: Call client.context.assemble({ query }) to get a system prompt enriched with relevant memories, episodes, procedures, and soul blocks.
Pass to any LLM: The assembled context is a plain string. Pass it as the system prompt to Anthropic, OpenAI, Gemini, or any model. Your memory layer is model-agnostic.
After the LLM call: Call client.memories.extract({ text }) to save new facts from the conversation. Or create memories directly with client.memories.create().

The SDK wraps all 140+ API endpoints with full TypeScript types. Install with npm install @olivaresai/alma-sdk. It is ESM-only and requires Node.js 18+.
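The three-step pattern above can be sketched as a single function. The interface below is hypothetical and only mirrors the method names this guide mentions (context.assemble and memories.extract). The real SDK's return types may differ; for example, assemble() might return an object rather than a bare string. The model call is injected, which keeps the memory layer model-agnostic.

```typescript
// Hypothetical client shape modeled on the calls named in this guide.
// Check the real @olivaresai/alma-sdk type definitions before relying on it.
interface AlmaLikeClient {
  context: { assemble(args: { query: string }): Promise<string> };
  memories: { extract(args: { text: string }): Promise<unknown> };
}

// Any LLM call that takes a system prompt and a user message and returns text.
type CallLlm = (system: string, user: string) => Promise<string>;

export async function chatWithMemory(
  client: AlmaLikeClient,
  callLlm: CallLlm,
  userMessage: string,
): Promise<string> {
  // 1. Before the LLM call: fetch memory-enriched context for this query.
  const system = await client.context.assemble({ query: userMessage });

  // 2. Pass it as the system prompt to whichever model you use.
  const reply = await callLlm(system, userMessage);

  // 3. After the LLM call: send the exchange back so new facts are extracted.
  await client.memories.extract({
    text: `User: ${userMessage}\nAssistant: ${reply}`,
  });
  return reply;
}
```

In practice, callLlm would wrap your provider's chat API (Anthropic, OpenAI, Gemini, or another), and client would be an SDK instance constructed as the SDK documentation describes.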

How do I add AI memory via REST API from any language?

The REST API provides direct HTTP access from any language or platform. Key endpoints:

POST /api/v1/context/assemble — Build a context prompt from memories, episodes, procedures, and soul blocks
POST /api/v1/memories — Create a memory with content, category, importance, and confidence
GET /api/v1/memories/search?q=query&mode=hybrid — Hybrid semantic + keyword search
POST /api/v1/memories/extract — LLM-powered extraction of facts from text
POST /api/v1/blocks — Configure Soul Engine blocks for AI identity

Authentication is via API key (X-API-Key header). Base URL: https://alma.olivares.ai/api/v1.
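As an illustration, here is a dependency-free sketch that builds (but does not send) requests for two of the endpoints listed above. The base URL, paths, query parameters, and X-API-Key header come from this guide. The JSON body shape { query } for context assembly is an assumption.

```typescript
const BASE_URL = "https://alma.olivares.ai/api/v1";

interface HttpRequestSpec {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// POST /context/assemble; the { query } body is an assumed shape.
export function assembleContextRequest(apiKey: string, query: string): HttpRequestSpec {
  return {
    url: `${BASE_URL}/context/assemble`,
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// GET /memories/search with hybrid semantic + keyword mode.
export function searchMemoriesRequest(apiKey: string, q: string): HttpRequestSpec {
  return {
    // Encode the query so spaces and symbols survive in the URL.
    url: `${BASE_URL}/memories/search?q=${encodeURIComponent(q)}&mode=hybrid`,
    method: "GET",
    headers: { "X-API-Key": apiKey },
  };
}

// To send one on Node.js 18+ (global fetch):
//   const { url, ...init } = assembleContextRequest(key, "my stack");
//   const res = await fetch(url, init);
```

The same method, URL, and header triple translates directly to curl, Python's requests, Go's net/http, or any other HTTP client.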

How does Alma's memory layer actually work?

Alma's architecture separates knowledge into three layers:

Memories — Discrete facts and preferences, semantically indexed with vector embeddings. Each has importance, confidence, category, and source metadata.
Episodes — Compressed conversation summaries. What was discussed, decided, and learned.
Procedures — Learned step-by-step workflows and behavioral patterns.

When you start a conversation, context assembly searches all three layers using hybrid search, scores results by relevance (50%), importance (15%), confidence (15%), recency (10%), and frequency (10%), then injects the top-ranked context into the system prompt — all in under 100ms.
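The ranking step above is a weighted sum. The sketch below uses the weights stated in this guide. It assumes every signal has already been normalized to the range 0 to 1, because the guide does not say how recency or frequency are normalized. Field names are illustrative.

```typescript
// Weights as stated in this guide (they sum to 1.0).
export const WEIGHTS = {
  relevance: 0.5,
  importance: 0.15,
  confidence: 0.15,
  recency: 0.1,
  frequency: 0.1,
} as const;

// Assumption: all signals are pre-normalized to [0, 1].
export interface Candidate {
  id: string;
  layer: "memory" | "episode" | "procedure";
  relevance: number;  // e.g. hybrid search score
  importance: number;
  confidence: number;
  recency: number;    // 1 = just used, 0 = very old (normalization assumed)
  frequency: number;  // 1 = most accessed (normalization assumed)
}

export function score(c: Candidate): number {
  return (
    WEIGHTS.relevance * c.relevance +
    WEIGHTS.importance * c.importance +
    WEIGHTS.confidence * c.confidence +
    WEIGHTS.recency * c.recency +
    WEIGHTS.frequency * c.frequency
  );
}

// Rank candidates from all three layers together and keep the top k.
export function topK(candidates: Candidate[], k: number): Candidate[] {
  return [...candidates].sort((a, b) => score(b) - score(a)).slice(0, k);
}
```

Consider a memory with relevance 0.8, importance 0.9, confidence 1.0, recency 0.5, and frequency 0.2. It scores 0.5×0.8 + 0.15×0.9 + 0.15×1.0 + 0.1×0.5 + 0.1×0.2 = 0.755. That outranks a perfectly relevant candidate whose other signals are all zero, which scores 0.5. Relevance dominates, but it is not the only factor.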

Memories are automatically extracted from conversations every 4 messages. The extractor identifies 0-30 facts per conversation using Claude Haiku. Duplicates are detected via Jaccard similarity (60% threshold) and merged. Stale memories with low importance expire after 120 days of inactivity.
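The deduplication and expiry rules above can be sketched as follows. The 60% Jaccard threshold and the 120-day window come from this guide. Several details are assumptions: word-level tokenization, treating the threshold as inclusive, the merge rule (keep the existing text and take the higher importance), and the 0.3 cutoff for "low importance".

```typescript
// Word-level tokenization is an assumption.
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
export function jaccard(a: string, b: string): number {
  const A = tokens(a);
  const B = tokens(b);
  if (A.size === 0 && B.size === 0) return 1;
  let inter = 0;
  A.forEach((t) => {
    if (B.has(t)) inter++;
  });
  return inter / (A.size + B.size - inter);
}

export const DUPLICATE_THRESHOLD = 0.6; // 60%, from the guide

export interface StoredMemory {
  content: string;
  importance: number; // 0..1 (scale assumed)
  lastAccessed: Date;
}

// Add a new fact, or merge it into the first near-duplicate (merge rule assumed).
export function addOrMerge(store: StoredMemory[], fact: StoredMemory): StoredMemory[] {
  const dup = store.find((m) => jaccard(m.content, fact.content) >= DUPLICATE_THRESHOLD);
  if (!dup) return [...store, fact];
  return store.map((m) =>
    m === dup
      ? { ...m, importance: Math.max(m.importance, fact.importance), lastAccessed: fact.lastAccessed }
      : m,
  );
}

const DAY_MS = 24 * 60 * 60 * 1000;
const LOW_IMPORTANCE = 0.3; // hypothetical cutoff

// Expired = low importance AND untouched for more than 120 days.
export function isExpired(m: StoredMemory, now: Date): boolean {
  const idleDays = (now.getTime() - m.lastAccessed.getTime()) / DAY_MS;
  return m.importance < LOW_IMPORTANCE && idleDays > 120;
}
```

For example, "User prefers TypeScript over JavaScript" and "User prefers TypeScript to JavaScript" share 4 of 6 distinct words. That gives a similarity of about 0.67, so the two are merged rather than stored twice.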

How do I give my AI a consistent identity?

Memory alone gives your AI facts. The Soul Engine gives it identity. Configure structured blocks — personality, expertise, communication style, rules, and context — that persist across every conversation. Unlike a single system prompt that gets diluted in long conversations, Soul Engine blocks are versioned, organized, and always injected with priority.
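To illustrate "versioned, organized, and always injected with priority", here is a hypothetical data model. The five block kinds come from this guide. The field names, the latest-version-wins rule, and the prompt layout are illustrative assumptions, not Alma's actual format.

```typescript
type BlockKind = "personality" | "expertise" | "communication_style" | "rules" | "context";

interface SoulBlock {
  kind: BlockKind;
  version: number;
  content: string;
}

// Fixed order so identity always renders the same way.
const ORDER: BlockKind[] = ["personality", "expertise", "communication_style", "rules", "context"];

// Keep only the newest version of each block kind.
export function latestBlocks(history: SoulBlock[]): SoulBlock[] {
  const latest = new Map<BlockKind, SoulBlock>();
  for (const b of history) {
    const cur = latest.get(b.kind);
    if (!cur || b.version > cur.version) latest.set(b.kind, b);
  }
  return ORDER.map((k) => latest.get(k)).filter((b): b is SoulBlock => b !== undefined);
}

// Identity blocks go first, ahead of any retrieved memory context.
export function buildSystemPrompt(history: SoulBlock[], memoryContext: string): string {
  const identity = latestBlocks(history)
    .map((b) => `## ${b.kind}\n${b.content}`)
    .join("\n\n");
  return `${identity}\n\n## memories\n${memoryContext}`;
}
```

Because each block is stored separately with a version number, you can edit the rules block without touching personality, and earlier versions remain available for comparison.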

How do I keep work and personal AI contexts separate?

Environments let you isolate memory contexts. Keep work, personal, and client-specific memories completely separate. Each environment has its own memories, episodes, procedures, and soul blocks. The AI switches personality and knowledge when you switch environments.
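The isolation described above amounts to keying every layer by environment. The class below is a hypothetical in-memory sketch, not the Alma API. It shows that a fact saved in one environment is never visible from another.

```typescript
// Every environment owns its own copy of each layer (names illustrative).
interface EnvironmentData {
  memories: string[];
  episodes: string[];
  procedures: string[];
  soulBlocks: string[];
}

export class EnvironmentStore {
  private envs = new Map<string, EnvironmentData>();

  // Create an empty environment on first use.
  private bucket(env: string): EnvironmentData {
    let data = this.envs.get(env);
    if (!data) {
      data = { memories: [], episodes: [], procedures: [], soulBlocks: [] };
      this.envs.set(env, data);
    }
    return data;
  }

  remember(env: string, fact: string): void {
    this.bucket(env).memories.push(fact);
  }

  // Only the active environment's memories are returned.
  recall(env: string): readonly string[] {
    return this.bucket(env).memories;
  }
}
```

Switching environments then means passing a different key. Episodes, procedures, and soul blocks would follow the same pattern, which is why personality changes along with knowledge.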

How do I start using Alma's persistent memory?

Get started at alma.olivares.ai. The Starter plan ($14/mo) includes unlimited memories on a $2 weekly AI budget, 1 environment, and full chat access. All integration methods — MCP, SDK, API — work on every plan.

For more depth: AI Memory Management: Complete Guide 2026 · Building AI Assistants That Remember Everything · Persistent Memory vs RAG
