April 2026 · 10 min read · Fran Olivares, Founder of OlivaresAI
Add @olivaresai/alma-mcp to your client config in five minutes with no code; use the JavaScript SDK to fetch context before LLM calls and extract memories after; or call the REST API directly from any language. All three connect to the same Alma memory layer.
Each AI tool keeps its own shallow memory of you, and none of them share it. Switch products and you start from zero: the new assistant knows nothing about your name, your project, or your preferences. That siloing is the fundamental limitation, and it is the single biggest reason AI feels like a tool instead of a collaborator. This guide walks you through three concrete approaches to solving it, from zero-code setup to full API integration.
When you use ChatGPT, Claude, or any AI chat, each one keeps its own walled-off memory, and it tends to be shallow. You explain the same things over and over: your tech stack, your coding style, your project architecture, your preferences. This wastes time and produces worse results because the AI never builds a deep understanding of who you are or what you are working on.
Platform-native memory features (ChatGPT Memory, Claude Projects) help, but they are limited in capacity, locked to a single platform, and offer no developer API. If you are building an AI-powered product, you need an independent memory layer.
The Model Context Protocol (MCP) is the fastest path. If your AI runs in Claude Desktop, Cursor, Windsurf, Claude Code, or any MCP-compatible client, you can add persistent memory in under 5 minutes.
Step 1: Sign up at alma.olivares.ai and generate an API key in Settings.
Step 2: Add @olivaresai/alma-mcp to your MCP client config with your API key. For Claude Desktop, edit claude_desktop_config.json. For Cursor, use the MCP settings panel.
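For Claude Desktop, the config file holds a top-level mcpServers map, where each entry names a command to launch plus its args and env. The TypeScript sketch below prints such an entry for Alma. Two details are assumptions, not taken from Alma's docs: launching the server through npx, and passing the key in an environment variable called ALMA_API_KEY. Check the package README for the exact names.

```typescript
// Shape of one MCP server entry in claude_desktop_config.json.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Builds a config containing a single Alma entry.
// ASSUMPTION: the server is started with `npx -y @olivaresai/alma-mcp`
// and reads its key from ALMA_API_KEY (hypothetical variable name).
function almaMcpConfig(apiKey: string): McpClientConfig {
  return {
    mcpServers: {
      alma: {
        command: "npx",
        args: ["-y", "@olivaresai/alma-mcp"],
        env: { ALMA_API_KEY: apiKey },
      },
    },
  };
}

// Prints JSON you can merge into claude_desktop_config.json.
console.log(JSON.stringify(almaMcpConfig("YOUR_API_KEY"), null, 2));
```

Merge the printed alma entry into any mcpServers you already have rather than overwriting the whole file.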
Step 3: Restart your client. The server exposes 35 tools: alma_remember (save a memory), alma_recall (search memories), alma_assemble (build context from all memory layers), alma_extract (extract facts from text), and more. Your AI can now read from and write to a persistent memory store that survives across every conversation.
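MCP runs over JSON-RPC 2.0, and a client invokes a server tool by sending a tools/call request with the tool's name and an arguments object. As an illustration, the sketch below builds one for alma_recall. The argument names query and limit are hypothetical: the server advertises each tool's real input schema in its tools/list response.

```typescript
// Minimal JSON-RPC 2.0 request shape for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Builds a `tools/call` request for the named tool with a fresh request id.
function toolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// HYPOTHETICAL arguments; the real schema comes from the server's tools/list.
const recall = toolCall("alma_recall", { query: "preferred test framework", limit: 5 });
console.log(JSON.stringify(recall));
```

In Claude Desktop or Cursor you never write these messages yourself; the client sends them whenever the model decides to use a tool.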
MCP is ideal for personal workflows — Claude Desktop for general AI work, Cursor for coding, Claude Code for terminal-based development. One memory, everywhere.
The JavaScript SDK (@olivaresai/alma-sdk) gives you full programmatic control for custom applications. The core integration pattern has three steps:
Step 1: Before each LLM call, call client.context.assemble({ query }) to get a system prompt enriched with relevant memories, episodes, procedures, and soul blocks.
Step 2: Send that system prompt, together with the user's message, to your LLM.
Step 3: After the response, call client.memories.extract({ text }) to save new facts from the conversation, or create memories directly with client.memories.create().
The SDK wraps all 140+ API endpoints with full TypeScript types. Install it with npm install @olivaresai/alma-sdk; the package is ESM-only and requires Node.js 18+.
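Here is a minimal sketch of that pattern. To avoid guessing at the SDK's constructor and return types, the function accepts any object exposing context.assemble and memories.extract. The method names come from the SDK as described above; the { systemPrompt } return shape and the LlmCall type are assumptions for illustration.

```typescript
// HYPOTHETICAL shapes: only the method names come from the article.
// Use the SDK's exported types for the real signatures.
interface AlmaLikeClient {
  context: { assemble(req: { query: string }): Promise<{ systemPrompt: string }> };
  memories: { extract(req: { text: string }): Promise<unknown> };
}

// Any chat-completion function: system prompt + user message in, reply out.
type LlmCall = (system: string, user: string) => Promise<string>;

async function chatWithMemory(
  client: AlmaLikeClient,
  llm: LlmCall,
  userMessage: string,
): Promise<string> {
  // Step 1: fetch relevant context before the LLM call.
  const { systemPrompt } = await client.context.assemble({ query: userMessage });

  // Step 2: call the LLM with the enriched system prompt.
  const reply = await llm(systemPrompt, userMessage);

  // Step 3: extract new facts from the exchange after the call.
  await client.memories.extract({ text: `User: ${userMessage}\nAssistant: ${reply}` });

  return reply;
}
```

In a real app you would pass the client created from @olivaresai/alma-sdk. The sketch awaits extraction before returning for simplicity; firing it off without awaiting would keep it out of the user-facing response time.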
The REST API provides direct HTTP access from any language or platform. Key endpoints:
POST /api/v1/context/assemble (build a context prompt from memories, episodes, procedures, and soul blocks)
POST /api/v1/memories (create a memory with content, category, importance, and confidence)
GET /api/v1/memories/search?q=query&mode=hybrid (hybrid semantic + keyword search)
POST /api/v1/memories/extract (LLM-powered extraction of facts from text)
POST /api/v1/blocks (configure Soul Engine blocks for AI identity)
Authentication is via API key (X-API-Key header). Base URL: https://alma.olivares.ai/api/v1.
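The same flow works over plain HTTP. The sketch below wraps three of those endpoints using fetch, which is built into Node.js 18+. The class and method names are hypothetical, the request bodies simply reuse the field names listed above, and responses come back as unknown because their shapes are not documented here. The fetch function is injected so the wrapper can be tested without a network.

```typescript
const BASE_URL = "https://alma.olivares.ai/api/v1";

// The subset of fetch this wrapper needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// HYPOTHETICAL wrapper; request bodies mirror the fields named in the article.
class AlmaRest {
  private readonly apiKey: string;
  private readonly fetchImpl: FetchLike;

  constructor(apiKey: string, fetchImpl: FetchLike) {
    this.apiKey = apiKey;
    this.fetchImpl = fetchImpl;
  }

  private async request(method: string, path: string, body?: unknown): Promise<unknown> {
    const res = await this.fetchImpl(`${BASE_URL}${path}`, {
      method,
      headers: { "X-API-Key": this.apiKey, "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`Alma API error: HTTP ${res.status}`);
    return res.json();
  }

  assembleContext(query: string): Promise<unknown> {
    return this.request("POST", "/context/assemble", { query });
  }

  createMemory(memory: {
    content: string;
    category?: string;
    importance?: number;
    confidence?: number;
  }): Promise<unknown> {
    return this.request("POST", "/memories", memory);
  }

  search(q: string): Promise<unknown> {
    const params = new URLSearchParams({ q, mode: "hybrid" });
    return this.request("GET", `/memories/search?${params.toString()}`);
  }
}
```

Usage: construct it with new AlmaRest("YOUR_API_KEY", fetch), then await alma.search("preferred test framework").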
Alma's three-layer architecture separates knowledge into three types: memories, episodes, and procedures.
When you start a conversation, context assembly searches all three layers using hybrid search, scores results by relevance (50%), importance (15%), confidence (15%), recency (10%), and frequency (10%), then injects the top-ranked context into the system prompt — all in under 100ms.
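The ranking step is a weighted sum. The sketch below applies the published weights, which add up to 1, assuming each signal has already been normalized to the range 0 to 1. How Alma actually computes recency or frequency is not described here, so treat those inputs as placeholders.

```typescript
// One retrieved item; every signal is ASSUMED to be normalized to [0, 1].
interface Candidate {
  id: string;
  relevance: number;
  importance: number;
  confidence: number;
  recency: number;
  frequency: number;
}

// Weights from the article: 50% / 15% / 15% / 10% / 10%.
const WEIGHTS = { relevance: 0.5, importance: 0.15, confidence: 0.15, recency: 0.1, frequency: 0.1 };

function score(c: Candidate): number {
  return (
    WEIGHTS.relevance * c.relevance +
    WEIGHTS.importance * c.importance +
    WEIGHTS.confidence * c.confidence +
    WEIGHTS.recency * c.recency +
    WEIGHTS.frequency * c.frequency
  );
}

// Returns the k highest-scoring candidates, best first, without mutating the input.
function topK(candidates: Candidate[], k: number): Candidate[] {
  return [...candidates].sort((a, b) => score(b) - score(a)).slice(0, k);
}
```

Because relevance carries only half the weight, a strongly relevant memory can lose to a moderately relevant one that is important, confident, recent, and frequently used: a memory with relevance 0.9 and 0.2 on every other signal scores 0.55, while one with relevance 0.4 and 1.0 elsewhere scores 0.7.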
Memories are automatically extracted from conversations every 4 messages. The extractor identifies 0-30 facts per conversation using Claude Haiku. Duplicates are detected via Jaccard similarity (60% threshold) and merged. Stale memories with low importance expire after 120 days of inactivity.
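These housekeeping rules are easy to express in code. The sketch below computes word-level Jaccard similarity (shared distinct words divided by all distinct words) and applies the 60% duplicate threshold, the every-4-messages extraction cadence, and the 120-day expiry. Alma's tokenization, whether the threshold is inclusive, the 0 to 1 importance scale, and the cutoff for "low importance" are not specified, so those choices are assumptions; LOW_IMPORTANCE in particular is a made-up value.

```typescript
const DUPLICATE_THRESHOLD = 0.6; // 60%, per the article (inclusive: an assumption)
const EXTRACT_EVERY = 4; // messages, per the article
const EXPIRY_DAYS = 120; // per the article
const LOW_IMPORTANCE = 0.3; // HYPOTHETICAL cutoff on an assumed 0..1 scale
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Lowercase word set; a simple tokenizer chosen for illustration (ASSUMPTION).
function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: string, b: string): number {
  const setA = tokens(a);
  const setB = tokens(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts count as identical
  let shared = 0;
  setA.forEach((t) => {
    if (setB.has(t)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

function isDuplicate(a: string, b: string): boolean {
  return jaccard(a, b) >= DUPLICATE_THRESHOLD;
}

// True after every 4th message in a conversation.
function shouldExtract(messageCount: number): boolean {
  return messageCount > 0 && messageCount % EXTRACT_EVERY === 0;
}

// Low-importance memories untouched for EXPIRY_DAYS or more are expired.
function isExpired(importance: number, lastActive: Date, now: Date): boolean {
  const idleDays = (now.getTime() - lastActive.getTime()) / MS_PER_DAY;
  return importance < LOW_IMPORTANCE && idleDays >= EXPIRY_DAYS;
}
```

For example, "User prefers TypeScript" and "The user prefers TypeScript" share 3 of 4 distinct words, a similarity of 0.75, so they would be merged.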
Memory alone gives your AI facts. The Soul Engine gives it identity. Configure structured blocks — personality, expertise, communication style, rules, and context — that persist across every conversation. Unlike a single system prompt that gets diluted in long conversations, Soul Engine blocks are versioned, organized, and always injected with priority.
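One way to picture versioned, prioritized blocks: keep only the newest version of each block kind and place those blocks ahead of retrieved memories in the system prompt, so a long context cannot crowd them out. The block schema, the snake_case kind names, and the fixed ordering below are hypothetical; the article lists the block types but not how they are stored or rendered.

```typescript
// HYPOTHETICAL block schema; only the five kinds come from the article.
type BlockKind = "personality" | "expertise" | "communication_style" | "rules" | "context";

interface SoulBlock {
  kind: BlockKind;
  version: number;
  content: string;
}

// Fixed rendering order (an assumption for illustration).
const ORDER: BlockKind[] = ["personality", "expertise", "communication_style", "rules", "context"];

// Keeps only the highest version of each block kind.
function latestByKind(blocks: SoulBlock[]): Map<BlockKind, SoulBlock> {
  const byKind = new Map<BlockKind, SoulBlock>();
  for (const b of blocks) {
    const current = byKind.get(b.kind);
    if (!current || b.version > current.version) byKind.set(b.kind, b);
  }
  return byKind;
}

// Soul blocks come first; retrieved memory context is appended after them.
function buildSystemPrompt(blocks: SoulBlock[], retrievedContext: string): string {
  const byKind = latestByKind(blocks);
  const sections: string[] = [];
  for (const kind of ORDER) {
    const block = byKind.get(kind);
    if (block) sections.push(`[${kind}]\n${block.content}`);
  }
  sections.push(retrievedContext);
  return sections.join("\n\n");
}
```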
Environments let you isolate memory contexts. Keep work, personal, and client-specific memories completely separate. Each environment has its own memories, episodes, procedures, and soul blocks. The AI switches personality and knowledge when you switch environments.
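Conceptually, an environment acts as a partition key: every read and write is scoped to one environment, so nothing leaks across. The in-memory class below is a hypothetical illustration of that isolation, not Alma's storage layer, and it models only memories; episodes, procedures, and soul blocks would be partitioned the same way.

```typescript
// HYPOTHETICAL per-environment data; real environments also hold
// episodes, procedures, and soul blocks.
interface EnvironmentData {
  memories: string[];
}

class EnvironmentStore {
  private readonly envs = new Map<string, EnvironmentData>();

  // Returns the data for an environment, creating it on first use.
  private env(name: string): EnvironmentData {
    let data = this.envs.get(name);
    if (!data) {
      data = { memories: [] };
      this.envs.set(name, data);
    }
    return data;
  }

  remember(envName: string, memory: string): void {
    this.env(envName).memories.push(memory);
  }

  // Case-insensitive substring search that only sees the given environment.
  recall(envName: string, term: string): string[] {
    const needle = term.toLowerCase();
    return this.env(envName).memories.filter((m) => m.toLowerCase().includes(needle));
  }
}
```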
Get started at alma.olivares.ai. The Starter plan ($14/mo) includes unlimited memories on a $2 weekly AI budget, 1 environment, and full chat access. All integration methods — MCP, SDK, API — work on every plan.
For more depth: AI Memory Management: Complete Guide 2026 · Building AI Assistants That Remember Everything · Persistent Memory vs RAG