Microsoft Research Releases Memora, Reducing Token Consumption by 98%

2026-07-01 15:00


en.Wedoany.com Reported - Microsoft Research has developed a long-term memory system called Memora, designed to provide AI agents with more scalable and reliable memory capabilities by decoupling memory content from retrieval methods.


AI agents increasingly need to maintain context over weeks or months rather than within a single session, and traditional memory methods tend to fragment information and slow retrieval. Microsoft Research states that Memora can reduce context token usage by up to 98% while matching or exceeding full-context accuracy.

Memory is a bottleneck for long-running AI deployments. Large language models start each session from scratch, so a long conversation requires re-reading the entire history. New information is typically stored as raw text or summaries, and key details can be lost.

Existing solutions each have limitations. The Mem0 system extracts atomic facts from conversations; retrieval-augmented generation (RAG) methods index text fragments; and graph-based memory systems (such as Zep and GraphRAG) build structures through entity relationships. Each approach involves a trade-off: content-fragmenting systems (such as RAG and Mem0) retain details but lose narrative coherence; coarse-grained abstraction systems compress experiences but lose constraints and numerical details; and graph-based systems require strict ontologies, with retrieval still dependent on the content itself.

The Memora architecture addresses these issues by decoupling stored content from retrieval methods. Each memory entry consists of two parts: the primary abstraction is a phrase of 6 to 8 words that captures the basic content of the memory, while the memory value contains the rich content itself. New information on the same topic is merged into existing memory entries to avoid fragmentation. Additionally, the system introduces cue anchors—short, context-aware tags extracted from each memory value—providing alternative access paths to the same memory.
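The entry structure described above can be illustrated with a minimal sketch. This is not Memora's code: the names `MemoryEntry`, `MemoryStore`, `write` and `by_cue` are hypothetical, and merging by an exact match on the abstraction is a simplifying assumption (the article does not say how Memora decides that two pieces of information share a topic).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    abstraction: str                      # short phrase used as the lookup handle
    value: str                            # the rich memory content itself
    cue_anchors: set[str] = field(default_factory=set)  # alternative access paths


class MemoryStore:
    """Toy store: one entry per topic, so new details merge instead of fragmenting."""

    def __init__(self) -> None:
        self.entries: dict[str, MemoryEntry] = {}

    def write(self, abstraction: str, detail: str, cues: list[str]) -> MemoryEntry:
        # Assumption: "same topic" means the same normalized abstraction string.
        key = abstraction.lower().strip()
        entry = self.entries.get(key)
        if entry is None:
            entry = MemoryEntry(abstraction, detail, set(cues))
            self.entries[key] = entry
        else:
            entry.value += "\n" + detail  # merge into the existing entry
            entry.cue_anchors.update(cues)
        return entry

    def by_cue(self, cue: str) -> list[MemoryEntry]:
        # Reach a memory through a cue anchor rather than through its content.
        return [e for e in self.entries.values() if cue in e.cue_anchors]


store = MemoryStore()
store.write("User plans a spring trip to Lisbon", "Flying out in May.", ["lisbon", "may"])
store.write("User plans a spring trip to Lisbon", "Budget is 1,500 euros.", ["budget"])
print(len(store.entries))                 # 1
print(store.by_cue("budget")[0].value)
```

The second write lands in the same entry as the first, and the entry is then reachable through any of its three cue anchors.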

Memora also includes a strategy-guided retriever. Instead of returning the top k similar items in a single pass, it iteratively refines queries through cue anchors, surfaces memories that are relevant but not textually similar to the query, and decides when to stop. Sanchit Vir Gogia, Chief Analyst at Greyhound Research, stated that Memora rejects the shortcut of equating retrieval with memory: it separates the rich details of a memory from its lookup handles, making retrieval a navigational act.
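The difference between one-shot top-k lookup and iterative retrieval can be shown with a toy loop. This is only a sketch under stated assumptions: the function names, the dictionary layout and the word-overlap score are all hypothetical stand-ins, and the article does not describe how Memora's actual retriever scores memories or chooses its stopping point.

```python
def tokenize(text):
    return set(text.lower().split())


def handle_terms(memory):
    # Words from the abstraction plus all cue anchors form the lookup handle.
    terms = tokenize(memory["abstraction"])
    for cue in memory["cues"]:
        terms |= tokenize(cue)
    return terms


def iterative_retrieve(query, memories, max_steps=4, min_overlap=1):
    """Pick one memory per step, then expand the query with its cue anchors."""
    terms = tokenize(query)
    selected = []
    for _ in range(max_steps):
        best, best_score = None, min_overlap - 1
        for i, memory in enumerate(memories):
            if i in selected:
                continue
            score = len(terms & handle_terms(memory))  # toy similarity: word overlap
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break                                      # nothing relevant left: stop
        selected.append(best)
        for cue in memories[best]["cues"]:
            terms |= tokenize(cue)                     # refine the query via cue anchors
    return [memories[i]["abstraction"] for i in selected]


memories = [
    {"abstraction": "Trip to Lisbon planned for May", "cues": ["lisbon", "may", "hotel alfama"]},
    {"abstraction": "Hotel Alfama booking details", "cues": ["hotel alfama", "booking"]},
    {"abstraction": "Favorite pasta recipe", "cues": ["pasta", "recipe"]},
]
print(iterative_retrieve("what did we plan for lisbon", memories))
```

In this example the hotel booking shares no words with the query, so a single similarity pass would miss it. It is reached only in the second step, after the cue anchors of the Lisbon memory have been added to the query, and the loop stops once no remaining memory overlaps.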

Microsoft evaluated Memora on two benchmarks: LoCoMo (averaging 600 dialogue turns) and LongMemEval (using 115,000 tokens of context). Test results show that Memora achieves an LLM-judged accuracy of 86.3% on LoCoMo and 87.4% on LongMemEval, outperforming RAG, Mem0, Nemori, Zep, LangMem, and full-context reasoning. Memora stores about half as many memory entries per conversation as Mem0 (344 versus 651), while reducing token consumption by up to 98% compared to full-context reasoning.
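To put the reported figures in proportion, the short calculation below uses only numbers quoted above. One assumption is labeled in the code: the article does not say which benchmark the "up to 98%" figure was measured on, so applying it to the 115,000-token LongMemEval context is purely illustrative.

```python
# Figures quoted in the article.
FULL_CONTEXT_TOKENS = 115_000   # LongMemEval context size
MAX_REDUCTION_PERCENT = 98      # "up to 98%", so a best case, not a typical one
MEMORA_ENTRIES = 344            # memory entries per conversation
MEM0_ENTRIES = 651

# Assumption: the best-case reduction applies to the LongMemEval context.
remaining_tokens = FULL_CONTEXT_TOKENS * (100 - MAX_REDUCTION_PERCENT) // 100
entry_ratio = MEMORA_ENTRIES / MEM0_ENTRIES

print(remaining_tokens)         # 2300
print(round(entry_ratio, 2))    # 0.53
```

Under that assumption, a best-case query would read roughly 2,300 tokens instead of 115,000, and Memora keeps about 53% as many entries as Mem0.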

Gogia noted that lower token consumption does not directly equate to lower infrastructure costs. The context reduction seen in benchmarks does not mean enterprise bills will drop by 98%; actual costs also include memory construction, indexing, storage, and audit logs. Memora's strongest mode, strategy-guided retrieval, takes about five to six seconds per query, while the simpler semantic mode takes less than one second, so the token savings are partially offset by retrieval latency and additional reasoning.

Memora is currently an active project at Microsoft Research, with related research code publicly available on GitHub. Gogia advises IT leaders to view Memora as architectural research rather than production-ready software, urging caution until its code is fully verifiable, maintainable, and supportable. Additionally, enterprises need to establish governance and compliance policies to ensure secure management and auditability of AI memory, including defining who can write or read memories, how long memories persist, and how auditors can reconstruct memories, to meet regulatory requirements such as the EU Artificial Intelligence Act and India's Digital Personal Data Protection Act.


