Backboard Achieves Highest Score Ever on LoCoMo (90.1%)

New stateful memory architecture. Standard protocol. Fully reproducible.

A funny thing happened while we were baselining our new AI memory architecture against the industry-standard LoCoMo benchmark: we broke the record. And we did it with no gaming and no adjustments, just a standard, reproducible run.

Backboard scored 90.1 percent overall accuracy using the standard task set, with GPT-4.1 as the LLM judge. Full results, category breakdowns, and latency figures are available below, along with a one-click script and an API so anyone can replicate the run. The memory architecture is now live in our API, so anyone can plug in and start testing.

Full result set and replication scripts: https://github.com/Backboard-io/Backboard-Locomo-Benchmark

About the Benchmark

LoCoMo was designed to test memory across many sessions, long dialogues, and time-dependent questions. It is widely used to evaluate whether systems truly remember and reason over long horizons (snap-research.github.io, arxiv.org).

How we compare

Recent public writeups place leading memory libraries around 67 to 69 percent on LoCoMo, and a simple Letta filesystem baseline around 74 percent. Backboard’s 90.1 percent suggests a material step forward for long-term conversational memory. We will maintain a live comparison table on our results page.
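To put the gap in concrete terms, the short sketch below works through the arithmetic on the figures quoted above. The baseline values are the approximate numbers from the public writeups ("around 67 to 69" and "around 74"), so the outputs are rough estimates rather than exact results, and the `compare` helper is only an illustrative name.

```python
# Scores quoted in the paragraph above (percent accuracy on LoCoMo).
# Baseline values are approximate, as reported in public writeups.
BACKBOARD = 90.1
BASELINES = {
    "memory libraries (low end)": 67.0,
    "memory libraries (high end)": 69.0,
    "Letta filesystem baseline": 74.0,
}


def compare(score, baseline):
    """Return (percentage-point gain, relative error-rate reduction in percent)."""
    gain = score - baseline
    # Error rate is 100 - accuracy; compare how much of it was eliminated.
    error_reduction = 100.0 * (1.0 - (100.0 - score) / (100.0 - baseline))
    return round(gain, 1), round(error_reduction, 1)


for name, baseline in BASELINES.items():
    gain, reduction = compare(BACKBOARD, baseline)
    print(f"{name}: +{gain} points, {reduction}% fewer errors")
```

Against the 74 percent baseline this works out to a gain of about 16 percentage points, or roughly 62 percent fewer wrong answers.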

[Image: CleanShot 2025-11-13 at 11.05.51@2x.png]

Best in Class in Every Measure

[Image: CleanShot 2025-11-13 at 11.06.28@2x.png]

Reproducibility and transparency

  • Same dataset and task set as LoCoMo
  • GPT-4.1 LLM judge with fixed prompts and seed
  • Logs, prompts, and verdicts published for every question

Run it yourself in minutes using our public script or by calling the evaluation API.
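Because a verdict is published for every question, the headline number can in principle be recomputed from the logs. The sketch below shows one way to do that. It assumes a hypothetical record format (`question_id`, `category`, `verdict`) and illustrative category names; the published logs may use a different schema, so treat this as an outline rather than the repository's actual tooling.

```python
from collections import defaultdict

# Hypothetical verdict records: one per benchmark question, carrying the
# judge's CORRECT/WRONG label. Field and category names are assumptions.
verdicts = [
    {"question_id": "q1", "category": "single_hop", "verdict": "CORRECT"},
    {"question_id": "q2", "category": "single_hop", "verdict": "WRONG"},
    {"question_id": "q3", "category": "temporal", "verdict": "CORRECT"},
    {"question_id": "q4", "category": "multi_hop", "verdict": "CORRECT"},
]


def accuracy_report(records):
    """Return (overall accuracy, per-category accuracy), both in percent."""
    if not records:
        raise ValueError("no verdict records to score")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        if rec["verdict"] == "CORRECT":
            correct[rec["category"]] += 1
    per_category = {c: 100.0 * correct[c] / totals[c] for c in totals}
    # Overall accuracy is weighted by question count, not averaged per category.
    overall = 100.0 * sum(correct.values()) / sum(totals.values())
    return overall, per_category


overall, per_category = accuracy_report(verdicts)
print(f"overall: {overall:.1f}%")
for category, acc in sorted(per_category.items()):
    print(f"{category}: {acc:.1f}%")
```

Here the overall figure is weighted by question count, so categories with more questions contribute more. Averaging the per-category scores instead would generally give a different number, which is worth checking when comparing results across writeups.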

If memory is the foundation of intelligence, transparency must be the foundation of benchmarks.

Get started

Build with Backboard today. Signing up takes under a minute.

References

LoCoMo benchmark overview and paper: snap-research.github.io, arxiv.org

About Backboard.io

THE PRIMITIVE FOR THE MODERN AI STACK

Backboard.io is the foundational infrastructure layer for production AI — the primitive your entire stack is built on. While others bolt memory on as an afterthought, we built everything on top of the world's best AI memory, so every part of your stack is stateful, context-aware, and ready to scale from day one.

One integration gives you persistent memory, model routing across 17,000+ LLMs, multi-agent coordination, RAG workflows, long-term context retrieval, and tool calls — without stitching together a dozen fragile services. Bring your own API key and your entire existing stack becomes instantly stateful. No vendor lock-in. No rearchitecting. Just a single, composable primitive that grows with you from weekend prototype to enterprise deployment.

Ranked #1 on LoCoMo and LongMemEval, Backboard.io is the only provider capable of running benchmarks at the message level. Our scores aren't gamed: they follow the protocol prescribed in the academic literature and are independently verified by a third party, so you can trust the foundation you're building on.

The best AI applications of the next decade won't be remembered for their models. They'll be remembered for their architecture. Build yours on Backboard.io.


