Backboard Achieves Highest Score Ever on LoCoMo (90.1%)
Backboard scored 90.1 percent overall accuracy on LoCoMo using the standard task set, with GPT-4.1 as the LLM judge. Full results, per-category breakdowns, and latency figures are available below, along with a one-click script and API so anyone can replicate the run. This setup is now LIVE in our API, so you can plug in and start testing.