Hindsight: LoCoMo Benchmark

Long-conversation memory evaluation

Overall

Accuracy:       89.61% (1380 of 1540 questions answered correctly)
Conversations:  10

By Category

Category      Accuracy   Correct / Total
Multi-hop     86.2%      243 / 282
Single-hop    83.8%      269 / 321
Temporal      70.8%       68 / 96
Open-domain   95.1%      800 / 841
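The overall figure matches the pooled (micro-averaged) accuracy: the category totals sum to 1380 correct out of 1540, so every question carries equal weight and the large Open-domain category pulls the result up. An unweighted mean of the four category accuracies (macro average) would come out lower, at about 83.98%. The minimal Python sketch below reproduces both calculations from the counts in the table. The function names are hypothetical and are not part of Hindsight or the LoCoMo tooling.

```python
# Per-category results as reported above: (correct, total)
RESULTS = {
    "Multi-hop": (243, 282),
    "Single-hop": (269, 321),
    "Temporal": (68, 96),
    "Open-domain": (800, 841),
}


def category_accuracy(results):
    """Accuracy per category, as a fraction in [0, 1]."""
    return {name: correct / total for name, (correct, total) in results.items()}


def micro_accuracy(results):
    """Pooled accuracy: every question is weighted equally."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


def macro_accuracy(results):
    """Unweighted mean of the per-category accuracies."""
    accs = category_accuracy(results)
    return sum(accs.values()) / len(accs)


if __name__ == "__main__":
    for name, acc in category_accuracy(RESULTS).items():
        print(f"{name:<12} {acc:.1%}")
    print(f"Overall (micro) {micro_accuracy(RESULTS):.2%}")  # 89.61%
    print(f"Macro average   {macro_accuracy(RESULTS):.2%}")  # 83.98%
```

Which average to report is a design choice: the micro average reflects performance on the question mix as sampled, while the macro average treats each reasoning type as equally important.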

Conversations (10)