LoCoMo Benchmark
Long-conversation memory evaluation
Overall
Accuracy: 89.61% (1380 / 1540 correct)
Conversations: 10
By Category
Category       Accuracy   Correct / Total
Multi-hop      86.2%      243 / 282
Single-hop     83.8%      269 / 321
Temporal       70.8%       68 / 96
Open-domain    95.1%      800 / 841
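The overall figure can be cross-checked against the category rows. The minimal Python sketch below assumes the four listed categories are the only ones scored (their totals do add up to 1540). It recomputes each category's accuracy and shows that 89.61% is the pooled, per-question accuracy (1380 / 1540). It is not the unweighted mean of the four category percentages, which comes to about 83.98% because the small Temporal category counts as much as the large Open-domain one. The names CATEGORY_COUNTS, accuracy and summarize are hypothetical helpers for illustration only.

```python
# Hypothetical helper: (correct, total) counts copied from the category table.
CATEGORY_COUNTS = {
    "Multi-hop": (243, 282),
    "Single-hop": (269, 321),
    "Temporal": (68, 96),
    "Open-domain": (800, 841),
}


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


def summarize(counts):
    """Compute per-category, pooled (micro) and mean-of-categories (macro) accuracy."""
    per_category = {name: accuracy(c, t) for name, (c, t) in counts.items()}
    total_correct = sum(c for c, _ in counts.values())
    total_questions = sum(t for _, t in counts.values())
    # Pooled: every question weighs the same, regardless of its category.
    micro = accuracy(total_correct, total_questions)
    # Unweighted mean: every category weighs the same, regardless of its size.
    macro = sum(per_category.values()) / len(per_category)
    return per_category, total_correct, total_questions, micro, macro


if __name__ == "__main__":
    per_cat, correct, total, micro, macro = summarize(CATEGORY_COUNTS)
    for name, acc in per_cat.items():
        print(f"{name:<12} {acc:5.1f}%")
    print(f"Overall (pooled): {correct} / {total} = {micro:.2f}%")
    print(f"Mean of category accuracies: {macro:.2f}%")
```

Reporting the pooled number weights each category by how many questions it contains, so the 841 Open-domain questions dominate the headline accuracy.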
Conversations (10)