Evaluations
Memory systems should be evaluated by the behavior they improve. Memory Layer includes a repeatable evaluation harness for testing whether memory changes agent outcomes, retrieval quality, cost, and latency.
What the eval harness protects against
- Overclaiming from a demo.
- Confusing retrieval success with autonomous coding success.
- Ignoring token and latency cost.
- Treating stale or wrong memories as harmless.
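One way to keep these failure modes visible is to report each metric separately for runs with and without memory, instead of collapsing them into a single score. The sketch below is illustrative only and is not Memory Layer's harness API: `RunResult`, `summarize`, and `compare` are hypothetical names, and the record fields are assumptions about what a run might log.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One agent run on one task (hypothetical record shape)."""
    task_id: str
    memory_enabled: bool
    task_passed: bool        # did the agent complete the task?
    retrieval_hit: bool      # was a relevant memory retrieved?
    stale_memory_used: bool  # was an outdated or wrong memory injected?
    tokens: int
    latency_s: float


def summarize(runs):
    """Aggregate each metric on its own so none can stand in for another."""
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run per arm")
    return {
        "runs": n,
        "task_pass_rate": sum(r.task_passed for r in runs) / n,
        "retrieval_hit_rate": sum(r.retrieval_hit for r in runs) / n,
        "stale_memory_rate": sum(r.stale_memory_used for r in runs) / n,
        "mean_tokens": sum(r.tokens for r in runs) / n,
        "mean_latency_s": sum(r.latency_s for r in runs) / n,
    }


def compare(runs):
    """Split runs into memory-off and memory-on arms and report the deltas."""
    baseline = summarize([r for r in runs if not r.memory_enabled])
    memory = summarize([r for r in runs if r.memory_enabled])
    # Deltas are memory minus baseline: outcome gain next to its cost.
    delta = {
        key: memory[key] - baseline[key]
        for key in ("task_pass_rate", "mean_tokens", "mean_latency_s")
    }
    return {"baseline": baseline, "memory": memory, "delta": delta}
```

Keeping `task_pass_rate` apart from `retrieval_hit_rate` means a high hit rate cannot be read as task success, and the token and latency deltas sit next to the outcome delta so the cost of any gain is explicit. A small hand-written task set is enough to exercise the report shape, but not to support a claim about the memory system.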
Next
Read Ablation tests and Run evaluations.