Core Loop
ICRL improves an agent by turning successful episodes into reusable context. High-level cycle:
- Attempt task in environment.
- Store successful trajectory.
- Retrieve similar prior steps on next tasks.
- Reuse those examples in prompts.
- Update utility feedback and curate low-value trajectories.
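The cycle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ICRL's actual code: `Trajectory`, `Store`, `similarity`, and `episode` are hypothetical names, word overlap stands in for whatever similarity search the real database uses, and the curation thresholds are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    goal: str
    steps: list[str]
    retrievals: int = 0  # how often this trajectory was retrieved
    successes: int = 0   # how many of those episodes succeeded

    @property
    def utility(self) -> float:
        # Assumption: never-retrieved trajectories get a neutral 0.5.
        return self.successes / self.retrievals if self.retrievals else 0.5


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


@dataclass
class Store:
    items: list[Trajectory] = field(default_factory=list)

    def retrieve(self, goal: str, k: int = 2) -> list[Trajectory]:
        ranked = sorted(self.items, key=lambda t: similarity(goal, t.goal), reverse=True)
        return ranked[:k]

    def curate(self, min_utility: float = 0.3, min_retrievals: int = 2) -> None:
        # Drop trajectories that were retrieved often enough to judge
        # and still score below the utility threshold.
        self.items = [
            t for t in self.items
            if t.retrievals < min_retrievals or t.utility >= min_utility
        ]


def episode(store: Store, goal: str, attempt) -> bool:
    """One pass of the cycle: retrieve, attempt, feed back, store, curate."""
    examples = store.retrieve(goal)
    steps, success = attempt(goal, examples)  # attempt is caller-supplied
    for t in examples:
        t.retrievals += 1
        t.successes += int(success)
    if success:
        store.items.append(Trajectory(goal, steps))
    store.curate()
    return success
```

Here `attempt` is any callable that takes the goal and the retrieved examples and returns `(steps, success)`; in a real agent it would build the prompt from the examples and run the environment.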
Training vs Inference
- train(...): successful runs can be stored (optionally gated by verification).
- run(...): database is read-only for that episode.
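A minimal sketch of the difference between the two modes, assuming a toy `Agent` class. Only the method names `train` and `run` come from the text above; the constructor, the `solve` and `verify` callables, and the list-backed `database` are hypothetical and chosen for illustration.

```python
from typing import Callable, Optional

Episode = tuple[list[str], bool]  # (steps, success)


class Agent:
    def __init__(
        self,
        solve: Callable[[str, list], Episode],
        verify: Optional[Callable[[str, list[str]], bool]] = None,
    ):
        self.solve = solve    # hypothetical: runs one episode
        self.verify = verify  # hypothetical: optional verification gate
        self.database: list[tuple[str, list[str]]] = []

    def _episode(self, goal: str) -> Episode:
        # Pass a copy so the episode itself cannot mutate the database.
        return self.solve(goal, list(self.database))

    def train(self, goal: str) -> bool:
        steps, success = self._episode(goal)
        # Store only successful runs, and only if verification (when set) passes.
        if success and (self.verify is None or self.verify(goal, steps)):
            self.database.append((goal, steps))
        return success

    def run(self, goal: str) -> bool:
        # Inference: examples are read, nothing is written back.
        _, success = self._episode(goal)
        return success
```

With this shape, calling `run` any number of times leaves `database` unchanged, while `train` grows it only on verified successes.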
Retrieval Granularity
Both implementations use step-level retrieval in the ReAct loop:
- Plan retrieval: retrieve_for_plan(goal)
- Step retrieval: retrieve_for_step(goal, plan, observation)
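The two retrieval calls might look roughly like this. The function names and argument names follow the text; everything else (the `Step` record, the `StepStore` class, the `k` parameter, the return types, and word-overlap ranking in place of a real similarity index) is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Step:
    goal: str
    plan: str
    observation: str
    action: str


def _overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


class StepStore:
    def __init__(self, steps):
        self.steps = list(steps)

    def retrieve_for_plan(self, goal: str, k: int = 1) -> list[str]:
        # Plan time: only the goal is known, so rank stored plans by goal similarity.
        scores = {}
        for s in self.steps:
            scores.setdefault((s.goal, s.plan), _overlap(goal, s.goal))
        ranked = sorted(scores, key=scores.get, reverse=True)
        return [plan for _, plan in ranked[:k]]

    def retrieve_for_step(self, goal: str, plan: str, observation: str, k: int = 1) -> list[Step]:
        # Step time: the query also includes the plan and the latest observation.
        query = f"{goal} {plan} {observation}"
        ranked = sorted(
            self.steps,
            key=lambda s: _overlap(query, f"{s.goal} {s.plan} {s.observation}"),
            reverse=True,
        )
        return ranked[:k]
```

The point of the split is that the step query carries the current observation, so the returned example matches the agent's local situation rather than only its overall goal.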
Curation Signals
Utility is not only “did retrieval lead to success.” The Python implementation also tracks deferred validation metadata (for code-change persistence and supersession), then combines signals into utility scoring.
Why This Works
- Relevant examples reduce planning errors.
- Step-level examples improve local decisions.
- Curation limits drift from stale or low-signal trajectories.
- Over time, average trajectory quality improves without hand-written few-shot sets.
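To make the curation idea concrete, here is one way several signals could be folded into a single utility score and used to prune the database. This is a sketch under assumptions, not the formula ICRL uses: the `Signals` fields, the weights (0.2 and 0.3), the smoothing, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    retrievals: int = 0               # times the trajectory was retrieved
    successes: int = 0                # retrievals followed by a successful episode
    persisted: Optional[bool] = None  # deferred check: did the code change survive? None = not checked yet
    superseded: bool = False          # a later trajectory replaced this one


def utility(s: Signals) -> float:
    # Laplace-smoothed success rate, so unseen trajectories start at 0.5.
    score = (s.successes + 1) / (s.retrievals + 2)
    if s.persisted is True:
        score += 0.2  # assumed bonus for changes that persisted
    elif s.persisted is False:
        score -= 0.2  # assumed penalty for changes that were reverted
    if s.superseded:
        score -= 0.3  # assumed penalty for stale trajectories
    return max(0.0, min(1.0, score))


def curate(db: dict[str, Signals], threshold: float = 0.3) -> dict[str, Signals]:
    """Keep only trajectories whose combined utility meets the threshold."""
    return {key: s for key, s in db.items() if utility(s) >= threshold}
```

A trajectory that is retrieved often but never helps, or that has been superseded, falls below the threshold and is dropped, which is the drift-limiting behavior described in the list above.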

