
Overview

The file system agent demonstrates ICRL’s training and inference flows in a simulated filesystem environment. The agent learns to navigate directories, read files, and modify the filesystem using commands such as ls, cd, cat, find, mkdir, and cp. Successful trajectories are stored, then retrieved as in-context examples for similar future tasks.

Source Files

File                             Purpose
examples/file_api_env.py         FileSystemEnvironment and Task definitions
examples/tasks.py                TRAINING_TASKS and EVAL_TASKS
examples/demo_with_real_llm.py   Full demo with real LLM calls
tests/test_with_mock.py          Mock LLM demo (no API keys)

Run With Mock LLM (No API Keys)

uv run python tests/test_with_mock.py
This demonstrates:
  • Training phase with trajectory accumulation
  • Database persistence across sessions
  • Trajectory retrieval for in-context examples
  • Evaluation on held-out tasks

Run With Real Model

OPENAI_API_KEY=... uv run python examples/demo_with_real_llm.py
Or with Anthropic:
ANTHROPIC_API_KEY=... uv run python examples/demo_with_real_llm.py
Optional model override:
MODEL=gpt-4o-mini uv run python examples/demo_with_real_llm.py
The demo runs in two phases:
  1. Training — The agent completes several tasks. Successful runs are stored.
  2. Evaluation — Held-out tasks are run with retrieval enabled. The agent uses stored trajectories as examples.
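The two phases above can be sketched in a few lines of Python. This is a minimal illustration, not ICRL's implementation: Trajectory, TrajectoryStore, train, and build_prompt are hypothetical names, the store lives in memory rather than in a database, and retrieval is approximated with simple word overlap instead of whatever similarity measure the library actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One successful run (hypothetical shape, not ICRL's schema)."""
    goal: str
    steps: list[str]


@dataclass
class TrajectoryStore:
    """Toy in-memory store; assumed stand-in for the persisted database."""
    items: list[Trajectory] = field(default_factory=list)

    def add(self, trajectory: Trajectory) -> None:
        self.items.append(trajectory)

    def retrieve(self, goal: str, k: int = 1) -> list[Trajectory]:
        # Rank stored trajectories by word overlap with the new goal
        # and drop any that share no words at all.
        words = set(goal.lower().split())

        def overlap(t: Trajectory) -> int:
            return len(words & set(t.goal.lower().split()))

        ranked = sorted(self.items, key=overlap, reverse=True)
        return [t for t in ranked[:k] if overlap(t) > 0]


def train(store: TrajectoryStore, runs) -> None:
    """Phase 1: keep only the runs that succeeded."""
    for goal, steps, success in runs:
        if success:
            store.add(Trajectory(goal, steps))


def build_prompt(store: TrajectoryStore, goal: str) -> str:
    """Phase 2: prepend retrieved trajectories as in-context examples."""
    lines = []
    for t in store.retrieve(goal):
        lines.append(f"Example goal: {t.goal}")
        lines.append("Example steps: " + " ; ".join(t.steps))
    lines.append(f"Goal: {goal}")
    return "\n".join(lines)


store = TrajectoryStore()
train(store, [
    ("read the config file", ["ls", "cat config.txt"], True),
    ("create a backup directory", ["mkdir backup"], True),
    ("copy notes to archive", ["cp notes.txt missing/"], False),  # failed, not stored
])
prompt = build_prompt(store, "read the notes file")
print(prompt)
```

Only the two successful runs end up in the store, and the held-out goal pulls in the most similar one as an example before the model is asked to act.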

Environment Behavior

FileSystemEnvironment accepts command-like actions:
Command           Description
ls [dir]          List directory contents
cd <dir>          Change directory
cat <file>        Display file contents
find <pattern>    Search for files matching pattern
pwd               Print working directory
mkdir <name>      Create directory
cp <src> <dst>    Copy file
Each task has a verify function that determines success based on the final FileSystemState.
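To make the command handling and the verify check concrete, here is a small self-contained sketch. ToyState, ToyEnv, and ToyTask are hypothetical stand-ins for FileSystemState, FileSystemEnvironment, and Task; the real classes in examples/file_api_env.py may use different names, fields, and error messages. The sketch does no argument validation beyond ignoring empty input.

```python
import fnmatch
import posixpath
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyState:
    """Hypothetical stand-in for FileSystemState."""
    cwd: str = "/"
    dirs: set[str] = field(default_factory=lambda: {"/"})
    files: dict[str, str] = field(default_factory=dict)  # absolute path -> contents


class ToyEnv:
    """Hypothetical stand-in for FileSystemEnvironment."""

    def __init__(self, state: ToyState):
        self.state = state

    def _abs(self, path: str) -> str:
        # Resolve a path relative to the current working directory.
        return posixpath.normpath(posixpath.join(self.state.cwd, path))

    def step(self, action: str) -> str:
        s = self.state
        parts = action.split()
        if not parts:
            return ""
        cmd, args = parts[0], parts[1:]
        if cmd == "pwd":
            return s.cwd
        if cmd == "ls":
            target = self._abs(args[0]) if args else s.cwd
            entries = [p for p in list(s.dirs) + list(s.files)
                       if p != target and posixpath.dirname(p) == target]
            return "\n".join(sorted(posixpath.basename(p) for p in entries))
        if cmd == "cd":
            target = self._abs(args[0])
            if target not in s.dirs:
                return f"cd: no such directory: {args[0]}"
            s.cwd = target
            return ""
        if cmd == "cat":
            return s.files.get(self._abs(args[0]), f"cat: no such file: {args[0]}")
        if cmd == "find":
            # Match the pattern against file names only (an assumption).
            return "\n".join(sorted(
                p for p in s.files
                if fnmatch.fnmatch(posixpath.basename(p), args[0])))
        if cmd == "mkdir":
            s.dirs.add(self._abs(args[0]))
            return ""
        if cmd == "cp":
            src, dst = self._abs(args[0]), self._abs(args[1])
            if src not in s.files:
                return f"cp: no such file: {args[0]}"
            if dst in s.dirs:  # copying into a directory keeps the file name
                dst = posixpath.join(dst, posixpath.basename(src))
            s.files[dst] = s.files[src]
            return ""
        return f"unknown command: {cmd}"


@dataclass
class ToyTask:
    """Hypothetical stand-in for Task: a goal plus a check on the final state."""
    goal: str
    verify: Callable[[ToyState], bool]


task = ToyTask(
    goal="Back up notes.txt into a new backup directory",
    verify=lambda st: st.files.get("/backup/notes.txt") == "remember the milk",
)

env = ToyEnv(ToyState(files={"/notes.txt": "remember the milk"}))
for action in ["mkdir backup", "cp notes.txt backup"]:
    env.step(action)

success = task.verify(env.state)
print(success)
```

Because verify inspects only the final state, any sequence of commands that leaves the expected file in place counts as a success, regardless of the path the agent took to get there.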