Overview

The preference learning demo shows ICRL’s ability to learn and adapt to individual user preferences, communication styles, and working patterns over time. Vanilla LLMs treat every user the same; ICRL can match verbosity, format, depth, and tone to each user’s past approved interactions.

The Problem

Every user has different preferences:
  • Verbosity: Some want detailed explanations, others want terse commands
  • Format: Some prefer bullet points, others want prose
  • Depth: Some want “just tell me what to do”, others want “explain why”
  • Tone: Some prefer formal, others casual
A vanilla LLM does not retain any of these preferences between requests, so you end up repeating instructions such as “Be more concise” or “Give me the command, not the explanation”.

How ICRL Solves This

ICRL stores successful interactions that reflect your preferences. Over time, it:
  1. Learns your style from past interactions you approved
  2. Retrieves preference-matched examples for similar requests
  3. Adapts responses automatically without re-prompting
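The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not ICRL's actual API: the names `Interaction`, `PreferenceStore`, and `build_prompt` are hypothetical, and the token-overlap similarity is a simple stand-in for whatever retrieval ICRL really uses.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One approved request/response pair for a user (hypothetical type)."""
    user_id: str
    request: str
    response: str


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a stand-in for real retrieval."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class PreferenceStore:
    """Keeps approved interactions and returns the closest ones per user."""

    def __init__(self) -> None:
        self._items: list[Interaction] = []

    def add_approved(self, item: Interaction) -> None:
        # Step 1: learn the user's style from interactions they approved.
        self._items.append(item)

    def retrieve(self, user_id: str, request: str, k: int = 2) -> list[Interaction]:
        # Step 2: retrieve this user's most similar past interactions.
        mine = [i for i in self._items if i.user_id == user_id]
        ranked = sorted(mine, key=lambda i: similarity(i.request, request), reverse=True)
        return ranked[:k]


def build_prompt(examples: list[Interaction], request: str) -> str:
    # Step 3: put the examples in context so the model adapts without re-prompting.
    parts = [f"User: {e.request}\nAssistant: {e.response}" for e in examples]
    parts.append(f"User: {request}\nAssistant:")
    return "\n\n".join(parts)


store = PreferenceStore()
store.add_approved(Interaction("expert", "How do I rebase?", "git rebase -i HEAD~3"))
store.add_approved(Interaction("expert", "How do I list branches?", "git branch -a"))
store.add_approved(Interaction("learner", "How do I rebase?",
                               "Rebasing replays your commits on top of a new base."))

new_request = "How do I squash commits with rebase?"
examples = store.retrieve("expert", new_request, k=1)
prompt = build_prompt(examples, new_request)
```

Because retrieval is filtered by user, the same question pulls a terse example for the expert and a detailed one for the learner, which is what lets the same request produce different answers for different users.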

Demo Structure

examples/preference_learning_demo/
    README.md
    setup_demo.py
    run_demo.py
    user_profiles/
        expert_terse.json
        learner_detailed.json
        manager_summary.json
    scenarios/
        seed_interactions.json
        test_requests.json

User Profiles

| Profile | Style | Wants | Example |
|---|---|---|---|
| Expert Terse | Minimal explanation | Commands, code snippets | “How do I rebase?” yields git rebase -i HEAD~3 |
| Learner Detailed | Thorough explanations | Why things work, pitfalls, examples | Full explanation with diagrams |
| Manager Summary | High-level overview | Impact, timeline, risks, decisions | “Rebasing reorganizes commit history…” |
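To make the profiles concrete, here is a sketch of how one of them might be encoded and loaded. The schema of the demo's actual files under user_profiles is not shown on this page, so the field names `name`, `style`, and `wants` are assumptions chosen to mirror the table columns.

```python
import json

# Hypothetical contents for a file such as user_profiles/expert_terse.json.
# The field names are assumptions, not the demo's actual schema.
EXPERT_TERSE_JSON = """
{
  "name": "Expert Terse",
  "style": "Minimal explanation",
  "wants": ["commands", "code snippets"]
}
"""


def describe(profile: dict) -> str:
    """Render a one-line summary of a profile for logging or debugging."""
    wants = ", ".join(profile["wants"])
    return f"{profile['name']}: {profile['style']}; wants {wants}"


profile = json.loads(EXPERT_TERSE_JSON)
summary = describe(profile)
```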

Running the Demo

From the project root:
cd examples/preference_learning_demo

# 1. Set up: seed trajectories for each user profile
uv run python setup_demo.py

# 2. Run the comparison test
uv run python run_demo.py

# 3. View detailed evaluation
uv run python evaluate_responses.py

Expected Results

| User Profile | ICRL Score | Vanilla Score | Improvement (percentage points) |
|---|---|---|---|
| Expert Terse | ~90% | ~40% | +50 |
| Learner Detailed | ~85% | ~50% | +35 |
| Manager Summary | ~85% | ~30% | +55 |
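The improvement column is the difference between the two scores, in percentage points. The snippet below reproduces that arithmetic from the approximate figures in the table; the helper name `improvement_points` is hypothetical, and how the demo computes the underlying scores is not covered here.

```python
# Approximate (ICRL, vanilla) scores from the table above, in percent.
expected = {
    "Expert Terse": (90, 40),
    "Learner Detailed": (85, 50),
    "Manager Summary": (85, 30),
}


def improvement_points(icrl: int, vanilla: int) -> int:
    """Absolute gain in percentage points, not a relative percentage."""
    return icrl - vanilla


improvements = {
    name: improvement_points(icrl, vanilla)
    for name, (icrl, vanilla) in expected.items()
}
```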

Prerequisites

  • OPENAI_API_KEY or ANTHROPIC_API_KEY set in your environment
  • Either uv (commands are invoked with uv run from the project root), or plain python with PYTHONPATH including src/
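A quick way to confirm the first prerequisite before running the demo is to check the environment from Python. This is a small sketch; the function name `available_provider` is hypothetical and is not part of the demo scripts.

```python
import os
from typing import Mapping, Optional


def available_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first API key variable that is set and non-empty."""
    for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
        if env.get(key):
            return key
    return None


if __name__ == "__main__":
    found = available_provider()
    if found is None:
        print("Set OPENAI_API_KEY or ANTHROPIC_API_KEY before running the demo.")
    else:
        print(f"Using credentials from {found}.")
```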

Key Insight

This demo illustrates ICRL’s value for personalization in settings where:
  • One size doesn’t fit all
  • User preferences are implicit in past interactions
  • The same question should have different “right” answers for different users