# LLM Agent Testbed Architecture

**Status**: Planning
**Parent Issue**: #154 - Grounded Multi-Agent Testbed
**Last Updated**: 2025-12-14

## Overview

This document describes the architecture for running LLM agents in McRogueFace environments. The system serves two purposes:

1. **Human-playable game** with traditional roguelike mechanics
2. **Agent testbed** for studying grounded language understanding

The key insight is that both modes share the same physics layer but differ in their perception layer. An **ActionLog** bridges this gap. It records structured event data that humans can view as a combat log and that agents receive as text context.

---

## Architecture Layers

```
┌───────────────────────────────────────────────────────────────┐
│                         PHYSICS LAYER                         │
│         Entity interactions: bump, ev_enter, ev_exit          │
│             Deterministic, turn-based, grid-based             │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      EPISTEMOLOGY LAYER                       │
│            ActionLog + SpeechChannel + TurnContext            │
│             "What happened and who perceived it"              │
└───────────────────────────────────────────────────────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ HUMAN CLIENT      │ │ AGENT CLIENT      │ │ REPLAY CLIENT     │
│ • Renders grid    │ │ • Screenshot +    │ │ • Step through    │
│ • Keyboard input  │ │   TurnContext     │ │   ActionLog       │
│ • Optional log UI │ │ • LLM query/parse │ │ • Animate actions │
└───────────────────┘ └───────────────────┘ └───────────────────┘
```

---

## Physics Layer: Entity Interaction Model

Adapted from the entity interaction system already used in Crypt of Sokoban (CoS).
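The tables that follow define the interaction contract. Here is a rough, self-contained sketch of how one move could be resolved against that contract:

- If the destination tile is occupied, its highest-`draw_order` occupant receives `bump()` and decides the outcome. A walkable floor trigger, for example, moves the actor in.
- If the tile is empty, the mover simply relocates.
- Whenever an entity actually relocates, `ev_exit()` fires on the other occupants of the old tile and `ev_enter()` on those of the new tile.

`World`, `Ent`, `try_move()`, and the dictionary-based grid are hypothetical stand-ins, not McRogueFace APIs. Records are plain strings for brevity.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]


class World:
    """Hypothetical stand-in for the engine grid: tile -> occupants."""

    def __init__(self) -> None:
        self.tiles: Dict[Pos, List["Ent"]] = {}
        self.log: List[str] = []  # messages from ev_enter / ev_exit


class Ent:
    """Minimal entity implementing the bump / ev_enter / ev_exit contract."""

    draw_order = 5

    def __init__(self, world: World, name: str, x: int, y: int) -> None:
        self.world, self.name, self.x, self.y = world, name, x, y
        world.tiles.setdefault((x, y), []).append(self)

    def bump(self, actor: "Ent", dx: int, dy: int, test: bool = False) -> str:
        return f"{actor.name} is blocked by {self.name}"  # default: solid

    def ev_enter(self, other: "Ent") -> Optional[str]:
        return None

    def ev_exit(self, other: "Ent") -> Optional[str]:
        return None

    def do_move(self, x: int, y: int) -> None:
        """Relocate, firing ev_exit on the old tile and ev_enter on the new one."""
        old = self.world.tiles[(self.x, self.y)]
        old.remove(self)
        messages = [ent.ev_exit(self) for ent in old]
        self.x, self.y = x, y
        new = self.world.tiles.setdefault((x, y), [])
        messages += [ent.ev_enter(self) for ent in new]
        new.append(self)
        self.world.log.extend(m for m in messages if m)

    def try_move(self, dx: int, dy: int) -> str:
        """Enter an empty tile, or bump its highest-draw_order occupant."""
        target = (self.x + dx, self.y + dy)
        occupants = self.world.tiles.get(target, [])
        if occupants:
            handler = max(occupants, key=lambda ent: ent.draw_order)
            return handler.bump(self, dx, dy)
        self.do_move(*target)
        return f"{self.name} moves"


class Guard(Ent):
    draw_order = 7  # enemies outrank items and floor triggers


class PressurePlate(Ent):
    draw_order = 1  # floor trigger: checked last, lets the actor onto its tile

    def bump(self, actor: Ent, dx: int, dy: int, test: bool = False) -> str:
        if not test:
            actor.do_move(self.x, self.y)
        return f"{actor.name} steps onto {self.name}"

    def ev_enter(self, other: Ent) -> Optional[str]:
        return f"{self.name} clicks down"

    def ev_exit(self, other: Ent) -> Optional[str]:
        return f"{self.name} rises"


world = World()
hero = Ent(world, "hero", 0, 0)
PressurePlate(world, "plate A", 1, 0)
PressurePlate(world, "plate B", 2, 0)
Guard(world, "guard", 2, 0)  # shares a tile with plate B

print(hero.try_move(1, 0))   # hero steps onto plate A
print(hero.try_move(1, 0))   # hero is blocked by guard (7 outranks 1)
print(hero.try_move(-1, 0))  # hero moves
print(world.log)             # ['plate A clicks down', 'plate A rises']
```

In this sketch the mover holds the priority logic (pick the top occupant) and the occupant's `bump()` holds the passability logic. That split keeps each entity type self-contained. It mirrors how `LockedDoorEntity` moves the actor itself once unlocked.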

### Core Events

| Event | Trigger | Purpose |
|-------|---------|---------|
| `bump(actor, dx, dy, test=False)` | Another entity tries to move into this entity's tile | Collision resolution, combat, interaction |
| `ev_enter(other)` | Another entity successfully enters this tile | Triggers (pressure plates, pickups) |
| `ev_exit(other)` | Another entity leaves this tile | State reversal (plate release) |

### Resolution Order

When several entities occupy a tile, the `draw_order` property determines which one handles `bump` first:

| `draw_order` | Entity Type | Rationale |
|--------------|-------------|-----------|
| 10 | Player/Agent | Highest priority; always responds to bumps |
| 7 | Enemies | Combat entities |
| 5 | Items | Can be picked up |
| 2 | Doors | Conditional passage |
| 1 | Floor triggers | Lowest priority; checked last, allows overlap |

### Entity Base Class

```python
from __future__ import annotations  # ActionRecord is defined in the ActionLog module

from typing import Optional


class GameEntity:
    """Base class for all interactive entities."""

    draw_order: int = 5
    description: str = "an entity"
    detailed_description: str = "You see an entity."

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Called when another entity tries to move into this tile.

        With test=True, report the outcome without changing any state.
        """
        raise NotImplementedError

    def ev_enter(self, other) -> Optional[ActionRecord]:
        """Called when another entity enters this tile."""
        return None

    def ev_exit(self, other) -> Optional[ActionRecord]:
        """Called when another entity leaves this tile."""
        return None

    def act(self) -> Optional[ActionRecord]:
        """Called on this entity's turn (for NPCs)."""
        return None

    # Engine hooks used by the subclasses below. How they map onto the
    # McRogueFace entity (or COSEntity) is not specified in this document.
    def do_move(self, x: int, y: int) -> None:
        """Relocate this entity to (x, y)."""
        raise NotImplementedError

    def try_move(self, dx: int, dy: int) -> bool:
        """Attempt a relative move; return True if the entity moved."""
        raise NotImplementedError

    def die(self) -> None:
        """Remove this entity from the world."""
        raise NotImplementedError
```

---

## Epistemology Layer: ActionLog

The ActionLog is the central event bus. It records every game action so that each observer's perception can be filtered from it.
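The filtering rule documented on `ActionLog.get_audible_to()` below can be sketched as a single function: an event is seen in full, heard only vaguely, or not perceived at all. The sketch rests on three assumptions:

- FOV is available as a precomputed set of visible tiles.
- Sound spreads `sound_radius` tiles by Chebyshev distance, ignoring walls.
- y grows downward, as on screen.

`Event`, `perceive()`, and `compass_direction()` are hypothetical names, not part of the planned API.

```python
import math
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Pos = Tuple[int, int]


@dataclass
class Event:
    """Cut-down stand-in for ActionRecord: only the fields filtering needs."""

    position: Pos
    sound_radius: int  # 0 = silent
    result_message: str


def compass_direction(frm: Pos, to: Pos) -> str:
    """8-point compass direction from frm to to (y grows downward, as on screen)."""
    angle = math.atan2(to[1] - frm[1], to[0] - frm[0])  # 0 = east, pi/2 = south
    names = ["east", "southeast", "south", "southwest",
             "west", "northwest", "north", "northeast"]
    return names[round(angle / (math.pi / 4)) % 8]


def perceive(observer: Pos, fov: Set[Pos], event: Event) -> Optional[str]:
    """Full detail if the event was seen, a vague cue if only heard, else None."""
    if event.position in fov:
        return event.result_message
    # Chebyshev distance: diagonal steps count as one tile; walls are ignored
    distance = max(abs(event.position[0] - observer[0]),
                   abs(event.position[1] - observer[1]))
    if event.sound_radius > 0 and distance <= event.sound_radius:
        return f"You hear something to the {compass_direction(observer, event.position)}."
    return None


observer = (1, 1)
fov = {(x, y) for x in range(4) for y in range(4)}  # tiles visible from (1, 1)
print(perceive(observer, fov, Event((2, 2), 2, "The knight picks up a brass key.")))
print(perceive(observer, fov, Event((5, 1), 5, "The knight unlocks the door.")))
print(perceive(observer, fov, Event((10, 1), 5, "The knight unlocks the door.")))
```

The three calls print the full message, then "You hear something to the east.", then `None`. A real implementation would need the observer's FOV as of the turn the action happened, not the current FOV. That suggests capturing FOV (or the set of observers) when a record is logged.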

### ActionRecord

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass(kw_only=True)  # Python 3.10+: lets required and defaulted fields mix
class ActionRecord:
    """A single recorded action in the game world.

    Context fields have defaults so entity handlers such as bump() can return a
    partial record; the caller is assumed to fill in turn, actor, and position
    before passing it to ActionLog.record().
    """

    turn: int = 0                      # Game turn number
    actor_id: str = ""                 # Unique entity identifier
    actor_description: str = ""        # "the knight", "a guard"
    action_type: str                   # "move", "take", "open", "unlock", "attack", "speak", "wait"

    # Action details
    args: Dict[str, Any] = field(default_factory=dict)  # {"direction": "north", "item": "brass_key"}
    target_id: Optional[str] = None    # Entity acted upon
    target_description: Optional[str] = None

    # Outcome
    result: str                        # "success", "failure", "blocked", "hit", "kill", "miss"
    result_message: str                # "The knight unlocks the door with the brass key."

    # Spatial info (for perception filtering)
    position: Optional[Tuple[int, int]] = None  # Where it happened (None until filled in)
    sound_radius: int = 0              # How far the sound travels (0 = silent)
```

### ActionLog Class

```python
from typing import List


class ActionLog:
    """Central record of all game actions."""

    def __init__(self) -> None:
        self.records: List[ActionRecord] = []

    def record(self, action: ActionRecord) -> None:
        """Record an action."""
        self.records.append(action)

    def get_visible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could SEE (in FOV when they happened)."""

    def get_audible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could HEAR.

        - In FOV: full detail ("The guard says 'Halt!'")
        - Out of FOV but within sound_radius: vague ("You hear a voice to the east")
        - Out of range: nothing
        """

    def get_turn_summary(self, turn: int) -> List[ActionRecord]:
        """Get all actions from a specific turn (for replay)."""
        return [r for r in self.records if r.turn == turn]
```

### SpeechChannel

Speech is a special action type with FOV-based reception:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechRecord:
    """One utterance as received by a listener (the fields TurnContext.to_prose() reads)."""

    speaker: str              # e.g. "the guard"
    message: str              # full text; only shown when in_fov is True
    in_fov: bool              # was the speaker in the listener's FOV?
    direction: Optional[str]  # e.g. "east"; used when in_fov is False


class SpeechChannel:
    """Handles agent-to-agent communication."""

    def speak(self, speaker, message: str, volume: str = "normal") -> ActionRecord:
        """
        Broadcast speech from speaker.

        volume:
        - "whisper": adjacent tiles only (radius 1)
        - "normal": FOV range (same as sight)
        - "shout": entire room

        Returns an ActionRecord with sound_radius set accordingly.
        """

    def get_heard_speech(self, listener, grid, since_turn: int) -> List[SpeechRecord]:
        """
        Get speech heard by listener.

        Returns one SpeechRecord per utterance:
        - in_fov=True with the full message if the speaker was in the listener's FOV
        - in_fov=False with only a direction if the speaker was out of FOV but in
          range (rendered as "You hear someone speaking to the {direction}.")
        """
```

### TurnContext

What an entity perceives at the start of its turn:

```python
from __future__ import annotations  # EntitySnapshot/TileSnapshot are defined elsewhere

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TurnContext:
    """Complete perception state for an entity's turn."""

    # Identity
    actor_id: str
    actor_description: str
    position: Tuple[int, int]
    current_room: str

    # Visual perception (current FOV)
    visible_entities: List[EntitySnapshot]
    visible_terrain: List[TileSnapshot]

    # Auditory perception (since last turn)
    heard_speech: List[SpeechRecord]
    heard_sounds: List[str]  # "footsteps to the north", "a door creaking"

    # Observed actions (in FOV since last turn)
    observed_actions: List[ActionRecord]

    # Available actions
    available_actions: List[str]  # ["GO NORTH", "TAKE brass_key", "SPEAK '...'"]

    # Inventory
    inventory: List[str]

    def to_prose(self) -> str:
        """Generate a natural language description for the LLM context."""
        parts = []

        # Location
        parts.append(f"You are in {self.current_room}.")

        # What you see
        if self.visible_entities:
            parts.append(self._describe_visible())

        # What happened (observed actions)
        if self.observed_actions:
            parts.append("Since your last turn:")
            for action in self.observed_actions:
                parts.append(f"  - {action.result_message}")

        # What you heard
        for speech in self.heard_speech:
            if speech.in_fov:
                parts.append(f'{speech.speaker} says: "{speech.message}"')
            else:
                parts.append(f"You hear someone speaking to the {speech.direction}.")
        for sound in self.heard_sounds:
            parts.append(f"You hear {sound}.")

        # What you carry
        if self.inventory:
            parts.append(f"You are carrying: {', '.join(self.inventory)}.")

        # Available actions
        parts.append(f"Available actions: {', '.join(self.available_actions)}")

        return "\n".join(parts)

    def _describe_visible(self) -> str:
        # Assumes each EntitySnapshot exposes a `description` such as "a guard"
        return "You see " + ", ".join(e.description for e in self.visible_entities) + "."
```

---

## Game Entities

Academically presentable entity types for the agent learning scenarios.
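Every `bump()` below accepts a `test` flag. In each implementation, `test=True` computes the outcome without changing the world. One plausible use, sketched here as an assumption rather than a stated design, is deriving `TurnContext.available_actions` by dry-running a bump in each direction. `Actor`, `Door` (a stub with the same outcomes as `LockedDoorEntity`), `probe_actions()`, and the `UNLOCK EAST` action spelling are hypothetical.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
DIRECTIONS = {"NORTH": (0, -1), "SOUTH": (0, 1), "EAST": (1, 0), "WEST": (-1, 0)}


class Actor:
    """Hypothetical stand-in for a player/agent entity."""

    def __init__(self, x: int, y: int, inventory: Tuple[str, ...] = ()) -> None:
        self.x, self.y = x, y
        self.inventory = list(inventory)


class Door:
    """Stub with the same outcomes as LockedDoorEntity.bump(),
    reduced to (action_type, result) pairs."""

    def __init__(self, key_id: str) -> None:
        self.key_id = key_id
        self.locked = True

    def bump(self, actor: Actor, dx: int, dy: int, test: bool = False) -> Tuple[str, str]:
        if not self.locked:
            return ("move", "success")
        if self.key_id in actor.inventory:
            if not test:
                self.locked = False  # only a real bump changes state
            return ("unlock", "success")
        return ("open", "blocked")


def probe_actions(actor: Actor, tiles: Dict[Pos, Door]) -> List[str]:
    """Dry-run a bump in each direction and list the ones that would succeed."""
    actions = []
    for name, (dx, dy) in DIRECTIONS.items():
        occupant = tiles.get((actor.x + dx, actor.y + dy))
        if occupant is None:  # empty tile (walls are not modeled here)
            actions.append(f"GO {name}")
            continue
        action_type, result = occupant.bump(actor, dx, dy, test=True)
        if result == "success":
            verb = "GO" if action_type == "move" else action_type.upper()
            actions.append(f"{verb} {name}")
    return actions


door = Door("brass_key")
tiles = {(1, 0): door}
print(probe_actions(Actor(0, 0), tiles))                  # ['GO NORTH', 'GO SOUTH', 'GO WEST']
print(probe_actions(Actor(0, 0, ("brass_key",)), tiles))  # ['GO NORTH', 'GO SOUTH', 'UNLOCK EAST', 'GO WEST']
print(door.locked)                                        # True: probing changed nothing
```

Because the dry run goes through the same `bump()` code as the real move, the listed actions cannot drift out of sync with what the entities actually allow.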

### KeyEntity

```python
class KeyEntity(GameEntity):
    """A key that can be picked up and used on matching doors."""

    draw_order = 5

    def __init__(self, x, y, key_id: str, display_name: str = "a brass key"):
        super().__init__(x, y)
        self.key_id = key_id
        self.description = display_name
        self.detailed_description = (
            f"{display_name[:1].upper()}{display_name[1:]}. It might unlock something."
        )

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        # Actors with an inventory pick the key up
        if hasattr(actor, "inventory"):
            if not test:
                actor.inventory.append(self.key_id)
                self.die()  # Remove from world
            return ActionRecord(
                action_type="take",
                result="success",
                result_message=f"{actor.description} picks up {self.description}.",
                sound_radius=2,
            )
        # Anything without an inventory is simply blocked
        return ActionRecord(
            action_type="take",
            result="failure",
            result_message=f"{self.description} lies on the ground.",
            sound_radius=0,
        )
```

### LockedDoorEntity

```python
class LockedDoorEntity(GameEntity):
    """A door that requires a specific key to open."""

    draw_order = 2

    def __init__(self, x, y, key_id: str, destination_room: str):
        super().__init__(x, y)
        self.key_id = key_id
        self.destination_room = destination_room
        self.locked = True
        self.description = "a locked door"
        self.detailed_description = "A sturdy wooden door. It has a brass keyhole."

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        if self.locked:
            # Check whether the actor carries the matching key
            if hasattr(actor, "inventory") and self.key_id in actor.inventory:
                if not test:
                    self.unlock()
                return ActionRecord(
                    action_type="unlock",
                    result="success",
                    result_message=f"{actor.description} unlocks the door.",
                    sound_radius=5,  # Loud click
                )
            return ActionRecord(
                action_type="open",
                result="blocked",
                result_message="The door is locked.",
                sound_radius=1,
            )

        # Door is open: let the actor pass through
        if not test:
            actor.do_move(actor.x + dx, actor.y + dy)
        return ActionRecord(
            action_type="move",
            result="success",
            result_message=f"{actor.description} passes through the doorway.",
            sound_radius=2,
        )

    def unlock(self):
        self.locked = False
        self.sprite_number = OPEN_DOOR_SPRITE  # Tileset sprite index (value not specified here)
        self.description = "an open doorway"
```

### GuardEntity

```python
from typing import List, Optional, Tuple


def sign(n: int) -> int:
    """Return -1, 0, or 1 according to the sign of n."""
    return (n > 0) - (n < 0)


class GuardEntity(GameEntity):
    """An NPC that patrols a fixed route and reacts to intruders."""

    draw_order = 7

    def __init__(self, x, y, patrol_route: List[Tuple[int, int]],
                 behavior: str = "patrol"):
        super().__init__(x, y)
        self.patrol_route = patrol_route
        self.patrol_index = 0
        self.behavior = behavior  # "patrol", "stationary", "chase"
        self.alert = False
        self.target = None  # Set when an intruder is spotted
        self.description = "a guard"
        self.detailed_description = "An armored guard. They look vigilant."
        self.sight_range = 6

    def act(self) -> Optional[ActionRecord]:
        """Called on the guard's turn."""
        if self.behavior == "patrol":
            return self._patrol_step()
        elif self.behavior == "chase" and self.target:
            return self._chase_step()

        return ActionRecord(
            action_type="wait",
            result="success",
            result_message=f"{self.description} stands watch.",
            sound_radius=0,
        )

    def _patrol_step(self) -> ActionRecord:
        """Take one step toward the current patrol waypoint."""
        next_pos = self.patrol_route[self.patrol_index]
        if (self.x, self.y) == next_pos:
            # Waypoint reached: advance to the next one (the route loops)
            self.patrol_index = (self.patrol_index + 1) % len(self.patrol_route)
            next_pos = self.patrol_route[self.patrol_index]

        # Step toward the waypoint
        dx = sign(next_pos[0] - self.x)
        dy = sign(next_pos[1] - self.y)

        if (dx or dy) and self.try_move(dx, dy):
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{self.description} continues their patrol.",
                sound_radius=3,  # Footsteps
            )
        return ActionRecord(
            action_type="wait",
            result="blocked",
            result_message=f"{self.description} waits; the way is blocked.",
            sound_radius=0,
        )

    def _chase_step(self) -> ActionRecord:
        """Step toward the spotted intruder (simple greedy pursuit, a placeholder)."""
        moved = self.try_move(sign(self.target.x - self.x), sign(self.target.y - self.y))
        return ActionRecord(
            action_type="move" if moved else "wait",
            result="success" if moved else "blocked",
            result_message=f"{self.description} gives chase.",
            sound_radius=3,
        )

    def check_fov_for_intruders(self, entities, grid) -> Optional[ActionRecord]:
        """Check whether any player/agent (PlayerEntity) is visible.

        Assumes grid.is_in_fov() reflects an FOV computed from this guard's
        position with sight_range. If the grid holds a single FOV state (for
        example, the player's), it must be recomputed for the guard first.
        """
        for entity in entities:
            if isinstance(entity, PlayerEntity) and grid.is_in_fov(entity.x, entity.y):
                self.alert = True
                self.target = entity
                self.behavior = "chase"
                return ActionRecord(
                    action_type="speak",
                    result="success",
                    result_message=f'{self.description} shouts: "Halt! Intruder!"',
                    sound_radius=10,  # A shout carries far
                )
        return None
```

### CombatantEntity

```python
class CombatantEntity(GameEntity):
    """Basic enemy that engages in melee combat."""

    draw_order = 7

    def __init__(self, x, y, hp: int = 3, damage: int = 1,
                 description: str = "a hostile creature"):
        super().__init__(x, y)
        self.hp = hp
        self.max_hp = hp
        self.damage = damage
        self.description = description

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Handle being bumped (attacked)."""

        if self.hp <= 0:
            # Dead: allow walking over the remains
            if not test:
                actor.do_move(self.x, self.y)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} steps over the remains of {self.description}.",
                sound_radius=1,
            )

        # Combat! Compute the outcome first so test mode reports it accurately
        if hasattr(actor, "damage"):
            remaining_hp = self.hp - actor.damage
            if not test:
                self.hp = remaining_hp

            if remaining_hp <= 0:
                return ActionRecord(
                    action_type="attack",
                    result="kill",
                    result_message=f"{actor.description} defeats {self.description}!",
                    sound_radius=5,
                )
            return ActionRecord(
                action_type="attack",
                result="hit",
                result_message=f"{actor.description} strikes {self.description}.",
                sound_radius=4,
            )

        return ActionRecord(
            action_type="bump",
            result="blocked",
            result_message=f"{self.description} blocks the path.",
            sound_radius=1,
        )
```

---

## Agent Learning Scenarios

Progressive scenarios for evaluating agent capabilities.
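Each scenario below ends with a success metric, and Phase 6 calls for success/failure metrics per scenario. A minimal sketch follows that expresses those metrics as predicates over an end-of-episode summary. `Outcome`, `Scenario`, `reached()`, `success_rates()`, and the room/actor names are hypothetical placeholders for whatever the level loader and the ActionLog export provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Outcome:
    """Hypothetical end-of-episode summary, e.g. derived from an ActionLog export."""

    final_rooms: Dict[str, str]  # actor_id -> room the actor finished in
    alert_raised: bool = False   # did any guard switch to "chase"?


@dataclass
class Scenario:
    name: str
    succeeded: Callable[[Outcome], bool]


def reached(actor: str, room: str) -> Callable[[Outcome], bool]:
    """Predicate: the actor finished the episode in the given room."""
    return lambda outcome: outcome.final_rooms.get(actor) == room


# Room and actor names are placeholders for whatever the level loader defines
SCENARIOS = [
    Scenario("key_hunt", reached("agent", "room_c")),
    Scenario("guard_patrol",
             lambda o: reached("agent", "goal")(o) and not o.alert_raised),
    Scenario("cooperative_unlock",
             lambda o: reached("agent_a", "goal_a")(o) and reached("agent_b", "goal_b")(o)),
    Scenario("combat_decision", reached("agent", "goal")),  # either path counts
]


def success_rates(runs: Dict[str, List[Outcome]]) -> Dict[str, float]:
    """Fraction of recorded runs meeting each scenario's success metric."""
    rates = {}
    for scenario in SCENARIOS:
        outcomes = runs.get(scenario.name, [])
        if outcomes:
            rates[scenario.name] = sum(scenario.succeeded(o) for o in outcomes) / len(outcomes)
    return rates


runs = {
    "key_hunt": [Outcome({"agent": "room_c"})],
    "guard_patrol": [Outcome({"agent": "goal"}),
                     Outcome({"agent": "goal"}, alert_raised=True)],
}
print(success_rates(runs))  # {'key_hunt': 1.0, 'guard_patrol': 0.5}
```

Keeping the metrics as data next to the scenario list makes the "Comparison across LLM models" step a matter of running the same `success_rates()` over each model's runs.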

### Scenario 1: Key Hunt

**Setup:**
- Agent spawns in Room A
- Key in Room B (visible from A through a doorway)
- Locked door to Goal Room C

**Learning Objectives:**
- Object recognition ("I see a key")
- Cause-effect reasoning ("key unlocks door")
- Sequential planning (go to B → get key → return to door → unlock → enter C)

**Success Metric:** Agent reaches Goal Room C

---

### Scenario 2: Guard Patrol

**Setup:**
- Single guard on a 4-point rectangular patrol
- Agent must cross the patrol path to reach the goal
- No combat option; the agent must avoid detection

**Learning Objectives:**
- Behavior observation ("guard moves clockwise")
- Prediction ("guard will be at X in 2 turns")
- Timing ("I should move now, while I'm out of the guard's sight")

**Success Metric:** Agent reaches the goal without triggering an alert

---

### Scenario 3: Cooperative Unlock

**Setup:**
- Agent A in Room 1 with the key
- Agent B in Room 2 near a locked door
- The door blocks Agent B's path to its goal

**Learning Objectives:**
- Situation communication ("I have the key")
- Coordination ("Wait, I'm coming to unlock it")
- Theory of mind ("Agent B needs the door opened")

**Success Metric:** Both agents reach their respective goals

---

### Scenario 4: Combat Decision

**Setup:**
- A weak enemy blocks the direct path to the goal
- An alternative route is available (longer, safe)
- Agent has limited HP

**Learning Objectives:**
- Risk assessment ("Can I win this fight?")
- Resource management ("Is the shortcut worth the HP?")
- Alternative planning ("I could go around instead")

**Success Metric:** Agent reaches the goal (by either path)

---

## Development Phases

### Phase 1: ActionLog Foundation

**Goal:** Instrument entity interactions without breaking existing CoS gameplay.

**Tasks:**
1. Create `ActionRecord` dataclass
2. Create `ActionLog` class with basic recording
3. Modify `COSEntity.bump()` to return `ActionRecord`
4. Add `ActionRecord` returns to `ev_enter()`, `ev_exit()`
5. Wire `ActionLog` into the game loop

**Validation:** CoS plays normally; ActionLog captures all events

---

### Phase 2: TurnContext & Perception

**Goal:** Generate per-entity perception from ActionLog.

**Tasks:**
1. Create `TurnContext` dataclass
2. Implement FOV-filtered action retrieval
3. Implement `to_prose()` for LLM text generation
4. Add `SpeechChannel` with FOV-based reception
5. Have `TurnOrchestrator` generate a `TurnContext` for each entity every turn

**Validation:** Print `TurnContext.to_prose()` alongside gameplay and check it against what actually happened on screen

---

### Phase 3: Clean Entity Set

**Goal:** Implement academically presentable entities.

**Tasks:**
1. `KeyEntity` - takeable item
2. `LockedDoorEntity` - conditional passage
3. `GuardEntity` - patrol behavior with FOV detection
4. `CombatantEntity` - basic melee combat
5. Level loader for scenario definitions

**Validation:** A human can play through all 4 scenarios

---

### Phase 4: Agent Integration

**Goal:** Connect LLM agents to the game world.

**Tasks:**
1. Adapt the existing `TurnOrchestrator` from `vllm_demo`
2. Feed `TurnContext.to_prose()` into LLM prompts
3. Parse LLM responses into actions (e.g. `GO NORTH`)
4. Execute those actions through the entity system, recording the resulting `ActionRecord`s
5. Package screenshot + text context for each agent turn

**Validation:** A single agent completes Scenario 1 (Key Hunt)

---

### Phase 5: Multi-Agent & Speech

**Goal:** Enable agent-to-agent communication.

**Tasks:**
1. Multi-agent turn sequencing
2. Speech action execution via `SpeechChannel`
3. Speech reception in `TurnContext`
4. Test Scenario 3 (Cooperative Unlock)

**Validation:** Two agents coordinate via speech to solve the puzzle

---

### Phase 6: Evaluation & Analysis

**Goal:** Systematic evaluation of agent capabilities.

**Tasks:**
1. Simulation logging (full ActionLog export)
2. Success/failure metrics per scenario
3. Behavior analysis tools
4. Comparison across LLM models

**Deliverable:** Paper-ready evaluation results

---

## File Structure

```
tests/
├── vllm_demo/                  # Existing demo code
│   ├── action_parser.py        # LLM response parsing
│   ├── action_executor.py      # → Refactor to use ActionLog
│   ├── world_graph.py          # → Keep for room descriptions
│   ├── turn_orchestrator.py    # → Enhance with TurnContext
│   └── scenarios/              # New: scenario definitions
│       ├── key_hunt.py
│       ├── guard_patrol.py
│       ├── cooperative_unlock.py
│       └── combat_decision.py
│
src/scripts/
├── cos_entities.py             # → Add ActionRecord returns
├── game_entities.py            # New: clean academic entities
├── action_log.py               # New: ActionLog system
├── speech_channel.py           # New: agent communication
└── turn_context.py             # New: perception generation
```

---

## Related Issues

- #154 - Grounded Multi-Agent Testbed (parent)
- #155 - Deterministic Text Descriptions (closed)
- #156 - Turn-based LLM Agent Orchestration (in progress)
- #153 - Separate render loop from game state loop (complete)

---

## References

- Crypt of Sokoban entity system: `src/scripts/cos_entities.py`
- Existing VLLM demos: `tests/vllm_demo/`
- FOV/Perspective system: Issue #154 comment (2025-12-01)