Add LLM Agent Testbed Architecture

John McCardle 2025-12-16 21:28:11 +00:00
commit 2c9d9cd341

# LLM Agent Testbed Architecture
**Status**: Planning
**Parent Issue**: #154 - Grounded Multi-Agent Testbed
**Last Updated**: 2025-12-14
## Overview
This document describes the architecture for running LLM agents in McRogueFace environments. The system serves dual purposes:
1. **Human-playable game** with traditional roguelike mechanics
2. **Agent testbed** for studying grounded language understanding
The key insight is that both modes share the same physics layer, but differ in their perception layer. An **ActionLog** bridges this gap, providing structured event data that humans can view as a combat log and agents receive as text context.
---
## Architecture Layers
```
┌───────────────────────────────────────────────────────────────┐
│                         PHYSICS LAYER                         │
│         Entity interactions: bump, ev_enter, ev_exit          │
│             Deterministic, turn-based, grid-based             │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      EPISTEMOLOGY LAYER                       │
│            ActionLog + SpeechChannel + TurnContext            │
│            "What happened and who perceived it"               │
└───────────────────────────────────────────────────────────────┘
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│   HUMAN CLIENT    │ │   AGENT CLIENT    │ │   REPLAY CLIENT   │
│ • Renders grid    │ │ • Screenshot +    │ │ • Step through    │
│ • Keyboard input  │ │   TurnContext     │ │   ActionLog       │
│ • Optional log UI │ │ • LLM query/parse │ │ • Animate actions │
└───────────────────┘ └───────────────────┘ └───────────────────┘
```
---
## Physics Layer: Entity Interaction Model
Adapted from Crypt of Sokoban's proven interaction system.
### Core Events
| Event | Trigger | Purpose |
|-------|---------|---------|
| `bump(other, dx, dy)` | Entity attempts to enter occupied tile | Collision resolution, combat, interaction |
| `ev_enter(other)` | Entity successfully enters tile | Triggers (pressure plates, pickups) |
| `ev_exit(other)` | Entity leaves tile | State reversal (plate release) |
### Resolution Order
The `draw_order` property determines which entity handles `bump` first when multiple occupy a tile:
| draw_order | Entity Type | Rationale |
|------------|-------------|-----------|
| 10 | Player/Agent | Highest priority - always respond to bumps |
| 7 | Enemies | Combat entities |
| 5 | Items | Can be picked up |
| 2 | Doors | Conditional passage |
| 1 | Floor triggers | Lowest - checked last, allows overlap |
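
The table above can be sketched as a simple sort. The `Stub` class and `resolve_bump_order` helper here are illustrative stand-ins, not engine code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Stub:
    """Hypothetical minimal entity with only the field resolution needs."""
    name: str
    draw_order: int

def resolve_bump_order(entities_at_tile: List[Stub]) -> List[Stub]:
    """Sort co-located entities so the highest draw_order handles bump first."""
    return sorted(entities_at_tile, key=lambda e: e.draw_order, reverse=True)

# A tile holding a floor trigger, a guard, and a key: the guard resolves first,
# then the key, and the floor trigger is checked last
tile = [Stub("pressure plate", 1), Stub("guard", 7), Stub("brass key", 5)]
order = resolve_bump_order(tile)
```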
### Entity Base Class
```python
from typing import Optional

class GameEntity:
    """Base class for all interactive entities."""
    draw_order: int = 5
    description: str = "an entity"
    detailed_description: str = "You see an entity."

    def bump(self, actor, dx, dy, test=False) -> "ActionRecord":
        """Called when another entity tries to move into this tile."""
        raise NotImplementedError

    def ev_enter(self, other) -> Optional["ActionRecord"]:
        """Called when another entity enters this tile."""
        pass

    def ev_exit(self, other) -> Optional["ActionRecord"]:
        """Called when another entity leaves this tile."""
        pass

    def act(self) -> Optional["ActionRecord"]:
        """Called on this entity's turn (for NPCs)."""
        pass
```
---
## Epistemology Layer: ActionLog
The central event bus that records all game actions for perception filtering.
### ActionRecord
```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

@dataclass
class ActionRecord:
    """A single recorded action in the game world."""
    turn: int                  # Game turn number
    actor_id: str              # Unique entity identifier
    actor_description: str     # "the knight", "a guard"
    action_type: str           # "move", "take", "unlock", "attack", "speak"

    # Action details
    args: Dict[str, Any]       # {"direction": "north", "item": "brass_key"}
    target_id: Optional[str]   # Entity acted upon
    target_description: Optional[str]

    # Outcome
    result: str                # "success", "blocked", "hit", "miss"
    result_message: str        # "The knight unlocks the door with the brass key."

    # Spatial info (for perception filtering)
    position: Tuple[int, int]  # Where it happened
    sound_radius: int          # How far the sound travels (0 = silent)
```
### ActionLog Class
```python
class ActionLog:
    """Central record of all game actions."""

    def record(self, action: ActionRecord) -> None:
        """Record an action."""

    def get_visible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could SEE (in FOV when they happened)."""

    def get_audible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could HEAR.

        - In FOV: Full detail ("The guard says 'Halt!'")
        - Out of FOV, in range: Vague ("You hear a voice to the east")
        - Out of range: Nothing
        """

    def get_turn_summary(self, turn: int) -> List[ActionRecord]:
        """Get all actions from a specific turn (for replay)."""
```
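
A minimal in-memory sketch of the filtering these stubs describe. `Record` and `SimpleActionLog` are trimmed stand-ins; the real `get_visible_to` takes `(observer, grid)` and uses the engine's FOV, whereas here the FOV is passed as a plain set of tiles and audibility uses Chebyshev distance:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Record:
    """Stand-in for ActionRecord with only the fields filtering needs."""
    turn: int
    position: Tuple[int, int]
    sound_radius: int
    result_message: str

class SimpleActionLog:
    """Minimal in-memory sketch of ActionLog's perception filtering."""

    def __init__(self):
        self._records: List[Record] = []

    def record(self, r: Record) -> None:
        self._records.append(r)

    def get_visible_to(self, fov: Set[Tuple[int, int]], since_turn: int) -> List[Record]:
        # Visible = happened after the observer's last turn, on a tile in their FOV
        return [r for r in self._records
                if r.turn > since_turn and r.position in fov]

    def get_audible_to(self, pos: Tuple[int, int], since_turn: int) -> List[Record]:
        # Audible = within the action's sound_radius (Chebyshev distance)
        def dist(a, b):
            return max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        return [r for r in self._records
                if r.turn > since_turn and dist(pos, r.position) <= r.sound_radius]

log = SimpleActionLog()
log.record(Record(turn=1, position=(2, 2), sound_radius=3, result_message="a door creaks"))
log.record(Record(turn=1, position=(9, 9), sound_radius=0, result_message="a silent step"))
```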
### SpeechChannel
Speech is a special action type with FOV-based reception:
```python
class SpeechChannel:
"""Handles agent-to-agent communication."""
def speak(self, speaker, message: str, volume: str = "normal") -> ActionRecord:
"""
Broadcast speech from speaker.
volume:
- "whisper": Adjacent tiles only (radius 1)
- "normal": FOV range (same as sight)
- "shout": Entire room
Returns ActionRecord with sound_radius set appropriately.
"""
def get_heard_speech(self, listener, grid, since_turn: int) -> List[SpeechRecord]:
"""
Get speech heard by listener.
Returns:
- Full text if speaker was in listener's FOV
- "You hear indistinct speech to the {direction}" if out of FOV but in range
"""
```
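
The volume scheme reduces to a small mapping. The `fov_radius` and `room_radius` defaults below are placeholder values, and the real `SpeechChannel` may derive them from the grid rather than take parameters:

```python
def sound_radius_for(volume: str, fov_radius: int = 8, room_radius: int = 20) -> int:
    """Map a speech volume to the sound_radius stored on its ActionRecord."""
    radii = {"whisper": 1, "normal": fov_radius, "shout": room_radius}
    return radii.get(volume, fov_radius)  # Unknown volumes fall back to "normal"
```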
### TurnContext
What an entity perceives when it's their turn:
```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TurnContext:
    """Complete perception state for an entity's turn."""
    # Identity
    actor_id: str
    actor_description: str
    position: Tuple[int, int]
    current_room: str

    # Visual perception (current FOV)
    visible_entities: List[EntitySnapshot]
    visible_terrain: List[TileSnapshot]

    # Auditory perception (since last turn)
    heard_speech: List[SpeechRecord]
    heard_sounds: List[str]            # "footsteps to the north", "a door creaking"

    # Observed actions (in FOV since last turn)
    observed_actions: List[ActionRecord]

    # Available actions
    available_actions: List[str]       # ["GO NORTH", "TAKE brass_key", "SPEAK '...'"]

    # Inventory
    inventory: List[str]

    def to_prose(self) -> str:
        """Generate natural language description for LLM context."""
        parts = []

        # Location
        parts.append(f"You are in {self.current_room}.")

        # What you see
        if self.visible_entities:
            parts.append(self._describe_visible())

        # What happened (observed actions)
        if self.observed_actions:
            parts.append("Since your last turn:")
            for action in self.observed_actions:
                parts.append(f"  - {action.result_message}")

        # What you heard
        for speech in self.heard_speech:
            if speech.in_fov:
                parts.append(f'{speech.speaker} says: "{speech.message}"')
            else:
                parts.append(f"You hear someone speaking to the {speech.direction}.")

        # Available actions
        parts.append(f"Available actions: {', '.join(self.available_actions)}")
        return "\n".join(parts)
```
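
Under this scheme, a single turn's LLM context might read as follows (illustrative hand-written sample, not generated output):

```
You are in the guard room.
You see a guard to the north and a brass key to the east.
Since your last turn:
  - a guard continues their patrol.
You hear someone speaking to the east.
Available actions: GO NORTH, GO EAST, TAKE brass_key, SPEAK '...'
```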
---
## Game Entities
Academically-presentable entity types for agent learning scenarios.
### KeyEntity
```python
class KeyEntity(GameEntity):
    """A key that can be picked up and used on matching doors."""
    draw_order = 5

    def __init__(self, x, y, key_id: str, display_name: str = "a brass key"):
        self.x, self.y = x, y
        self.key_id = key_id
        self.description = display_name
        self.detailed_description = f"{display_name}. It might unlock something."

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        # Actor picks up the key
        if hasattr(actor, 'inventory'):
            if not test:
                actor.inventory.append(self.key_id)
                self.die()  # Remove from world
            return ActionRecord(
                action_type="take",
                result="success",
                result_message=f"{actor.description} picks up {self.description}.",
                sound_radius=2
            )
        return ActionRecord(
            action_type="take",
            result="failure",
            result_message=f"{self.description} lies on the ground.",
            sound_radius=0
        )
```
### LockedDoorEntity
```python
class LockedDoorEntity(GameEntity):
    """A door that requires a specific key to open."""
    draw_order = 2

    def __init__(self, x, y, key_id: str, destination_room: str):
        self.x, self.y = x, y
        self.key_id = key_id
        self.destination_room = destination_room
        self.locked = True
        self.description = "a locked door"
        self.detailed_description = "A sturdy wooden door. It has a brass keyhole."

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        if self.locked:
            # Check if actor has the key
            if hasattr(actor, 'inventory') and self.key_id in actor.inventory:
                if not test:
                    self.unlock()
                return ActionRecord(
                    action_type="unlock",
                    result="success",
                    result_message=f"{actor.description} unlocks the door.",
                    sound_radius=5  # Loud click
                )
            else:
                return ActionRecord(
                    action_type="open",
                    result="blocked",
                    result_message="The door is locked.",
                    sound_radius=1
                )
        else:
            # Door is open, allow passage
            if not test:
                actor.do_move(actor.x + dx, actor.y + dy)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} passes through the doorway.",
                sound_radius=2
            )

    def unlock(self):
        self.locked = False
        self.sprite_number = OPEN_DOOR_SPRITE
        self.description = "an open doorway"
```
### GuardEntity
```python
class GuardEntity(GameEntity):
    """An NPC that patrols a fixed route and reacts to intruders."""
    draw_order = 7

    def __init__(self, x, y, patrol_route: List[Tuple[int, int]],
                 behavior: str = "patrol"):
        self.x, self.y = x, y
        self.patrol_route = patrol_route
        self.patrol_index = 0
        self.behavior = behavior  # "patrol", "stationary", "chase"
        self.alert = False
        self.target = None
        self.description = "a guard"
        self.detailed_description = "An armored guard. They look vigilant."
        self.sight_range = 6

    def act(self) -> Optional[ActionRecord]:
        """Called on guard's turn."""
        if self.behavior == "patrol":
            return self._patrol_step()
        elif self.behavior == "chase" and self.target:
            return self._chase_step()
        return ActionRecord(
            action_type="wait",
            result="success",
            result_message=f"{self.description} stands watch.",
            sound_radius=0
        )

    def _patrol_step(self) -> ActionRecord:
        """Move to next point on patrol route."""
        next_pos = self.patrol_route[self.patrol_index]
        self.patrol_index = (self.patrol_index + 1) % len(self.patrol_route)
        # Move toward next patrol point
        dx = sign(next_pos[0] - self.x)
        dy = sign(next_pos[1] - self.y)
        if self.try_move(dx, dy):
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{self.description} continues their patrol.",
                sound_radius=3  # Footsteps
            )
        # Path blocked this turn: wait in place rather than return None
        return ActionRecord(
            action_type="wait",
            result="blocked",
            result_message=f"{self.description} hesitates mid-patrol.",
            sound_radius=0
        )

    def check_fov_for_intruders(self, entities, grid) -> Optional[ActionRecord]:
        """Check if any player/agent is visible."""
        for entity in entities:
            if isinstance(entity, PlayerEntity) and grid.is_in_fov(entity.x, entity.y):
                self.alert = True
                self.target = entity
                self.behavior = "chase"
                return ActionRecord(
                    action_type="speak",
                    result="success",
                    result_message=f'{self.description} shouts: "Halt! Intruder!"',
                    sound_radius=10  # Shout carries far
                )
        return None
```
### CombatantEntity
```python
class CombatantEntity(GameEntity):
    """Basic enemy that engages in melee combat."""
    draw_order = 7

    def __init__(self, x, y, hp: int = 3, damage: int = 1,
                 description: str = "a hostile creature"):
        self.x, self.y = x, y
        self.hp = hp
        self.max_hp = hp
        self.damage = damage
        self.description = description

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Handle being bumped (attacked)."""
        if self.hp <= 0:
            # Dead - allow walking over
            if not test:
                actor.do_move(self.x, self.y)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} steps over the fallen {self.description}.",
                sound_radius=1
            )
        # Combat!
        if hasattr(actor, 'damage'):
            damage_dealt = actor.damage
            remaining = self.hp - damage_dealt
            if not test:
                self.hp = remaining  # Only apply damage outside test mode
            if remaining <= 0:
                return ActionRecord(
                    action_type="attack",
                    result="kill",
                    result_message=f"{actor.description} defeats {self.description}!",
                    sound_radius=5
                )
            else:
                return ActionRecord(
                    action_type="attack",
                    result="hit",
                    result_message=f"{actor.description} strikes {self.description}.",
                    sound_radius=4
                )
        return ActionRecord(
            action_type="bump",
            result="blocked",
            result_message=f"{self.description} blocks the path.",
            sound_radius=1
        )
```
---
## Agent Learning Scenarios
Progressive scenarios for evaluating agent capabilities.
### Scenario 1: Key Hunt
**Setup:**
- Agent spawns in Room A
- Key in Room B (visible from A through doorway)
- Locked door to Goal Room C
**Learning Objectives:**
- Object recognition ("I see a key")
- Cause-effect reasoning ("key unlocks door")
- Sequential planning (go to B → get key → return to door → unlock → enter C)
**Success Metric:** Agent reaches Goal Room C
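
A data-driven definition for this scenario might look like the sketch below. The field names, coordinates, and the loader that would consume this dictionary are all assumptions, not a finalized format:

```python
# Hypothetical contents of scenarios/key_hunt.py
KEY_HUNT = {
    "name": "key_hunt",
    "rooms": {
        "room_a": {"bounds": (0, 0, 5, 5)},        # Agent start
        "room_b": {"bounds": (6, 0, 11, 5)},       # Key location
        "goal_room_c": {"bounds": (0, 6, 5, 11)},  # Behind the locked door
    },
    "entities": [
        {"type": "agent", "pos": (2, 2)},
        {"type": "key", "pos": (8, 2), "key_id": "brass_key"},
        {"type": "locked_door", "pos": (2, 5), "key_id": "brass_key",
         "destination_room": "goal_room_c"},
    ],
    "success": {"agent_in_room": "goal_room_c"},
}
```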
---
### Scenario 2: Guard Patrol
**Setup:**
- Single guard on 4-point rectangular patrol
- Agent must cross patrol path to reach goal
- No combat option - must avoid detection
**Learning Objectives:**
- Behavior observation ("guard moves clockwise")
- Prediction ("guard will be at X in 2 turns")
- Timing ("I should move now while guard faces away")
**Success Metric:** Agent reaches goal without triggering alert
---
### Scenario 3: Cooperative Unlock
**Setup:**
- Agent A in Room 1 with key
- Agent B in Room 2 near locked door
- Door blocks Agent B's goal
**Learning Objectives:**
- Situation communication ("I have the key")
- Coordination ("Wait, I'm coming to unlock it")
- Theory of mind ("Agent B needs the door opened")
**Success Metric:** Both agents reach their respective goals
---
### Scenario 4: Combat Decision
**Setup:**
- Weak enemy blocks direct path to goal
- Alternative route available (longer, safe)
- Agent has limited HP
**Learning Objectives:**
- Risk assessment ("Can I win this fight?")
- Resource management ("Is the shortcut worth the HP?")
- Alternative planning ("I could go around instead")
**Success Metric:** Agent reaches goal (either path)
---
## Development Phases
### Phase 1: ActionLog Foundation
**Goal:** Instrument entity interactions without breaking existing CoS gameplay.
**Tasks:**
1. Create `ActionRecord` dataclass
2. Create `ActionLog` class with basic recording
3. Modify `COSEntity.bump()` to return `ActionRecord`
4. Add `ActionRecord` returns to `ev_enter()`, `ev_exit()`
5. Wire `ActionLog` into game loop
**Validation:** CoS plays normally; ActionLog captures all events
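
The wiring in task 5 could start as a thin driver around the turn loop. `process_turn` and the trimmed `Rec`/`Log` stand-ins here are illustrative, not existing engine code:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rec:
    """Stand-in for ActionRecord with only the fields this sketch touches."""
    turn: int = 0
    result_message: str = ""

class Log:
    """Stand-in for ActionLog."""
    def __init__(self):
        self.records: List[Rec] = []

    def record(self, r: Rec) -> None:
        self.records.append(r)

def process_turn(turn: int, entities, log: Log) -> None:
    """Run each entity's act() and record any resulting action."""
    for entity in entities:
        rec: Optional[Rec] = entity.act()
        if rec is not None:
            rec.turn = turn  # Stamp the turn number before logging
            log.record(rec)

class Waiter:
    """Toy entity whose turn always produces a 'stands watch' record."""
    def act(self) -> Rec:
        return Rec(result_message="the guard stands watch")

log = Log()
process_turn(3, [Waiter(), Waiter()], log)
```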
---
### Phase 2: TurnContext & Perception
**Goal:** Generate per-entity perception from ActionLog.
**Tasks:**
1. Create `TurnContext` dataclass
2. Implement FOV-filtered action retrieval
3. Implement `to_prose()` for LLM text generation
4. Add `SpeechChannel` with FOV-based reception
5. Create `TurnOrchestrator` that generates TurnContext per turn
**Validation:** Print TurnContext.to_prose() alongside gameplay; verify accuracy
---
### Phase 3: Clean Entity Set
**Goal:** Implement academically-presentable entities.
**Tasks:**
1. `KeyEntity` - takeable item
2. `LockedDoorEntity` - conditional passage
3. `GuardEntity` - patrol behavior with FOV detection
4. `CombatantEntity` - basic melee combat
5. Level loader for scenario definitions
**Validation:** Human can play through all 4 scenarios
---
### Phase 4: Agent Integration
**Goal:** Connect LLM agents to the game world.
**Tasks:**
1. Adapt existing `TurnOrchestrator` from vllm_demo
2. Connect `TurnContext.to_prose()` to LLM prompts
3. Parse LLM responses to `ActionRecord`
4. Execute actions through entity system
5. Screenshot + text context packaging
**Validation:** Single agent completes Scenario 1 (Key Hunt)
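
Task 3 (parsing) might start as a small pattern match over the action strings listed in `available_actions`. This is a sketch, not the existing `action_parser.py`:

```python
import re
from typing import Optional, Tuple

def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Extract (verb, argument) from an LLM response, or None if unparseable."""
    text = text.strip()
    m = re.match(r"(?i)^GO\s+(NORTH|SOUTH|EAST|WEST)\b", text)
    if m:
        return ("move", m.group(1).lower())
    m = re.match(r"(?i)^TAKE\s+(\w+)", text)
    if m:
        return ("take", m.group(1))
    m = re.match(r"(?i)^SPEAK\s+'(.*)'", text)
    if m:
        return ("speak", m.group(1))
    return None  # Unparseable responses fall through to a retry/clarify step
```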
---
### Phase 5: Multi-Agent & Speech
**Goal:** Enable agent-to-agent communication.
**Tasks:**
1. Multi-agent turn sequencing
2. Speech action execution via `SpeechChannel`
3. Speech reception in `TurnContext`
4. Test Scenario 3 (Cooperative Unlock)
**Validation:** Two agents coordinate via speech to solve puzzle
---
### Phase 6: Evaluation & Analysis
**Goal:** Systematic evaluation of agent capabilities.
**Tasks:**
1. Simulation logging (full ActionLog export)
2. Success/failure metrics per scenario
3. Behavior analysis tools
4. Comparison across LLM models
**Deliverable:** Paper-ready evaluation results
---
## File Structure
```
tests/
├── vllm_demo/                  # Existing demo code
│   ├── action_parser.py        # LLM response parsing
│   ├── action_executor.py      # → Refactor to use ActionLog
│   ├── world_graph.py          # → Keep for room descriptions
│   ├── turn_orchestrator.py    # → Enhance with TurnContext
│   └── scenarios/              # New: scenario definitions
│       ├── key_hunt.py
│       ├── guard_patrol.py
│       ├── cooperative_unlock.py
│       └── combat_decision.py
src/scripts/
├── cos_entities.py             # → Add ActionRecord returns
├── game_entities.py            # New: clean academic entities
├── action_log.py               # New: ActionLog system
├── speech_channel.py           # New: agent communication
└── turn_context.py             # New: perception generation
```
---
## Related Issues
- #154 - Grounded Multi-Agent Testbed (parent)
- #155 - Deterministic Text Descriptions (closed)
- #156 - Turn-based LLM Agent Orchestration (in progress)
- #153 - Separate render loop from game state loop (complete)
---
## References
- Crypt of Sokoban entity system: `src/scripts/cos_entities.py`
- Existing VLLM demos: `tests/vllm_demo/`
- FOV/Perspective system: Issue #154 comment (2025-12-01)