Add LLM Agent Testbed Architecture

John McCardle 2025-12-16 21:28:11 +00:00
commit 2c9d9cd341

# LLM Agent Testbed Architecture
**Status**: Planning
**Parent Issue**: #154 - Grounded Multi-Agent Testbed
**Last Updated**: 2025-12-14
## Overview
This document describes the architecture for running LLM agents in McRogueFace environments. The system serves dual purposes:
1. **Human-playable game** with traditional roguelike mechanics
2. **Agent testbed** for studying grounded language understanding
The key insight is that both modes share the same physics layer, but differ in their perception layer. An **ActionLog** bridges this gap, providing structured event data that humans can view as a combat log and agents receive as text context.
---
## Architecture Layers
```
┌───────────────────────────────────────────────────────────────┐
│                         PHYSICS LAYER                         │
│         Entity interactions: bump, ev_enter, ev_exit          │
│             Deterministic, turn-based, grid-based             │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│                      EPISTEMOLOGY LAYER                       │
│            ActionLog + SpeechChannel + TurnContext            │
│            "What happened and who perceived it"               │
└───────────────────────────────────────────────────────────────┘
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│   HUMAN CLIENT    │ │   AGENT CLIENT    │ │   REPLAY CLIENT   │
│ • Renders grid    │ │ • Screenshot +    │ │ • Step through    │
│ • Keyboard input  │ │   TurnContext     │ │   ActionLog       │
│ • Optional log UI │ │ • LLM query/parse │ │ • Animate actions │
└───────────────────┘ └───────────────────┘ └───────────────────┘
```
---
## Physics Layer: Entity Interaction Model
Adapted from Crypt of Sokoban's proven interaction system.
### Core Events
| Event | Trigger | Purpose |
|-------|---------|---------|
| `bump(other, dx, dy)` | Entity attempts to enter occupied tile | Collision resolution, combat, interaction |
| `ev_enter(other)` | Entity successfully enters tile | Triggers (pressure plates, pickups) |
| `ev_exit(other)` | Entity leaves tile | State reversal (plate release) |
### Resolution Order
The `draw_order` property determines which entity handles `bump` first when multiple occupy a tile:
| draw_order | Entity Type | Rationale |
|------------|-------------|-----------|
| 10 | Player/Agent | Highest priority - always respond to bumps |
| 7 | Enemies | Combat entities |
| 5 | Items | Can be picked up |
| 2 | Doors | Conditional passage |
| 1 | Floor triggers | Lowest - checked last, allows overlap |
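
The table above can be sketched as a simple sort. The `Stub` class and `resolve_bump_order` helper here are illustrative stand-ins, not engine code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Stub:
    """Hypothetical minimal entity with only the field resolution needs."""
    name: str
    draw_order: int

def resolve_bump_order(entities_at_tile: List[Stub]) -> List[Stub]:
    """Sort co-located entities so the highest draw_order handles bump first."""
    return sorted(entities_at_tile, key=lambda e: e.draw_order, reverse=True)

# A tile holding a floor trigger, a guard, and a key: the guard resolves first,
# then the key, and the floor trigger is checked last
tile = [Stub("pressure plate", 1), Stub("guard", 7), Stub("brass key", 5)]
order = resolve_bump_order(tile)
```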
### Entity Base Class
```python
from typing import Optional

class GameEntity:
    """Base class for all interactive entities."""
    draw_order: int = 5
    description: str = "an entity"
    detailed_description: str = "You see an entity."

    def bump(self, actor, dx, dy, test=False) -> "ActionRecord":
        """Called when another entity tries to move into this tile."""
        raise NotImplementedError

    def ev_enter(self, other) -> Optional["ActionRecord"]:
        """Called when another entity enters this tile."""
        pass

    def ev_exit(self, other) -> Optional["ActionRecord"]:
        """Called when another entity leaves this tile."""
        pass

    def act(self) -> Optional["ActionRecord"]:
        """Called on this entity's turn (for NPCs)."""
        pass
```
---
## Epistemology Layer: ActionLog
The central event bus that records all game actions for perception filtering.
### ActionRecord
```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

@dataclass
class ActionRecord:
    """A single recorded action in the game world."""
    turn: int                  # Game turn number
    actor_id: str              # Unique entity identifier
    actor_description: str     # "the knight", "a guard"
    action_type: str           # "move", "take", "unlock", "attack", "speak"

    # Action details
    args: Dict[str, Any]       # {"direction": "north", "item": "brass_key"}
    target_id: Optional[str]   # Entity acted upon
    target_description: Optional[str]

    # Outcome
    result: str                # "success", "blocked", "hit", "miss"
    result_message: str        # "The knight unlocks the door with the brass key."

    # Spatial info (for perception filtering)
    position: Tuple[int, int]  # Where it happened
    sound_radius: int          # How far the sound travels (0 = silent)
```
### ActionLog Class
```python
class ActionLog:
    """Central record of all game actions."""

    def record(self, action: ActionRecord) -> None:
        """Record an action."""

    def get_visible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could SEE (in FOV when they happened)."""

    def get_audible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could HEAR.

        - In FOV: Full detail ("The guard says 'Halt!'")
        - Out of FOV, in range: Vague ("You hear a voice to the east")
        - Out of range: Nothing
        """

    def get_turn_summary(self, turn: int) -> List[ActionRecord]:
        """Get all actions from a specific turn (for replay)."""
```
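
A minimal in-memory sketch of the filtering these stubs describe. `Record` and `SimpleActionLog` are trimmed stand-ins; the real `get_visible_to` takes `(observer, grid)` and uses the engine's FOV, whereas here the FOV is passed as a plain set of tiles and audibility uses Chebyshev distance:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Record:
    """Stand-in for ActionRecord with only the fields filtering needs."""
    turn: int
    position: Tuple[int, int]
    sound_radius: int
    result_message: str

class SimpleActionLog:
    """Minimal in-memory sketch of ActionLog's perception filtering."""

    def __init__(self):
        self._records: List[Record] = []

    def record(self, r: Record) -> None:
        self._records.append(r)

    def get_visible_to(self, fov: Set[Tuple[int, int]], since_turn: int) -> List[Record]:
        # Visible = happened after the observer's last turn, on a tile in their FOV
        return [r for r in self._records
                if r.turn > since_turn and r.position in fov]

    def get_audible_to(self, pos: Tuple[int, int], since_turn: int) -> List[Record]:
        # Audible = within the action's sound_radius (Chebyshev distance)
        def dist(a, b):
            return max(abs(a[0] - b[0]), abs(a[1] - b[1]))
        return [r for r in self._records
                if r.turn > since_turn and dist(pos, r.position) <= r.sound_radius]

log = SimpleActionLog()
log.record(Record(turn=1, position=(2, 2), sound_radius=3, result_message="a door creaks"))
log.record(Record(turn=1, position=(9, 9), sound_radius=0, result_message="a silent step"))
```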
### SpeechChannel
Speech is a special action type with FOV-based reception:
```python
class SpeechChannel:
"""Handles agent-to-agent communication."""
def speak(self, speaker, message: str, volume: str = "normal") -> ActionRecord:
"""
Broadcast speech from speaker.
volume:
- "whisper": Adjacent tiles only (radius 1)
- "normal": FOV range (same as sight)
- "shout": Entire room
Returns ActionRecord with sound_radius set appropriately.
"""
def get_heard_speech(self, listener, grid, since_turn: int) -> List[SpeechRecord]:
"""
Get speech heard by listener.
Returns:
- Full text if speaker was in listener's FOV
- "You hear indistinct speech to the {direction}" if out of FOV but in range
"""
```
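
The volume scheme reduces to a small mapping. The `fov_radius` and `room_radius` defaults below are placeholder values, and the real `SpeechChannel` may derive them from the grid rather than take parameters:

```python
def sound_radius_for(volume: str, fov_radius: int = 8, room_radius: int = 20) -> int:
    """Map a speech volume to the sound_radius stored on its ActionRecord."""
    radii = {"whisper": 1, "normal": fov_radius, "shout": room_radius}
    return radii.get(volume, fov_radius)  # Unknown volumes fall back to "normal"
```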
### TurnContext
What an entity perceives when it's their turn:
```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TurnContext:
    """Complete perception state for an entity's turn."""
    # Identity
    actor_id: str
    actor_description: str
    position: Tuple[int, int]
    current_room: str

    # Visual perception (current FOV)
    visible_entities: List[EntitySnapshot]
    visible_terrain: List[TileSnapshot]

    # Auditory perception (since last turn)
    heard_speech: List[SpeechRecord]
    heard_sounds: List[str]            # "footsteps to the north", "a door creaking"

    # Observed actions (in FOV since last turn)
    observed_actions: List[ActionRecord]

    # Available actions
    available_actions: List[str]       # ["GO NORTH", "TAKE brass_key", "SPEAK '...'"]

    # Inventory
    inventory: List[str]

    def to_prose(self) -> str:
        """Generate natural language description for LLM context."""
        parts = []

        # Location
        parts.append(f"You are in {self.current_room}.")

        # What you see
        if self.visible_entities:
            parts.append(self._describe_visible())

        # What happened (observed actions)
        if self.observed_actions:
            parts.append("Since your last turn:")
            for action in self.observed_actions:
                parts.append(f"  - {action.result_message}")

        # What you heard
        for speech in self.heard_speech:
            if speech.in_fov:
                parts.append(f'{speech.speaker} says: "{speech.message}"')
            else:
                parts.append(f"You hear someone speaking to the {speech.direction}.")

        # Available actions
        parts.append(f"Available actions: {', '.join(self.available_actions)}")
        return "\n".join(parts)
```
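
Under this scheme, a single turn's LLM context might read as follows (illustrative hand-written sample, not generated output):

```
You are in the guard room.
You see a guard to the north and a brass key to the east.
Since your last turn:
  - a guard continues their patrol.
You hear someone speaking to the east.
Available actions: GO NORTH, GO EAST, TAKE brass_key, SPEAK '...'
```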
---
## Game Entities
Academically-presentable entity types for agent learning scenarios.
### KeyEntity
```python
class KeyEntity(GameEntity):
    """A key that can be picked up and used on matching doors."""
    draw_order = 5

    def __init__(self, x, y, key_id: str, display_name: str = "a brass key"):
        self.x, self.y = x, y
        self.key_id = key_id
        self.description = display_name
        self.detailed_description = f"{display_name}. It might unlock something."

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        # Actor picks up the key
        if hasattr(actor, 'inventory'):
            if not test:
                actor.inventory.append(self.key_id)
                self.die()  # Remove from world
            return ActionRecord(
                action_type="take",
                result="success",
                result_message=f"{actor.description} picks up {self.description}.",
                sound_radius=2
            )
        return ActionRecord(
            action_type="take",
            result="failure",
            result_message=f"{self.description} lies on the ground.",
            sound_radius=0
        )
```
### LockedDoorEntity
```python
class LockedDoorEntity(GameEntity):
    """A door that requires a specific key to open."""
    draw_order = 2

    def __init__(self, x, y, key_id: str, destination_room: str):
        self.x, self.y = x, y
        self.key_id = key_id
        self.destination_room = destination_room
        self.locked = True
        self.description = "a locked door"
        self.detailed_description = "A sturdy wooden door. It has a brass keyhole."

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        if self.locked:
            # Check if actor has the key
            if hasattr(actor, 'inventory') and self.key_id in actor.inventory:
                if not test:
                    self.unlock()
                return ActionRecord(
                    action_type="unlock",
                    result="success",
                    result_message=f"{actor.description} unlocks the door.",
                    sound_radius=5  # Loud click
                )
            else:
                return ActionRecord(
                    action_type="open",
                    result="blocked",
                    result_message="The door is locked.",
                    sound_radius=1
                )
        else:
            # Door is open, allow passage
            if not test:
                actor.do_move(actor.x + dx, actor.y + dy)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} passes through the doorway.",
                sound_radius=2
            )

    def unlock(self):
        self.locked = False
        self.sprite_number = OPEN_DOOR_SPRITE
        self.description = "an open doorway"
```
### GuardEntity
```python
class GuardEntity(GameEntity):
    """An NPC that patrols a fixed route and reacts to intruders."""
    draw_order = 7

    def __init__(self, x, y, patrol_route: List[Tuple[int, int]],
                 behavior: str = "patrol"):
        self.x, self.y = x, y
        self.patrol_route = patrol_route
        self.patrol_index = 0
        self.behavior = behavior  # "patrol", "stationary", "chase"
        self.alert = False
        self.target = None
        self.description = "a guard"
        self.detailed_description = "An armored guard. They look vigilant."
        self.sight_range = 6

    def act(self) -> Optional[ActionRecord]:
        """Called on guard's turn."""
        if self.behavior == "patrol":
            return self._patrol_step()
        elif self.behavior == "chase" and self.target:
            return self._chase_step()
        return ActionRecord(
            action_type="wait",
            result="success",
            result_message=f"{self.description} stands watch.",
            sound_radius=0
        )

    def _patrol_step(self) -> ActionRecord:
        """Move to next point on patrol route."""
        next_pos = self.patrol_route[self.patrol_index]
        self.patrol_index = (self.patrol_index + 1) % len(self.patrol_route)
        # Move toward next patrol point
        dx = sign(next_pos[0] - self.x)
        dy = sign(next_pos[1] - self.y)
        if self.try_move(dx, dy):
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{self.description} continues their patrol.",
                sound_radius=3  # Footsteps
            )
        # Path blocked this turn: wait in place rather than return None
        return ActionRecord(
            action_type="wait",
            result="blocked",
            result_message=f"{self.description} hesitates mid-patrol.",
            sound_radius=0
        )

    def check_fov_for_intruders(self, entities, grid) -> Optional[ActionRecord]:
        """Check if any player/agent is visible."""
        for entity in entities:
            if isinstance(entity, PlayerEntity) and grid.is_in_fov(entity.x, entity.y):
                self.alert = True
                self.target = entity
                self.behavior = "chase"
                return ActionRecord(
                    action_type="speak",
                    result="success",
                    result_message=f'{self.description} shouts: "Halt! Intruder!"',
                    sound_radius=10  # Shout carries far
                )
        return None
```
### CombatantEntity
```python
class CombatantEntity(GameEntity):
    """Basic enemy that engages in melee combat."""
    draw_order = 7

    def __init__(self, x, y, hp: int = 3, damage: int = 1,
                 description: str = "a hostile creature"):
        self.x, self.y = x, y
        self.hp = hp
        self.max_hp = hp
        self.damage = damage
        self.description = description

    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Handle being bumped (attacked)."""
        if self.hp <= 0:
            # Dead - allow walking over
            if not test:
                actor.do_move(self.x, self.y)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} steps over the fallen {self.description}.",
                sound_radius=1
            )
        # Combat!
        if hasattr(actor, 'damage'):
            damage_dealt = actor.damage
            remaining = self.hp - damage_dealt
            if not test:
                self.hp = remaining  # Only apply damage outside test mode
            if remaining <= 0:
                return ActionRecord(
                    action_type="attack",
                    result="kill",
                    result_message=f"{actor.description} defeats {self.description}!",
                    sound_radius=5
                )
            else:
                return ActionRecord(
                    action_type="attack",
                    result="hit",
                    result_message=f"{actor.description} strikes {self.description}.",
                    sound_radius=4
                )
        return ActionRecord(
            action_type="bump",
            result="blocked",
            result_message=f"{self.description} blocks the path.",
            sound_radius=1
        )
```
---
## Agent Learning Scenarios
Progressive scenarios for evaluating agent capabilities.
### Scenario 1: Key Hunt
**Setup:**
- Agent spawns in Room A
- Key in Room B (visible from A through doorway)
- Locked door to Goal Room C
**Learning Objectives:**
- Object recognition ("I see a key")
- Cause-effect reasoning ("key unlocks door")
- Sequential planning (go to B → get key → return to door → unlock → enter C)
**Success Metric:** Agent reaches Goal Room C
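
A data-driven definition for this scenario might look like the sketch below. The field names, coordinates, and the loader that would consume this dictionary are all assumptions, not a finalized format:

```python
# Hypothetical contents of scenarios/key_hunt.py
KEY_HUNT = {
    "name": "key_hunt",
    "rooms": {
        "room_a": {"bounds": (0, 0, 5, 5)},        # Agent start
        "room_b": {"bounds": (6, 0, 11, 5)},       # Key location
        "goal_room_c": {"bounds": (0, 6, 5, 11)},  # Behind the locked door
    },
    "entities": [
        {"type": "agent", "pos": (2, 2)},
        {"type": "key", "pos": (8, 2), "key_id": "brass_key"},
        {"type": "locked_door", "pos": (2, 5), "key_id": "brass_key",
         "destination_room": "goal_room_c"},
    ],
    "success": {"agent_in_room": "goal_room_c"},
}
```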
---
### Scenario 2: Guard Patrol
**Setup:**
- Single guard on 4-point rectangular patrol
- Agent must cross patrol path to reach goal
- No combat option - must avoid detection
**Learning Objectives:**
- Behavior observation ("guard moves clockwise")
- Prediction ("guard will be at X in 2 turns")
- Timing ("I should move now while guard faces away")
**Success Metric:** Agent reaches goal without triggering alert
---
### Scenario 3: Cooperative Unlock
**Setup:**
- Agent A in Room 1 with key
- Agent B in Room 2 near locked door
- Door blocks Agent B's goal
**Learning Objectives:**
- Situation communication ("I have the key")
- Coordination ("Wait, I'm coming to unlock it")
- Theory of mind ("Agent B needs the door opened")
**Success Metric:** Both agents reach their respective goals
---
### Scenario 4: Combat Decision
**Setup:**
- Weak enemy blocks direct path to goal
- Alternative route available (longer, safe)
- Agent has limited HP
**Learning Objectives:**
- Risk assessment ("Can I win this fight?")
- Resource management ("Is the shortcut worth the HP?")
- Alternative planning ("I could go around instead")
**Success Metric:** Agent reaches goal (either path)
---
## Development Phases
### Phase 1: ActionLog Foundation
**Goal:** Instrument entity interactions without breaking existing CoS gameplay.
**Tasks:**
1. Create `ActionRecord` dataclass
2. Create `ActionLog` class with basic recording
3. Modify `COSEntity.bump()` to return `ActionRecord`
4. Add `ActionRecord` returns to `ev_enter()`, `ev_exit()`
5. Wire `ActionLog` into game loop
**Validation:** CoS plays normally; ActionLog captures all events
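
The wiring in task 5 could start as a thin driver around the turn loop. `process_turn` and the trimmed `Rec`/`Log` stand-ins here are illustrative, not existing engine code:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rec:
    """Stand-in for ActionRecord with only the fields this sketch touches."""
    turn: int = 0
    result_message: str = ""

class Log:
    """Stand-in for ActionLog."""
    def __init__(self):
        self.records: List[Rec] = []

    def record(self, r: Rec) -> None:
        self.records.append(r)

def process_turn(turn: int, entities, log: Log) -> None:
    """Run each entity's act() and record any resulting action."""
    for entity in entities:
        rec: Optional[Rec] = entity.act()
        if rec is not None:
            rec.turn = turn  # Stamp the turn number before logging
            log.record(rec)

class Waiter:
    """Toy entity whose turn always produces a 'stands watch' record."""
    def act(self) -> Rec:
        return Rec(result_message="the guard stands watch")

log = Log()
process_turn(3, [Waiter(), Waiter()], log)
```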
---
### Phase 2: TurnContext & Perception
**Goal:** Generate per-entity perception from ActionLog.
**Tasks:**
1. Create `TurnContext` dataclass
2. Implement FOV-filtered action retrieval
3. Implement `to_prose()` for LLM text generation
4. Add `SpeechChannel` with FOV-based reception
5. Create `TurnOrchestrator` that generates TurnContext per turn
**Validation:** Print TurnContext.to_prose() alongside gameplay; verify accuracy
---
### Phase 3: Clean Entity Set
**Goal:** Implement academically-presentable entities.
**Tasks:**
1. `KeyEntity` - takeable item
2. `LockedDoorEntity` - conditional passage
3. `GuardEntity` - patrol behavior with FOV detection
4. `CombatantEntity` - basic melee combat
5. Level loader for scenario definitions
**Validation:** Human can play through all 4 scenarios
---
### Phase 4: Agent Integration
**Goal:** Connect LLM agents to the game world.
**Tasks:**
1. Adapt existing `TurnOrchestrator` from vllm_demo
2. Connect `TurnContext.to_prose()` to LLM prompts
3. Parse LLM responses to `ActionRecord`
4. Execute actions through entity system
5. Screenshot + text context packaging
**Validation:** Single agent completes Scenario 1 (Key Hunt)
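
Task 3 (parsing) might start as a small pattern match over the action strings listed in `available_actions`. This is a sketch, not the existing `action_parser.py`:

```python
import re
from typing import Optional, Tuple

def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Extract (verb, argument) from an LLM response, or None if unparseable."""
    text = text.strip()
    m = re.match(r"(?i)^GO\s+(NORTH|SOUTH|EAST|WEST)\b", text)
    if m:
        return ("move", m.group(1).lower())
    m = re.match(r"(?i)^TAKE\s+(\w+)", text)
    if m:
        return ("take", m.group(1))
    m = re.match(r"(?i)^SPEAK\s+'(.*)'", text)
    if m:
        return ("speak", m.group(1))
    return None  # Unparseable responses fall through to a retry/clarify step
```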
---
### Phase 5: Multi-Agent & Speech
**Goal:** Enable agent-to-agent communication.
**Tasks:**
1. Multi-agent turn sequencing
2. Speech action execution via `SpeechChannel`
3. Speech reception in `TurnContext`
4. Test Scenario 3 (Cooperative Unlock)
**Validation:** Two agents coordinate via speech to solve puzzle
---
### Phase 6: Evaluation & Analysis
**Goal:** Systematic evaluation of agent capabilities.
**Tasks:**
1. Simulation logging (full ActionLog export)
2. Success/failure metrics per scenario
3. Behavior analysis tools
4. Comparison across LLM models
**Deliverable:** Paper-ready evaluation results
---
## File Structure
```
tests/
├── vllm_demo/                  # Existing demo code
│   ├── action_parser.py        # LLM response parsing
│   ├── action_executor.py      # → Refactor to use ActionLog
│   ├── world_graph.py          # → Keep for room descriptions
│   ├── turn_orchestrator.py    # → Enhance with TurnContext
│   └── scenarios/              # New: scenario definitions
│       ├── key_hunt.py
│       ├── guard_patrol.py
│       ├── cooperative_unlock.py
│       └── combat_decision.py
src/scripts/
├── cos_entities.py             # → Add ActionRecord returns
├── game_entities.py            # New: clean academic entities
├── action_log.py               # New: ActionLog system
├── speech_channel.py           # New: agent communication
└── turn_context.py             # New: perception generation
```
---
## Related Issues
- #154 - Grounded Multi-Agent Testbed (parent)
- #155 - Deterministic Text Descriptions (closed)
- #156 - Turn-based LLM Agent Orchestration (in progress)
- #153 - Separate render loop from game state loop (complete)
---
## References
- Crypt of Sokoban entity system: `src/scripts/cos_entities.py`
- Existing VLLM demos: `tests/vllm_demo/`
- FOV/Perspective system: Issue #154 comment (2025-12-01)