LLM Agent Testbed Architecture
John McCardle edited this page 2026-02-07 23:49:24 +00:00


Status: Design Complete, Implementation Pending
Parent Issue: #154 - Grounded Multi-Agent Testbed
Last Updated: 2026-02-07

Overview

This document describes the architecture for running LLM agents in McRogueFace environments. The system serves dual purposes:

  1. Human-playable game with traditional roguelike mechanics
  2. Agent testbed for studying grounded language understanding

The key insight is that both modes share the same physics layer, but differ in their perception layer. An ActionLog bridges this gap, providing structured event data that humans can view as a combat log and agents receive as text context.

Headless Mode Available: McRogueFace now supports true headless execution (--headless --exec) with deterministic frame stepping via mcrfpy.step() (#157 completed). This enables agent testing without an X11 display server or GPU, making it suitable for CI pipelines, remote servers, and automated evaluation harnesses.


Architecture Layers

┌─────────────────────────────────────────────────────────────┐
│                    PHYSICS LAYER                             │
│         Entity interactions: bump, ev_enter, ev_exit         │
│         Deterministic, turn-based, grid-based                │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  EPISTEMOLOGY LAYER                          │
│         ActionLog + SpeechChannel + TurnContext              │
│         "What happened and who perceived it"                 │
└─────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│   HUMAN CLIENT    │ │   AGENT CLIENT    │ │  REPLAY CLIENT    │
│ • Renders grid    │ │ • Screenshot +    │ │ • Step through    │
│ • Keyboard input  │ │   TurnContext     │ │   ActionLog       │
│ • Optional log UI │ │ • LLM query/parse │ │ • Animate actions │
└───────────────────┘ └───────────────────┘ └───────────────────┘

Physics Layer: Entity Interaction Model

Adapted from Crypt of Sokoban's proven interaction system.

Core Events

Event                  Trigger                                   Purpose
bump(other, dx, dy)    Entity attempts to enter occupied tile    Collision resolution, combat, interaction
ev_enter(other)        Entity successfully enters tile           Triggers (pressure plates, pickups)
ev_exit(other)         Entity leaves tile                        State reversal (plate release)

Resolution Order

The draw_order property determines which entity handles a bump first when multiple entities occupy a tile:

draw_order   Entity Type      Rationale
10           Player/Agent     Highest priority - always respond to bumps
7            Enemies          Combat entities
5            Items            Can be picked up
2            Doors            Conditional passage
1            Floor triggers   Lowest - checked last, allows overlap

Entity Base Class

from typing import Optional

class GameEntity:
    """Base class for all interactive entities."""
    
    draw_order: int = 5
    description: str = "an entity"
    detailed_description: str = "You see an entity."
    
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Called when another entity tries to move into this tile."""
        raise NotImplementedError
    
    def ev_enter(self, other) -> Optional[ActionRecord]:
        """Called when another entity enters this tile."""
        pass
    
    def ev_exit(self, other) -> Optional[ActionRecord]:
        """Called when another entity leaves this tile."""
        pass
    
    def act(self) -> Optional[ActionRecord]:
        """Called on this entity's turn (for NPCs)."""
        pass

Epistemology Layer: ActionLog

The central event bus that records all game actions for perception filtering.

ActionRecord

from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

@dataclass
class ActionRecord:
    """A single recorded action in the game world."""
    
    turn: int                    # Game turn number
    actor_id: str                # Unique entity identifier
    actor_description: str       # "the knight", "a guard"
    action_type: str             # "move", "take", "unlock", "attack", "speak"
    
    # Action details
    args: Dict[str, Any]         # {"direction": "north", "item": "brass_key"}
    target_id: Optional[str]     # Entity acted upon
    target_description: Optional[str]
    
    # Outcome
    result: str                  # "success", "blocked", "hit", "miss"
    result_message: str          # "The knight unlocks the door with the brass key."
    
    # Spatial info (for perception filtering)
    position: Tuple[int, int]    # Where it happened
    sound_radius: int            # How far the sound travels (0 = silent)

ActionLog Class

class ActionLog:
    """Central record of all game actions."""
    
    def record(self, action: ActionRecord) -> None:
        """Record an action."""
        
    def get_visible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could SEE (in FOV when they happened)."""
        
    def get_audible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could HEAR.
        
        - In FOV: Full detail ("The guard says 'Halt!'")
        - Out of FOV, in range: Vague ("You hear a voice to the east")
        - Out of range: Nothing
        """
    
    def get_turn_summary(self, turn: int) -> List[ActionRecord]:
        """Get all actions from a specific turn (for replay)."""

SpeechChannel

Speech is a special action type with FOV-based reception:

class SpeechChannel:
    """Handles agent-to-agent communication."""
    
    def speak(self, speaker, message: str, volume: str = "normal") -> ActionRecord:
        """
        Broadcast speech from speaker.
        
        volume:
        - "whisper": Adjacent tiles only (radius 1)
        - "normal": FOV range (same as sight)
        - "shout": Entire room
        
        Returns ActionRecord with sound_radius set appropriately.
        """
    
    def get_heard_speech(self, listener, grid, since_turn: int) -> List[SpeechRecord]:
        """
        Get speech heard by listener.
        
        Returns:
        - Full text if speaker was in listener's FOV
        - "You hear indistinct speech to the {direction}" if out of FOV but in range
        """

TurnContext

What an entity perceives when its turn begins:

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TurnContext:
    """Complete perception state for an entity's turn."""
    
    # Identity
    actor_id: str
    actor_description: str
    position: Tuple[int, int]
    current_room: str
    
    # Visual perception (current FOV)
    visible_entities: List[EntitySnapshot]
    visible_terrain: List[TileSnapshot]
    
    # Auditory perception (since last turn)
    heard_speech: List[SpeechRecord]
    heard_sounds: List[str]  # "footsteps to the north", "a door creaking"
    
    # Observed actions (in FOV since last turn)
    observed_actions: List[ActionRecord]
    
    # Available actions
    available_actions: List[str]  # ["GO NORTH", "TAKE brass_key", "SPEAK '...'"]
    
    # Inventory
    inventory: List[str]
    
    def to_prose(self) -> str:
        """Generate natural language description for LLM context."""
        parts = []
        
        # Location
        parts.append(f"You are in {self.current_room}.")
        
        # What you see
        if self.visible_entities:
            parts.append(self._describe_visible())
        
        # What happened (observed actions)
        if self.observed_actions:
            parts.append("Since your last turn:")
            for action in self.observed_actions:
                parts.append(f"  - {action.result_message}")
        
        # What you heard
        if self.heard_speech:
            for speech in self.heard_speech:
                if speech.in_fov:
                    parts.append(f'{speech.speaker} says: "{speech.message}"')
                else:
                    parts.append(f"You hear someone speaking to the {speech.direction}.")
        
        # Available actions
        parts.append(f"Available actions: {', '.join(self.available_actions)}")
        
        return "\n".join(parts)

Game Entities

Academically presentable entity types for agent learning scenarios.

KeyEntity

class KeyEntity(GameEntity):
    """A key that can be picked up and used on matching doors."""
    
    draw_order = 5
    
    def __init__(self, x, y, key_id: str, display_name: str = "a brass key"):
        self.key_id = key_id
        self.description = display_name
        self.detailed_description = f"{display_name}. It might unlock something."
    
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        # Actor picks up the key
        if hasattr(actor, 'inventory'):
            if not test:
                actor.inventory.append(self.key_id)
                self.die()  # Remove from world
            return ActionRecord(
                action_type="take",
                result="success",
                result_message=f"{actor.description} picks up {self.description}.",
                sound_radius=2
            )
        return ActionRecord(
            action_type="take",
            result="failure",
            result_message=f"{self.description} lies on the ground.",
            sound_radius=0
        )

LockedDoorEntity

class LockedDoorEntity(GameEntity):
    """A door that requires a specific key to open."""
    
    draw_order = 2
    
    def __init__(self, x, y, key_id: str, destination_room: str):
        self.key_id = key_id
        self.destination_room = destination_room
        self.locked = True
        self.description = "a locked door"
        self.detailed_description = "A sturdy wooden door. It has a brass keyhole."
    
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        if self.locked:
            # Check if actor has the key
            if hasattr(actor, 'inventory') and self.key_id in actor.inventory:
                if not test:
                    self.unlock()
                return ActionRecord(
                    action_type="unlock",
                    result="success",
                    result_message=f"{actor.description} unlocks the door.",
                    sound_radius=5  # Loud click
                )
            else:
                return ActionRecord(
                    action_type="open",
                    result="blocked",
                    result_message="The door is locked.",
                    sound_radius=1
                )
        else:
            # Door is open, allow passage
            if not test:
                actor.do_move(actor.x + dx, actor.y + dy)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} passes through the doorway.",
                sound_radius=2
            )
    
    def unlock(self):
        self.locked = False
        self.sprite_number = OPEN_DOOR_SPRITE
        self.description = "an open doorway"

GuardEntity

class GuardEntity(GameEntity):
    """An NPC that patrols a fixed route and reacts to intruders."""
    
    draw_order = 7
    
    def __init__(self, x, y, patrol_route: List[Tuple[int, int]], 
                 behavior: str = "patrol"):
        self.patrol_route = patrol_route
        self.patrol_index = 0
        self.behavior = behavior  # "patrol", "stationary", "chase"
        self.alert = False
        self.target = None  # set when an intruder is spotted (see check_fov_for_intruders)
        self.description = "a guard"
        self.detailed_description = "An armored guard. They look vigilant."
        self.sight_range = 6
    
    def act(self) -> Optional[ActionRecord]:
        """Called on guard's turn."""
        
        if self.behavior == "patrol":
            return self._patrol_step()
        elif self.behavior == "chase" and self.target:
            return self._chase_step()
        
        return ActionRecord(
            action_type="wait",
            result="success",
            result_message=f"{self.description} stands watch.",
            sound_radius=0
        )
    
    def _patrol_step(self) -> ActionRecord:
        """Move to next point on patrol route."""
        next_pos = self.patrol_route[self.patrol_index]
        self.patrol_index = (self.patrol_index + 1) % len(self.patrol_route)
        
        # Move toward next patrol point
        dx = sign(next_pos[0] - self.x)
        dy = sign(next_pos[1] - self.y)
        
        if self.try_move(dx, dy):
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{self.description} continues their patrol.",
                sound_radius=3  # Footsteps
            )
        # Path blocked this turn; report it rather than returning None
        return ActionRecord(
            action_type="move",
            result="blocked",
            result_message=f"{self.description} hesitates, their path blocked.",
            sound_radius=0
        )
    
    def check_fov_for_intruders(self, entities, grid) -> Optional[ActionRecord]:
        """Check if any player/agent is visible."""
        for entity in entities:
            if isinstance(entity, PlayerEntity) and grid.is_in_fov(entity.x, entity.y):
                self.alert = True
                self.target = entity
                self.behavior = "chase"
                return ActionRecord(
                    action_type="speak",
                    result="success",
                    result_message=f'{self.description} shouts: "Halt! Intruder!"',
                    sound_radius=10  # Shout carries far
                )
        return None

CombatantEntity

class CombatantEntity(GameEntity):
    """Basic enemy that engages in melee combat."""
    
    draw_order = 7
    
    def __init__(self, x, y, hp: int = 3, damage: int = 1, 
                 description: str = "a hostile creature"):
        self.hp = hp
        self.max_hp = hp
        self.damage = damage
        self.description = description
    
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Handle being bumped (attacked)."""
        
        if self.hp <= 0:
            # Dead - allow walking over
            if not test:
                actor.do_move(self.x, self.y)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} steps over the fallen {self.description}.",
                sound_radius=1
            )
        
        # Combat!
        if hasattr(actor, 'damage'):
            damage_dealt = actor.damage
            remaining_hp = self.hp - damage_dealt  # projected, so test=True previews the outcome
            if not test:
                self.hp = remaining_hp
            
            if remaining_hp <= 0:
                return ActionRecord(
                    action_type="attack",
                    result="kill",
                    result_message=f"{actor.description} defeats {self.description}!",
                    sound_radius=5
                )
            else:
                return ActionRecord(
                    action_type="attack",
                    result="hit",
                    result_message=f"{actor.description} strikes {self.description}.",
                    sound_radius=4
                )
        
        return ActionRecord(
            action_type="bump",
            result="blocked",
            result_message=f"{self.description} blocks the path.",
            sound_radius=1
        )

Agent Learning Scenarios

Progressive scenarios for evaluating agent capabilities.

Scenario 1: Key Hunt

Setup:

  • Agent spawns in Room A
  • Key in Room B (visible from A through doorway)
  • Locked door to Goal Room C

Learning Objectives:

  • Object recognition ("I see a key")
  • Cause-effect reasoning ("key unlocks door")
  • Sequential planning (go to B → get key → return to door → unlock → enter C)

Success Metric: Agent reaches Goal Room C


Scenario 2: Guard Patrol

Setup:

  • Single guard on 4-point rectangular patrol
  • Agent must cross patrol path to reach goal
  • No combat option - must avoid detection

Learning Objectives:

  • Behavior observation ("guard moves clockwise")
  • Prediction ("guard will be at X in 2 turns")
  • Timing ("I should move now while guard faces away")

Success Metric: Agent reaches goal without triggering alert
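The prediction objective can be illustrated at the waypoint level. This sketch assumes the guard advances one waypoint per turn, a deliberate simplification: the real _patrol_step moves one tile per turn, so a full predictor would also need the tile-by-tile path between waypoints.

```python
from typing import List, Tuple

def predict_waypoint(route: List[Tuple[int, int]], index: int,
                     turns_ahead: int) -> Tuple[int, int]:
    """Waypoint the guard will target after turns_ahead advances."""
    return route[(index + turns_ahead) % len(route)]

route = [(1, 1), (5, 1), (5, 5), (1, 5)]  # 4-point rectangular patrol
print(predict_waypoint(route, 0, 2))  # → (5, 5)
```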


Scenario 3: Cooperative Unlock

Setup:

  • Agent A in Room 1 with key
  • Agent B in Room 2 near locked door
  • Door blocks Agent B's goal

Learning Objectives:

  • Situation communication ("I have the key")
  • Coordination ("Wait, I'm coming to unlock it")
  • Theory of mind ("Agent B needs the door opened")

Success Metric: Both agents reach their respective goals


Scenario 4: Combat Decision

Setup:

  • Weak enemy blocks direct path to goal
  • Alternative route available (longer, safe)
  • Agent has limited HP

Learning Objectives:

  • Risk assessment ("Can I win this fight?")
  • Resource management ("Is the shortcut worth the HP?")
  • Alternative planning ("I could go around instead")

Success Metric: Agent reaches goal (either path)
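The risk-assessment objective reduces to a back-of-envelope calculation under simplifying assumptions: deterministic melee, both sides trading one hit per turn, attacker striking first. This is purely illustrative; the actual combat rules live in CombatantEntity.bump().

```python
import math

def can_win(agent_hp: int, agent_dmg: int, enemy_hp: int, enemy_dmg: int) -> bool:
    """True if the agent kills the enemy before dying, striking first."""
    turns_to_kill = math.ceil(enemy_hp / agent_dmg)
    turns_to_die = math.ceil(agent_hp / enemy_dmg)
    return turns_to_kill <= turns_to_die  # attacker strikes first, so ties favor the agent

print(can_win(agent_hp=3, agent_dmg=1, enemy_hp=3, enemy_dmg=1))  # → True
```

An agent that can perform this kind of estimate from observed HP and damage values has grounds for choosing the shortcut over the safe route.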


Development Phases

Phase 1: ActionLog Foundation

Goal: Instrument entity interactions without breaking existing CoS gameplay.

Tasks:

  1. Create ActionRecord dataclass
  2. Create ActionLog class with basic recording
  3. Modify COSEntity.bump() to return ActionRecord
  4. Add ActionRecord returns to ev_enter(), ev_exit()
  5. Wire ActionLog into game loop

Validation: CoS plays normally; ActionLog captures all events


Phase 2: TurnContext & Perception

Goal: Generate per-entity perception from ActionLog.

Tasks:

  1. Create TurnContext dataclass
  2. Implement FOV-filtered action retrieval
  3. Implement to_prose() for LLM text generation
  4. Add SpeechChannel with FOV-based reception
  5. Create TurnOrchestrator that generates TurnContext per turn

Validation: Print TurnContext.to_prose() alongside gameplay; verify accuracy


Phase 3: Clean Entity Set

Goal: Implement academically presentable entities.

Tasks:

  1. KeyEntity - takeable item
  2. LockedDoorEntity - conditional passage
  3. GuardEntity - patrol behavior with FOV detection
  4. CombatantEntity - basic melee combat
  5. Level loader for scenario definitions

Validation: Human can play through all 4 scenarios


Phase 4: Agent Integration

Goal: Connect LLM agents to the game world.

Tasks:

  1. Adapt existing TurnOrchestrator from vllm_demo
  2. Connect TurnContext.to_prose() to LLM prompts
  3. Parse LLM responses to ActionRecord
  4. Execute actions through entity system
  5. Screenshot + text context packaging

Validation: Single agent completes Scenario 1 (Key Hunt)


Phase 5: Multi-Agent & Speech

Goal: Enable agent-to-agent communication.

Tasks:

  1. Multi-agent turn sequencing
  2. Speech action execution via SpeechChannel
  3. Speech reception in TurnContext
  4. Test Scenario 3 (Cooperative Unlock)

Validation: Two agents coordinate via speech to solve puzzle


Phase 6: Evaluation & Analysis

Goal: Systematic evaluation of agent capabilities.

Tasks:

  1. Simulation logging (full ActionLog export)
  2. Success/failure metrics per scenario
  3. Behavior analysis tools
  4. Comparison across LLM models

Deliverable: Paper-ready evaluation results


File Structure

tests/
├── vllm_demo/                    # Existing demo code
│   ├── action_parser.py          # LLM response parsing
│   ├── action_executor.py        # → Refactor to use ActionLog
│   ├── world_graph.py            # → Keep for room descriptions
│   ├── turn_orchestrator.py      # → Enhance with TurnContext
│   └── scenarios/                # New: scenario definitions
│       ├── key_hunt.py
│       ├── guard_patrol.py
│       ├── cooperative_unlock.py
│       └── combat_decision.py
│
src/scripts/
├── cos_entities.py               # → Add ActionRecord returns
├── game_entities.py              # New: clean academic entities
├── action_log.py                 # New: ActionLog system
├── speech_channel.py             # New: agent communication
└── turn_context.py               # New: perception generation

  • #153 - Separate render loop from game state loop (complete)
  • #154 - Grounded Multi-Agent Testbed (parent, open, 2 comments)
  • #155 - Deterministic Text Descriptions (closed)
  • #156 - Turn-based LLM Agent Orchestration (open, 4 comments, in progress)
  • #157 - Headless mode (complete - enables display-free agent testing)

References

  • Crypt of Sokoban entity system: src/scripts/cos_entities.py
  • Existing VLLM demos: tests/vllm_demo/
  • FOV/Perspective system: Issue #154 comment (2025-12-01)