Table of Contents
- LLM Agent Testbed Architecture
- Overview
- Architecture Layers
- Physics Layer: Entity Interaction Model
- Epistemology Layer: ActionLog
- Game Entities
- Agent Learning Scenarios
- Scenario 1: Key Hunt
- Scenario 2: Guard Patrol
- Scenario 3: Cooperative Unlock
- Scenario 4: Combat Decision
- Development Phases
- Phase 1: ActionLog Foundation
- Phase 2: TurnContext & Perception
- Phase 3: Clean Entity Set
- Phase 4: Agent Integration
- Phase 5: Multi-Agent & Speech
- Phase 6: Evaluation & Analysis
- File Structure
- Related Issues
- References
LLM Agent Testbed Architecture
Status: Design Complete, Implementation Pending
Parent Issue: #154 - Grounded Multi-Agent Testbed
Last Updated: 2026-02-07
Overview
This document describes the architecture for running LLM agents in McRogueFace environments. The system serves dual purposes:
- Human-playable game with traditional roguelike mechanics
- Agent testbed for studying grounded language understanding
The key insight is that both modes share the same physics layer, but differ in their perception layer. An ActionLog bridges this gap, providing structured event data that humans can view as a combat log and agents receive as text context.
Headless Mode Available: McRogueFace now supports true headless execution (--headless --exec) with deterministic frame stepping via mcrfpy.step() (#157 completed). This enables agent testing without an X11 display server or GPU, making it suitable for CI pipelines, remote servers, and automated evaluation harnesses.
Architecture Layers
 ┌─────────────────────────────────────────────────────────────┐
 │                        PHYSICS LAYER                        │
 │        Entity interactions: bump, ev_enter, ev_exit         │
 │            Deterministic, turn-based, grid-based            │
 └─────────────────────────────────────────────────────────────┘
                                │
                                ▼
 ┌─────────────────────────────────────────────────────────────┐
 │                     EPISTEMOLOGY LAYER                      │
 │           ActionLog + SpeechChannel + TurnContext           │
 │            "What happened and who perceived it"             │
 └─────────────────────────────────────────────────────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ HUMAN CLIENT      │ │ AGENT CLIENT      │ │ REPLAY CLIENT     │
│ • Renders grid    │ │ • Screenshot +    │ │ • Step through    │
│ • Keyboard input  │ │   TurnContext     │ │   ActionLog       │
│ • Optional log UI │ │ • LLM query/parse │ │ • Animate actions │
└───────────────────┘ └───────────────────┘ └───────────────────┘
Physics Layer: Entity Interaction Model
Adapted from Crypt of Sokoban's proven interaction system.
Core Events
| Event | Trigger | Purpose |
|---|---|---|
| bump(other, dx, dy) | Entity attempts to enter occupied tile | Collision resolution, combat, interaction |
| ev_enter(other) | Entity successfully enters tile | Triggers (pressure plates, pickups) |
| ev_exit(other) | Entity leaves tile | State reversal (plate release) |
Resolution Order
The draw_order property determines which entity handles bump first when several entities occupy the same tile. Entities with higher values respond first:
| draw_order | Entity Type | Rationale |
|---|---|---|
| 10 | Player/Agent | Highest priority - always respond to bumps |
| 7 | Enemies | Combat entities |
| 5 | Items | Can be picked up |
| 2 | Doors | Conditional passage |
| 1 | Floor triggers | Lowest - checked last, allows overlap |
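A minimal sketch of how this ordering could be applied when dispatching a bump. It assumes the highest draw_order responds first, as the rationale column suggests. Occupant and first_responder are hypothetical names used only for illustration; they are not part of the existing code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occupant:
    """Hypothetical stand-in for an entity occupying a tile."""
    name: str
    draw_order: int


def first_responder(occupants: List[Occupant]) -> Optional[Occupant]:
    """Return the occupant that should handle a bump first.

    Assumes the highest draw_order wins; on ties the earliest
    entry in the list is kept.
    """
    if not occupants:
        return None
    return max(occupants, key=lambda e: e.draw_order)


tile = [Occupant("pressure plate", 1), Occupant("brass key", 5), Occupant("guard", 7)]
print(first_responder(tile).name)  # guard
```

With this rule a guard standing on a pressure plate handles the bump, and the plate is only consulted once nothing with a higher value shares the tile.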
Entity Base Class
from __future__ import annotations  # ActionRecord is defined later in this document
from typing import Optional
class GameEntity:
    """Base class for all interactive entities."""
    draw_order: int = 5
    description: str = "an entity"
    detailed_description: str = "You see an entity."
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Called when another entity tries to move into this tile.
        With test=True, subclasses report the outcome without changing game state."""
        raise NotImplementedError
    def ev_enter(self, other) -> Optional[ActionRecord]:
        """Called when another entity enters this tile."""
        pass
    def ev_exit(self, other) -> Optional[ActionRecord]:
        """Called when another entity leaves this tile."""
        pass
    def act(self) -> Optional[ActionRecord]:
        """Called on this entity's turn (for NPCs)."""
        pass
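The test flag on bump() lends itself to dry runs. The following is a minimal sketch of how a caller might use it to build the list of available movement actions without mutating the world. Outcome, StubActor, StubDoor, and available_moves are hypothetical stand-ins, and the y-grows-downward convention is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed convention: y grows downward, so NORTH is (0, -1).
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "NORTH": (0, -1), "SOUTH": (0, 1), "EAST": (1, 0), "WEST": (-1, 0),
}


@dataclass
class Outcome:
    """Hypothetical stand-in for ActionRecord (only the field used here)."""
    result: str


class StubActor:
    def __init__(self, x: int, y: int, inventory: List[str]):
        self.x, self.y, self.inventory = x, y, inventory


class StubDoor:
    """Hypothetical locked door following the bump(..., test=...) pattern."""

    def __init__(self, key_id: str):
        self.key_id = key_id
        self.locked = True

    def bump(self, actor, dx, dy, test=False) -> Outcome:
        if self.key_id in actor.inventory:
            if not test:  # only a real bump changes state
                self.locked = False
            return Outcome("success")
        return Outcome("blocked")


def available_moves(actor, occupied: Dict[Tuple[int, int], object]) -> List[str]:
    """List GO actions whose dry-run bump would not be blocked."""
    actions = []
    for name, (dx, dy) in DIRECTIONS.items():
        target = occupied.get((actor.x + dx, actor.y + dy))
        if target is None or target.bump(actor, dx, dy, test=True).result != "blocked":
            actions.append(f"GO {name}")
    return actions


door = StubDoor("brass_key")
world = {(1, 0): door}
print(available_moves(StubActor(1, 1, []), world))             # the door blocks NORTH
print(available_moves(StubActor(1, 1, ["brass_key"]), world))  # NORTH is offered
print(door.locked)                                             # True: nothing was mutated
```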
Epistemology Layer: ActionLog
The central event bus that records all game actions for perception filtering.
ActionRecord
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple
@dataclass
class ActionRecord:
    """A single recorded action in the game world."""
    turn: int                    # Game turn number
    actor_id: str                # Unique entity identifier
    actor_description: str       # "the knight", "a guard"
    action_type: str             # "move", "take", "unlock", "attack", "speak"
    # Action details
    args: Dict[str, Any]         # {"direction": "north", "item": "brass_key"}
    target_id: Optional[str]     # Entity acted upon
    target_description: Optional[str]
    # Outcome
    result: str                  # "success", "blocked", "hit", "miss"
    result_message: str          # "The knight unlocks the door with the brass key."
    # Spatial info (for perception filtering)
    position: Tuple[int, int]    # Where it happened
    sound_radius: int            # How far the sound travels (0 = silent)
ActionLog Class
class ActionLog:
    """Central record of all game actions."""
    def record(self, action: ActionRecord) -> None:
        """Record an action."""
    def get_visible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could SEE (in FOV when they happened)."""
    def get_audible_to(self, observer, grid, since_turn: int) -> List[ActionRecord]:
        """Get actions the observer could HEAR.
        - In FOV: Full detail ("The guard says 'Halt!'")
        - Out of FOV, in range: Vague ("You hear a voice to the east")
        - Out of range: Nothing
        """
    def get_turn_summary(self, turn: int) -> List[ActionRecord]:
        """Get all actions from a specific turn (for replay)."""
SpeechChannel
Speech is a special action type with FOV-based reception:
class SpeechChannel:
    """Handles agent-to-agent communication."""
    def speak(self, speaker, message: str, volume: str = "normal") -> ActionRecord:
        """
        Broadcast speech from speaker.
        volume:
        - "whisper": Adjacent tiles only (radius 1)
        - "normal": FOV range (same as sight)
        - "shout": Entire room
        Returns ActionRecord with sound_radius set appropriately.
        """
    def get_heard_speech(self, listener, grid, since_turn: int) -> List[SpeechRecord]:
        """
        Get speech heard by listener.
        Returns:
        - Full text if speaker was in listener's FOV
        - "You hear indistinct speech to the {direction}" if out of FOV but in range
        """
TurnContext
What an entity perceives when it's their turn:
@dataclass
class TurnContext:
    """Complete perception state for an entity's turn."""
    # Identity
    actor_id: str
    actor_description: str
    position: Tuple[int, int]
    current_room: str
    # Visual perception (current FOV)
    visible_entities: List[EntitySnapshot]
    visible_terrain: List[TileSnapshot]
    # Auditory perception (since last turn)
    heard_speech: List[SpeechRecord]
    heard_sounds: List[str]  # "footsteps to the north", "a door creaking"
    # Observed actions (in FOV since last turn)
    observed_actions: List[ActionRecord]
    # Available actions
    available_actions: List[str]  # ["GO NORTH", "TAKE brass_key", "SPEAK '...'"]
    # Inventory
    inventory: List[str]
    def to_prose(self) -> str:
        """Generate natural language description for LLM context."""
        parts = []
        # Location
        parts.append(f"You are in {self.current_room}.")
        # What you see
        if self.visible_entities:
            parts.append(self._describe_visible())
        # What happened (observed actions)
        if self.observed_actions:
            parts.append("Since your last turn:")
            for action in self.observed_actions:
                parts.append(f" - {action.result_message}")
        # What you heard
        if self.heard_speech:
            for speech in self.heard_speech:
                if speech.in_fov:
                    parts.append(f'{speech.speaker} says: "{speech.message}"')
                else:
                    parts.append(f"You hear someone speaking to the {speech.direction}.")
        # Available actions
        parts.append(f"Available actions: {', '.join(self.available_actions)}")
        return "\n".join(parts)
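The speech.direction value used in to_prose() has to be derived from the listener and speaker positions. Here is a minimal sketch of one way to do that, assuming screen-style grid coordinates where y grows downward (so a smaller y is north). compass_direction and the "nearby" fallback are hypothetical, not part of the existing code.

```python
import math
from typing import Tuple

COMPASS = ["east", "northeast", "north", "northwest",
           "west", "southwest", "south", "southeast"]


def compass_direction(listener: Tuple[int, int], speaker: Tuple[int, int]) -> str:
    """Return the 8-way compass direction from listener to speaker."""
    dx = speaker[0] - listener[0]
    dy = speaker[1] - listener[1]
    if dx == 0 and dy == 0:
        return "nearby"  # same tile: no meaningful direction
    # Flip dy so that 90 degrees means north under the y-down assumption.
    angle = math.degrees(math.atan2(-dy, dx)) % 360
    # Each compass sector spans 45 degrees, centred on its direction.
    return COMPASS[int((angle + 22.5) // 45) % 8]


print(f"You hear someone speaking to the {compass_direction((5, 5), (15, 6))}.")
```

Using angular sectors rather than the sign of each axis keeps a speaker ten tiles east and one tile south reported as "east" instead of "southeast".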
Game Entities
Academically-presentable entity types for agent learning scenarios.
KeyEntity
class KeyEntity(GameEntity):
    """A key that can be picked up and used on matching doors."""
    draw_order = 5
    def __init__(self, x, y, key_id: str, display_name: str = "a brass key"):
        self.key_id = key_id
        self.description = display_name
        self.detailed_description = f"{display_name}. It might unlock something."
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        # Actor picks up the key
        if hasattr(actor, 'inventory'):
            if not test:
                actor.inventory.append(self.key_id)
                self.die()  # Remove from world
            return ActionRecord(
                action_type="take",
                result="success",
                result_message=f"{actor.description} picks up {self.description}.",
                sound_radius=2
            )
        return ActionRecord(
            action_type="take",
            result="failure",
            result_message=f"{self.description} lies on the ground.",
            sound_radius=0
        )
LockedDoorEntity
class LockedDoorEntity(GameEntity):
    """A door that requires a specific key to open."""
    draw_order = 2
    def __init__(self, x, y, key_id: str, destination_room: str):
        self.key_id = key_id
        self.destination_room = destination_room
        self.locked = True
        self.description = "a locked door"
        self.detailed_description = "A sturdy wooden door. It has a brass keyhole."
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        if self.locked:
            # Check if actor has the key
            if hasattr(actor, 'inventory') and self.key_id in actor.inventory:
                if not test:
                    self.unlock()
                return ActionRecord(
                    action_type="unlock",
                    result="success",
                    result_message=f"{actor.description} unlocks the door.",
                    sound_radius=5  # Loud click
                )
            else:
                return ActionRecord(
                    action_type="open",
                    result="blocked",
                    result_message="The door is locked.",
                    sound_radius=1
                )
        else:
            # Door is open, allow passage
            if not test:
                actor.do_move(actor.x + dx, actor.y + dy)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} passes through the doorway.",
                sound_radius=2
            )
    def unlock(self):
        self.locked = False
        self.sprite_number = OPEN_DOOR_SPRITE
        self.description = "an open doorway"
GuardEntity
def sign(value: int) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)
class GuardEntity(GameEntity):
    """An NPC that patrols a fixed route and reacts to intruders."""
    draw_order = 7
    def __init__(self, x, y, patrol_route: List[Tuple[int, int]],
                 behavior: str = "patrol"):
        self.patrol_route = patrol_route
        self.patrol_index = 0
        self.behavior = behavior  # "patrol", "stationary", "chase"
        self.alert = False
        self.target = None  # Set once an intruder is spotted
        self.description = "a guard"
        self.detailed_description = "An armored guard. They look vigilant."
        self.sight_range = 6
    def act(self) -> Optional[ActionRecord]:
        """Called on guard's turn."""
        if self.behavior == "patrol":
            return self._patrol_step()
        elif self.behavior == "chase" and self.target:
            return self._chase_step()
        return ActionRecord(
            action_type="wait",
            result="success",
            result_message=f"{self.description} stands watch.",
            sound_radius=0
        )
    def _patrol_step(self) -> ActionRecord:
        """Move one step toward the current patrol waypoint."""
        target = self.patrol_route[self.patrol_index]
        if (self.x, self.y) == tuple(target):
            # Waypoint reached: advance to the next one, wrapping around
            self.patrol_index = (self.patrol_index + 1) % len(self.patrol_route)
            target = self.patrol_route[self.patrol_index]
        dx = sign(target[0] - self.x)
        dy = sign(target[1] - self.y)
        if self.try_move(dx, dy):
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{self.description} continues their patrol.",
                sound_radius=3  # Footsteps
            )
        # Path blocked: report it rather than implicitly returning None
        return ActionRecord(
            action_type="move",
            result="blocked",
            result_message=f"{self.description} pauses on their patrol.",
            sound_radius=0
        )
    def check_fov_for_intruders(self, entities, grid) -> Optional[ActionRecord]:
        """Check if any player/agent is visible."""
        for entity in entities:
            if isinstance(entity, PlayerEntity) and grid.is_in_fov(entity.x, entity.y):
                self.alert = True
                self.target = entity
                self.behavior = "chase"
                return ActionRecord(
                    action_type="speak",
                    result="success",
                    result_message=f'{self.description} shouts: "Halt! Intruder!"',
                    sound_radius=10  # Shout carries far
                )
        return None
CombatantEntity
class CombatantEntity(GameEntity):
    """Basic enemy that engages in melee combat."""
    draw_order = 7
    def __init__(self, x, y, hp: int = 3, damage: int = 1,
                 description: str = "a hostile creature"):
        self.hp = hp
        self.max_hp = hp
        self.damage = damage
        self.description = description
    def bump(self, actor, dx, dy, test=False) -> ActionRecord:
        """Handle being bumped (attacked)."""
        if self.hp <= 0:
            # Dead - allow walking over
            if not test:
                actor.do_move(self.x, self.y)
            return ActionRecord(
                action_type="move",
                result="success",
                result_message=f"{actor.description} steps over the fallen {self.description}.",
                sound_radius=1
            )
        # Combat!
        if hasattr(actor, 'damage'):
            damage_dealt = actor.damage
            # Compute the outcome first so a dry run (test=True) reports the same result
            remaining_hp = self.hp - damage_dealt
            if not test:
                self.hp = remaining_hp
            if remaining_hp <= 0:
                return ActionRecord(
                    action_type="attack",
                    result="kill",
                    result_message=f"{actor.description} defeats {self.description}!",
                    sound_radius=5
                )
            else:
                return ActionRecord(
                    action_type="attack",
                    result="hit",
                    result_message=f"{actor.description} strikes {self.description}.",
                    sound_radius=4
                )
        return ActionRecord(
            action_type="bump",
            result="blocked",
            result_message=f"{self.description} blocks the path.",
            sound_radius=1
        )
Agent Learning Scenarios
Progressive scenarios for evaluating agent capabilities.
Scenario 1: Key Hunt
Setup:
- Agent spawns in Room A
- Key in Room B (visible from A through doorway)
- Locked door to Goal Room C
Learning Objectives:
- Object recognition ("I see a key")
- Cause-effect reasoning ("key unlocks door")
- Sequential planning (go to B → get key → return to door → unlock → enter C)
Success Metric: Agent reaches Goal Room C
Scenario 2: Guard Patrol
Setup:
- Single guard on 4-point rectangular patrol
- Agent must cross patrol path to reach goal
- No combat option - must avoid detection
Learning Objectives:
- Behavior observation ("guard moves clockwise")
- Prediction ("guard will be at X in 2 turns")
- Timing ("I should move now while guard faces away")
Success Metric: Agent reaches goal without triggering alert
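The prediction objective can be made concrete with a small simulation. This is a sketch only: it assumes the guard moves one tile per turn toward its current waypoint (diagonal steps allowed), is never blocked, and switches to the next waypoint once it arrives. predict_guard_positions is a hypothetical helper and the route below is made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def sign(value: int) -> int:
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def predict_guard_positions(start: Point, route: List[Point],
                            next_index: int, turns: int) -> List[Point]:
    """Simulate an unobstructed patrol and return the position after each turn."""
    x, y = start
    index = next_index
    positions = []
    for _ in range(turns):
        tx, ty = route[index]
        if (x, y) == (tx, ty):
            # Waypoint reached: head for the next one, wrapping around.
            index = (index + 1) % len(route)
            tx, ty = route[index]
        x += sign(tx - x)
        y += sign(ty - y)
        positions.append((x, y))
    return positions


# Hypothetical 4-point rectangular route.
route = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(predict_guard_positions((0, 0), route, next_index=1, turns=4))
# [(1, 0), (2, 0), (2, 1), (2, 2)]
```

An agent that has inferred the route from observation could run the same projection to pick a turn on which the crossing tile is far from the guard.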
Scenario 3: Cooperative Unlock
Setup:
- Agent A in Room 1 with key
- Agent B in Room 2 near locked door
- Door blocks Agent B's goal
Learning Objectives:
- Situation communication ("I have the key")
- Coordination ("Wait, I'm coming to unlock it")
- Theory of mind ("Agent B needs the door opened")
Success Metric: Both agents reach their respective goals
Scenario 4: Combat Decision
Setup:
- Weak enemy blocks direct path to goal
- Alternative route available (longer, safe)
- Agent has limited HP
Learning Objectives:
- Risk assessment ("Can I win this fight?")
- Resource management ("Is the shortcut worth the HP?")
- Alternative planning ("I could go around instead")
Success Metric: Agent reaches goal (either path)
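The risk assessment reduces to simple arithmetic once a combat model is fixed. The sketch below assumes the agent strikes first each turn and the enemy retaliates once per turn while it is still alive. That retaliation rule is not specified anywhere in this design, and fight_cost, should_fight, and the reserve threshold are hypothetical.

```python
import math


def fight_cost(enemy_hp: int, agent_damage: int, enemy_damage: int) -> int:
    """HP the agent loses under the assumed strike-first, retaliate-while-alive model."""
    if agent_damage <= 0:
        raise ValueError("agent_damage must be positive")
    hits_needed = math.ceil(enemy_hp / agent_damage)
    # The enemy does not retaliate after the killing blow.
    return (hits_needed - 1) * enemy_damage


def should_fight(agent_hp: int, enemy_hp: int, agent_damage: int,
                 enemy_damage: int, reserve: int = 1) -> bool:
    """Fight only if at least `reserve` HP would remain afterwards."""
    return agent_hp - fight_cost(enemy_hp, agent_damage, enemy_damage) >= reserve


# Using the CombatantEntity defaults (hp=3, damage=1) and an agent dealing 1 damage:
print(fight_cost(enemy_hp=3, agent_damage=1, enemy_damage=1))                 # 2
print(should_fight(agent_hp=5, enemy_hp=3, agent_damage=1, enemy_damage=1))  # True
print(should_fight(agent_hp=2, enemy_hp=3, agent_damage=1, enemy_damage=1))  # False
```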
Development Phases
Phase 1: ActionLog Foundation
Goal: Instrument entity interactions without breaking existing Crypt of Sokoban (CoS) gameplay.
Tasks:
- Create ActionRecord dataclass
- Create ActionLog class with basic recording
- Modify COSEntity.bump() to return ActionRecord
- Add ActionRecord returns to ev_enter(), ev_exit()
- Wire ActionLog into game loop
Validation: CoS plays normally; ActionLog captures all events
Phase 2: TurnContext & Perception
Goal: Generate per-entity perception from ActionLog.
Tasks:
- Create TurnContext dataclass
- Implement FOV-filtered action retrieval
- Implement to_prose() for LLM text generation
- Add SpeechChannel with FOV-based reception
- Create TurnOrchestrator that generates TurnContext per turn
Validation: Print TurnContext.to_prose() alongside gameplay; verify accuracy
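As a starting point for the FOV-filtered retrieval task, here is a minimal in-memory sketch. It is not the planned ActionLog API. LoggedAction and SimpleActionLog are hypothetical, since_turn is treated as exclusive, hearing uses Chebyshev distance against the observer's current position, and visibility is captured when the action is recorded. All of these are assumptions. Unlike the get_audible_to described earlier, this version returns only actions that were heard but not seen.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class LoggedAction:
    """Trimmed, hypothetical stand-in for ActionRecord."""
    turn: int
    result_message: str
    position: Tuple[int, int]
    sound_radius: int = 0
    seen_by: Set[str] = field(default_factory=set)


class SimpleActionLog:
    """Hypothetical in-memory log used only for this sketch."""

    def __init__(self) -> None:
        self._records: List[LoggedAction] = []

    def record(self, action: LoggedAction, observers_in_fov: Set[str]) -> None:
        # FOV changes as entities move, so visibility is frozen at record time.
        action.seen_by = set(observers_in_fov)
        self._records.append(action)

    def get_visible_to(self, observer_id: str, since_turn: int) -> List[LoggedAction]:
        return [a for a in self._records
                if a.turn > since_turn and observer_id in a.seen_by]

    def get_audible_to(self, observer_id: str, observer_pos: Tuple[int, int],
                       since_turn: int) -> List[LoggedAction]:
        """Actions heard but not seen, within sound_radius of observer_pos."""
        heard = []
        for a in self._records:
            if a.turn <= since_turn or observer_id in a.seen_by:
                continue
            distance = max(abs(a.position[0] - observer_pos[0]),
                           abs(a.position[1] - observer_pos[1]))
            if a.sound_radius > 0 and distance <= a.sound_radius:
                heard.append(a)
        return heard


log = SimpleActionLog()
log.record(LoggedAction(1, "The knight unlocks the door.", (5, 5), sound_radius=5),
           observers_in_fov={"agent_a"})
print(len(log.get_visible_to("agent_a", since_turn=0)))            # 1: agent_a saw it
print(len(log.get_audible_to("agent_b", (8, 6), since_turn=0)))    # 1: within 5 tiles
print(len(log.get_audible_to("agent_b", (20, 20), since_turn=0)))  # 0: out of range
```

Freezing visibility at record time matches the "in FOV when they happened" wording: an observer who walks into view later should not retroactively see the action.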
Phase 3: Clean Entity Set
Goal: Implement academically-presentable entities.
Tasks:
- KeyEntity - takeable item
- LockedDoorEntity - conditional passage
- GuardEntity - patrol behavior with FOV detection
- CombatantEntity - basic melee combat
- Level loader for scenario definitions
Validation: Human can play through all 4 scenarios
Phase 4: Agent Integration
Goal: Connect LLM agents to the game world.
Tasks:
- Adapt existing TurnOrchestrator from vllm_demo
- Connect TurnContext.to_prose() to LLM prompts
- Parse LLM responses to ActionRecord
- Execute actions through entity system
- Screenshot + text context packaging
Validation: Single agent completes Scenario 1 (Key Hunt)
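The response-parsing task could be sketched as follows, using the action strings shown in TurnContext.available_actions ("GO NORTH", "TAKE brass_key", "SPEAK '...'"). This is an illustrative sketch and not the existing tests/vllm_demo/action_parser.py. parse_action, the regular expressions, and the "message" argument key are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# (action_type, argument name, pattern) for each supported command form.
ACTION_PATTERNS = [
    ("move", "direction", re.compile(r"^GO\s+(NORTH|SOUTH|EAST|WEST)$", re.IGNORECASE)),
    ("take", "item", re.compile(r"^TAKE\s+(\w+)$", re.IGNORECASE)),
    ("speak", "message", re.compile(r"^SPEAK\s+['\"](.+)['\"]$", re.IGNORECASE)),
]


def parse_action(response: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (action_type, args) for the first line that is a valid command."""
    for line in response.splitlines():
        line = line.strip()
        for action_type, arg_name, pattern in ACTION_PATTERNS:
            match = pattern.match(line)
            if match:
                value = match.group(1)
                if action_type == "move":
                    value = value.lower()  # normalise to "north", "south", ...
                return action_type, {arg_name: value}
    return None  # nothing parseable: the caller decides how to re-prompt


print(parse_action("The key is close, so I will grab it.\nTAKE brass_key"))
print(parse_action("SPEAK 'Wait, I'm coming to unlock it'"))
print(parse_action("dance wildly"))
```

Scanning line by line lets the model reason in free text before committing to a command, while returning None keeps malformed replies from silently becoming actions.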
Phase 5: Multi-Agent & Speech
Goal: Enable agent-to-agent communication.
Tasks:
- Multi-agent turn sequencing
- Speech action execution via SpeechChannel
- Speech reception in TurnContext
- Test Scenario 3 (Cooperative Unlock)
Validation: Two agents coordinate via speech to solve puzzle
Phase 6: Evaluation & Analysis
Goal: Systematic evaluation of agent capabilities.
Tasks:
- Simulation logging (full ActionLog export)
- Success/failure metrics per scenario
- Behavior analysis tools
- Comparison across LLM models
Deliverable: Paper-ready evaluation results
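For the per-scenario metrics, a minimal aggregation sketch follows. EpisodeResult and success_rates are hypothetical names, and the episode data is placeholder input for illustration only. None of it represents measured results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class EpisodeResult:
    """Hypothetical summary of one simulation run."""
    scenario: str
    model: str
    success: bool
    turns: int


def success_rates(results: Iterable[EpisodeResult]) -> Dict[Tuple[str, str], float]:
    """Fraction of successful episodes per (scenario, model) pair."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        key = (r.scenario, r.model)
        totals[key] += 1
        wins[key] += int(r.success)
    return {key: wins[key] / totals[key] for key in totals}


# Placeholder data for illustration only; not measured results.
episodes = [
    EpisodeResult("key_hunt", "model_a", True, 14),
    EpisodeResult("key_hunt", "model_a", False, 50),
    EpisodeResult("key_hunt", "model_b", True, 11),
]
for (scenario, model), rate in sorted(success_rates(episodes).items()):
    print(f"{scenario:12s} {model:8s} {rate:.0%}")
```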
File Structure
tests/
├── vllm_demo/ # Existing demo code
│ ├── action_parser.py # LLM response parsing
│ ├── action_executor.py # → Refactor to use ActionLog
│ ├── world_graph.py # → Keep for room descriptions
│ ├── turn_orchestrator.py # → Enhance with TurnContext
│ └── scenarios/ # New: scenario definitions
│ ├── key_hunt.py
│ ├── guard_patrol.py
│ ├── cooperative_unlock.py
│ └── combat_decision.py
│
src/scripts/
├── cos_entities.py # → Add ActionRecord returns
├── game_entities.py # New: clean academic entities
├── action_log.py # New: ActionLog system
├── speech_channel.py # New: agent communication
└── turn_context.py # New: perception generation
Related Issues
- #153 - Separate render loop from game state loop (complete)
- #154 - Grounded Multi-Agent Testbed (parent, open, 2 comments)
- #155 - Deterministic Text Descriptions (closed)
- #156 - Turn-based LLM Agent Orchestration (open, 4 comments, in progress)
- #157 - Headless mode (complete - enables display-free agent testing)
References
- Crypt of Sokoban entity system: src/scripts/cos_entities.py
- Existing VLLM demos: tests/vllm_demo/
- FOV/Perspective system: Issue #154 comment (2025-12-01)