Grounded Multi-Agent Testbed: LLM Agents in Discrete Simulated Environments #154

Open
opened 2025-12-01 15:59:23 +00:00 by john · 2 comments

Grounded Multi-Agent Testbed

Umbrella issue for research infrastructure enabling LLM agents to operate in McRogueFace environments.

Overview

This project implements a testbed for studying grounded language understanding in AI systems. Based on cognitive neuroscience research suggesting that language understanding requires integration of perceptual simulation, world knowledge, and situation modeling (Casto et al., 2025), we use McRogueFace's discrete roguelike environment to study how language agents learn affordances, develop theories of mind, and generalize across novel situations.

Three-Level Architecture

Level 1: Environment Comprehension (Base Physics)

  • Grid movement, walls, walkability
  • Field of view computation
  • Pathfinding (A*, Dijkstra)
  • Line of sight for interactions

Engine support: Mostly complete via mcrfpy.libtcod
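
As a standalone illustration of the Level 1 pathfinding primitive (not the `mcrfpy.libtcod` implementation), a minimal A* over a walkability grid with 4-way movement and a Manhattan heuristic looks like this:

```python
import heapq
import itertools

def astar(walkable, start, goal):
    """A* over a 2D grid. `walkable` is a list of rows (True = walkable);
    `start`/`goal` are (x, y) cells. Returns the path as a list of cells
    including both endpoints, or None if the goal is unreachable."""
    h, w = len(walkable), len(walkable[0])
    def heur(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()  # tiebreaker so the heap never compares cells
    open_heap = [(heur(start), next(tie), start)]
    came = {start: None}
    g = {start: 0}
    while open_heap:
        _, _, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < w and 0 <= ny < h and walkable[ny][nx]:
                ng = g[cur] + 1
                if ng < g.get(nxt, float("inf")):
                    g[nxt] = ng
                    came[nxt] = cur
                    heapq.heappush(open_heap, (ng + heur(nxt), next(tie), nxt))
    return None
```

The same frontier loop with a zero heuristic degrades to Dijkstra, which is why engines typically expose both behind one interface.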

Level 2: Game Behavior Comprehension (Middle Level)

  • Object affordances: pushable, openable, lockable
  • NPC behaviors: patrol routes, flee thresholds, scripted responses
  • Cause-effect chains: button→door, key→lock
  • Environmental rules: shop hours, trap triggers

Engine support: Entity event system exists (bump, ev_enter, ev_exit), needs formalization
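
One way the formalization could look, sketched here with illustrative classes rather than the engine's actual types: a button→door cause-effect chain expressed through the existing `ev_enter`/`ev_exit` hook pattern.

```python
# Hypothetical sketch of a Level 2 cause-effect chain using entity
# event hooks in the spirit of bump/ev_enter/ev_exit. Class names are
# illustrative, not part of the mcrfpy API.
class Door:
    def __init__(self):
        self.open = False

class Button:
    """A pressure plate wired to a door."""
    def __init__(self, door):
        self.door = door

    def ev_enter(self, entity):
        # An entity stepping onto the button opens the linked door.
        self.door.open = True

    def ev_exit(self, entity):
        # Stepping off closes it again (pressure-plate semantics).
        self.door.open = False
```

Formalizing chains this way lets an agent observe a clean cause-effect pair (enter button, door opens) rather than reverse-engineering scattered game logic.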

Level 3: Equal Agent Comprehension (High Level)

  • LLM-driven agents with natural language capability
  • Per-agent perspective rendering for VLM input
  • Memory/observation history per agent
  • Speech affordances: "announce" (room-wide), "speak" (proximity-based)
  • Emergent behaviors: barter, negotiation, team formation

Engine support: Requires new infrastructure
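
The two speech affordances differ only in their delivery predicate, which a sketch makes concrete (function and parameter names here are illustrative, not an existing engine API):

```python
# Hypothetical sketch of the "announce" vs "speak" delivery rules.
def hearers(mode, speaker_room, speaker_pos, listeners, radius=3):
    """Return the names of listeners who perceive an utterance.

    'announce' reaches everyone in the speaker's room;
    'speak' reaches only listeners within `radius` cells
    (Chebyshev distance) in the same room.
    """
    heard = []
    for name, (room, pos) in listeners.items():
        if room != speaker_room:
            continue  # speech never crosses room boundaries here
        if mode == "announce":
            heard.append(name)
        elif mode == "speak":
            dist = max(abs(pos[0] - speaker_pos[0]),
                       abs(pos[1] - speaker_pos[1]))
            if dist <= radius:
                heard.append(name)
    return heard
```

Keeping the predicate explicit matters for the research questions: whether agents discover that "speak" is private-ish while "announce" is not is itself an affordance-learning problem.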

Two Operating Modes

Headless Simulation Mode

  • No framerate requirements
  • Sequential agent turns with full LLM API calls
  • All randomized NPC behaviors resolved and stored as concrete values
  • Entities teleport (no animation)
  • Output: Simulation log with all decisions and world states

Animated Demo Mode

  • Replay stored simulation outputs
  • Full animation of movement paths
  • Step through simulation as animations complete
  • Deterministic playback (same random seeds)
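
The contract between the two modes can be sketched as record-then-replay: headless runs resolve every random decision to a concrete logged value, and demo runs consume the log without ever touching an RNG (names here are illustrative):

```python
import json
import random

def run_headless(seed, turns):
    """Headless mode sketch: roll every randomized NPC decision once,
    store it as a concrete value, and return the simulation log."""
    rng = random.Random(seed)  # seeded, so reruns are reproducible
    log = []
    for t in range(turns):
        move = rng.choice(["N", "S", "E", "W"])  # a randomized NPC behavior
        log.append({"turn": t, "npc_move": move})
    return json.dumps(log)

def run_demo(stored_log):
    """Demo mode sketch: replay stored decisions; no RNG involved,
    so playback is deterministic by construction."""
    return [rec["npc_move"] for rec in json.loads(stored_log)]
```

Because the demo never re-rolls, animation timing can stretch or skip freely without desynchronizing from the recorded world state.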

Research Questions (from proposal)

  1. Does grounding improve compositional generalization?
  2. Does physics randomization induce investigative behavior?
  3. What transfers between cognitive domains?
  4. Can agents develop accurate theories of mind for scripted NPCs?
  5. What emergent social behaviors arise from multi-agent interaction?
  6. How do agents resolve conflicts between visual, textual, and queried information?

Development Phases

Phase 1: Two Agents in a Room

  • 10x10 room with locked door
  • Two LLM agents, one button, one door
  • Agent A can reach button; Agent B's goal blocked by door
  • Success: Agents solve coordination through emergent communication

Phase 2: Learning Middle-Level Behaviors

  • Add scripted NPCs (cat, guard, shop door)
  • Agents must predict and exploit NPC behavior patterns

Phase 3: Affordance Learning and Puzzles

  • ConceptNet-based object generation with varied properties
  • Physics randomization (same puzzle, different rules)
  • Compositional generalization testing

Phase 4: Economic Reasoning

  • Shops, quest givers, inter-agent trade
  • Asymmetric information scenarios

Phase 5: Town and Dungeon Integration

  • Full scenario combining all elements
  • Emergent dynamics: reputation, party formation, betrayal

Blocking Issues

Engine infrastructure required:

  • #153 - Separate render loop from game state loop (critical for two-mode system)
  • #16 - Entity knowledge contents / per-entity perspective
  • #113 - Batch Operations for Grid (efficient state manipulation)
  • #114 - CellView API (convenient grid interaction)

Project-specific issues:

  • #155 - Deterministic Text Descriptions From Room Graph
  • #156 - Turn-based LLM Agent Orchestration
  • #55 - McRogueFace as Agent Simulation Environment (predecessor, will be closed early)
  • #67 - Grid Stitching (future: infinite world scenarios)

References

  • Casto, C., Ivanova, A., Fedorenko, E., & Kanwisher, N. (2025). What does it mean to understand language? arXiv:2511.19757
  • Park, J. S., et al. (2023). Generative agents: Interactive simulacra of human behavior. UIST 2023
  • Leike, J., et al. (2017). AI safety gridworlds. arXiv:1711.09883
  • Küttler, H., et al. (2020). The NetHack Learning Environment. NeurIPS 2020
Comment by john (Author, Owner):

Progress Update: FOV/Perspective System Complete

Commits c5b4200 and a529e5e implement the per-agent perspective rendering infrastructure:

New API

FOV Configuration:

  • mcrfpy.FOV enum (BASIC, DIAMOND, SHADOW, PERMISSIVE_0-8, RESTRICTIVE, SYMMETRIC_SHADOWCAST)
  • mcrfpy.default_fov module property
  • grid.fov and grid.fov_radius properties

ColorLayer Perspective Methods:

  • fill_rect(x, y, w, h, color) - Fill rectangular region
  • draw_fov(source, radius, fov, visible, discovered, unknown) - One-time FOV visualization
  • apply_perspective(entity, visible, discovered, unknown) - Bind layer to entity
  • update_perspective() - Refresh from bound entity's gridstate
  • clear_perspective() - Remove binding

Entity Methods:

  • entity.update_visibility() - Updates gridstate AND all bound ColorLayers
  • entity.visible_entities(fov=None, radius=None) - Get list of visible entities

Demo

tests/demo/perspective_patrol_demo.py - Interactive fog of war demonstration

Relevance to Phase 1

This provides the foundation for "per-agent perspective rendering for VLM input" mentioned in Level 3. Each agent can have its own ColorLayer showing what it can see, and visible_entities() enables AI decision-making based on line-of-sight.
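
As a standalone illustration of what `visible_entities()` provides (this is not the mcrfpy implementation), the filter amounts to a radius check plus a line-of-sight test against blocking cells:

```python
# Hypothetical sketch: radius + Bresenham line-of-sight filtering.
def line(a, b):
    """Bresenham line from cell a to cell b, inclusive of endpoints."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err, pts = dx + dy, []
    while True:
        pts.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return pts
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def visible_entities(viewer, entities, walls, radius):
    """Entities within `radius` (Chebyshev) of `viewer` whose sight
    line is not interrupted by a wall cell."""
    out = []
    for name, pos in entities.items():
        dist = max(abs(pos[0] - viewer[0]), abs(pos[1] - viewer[1]))
        blocked = any(c in walls for c in line(viewer, pos)[1:-1])
        if dist <= radius and not blocked:
            out.append(name)
    return out
```

Feeding only this filtered list (plus the perspective-rendered ColorLayer) to an agent is what keeps its decisions grounded in what it could actually see.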

Comment by john (Author, Owner):

Architecture Plan Documented

Created wiki page: LLM Agent Testbed Architecture (https://gamedev.ffwf.net/gitea/john/McRogueFace/wiki/LLM-Agent-Testbed-Architecture)

Key Architectural Decisions

  1. Three-Layer Architecture

    • Physics Layer: Entity interactions via bump(), ev_enter(), ev_exit() (adapted from CoS)
    • Epistemology Layer: ActionLog + SpeechChannel + TurnContext - bridges physics to perception
    • Client Layer: Human UI, Agent API, or Replay tool - all consume the same event stream
  2. ActionLog as Central Event Bus

    • Every game action returns an ActionRecord with rich metadata
    • FOV-filtered retrieval: agents only perceive what they could see/hear
    • Speech uses same system: full text if speaker in FOV, "indistinct speech to the east" if not
  3. Clean Academic Entity Set

    • KeyEntity - takeable item for unlock puzzles
    • LockedDoorEntity - conditional passage
    • GuardEntity - patrol behavior with FOV-based detection
    • CombatantEntity - basic melee combat
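
The ActionLog decision above can be sketched as follows; record fields and class names are illustrative, not the engine's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ActionRecord:
    """One game action with metadata for FOV-filtered retrieval."""
    actor: str
    verb: str
    pos: tuple          # where the action happened
    text: str = ""      # speech payload, if any

@dataclass
class ActionLog:
    records: list = field(default_factory=list)

    def append(self, rec):
        self.records.append(rec)

    def perceive(self, viewer_pos, fov_cells):
        """FOV-filtered retrieval: a full record if the actor's cell is
        visible, a degraded line for out-of-FOV speech, nothing for
        out-of-FOV non-speech actions."""
        out = []
        for r in self.records:
            if r.pos in fov_cells:
                out.append(f"{r.actor} {r.verb}: {r.text}".strip())
            elif r.verb == "say":
                side = "east" if r.pos[0] > viewer_pos[0] else "west"
                out.append(f"indistinct speech to the {side}")
        return out
```

The same log serves both clients: rendered verbatim it is the human combat log, filtered through `perceive()` it is an agent's turn context.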

Learning Scenarios

| Scenario           | Learning Objective                          |
|--------------------|---------------------------------------------|
| Key Hunt           | Cause-effect reasoning, sequential planning |
| Guard Patrol       | Behavior observation, prediction, timing    |
| Cooperative Unlock | Communication, coordination, theory of mind |
| Combat Decision    | Risk assessment, alternative planning       |

Development Phases

  1. ActionLog Foundation - Instrument CoS without breaking it
  2. TurnContext & Perception - Generate per-entity perception
  3. Clean Entity Set - Implement academic-friendly entities
  4. Agent Integration - Connect LLM to game world
  5. Multi-Agent & Speech - Enable agent communication
  6. Evaluation & Analysis - Paper-ready results

Relationship to Existing Work

  • Builds on #155 (WorldGraph) and #156 (TurnOrchestrator)
  • Keeps CoS interaction patterns, replaces game-specific objects
  • ActionLog serves both human gameplay (combat log) and agent context
