Grounded Multi-Agent Testbed: LLM Agents in Discrete Simulated Environments #154

Open
opened 2025-12-01 15:59:23 +00:00 by john · 2 comments

Grounded Multi-Agent Testbed

Umbrella issue for research infrastructure enabling LLM agents to operate in McRogueFace environments.

Overview

This project implements a testbed for studying grounded language understanding in AI systems. Based on cognitive neuroscience research suggesting that language understanding requires integration of perceptual simulation, world knowledge, and situation modeling (Casto et al., 2025), we use McRogueFace's discrete roguelike environment to study how language agents learn affordances, develop theories of mind, and generalize across novel situations.

Three-Level Architecture

Level 1: Environment Comprehension (Base Physics)

  • Grid movement, walls, walkability
  • Field of view computation
  • Pathfinding (A*, Dijkstra)
  • Line of sight for interactions

Engine support: Mostly complete via mcrfpy.libtcod
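
As a standalone illustration of the Level 1 pathfinding primitive (not the `mcrfpy.libtcod` implementation), a minimal A* over a walkability grid with 4-way movement and a Manhattan heuristic looks like this:

```python
import heapq
import itertools

def astar(walkable, start, goal):
    """A* over a 2D grid. `walkable` is a list of rows (True = walkable);
    `start`/`goal` are (x, y) cells. Returns the path as a list of cells
    including both endpoints, or None if the goal is unreachable."""
    h, w = len(walkable), len(walkable[0])
    def heur(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()  # tiebreaker so the heap never compares cells
    open_heap = [(heur(start), next(tie), start)]
    came = {start: None}
    g = {start: 0}
    while open_heap:
        _, _, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < w and 0 <= ny < h and walkable[ny][nx]:
                ng = g[cur] + 1
                if ng < g.get(nxt, float("inf")):
                    g[nxt] = ng
                    came[nxt] = cur
                    heapq.heappush(open_heap, (ng + heur(nxt), next(tie), nxt))
    return None
```

The same frontier loop with a zero heuristic degrades to Dijkstra, which is why engines typically expose both behind one interface.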

Level 2: Game Behavior Comprehension (Middle Level)

  • Object affordances: pushable, openable, lockable
  • NPC behaviors: patrol routes, flee thresholds, scripted responses
  • Cause-effect chains: button→door, key→lock
  • Environmental rules: shop hours, trap triggers

Engine support: Entity event system exists (bump, ev_enter, ev_exit), needs formalization
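
One way the formalization could look, sketched here with illustrative classes rather than the engine's actual types: a button→door cause-effect chain expressed through the existing `ev_enter`/`ev_exit` hook pattern.

```python
# Hypothetical sketch of a Level 2 cause-effect chain using entity
# event hooks in the spirit of bump/ev_enter/ev_exit. Class names are
# illustrative, not part of the mcrfpy API.
class Door:
    def __init__(self):
        self.open = False

class Button:
    """A pressure plate wired to a door."""
    def __init__(self, door):
        self.door = door

    def ev_enter(self, entity):
        # An entity stepping onto the button opens the linked door.
        self.door.open = True

    def ev_exit(self, entity):
        # Stepping off closes it again (pressure-plate semantics).
        self.door.open = False
```

Formalizing chains this way lets an agent observe a clean cause-effect pair (enter button, door opens) rather than reverse-engineering scattered game logic.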

Level 3: Equal Agent Comprehension (High Level)

  • LLM-driven agents with natural language capability
  • Per-agent perspective rendering for VLM input
  • Memory/observation history per agent
  • Speech affordances: "announce" (room-wide), "speak" (proximity-based)
  • Emergent behaviors: barter, negotiation, team formation

Engine support: Requires new infrastructure
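
The two speech affordances differ only in their delivery predicate, which a sketch makes concrete (function and parameter names here are illustrative, not an existing engine API):

```python
# Hypothetical sketch of the "announce" vs "speak" delivery rules.
def hearers(mode, speaker_room, speaker_pos, listeners, radius=3):
    """Return the names of listeners who perceive an utterance.

    'announce' reaches everyone in the speaker's room;
    'speak' reaches only listeners within `radius` cells
    (Chebyshev distance) in the same room.
    """
    heard = []
    for name, (room, pos) in listeners.items():
        if room != speaker_room:
            continue  # speech never crosses room boundaries here
        if mode == "announce":
            heard.append(name)
        elif mode == "speak":
            dist = max(abs(pos[0] - speaker_pos[0]),
                       abs(pos[1] - speaker_pos[1]))
            if dist <= radius:
                heard.append(name)
    return heard
```

Keeping the predicate explicit matters for the research questions: whether agents discover that "speak" is private-ish while "announce" is not is itself an affordance-learning problem.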

Two Operating Modes

Headless Simulation Mode

  • No framerate requirements
  • Sequential agent turns with full LLM API calls
  • All randomized NPC behaviors resolved and stored as concrete values
  • Entities teleport (no animation)
  • Output: Simulation log with all decisions and world states

Animated Demo Mode

  • Replay stored simulation outputs
  • Full animation of movement paths
  • Step through simulation as animations complete
  • Deterministic playback (same random seeds)
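
The contract between the two modes can be sketched as record-then-replay: headless runs resolve every random decision to a concrete logged value, and demo runs consume the log without ever touching an RNG (names here are illustrative):

```python
import json
import random

def run_headless(seed, turns):
    """Headless mode sketch: roll every randomized NPC decision once,
    store it as a concrete value, and return the simulation log."""
    rng = random.Random(seed)  # seeded, so reruns are reproducible
    log = []
    for t in range(turns):
        move = rng.choice(["N", "S", "E", "W"])  # a randomized NPC behavior
        log.append({"turn": t, "npc_move": move})
    return json.dumps(log)

def run_demo(stored_log):
    """Demo mode sketch: replay stored decisions; no RNG involved,
    so playback is deterministic by construction."""
    return [rec["npc_move"] for rec in json.loads(stored_log)]
```

Because the demo never re-rolls, animation timing can stretch or skip freely without desynchronizing from the recorded world state.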

Research Questions (from proposal)

  1. Does grounding improve compositional generalization?
  2. Does physics randomization induce investigative behavior?
  3. What transfers between cognitive domains?
  4. Can agents develop accurate theories of mind for scripted NPCs?
  5. What emergent social behaviors arise from multi-agent interaction?
  6. How do agents resolve conflicts between visual, textual, and queried information?

Development Phases

Phase 1: Two Agents in a Room

  • 10x10 room with locked door
  • Two LLM agents, one button, one door
  • Agent A can reach button; Agent B's goal blocked by door
  • Success: Agents solve coordination through emergent communication

Phase 2: Learning Middle-Level Behaviors

  • Add scripted NPCs (cat, guard, shop door)
  • Agents must predict and exploit NPC behavior patterns

Phase 3: Affordance Learning and Puzzles

  • ConceptNet-based object generation with varied properties
  • Physics randomization (same puzzle, different rules)
  • Compositional generalization testing

Phase 4: Economic Reasoning

  • Shops, quest givers, inter-agent trade
  • Asymmetric information scenarios

Phase 5: Town and Dungeon Integration

  • Full scenario combining all elements
  • Emergent dynamics: reputation, party formation, betrayal

Blocking Issues

Engine infrastructure required:

  • #153 - Separate render loop from game state loop (critical for two-mode system)
  • #16 - Entity knowledge contents / per-entity perspective
  • #113 - Batch Operations for Grid (efficient state manipulation)
  • #114 - CellView API (convenient grid interaction)

Project-specific issues:

  • #155 - Deterministic Text Descriptions From Room Graph
  • #156 - Turn-based LLM Agent Orchestration
  • #55 - McRogueFace as Agent Simulation Environment (predecessor, will be closed early)
  • #67 - Grid Stitching (future: infinite world scenarios)

References

  • Casto, C., Ivanova, A., Fedorenko, E., & Kanwisher, N. (2025). What does it mean to understand language? arXiv:2511.19757
  • Park, J. S., et al. (2023). Generative agents: Interactive simulacra of human behavior. UIST 2023
  • Leike, J., et al. (2017). AI safety gridworlds. arXiv:1711.09883
  • Küttler, H., et al. (2020). The NetHack Learning Environment. NeurIPS 2020
Comment by john (Author, Owner):

Progress Update: FOV/Perspective System Complete

Commits c5b4200 and a529e5e implement the per-agent perspective rendering infrastructure:

New API

FOV Configuration:

  • mcrfpy.FOV enum (BASIC, DIAMOND, SHADOW, PERMISSIVE_0-8, RESTRICTIVE, SYMMETRIC_SHADOWCAST)
  • mcrfpy.default_fov module property
  • grid.fov and grid.fov_radius properties

ColorLayer Perspective Methods:

  • fill_rect(x, y, w, h, color) - Fill rectangular region
  • draw_fov(source, radius, fov, visible, discovered, unknown) - One-time FOV visualization
  • apply_perspective(entity, visible, discovered, unknown) - Bind layer to entity
  • update_perspective() - Refresh from bound entity's gridstate
  • clear_perspective() - Remove binding

Entity Methods:

  • entity.update_visibility() - Updates gridstate AND all bound ColorLayers
  • entity.visible_entities(fov=None, radius=None) - Get list of visible entities

Demo

tests/demo/perspective_patrol_demo.py - Interactive fog of war demonstration

Relevance to Phase 1

This provides the foundation for "per-agent perspective rendering for VLM input" mentioned in Level 3. Each agent can have its own ColorLayer showing what it can see, and visible_entities() enables AI decision-making based on line-of-sight.
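
As a standalone illustration of what `visible_entities()` provides (this is not the mcrfpy implementation), the filter amounts to a radius check plus a line-of-sight test against blocking cells:

```python
# Hypothetical sketch: radius + Bresenham line-of-sight filtering.
def line(a, b):
    """Bresenham line from cell a to cell b, inclusive of endpoints."""
    (x0, y0), (x1, y1) = a, b
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx, sy = (1 if x0 < x1 else -1), (1 if y0 < y1 else -1)
    err, pts = dx + dy, []
    while True:
        pts.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return pts
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def visible_entities(viewer, entities, walls, radius):
    """Entities within `radius` (Chebyshev) of `viewer` whose sight
    line is not interrupted by a wall cell."""
    out = []
    for name, pos in entities.items():
        dist = max(abs(pos[0] - viewer[0]), abs(pos[1] - viewer[1]))
        blocked = any(c in walls for c in line(viewer, pos)[1:-1])
        if dist <= radius and not blocked:
            out.append(name)
    return out
```

Feeding only this filtered list (plus the perspective-rendered ColorLayer) to an agent is what keeps its decisions grounded in what it could actually see.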

Comment by john (Author, Owner):

Architecture Plan Documented

Created wiki page: LLM Agent Testbed Architecture (https://gamedev.ffwf.net/gitea/john/McRogueFace/wiki/LLM-Agent-Testbed-Architecture)

Key Architectural Decisions

  1. Three-Layer Architecture

    • Physics Layer: Entity interactions via bump(), ev_enter(), ev_exit() (adapted from CoS)
    • Epistemology Layer: ActionLog + SpeechChannel + TurnContext - bridges physics to perception
    • Client Layer: Human UI, Agent API, or Replay tool - all consume the same event stream
  2. ActionLog as Central Event Bus

    • Every game action returns an ActionRecord with rich metadata
    • FOV-filtered retrieval: agents only perceive what they could see/hear
    • Speech uses same system: full text if speaker in FOV, "indistinct speech to the east" if not
  3. Clean Academic Entity Set

    • KeyEntity - takeable item for unlock puzzles
    • LockedDoorEntity - conditional passage
    • GuardEntity - patrol behavior with FOV-based detection
    • CombatantEntity - basic melee combat
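
The ActionLog decision above can be sketched as follows; record fields and class names are illustrative, not the engine's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ActionRecord:
    """One game action with metadata for FOV-filtered retrieval."""
    actor: str
    verb: str
    pos: tuple          # where the action happened
    text: str = ""      # speech payload, if any

@dataclass
class ActionLog:
    records: list = field(default_factory=list)

    def append(self, rec):
        self.records.append(rec)

    def perceive(self, viewer_pos, fov_cells):
        """FOV-filtered retrieval: a full record if the actor's cell is
        visible, a degraded line for out-of-FOV speech, nothing for
        out-of-FOV non-speech actions."""
        out = []
        for r in self.records:
            if r.pos in fov_cells:
                out.append(f"{r.actor} {r.verb}: {r.text}".strip())
            elif r.verb == "say":
                side = "east" if r.pos[0] > viewer_pos[0] else "west"
                out.append(f"indistinct speech to the {side}")
        return out
```

The same log serves both clients: rendered verbatim it is the human combat log, filtered through `perceive()` it is an agent's turn context.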

Learning Scenarios

| Scenario           | Learning Objective                          |
|--------------------|---------------------------------------------|
| Key Hunt           | Cause-effect reasoning, sequential planning |
| Guard Patrol       | Behavior observation, prediction, timing    |
| Cooperative Unlock | Communication, coordination, theory of mind |
| Combat Decision    | Risk assessment, alternative planning       |

Development Phases

  1. ActionLog Foundation - Instrument CoS without breaking it
  2. TurnContext & Perception - Generate per-entity perception
  3. Clean Entity Set - Implement academic-friendly entities
  4. Agent Integration - Connect LLM to game world
  5. Multi-Agent & Speech - Enable agent communication
  6. Evaluation & Analysis - Paper-ready results

Relationship to Existing Work

  • Builds on #155 (WorldGraph) and #156 (TurnOrchestrator)
  • Keeps CoS interaction patterns, replaces game-specific objects
  • ActionLog serves both human gameplay (combat log) and agent context
