Memory Pool for Entities #117

Open
opened 2025-07-12 19:04:58 +00:00 by john · 2 comments
Owner

Implement memory pooling for entity allocation/deallocation.
Definition of Done:

  • Pre-allocated pool for common entity counts
  • Reduced allocation overhead
  • No memory fragmentation
  • Configurable pool sizes
Implement memory pooling for entity allocation/deallocation. Definition of Done: - Pre-allocated pool for common entity counts - Reduced allocation overhead - No memory fragmentation - Configurable pool sizes
Author
Owner

Benchmark Results (commit fcc0376)

Added tests/benchmarks/entity_scale_benchmark.py to establish baseline metrics.

Current Performance (2,000 entities on 100×100 grid)

Operation Time Rate
Entity Creation 26.4ms 75,738/sec
Direct Iteration 0.23ms 8.5M reads/sec
EntityCollection Iteration 13.6ms 147k reads/sec

Key Finding: EntityCollection Overhead

Direct C++ iteration (via list(grid.entities) then loop): 0.23ms
EntityCollection iteration (via for e in grid.entities): 13.6ms

The 60× slowdown comes from Python wrapper object creation on each access. Each call to __next__ in the iterator:

  1. Advances the C++ iterator
  2. Allocates a new PyUIEntityObject
  3. Copies the shared_ptr
  4. Returns to Python

Projected Gains with Memory Pool

Scenario Current With Pool Speedup
Creation (10k entities) ~130ms ~20ms
Iteration (direct) 2.3ms ~0.5ms

Mechanism: Pre-allocate entity memory in contiguous blocks. Reduces malloc overhead and improves CPU cache utilization.

Additional Finding

The EntityCollection overhead suggests a secondary optimization: batch access APIs that return data without per-entity Python object creation. For example:

  • grid.entity_positions() → numpy array of (x, y) pairs
  • grid.entities_in_rect(x, y, w, h) → filtered list

This would complement the memory pool by reducing Python/C++ boundary crossings.

## Benchmark Results (commit fcc0376) Added `tests/benchmarks/entity_scale_benchmark.py` to establish baseline metrics. ### Current Performance (2,000 entities on 100×100 grid) | Operation | Time | Rate | |-----------|------|------| | Entity Creation | 26.4ms | 75,738/sec | | Direct Iteration | 0.23ms | 8.5M reads/sec | | EntityCollection Iteration | 13.6ms | 147k reads/sec | ### Key Finding: EntityCollection Overhead **Direct C++ iteration** (via `list(grid.entities)` then loop): 0.23ms **EntityCollection iteration** (via `for e in grid.entities`): 13.6ms The **60× slowdown** comes from Python wrapper object creation on each access. Each call to `__next__` in the iterator: 1. Advances the C++ iterator 2. Allocates a new `PyUIEntityObject` 3. Copies the `shared_ptr` 4. Returns to Python ### Projected Gains with Memory Pool | Scenario | Current | With Pool | Speedup | |----------|---------|-----------|---------| | Creation (10k entities) | ~130ms | ~20ms | **6×** | | Iteration (direct) | 2.3ms | ~0.5ms | **5×** | **Mechanism**: Pre-allocate entity memory in contiguous blocks. Reduces malloc overhead and improves CPU cache utilization. ### Additional Finding The EntityCollection overhead suggests a secondary optimization: **batch access APIs** that return data without per-entity Python object creation. For example: - `grid.entity_positions()` → numpy array of (x, y) pairs - `grid.entities_in_rect(x, y, w, h)` → filtered list This would complement the memory pool by reducing Python/C++ boundary crossings.
Author
Owner

Current Performance Assessment

After implementing #159 (iterator fix) and #115 (SpatialHash), the benchmark shows:

Metric Current Performance
Entity Creation 80,000 entities/sec (25ms for 2000)
Iteration 9.6M reads/sec
Range Query 0.003ms (via SpatialHash)
N×N Visibility 1ms for 2000 entities

Analysis

The entity creation rate of 80k/sec is likely sufficient for roguelike games where:

  • Level generation creates hundreds, not thousands, of entities
  • Runtime spawning is infrequent (enemies spawn one at a time)
  • The 25ms for 2000 entities is a one-time level load cost

Bottleneck Investigation

The entity creation overhead comes from:

  1. std::make_shared<UIEntity>() - heap allocation
  2. Python wrapper allocation via tp_alloc
  3. PythonObjectCache registration
  4. UISprite construction

A C++ memory pool would help with #1 but not #2-4 which dominate.

Recommendation

Given the significant gains from #115 (104× speedup on N×N) and the acceptable creation rate, I recommend:

  1. Deferring full pool implementation unless specific bottleneck is identified
  2. Considering bulk API (grid.create_entities(positions)) for level generation if needed
  3. Re-evaluate after real-world profiling in actual game scenarios
## Current Performance Assessment After implementing #159 (iterator fix) and #115 (SpatialHash), the benchmark shows: | Metric | Current Performance | |--------|---------------------| | Entity Creation | 80,000 entities/sec (25ms for 2000) | | Iteration | 9.6M reads/sec | | Range Query | 0.003ms (via SpatialHash) | | N×N Visibility | 1ms for 2000 entities | ### Analysis The entity creation rate of 80k/sec is likely sufficient for roguelike games where: - Level generation creates hundreds, not thousands, of entities - Runtime spawning is infrequent (enemies spawn one at a time) - The 25ms for 2000 entities is a one-time level load cost ### Bottleneck Investigation The entity creation overhead comes from: 1. `std::make_shared<UIEntity>()` - heap allocation 2. Python wrapper allocation via `tp_alloc` 3. PythonObjectCache registration 4. UISprite construction A C++ memory pool would help with #1 but not #2-4 which dominate. ### Recommendation Given the significant gains from #115 (104× speedup on N×N) and the acceptable creation rate, I recommend: 1. **Deferring full pool implementation** unless specific bottleneck is identified 2. **Considering bulk API** (`grid.create_entities(positions)`) for level generation if needed 3. **Re-evaluate after real-world profiling** in actual game scenarios
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
john/McRogueFace#117
No description provided.