Update "Performance-and-Profiling"

parent 070d5bb813 · commit 9c742d4f72 · 1 changed file with 225 additions and 226 deletions
Performance monitoring and optimization infrastructure for McRogueFace. Press F3 to toggle the profiler overlay.

- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../issues/115) - SpatialHash Implementation (Open - Tier 1)
- [#113](../issues/113) - Batch Operations for Grid (Open - Tier 1)
- [#117](../issues/117) - Memory Pool for Entities (Open - Tier 1)

**Key Files:**

- `src/Profiler.h` - ScopedTimer RAII helper
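The C++ `ScopedTimer` measures a block's lifetime via RAII. For Python-side scripts, a rough analogue can be sketched as a context manager; this is an illustrative helper, not part of the mcrfpy API:

```python
import time
from contextlib import contextmanager

@contextmanager
def scoped_timer(label, sink=print):
    # Report the block's wall-clock time on exit, RAII-style
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        sink(f"{label}: {elapsed_ms:.2f}ms")
```

Usage: `with scoped_timer("spawn wave"): ...` reports the elapsed time even if the block raises.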

## Benchmark API

The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.

### Basic Usage
```python
import mcrfpy

# Start capturing per-frame timing data
mcrfpy.start_benchmark()

# ... run the scene to measure ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
# e.g., "benchmark_12345_20250528_143022.json"
```
### Adding Log Messages

```python
mcrfpy.start_benchmark()

# Attach a message to the current frame's data
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```

Log messages appear in the `logs` array of each frame in the output JSON.
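Because the logs travel with each frame's timing data, messages can be lined up with frame numbers after capture. A minimal sketch, assuming the per-frame records live under a top-level `frames` array (key name assumed; field names per the Output Format section):

```python
import json

def frames_with_logs(path):
    # Pair each frame number with any log messages it carried
    with open(path) as f:
        data = json.load(f)
    return [(fr["frame_number"], fr["logs"])
            for fr in data["frames"] if fr.get("logs")]
```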
### Headless Mode Note

In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:

```python
import time

start = time.perf_counter()
# ... operation to measure ...
elapsed = time.perf_counter() - start
print(f"Operation took {elapsed*1000:.2f}ms")
```

The benchmark API works best with the normal game loop (non-headless mode).
### Output Format

The JSON file contains per-frame data:

```json
{
  "frames": [
    {
      ...
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    ...
  }
}
```
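The raw frames also support custom statistics beyond the built-in summary, for example a 95th-percentile frame time. A sketch, again assuming a top-level `frames` array holds the per-frame records:

```python
import json
import math

def p95_frame_time(path):
    # 95th-percentile frame time, nearest-rank method on sorted times
    with open(path) as f:
        times = sorted(fr["frame_time_ms"] for fr in json.load(f)["frames"])
    return times[math.ceil(0.95 * len(times)) - 1]
```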
### Processing Results

Since Python processes results *after* capture, timing overhead doesn't affect measurements:

```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    slow_frames = [fr for fr in data["frames"] if fr["frame_time_ms"] > 16.67]

    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)
  - Draw calls

**Implementation:** `src/ProfilerOverlay.cpp`
### Implemented Optimizations

**Chunk-based Rendering** ([#123](../issues/123)):
- Large grids divided into chunks (~256 cells each)
- Only visible chunks processed
- 1000x1000+ grids render efficiently
### Current Bottlenecks

**Entity Spatial Queries** - O(n) iteration:
- Finding entities at a position requires checking all entities; becomes noticeable at 500+ entities
- Use `grid.entities_in_radius()` for proximity queries
- SpatialHash planned in [#115](../issues/115)

**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations; each call crosses the Python/C++ boundary
- Use `layer.fill()` for uniform values
- Batch operations planned in [#113](../issues/113)

**Entity Allocation** - Memory fragmentation:
- Frequent spawn/destroy cycles fragment memory
- Memory pool planned in [#117](../issues/117)

---
3. **Analyze**: Process JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run benchmark, compare results
6. **Iterate**: Repeat until acceptable performance

See [[Performance-Optimization-Workflow]] for the full methodology.
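The Verify step's before/after comparison can be scripted against two benchmark files. A minimal sketch; the helper name is hypothetical, and the `summary`/`avg_frame_time_ms` fields follow the Output Format and Processing Results sections:

```python
import json

def compare_benchmarks(before_path, after_path):
    # Report the change in average frame time between two runs
    def avg(path):
        with open(path) as f:
            return json.load(f)["summary"]["avg_frame_time_ms"]
    before, after = avg(before_path), avg(after_path)
    return {"before_ms": before, "after_ms": after, "speedup": before / after}
```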
### Performance Targets
## Related Systems

- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Performance-Optimization-Workflow]] - Full optimization methodology
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation

---

*Last updated: 2025-11-29*