Update "Performance-and-Profiling"

John McCardle 2026-02-07 22:33:59 +00:00
commit 9c742d4f72

# Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.

## Quick Reference

**Related Issues:**
- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../issues/115) - SpatialHash Implementation (Open)
- [#113](../issues/113) - Batch Operations for Grid (Open)

**Key Files:**
- `src/Profiler.h` - ScopedTimer RAII helper
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
- `src/GameEngine.h` - ProfilingMetrics struct

---

## Benchmark API

The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.
### Basic Usage

```python
import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```

### Adding Log Messages
Mark specific events within the benchmark:

```python
mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```

Log messages appear in the `logs` array of each frame in the output JSON.

### Headless Mode Note

In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:
```python
import time

start = time.perf_counter()
# ... operation to measure ...
elapsed = time.perf_counter() - start
print(f"Operation took {elapsed*1000:.2f}ms")
```

The benchmark API works best with the normal game loop (non-headless mode).
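The pattern above can be wrapped in a small reusable helper. A minimal sketch using only the standard library; `stopwatch` is our name for it, not part of the mcrfpy API:

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label):
    """Print the elapsed wall-clock time for a block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed * 1000:.2f}ms")

# Usage: time any headless operation
with stopwatch("pathfinding pass"):
    total = sum(i * i for i in range(100_000))  # stand-in for real work
```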
### Output Format

The JSON file contains per-frame data:

```json
{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}
```
### Processing Results

Since Python processes results *after* capture, timing overhead doesn't affect measurements:

```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```
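Average frame time hides stutter; percentiles over the same `frames` layout expose it. A sketch (it assumes only the fields shown in the output format above, and uses the nearest-rank percentile method):

```python
def frame_time_percentile(frames, pct):
    """Nearest-rank percentile of frame_time_ms over a list of frame dicts."""
    times = sorted(f["frame_time_ms"] for f in frames)
    if not times:
        raise ValueError("no frames captured")
    # Nearest rank: smallest sample with at least pct% of samples at or below it
    rank = -(-len(times) * pct // 100)  # ceiling division
    return times[max(rank, 1) - 1]

# Example with synthetic data: two stutter frames among steady ~12ms frames
frames = [{"frame_time_ms": t} for t in [10, 11, 11, 12, 12, 12, 12, 13, 30, 45]]
print(f"p50: {frame_time_percentile(frames, 50):.1f}ms")  # -> 12.0ms
print(f"p95: {frame_time_percentile(frames, 95):.1f}ms")  # -> 45.0ms
```

A p95 far above the median is the signature of periodic stutter (GC, chunk rebuilds) rather than a uniformly slow frame budget.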
---

## F3 Profiler Overlay

**Activation:** Press F3 during gameplay

**Displays:**
- Frame time (ms) with color coding:
  - Green: < 16ms (60+ FPS)
  - Yellow: 16-33ms (30-60 FPS)
  - Red: > 33ms (< 30 FPS)
- FPS (averaged over 60 frames)
- Detailed breakdowns:
  - Grid rendering time
  - Entity rendering time
  - Python script time
  - Animation update time
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)

**Implementation:** `src/ProfilerOverlay.cpp`

---
## Current Performance

### Implemented Optimizations

**Chunk-based Rendering** ([#123](../issues/123)):
- Large grids divided into chunks
- Only visible chunks processed
- 1000x1000+ grids render efficiently

**Dirty Flag Caching** ([#148](../issues/148)):
- Layers track changes per-chunk
- Unchanged chunks reuse cached RenderTexture
- Static scenes: near-zero CPU cost after initial render

**Viewport Culling:**
- Only cells/entities within viewport processed
- Camera position and zoom respected

### Current Bottlenecks

**Entity Spatial Queries** - O(n) iteration for large counts:
- Use `grid.entities_in_radius()` for proximity queries
- SpatialHash planned in [#115](../issues/115)

**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations
- Use `layer.fill()` for uniform values
- Batch operations planned in [#113](../issues/113)

---
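The cost of many small calls versus one bulk call is easy to demonstrate even in pure Python; `layer.set()` versus `layer.fill()` follows the same shape, with the added expense of a Python/C++ boundary crossing per call. A generic sketch (standard library only; the function names are illustrative, not mcrfpy API):

```python
import timeit

GRID_CELLS = 10_000  # e.g. a 100x100 layer

def fill_per_cell(value):
    """One call per cell -- analogous to calling layer.set() in a loop."""
    cells = []
    for _ in range(GRID_CELLS):
        cells.append(value)
    return cells

def fill_bulk(value):
    """One bulk operation -- analogous to a single layer.fill() call."""
    return [value] * GRID_CELLS

assert fill_per_cell(7) == fill_bulk(7)  # same result either way
t_loop = timeit.timeit(lambda: fill_per_cell(7), number=20)
t_bulk = timeit.timeit(lambda: fill_bulk(7), number=20)
print(f"per-cell: {t_loop * 1000:.1f}ms, bulk: {t_bulk * 1000:.1f}ms")
```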
## Optimization Workflow

1. **Profile**: Press F3, identify which metric is high
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
3. **Analyze**: Process JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run benchmark, compare results

See [[Performance-Optimization-Workflow]] for the full methodology.

### Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Frame time | < 16.67ms | 60 FPS |
| Grid render | < 5ms | For typical game grids |
| Entity render | < 2ms | For < 200 entities |
| Python callbacks | < 2ms | Keep logic light |

---
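Step 5 (Verify) can be scripted by diffing the `summary` blocks of two benchmark files. A sketch assuming only the summary fields shown in the output format above:

```python
def compare_runs(before, after):
    """Report the percentage change for each summary metric between two runs."""
    report = {}
    for key in ("avg_frame_time_ms", "max_frame_time_ms", "min_frame_time_ms"):
        old, new = before[key], after[key]
        change_pct = (new - old) / old * 100 if old else 0.0
        report[key] = (old, new, round(change_pct, 1))
    return report

# Example: summaries loaded from two benchmark JSON files (before/after a change)
before = {"avg_frame_time_ms": 14.2, "max_frame_time_ms": 28.5, "min_frame_time_ms": 8.1}
after = {"avg_frame_time_ms": 11.0, "max_frame_time_ms": 20.0, "min_frame_time_ms": 8.0}
for metric, (old, new, pct) in compare_runs(before, after).items():
    print(f"{metric}: {old} -> {new} ({pct:+.1f}%)")
```

Negative percentages mean the optimization helped; watch `max_frame_time_ms` as well as the average, since worst-case frames are what players notice.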
## C++ Instrumentation

### ScopedTimer (RAII)

Automatic timing for code blocks:

```cpp
#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}
```
### Adding New Metrics

1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()` if per-frame counter
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument code with ScopedTimer

---

## Related Systems

- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Performance-Optimization-Workflow]] - Full optimization methodology
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation

---

*Last updated: 2025-11-29*