From 0a8c349bae0b80c4fabce3672cf5c4b6015b3bb4 Mon Sep 17 00:00:00 2001
From: John McCardle
Date: Sat, 7 Feb 2026 23:50:04 +0000
Subject: [PATCH] Update Performance and Profiling

---
 ...iling.-.md => Performance-and-Profiling.md | 448 +++++++++---------
 1 file changed, 224 insertions(+), 224 deletions(-)
 rename Performance-and-Profiling.-.md => Performance-and-Profiling.md (95%)

diff --git a/Performance-and-Profiling.-.md b/Performance-and-Profiling.md
similarity index 95%
rename from Performance-and-Profiling.-.md
rename to Performance-and-Profiling.md
index de48253..95fdb0e 100644
--- a/Performance-and-Profiling.-.md
+++ b/Performance-and-Profiling.md

# Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.

## Quick Reference

**Related Issues:**
- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../issues/115) - SpatialHash Implementation (Open)
- [#113](../issues/113) - Batch Operations for Grid (Open)

**Key Files:**
- `src/Profiler.h` - ScopedTimer RAII helper
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
- `src/GameEngine.h` - ProfilingMetrics struct

---

## Benchmark API

The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.
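Because `end_benchmark()` should run even when a test scenario raises, it can help to wrap the start/end pair so the end call is guaranteed. A minimal sketch — the `benchmark` wrapper is a hypothetical convenience, not part of the mcrfpy API, and it takes the start/end functions as parameters so it stays engine-agnostic:

```python
from contextlib import contextmanager

@contextmanager
def benchmark(start, end):
    """Guarantee a start/end benchmark pair runs even on exceptions.

    `start` and `end` are the capture functions (e.g. the
    mcrfpy.start_benchmark / mcrfpy.end_benchmark pair).
    Yields a dict whose "filename" key is filled in on exit.
    """
    result = {"filename": None}
    start()
    try:
        yield result
    finally:
        # Runs on normal exit and on exceptions alike.
        result["filename"] = end()

# Usage (inside the engine):
#   with benchmark(mcrfpy.start_benchmark, mcrfpy.end_benchmark) as run:
#       ...  # test scenario
#   print(run["filename"])
```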
### Basic Usage

```python
import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```

### Adding Log Messages

Mark specific events within the benchmark:

```python
mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```

### Headless Mode Note

In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:

```python
import time

start = time.perf_counter()
# ... operation to measure ...
elapsed = time.perf_counter() - start
print(f"Operation took {elapsed*1000:.2f}ms")
```

The benchmark API works best with the normal game loop (non-headless mode).

### Output Format

The JSON file contains per-frame data:

```json
{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    }
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}
```

### Processing Results

```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```

---

## F3 Profiler Overlay

**Activation:** Press F3 during gameplay

**Displays:**
- Frame time (ms) with color coding:
  - Green: < 16ms (60+ FPS)
  - Yellow: 16-33ms (30-60 FPS)
  - Red: > 33ms (< 30 FPS)
- FPS (averaged over 60 frames)
- Detailed breakdowns:
  - Grid rendering time
  - Entity rendering time
  - Python script time
  - Animation update time
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)

**Implementation:** `src/ProfilerOverlay.cpp`

---

## Current Performance

### Implemented Optimizations

**Chunk-based Rendering** ([#123](../issues/123)):
- Large grids divided into chunks
- Only visible chunks processed
- 1000x1000+ grids render efficiently

**Dirty Flag Caching** ([#148](../issues/148)):
- Layers track changes per-chunk
- Unchanged chunks reuse cached RenderTexture
- Static scenes: near-zero CPU cost after initial render

**Viewport Culling:**
- Only cells/entities within viewport processed
- Camera position and zoom respected

### Current Bottlenecks

**Entity Spatial Queries** - O(n) iteration for large counts:
- Use `grid.entities_in_radius()` for proximity queries
- SpatialHash planned in [#115](../issues/115)

**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations
- Use `layer.fill()` for uniform values
- Batch operations planned in [#113](../issues/113)

---

## Optimization Workflow

1. **Profile**: Press F3, identify which metric is high
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
3. **Analyze**: Process JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run benchmark, compare results

See [[Performance-Optimization-Workflow]] for the full methodology.
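Step 5, "compare results", amounts to diffing the summary blocks of two benchmark files. A minimal sketch, assuming two output files in the JSON format documented above — `compare_runs` is a hypothetical helper, not an mcrfpy API:

```python
import json

def compare_runs(before_path, after_path, threshold_ms=0.5):
    """Compare the "summary" blocks of two benchmark JSON files.

    A positive delta means the "after" run got slower; any delta
    larger than threshold_ms is flagged as a regression.
    """
    with open(before_path) as f:
        before = json.load(f)["summary"]
    with open(after_path) as f:
        after = json.load(f)["summary"]

    deltas = {
        key: after[key] - before[key]
        for key in ("avg_frame_time_ms", "max_frame_time_ms")
    }
    regressed = any(d > threshold_ms for d in deltas.values())
    return deltas, regressed
```

Run it on the files produced before and after an optimization; a `regressed` result of `False` with negative deltas confirms the change helped.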
### Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Frame time | < 16.67ms | 60 FPS |
| Grid render | < 5ms | For typical game grids |
| Entity render | < 2ms | For < 200 entities |
| Python callbacks | < 2ms | Keep logic light |

---

## C++ Instrumentation

### ScopedTimer (RAII)

Automatic timing for code blocks:

```cpp
#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}
```

### Adding New Metrics

1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()` if per-frame counter
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument code with ScopedTimer

---

## Related Systems

- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Performance-Optimization-Workflow]] - Full optimization methodology
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation
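For Python-side test scripts, the scope-exit timing pattern behind ScopedTimer can be mimicked with a context manager: the `finally` block plays the role of the C++ destructor, recording elapsed time when the scope exits. A sketch — `scoped_timer` is a hypothetical test-script helper, not an mcrfpy API; the engine's own timing stays in C++:

```python
import time
from contextlib import contextmanager

@contextmanager
def scoped_timer(metrics, key):
    """Accumulate elapsed milliseconds into metrics[key] on scope exit,
    mirroring how ScopedTimer records into a ProfilingMetrics field."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even on early return or exception, like a destructor.
        metrics[key] = metrics.get(key, 0.0) + (time.perf_counter() - start) * 1000.0

# Usage:
metrics = {}
with scoped_timer(metrics, "python_time_ms"):
    sum(range(10000))  # work to measure
```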