Update Performance and Profiling

John McCardle 2026-02-07 23:50:04 +00:00
commit 0a8c349bae

# Performance and Profiling
Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.
## Quick Reference
**Related Issues:**
- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../issues/115) - SpatialHash Implementation (Open)
- [#113](../issues/113) - Batch Operations for Grid (Open)
**Key Files:**
- `src/Profiler.h` - ScopedTimer RAII helper
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
- `src/GameEngine.h` - ProfilingMetrics struct
---
## Benchmark API
The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.
### Basic Usage
```python
import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```
### Adding Log Messages
Mark specific events within the benchmark:
```python
mcrfpy.start_benchmark()
# Your code...
mcrfpy.log_benchmark("Player spawned")
# More code...
mcrfpy.log_benchmark("Combat started")
filename = mcrfpy.end_benchmark()
```
### Headless Mode Note
In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module instead:
```python
import time

start = time.perf_counter()
# ... operation to measure ...
elapsed = time.perf_counter() - start
print(f"Operation took {elapsed*1000:.2f}ms")
```
The benchmark API works best with the normal game loop (non-headless mode).
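For repeated headless measurements, the one-off `perf_counter` pattern above can be wrapped in a small context manager. This is plain stdlib Python, not part of the mcrfpy API; the `timed` helper and its `results` dict are illustrative names:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results=None):
    """Time a block with perf_counter; print and optionally record the result."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if results is not None:
            results[label] = elapsed_ms
        print(f"{label}: {elapsed_ms:.2f}ms")

# Usage: collect several labeled timings into one dict
results = {}
with timed("pathfinding", results):
    sum(range(100_000))  # stand-in for the operation to measure
```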
### Output Format
The JSON file contains per-frame data:
```json
{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    }
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}
```
### Processing Results
```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)
    frames = data["frames"]
    slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]
    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```
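Averages can hide stutter, so tail latencies are often the more useful signal. A sketch computing median/p95/p99 from the same per-frame records with the stdlib `statistics` module (field names as in the output format above):

```python
import statistics

def frame_time_percentiles(frames):
    """Return median, p95, and p99 frame times (ms) from benchmark frame records."""
    times = sorted(f["frame_time_ms"] for f in frames)
    # quantiles(n=100) yields 99 cut points; index 94 -> p95, index 98 -> p99
    cuts = statistics.quantiles(times, n=100)
    return {
        "median_ms": statistics.median(times),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
    }

# Example with synthetic data: 99 fast frames plus one 30ms spike
frames = [{"frame_time_ms": 10.0}] * 99 + [{"frame_time_ms": 30.0}]
stats = frame_time_percentiles(frames)
```

A single spike barely moves the average but shows up clearly in p99.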
---
## F3 Profiler Overlay
**Activation:** Press F3 during gameplay
**Displays:**
- Frame time (ms) with color coding:
  - Green: < 16ms (60+ FPS)
  - Yellow: 16-33ms (30-60 FPS)
  - Red: > 33ms (< 30 FPS)
- FPS (averaged over 60 frames)
- Detailed breakdowns:
  - Grid rendering time
  - Entity rendering time
  - Python script time
  - Animation update time
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)
**Implementation:** `src/ProfilerOverlay.cpp`
---
## Current Performance
### Implemented Optimizations
**Chunk-based Rendering** ([#123](../issues/123)):
- Large grids divided into chunks
- Only visible chunks processed
- 1000x1000+ grids render efficiently
**Dirty Flag Caching** ([#148](../issues/148)):
- Layers track changes per-chunk
- Unchanged chunks reuse cached RenderTexture
- Static scenes: near-zero CPU cost after initial render
**Viewport Culling:**
- Only cells/entities within viewport processed
- Camera position and zoom respected
### Current Bottlenecks
**Entity Spatial Queries** - O(n) iteration for large counts:
- Use `grid.entities_in_radius()` for proximity queries
- SpatialHash planned in [#115](../issues/115)
**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations
- Use `layer.fill()` for uniform values
- Batch operations planned in [#113](../issues/113)
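To illustrate why batching helps, here is a minimal stand-in, not the real mcrfpy API: the hypothetical `StubLayer` counts calls the way each `layer.set()` would incur one Python/C++ boundary crossing, while `fill()` pays that cost once:

```python
class StubLayer:
    """Hypothetical grid layer that counts Python/C++ boundary crossings."""
    def __init__(self, width, height):
        self.cells = [[0] * width for _ in range(height)]
        self.calls = 0  # each call models one Python/C++ transition

    def set(self, x, y, value):
        self.calls += 1          # one crossing per cell
        self.cells[y][x] = value

    def fill(self, value):
        self.calls += 1          # one crossing covers every cell
        for row in self.cells:
            for x in range(len(row)):
                row[x] = value

layer = StubLayer(100, 100)
for y in range(100):             # per-cell updates: 10,000 crossings
    for x in range(100):
        layer.set(x, y, 1)
per_cell_calls = layer.calls

layer = StubLayer(100, 100)
layer.fill(1)                    # batch update: 1 crossing
batch_calls = layer.calls
```

The work per cell is the same either way; the savings come entirely from amortizing the per-call overhead.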
---
## Optimization Workflow
1. **Profile**: Press F3, identify which metric is high
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
3. **Analyze**: Process JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run benchmark, compare results
See [[Performance-Optimization-Workflow]] for the full methodology.
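Step 5 (Verify) can be a simple diff of the summary blocks from two benchmark files. A sketch, assuming the summary fields shown in the output format (`compare_summaries` is an illustrative helper, not part of mcrfpy):

```python
def compare_summaries(before, after):
    """Report per-metric change between two benchmark summary dicts."""
    report = {}
    for key in ("avg_frame_time_ms", "max_frame_time_ms", "min_frame_time_ms"):
        delta = after[key] - before[key]
        report[key] = {
            "before": before[key],
            "after": after[key],
            "delta_ms": round(delta, 2),  # negative delta means improvement
        }
    return report

# Example: summaries captured before and after an optimization
before = {"avg_frame_time_ms": 14.2, "max_frame_time_ms": 28.5, "min_frame_time_ms": 8.1}
after = {"avg_frame_time_ms": 11.0, "max_frame_time_ms": 20.0, "min_frame_time_ms": 8.0}
report = compare_summaries(before, after)
```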
### Performance Targets
| Metric | Target | Notes |
|--------|--------|-------|
| Frame time | < 16.67ms | 60 FPS |
| Grid render | < 5ms | For typical game grids |
| Entity render | < 2ms | For < 200 entities |
| Python callbacks | < 2ms | Keep logic light |
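These budgets can be checked mechanically against captured frame records. A sketch mapping the table to benchmark JSON field names (the mapping of "Python callbacks" to `python_time_ms` is an assumption):

```python
# Budgets from the table above, in ms, keyed by benchmark JSON field names
TARGETS_MS = {
    "frame_time_ms": 16.67,
    "grid_render_time_ms": 5.0,
    "entity_render_time_ms": 2.0,
    "python_time_ms": 2.0,
}

def over_budget(frame):
    """Return the metrics in a frame record that exceed their target."""
    return {k: frame[k] for k, limit in TARGETS_MS.items()
            if k in frame and frame[k] > limit}

# Example frame: grid and entity rendering blow their budgets
frame = {"frame_time_ms": 12.5, "grid_render_time_ms": 8.2,
         "entity_render_time_ms": 2.1, "python_time_ms": 1.8}
violations = over_budget(frame)
```

Running this over all frames in a benchmark file pinpoints which subsystem to attack first.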
---
## C++ Instrumentation
### ScopedTimer (RAII)
Automatic timing for code blocks:
```cpp
#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}
```
### Adding New Metrics
1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()` if per-frame counter
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument code with ScopedTimer
---
## Related Systems
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Performance-Optimization-Workflow]] - Full optimization methodology
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation