Update "Performance-and-Profiling"

John McCardle 2026-02-07 22:33:59 +00:00
commit 9c742d4f72

# Performance and Profiling

Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.

## Quick Reference

**Related Issues:**
- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../issues/115) - SpatialHash Implementation (Open)
- [#113](../issues/113) - Batch Operations for Grid (Open)

**Key Files:**
- `src/Profiler.h` - ScopedTimer RAII helper
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
- `src/GameEngine.h` - ProfilingMetrics struct

---

## Benchmark API

The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.
### Basic Usage

```python
import mcrfpy

# Start capturing benchmark data
mcrfpy.start_benchmark()

# ... run your test scenario ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```

### Adding Log Messages
Mark specific events within the benchmark:

```python
mcrfpy.start_benchmark()

# Your code...
mcrfpy.log_benchmark("Player spawned")

# More code...
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```

Log messages appear in the `logs` array of each frame in the output JSON.

### Headless Mode Note

In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:
```python
import time

start = time.perf_counter()
# ... operation to measure ...
elapsed = time.perf_counter() - start
print(f"Operation took {elapsed*1000:.2f}ms")
```

The benchmark API works best with the normal game loop (non-headless mode).
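The pattern above can be wrapped in a small reusable helper. A minimal sketch using only the standard library; `stopwatch` is our name for it, not part of the mcrfpy API:

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label):
    """Print the elapsed wall-clock time for a block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed * 1000:.2f}ms")

# Usage: time any headless operation
with stopwatch("pathfinding pass"):
    total = sum(i * i for i in range(100_000))  # stand-in for real work
```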
### Output Format

The JSON file contains per-frame data:

```json
{
  "frames": [
    {
      "frame_number": 1,
      "frame_time_ms": 12.5,
      "grid_render_time_ms": 8.2,
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    "avg_frame_time_ms": 14.2,
    "max_frame_time_ms": 28.5,
    "min_frame_time_ms": 8.1
  }
}
```
### Processing Results

Since Python processes results *after* capture, timing overhead doesn't affect measurements:

```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    frames = data["frames"]
    slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]

    print(f"Total frames: {len(frames)}")
    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```
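Average frame time hides stutter; percentiles over the same `frames` layout expose it. A sketch (it assumes only the fields shown in the output format above, and uses the nearest-rank percentile method):

```python
def frame_time_percentile(frames, pct):
    """Nearest-rank percentile of frame_time_ms over a list of frame dicts."""
    times = sorted(f["frame_time_ms"] for f in frames)
    if not times:
        raise ValueError("no frames captured")
    # Nearest rank: smallest sample with at least pct% of samples at or below it
    rank = -(-len(times) * pct // 100)  # ceiling division
    return times[max(rank, 1) - 1]

# Example with synthetic data: two stutter frames among steady ~12ms frames
frames = [{"frame_time_ms": t} for t in [10, 11, 11, 12, 12, 12, 12, 13, 30, 45]]
print(f"p50: {frame_time_percentile(frames, 50):.1f}ms")  # -> 12.0ms
print(f"p95: {frame_time_percentile(frames, 95):.1f}ms")  # -> 45.0ms
```

A p95 far above the median is the signature of periodic stutter (GC, chunk rebuilds) rather than a uniformly slow frame budget.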
---

## F3 Profiler Overlay

**Activation:** Press F3 during gameplay

**Displays:**
- Frame time (ms) with color coding:
  - Green: < 16ms (60+ FPS)
  - Yellow: 16-33ms (30-60 FPS)
  - Red: > 33ms (< 30 FPS)
- FPS (averaged over 60 frames)
- Detailed breakdowns:
  - Grid rendering time
  - Entity rendering time
  - Python script time
  - Animation update time
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)

**Implementation:** `src/ProfilerOverlay.cpp`

---
## Current Performance

### Implemented Optimizations

**Chunk-based Rendering** ([#123](../issues/123)):
- Large grids divided into chunks
- Only visible chunks processed
- 1000x1000+ grids render efficiently

**Dirty Flag Caching** ([#148](../issues/148)):
- Layers track changes per-chunk
- Unchanged chunks reuse cached RenderTexture
- Static scenes: near-zero CPU cost after initial render

**Viewport Culling:**
- Only cells/entities within viewport processed
- Camera position and zoom respected

### Current Bottlenecks

**Entity Spatial Queries** - O(n) iteration for large counts:
- Use `grid.entities_in_radius()` for proximity queries
- SpatialHash planned in [#115](../issues/115)

**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations
- Use `layer.fill()` for uniform values
- Batch operations planned in [#113](../issues/113)

---
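The cost of many small calls versus one bulk call is easy to demonstrate even in pure Python; `layer.set()` versus `layer.fill()` follows the same shape, with the added expense of a Python/C++ boundary crossing per call. A generic sketch (standard library only; the function names are illustrative, not mcrfpy API):

```python
import timeit

GRID_CELLS = 10_000  # e.g. a 100x100 layer

def fill_per_cell(value):
    """One call per cell -- analogous to calling layer.set() in a loop."""
    cells = []
    for _ in range(GRID_CELLS):
        cells.append(value)
    return cells

def fill_bulk(value):
    """One bulk operation -- analogous to a single layer.fill() call."""
    return [value] * GRID_CELLS

assert fill_per_cell(7) == fill_bulk(7)  # same result either way
t_loop = timeit.timeit(lambda: fill_per_cell(7), number=20)
t_bulk = timeit.timeit(lambda: fill_bulk(7), number=20)
print(f"per-cell: {t_loop * 1000:.1f}ms, bulk: {t_bulk * 1000:.1f}ms")
```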
## Optimization Workflow

1. **Profile**: Press F3, identify which metric is high
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
3. **Analyze**: Process JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run benchmark, compare results

See [[Performance-Optimization-Workflow]] for the full methodology.

### Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Frame time | < 16.67ms | 60 FPS |
| Grid render | < 5ms | For typical game grids |
| Entity render | < 2ms | For < 200 entities |
| Python callbacks | < 2ms | Keep logic light |

---
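Step 5 (Verify) can be scripted by diffing the `summary` blocks of two benchmark files. A sketch assuming only the summary fields shown in the output format above:

```python
def compare_runs(before, after):
    """Report the percentage change for each summary metric between two runs."""
    report = {}
    for key in ("avg_frame_time_ms", "max_frame_time_ms", "min_frame_time_ms"):
        old, new = before[key], after[key]
        change_pct = (new - old) / old * 100 if old else 0.0
        report[key] = (old, new, round(change_pct, 1))
    return report

# Example: summaries loaded from two benchmark JSON files (before/after a change)
before = {"avg_frame_time_ms": 14.2, "max_frame_time_ms": 28.5, "min_frame_time_ms": 8.1}
after = {"avg_frame_time_ms": 11.0, "max_frame_time_ms": 20.0, "min_frame_time_ms": 8.0}
for metric, (old, new, pct) in compare_runs(before, after).items():
    print(f"{metric}: {old} -> {new} ({pct:+.1f}%)")
```

Negative percentages mean the optimization helped; watch `max_frame_time_ms` as well as the average, since worst-case frames are what players notice.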
## C++ Instrumentation

### ScopedTimer (RAII)

Automatic timing for code blocks:

```cpp
#include "Profiler.h"

void expensiveFunction() {
    ScopedTimer timer(Resources::game->metrics.functionTime);
    // ... code to measure ...
    // Timer automatically records duration on destruction
}
```
### Adding New Metrics

1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()` if per-frame counter
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument code with ScopedTimer

---

## Related Systems

- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Performance-Optimization-Workflow]] - Full optimization methodology
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation

---

*Last updated: 2025-11-29*