Update Performance Optimization Workflow

John McCardle 2026-02-07 23:49:36 +00:00
commit 7f22636ca7

@ -1,255 +1,255 @@
# Performance Optimization Workflow

Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.

## Quick Reference

**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]

**Tools:**
- F3 profiler overlay (in-game)
- `src/Profiler.h` - ScopedTimer
- Benchmark API (`mcrfpy.start_benchmark()`)

## The Optimization Cycle

```
1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
     ^                                                         |
     +---------------------------------------------------------+
```

---
## Step 1: Profile - Find the Bottleneck

### Using F3 Overlay

**Start the game and press F3:**

Look for:
- **Red frame times** (>33ms) - Unacceptable performance
- **Yellow frame times** (16-33ms) - Marginal performance
- **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms? Grid optimization needed
  - Entity rendering > 5ms? Entity culling needed
  - Python script time > 5ms? Python callback optimization
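
The thresholds above can also be applied to numbers captured outside the overlay, for example when reading values from a benchmark log. The following is a minimal sketch, not part of the engine: the function names, the dictionary keys (`grid`, `entity`, `python`), and the `"ok"` label for frames under 16ms are assumptions made for illustration.

```python
# Hypothetical helpers; only the numeric thresholds come from the list above.

def classify_frame_time(ms):
    """Map a frame time in milliseconds to the overlay's colour bands."""
    if ms > 33.0:
        return "red"     # unacceptable
    if ms >= 16.0:
        return "yellow"  # marginal
    return "ok"          # assumed label for frames under 16ms


# Per-subsystem budgets in milliseconds, taken from the bullets above.
SUBSYSTEM_BUDGETS_MS = {"grid": 10.0, "entity": 5.0, "python": 5.0}


def flag_subsystems(times_ms):
    """Return the names of subsystems whose time exceeds its budget."""
    return [
        name
        for name, budget in SUBSYSTEM_BUDGETS_MS.items()
        if times_ms.get(name, 0.0) > budget
    ]
```

Calling `flag_subsystems` with one frame's subsystem times returns the subsystems worth investigating first in Step 2.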

### Running Benchmarks

Use the benchmark API for detailed data capture:

```python
import mcrfpy

mcrfpy.start_benchmark()
# ... run test scenario ...
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```

See [[Performance-and-Profiling]] for benchmark output format and analysis.

---
## Step 2: Identify - Understand the Problem

### Common Performance Issues

**Issue: High Grid Render Time on Static Screens**
- **Symptom:** 20-40ms grid render, nothing changing
- **Cause:** Redrawing unchanged cells every frame
- **Status:** Solved by chunk-based dirty flag system

**Issue: High Entity Render Time with Many Entities**
- **Symptom:** 10-20ms entity render with 500+ entities
- **Cause:** O(n) iteration, no spatial indexing
- **Solution:** [#115](../issues/115) SpatialHash (planned)

**Issue: Slow Bulk Grid Updates from Python**
- **Symptom:** Frame drops when updating many cells
- **Cause:** Python/C++ boundary crossings for each cell
- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data

**Issue: High Python Script Time**
- **Symptom:** 10-50ms in Python callbacks
- **Cause:** Heavy computation in Python update loops
- **Solution:** Move hot paths to C++ or optimize Python

---
## Step 3: Instrument - Measure Precisely

### Adding ScopedTimer (C++)

Wrap slow functions with timing:

```cpp
#include "Profiler.h"

void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}
```

### Adding Custom Metrics (C++)

1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()`
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument with ScopedTimer
5. Rebuild and press F3

### Creating Python Benchmarks

```python
import mcrfpy
import sys

def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene

    frame_times = []

    def measure(timer, runtime):
        frame_times.append(runtime)
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)

    mcrfpy.Timer("benchmark", measure, 16)

benchmark()
```

**Run:**
```bash
cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py
```

For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
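
One possible shape for such a headless measurement is sketched below. It relies only on the standard library's `time.perf_counter()`. The names `time_steps` and `summarize` are hypothetical, and `step_fn` stands in for whatever callable advances the simulation in your test; none of these are part of the McRogueFace API.

```python
import time


def time_steps(step_fn, frames=300):
    """Call step_fn `frames` times and return each call's duration in ms."""
    samples_ms = []
    for _ in range(frames):
        start = time.perf_counter()
        step_fn()  # assumed: advances the scenario by one frame
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def summarize(samples_ms):
    """Reduce per-frame samples to the statistics printed by the Timer benchmark."""
    avg = sum(samples_ms) / len(samples_ms)
    return {
        "avg_ms": avg,
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        # Guard against a zero average on very fast or very coarse clocks.
        "fps": 1000.0 / avg if avg > 0 else float("inf"),
    }
```

Keeping measurement and summary separate lets the same `summarize` function process samples collected by the Timer callback as well.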

---

## Step 4: Optimize - Make It Faster

### Strategy 1: Reduce Work

**Example: Dirty Flags (already implemented)**

Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.

### Strategy 2: Reduce Complexity

**Example: Spatial queries**

Instead of an O(n) search through all entities, use `entities_in_radius()`:

```python
# Spatial query: examines entities near the target rather than scanning all of them
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
```
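
To show why a spatial index reduces the work, here is a minimal pure-Python spatial hash. It illustrates the general technique only and is not the engine's implementation; the class name, `cell_size` default, and method names are assumptions for this sketch.

```python
import math
from collections import defaultdict


class SpatialHash:
    """Buckets entities by grid cell so queries visit only nearby buckets."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(set)

    def _cell(self, x, y):
        # Floor division keeps negative coordinates in the correct bucket.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, entity_id, x, y):
        self.buckets[self._cell(x, y)].add((entity_id, x, y))

    def query_radius(self, cx, cy, radius):
        """Return ids of entities within `radius` of (cx, cy), inclusive."""
        min_bx, min_by = self._cell(cx - radius, cy - radius)
        max_bx, max_by = self._cell(cx + radius, cy + radius)
        found = []
        for bx in range(min_bx, max_bx + 1):
            for by in range(min_by, max_by + 1):
                # .get() avoids creating empty buckets during queries.
                for entity_id, x, y in self.buckets.get((bx, by), ()):
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius * radius:
                        found.append(entity_id)
        return found
```

The cost of a query depends on how many entities fall in the buckets that overlap the search area, not on the total entity count, which is where the saving over a full scan comes from.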

### Strategy 3: Batch Operations

**Minimize Python/C++ boundary crossings:**

```python
# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))

# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))
```

### Strategy 4: Cache Results

**Example: Path caching**

```python
# Single-entry cache keyed on the target only: the path is recomputed
# when the target changes, not when `start` or the map changes.
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]
```
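
If several entities request paths, or the map can change while a path is cached, a cache keyed on both endpoints with an explicit invalidation hook may be easier to reason about. The sketch below is one way to do that. `PathCache` and its methods are hypothetical names, and `find_path` is any callable taking `(start, target)`, for example a small wrapper around `grid.find_path`.

```python
class PathCache:
    """Memoizes paths per (start, target) pair until invalidated."""

    def __init__(self, find_path):
        self._find_path = find_path  # callable(start, target) -> path
        self._paths = {}

    def get(self, start, target):
        key = (start, target)
        if key not in self._paths:
            self._paths[key] = self._find_path(start, target)
        return self._paths[key]

    def invalidate(self):
        """Drop all cached paths, e.g. after walkability changes."""
        self._paths.clear()
```

Endpoints must be hashable (tuples work). Whether this beats the single-entry cache depends on how often the same endpoints repeat, so measure both before committing to one.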

### Optimization Checklist

Before optimizing:
- [ ] Profiled and identified real bottleneck
- [ ] Measured baseline performance
- [ ] Understood root cause

After optimization:
- [ ] Measured improvement
- [ ] Verified correctness
- [ ] Updated tests if needed

---
## Step 5: Verify - Measure Improvement

### Re-run Benchmarks

Compare before and after measurements.

### Check Correctness

**Visual testing:**
1. Run game normally (not headless)
2. Verify visual output unchanged
3. Test edge cases

**Automated testing:**
```bash
cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py
```

### Document Results

Add findings to the relevant Gitea issue with:
- Baseline numbers
- Optimized numbers
- Improvement factor
- Test script name
- Commit hash
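
The improvement factor is the baseline time divided by the optimized time. A small helper such as the following can format all five items consistently for pasting into an issue. The function name and report layout are assumptions for this sketch, not an established project convention.

```python
def improvement_report(baseline_ms, optimized_ms, script, commit):
    """Format before/after timings as a short Markdown list for an issue comment."""
    if optimized_ms <= 0:
        raise ValueError("optimized_ms must be positive")
    factor = baseline_ms / optimized_ms
    return "\n".join([
        f"- Baseline: {baseline_ms:.2f}ms",
        f"- Optimized: {optimized_ms:.2f}ms",
        f"- Improvement: {factor:.2f}x",
        f"- Test script: {script}",
        f"- Commit: {commit}",
    ])
```

Pass the averages from the same benchmark script run before and after the change, so the two numbers are directly comparable.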

---

## When NOT to Optimize

**Don't optimize if:**
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere

Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.

---

## Related Documentation

- [[Performance-and-Profiling]] - Profiling tools reference
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Grid-System]] - Grid optimization opportunities
- [[Writing-Tests]] - Creating performance tests