Update Performance Optimization Workflow

John McCardle 2026-02-07 23:49:36 +00:00
commit 7f22636ca7

@ -1,255 +1,255 @@
# Performance Optimization Workflow
Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
## Quick Reference
**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
**Tools:**
- F3 profiler overlay (in-game)
- `src/Profiler.h` - ScopedTimer
- Benchmark API (`mcrfpy.start_benchmark()`)
## The Optimization Cycle
```
1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
     ^                                                         |
     +---------------------------------------------------------+
```
---
## Step 1: Profile - Find the Bottleneck
### Using F3 Overlay
**Start the game and press F3:**
Look for:
- **Red frame times** (>33ms) - Unacceptable performance
- **Yellow frame times** (16-33ms) - Marginal performance
- **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms? Grid optimization needed
  - Entity rendering > 5ms? Entity culling needed
  - Python script time > 5ms? Python callback optimization needed
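The same thresholds can be applied to frame times recorded by a benchmark. The following is a minimal sketch, not part of the engine: `classify_frame_time` and `summarize` are hypothetical helper names, the "green" label for frames under 16ms is an assumption, and so is the handling of values exactly at 16ms and 33ms.

```python
def classify_frame_time(ms):
    """Map a frame time in milliseconds to the color bands listed above."""
    if ms > 33.0:
        return "red"     # unacceptable performance
    if ms >= 16.0:
        return "yellow"  # marginal performance
    return "green"       # assumed label for frames under 16 ms


def summarize(frame_times_ms):
    """Count how many recorded frames fall into each band."""
    counts = {"green": 0, "yellow": 0, "red": 0}
    for ms in frame_times_ms:
        counts[classify_frame_time(ms)] += 1
    return counts
```

A run dominated by "yellow" or "red" frames is a candidate for the rest of this workflow; a mostly "green" run probably is not.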
### Running Benchmarks
Use the benchmark API for detailed data capture:
```python
import mcrfpy
mcrfpy.start_benchmark()
# ... run test scenario ...
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```
See [[Performance-and-Profiling]] for benchmark output format and analysis.
---
## Step 2: Identify - Understand the Problem
### Common Performance Issues
**Issue: High Grid Render Time on Static Screens**
- **Symptom:** 20-40ms grid render, nothing changing
- **Cause:** Redrawing unchanged cells every frame
- **Status:** Solved by chunk-based dirty flag system
**Issue: High Entity Render Time with Many Entities**
- **Symptom:** 10-20ms entity render with 500+ entities
- **Cause:** O(n) iteration, no spatial indexing
- **Solution:** [#115](../issues/115) SpatialHash (planned)
**Issue: Slow Bulk Grid Updates from Python**
- **Symptom:** Frame drops when updating many cells
- **Cause:** Python/C++ boundary crossings for each cell
- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
**Issue: High Python Script Time**
- **Symptom:** 10-50ms in Python callbacks
- **Cause:** Heavy computation in Python update loops
- **Solution:** Move hot paths to C++ or optimize Python
---
## Step 3: Instrument - Measure Precisely
### Adding ScopedTimer (C++)
Wrap slow functions with timing:
```cpp
#include "Profiler.h"

void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}
```
### Adding Custom Metrics (C++)
1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()`
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument with ScopedTimer
5. Rebuild and press F3
### Creating Python Benchmarks
```python
import mcrfpy
import sys

def benchmark():
    # Set up a scene containing the grid under test
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene

    frame_times = []

    def measure(timer, runtime):
        frame_times.append(runtime)
        # Report and exit once 300 samples have been collected
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)

    mcrfpy.Timer("benchmark", measure, 16)

benchmark()
```
**Run:**
```bash
cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py
```
For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
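A headless measurement with the `time` module might look like the sketch below. It uses only the standard library; `time_workload`, `report`, and `example_workload` are hypothetical names, and the workload is a stand-in to be replaced with the engine calls under test.

```python
import statistics
import time


def time_workload(workload, iterations=300):
    """Call `workload` repeatedly; return per-call wall-clock times in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def report(samples):
    """Summarize the samples the same way the Timer-based benchmark does."""
    return {
        "avg_ms": statistics.fmean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


def example_workload():
    # Hypothetical stand-in: replace with the code being measured.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    stats = report(time_workload(example_workload, iterations=50))
    print(f"Average: {stats['avg_ms']:.3f}ms")
    print(f"Min: {stats['min_ms']:.3f}ms")
    print(f"Max: {stats['max_ms']:.3f}ms")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.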
---
## Step 4: Optimize - Make It Faster
### Strategy 1: Reduce Work
**Example: Dirty Flags (already implemented)**
Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
### Strategy 2: Reduce Complexity
**Example: Spatial queries**
Instead of O(n) search through all entities, use `entities_in_radius()`:
```python
# Spatial query: one call instead of a Python loop over every entity
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
```
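To illustrate why a spatial index reduces the work, here is a toy uniform-grid hash in pure Python. It is an illustration only: it is not the engine's implementation and not the design planned in #115, and the `SpatialHash` class, its methods, and the default `cell_size` are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


class SpatialHash:
    """Toy uniform-grid index; illustration only, not engine code."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)  # (cell_x, cell_y) -> [(item, x, y)]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, item, x, y):
        self.buckets[self._cell(x, y)].append((item, x, y))

    def in_radius(self, cx, cy, radius):
        """Return items within `radius` of (cx, cy), visiting nearby buckets only."""
        min_bx, min_by = self._cell(cx - radius, cy - radius)
        max_bx, max_by = self._cell(cx + radius, cy + radius)
        found = []
        for bx in range(min_bx, max_bx + 1):
            for by in range(min_by, max_by + 1):
                # .get() avoids creating empty buckets during lookups
                for item, x, y in self.buckets.get((bx, by), ()):
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        found.append(item)
        return found
```

The query cost depends on how many buckets the radius covers and how many items they hold, not on the total number of items indexed.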
### Strategy 3: Batch Operations
**Minimize Python/C++ boundary crossings:**
```python
# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))

# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))
```
### Strategy 4: Cache Results
**Example: Path caching**
```python
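# Single-entry cache keyed on the target only: the path is recomputed
# when the target changes, not when `start` changes or the grid is edited.
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]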
```
### Optimization Checklist
Before optimizing:
- [ ] Profiled and identified real bottleneck
- [ ] Measured baseline performance
- [ ] Understood root cause
After optimization:
- [ ] Measured improvement
- [ ] Verified correctness
- [ ] Updated tests if needed
---
## Step 5: Verify - Measure Improvement
### Re-run Benchmarks
Compare before and after measurements.
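One way to make the comparison concrete is to compute the average frame time of each run and the ratio between them. This is a minimal sketch using only the standard library; `compare_runs` is a hypothetical helper, and the sample numbers are made up for illustration.

```python
import statistics


def compare_runs(baseline_ms, optimized_ms):
    """Compare two lists of frame times (in ms) from before and after a change."""
    base_avg = statistics.fmean(baseline_ms)
    opt_avg = statistics.fmean(optimized_ms)
    return {
        "baseline_avg_ms": base_avg,
        "optimized_avg_ms": opt_avg,
        "speedup": base_avg / opt_avg,   # > 1.0 means the change helped
        "saved_ms": base_avg - opt_avg,  # average time saved per frame
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements
    result = compare_runs([30.0, 34.0, 32.0], [16.0, 15.0, 17.0])
    print(f"Improvement factor: {result['speedup']:.2f}x")
```

Run both measurements with the same test script and scenario so that the improvement factor reflects the code change rather than a different workload.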
### Check Correctness
**Visual testing:**
1. Run game normally (not headless)
2. Verify visual output unchanged
3. Test edge cases
**Automated testing:**
```bash
cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py
```
### Document Results
Add findings to the relevant Gitea issue with:
- Baseline numbers
- Optimized numbers
- Improvement factor
- Test script name
- Commit hash
---
## When NOT to Optimize
**Don't optimize if:**
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere
Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
---
## Related Documentation
- [[Performance-and-Profiling]] - Profiling tools reference
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Grid-System]] - Grid optimization opportunities
- [[Writing-Tests]] - Creating performance tests