Table of Contents
- Performance Optimization Workflow
- Quick Reference
- The Optimization Cycle
- Step 1: Profile - Find the Bottleneck
- Step 2: Identify - Understand the Problem
- Step 3: Instrument - Measure Precisely
- Step 4: Optimize - Make It Faster
- Strategy 1: Reduce Work
- Strategy 2: Reduce Complexity
- Strategy 3: Batch Operations
- Strategy 4: Cache Results
- Optimization Checklist
- Step 5: Verify - Measure Improvement
- When NOT to Optimize
- Related Documentation
Performance Optimization Workflow
A systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
Quick Reference
Related Systems: Performance-and-Profiling, Grid-System
Tools:
- F3 profiler overlay (in-game)
- src/Profiler.h - ScopedTimer
- Benchmark API (mcrfpy.start_benchmark())
The Optimization Cycle
1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
^ |
+---------------------------------------------------------+
Step 1: Profile - Find the Bottleneck
Using F3 Overlay
Start the game and press F3:
Look for:
- Red frame times (>33ms) - Unacceptable performance
- Yellow frame times (16-33ms) - Marginal performance
- High subsystem times - Which system is slow?
- Grid rendering > 10ms? Grid optimization needed
- Entity rendering > 5ms? Entity culling needed
- Python script time > 5ms? Python callback optimization
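The color thresholds above can also be applied to frame times collected outside the overlay. The following is a minimal sketch, not part of McRogueFace: the function names are hypothetical, the sample values are made up, and the handling of the exact 16 ms and 33 ms boundaries is an assumption.

```python
def classify_frame_time(ms):
    """Bucket one frame time using the 16 ms and 33 ms thresholds."""
    if ms > 33.0:
        return "red"      # unacceptable
    if ms >= 16.0:
        return "yellow"   # marginal
    return "ok"


def summarize(frame_times_ms):
    """Count how many frames fall into each bucket."""
    counts = {"ok": 0, "yellow": 0, "red": 0}
    for ms in frame_times_ms:
        counts[classify_frame_time(ms)] += 1
    return counts


sample = [12.0, 15.9, 16.0, 24.5, 33.0, 41.2]  # made-up frame times in ms
print(summarize(sample))  # {'ok': 2, 'yellow': 3, 'red': 1}
```

A run dominated by "yellow" or "red" frames is a candidate for the subsystem breakdown described above.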
Running Benchmarks
Use the benchmark API for detailed data capture:
import mcrfpy
mcrfpy.start_benchmark()
# ... run test scenario ...
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
See Performance-and-Profiling for benchmark output format and analysis.
Step 2: Identify - Understand the Problem
Common Performance Issues
Issue: High Grid Render Time on Static Screens
- Symptom: 20-40ms grid render, nothing changing
- Cause: Redrawing unchanged cells every frame
- Status: Solved by chunk-based dirty flag system
Issue: High Entity Render Time with Many Entities
- Symptom: 10-20ms entity render with 500+ entities
- Cause: O(n) iteration, no spatial indexing
- Solution: #115 SpatialHash (planned)
Issue: Slow Bulk Grid Updates from Python
- Symptom: Frame drops when updating many cells
- Cause: Python/C++ boundary crossings for each cell
- Workaround: Minimize individual layer.set() calls; use layer.fill() for uniform data
Issue: High Python Script Time
- Symptom: 10-50ms in Python callbacks
- Cause: Heavy computation in Python update loops
- Solution: Move hot paths to C++ or optimize Python
Step 3: Instrument - Measure Precisely
Adding ScopedTimer (C++)
Wrap slow functions with timing:
#include "Profiler.h"
void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}
Adding Custom Metrics (C++)
- Add field to ProfilingMetrics in src/GameEngine.h
- Reset in resetPerFrame()
- Display in src/ProfilerOverlay.cpp::update()
- Instrument with ScopedTimer
- Rebuild and press F3
Creating Python Benchmarks
import mcrfpy
import sys

def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene
    frame_times = []

    def measure(timer, runtime):
        frame_times.append(runtime)
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)

    mcrfpy.Timer("benchmark", measure, 16)

benchmark()
Run:
cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py
For headless benchmarks, use Python's time module instead of the Timer API since step() bypasses the game loop.
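A headless measurement along those lines might look like the sketch below. It uses only Python's standard time module; time_calls, report, and workload are hypothetical names, and workload is a stand-in for whatever code is under test.

```python
import time


def time_calls(fn, iterations=100):
    """Call fn() repeatedly; return per-call wall-clock times in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def report(samples):
    """Reduce a list of samples to average, minimum, and maximum."""
    return {
        "avg_ms": sum(samples) / len(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


def workload():
    # Hypothetical stand-in; replace with the operation being benchmarked
    sum(i * i for i in range(10_000))


stats = report(time_calls(workload, iterations=50))
print(f"Average: {stats['avg_ms']:.2f}ms")
print(f"Min: {stats['min_ms']:.2f}ms")
print(f"Max: {stats['max_ms']:.2f}ms")
```

time.perf_counter() is used rather than time.time() because it is a monotonic, high-resolution clock intended for measuring short durations.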
Step 4: Optimize - Make It Faster
Strategy 1: Reduce Work
Example: Dirty Flags (already implemented)
Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
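To illustrate the idea, here is a toy dirty-flag cache in Python. It is a sketch of the general technique only, not the engine's chunk implementation; the CachedChunk class and its fields are hypothetical.

```python
class CachedChunk:
    """Toy chunk that re-renders only after one of its cells changes."""

    def __init__(self, width, height, fill="."):
        self.cells = [[fill] * width for _ in range(height)]
        self.dirty = True       # nothing has been rendered yet
        self.render_count = 0   # how many times the expensive path ran
        self._cached = None

    def set(self, x, y, value):
        # Mark the chunk dirty only when the cell actually changes
        if self.cells[y][x] != value:
            self.cells[y][x] = value
            self.dirty = True

    def render(self):
        # Rebuild the cached output only if something changed since last time
        if self.dirty:
            self._cached = "\n".join("".join(row) for row in self.cells)
            self.render_count += 1
            self.dirty = False
        return self._cached


chunk = CachedChunk(3, 2)
for _ in range(60):          # 60 "frames" with no changes
    chunk.render()
print(chunk.render_count)    # 1
chunk.set(0, 0, "#")
chunk.render()
print(chunk.render_count)    # 2
```

The static frames cost one render in total; only the frame after a real change pays for a rebuild.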
Strategy 2: Reduce Complexity
Example: Spatial queries
Instead of O(n) search through all entities, use entities_in_radius():
# Single spatial query instead of a Python loop over every entity
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
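For intuition about why spatial indexing helps, the sketch below shows a generic spatial hash in pure Python. It is an illustration of the technique, not McRogueFace's implementation of entities_in_radius() or of the planned SpatialHash; the class, its methods, and the cell size are all hypothetical.

```python
import math
from collections import defaultdict


class SpatialHash:
    """Buckets entities by grid cell so a radius query visits nearby cells only."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _key(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, entity_id, x, y):
        self.buckets[self._key(x, y)].append((entity_id, x, y))

    def query_radius(self, cx, cy, radius):
        # Visit only the cells overlapped by the query's bounding box
        min_kx, min_ky = self._key(cx - radius, cy - radius)
        max_kx, max_ky = self._key(cx + radius, cy + radius)
        found = []
        for kx in range(min_kx, max_kx + 1):
            for ky in range(min_ky, max_ky + 1):
                for entity_id, x, y in self.buckets.get((kx, ky), ()):
                    # Exact distance check for entities in candidate cells
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        found.append(entity_id)
        return found


index = SpatialHash(cell_size=8.0)
index.insert("a", 1, 1)
index.insert("b", 3, 4)
index.insert("c", 50, 50)
print(sorted(index.query_radius(0, 0, 5.0)))  # ['a', 'b']
```

The query cost depends on the number of cells covered and the entities inside them, not on the total entity count.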
Strategy 3: Batch Operations
Minimize Python/C++ boundary crossings:
# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))
Strategy 4: Cache Results
Example: Path caching
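# Single-entry cache: the path is recomputed only when the target changes.
# A change in start alone still returns the previously cached path.
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]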
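The single-entry cache above recomputes only when the target changes. A variation that keys on both endpoints and can be cleared when the map changes is sketched below. PathCache and straight_line are hypothetical; straight_line is a stand-in pathfinder so the example runs without the engine, and in practice a callable wrapping grid.find_path would be passed in.

```python
class PathCache:
    """Caches paths per (start, target) pair; clear it when obstacles change."""

    def __init__(self, find_path):
        self._find_path = find_path  # callable(start, target) -> list of cells
        self._paths = {}
        self.misses = 0              # number of real pathfinding calls

    def get(self, start, target):
        key = (start, target)
        if key not in self._paths:
            self._paths[key] = self._find_path(start, target)
            self.misses += 1
        return self._paths[key]

    def invalidate(self):
        # Cached paths may be stale once walls or other blockers move
        self._paths.clear()


def straight_line(start, target):
    # Hypothetical stand-in pathfinder: walk along x first, then along y
    (x, y), (tx, ty) = start, target
    path = []
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


cache = PathCache(straight_line)
cache.get((0, 0), (2, 1))
cache.get((0, 0), (2, 1))    # served from the cache
print(cache.misses)          # 1
```

The trade-off is memory: an unbounded dictionary grows with every distinct pair, so a long-running game would want a size limit or regular invalidation.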
Optimization Checklist
Before optimizing:
- Profiled and identified real bottleneck
- Measured baseline performance
- Understood root cause
After optimization:
- Measured improvement
- Verified correctness
- Updated tests if needed
Step 5: Verify - Measure Improvement
Re-run Benchmarks
Compare before and after measurements.
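One way to make the comparison concrete is to reduce both runs to averages and an improvement factor. This is a minimal sketch: compare is a hypothetical helper, and the frame times are made-up numbers, not measurements.

```python
def compare(baseline_ms, optimized_ms):
    """Summarize two lists of frame times (in ms) as averages and a speedup."""
    base_avg = sum(baseline_ms) / len(baseline_ms)
    opt_avg = sum(optimized_ms) / len(optimized_ms)
    return {
        "baseline_avg_ms": base_avg,
        "optimized_avg_ms": opt_avg,
        "saved_ms": base_avg - opt_avg,
        "improvement_factor": base_avg / opt_avg,
    }


# Made-up numbers for illustration only
result = compare([30.0, 30.0, 30.0], [10.0, 10.0, 10.0])
print(f"Baseline:  {result['baseline_avg_ms']:.2f}ms")
print(f"Optimized: {result['optimized_avg_ms']:.2f}ms")
print(f"Improvement: {result['improvement_factor']:.1f}x")
```

Both runs should use the same scenario and frame count; otherwise the factor reflects the difference in workload rather than the optimization.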
Check Correctness
Visual testing:
- Run game normally (not headless)
- Verify visual output unchanged
- Test edge cases
Automated testing:
cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py
Document Results
Add findings to the relevant Gitea issue with:
- Baseline numbers
- Optimized numbers
- Improvement factor
- Test script name
- Commit hash
When NOT to Optimize
Don't optimize if:
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere
Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
Related Documentation
- Performance-and-Profiling - Profiling tools reference
- Grid-Rendering-Pipeline - Chunk caching and dirty flags
- Grid-System - Grid optimization opportunities
- Writing-Tests - Creating performance tests