From 7f22636ca7c37c74ba28c12f2aa9d49cef333dd9 Mon Sep 17 00:00:00 2001
From: John McCardle
Date: Sat, 7 Feb 2026 23:49:36 +0000
Subject: [PATCH] Update Performance Optimization Workflow

---
 ...md => Performance-Optimization-Workflow.md | 508 +++++++++---------
 1 file changed, 254 insertions(+), 254 deletions(-)
 rename Performance-Optimization-Workflow.-.md => Performance-Optimization-Workflow.md (95%)

diff --git a/Performance-Optimization-Workflow.-.md b/Performance-Optimization-Workflow.md
similarity index 95%
rename from Performance-Optimization-Workflow.-.md
rename to Performance-Optimization-Workflow.md
index d95a4b7..db54c31 100644
--- a/Performance-Optimization-Workflow.-.md
+++ b/Performance-Optimization-Workflow.md
@@ -1,255 +1,255 @@
-# Performance Optimization Workflow
-
-Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
-
-## Quick Reference
-
-**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
-
-**Tools:**
-- F3 profiler overlay (in-game)
-- `src/Profiler.h` - ScopedTimer
-- Benchmark API (`mcrfpy.start_benchmark()`)
-
-## The Optimization Cycle
-
-```
-1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
-   ^                                                         |
-   +---------------------------------------------------------+
-```
-
----
-
-## Step 1: Profile - Find the Bottleneck
-
-### Using F3 Overlay
-
-**Start the game and press F3:**
-
-Look for:
-- **Red frame times** (>33ms) - Unacceptable performance
-- **Yellow frame times** (16-33ms) - Marginal performance
-- **High subsystem times** - Which system is slow?
-  - Grid rendering > 10ms? Grid optimization needed
-  - Entity rendering > 5ms? Entity culling needed
-  - Python script time > 5ms? Python callback optimization
-
-### Running Benchmarks
-
-Use the benchmark API for detailed data capture:
-
-```python
-import mcrfpy
-
-mcrfpy.start_benchmark()
-
-# ... run test scenario ...
-
-filename = mcrfpy.end_benchmark()
-print(f"Benchmark saved to: {filename}")
-```
-
-See [[Performance-and-Profiling]] for benchmark output format and analysis.
-
----
-
-## Step 2: Identify - Understand the Problem
-
-### Common Performance Issues
-
-**Issue: High Grid Render Time on Static Screens**
-- **Symptom:** 20-40ms grid render, nothing changing
-- **Cause:** Redrawing unchanged cells every frame
-- **Status:** Solved by chunk-based dirty flag system
-
-**Issue: High Entity Render Time with Many Entities**
-- **Symptom:** 10-20ms entity render with 500+ entities
-- **Cause:** O(n) iteration, no spatial indexing
-- **Solution:** [#115](../issues/115) SpatialHash (planned)
-
-**Issue: Slow Bulk Grid Updates from Python**
-- **Symptom:** Frame drops when updating many cells
-- **Cause:** Python/C++ boundary crossings for each cell
-- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
-
-**Issue: High Python Script Time**
-- **Symptom:** 10-50ms in Python callbacks
-- **Cause:** Heavy computation in Python update loops
-- **Solution:** Move hot paths to C++ or optimize Python
-
----
-
-## Step 3: Instrument - Measure Precisely
-
-### Adding ScopedTimer (C++)
-
-Wrap slow functions with timing:
-
-```cpp
-#include "Profiler.h"
-
-void MySystem::slowFunction() {
-    ScopedTimer timer(Resources::game->metrics.mySystemTime);
-    // ... code to measure ...
-}
-```
-
-### Adding Custom Metrics (C++)
-
-1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
-2. Reset in `resetPerFrame()`
-3. Display in `src/ProfilerOverlay.cpp::update()`
-4. Instrument with ScopedTimer
-5. Rebuild and press F3
-
-### Creating Python Benchmarks
-
-```python
-import mcrfpy
-import sys
-
-def benchmark():
-    scene = mcrfpy.Scene("bench")
-    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
-    scene.children.append(grid)
-    mcrfpy.current_scene = scene
-
-    frame_times = []
-
-    def measure(timer, runtime):
-        frame_times.append(runtime)
-        if len(frame_times) >= 300:
-            avg = sum(frame_times) / len(frame_times)
-            print(f"Average: {avg:.2f}ms")
-            print(f"Min: {min(frame_times):.2f}ms")
-            print(f"Max: {max(frame_times):.2f}ms")
-            print(f"FPS: {1000/avg:.1f}")
-            timer.stop()
-            sys.exit(0)
-
-    mcrfpy.Timer("benchmark", measure, 16)
-
-benchmark()
-```
-
-**Run:**
-```bash
-cd build
-./mcrogueface --exec ../tests/benchmark_mysystem.py
-```
-
-For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
-
----
-
-## Step 4: Optimize - Make It Faster
-
-### Strategy 1: Reduce Work
-
-**Example: Dirty Flags (already implemented)**
-
-Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
-
-### Strategy 2: Reduce Complexity
-
-**Example: Spatial queries**
-
-Instead of O(n) search through all entities, use `entities_in_radius()`:
-
-```python
-# O(1) spatial query
-nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
-```
-
-### Strategy 3: Batch Operations
-
-**Minimize Python/C++ boundary crossings:**
-
-```python
-# Less efficient: many individual layer.set() calls
-for x in range(100):
-    for y in range(100):
-        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
-
-# More efficient: single fill operation
-layer.fill(mcrfpy.Color(0, 0, 0, 0))
-```
-
-### Strategy 4: Cache Results
-
-**Example: Path caching**
-
-```python
-cached_path = [None]
-last_target = [None]
-
-def get_path_to(grid, start, target):
-    if last_target[0] != target:
-        cached_path[0] = grid.find_path(start, target)
-        last_target[0] = target
-    return cached_path[0]
-```
-
-### Optimization Checklist
-
-Before optimizing:
-- [ ] Profiled and identified real bottleneck
-- [ ] Measured baseline performance
-- [ ] Understood root cause
-
-After optimization:
-- [ ] Measured improvement
-- [ ] Verified correctness
-- [ ] Updated tests if needed
-
----
-
-## Step 5: Verify - Measure Improvement
-
-### Re-run Benchmarks
-
-Compare before and after measurements.
-
-### Check Correctness
-
-**Visual testing:**
-1. Run game normally (not headless)
-2. Verify visual output unchanged
-3. Test edge cases
-
-**Automated testing:**
-```bash
-cd build
-./mcrogueface --headless --exec ../tests/unit/my_test.py
-```
-
-### Document Results
-
-Add findings to the relevant Gitea issue with:
-- Baseline numbers
-- Optimized numbers
-- Improvement factor
-- Test script name
-- Commit hash
-
----
-
-## When NOT to Optimize
-
-**Don't optimize if:**
-- Performance is already acceptable (< 16ms frame time)
-- Optimization makes code significantly more complex
-- You haven't profiled yet (no guessing!)
-- The bottleneck is elsewhere
-
-Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
-
----
-
-## Related Documentation
-
-- [[Performance-and-Profiling]] - Profiling tools reference
-- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
-- [[Grid-System]] - Grid optimization opportunities
+# Performance Optimization Workflow
+
+Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
+
+## Quick Reference
+
+**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
+
+**Tools:**
+- F3 profiler overlay (in-game)
+- `src/Profiler.h` - ScopedTimer
+- Benchmark API (`mcrfpy.start_benchmark()`)
+
+## The Optimization Cycle
+
+```
+1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
+   ^                                                         |
+   +---------------------------------------------------------+
+```
+
+---
+
+## Step 1: Profile - Find the Bottleneck
+
+### Using F3 Overlay
+
+**Start the game and press F3:**
+
+Look for:
+- **Red frame times** (>33ms) - Unacceptable performance
+- **Yellow frame times** (16-33ms) - Marginal performance
+- **High subsystem times** - Which system is slow?
+  - Grid rendering > 10ms? Grid optimization needed
+  - Entity rendering > 5ms? Entity culling needed
+  - Python script time > 5ms? Python callback optimization
+
+### Running Benchmarks
+
+Use the benchmark API for detailed data capture:
+
+```python
+import mcrfpy
+
+mcrfpy.start_benchmark()
+
+# ... run test scenario ...
+
+filename = mcrfpy.end_benchmark()
+print(f"Benchmark saved to: {filename}")
+```
+
+See [[Performance-and-Profiling]] for benchmark output format and analysis.
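Whatever the capture format documented in [[Performance-and-Profiling]], the same summary statistics are worth pulling out of a run. A minimal pure-Python sketch — the `summarize` helper is illustrative, not part of the `mcrfpy` API — over a list of per-frame times in milliseconds:

```python
# Hypothetical helper: summarize per-frame times (in ms) from a benchmark run.
# Engine-independent; adapt the input to the actual benchmark file format.

def summarize(frame_times_ms):
    """Return avg/min/max/p95 frame time and effective FPS."""
    ordered = sorted(frame_times_ms)
    avg = sum(ordered) / len(ordered)
    # p95: the frame time 95% of frames beat; spikes hide in this tail.
    p95 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.95))]
    return {
        "avg_ms": avg,
        "min_ms": ordered[0],
        "max_ms": ordered[-1],
        "p95_ms": p95,
        "fps": 1000.0 / avg,
    }

if __name__ == "__main__":
    stats = summarize([16.0, 16.5, 17.0, 33.0])
    print(f"avg {stats['avg_ms']:.2f}ms, p95 {stats['p95_ms']:.2f}ms, {stats['fps']:.1f} FPS")
```

Average frame time alone hides stutter; reporting a p95 alongside it flags the occasional slow frames that the mean smooths over.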
+
+---
+
+## Step 2: Identify - Understand the Problem
+
+### Common Performance Issues
+
+**Issue: High Grid Render Time on Static Screens**
+- **Symptom:** 20-40ms grid render, nothing changing
+- **Cause:** Redrawing unchanged cells every frame
+- **Status:** Solved by chunk-based dirty flag system
+
+**Issue: High Entity Render Time with Many Entities**
+- **Symptom:** 10-20ms entity render with 500+ entities
+- **Cause:** O(n) iteration, no spatial indexing
+- **Solution:** [#115](../issues/115) SpatialHash (planned)
+
+**Issue: Slow Bulk Grid Updates from Python**
+- **Symptom:** Frame drops when updating many cells
+- **Cause:** Python/C++ boundary crossings for each cell
+- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
+
+**Issue: High Python Script Time**
+- **Symptom:** 10-50ms in Python callbacks
+- **Cause:** Heavy computation in Python update loops
+- **Solution:** Move hot paths to C++ or optimize Python
+
+---
+
+## Step 3: Instrument - Measure Precisely
+
+### Adding ScopedTimer (C++)
+
+Wrap slow functions with timing:
+
+```cpp
+#include "Profiler.h"
+
+void MySystem::slowFunction() {
+    ScopedTimer timer(Resources::game->metrics.mySystemTime);
+    // ... code to measure ...
+}
+```
+
+### Adding Custom Metrics (C++)
+
+1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
+2. Reset in `resetPerFrame()`
+3. Display in `src/ProfilerOverlay.cpp::update()`
+4. Instrument with ScopedTimer
+5. Rebuild and press F3
+
+### Creating Python Benchmarks
+
+```python
+import mcrfpy
+import sys
+
+def benchmark():
+    scene = mcrfpy.Scene("bench")
+    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
+    scene.children.append(grid)
+    mcrfpy.current_scene = scene
+
+    frame_times = []
+
+    def measure(timer, runtime):
+        frame_times.append(runtime)
+        if len(frame_times) >= 300:
+            avg = sum(frame_times) / len(frame_times)
+            print(f"Average: {avg:.2f}ms")
+            print(f"Min: {min(frame_times):.2f}ms")
+            print(f"Max: {max(frame_times):.2f}ms")
+            print(f"FPS: {1000/avg:.1f}")
+            timer.stop()
+            sys.exit(0)
+
+    mcrfpy.Timer("benchmark", measure, 16)
+
+benchmark()
+```
+
+**Run:**
+```bash
+cd build
+./mcrogueface --exec ../tests/benchmark_mysystem.py
+```
+
+For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
+
+---
+
+## Step 4: Optimize - Make It Faster
+
+### Strategy 1: Reduce Work
+
+**Example: Dirty Flags (already implemented)**
+
+Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
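The same dirty-flag idea pays off in game scripts for any expensive derived data (minimaps, FOV overlays, influence maps): recompute only when the inputs actually changed. A self-contained sketch of the pattern in plain Python — `CachedOverlay` is illustrative, not an engine class:

```python
# Dirty-flag pattern at the script level: skip recomputation when inputs
# haven't changed. The build function stands in for any expensive rebuild.

class CachedOverlay:
    def __init__(self, build_fn):
        self._build_fn = build_fn
        self._cache = None
        self._dirty = True      # force a build on first use
        self.rebuilds = 0       # instrumentation: count actual rebuilds

    def mark_dirty(self):
        """Call whenever the underlying data changes."""
        self._dirty = True

    def get(self):
        """Return the cached result, rebuilding only if marked dirty."""
        if self._dirty:
            self._cache = self._build_fn()
            self._dirty = False
            self.rebuilds += 1
        return self._cache

if __name__ == "__main__":
    overlay = CachedOverlay(lambda: [[0] * 100 for _ in range(100)])
    for _ in range(60):          # 60 "frames" with no changes
        overlay.get()
    print(overlay.rebuilds)      # only the initial build ran
```

The per-frame cost on a static screen drops to a flag check; the price is that every code path that mutates the underlying data must remember to call `mark_dirty()`.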
+
+### Strategy 2: Reduce Complexity
+
+**Example: Spatial queries**
+
+Instead of O(n) search through all entities, use `entities_in_radius()`:
+
+```python
+# O(1) spatial query
+nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
+```
+
+### Strategy 3: Batch Operations
+
+**Minimize Python/C++ boundary crossings:**
+
+```python
+# Less efficient: many individual layer.set() calls
+for x in range(100):
+    for y in range(100):
+        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
+
+# More efficient: single fill operation
+layer.fill(mcrfpy.Color(0, 0, 0, 0))
+```
+
+### Strategy 4: Cache Results
+
+**Example: Path caching**
+
+```python
+cached_path = [None]
+last_target = [None]
+
+def get_path_to(grid, start, target):
+    if last_target[0] != target:
+        cached_path[0] = grid.find_path(start, target)
+        last_target[0] = target
+    return cached_path[0]
+```
+
+### Optimization Checklist
+
+Before optimizing:
+- [ ] Profiled and identified real bottleneck
+- [ ] Measured baseline performance
+- [ ] Understood root cause
+
+After optimization:
+- [ ] Measured improvement
+- [ ] Verified correctness
+- [ ] Updated tests if needed
+
+---
+
+## Step 5: Verify - Measure Improvement
+
+### Re-run Benchmarks
+
+Compare before and after measurements.
+
+### Check Correctness
+
+**Visual testing:**
+1. Run game normally (not headless)
+2. Verify visual output unchanged
+3. Test edge cases
+
+**Automated testing:**
+```bash
+cd build
+./mcrogueface --headless --exec ../tests/unit/my_test.py
+```
+
+### Document Results
+
+Add findings to the relevant Gitea issue with:
+- Baseline numbers
+- Optimized numbers
+- Improvement factor
+- Test script name
+- Commit hash
+
+---
+
+## When NOT to Optimize
+
+**Don't optimize if:**
+- Performance is already acceptable (< 16ms frame time)
+- Optimization makes code significantly more complex
+- You haven't profiled yet (no guessing!)
+- The bottleneck is elsewhere
+
+Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
+
+---
+
+## Related Documentation
+
+- [[Performance-and-Profiling]] - Profiling tools reference
+- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
+- [[Grid-System]] - Grid optimization opportunities
+- [[Writing-Tests]] - Creating performance tests
\ No newline at end of file