From 070d5bb813e1046dba395a2fec4ed633897dd067 Mon Sep 17 00:00:00 2001
From: John McCardle
Date: Sat, 7 Feb 2026 22:33:36 +0000
Subject: [PATCH] Update "Performance-Optimization-Workflow.-"

---
 Performance-Optimization-Workflow.-.md | 255 +++++++++++++
 Performance-Optimization-Workflow.md   | 494 -------------------------
 2 files changed, 255 insertions(+), 494 deletions(-)
 create mode 100644 Performance-Optimization-Workflow.-.md
 delete mode 100644 Performance-Optimization-Workflow.md

diff --git a/Performance-Optimization-Workflow.-.md b/Performance-Optimization-Workflow.-.md
new file mode 100644
index 0000000..d95a4b7
--- /dev/null
+++ b/Performance-Optimization-Workflow.-.md
@@ -0,0 +1,255 @@
+# Performance Optimization Workflow
+
+Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
+
+## Quick Reference
+
+**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
+
+**Tools:**
+- F3 profiler overlay (in-game)
+- `src/Profiler.h` - ScopedTimer
+- Benchmark API (`mcrfpy.start_benchmark()`)
+
+## The Optimization Cycle
+
+```
+1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
+    ^                                                         |
+    +---------------------------------------------------------+
+```
+
+---
+
+## Step 1: Profile - Find the Bottleneck
+
+### Using F3 Overlay
+
+**Start the game and press F3:**
+
+Look for:
+- **Red frame times** (>33ms) - Unacceptable performance
+- **Yellow frame times** (16-33ms) - Marginal performance
+- **High subsystem times** - Which system is slow?
+  - Grid rendering > 10ms? Grid optimization needed
+  - Entity rendering > 5ms? Entity culling needed
+  - Python script time > 5ms? Python callback optimization
+
+### Running Benchmarks
+
+Use the benchmark API for detailed data capture:
+
+```python
+import mcrfpy
+
+mcrfpy.start_benchmark()
+
+# ... run test scenario ...
+
+filename = mcrfpy.end_benchmark()
+print(f"Benchmark saved to: {filename}")
+```
+
+See [[Performance-and-Profiling]] for benchmark output format and analysis.
+
+---
+
+## Step 2: Identify - Understand the Problem
+
+### Common Performance Issues
+
+**Issue: High Grid Render Time on Static Screens**
+- **Symptom:** 20-40ms grid render, nothing changing
+- **Cause:** Redrawing unchanged cells every frame
+- **Status:** Solved by chunk-based dirty flag system
+
+**Issue: High Entity Render Time with Many Entities**
+- **Symptom:** 10-20ms entity render with 500+ entities
+- **Cause:** O(n) iteration, no spatial indexing
+- **Solution:** [#115](../issues/115) SpatialHash (planned)
+
+**Issue: Slow Bulk Grid Updates from Python**
+- **Symptom:** Frame drops when updating many cells
+- **Cause:** Python/C++ boundary crossings for each cell
+- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
+
+**Issue: High Python Script Time**
+- **Symptom:** 10-50ms in Python callbacks
+- **Cause:** Heavy computation in Python update loops
+- **Solution:** Move hot paths to C++ or optimize Python
+
+---
+
+## Step 3: Instrument - Measure Precisely
+
+### Adding ScopedTimer (C++)
+
+Wrap slow functions with timing:
+
+```cpp
+#include "Profiler.h"
+
+void MySystem::slowFunction() {
+    ScopedTimer timer(Resources::game->metrics.mySystemTime);
+    // ... code to measure ...
+}
+```
+
+### Adding Custom Metrics (C++)
+
+1. Add a field to `ProfilingMetrics` in `src/GameEngine.h`
+2. Reset it in `ProfilingMetrics::resetPerFrame()`
+3. Display it in `ProfilerOverlay::update()` (`src/ProfilerOverlay.cpp`)
+4. Wrap the code being measured in a `ScopedTimer`
+5. Rebuild and press F3
+
+### Creating Python Benchmarks
+
+```python
+import mcrfpy
+import sys
+
+def benchmark():
+    scene = mcrfpy.Scene("bench")
+    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
+    scene.children.append(grid)
+    mcrfpy.current_scene = scene
+
+    stamps = []  # timer runtime (ms) at each callback; deltas approximate frame times
+
+    def measure(timer, runtime):
+        stamps.append(runtime)
+        if len(stamps) > 300:
+            frame_times = [b - a for a, b in zip(stamps, stamps[1:])]
+            avg = sum(frame_times) / len(frame_times)
+            print(f"Average: {avg:.2f}ms")
+            print(f"Min: {min(frame_times):.2f}ms  Max: {max(frame_times):.2f}ms")
+            print(f"FPS: {1000/avg:.1f}")
+            timer.stop()
+            sys.exit(0)
+
+    mcrfpy.Timer("benchmark", measure, 16)
+
+benchmark()
+```
+
+**Run:**
+```bash
+cd build
+./mcrogueface --exec ../tests/benchmark_mysystem.py
+```
+
+For headless benchmarks, time the work with Python's `time` module rather than the Timer API, because `step()` bypasses the normal game loop.
+
+---
+
+## Step 4: Optimize - Make It Faster
+
+### Strategy 1: Reduce Work
+
+**Example: Dirty Flags (already implemented)**
+
+Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
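+
+The same dirty-flag idea applies to expensive values computed in your own Python callbacks. Below is a minimal, engine-independent sketch; `DirtyCache` is a hypothetical helper (not part of `mcrfpy`), and the `sum(range(...))` lambda stands in for real per-frame work.
+
+```python
+class DirtyCache:
+    """Hypothetical helper: cache a derived value until invalidate() is called."""
+
+    def __init__(self, compute):
+        self._compute = compute  # expensive function that produces the value
+        self._value = None
+        self._dirty = True       # start dirty so the first get() computes
+        self.recomputes = 0      # number of times compute() actually ran
+
+    def invalidate(self):
+        # Call whenever an input to compute() changes
+        self._dirty = True
+
+    def get(self):
+        if self._dirty:
+            self._value = self._compute()
+            self._dirty = False
+            self.recomputes += 1
+        return self._value
+
+
+# Simulated frames: the expensive work runs once, not once per frame
+total = DirtyCache(lambda: sum(range(10_000)))
+for frame in range(60):
+    total.get()
+total.invalidate()  # e.g. the underlying data changed
+total.get()
+print(total.recomputes)  # 2
+```
+
+Call `invalidate()` from whatever code changes the inputs (for example, after moving an entity), so per-frame reads stay cheap until the data actually changes.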
+
+### Strategy 2: Reduce Complexity
+
+**Example: Spatial queries**
+
+Instead of an O(n) scan over every entity in Python, use `grid.entities_in_radius()`:
+
+```python
+# Spatial query: avoids a Python loop over all entities
+nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
+```
+
+### Strategy 3: Batch Operations
+
+**Minimize Python/C++ boundary crossings:**
+
+```python
+# Less efficient: 10,000 individual layer.set() calls
+for x in range(100):
+    for y in range(100):
+        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
+
+# More efficient: single fill operation
+layer.fill(mcrfpy.Color(0, 0, 0, 0))
+```
+
+### Strategy 4: Cache Results
+
+**Example: Path caching**
+
+```python
+cached_path = [None]
+last_target = [None]
+
+def get_path_to(grid, start, target):
+    if last_target[0] != target:  # recompute only when the target changes
+        cached_path[0] = grid.find_path(start, target)
+        last_target[0] = target
+    return cached_path[0]
+```
+
+### Optimization Checklist
+
+Before optimizing:
+- [ ] Profiled and identified the real bottleneck
+- [ ] Measured baseline performance
+- [ ] Understood the root cause
+
+After optimization:
+- [ ] Measured the improvement
+- [ ] Verified correctness
+- [ ] Updated tests if needed
+
+---
+
+## Step 5: Verify - Measure Improvement
+
+### Re-run Benchmarks
+
+Re-run the same benchmark used for the baseline and compare the before and after numbers.
+
+### Check Correctness
+
+**Visual testing:**
+1. Run the game normally (not headless)
+2. Verify the visual output is unchanged
+3. Test edge cases
+
+**Automated testing:**
+```bash
+cd build
+./mcrogueface --headless --exec ../tests/unit/my_test.py
+```
+
+### Document Results
+
+Add findings to the relevant Gitea issue with:
+- Baseline numbers
+- Optimized numbers
+- Improvement factor
+- Test script name
+- Commit hash
+
+---
+
+## When NOT to Optimize
+
+**Don't optimize if:**
+- Performance is already acceptable (< 16ms frame time)
+- Optimization makes code significantly more complex
+- You haven't profiled yet (no guessing!)
+- The bottleneck is elsewhere
+
+Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
+
+---
+
+## Related Documentation
+
+- [[Performance-and-Profiling]] - Profiling tools reference
+- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
+- [[Grid-System]] - Grid optimization opportunities
+- [[Writing-Tests]] - Creating performance tests
\ No newline at end of file
diff --git a/Performance-Optimization-Workflow.md b/Performance-Optimization-Workflow.md
deleted file mode 100644
index e062eba..0000000
--- a/Performance-Optimization-Workflow.md
+++ /dev/null
@@ -1,494 +0,0 @@
-# Performance Optimization Workflow
-
-Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
-
-## Quick Reference
-
-**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
-
-**Tools:**
-- F3 profiler overlay (in-game)
-- `src/Profiler.h` - ScopedTimer
-- `tests/benchmark_*.py` - Performance benchmarks
-
-## The Optimization Cycle
-
-```
-1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
-    ↑                                                         |
-    └─────────────────────────────────────────────────────────┘
-```
-
----
-
-## Step 1: Profile - Find the Bottleneck
-
-### Using F3 Overlay
-
-**Start the game and press F3:**
-
-Look for:
-- **Red frame times** (>33ms) - Unacceptable performance
-- **Yellow frame times** (16-33ms) - Marginal performance
-- **High subsystem times** - Which system is slow?
-  - Grid rendering > 10ms? → Grid optimization needed
-  - Entity rendering > 5ms? → Entity culling/SpatialHash needed
-  - Python script time > 5ms? → Python callback optimization
-
-**Example profiler output:**
-```
-Frame: 45.2ms (RED)
-FPS: 22
-
-Grid Render: 32.1ms ← BOTTLENECK!
-Entity Render: 8.5ms
-Python Script: 2.1ms
-Animation: 1.2ms
-
-Cells Rendered: 10000
-Entities: 150 (visible: 50)
-```
-
-**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total).
-
-### Running Benchmarks
-
-**Static Grid Benchmark:**
-```bash
-cd build
-./mcrogueface --headless --exec ../tests/benchmark_static_grid.py
-```
-
-**Moving Entities Benchmark:**
-```bash
-./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py
-```
-
-**Output:** Baseline performance metrics to measure improvement against.
-
----
-
-## Step 2: Identify - Understand the Problem
-
-### Common Performance Issues
-
-**Issue: High Grid Render Time on Static Screens**
-- **Symptom:** 20-40ms grid render, nothing changing
-- **Cause:** Redrawing unchanged cells every frame
-- **Solution:** Implement dirty flag system ([#116](../issues/116))
-
-**Issue: High Entity Render Time with Many Entities**
-- **Symptom:** 10-20ms entity render with 500+ entities
-- **Cause:** O(n) iteration, no spatial indexing
-- **Solution:** Implement SpatialHash ([#115](../issues/115))
-
-**Issue: Slow Bulk Grid Updates from Python**
-- **Symptom:** Frame drops when updating many cells
-- **Cause:** Python/C++ boundary crossings for each cell
-- **Solution:** Implement batch operations ([#113](../issues/113))
-
-**Issue: High Python Script Time**
-- **Symptom:** 10-50ms in Python callbacks
-- **Cause:** Heavy computation in Python update loops
-- **Solution:** Move hot paths to C++ or optimize Python
-
-### Analyzing Call Stacks
-
-**When F3 overlay isn't enough:**
-
-```bash
-# Build with debug symbols
-make clean
-cmake .. -DCMAKE_BUILD_TYPE=Debug
-make
-
-# Profile with gdb
-gdb ./mcrogueface
-(gdb) run
-
-(gdb) info threads
-(gdb) bt  # Backtrace
-```
-
----
-
-## Step 3: Instrument - Measure Precisely
-
-### Adding ScopedTimer
-
-**Identify the slow function and wrap it:**
-
-```cpp
-#include "Profiler.h"
-
-void MySystem::slowFunction() {
-    // Add timer
-    ScopedTimer timer(Resources::game->metrics.mySystemTime);
-
-    // Your slow code here
-    for (auto& item : items) {
-        item->process();
-    }
-}
-```
-
-### Adding Custom Metrics
-
-**1. Add metric field to ProfilingMetrics:**
-
-`src/GameEngine.h`:
-```cpp
-struct ProfilingMetrics {
-    // ... existing fields ...
-    float mySystemTime = 0.0f;  // Add this
-};
-```
-
-**2. Reset in resetPerFrame():**
-
-```cpp
-void ProfilingMetrics::resetPerFrame() {
-    // ... existing resets ...
-    mySystemTime = 0.0f;  // Add this
-}
-```
-
-**3. Display in ProfilerOverlay:**
-
-`src/ProfilerOverlay.cpp`:
-```cpp
-void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
-    // ... existing formatting ...
-    ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
-}
-```
-
-**4. Rebuild and test:**
-
-```bash
-make
-cd build
-./mcrogueface
-# Press F3 - see your metric!
-```
-
-### Creating Benchmarks
-
-**Create `tests/benchmark_mysystem.py`:**
-
-```python
-import mcrfpy
-import sys
-import time
-
-def benchmark():
-    # Setup
-    mcrfpy.createScene("bench")
-    # ... create test scenario ...
-
-    frame_times = []
-
-    def measure(runtime_ms):
-        frame_times.append(runtime_ms)
-        if len(frame_times) >= 300:  # 5 seconds at 60fps
-            # Report statistics
-            avg = sum(frame_times) / len(frame_times)
-            min_time = min(frame_times)
-            max_time = max(frame_times)
-
-            print(f"Average: {avg:.2f}ms")
-            print(f"Min: {min_time:.2f}ms")
-            print(f"Max: {max_time:.2f}ms")
-            print(f"FPS: {1000/avg:.1f}")
-
-            sys.exit(0)
-
-    mcrfpy.setTimer("benchmark", measure, 16)  # Every frame
-
-benchmark()
-```
-
-**Run:**
-```bash
-./mcrogueface --headless --exec ../tests/benchmark_mysystem.py
-```
-
----
-
-## Step 4: Optimize - Make It Faster
-
-### Optimization Strategies
-
-#### Strategy 1: Reduce Work
-
-**Example: Grid Dirty Flags**
-
-**Before:** Redraw all cells every frame
-```cpp
-for (int x = 0; x < grid_x; x++) {
-    for (int y = 0; y < grid_y; y++) {
-        renderCell(x, y);  // Always renders
-    }
-}
-```
-
-**After:** Only redraw when changed
-```cpp
-if (grid_dirty) {
-    for (int x = 0; x < grid_x; x++) {
-        for (int y = 0; y < grid_y; y++) {
-            renderCell(x, y);
-        }
-    }
-    grid_dirty = false;
-}
-```
-
-**Expected:** 10-50x improvement for static scenes
-
-#### Strategy 2: Reduce Complexity
-
-**Example: SpatialHash for Entity Queries**
-
-**Before:** O(n) search through all entities
-```cpp
-for (auto& entity : entities) {
-    if (distanceTo(entity, target) < radius) {
-        nearby.push_back(entity);
-    }
-}
-```
-
-**After:** O(1) hash lookup
-```cpp
-auto cell = spatialHash.getCell(target.x, target.y);
-for (auto& entity : cell.entities) {
-    nearby.push_back(entity);
-}
-```
-
-**Expected:** 100x+ improvement for large entity counts
-
-#### Strategy 3: Batch Operations
-
-**Example: Grid Batch Updates**
-
-**Before:** Multiple Python/C++ crossings
-```python
-for x in range(100):
-    for y in range(100):
-        grid.at((x, y)).tilesprite = 42  # 10,000 calls!
-```
-
-**After:** Single batch operation
-```python
-grid.fill_rect(0, 0, 100, 100, 42)  # 1 call!
-```
-
-**Expected:** 10-100x improvement for bulk updates
-
-#### Strategy 4: Cache Results
-
-**Example: Path Caching in Entities**
-
-**Before:** Recompute path every frame
-```cpp
-void Entity::update() {
-    path = computePathTo(target);  // Expensive!
-    followPath(path);
-}
-```
-
-**After:** Cache and reuse
-```cpp
-void Entity::update() {
-    if (!cachedPath || targetMoved) {
-        cachedPath = computePathTo(target);
-    }
-    followPath(cachedPath);
-}
-```
-
-**Expected:** 10x+ improvement for pathfinding-heavy scenarios
-
-### Optimization Checklist
-
-Before optimizing:
-- [ ] Profiled and identified real bottleneck
-- [ ] Measured baseline performance
-- [ ] Understood root cause
-
-During optimization:
-- [ ] Changed only one thing at a time
-- [ ] Kept original code for comparison
-- [ ] Added comments explaining optimization
-
-After optimization:
-- [ ] Measured improvement
-- [ ] Verified correctness (no bugs introduced)
-- [ ] Updated tests if needed
-
----
-
-## Step 5: Verify - Measure Improvement
-
-### Re-run Benchmarks
-
-**Before optimization:**
-```
-Average: 45.2ms
-Min: 38.1ms
-Max: 62.3ms
-FPS: 22.1
-```
-
-**After optimization:**
-```
-Average: 8.5ms ← 5.3x improvement!
-Min: 7.2ms
-Max: 12.1ms
-FPS: 117.6
-```
-
-### Check Correctness
-
-**Visual testing:**
-1. Run game normally (not headless)
-2. Verify visual output unchanged
-3. Test edge cases (empty grids, max entities, etc.)
-
-**Automated testing:**
-```bash
-# Run existing test suite
-./mcrogueface --headless --exec tests/test_grid_operations.py
-./mcrogueface --headless --exec tests/test_entity_movement.py
-```
-
-### Document Results
-
-**Create issue comment:**
-
-```markdown
-## Performance Optimization Results
-
-**Issue:** #116 (Dirty Flag System)
-
-**Baseline:**
-- Static grid (100x100): 32.1ms average
-- FPS: 22
-
-**After Optimization:**
-- Static grid (100x100): 0.8ms average
-- FPS: 118
-
-**Improvement:** 40x faster for static scenes
-
-**Test:** `tests/benchmark_static_grid.py`
-
-**Commit:** abc123def
-```
-
----
-
-## Common Optimization Patterns
-
-### Pattern 1: Early Exit
-
-```cpp
-// Check cheap conditions first
-if (!visible) return;
-if (opacity <= 0.0f) return;
-if (!inViewport(bounds)) return;
-
-// Then do expensive work
-render();
-```
-
-### Pattern 2: Lazy Evaluation
-
-```cpp
-// Don't compute until needed
-mutable bool fovComputed = false;
-mutable std::vector visibleCells;
-
-std::vector& getVisibleCells() {
-    if (!fovComputed) {
-        visibleCells = computeFOV();
-        fovComputed = true;
-    }
-    return visibleCells;
-}
-```
-
-### Pattern 3: Object Pooling
-
-```cpp
-// Reuse instead of allocate/deallocate
-class EntityPool {
-    std::vector pool;
-    std::vector active;
-
-public:
-    Entity* spawn() {
-        for (size_t i = 0; i < pool.size(); i++) {
-            if (!active[i]) {
-                active[i] = true;
-                return &pool[i];
-            }
-        }
-        // Grow pool if needed
-        pool.emplace_back();
-        active.push_back(true);
-        return &pool.back();
-    }
-};
-```
-
-### Pattern 4: Space-Time Tradeoff
-
-```cpp
-// Cache expensive computation
-std::unordered_map> pathCache;
-
-std::vector getPathTo(int targetId) {
-    if (pathCache.contains(targetId)) {
-        return pathCache[targetId];  // O(1) lookup
-    }
-
-    auto path = computeExpensivePath(targetId);
-    pathCache[targetId] = path;
-    return path;
-}
-```
-
----
-
-## When NOT to Optimize
-
-**Don't optimize if:**
-- Performance is already acceptable (< 16ms frame time)
-- Optimization makes code significantly more complex
-- You haven't profiled yet (no guessing!)
-- The bottleneck is elsewhere (optimize hot paths first)
-
-**Premature optimization is the root of all evil** - Donald Knuth
-
-Focus on:
-1. Correctness first
-2. Profile to find real bottlenecks
-3. Optimize hot paths only
-4. Keep code maintainable
-
----
-
-## Related Documentation
-
-- [[Performance-and-Profiling]] - Profiling tools reference
-- [[Grid-System]] - Grid optimization opportunities
-- [[Writing-Tests]] - Creating performance tests
-
-**Open Issues:**
-- [#115](../issues/115) - SpatialHash Implementation
-- [#116](../issues/116) - Dirty Flag System
-- [#113](../issues/113) - Batch Operations for Grid
-- [#117](../issues/117) - Memory Pool for Entities
\ No newline at end of file