Update Performance Optimization Workflow

2026-02-07 23:49:36 +00:00 · 7f22636ca7
commit 7f22636ca7
parent 1c4d82ccc9
1 changed file with 254 additions and 254 deletions
--- a/Performance-Optimization-Workflow.-.md
+++ b/Performance-Optimization-Workflow.-.md
@@ -1,255 +1,255 @@
-# Performance Optimization Workflow
-
-Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
-
-## Quick Reference
-
-**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
-
-**Tools:**
-- F3 profiler overlay (in-game)
-- `src/Profiler.h` - ScopedTimer
-- Benchmark API (`mcrfpy.start_benchmark()`)
-
-## The Optimization Cycle
-
-```
-1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
-     ^                                                         |
-     +---------------------------------------------------------+
-```
-
---
-
-## Step 1: Profile - Find the Bottleneck
-
-### Using F3 Overlay
-
-**Start the game and press F3:**
-
-Look for:
-- **Red frame times** (>33ms) - Unacceptable performance
-- **Yellow frame times** (16-33ms) - Marginal performance
-- **High subsystem times** - Which system is slow?
-  - Grid rendering > 10ms? Grid optimization needed
-  - Entity rendering > 5ms? Entity culling needed
-  - Python script time > 5ms? Python callback optimization
-
-### Running Benchmarks
-
-Use the benchmark API for detailed data capture:
-
-```python
-import mcrfpy
-
-mcrfpy.start_benchmark()
-
-# ... run test scenario ...
-
-filename = mcrfpy.end_benchmark()
-print(f"Benchmark saved to: {filename}")
-```
-
-See [[Performance-and-Profiling]] for benchmark output format and analysis.
-
---
-
-## Step 2: Identify - Understand the Problem
-
-### Common Performance Issues
-
-**Issue: High Grid Render Time on Static Screens**
-- **Symptom:** 20-40ms grid render, nothing changing
-- **Cause:** Redrawing unchanged cells every frame
-- **Status:** Solved by chunk-based dirty flag system
-
-**Issue: High Entity Render Time with Many Entities**
-- **Symptom:** 10-20ms entity render with 500+ entities
-- **Cause:** O(n) iteration, no spatial indexing
-- **Solution:** [#115](../issues/115) SpatialHash (planned)
-
-**Issue: Slow Bulk Grid Updates from Python**
-- **Symptom:** Frame drops when updating many cells
-- **Cause:** Python/C++ boundary crossings for each cell
-- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
-
-**Issue: High Python Script Time**
-- **Symptom:** 10-50ms in Python callbacks
-- **Cause:** Heavy computation in Python update loops
-- **Solution:** Move hot paths to C++ or optimize Python
-
---
-
-## Step 3: Instrument - Measure Precisely
-
-### Adding ScopedTimer (C++)
-
-Wrap slow functions with timing:
-
-```cpp
-#include "Profiler.h"
-
-void MySystem::slowFunction() {
-    ScopedTimer timer(Resources::game->metrics.mySystemTime);
-    // ... code to measure ...
-}
-```
-
-### Adding Custom Metrics (C++)
-
-1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
-2. Reset in `resetPerFrame()`
-3. Display in `src/ProfilerOverlay.cpp::update()`
-4. Instrument with ScopedTimer
-5. Rebuild and press F3
-
-### Creating Python Benchmarks
-
-```python
-import mcrfpy
-import sys
-
-def benchmark():
-    scene = mcrfpy.Scene("bench")
-    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
-    scene.children.append(grid)
-    mcrfpy.current_scene = scene
-    
-    frame_times = []
-    
-    def measure(timer, runtime):
-        frame_times.append(runtime)
-        if len(frame_times) >= 300:
-            avg = sum(frame_times) / len(frame_times)
-            print(f"Average: {avg:.2f}ms")
-            print(f"Min: {min(frame_times):.2f}ms")
-            print(f"Max: {max(frame_times):.2f}ms")
-            print(f"FPS: {1000/avg:.1f}")
-            timer.stop()
-            sys.exit(0)
-    
-    mcrfpy.Timer("benchmark", measure, 16)
-
-benchmark()
-```
-
-**Run:**
-```bash
-cd build
-./mcrogueface --exec ../tests/benchmark_mysystem.py
-```
-
-For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
-
---
-
-## Step 4: Optimize - Make It Faster
-
-### Strategy 1: Reduce Work
-
-**Example: Dirty Flags (already implemented)**
-
-Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
-
-### Strategy 2: Reduce Complexity
-
-**Example: Spatial queries**
-
-Instead of O(n) search through all entities, use `entities_in_radius()`:
-
-```python
-# O(1) spatial query
-nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
-```
-
-### Strategy 3: Batch Operations
-
-**Minimize Python/C++ boundary crossings:**
-
-```python
-# Less efficient: many individual layer.set() calls
-for x in range(100):
-    for y in range(100):
-        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
-
-# More efficient: single fill operation
-layer.fill(mcrfpy.Color(0, 0, 0, 0))
-```
-
-### Strategy 4: Cache Results
-
-**Example: Path caching**
-
-```python
-cached_path = [None]
-last_target = [None]
-
-def get_path_to(grid, start, target):
-    if last_target[0] != target:
-        cached_path[0] = grid.find_path(start, target)
-        last_target[0] = target
-    return cached_path[0]
-```
-
-### Optimization Checklist
-
-Before optimizing:
-- [ ] Profiled and identified real bottleneck
-- [ ] Measured baseline performance
-- [ ] Understood root cause
-
-After optimization:
-- [ ] Measured improvement
-- [ ] Verified correctness
-- [ ] Updated tests if needed
-
---
-
-## Step 5: Verify - Measure Improvement
-
-### Re-run Benchmarks
-
-Compare before and after measurements.
-
-### Check Correctness
-
-**Visual testing:**
-1. Run game normally (not headless)
-2. Verify visual output unchanged
-3. Test edge cases
-
-**Automated testing:**
-```bash
-cd build
-./mcrogueface --headless --exec ../tests/unit/my_test.py
-```
-
-### Document Results
-
-Add findings to the relevant Gitea issue with:
-- Baseline numbers
-- Optimized numbers
-- Improvement factor
-- Test script name
-- Commit hash
-
---
-
-## When NOT to Optimize
-
-**Don't optimize if:**
-- Performance is already acceptable (< 16ms frame time)
-- Optimization makes code significantly more complex
-- You haven't profiled yet (no guessing!)
-- The bottleneck is elsewhere
-
-Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
-
---
-
-## Related Documentation
-
-- [[Performance-and-Profiling]] - Profiling tools reference
-- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
-- [[Grid-System]] - Grid optimization opportunities
+# Performance Optimization Workflow
+
+Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
+
+## Quick Reference
+
+**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
+
+**Tools:**
+- F3 profiler overlay (in-game)
+- `src/Profiler.h` - ScopedTimer
+- Benchmark API (`mcrfpy.start_benchmark()`)
+
+## The Optimization Cycle
+
+```
+1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
+     ^                                                         |
+     +---------------------------------------------------------+
+```
+
+---
+
+## Step 1: Profile - Find the Bottleneck
+
+### Using F3 Overlay
+
+**Start the game and press F3:**
+
+Look for:
+- **Red frame times** (>33ms) - Unacceptable performance
+- **Yellow frame times** (16-33ms) - Marginal performance
+- **High subsystem times** - Which system is slow?
+  - Grid rendering > 10ms? Grid optimization needed
+  - Entity rendering > 5ms? Entity culling needed
+  - Python script time > 5ms? Python callback optimization needed
+
+### Running Benchmarks
+
+Use the benchmark API for detailed data capture:
+
+```python
+import mcrfpy
+
+mcrfpy.start_benchmark()
+
+# ... run test scenario ...
+
+filename = mcrfpy.end_benchmark()
+print(f"Benchmark saved to: {filename}")
+```
+
+See [[Performance-and-Profiling]] for benchmark output format and analysis.
+
+---
+
+## Step 2: Identify - Understand the Problem
+
+### Common Performance Issues
+
+**Issue: High Grid Render Time on Static Screens**
+- **Symptom:** 20-40ms grid render, nothing changing
+- **Cause:** Redrawing unchanged cells every frame
+- **Status:** Solved by chunk-based dirty flag system
+
+**Issue: High Entity Render Time with Many Entities**
+- **Symptom:** 10-20ms entity render with 500+ entities
+- **Cause:** O(n) iteration, no spatial indexing
+- **Solution:** [#115](../issues/115) SpatialHash (planned)
+
+**Issue: Slow Bulk Grid Updates from Python**
+- **Symptom:** Frame drops when updating many cells
+- **Cause:** Python/C++ boundary crossings for each cell
+- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
+
+**Issue: High Python Script Time**
+- **Symptom:** 10-50ms in Python callbacks
+- **Cause:** Heavy computation in Python update loops
+- **Solution:** Move hot paths to C++ or optimize Python
+
+---
+
+## Step 3: Instrument - Measure Precisely
+
+### Adding ScopedTimer (C++)
+
+Wrap slow functions with timing:
+
+```cpp
+#include "Profiler.h"
+
+void MySystem::slowFunction() {
+    ScopedTimer timer(Resources::game->metrics.mySystemTime);
+    // ... code to measure ...
+}
+```
+
+### Adding Custom Metrics (C++)
+
+1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
+2. Reset in `resetPerFrame()`
+3. Display in `src/ProfilerOverlay.cpp::update()`
+4. Instrument with ScopedTimer
+5. Rebuild and press F3
+
+### Creating Python Benchmarks
+
+```python
+import mcrfpy
+import sys
+
+def benchmark():
+    scene = mcrfpy.Scene("bench")
+    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
+    scene.children.append(grid)
+    mcrfpy.current_scene = scene
+    
+    frame_times = []
+    
+    def measure(timer, runtime):
+        frame_times.append(runtime)
+        if len(frame_times) >= 300:
+            avg = sum(frame_times) / len(frame_times)
+            print(f"Average: {avg:.2f}ms")
+            print(f"Min: {min(frame_times):.2f}ms")
+            print(f"Max: {max(frame_times):.2f}ms")
+            print(f"FPS: {1000/avg:.1f}")
+            timer.stop()
+            sys.exit(0)
+    
+    mcrfpy.Timer("benchmark", measure, 16)
+
+benchmark()
+```
+
+**Run:**
+```bash
+cd build
+./mcrogueface --exec ../tests/benchmark_mysystem.py
+```
+
+For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
+
+---
+
+## Step 4: Optimize - Make It Faster
+
+### Strategy 1: Reduce Work
+
+**Example: Dirty Flags (already implemented)**
+
+Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
+
+### Strategy 2: Reduce Complexity
+
+**Example: Spatial queries**
+
+Instead of O(n) search through all entities, use `entities_in_radius()`:
+
+```python
+# Spatial query: ask the grid for nearby entities instead of scanning every entity in Python
+nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
+```
+
+### Strategy 3: Batch Operations
+
+**Minimize Python/C++ boundary crossings:**
+
+```python
+# Less efficient: many individual layer.set() calls
+for x in range(100):
+    for y in range(100):
+        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
+
+# More efficient: single fill operation
+layer.fill(mcrfpy.Color(0, 0, 0, 0))
+```
+
+### Strategy 4: Cache Results
+
+**Example: Path caching**
+
+```python
+cached_path = [None]
+last_target = [None]
+
+def get_path_to(grid, start, target):
+    if last_target[0] != target:
+        cached_path[0] = grid.find_path(start, target)
+        last_target[0] = target
+    return cached_path[0]
+```
+
+### Optimization Checklist
+
+Before optimizing:
+- [ ] Profiled and identified real bottleneck
+- [ ] Measured baseline performance
+- [ ] Understood root cause
+
+After optimization:
+- [ ] Measured improvement
+- [ ] Verified correctness
+- [ ] Updated tests if needed
+
+---
+
+## Step 5: Verify - Measure Improvement
+
+### Re-run Benchmarks
+
+Compare before and after measurements.
+
+### Check Correctness
+
+**Visual testing:**
+1. Run game normally (not headless)
+2. Verify visual output unchanged
+3. Test edge cases
+
+**Automated testing:**
+```bash
+cd build
+./mcrogueface --headless --exec ../tests/unit/my_test.py
+```
+
+### Document Results
+
+Add findings to the relevant Gitea issue with:
+- Baseline numbers
+- Optimized numbers
+- Improvement factor
+- Test script name
+- Commit hash
+
+---
+
+## When NOT to Optimize
+
+**Don't optimize if:**
+- Performance is already acceptable (< 16ms frame time)
+- Optimization makes code significantly more complex
+- You haven't profiled yet (no guessing!)
+- The bottleneck is elsewhere
+
+Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
+
+---
+
+## Related Documentation
+
+- [[Performance-and-Profiling]] - Profiling tools reference
+- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
+- [[Grid-System]] - Grid optimization opportunities
 - [[Writing-Tests]] - Creating performance tests