Update "Performance-Optimization-Workflow.-"

2026-02-07 22:33:36 +00:00
commit 070d5bb813
parent e1d3504bf5
2 changed files with 255 additions and 494 deletions
--- a/Performance-Optimization-Workflow.-.md
+++ b/Performance-Optimization-Workflow.-.md
@@ -0,0 +1,255 @@
 # Performance Optimization Workflow
 Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
 ## Quick Reference
 **Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
 **Tools:**
 - F3 profiler overlay (in-game)
 - `src/Profiler.h` - ScopedTimer
 - Benchmark API (`mcrfpy.start_benchmark()`)
 ## The Optimization Cycle
 ```
 1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
     ^                                                         |
     +---------------------------------------------------------+
 ```
 ---
 ## Step 1: Profile - Find the Bottleneck
 ### Using F3 Overlay
 **Start the game and press F3:**
 Look for:
 - **Red frame times** (>33ms) - Unacceptable performance
 - **Yellow frame times** (16-33ms) - Marginal performance
 - **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms: grid optimization needed
  - Entity rendering > 5ms: entity culling needed
  - Python script time > 5ms: Python callback optimization needed
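The same thresholds can be applied to frame times logged outside the overlay. A minimal sketch, assuming the 16ms and 33ms cut-offs listed above; `classify_frame_time` and `summarize_bands` are hypothetical helpers (not part of `mcrfpy`), and the `"green"` label for acceptable frames is an assumption:

```python
from collections import Counter

def classify_frame_time(ms):
    """Map a frame time in milliseconds to a color band."""
    if ms > 33.0:
        return "red"      # unacceptable
    if ms >= 16.0:
        return "yellow"   # marginal
    return "green"        # assumed label for acceptable frames

def summarize_bands(frame_times):
    """Count how many frames fall into each band."""
    return Counter(classify_frame_time(t) for t in frame_times)
```

A run dominated by `"red"` or `"yellow"` frames is a candidate for the rest of this workflow; a run that is almost entirely `"green"` probably is not.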
 ### Running Benchmarks
 Use the benchmark API for detailed data capture:
 ```python
 import mcrfpy
 mcrfpy.start_benchmark()
 # ... run test scenario ...
 filename = mcrfpy.end_benchmark()
 print(f"Benchmark saved to: {filename}")
 ```
 See [[Performance-and-Profiling]] for benchmark output format and analysis.
 ---
 ## Step 2: Identify - Understand the Problem
 ### Common Performance Issues
 **Issue: High Grid Render Time on Static Screens**
 - **Symptom:** 20-40ms grid render, nothing changing
 - **Cause:** Redrawing unchanged cells every frame
 - **Status:** Solved by chunk-based dirty flag system
 **Issue: High Entity Render Time with Many Entities**
 - **Symptom:** 10-20ms entity render with 500+ entities
 - **Cause:** O(n) iteration, no spatial indexing
 - **Solution:** [#115](../issues/115) SpatialHash (planned)
 **Issue: Slow Bulk Grid Updates from Python**
 - **Symptom:** Frame drops when updating many cells
 - **Cause:** Python/C++ boundary crossings for each cell
 - **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data
 **Issue: High Python Script Time**
 - **Symptom:** 10-50ms in Python callbacks
 - **Cause:** Heavy computation in Python update loops
 - **Solution:** Move hot paths to C++ or optimize Python
 ---
 ## Step 3: Instrument - Measure Precisely
 ### Adding ScopedTimer (C++)
 Wrap slow functions with timing:
 ```cpp
 #include "Profiler.h"
 void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
 }
 ```
 ### Adding Custom Metrics (C++)
 1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
 2. Reset in `resetPerFrame()`
 3. Display in `src/ProfilerOverlay.cpp::update()`
 4. Instrument with ScopedTimer
 5. Rebuild and press F3
 ### Creating Python Benchmarks
 ```python
 import mcrfpy
 import sys
 def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene
    frame_times = []
    def measure(timer, runtime):
        frame_times.append(runtime)
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)
    mcrfpy.Timer("benchmark", measure, 16)
 benchmark()
 ```
 **Run:**
 ```bash
 cd build
 ./mcrogueface --exec ../tests/benchmark_mysystem.py
 ```
 For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
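The note above can be sketched as a small timing helper built on `time.perf_counter()`. The names `time_calls` and `summarize` are hypothetical, and the workload is an assumption: in a headless script it would be whatever advances the simulation (for example a wrapper around `step()`, whose exact signature is not shown here).

```python
import time
from statistics import mean

def time_calls(fn, iterations=300):
    """Call fn() repeatedly and return per-call wall-clock times in ms."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def summarize(samples):
    """Reduce a list of timings (ms) to average, minimum and maximum."""
    return {
        "avg_ms": mean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

Passing the workload in as a callable keeps the helper independent of the engine, so the same script can time pure-Python code and engine calls alike.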
 ---
 ## Step 4: Optimize - Make It Faster
 ### Strategy 1: Reduce Work
 **Example: Dirty Flags (already implemented)**
 Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
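To illustrate the idea only, here is a toy model of chunk-level dirty flags. It is a sketch under stated assumptions, not the engine's implementation: `ChunkedLayer`, its chunk size of 16, and its `set()`/`render()` methods are all hypothetical.

```python
class ChunkedLayer:
    """Toy model of chunk-based dirty flags (not McRogueFace's code)."""

    def __init__(self, width, height, chunk=16):
        self.chunk = chunk
        self.cells = [[0] * width for _ in range(height)]
        chunks_x = -(-width // chunk)   # ceiling division
        chunks_y = -(-height // chunk)
        # Everything starts dirty so the first render draws the whole layer.
        self.dirty = {(cx, cy) for cx in range(chunks_x)
                      for cy in range(chunks_y)}

    def set(self, x, y, value):
        """Write a cell and mark its chunk dirty only if the value changed."""
        if self.cells[y][x] != value:
            self.cells[y][x] = value
            self.dirty.add((x // self.chunk, y // self.chunk))

    def render(self):
        """Pretend to redraw the dirty chunks; return how many were redrawn."""
        redrawn = len(self.dirty)
        self.dirty.clear()
        return redrawn
```

On a static screen `render()` has nothing to redraw after the first frame, which is the "reduce work" effect this strategy aims for.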
 ### Strategy 2: Reduce Complexity
 **Example: Spatial queries**
 Instead of an O(n) search through all entities in Python, use `entities_in_radius()`:
 ```python
 # One spatial query replaces a Python-side loop over every entity
 nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
 ```
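For intuition about why a spatial index helps, here is a toy uniform-grid spatial hash. It is a sketch only: the class, its `cell_size` parameter and its method names are hypothetical, and nothing here describes how `entities_in_radius()` is implemented.

```python
import math
from collections import defaultdict

class SpatialHash:
    """Toy uniform-grid spatial hash (not McRogueFace's implementation)."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)  # (cx, cy) -> [(x, y, item), ...]

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size),
                math.floor(y / self.cell_size))

    def insert(self, x, y, item):
        self.buckets[self._cell(x, y)].append((x, y, item))

    def in_radius(self, x, y, radius):
        """Return items within `radius` of (x, y), scanning nearby buckets only."""
        min_cx, min_cy = self._cell(x - radius, y - radius)
        max_cx, max_cy = self._cell(x + radius, y + radius)
        found = []
        for cx in range(min_cx, max_cx + 1):
            for cy in range(min_cy, max_cy + 1):
                for ex, ey, item in self.buckets.get((cx, cy), ()):
                    if math.hypot(ex - x, ey - y) <= radius:
                        found.append(item)
        return found
```

The query cost depends on how many buckets the radius covers and how many items they hold, not on the total number of items, which is why it scales better than a full scan.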
 ### Strategy 3: Batch Operations
 **Minimize Python/C++ boundary crossings:**
 ```python
 # Less efficient: many individual layer.set() calls
 for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))
 # More efficient: single fill operation
 layer.fill(mcrfpy.Color(0, 0, 0, 0))
 ```
 ### Strategy 4: Cache Results
 **Example: Path caching**
 ```python
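 # Single-element lists let the function update the cache without `global`
 cached_path = [None]
 last_target = [None]
 def get_path_to(grid, start, target):
    # Recompute only when the target changes; `start` is used only on a recompute
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]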
 ```
 ### Optimization Checklist
 Before optimizing:
 - [ ] Profiled and identified real bottleneck
 - [ ] Measured baseline performance
 - [ ] Understood root cause
 After optimization:
 - [ ] Measured improvement
 - [ ] Verified correctness
 - [ ] Updated tests if needed
 ---
 ## Step 5: Verify - Measure Improvement
 ### Re-run Benchmarks
 Compare before and after measurements.
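One way to make the comparison concrete is to reduce both runs to averages and a speedup factor. A minimal sketch, assuming each run is available as a list of frame times in milliseconds; `compare_runs` is a hypothetical helper:

```python
from statistics import mean

def compare_runs(baseline_ms, optimized_ms):
    """Summarize frame times (ms) recorded before and after a change."""
    before = mean(baseline_ms)
    after = mean(optimized_ms)
    return {
        "before_avg_ms": before,
        "after_avg_ms": after,
        "speedup": before / after,       # > 1.0 means the change helped
        "before_fps": 1000.0 / before,
        "after_fps": 1000.0 / after,
    }
```

Both runs should use the same scenario and the same number of samples, otherwise the speedup figure mostly reflects the difference in workload.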
 ### Check Correctness
 **Visual testing:**
 1. Run game normally (not headless)
 2. Verify visual output unchanged
 3. Test edge cases
 **Automated testing:**
 ```bash
 cd build
 ./mcrogueface --headless --exec ../tests/unit/my_test.py
 ```
 ### Document Results
 Add findings to the relevant Gitea issue with:
 - Baseline numbers
 - Optimized numbers
 - Improvement factor
 - Test script name
 - Commit hash
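The items above can be assembled into an issue comment with a few lines of Python. This is only a sketch: `format_results` and its parameters are hypothetical, and the layout of the comment is an assumption rather than a required template.

```python
def format_results(baseline_ms, optimized_ms, test_script, commit):
    """Build a markdown summary from two average frame times in ms."""
    factor = baseline_ms / optimized_ms
    lines = [
        "## Performance Optimization Results",
        f"- Baseline: {baseline_ms:.1f}ms average",
        f"- Optimized: {optimized_ms:.1f}ms average",
        f"- Improvement: {factor:.1f}x",
        f"- Test: `{test_script}`",
        f"- Commit: {commit}",
    ]
    return "\n".join(lines)
```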
 ---
 ## When NOT to Optimize
 **Don't optimize if:**
 - Performance is already acceptable (< 16ms frame time)
 - Optimization makes code significantly more complex
 - You haven't profiled yet (no guessing!)
 - The bottleneck is elsewhere
 Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.
 ---
 ## Related Documentation
 - [[Performance-and-Profiling]] - Profiling tools reference
 - [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
 - [[Grid-System]] - Grid optimization opportunities
 - [[Writing-Tests]] - Creating performance tests
--- a/Performance-Optimization-Workflow.md
+++ b/Performance-Optimization-Workflow.md
@@ -1,494 +0,0 @@
 # Performance Optimization Workflow
 Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.
 ## Quick Reference
 **Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]
 **Tools:**
 - F3 profiler overlay (in-game)
 - `src/Profiler.h` - ScopedTimer
 - `tests/benchmark_*.py` - Performance benchmarks
 ## The Optimization Cycle
 ```
 1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
     ↑                                                         |
     └─────────────────────────────────────────────────────────┘
 ```
 ---
 ## Step 1: Profile - Find the Bottleneck
 ### Using F3 Overlay
 **Start the game and press F3:**
 Look for:
 - **Red frame times** (>33ms) - Unacceptable performance
 - **Yellow frame times** (16-33ms) - Marginal performance
 - **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms? → Grid optimization needed
  - Entity rendering > 5ms? → Entity culling/SpatialHash needed
  - Python script time > 5ms? → Python callback optimization
 **Example profiler output:**
 ```
 Frame: 45.2ms (RED)
 FPS: 22
 Grid Render: 32.1ms  ← BOTTLENECK!
 Entity Render: 8.5ms
 Python Script: 2.1ms
 Animation: 1.2ms
 Cells Rendered: 10000
 Entities: 150 (visible: 50)
 ```
 **Analysis:** Grid rendering is the bottleneck (32ms of 45ms total).
 ### Running Benchmarks
 **Static Grid Benchmark:**
 ```bash
 cd build
 ./mcrogueface --headless --exec ../tests/benchmark_static_grid.py
 ```
 **Moving Entities Benchmark:**
 ```bash
 ./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py
 ```
 **Output:** Baseline performance metrics to measure improvement against.
 ---
 ## Step 2: Identify - Understand the Problem
 ### Common Performance Issues
 **Issue: High Grid Render Time on Static Screens**
 - **Symptom:** 20-40ms grid render, nothing changing
 - **Cause:** Redrawing unchanged cells every frame
 - **Solution:** Implement dirty flag system ([#116](../issues/116))
 **Issue: High Entity Render Time with Many Entities**
 - **Symptom:** 10-20ms entity render with 500+ entities
 - **Cause:** O(n) iteration, no spatial indexing
 - **Solution:** Implement SpatialHash ([#115](../issues/115))
 **Issue: Slow Bulk Grid Updates from Python**
 - **Symptom:** Frame drops when updating many cells
 - **Cause:** Python/C++ boundary crossings for each cell
 - **Solution:** Implement batch operations ([#113](../issues/113))
 **Issue: High Python Script Time**
 - **Symptom:** 10-50ms in Python callbacks
 - **Cause:** Heavy computation in Python update loops
 - **Solution:** Move hot paths to C++ or optimize Python
 ### Analyzing Call Stacks
 **When F3 overlay isn't enough:**
 ```bash
 # Build with debug symbols
 make clean
 cmake .. -DCMAKE_BUILD_TYPE=Debug
 make
 # Profile with gdb
 gdb ./mcrogueface
 (gdb) run
 <trigger slow behavior>
 (gdb) info threads
 (gdb) bt  # Backtrace
 ```
 ---
 ## Step 3: Instrument - Measure Precisely
 ### Adding ScopedTimer
 **Identify the slow function and wrap it:**
 ```cpp
 #include "Profiler.h"
 void MySystem::slowFunction() {
    // Add timer
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // Your slow code here
    for (auto& item : items) {
        item->process();
    }
 }
 ```
 ### Adding Custom Metrics
 **1. Add metric field to ProfilingMetrics:**
 `src/GameEngine.h`:
 ```cpp
 struct ProfilingMetrics {
    // ... existing fields ...
    float mySystemTime = 0.0f;  // Add this
 };
 ```
 **2. Reset in resetPerFrame():**
 ```cpp
 void ProfilingMetrics::resetPerFrame() {
    // ... existing resets ...
    mySystemTime = 0.0f;  // Add this
 }
 ```
 **3. Display in ProfilerOverlay:**
 `src/ProfilerOverlay.cpp`:
 ```cpp
 void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
    // ... existing formatting ...
    ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
 }
 ```
 **4. Rebuild and test:**
 ```bash
 make
 cd build
 ./mcrogueface
 # Press F3 - see your metric!
 ```
 ### Creating Benchmarks
 **Create `tests/benchmark_mysystem.py`:**
 ```python
 import mcrfpy
 import sys
 import time
 def benchmark():
    # Setup
    mcrfpy.createScene("bench")
    # ... create test scenario ...
    frame_times = []
    def measure(runtime_ms):
        frame_times.append(runtime_ms)
        if len(frame_times) >= 300:  # 5 seconds at 60fps
            # Report statistics
            avg = sum(frame_times) / len(frame_times)
            min_time = min(frame_times)
            max_time = max(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min_time:.2f}ms")
            print(f"Max: {max_time:.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            sys.exit(0)
    mcrfpy.setTimer("benchmark", measure, 16)  # Every frame
 benchmark()
 ```
 **Run:**
 ```bash
 ./mcrogueface --headless --exec ../tests/benchmark_mysystem.py
 ```
 ---
 ## Step 4: Optimize - Make It Faster
 ### Optimization Strategies
 #### Strategy 1: Reduce Work
 **Example: Grid Dirty Flags**
 **Before:** Redraw all cells every frame
 ```cpp
 for (int x = 0; x < grid_x; x++) {
    for (int y = 0; y < grid_y; y++) {
        renderCell(x, y);  // Always renders
    }
 }
 ```
 **After:** Only redraw when changed
 ```cpp
 if (grid_dirty) {
    for (int x = 0; x < grid_x; x++) {
        for (int y = 0; y < grid_y; y++) {
            renderCell(x, y);
        }
    }
    grid_dirty = false;
 }
 ```
 **Expected:** 10-50x improvement for static scenes
 #### Strategy 2: Reduce Complexity
 **Example: SpatialHash for Entity Queries**
 **Before:** O(n) search through all entities
 ```cpp
 for (auto& entity : entities) {
    if (distanceTo(entity, target) < radius) {
        nearby.push_back(entity);
    }
 }
 ```
 **After:** O(1) hash lookup
 ```cpp
 auto cell = spatialHash.getCell(target.x, target.y);
 for (auto& entity : cell.entities) {
    nearby.push_back(entity);
 }
 ```
 **Expected:** 100x+ improvement for large entity counts
 #### Strategy 3: Batch Operations
 **Example: Grid Batch Updates**
 **Before:** Multiple Python/C++ crossings
 ```python
 for x in range(100):
    for y in range(100):
        grid.at((x, y)).tilesprite = 42  # 10,000 calls!
 ```
 **After:** Single batch operation
 ```python
 grid.fill_rect(0, 0, 100, 100, 42)  # 1 call!
 ```
 **Expected:** 10-100x improvement for bulk updates
 #### Strategy 4: Cache Results
 **Example: Path Caching in Entities**
 **Before:** Recompute path every frame
 ```cpp
 void Entity::update() {
    path = computePathTo(target);  // Expensive!
    followPath(path);
 }
 ```
 **After:** Cache and reuse
 ```cpp
 void Entity::update() {
    if (!cachedPath || targetMoved) {
        cachedPath = computePathTo(target);
    }
    followPath(cachedPath);
 }
 ```
 **Expected:** 10x+ improvement for pathfinding-heavy scenarios
 ### Optimization Checklist
 Before optimizing:
 - [ ] Profiled and identified real bottleneck
 - [ ] Measured baseline performance
 - [ ] Understood root cause
 During optimization:
 - [ ] Changed only one thing at a time
 - [ ] Kept original code for comparison
 - [ ] Added comments explaining optimization
 After optimization:
 - [ ] Measured improvement
 - [ ] Verified correctness (no bugs introduced)
 - [ ] Updated tests if needed
 ---
 ## Step 5: Verify - Measure Improvement
 ### Re-run Benchmarks
 **Before optimization:**
 ```
 Average: 45.2ms
 Min: 38.1ms
 Max: 62.3ms
 FPS: 22.1
 ```
 **After optimization:**
 ```
 Average: 8.5ms  ← 5.3x improvement!
 Min: 7.2ms
 Max: 12.1ms
 FPS: 117.6
 ```
 ### Check Correctness
 **Visual testing:**
 1. Run game normally (not headless)
 2. Verify visual output unchanged
 3. Test edge cases (empty grids, max entities, etc.)
 **Automated testing:**
 ```bash
 # Run existing test suite
 ./mcrogueface --headless --exec tests/test_grid_operations.py
 ./mcrogueface --headless --exec tests/test_entity_movement.py
 ```
 ### Document Results
 **Create issue comment:**
 ```markdown
 ## Performance Optimization Results
 **Issue:** #116 (Dirty Flag System)
 **Baseline:**
 - Static grid (100x100): 32.1ms average
 - FPS: 22
 **After Optimization:**
 - Static grid (100x100): 0.8ms average
 - FPS: 118
 **Improvement:** 40x faster for static scenes
 **Test:** `tests/benchmark_static_grid.py`
 **Commit:** abc123def
 ```
 ---
 ## Common Optimization Patterns
 ### Pattern 1: Early Exit
 ```cpp
 // Check cheap conditions first
 if (!visible) return;
 if (opacity <= 0.0f) return;
 if (!inViewport(bounds)) return;
 // Then do expensive work
 render();
 ```
 ### Pattern 2: Lazy Evaluation
 ```cpp
 // Don't compute until needed
 mutable bool fovComputed = false;
 mutable std::vector<Point> visibleCells;
 std::vector<Point>& getVisibleCells() {
    if (!fovComputed) {
        visibleCells = computeFOV();
        fovComputed = true;
    }
    return visibleCells;
 }
 ```
 ### Pattern 3: Object Pooling
 ```cpp
 // Reuse instead of allocate/deallocate
 class EntityPool {
    std::vector<Entity> pool;
    std::vector<bool> active;
 public:
    Entity* spawn() {
        for (size_t i = 0; i < pool.size(); i++) {
            if (!active[i]) {
                active[i] = true;
                return &pool[i];
            }
        }
        // Grow pool if needed
        pool.emplace_back();
        active.push_back(true);
        return &pool.back();
    }
 };
 ```
 ### Pattern 4: Space-Time Tradeoff
 ```cpp
 // Cache expensive computation
 std::unordered_map<int, std::vector<Point>> pathCache;
 std::vector<Point> getPathTo(int targetId) {
    if (pathCache.contains(targetId)) {
        return pathCache[targetId];  // O(1) lookup
    }
    auto path = computeExpensivePath(targetId);
    pathCache[targetId] = path;
    return path;
 }
 ```
 ---
 ## When NOT to Optimize
 **Don't optimize if:**
 - Performance is already acceptable (< 16ms frame time)
 - Optimization makes code significantly more complex
 - You haven't profiled yet (no guessing!)
 - The bottleneck is elsewhere (optimize hot paths first)
 **Premature optimization is the root of all evil** - Donald Knuth
 Focus on:
 1. Correctness first
 2. Profile to find real bottlenecks
 3. Optimize hot paths only
 4. Keep code maintainable
 ---
 ## Related Documentation
 - [[Performance-and-Profiling]] - Profiling tools reference
 - [[Grid-System]] - Grid optimization opportunities
 - [[Writing-Tests]] - Creating performance tests
 **Open Issues:**
 - [#115](../issues/115) - SpatialHash Implementation
 - [#116](../issues/116) - Dirty Flag System
 - [#113](../issues/113) - Batch Operations for Grid
 - [#117](../issues/117) - Memory Pool for Entities