John McCardle edited this page 2026-02-07 23:49:36 +00:00

Performance Optimization Workflow

A systematic approach to identifying and resolving performance bottlenecks in McRogueFace.

Quick Reference

Related Systems: Performance-and-Profiling, Grid-System

Tools:

  • F3 profiler overlay (in-game)
  • src/Profiler.h - ScopedTimer
  • Benchmark API (mcrfpy.start_benchmark())

The Optimization Cycle

1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
     ^                                                         |
     +---------------------------------------------------------+

Step 1: Profile - Find the Bottleneck

Using F3 Overlay

Start the game and press F3 to show the profiler overlay.

Look for:

  • Red frame times (>33ms) - Unacceptable performance
  • Yellow frame times (16-33ms) - Marginal performance
  • High subsystem times - Which system is slow?
    • Grid rendering > 10ms? Grid optimization needed
    • Entity rendering > 5ms? Entity culling needed
    • Python script time > 5ms? Python callback optimization
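
The frame-time thresholds above can also be applied to captured data outside the overlay. The following is a minimal sketch, not part of mcrfpy: the names `classify_frame_time` and `summarize` are hypothetical, the "ok" label for frames under 16 ms is an assumption, and the overlay's exact handling of the 16 ms and 33 ms boundaries is assumed rather than confirmed.

```python
def classify_frame_time(ms):
    """Bucket a frame time in milliseconds using the overlay's thresholds."""
    if ms > 33.0:
        return "red"      # unacceptable performance
    if ms >= 16.0:
        return "yellow"   # marginal performance
    return "ok"           # assumed label for frames under 16 ms


def summarize(frame_times):
    """Count how many frames fall into each bucket."""
    counts = {"ok": 0, "yellow": 0, "red": 0}
    for ms in frame_times:
        counts[classify_frame_time(ms)] += 1
    return counts
```

A run where most frames land in "yellow" or "red" is a candidate for the rest of this workflow; a run that is almost entirely "ok" probably is not.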

Running Benchmarks

Use the benchmark API for detailed data capture:

import mcrfpy

mcrfpy.start_benchmark()

# ... run test scenario ...

filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")

See Performance-and-Profiling for benchmark output format and analysis.
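
If the test scenario can raise, the capture may never be closed. One possible way to guard against that is a small context manager around the start/end pair. This is a sketch under assumptions: `benchmark_capture` is a hypothetical helper, and the start and end functions are passed in as arguments so the example does not depend on anything beyond the two calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def benchmark_capture(start, end, results):
    """Run a block between start() and end(), even if the block raises.

    `start` and `end` are callables (for example mcrfpy.start_benchmark and
    mcrfpy.end_benchmark); whatever end() returns is appended to `results`.
    """
    start()
    try:
        yield
    finally:
        results.append(end())
```

Inside the engine this would be used as `with benchmark_capture(mcrfpy.start_benchmark, mcrfpy.end_benchmark, saved): ...`, with the test scenario as the body and `saved` collecting the returned filename.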


Step 2: Identify - Understand the Problem

Common Performance Issues

Issue: High Grid Render Time on Static Screens

  • Symptom: 20-40ms grid render, nothing changing
  • Cause: Redrawing unchanged cells every frame
  • Status: Solved by chunk-based dirty flag system

Issue: High Entity Render Time with Many Entities

  • Symptom: 10-20ms entity render with 500+ entities
  • Cause: O(n) iteration, no spatial indexing
  • Solution: #115 SpatialHash (planned)

Issue: Slow Bulk Grid Updates from Python

  • Symptom: Frame drops when updating many cells
  • Cause: Python/C++ boundary crossings for each cell
  • Workaround: Minimize individual layer.set() calls; use layer.fill() for uniform data

Issue: High Python Script Time

  • Symptom: 10-50ms in Python callbacks
  • Cause: Heavy computation in Python update loops
  • Solution: Move hot paths to C++ or optimize Python

Step 3: Instrument - Measure Precisely

Adding ScopedTimer (C++)

Wrap slow functions with timing:

#include "Profiler.h"

void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}

Adding Custom Metrics (C++)

  1. Add field to ProfilingMetrics in src/GameEngine.h
  2. Reset in resetPerFrame()
  3. Display in src/ProfilerOverlay.cpp::update()
  4. Instrument with ScopedTimer
  5. Rebuild and press F3

Creating Python Benchmarks

import mcrfpy
import sys

def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene
    
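    frame_times = []
    last_runtime = [None]
    
    def measure(timer, runtime):
        # Assumption: `runtime` is the cumulative elapsed time in ms, so the
        # interval between callbacks is the difference from the previous call.
        if last_runtime[0] is not None:
            frame_times.append(runtime - last_runtime[0])
        last_runtime[0] = runtime
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)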
    
    mcrfpy.Timer("benchmark", measure, 16)

benchmark()

Run:

cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py

For headless benchmarks, use Python's time module instead of the Timer API since step() bypasses the game loop.
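
A headless measurement along those lines might look like the sketch below. It uses only the standard library; `time_iterations` is a hypothetical helper, and the `work` callable stands in for whatever advances or exercises the system under test (for example, a function that calls step()).

```python
import time


def time_iterations(work, iterations=300):
    """Call work() repeatedly and return per-call wall-clock stats in ms."""
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        work()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {
        "count": len(samples),
        "avg_ms": sum(samples) / len(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.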


Step 4: Optimize - Make It Faster

Strategy 1: Reduce Work

Example: Dirty Flags (already implemented)

Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.

Strategy 2: Reduce Complexity

Example: Spatial queries

Instead of an O(n) search through all entities in Python, use entities_in_radius():

# Single spatial query handled by the engine; avoids a Python loop over every entity
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
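
To illustrate why spatial indexing reduces complexity, here is a generic spatial hash in pure Python. It is a teaching sketch only: it is not McRogueFace's implementation, and the class name, `cell_size` parameter, and method names are all hypothetical. Items are bucketed by grid cell, so a radius query inspects only the cells that overlap the query area instead of every item.

```python
import math
from collections import defaultdict


class SpatialHash:
    """Bucket items by cell so radius queries skip far-away items."""

    def __init__(self, cell_size=4.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(set)

    def _cell(self, x, y):
        # floor() keeps negative coordinates in the correct cell
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, item, x, y):
        self.buckets[self._cell(x, y)].add((item, x, y))

    def query_radius(self, cx, cy, radius):
        """Return items whose position lies within `radius` of (cx, cy)."""
        min_bx, min_by = self._cell(cx - radius, cy - radius)
        max_bx, max_by = self._cell(cx + radius, cy + radius)
        found = []
        for bx in range(min_bx, max_bx + 1):
            for by in range(min_by, max_by + 1):
                # .get() avoids creating empty buckets during queries
                for item, x, y in self.buckets.get((bx, by), ()):
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        found.append(item)
        return found
```

The query cost depends on the number of overlapped cells and the items inside them, not on the total number of items stored.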

Strategy 3: Batch Operations

Minimize Python/C++ boundary crossings:

# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))

# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))

Strategy 4: Cache Results

Example: Path caching

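# One-element lists let get_path_to() update the cache without `global`.
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    # Recompute only when the target changes. The cached path is reused even
    # if `start` differs, so the caller must invalidate it when that matters.
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]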

Optimization Checklist

Before optimizing:

  • Profiled and identified real bottleneck
  • Measured baseline performance
  • Understood root cause

After optimization:

  • Measured improvement
  • Verified correctness
  • Updated tests if needed

Step 5: Verify - Measure Improvement

Re-run Benchmarks

Compare before and after measurements.
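
The comparison can be reduced to two numbers: a speedup factor and the percentage of frame time saved. The helper below is a minimal sketch; `compare_runs` is a hypothetical name, and it assumes both inputs are average frame times in milliseconds from the same test scenario.

```python
def compare_runs(baseline_ms, optimized_ms):
    """Return (speedup factor, percent of baseline time saved)."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("frame times must be positive")
    factor = baseline_ms / optimized_ms
    saved_pct = (baseline_ms - optimized_ms) / baseline_ms * 100.0
    return factor, saved_pct
```

As a purely arithmetic illustration, `compare_runs(40.0, 10.0)` returns `(4.0, 75.0)`: a 4x speedup, with 75% of the baseline frame time removed. A factor below 1.0 means the change made things slower.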

Check Correctness

Visual testing:

  1. Run game normally (not headless)
  2. Verify visual output unchanged
  3. Test edge cases

Automated testing:

cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py

Document Results

Add findings to the relevant Gitea issue with:

  • Baseline numbers
  • Optimized numbers
  • Improvement factor
  • Test script name
  • Commit hash

When NOT to Optimize

Don't optimize if:

  • Performance is already acceptable (< 16ms frame time)
  • Optimization makes code significantly more complex
  • You haven't profiled yet (no guessing!)
  • The bottleneck is elsewhere

Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.