1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
     ^                                                         |
     +---------------------------------------------------------+

Step 1: Profile - Find the Bottleneck

Using F3 Overlay

Start the game and press F3 to show the profiler overlay.

Look for:

Red frame times (>33ms) - Unacceptable performance
Yellow frame times (16-33ms) - Marginal performance
High subsystem times - Which system is slow?
- Grid rendering > 10ms? Grid optimization needed
- Entity rendering > 5ms? Entity culling needed
- Python script time > 5ms? Python callback optimization needed
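
The same thresholds can be applied to frame times recorded outside the overlay. The following is a minimal sketch, assuming the frame times are available as a list of milliseconds. The helper names `classify_frame` and `count_bands` are illustrative and not part of mcrfpy, and the "green" label for frames under 16ms is an assumption, since the text above only names the red and yellow bands.

```python
from collections import Counter

# Thresholds taken from the overlay bands described above (milliseconds).
YELLOW_MS = 16.0
RED_MS = 33.0

def classify_frame(frame_ms):
    """Return the band for one frame time (hypothetical helper)."""
    if frame_ms > RED_MS:
        return "red"
    if frame_ms >= YELLOW_MS:
        return "yellow"
    return "green"  # assumed label for frames under 16ms

def count_bands(frame_times_ms):
    """Count how many frames fall into each band."""
    return Counter(classify_frame(t) for t in frame_times_ms)

# Sample values chosen only to exercise each band.
sample = [8.2, 15.9, 16.0, 24.5, 33.0, 41.7]
bands = count_bands(sample)
print(dict(bands))
```

A run with mostly yellow frames and an occasional red one usually points at a spike rather than a steady cost, which is worth knowing before picking a subsystem to instrument.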

Running Benchmarks

Use the benchmark API for detailed data capture:

import mcrfpy

mcrfpy.start_benchmark()

# ... run test scenario ...

filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")

See Performance-and-Profiling for benchmark output format and analysis.

Step 2: Identify - Understand the Problem

Common Performance Issues

Issue: High Grid Render Time on Static Screens

Symptom: 20-40ms grid render, nothing changing
Cause: Redrawing unchanged cells every frame
Status: Solved by chunk-based dirty flag system

Issue: High Entity Render Time with Many Entities

Symptom: 10-20ms entity render with 500+ entities
Cause: O(n) iteration, no spatial indexing
Solution: #115 SpatialHash (planned)

Issue: Slow Bulk Grid Updates from Python

Symptom: Frame drops when updating many cells
Cause: Python/C++ boundary crossings for each cell
Workaround: Minimize individual layer.set() calls; use layer.fill() for uniform data

Issue: High Python Script Time

Symptom: 10-50ms in Python callbacks
Cause: Heavy computation in Python update loops
Solution: Move hot paths to C++, or reduce the Python work done per frame (cache results, batch calls, skip unchanged state)
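
To find which Python function dominates a slow callback, the standard library profiler can help. This is a sketch under assumptions: it assumes `cProfile` and `pstats` are available in the interpreter running your script, and `update_entities` is a hypothetical stand-in for a heavy update function, not engine code.

```python
import cProfile
import io
import pstats

def update_entities(positions):
    """Hypothetical stand-in for a heavy per-frame Python callback."""
    total = 0.0
    for (x1, y1) in positions:
        for (x2, y2) in positions:
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

positions = [(i % 20, i // 20) for i in range(200)]

# Profile a single call to the suspect function.
profiler = cProfile.Profile()
profiler.enable()
result = update_entities(positions)
profiler.disable()

# Collect the five most expensive entries by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The profiler adds overhead of its own, so treat its output as a ranking of where time goes rather than as absolute frame timings.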

Step 3: Instrument - Measure Precisely

Adding ScopedTimer (C++)

Wrap slow functions with timing:

#include "Profiler.h"

void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}

Adding Custom Metrics (C++)

1. Add a field to ProfilingMetrics in src/GameEngine.h
2. Reset it in resetPerFrame()
3. Display it in src/ProfilerOverlay.cpp::update()
4. Instrument the code with ScopedTimer
5. Rebuild and press F3

Creating Python Benchmarks

import mcrfpy
import sys

def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene
    
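    timestamps = []
    
    def measure(timer, runtime):
        # Assumption: runtime is total elapsed time in ms, so intervals are
        # the differences between consecutive callback timestamps.
        timestamps.append(runtime)
        if len(timestamps) > 300:
            frame_times = [b - a for a, b in zip(timestamps, timestamps[1:])]
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)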
    
    mcrfpy.Timer("benchmark", measure, 16)

benchmark()

Run:

cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py

For headless benchmarks, use Python's time module instead of the Timer API since step() bypasses the game loop.
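
A headless timing loop along those lines might look like the sketch below. `time_calls` and `workload` are hypothetical names; replace the body of `workload` with the operation you want to measure.

```python
import time

def time_calls(fn, iterations=100):
    """Call fn repeatedly and return per-call durations in milliseconds."""
    durations_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms

def workload():
    # Hypothetical stand-in: replace with the operation under test.
    sum(i * i for i in range(10_000))

times = time_calls(workload, iterations=50)
avg = sum(times) / len(times)
print(f"Average: {avg:.3f}ms  Min: {min(times):.3f}ms  Max: {max(times):.3f}ms")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short intervals.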

Step 4: Optimize - Make It Faster

Strategy 1: Reduce Work

Example: Dirty Flags (already implemented)

Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.
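
The same idea applies to your own Python code. The sketch below shows the dirty flag pattern in isolation. `CachedLabel` is a hypothetical class written for illustration and does not reflect how the engine's chunk cache is implemented.

```python
class CachedLabel:
    """Hypothetical object that re-renders only when its text changes."""

    def __init__(self, text):
        self._text = text
        self._dirty = True      # nothing rendered yet
        self._rendered = None
        self.render_count = 0   # how many times the expensive path ran

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, value):
        if value != self._text:
            self._text = value
            self._dirty = True  # mark stale only on a real change

    def render(self):
        if self._dirty:
            # Stand-in for expensive drawing work.
            self._rendered = f"[{self._text}]"
            self.render_count += 1
            self._dirty = False
        return self._rendered

label = CachedLabel("HP: 10")
for _ in range(60):          # 60 frames with no change
    label.render()
label.text = "HP: 9"         # one change
label.render()
print(label.render_count)    # prints 2
```

Sixty-one calls to `render()` trigger the expensive path only twice: once for the first draw and once after the text changes.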

Strategy 2: Reduce Complexity

Example: Spatial queries

Instead of an O(n) search through all entities, use entities_in_radius():

# Spatial query: returns only the entities near the target position
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
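
To illustrate why spatial indexing reduces the work per query, here is a minimal uniform-grid spatial hash in pure Python. It is a sketch of the general technique only. The class and its methods are hypothetical and are not a description of how grid.entities_in_radius() or the planned SpatialHash (#115) work internally.

```python
import math
from collections import defaultdict

class SpatialHash:
    """Illustrative uniform-grid spatial hash (not the engine's implementation)."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, name, x, y):
        self.buckets[self._cell(x, y)].append((name, x, y))

    def in_radius(self, cx, cy, radius):
        """Return names within radius, scanning only the overlapping buckets."""
        min_cx, min_cy = self._cell(cx - radius, cy - radius)
        max_cx, max_cy = self._cell(cx + radius, cy + radius)
        found = []
        for bx in range(min_cx, max_cx + 1):
            for by in range(min_cy, max_cy + 1):
                for name, x, y in self.buckets.get((bx, by), ()):
                    if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                        found.append(name)
        return found

index = SpatialHash(cell_size=8.0)
index.insert("goblin", 10, 10)
index.insert("orc", 13, 14)
index.insert("dragon", 90, 90)
print(sorted(index.in_radius(12, 12, radius=5.0)))  # prints ['goblin', 'orc']
```

The query touches only the buckets that overlap the search area, so its cost depends on how many entities are nearby rather than on the total entity count.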

Strategy 3: Batch Operations

Minimize Python/C++ boundary crossings:

# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))

# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))
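
When the data is not uniform and a single fill does not apply, one option is to send only the cells that changed since the last update. The sketch below assumes you keep the previous state in a Python dict. `CountingLayer` is a hypothetical stand-in that counts set() calls so the example runs without the engine.

```python
class CountingLayer:
    """Hypothetical stand-in for a grid layer that counts set() calls."""

    def __init__(self):
        self.cells = {}
        self.set_calls = 0

    def set(self, pos, value):
        self.cells[pos] = value
        self.set_calls += 1

def apply_changes(layer, previous, current):
    """Call layer.set() only for cells whose value differs from last time."""
    for pos, value in current.items():
        if previous.get(pos) != value:
            layer.set(pos, value)

# A 100x100 state where only two cells change between updates.
previous = {(x, y): 0 for x in range(100) for y in range(100)}
current = dict(previous)
current[(3, 4)] = 1
current[(50, 60)] = 2

layer = CountingLayer()
apply_changes(layer, previous, current)
print(layer.set_calls)  # prints 2
```

The comparison loop still runs in Python, so measure before and after to confirm it costs less than the calls it avoids.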

Strategy 4: Cache Results

Example: Path caching

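# Single-entry cache; list cells let get_path_to() update the values
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    # Recompute only when the target changes. The cache ignores start,
    # so it suits a caller that keeps following the cached path.
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]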
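
If several entities share a pathfinder, or the start position can change independently of the target, a cache keyed on both endpoints with an explicit invalidation hook is one alternative. This is a sketch: `PathCache` and `straight_line` are hypothetical, and `straight_line` only stands in for a real pathfinder so the example runs on its own.

```python
class PathCache:
    """Illustrative path cache keyed on both endpoints."""

    def __init__(self, find_path):
        self._find_path = find_path   # callable(start, target) -> path
        self._paths = {}
        self.misses = 0

    def get(self, start, target):
        key = (start, target)
        if key not in self._paths:
            self.misses += 1
            self._paths[key] = self._find_path(start, target)
        return self._paths[key]

    def invalidate(self):
        """Drop all cached paths, e.g. after walkability changes."""
        self._paths.clear()

def straight_line(start, target):
    # Hypothetical stand-in for a pathfinder: walks along x only.
    step = 1 if target[0] >= start[0] else -1
    return [(x, start[1]) for x in range(start[0], target[0] + step, step)]

cache = PathCache(straight_line)
cache.get((0, 0), (3, 0))
cache.get((0, 0), (3, 0))     # served from the cache
cache.get((1, 0), (3, 0))     # different start, recomputed
print(cache.misses)           # prints 2
```

In a game script the constructor argument could be a small wrapper around grid.find_path(start, target). Call invalidate() whenever the map changes in a way that could make cached paths wrong.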

Optimization Checklist

Before optimizing:

Profiled and identified real bottleneck
Measured baseline performance
Understood root cause

After optimization:

Measured improvement
Verified correctness
Updated tests if needed

Step 5: Verify - Measure Improvement

Re-run Benchmarks

Re-run the same benchmark scenario used for the baseline and compare the before and after measurements.
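
A comparison can be as simple as the sketch below, which assumes both runs produced a list of frame times in milliseconds. The function names are illustrative and the numbers are made up for the example. They are not measurements from the engine.

```python
import statistics

def frame_stats(frame_times_ms):
    """Return mean and 95th-percentile frame time in milliseconds."""
    ordered = sorted(frame_times_ms)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return statistics.mean(ordered), ordered[p95_index]

def compare(baseline_ms, optimized_ms):
    """Summarize both runs and compute the improvement factor."""
    base_mean, base_p95 = frame_stats(baseline_ms)
    opt_mean, opt_p95 = frame_stats(optimized_ms)
    return {
        "baseline_mean": base_mean,
        "optimized_mean": opt_mean,
        "baseline_p95": base_p95,
        "optimized_p95": opt_p95,
        "speedup": base_mean / opt_mean,
    }

# Made-up numbers for illustration only.
baseline = [30.0, 32.0, 31.0, 45.0]
optimized = [15.0, 16.0, 15.5, 22.5]
result = compare(baseline, optimized)
print(f"Mean: {result['baseline_mean']:.1f}ms -> {result['optimized_mean']:.1f}ms "
      f"({result['speedup']:.2f}x)")
```

Reporting a high percentile alongside the mean helps catch cases where the average improves but occasional slow frames remain.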

Check Correctness

Visual testing:

Run game normally (not headless)
Verify visual output unchanged
Test edge cases

Automated testing:

cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py

Document Results

Add findings to the relevant Gitea issue with:

Baseline numbers
Optimized numbers
Improvement factor
Test script name
Commit hash

When NOT to Optimize

Don't optimize if:

Performance is already acceptable (< 16ms frame time)
Optimization makes code significantly more complex
You haven't profiled yet (no guessing!)
The bottleneck is elsewhere

Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.

Performance-and-Profiling - Profiling tools reference
Grid-Rendering-Pipeline - Chunk caching and dirty flags
Grid-System - Grid optimization opportunities
Writing-Tests - Creating performance tests

Performance Optimization Workflow

Quick Reference

The Optimization Cycle