Update Performance Optimization Workflow
parent
1c4d82ccc9
commit
7f22636ca7
1 changed files with 254 additions and 254 deletions
# Performance Optimization Workflow

Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.

## Quick Reference

**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]

**Tools:**
- F3 profiler overlay (in-game)
- `src/Profiler.h` - ScopedTimer
- Benchmark API (`mcrfpy.start_benchmark()`)

## The Optimization Cycle

```
1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
^                                                                |
+----------------------------------------------------------------+
```

---

## Step 1: Profile - Find the Bottleneck

### Using F3 Overlay

**Start the game and press F3:**

Look for:
- **Red frame times** (>33ms) - Unacceptable performance
- **Yellow frame times** (16-33ms) - Marginal performance
- **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms? Grid optimization needed
  - Entity rendering > 5ms? Entity culling needed
  - Python script time > 5ms? Python callback optimization needed

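The color tiers amount to a simple threshold check. A sketch: the red and yellow cutoffs come from the list above; the green tier below 16ms (the 60 FPS budget) is an assumption.

```python
# Frame-time budget: ~16.7ms = 60 FPS, ~33.3ms = 30 FPS
def classify_frame_time(ms):
    if ms > 33.0:
        return "red"      # unacceptable: below 30 FPS
    if ms > 16.0:
        return "yellow"   # marginal: between 30 and 60 FPS
    return "green"        # within the 60 FPS budget

assert classify_frame_time(40.0) == "red"
assert classify_frame_time(20.0) == "yellow"
assert classify_frame_time(8.0) == "green"
```
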
### Running Benchmarks

Use the benchmark API for detailed data capture:

```python
import mcrfpy

mcrfpy.start_benchmark()

# ... run test scenario ...

filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```

See [[Performance-and-Profiling]] for benchmark output format and analysis.

---

## Step 2: Identify - Understand the Problem

### Common Performance Issues

**Issue: High Grid Render Time on Static Screens**
- **Symptom:** 20-40ms grid render, nothing changing
- **Cause:** Redrawing unchanged cells every frame
- **Status:** Solved by chunk-based dirty flag system

**Issue: High Entity Render Time with Many Entities**
- **Symptom:** 10-20ms entity render with 500+ entities
- **Cause:** O(n) iteration, no spatial indexing
- **Solution:** [#115](../issues/115) SpatialHash (planned)

**Issue: Slow Bulk Grid Updates from Python**
- **Symptom:** Frame drops when updating many cells
- **Cause:** Python/C++ boundary crossings for each cell
- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data

**Issue: High Python Script Time**
- **Symptom:** 10-50ms in Python callbacks
- **Cause:** Heavy computation in Python update loops
- **Solution:** Move hot paths to C++ or optimize Python

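One common way to optimize heavy Python update loops, sketched here with hypothetical callback and entity names (not engine API), is to amortize the expensive part across frames instead of paying for it every frame:

```python
# Hypothetical per-frame callback with an expensive retargeting pass
frame = [0]

def on_frame(entities, player):
    frame[0] += 1
    # Cheap work runs every frame
    for e in entities:
        e["x"] += e["dx"]
    # Expensive work is amortized: retarget only every 10th frame
    if frame[0] % 10 == 0:
        for e in entities:
            e["dx"] = 1 if player["x"] > e["x"] else -1
```

Spreading the heavy pass over frames trades a little responsiveness for a much lower per-frame cost.
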
---

## Step 3: Instrument - Measure Precisely

### Adding ScopedTimer (C++)

Wrap slow functions with timing:

```cpp
#include "Profiler.h"

void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}
```

### Adding Custom Metrics (C++)

1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset in `resetPerFrame()`
3. Display in `src/ProfilerOverlay.cpp::update()`
4. Instrument with ScopedTimer
5. Rebuild and press F3

### Creating Python Benchmarks

```python
import mcrfpy
import sys

def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene

    frame_times = []

    def measure(timer, runtime):
        frame_times.append(runtime)
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)

    mcrfpy.Timer("benchmark", measure, 16)

benchmark()
```

**Run:**
```bash
cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py
```

For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.

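A minimal sketch of that `time`-module approach; `run_step` is a placeholder for whatever drives your headless scenario (e.g. repeated `step()` calls), not an engine API:

```python
import time

def time_scenario(run_step, frames=300):
    """Average milliseconds per iteration of a headless step function."""
    start = time.perf_counter()
    for _ in range(frames):
        run_step()
    total_ms = (time.perf_counter() - start) * 1000
    return total_ms / frames

# Placeholder workload standing in for the engine's step():
avg = time_scenario(lambda: sum(range(1000)), frames=100)
print(f"Average: {avg:.3f}ms/frame")
```

`time.perf_counter()` is monotonic and high resolution, which makes it the right clock for this kind of measurement.
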
---

## Step 4: Optimize - Make It Faster

### Strategy 1: Reduce Work

**Example: Dirty Flags (already implemented)**

Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.

### Strategy 2: Reduce Complexity

**Example: Spatial queries**

Instead of an O(n) search through all entities, use `entities_in_radius()`:

```python
# Spatial query - avoids iterating over every entity
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
```

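Issue #115 plans a SpatialHash for entity queries. As an illustration of the idea (not the engine's implementation), entities can be bucketed by coarse grid cell so a radius query only scans nearby buckets:

```python
from collections import defaultdict
from math import floor, hypot

class SpatialHash:
    """Illustrative spatial hash: entities bucketed by coarse grid cell."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _key(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, entity, x, y):
        self.buckets[self._key(x, y)].append((entity, x, y))

    def query_radius(self, cx, cy, radius):
        # Scan only the buckets overlapping the radius,
        # instead of every entity in the world.
        r = int(radius // self.cell_size) + 1
        kx, ky = self._key(cx, cy)
        found = []
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for entity, x, y in self.buckets.get((kx + dx, ky + dy), []):
                    if hypot(x - cx, y - cy) <= radius:
                        found.append(entity)
        return found
```

Query cost scales with the number of entities near the query point rather than the total entity count, which is what turns the 500+ entity case from Step 2 into a cheap lookup.
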
### Strategy 3: Batch Operations

**Minimize Python/C++ boundary crossings:**

```python
# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))

# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))
```

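The same batching principle shows up in pure Python, where one bulk slice assignment replaces many per-element calls; this is an analogy for the `layer.set()` vs `layer.fill()` trade-off, not the engine API:

```python
import time

N = 100_000
target = [0] * N

# Many small operations: one call per element
start = time.perf_counter()
for i in range(N):
    target[i] = 1
per_element = time.perf_counter() - start

# One bulk operation: a single slice assignment
start = time.perf_counter()
target[:] = [2] * N
bulk = time.perf_counter() - start

print(f"per-element: {per_element*1000:.2f}ms, bulk: {bulk*1000:.2f}ms")
```

The per-element loop pays interpreter overhead on every iteration, just as each `layer.set()` call pays a Python/C++ crossing.
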
### Strategy 4: Cache Results

**Example: Path caching**

```python
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    # Recompute only when the target changes; note the cached
    # path goes stale if `start` moves off the cached route.
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]
```

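When the inputs are hashable, `functools.lru_cache` achieves the same effect with less manual bookkeeping; `find_path` below is a placeholder standing in for an expensive call like `grid.find_path()`:

```python
from functools import lru_cache

# Placeholder for an expensive engine call like grid.find_path()
def find_path(start, target):
    return [start, target]

@lru_cache(maxsize=128)
def cached_find_path(start, target):
    return find_path(start, target)

# Repeated queries with the same (start, target) hit the cache
p1 = cached_find_path((0, 0), (5, 5))
p2 = cached_find_path((0, 0), (5, 5))
assert p1 is p2  # second call returned the cached object
```

Note that `lru_cache` keys on *both* arguments, so unlike the hand-rolled version above it recomputes whenever either `start` or `target` changes.
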
### Optimization Checklist

Before optimizing:
- [ ] Profiled and identified real bottleneck
- [ ] Measured baseline performance
- [ ] Understood root cause

After optimization:
- [ ] Measured improvement
- [ ] Verified correctness
- [ ] Updated tests if needed

---

## Step 5: Verify - Measure Improvement

### Re-run Benchmarks

Compare before and after measurements.

### Check Correctness

**Visual testing:**
1. Run game normally (not headless)
2. Verify visual output unchanged
3. Test edge cases

**Automated testing:**
```bash
cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py
```

### Document Results

Add findings to the relevant Gitea issue with:
- Baseline numbers
- Optimized numbers
- Improvement factor
- Test script name
- Commit hash

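The improvement factor is just the ratio of the baseline and optimized averages; a tiny helper keeps the reporting consistent:

```python
def improvement_factor(baseline_ms, optimized_ms):
    """Speedup multiplier, e.g. 24ms -> 6ms is a 4.0x improvement."""
    return baseline_ms / optimized_ms

print(f"{improvement_factor(24.0, 6.0):.1f}x faster")
```
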
---

## When NOT to Optimize

**Don't optimize if:**
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere

Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.

---

## Related Documentation

- [[Performance-and-Profiling]] - Profiling tools reference
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Grid-System]] - Grid optimization opportunities
- [[Writing-Tests]] - Creating performance tests