Update "Performance-Optimization-Workflow.-"
parent
e1d3504bf5
commit
070d5bb813
2 changed files with 255 additions and 494 deletions
255
Performance-Optimization-Workflow.-.md
Normal file
@@ -0,0 +1,255 @@
# Performance Optimization Workflow

Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.

## Quick Reference

**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]

**Tools:**
- F3 profiler overlay (in-game)
- `src/Profiler.h` - ScopedTimer
- Benchmark API (`mcrfpy.start_benchmark()`)

## The Optimization Cycle

```
1. PROFILE -> 2. IDENTIFY -> 3. INSTRUMENT -> 4. OPTIMIZE -> 5. VERIFY
     ^                                                         |
     +---------------------------------------------------------+
```

---

## Step 1: Profile - Find the Bottleneck

### Using F3 Overlay

**Start the game and press F3:**

Look for:
- **Red frame times** (>33ms) - Unacceptable performance
- **Yellow frame times** (16-33ms) - Marginal performance
- **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms? Grid optimization needed
  - Entity rendering > 5ms? Entity culling needed
  - Python script time > 5ms? Python callback optimization
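
The same thresholds can be applied to frame times collected outside the overlay, for example from a benchmark log. The following is a minimal standalone sketch in plain Python: `classify_frame_time` is a hypothetical helper, not part of `mcrfpy`, and the "green" label for frames under 16ms is an assumption based on the "acceptable" threshold used elsewhere on this page.

```python
# Hypothetical helper mirroring the F3 overlay thresholds listed above.
# Not part of mcrfpy; the "green" label is an assumed name for frames under 16 ms.

def classify_frame_time(frame_ms: float) -> str:
    """Return 'red', 'yellow', or 'green' for a frame time in milliseconds."""
    if frame_ms > 33.0:
        return "red"      # unacceptable performance
    if frame_ms >= 16.0:
        return "yellow"   # marginal performance
    return "green"        # within the 16 ms budget


if __name__ == "__main__":
    # Illustrative sample values only.
    for sample in (8.5, 20.0, 45.2):
        print(f"{sample:5.1f}ms -> {classify_frame_time(sample)}")
```

A helper like this may be handy when scanning a long list of recorded frame times for the frames worth investigating.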

### Running Benchmarks

Use the benchmark API for detailed data capture:

```python
import mcrfpy

mcrfpy.start_benchmark()

# ... run test scenario ...

filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
```

See [[Performance-and-Profiling]] for benchmark output format and analysis.

---

## Step 2: Identify - Understand the Problem

### Common Performance Issues

**Issue: High Grid Render Time on Static Screens**
- **Symptom:** 20-40ms grid render, nothing changing
- **Cause:** Redrawing unchanged cells every frame
- **Status:** Solved by chunk-based dirty flag system

**Issue: High Entity Render Time with Many Entities**
- **Symptom:** 10-20ms entity render with 500+ entities
- **Cause:** O(n) iteration, no spatial indexing
- **Solution:** [#115](../issues/115) SpatialHash (planned)

**Issue: Slow Bulk Grid Updates from Python**
- **Symptom:** Frame drops when updating many cells
- **Cause:** Python/C++ boundary crossings for each cell
- **Workaround:** Minimize individual `layer.set()` calls; use `layer.fill()` for uniform data

**Issue: High Python Script Time**
- **Symptom:** 10-50ms in Python callbacks
- **Cause:** Heavy computation in Python update loops
- **Solution:** Move hot paths to C++ or optimize Python

---

## Step 3: Instrument - Measure Precisely

### Adding ScopedTimer (C++)

Wrap slow functions with timing:

```cpp
#include "Profiler.h"

void MySystem::slowFunction() {
    ScopedTimer timer(Resources::game->metrics.mySystemTime);
    // ... code to measure ...
}
```

### Adding Custom Metrics (C++)

1. Add a field to `ProfilingMetrics` in `src/GameEngine.h`
2. Reset it in `resetPerFrame()`
3. Display it in `ProfilerOverlay::update()` (`src/ProfilerOverlay.cpp`)
4. Instrument the code with `ScopedTimer`
5. Rebuild and press F3

### Creating Python Benchmarks

```python
import mcrfpy
import sys

def benchmark():
    scene = mcrfpy.Scene("bench")
    grid = mcrfpy.Grid(grid_size=(100, 100), pos=(0, 0), size=(800, 600))
    scene.children.append(grid)
    mcrfpy.current_scene = scene

    frame_times = []

    def measure(timer, runtime):
        frame_times.append(runtime)
        if len(frame_times) >= 300:
            avg = sum(frame_times) / len(frame_times)
            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min(frame_times):.2f}ms")
            print(f"Max: {max(frame_times):.2f}ms")
            print(f"FPS: {1000/avg:.1f}")
            timer.stop()
            sys.exit(0)

    mcrfpy.Timer("benchmark", measure, 16)

benchmark()
```

**Run:**
```bash
cd build
./mcrogueface --exec ../tests/benchmark_mysystem.py
```

For headless benchmarks, use Python's `time` module instead of the Timer API since `step()` bypasses the game loop.
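
A headless measurement with the `time` module could look roughly like the sketch below. It uses only the standard library: `time_calls` and `workload` are hypothetical names, and `workload` is a stand-in for whatever the test script actually exercises (it does not call any `mcrfpy` function).

```python
import statistics
import time


def time_calls(func, iterations=300):
    """Call func repeatedly and return per-call timing statistics in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "avg_ms": statistics.fmean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "count": len(samples),
    }


def workload():
    # Placeholder for the operation under test (hypothetical).
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    stats = time_calls(workload)
    print(f"Average: {stats['avg_ms']:.3f}ms")
    print(f"Min: {stats['min_ms']:.3f}ms")
    print(f"Max: {stats['max_ms']:.3f}ms")
```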

---

## Step 4: Optimize - Make It Faster

### Strategy 1: Reduce Work

**Example: Dirty Flags (already implemented)**

Only redraw when content changes. The chunk-based caching system handles this automatically for grid layers.

### Strategy 2: Reduce Complexity

**Example: Spatial queries**

Instead of O(n) search through all entities, use `entities_in_radius()`:

```python
# O(1) spatial query
nearby = grid.entities_in_radius((target_x, target_y), radius=5.0)
```
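
To show why a spatial index reduces the work per query, here is a small pure-Python sketch of a spatial hash. It is an illustration of the general technique only, not the engine's implementation: the `SpatialHash` class, its method names, and the default `cell_size` are all assumptions. In this sketch the cost of a query depends on how many entities sit in the cells overlapping the search radius, rather than on the total entity count.

```python
from collections import defaultdict
from math import floor


class SpatialHash:
    """Illustrative spatial hash (hypothetical; not the McRogueFace implementation)."""

    def __init__(self, cell_size=8.0):
        self.cell_size = cell_size
        self.cells = defaultdict(set)   # (cell_x, cell_y) -> set of entity ids
        self.positions = {}             # entity id -> (x, y)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, entity_id, x, y):
        self.positions[entity_id] = (x, y)
        self.cells[self._cell(x, y)].add(entity_id)

    def query_radius(self, x, y, radius):
        """Return sorted ids of entities within radius of (x, y)."""
        min_cx, min_cy = self._cell(x - radius, y - radius)
        max_cx, max_cy = self._cell(x + radius, y + radius)
        found = []
        # Only cells overlapping the query's bounding box are visited.
        for cx in range(min_cx, max_cx + 1):
            for cy in range(min_cy, max_cy + 1):
                for entity_id in self.cells.get((cx, cy), ()):
                    ex, ey = self.positions[entity_id]
                    if (ex - x) ** 2 + (ey - y) ** 2 <= radius ** 2:
                        found.append(entity_id)
        return sorted(found)


if __name__ == "__main__":
    index = SpatialHash()
    index.insert("a", 1, 1)
    index.insert("b", 4, 4)
    index.insert("c", 50, 50)
    print(index.query_radius(0, 0, 5))
```

Choosing a cell size close to the typical query radius keeps the number of visited cells small; that trade-off is a general property of the technique and would need measuring for a real workload.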

### Strategy 3: Batch Operations

**Minimize Python/C++ boundary crossings:**

```python
# Less efficient: many individual layer.set() calls
for x in range(100):
    for y in range(100):
        layer.set((x, y), mcrfpy.Color(0, 0, 0, 0))

# More efficient: single fill operation
layer.fill(mcrfpy.Color(0, 0, 0, 0))
```

### Strategy 4: Cache Results

**Example: Path caching**

```python
cached_path = [None]
last_target = [None]

def get_path_to(grid, start, target):
    # Recompute only when the target changes; start is not part of the cache key
    if last_target[0] != target:
        cached_path[0] = grid.find_path(start, target)
        last_target[0] = target
    return cached_path[0]
```
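
The cache above remembers a single path and only checks whether the target changed. Where several start/target pairs are queried, one possible variation is to key the cache on both endpoints. The sketch below is standalone and hedged: `PathCache` and `straight_line_path` are hypothetical names, and `straight_line_path` is a trivial stand-in for a real pathfinder such as `grid.find_path()`.

```python
class PathCache:
    """Cache paths by (start, target); illustrative sketch, not an engine API."""

    def __init__(self, find_path):
        self._find_path = find_path
        self._paths = {}
        self.misses = 0  # number of times the pathfinder was actually called

    def get(self, start, target):
        key = (start, target)
        if key not in self._paths:
            self.misses += 1
            self._paths[key] = self._find_path(start, target)
        return self._paths[key]

    def invalidate(self):
        # Call when the map changes, since cached paths may no longer be valid.
        self._paths.clear()


def straight_line_path(start, target):
    """Stand-in pathfinder: walk along x, then along y (ignores obstacles)."""
    x, y = start
    tx, ty = target
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    cache = PathCache(straight_line_path)
    cache.get((0, 0), (2, 1))
    cache.get((0, 0), (2, 1))  # served from the cache
    print(f"pathfinder calls: {cache.misses}")
```

The cost of this design is memory and the need to invalidate when walkability changes, so it is worth measuring before adopting.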

### Optimization Checklist

Before optimizing:
- [ ] Profiled and identified real bottleneck
- [ ] Measured baseline performance
- [ ] Understood root cause

After optimization:
- [ ] Measured improvement
- [ ] Verified correctness
- [ ] Updated tests if needed

---

## Step 5: Verify - Measure Improvement

### Re-run Benchmarks

Compare before and after measurements.

### Check Correctness

**Visual testing:**
1. Run game normally (not headless)
2. Verify visual output unchanged
3. Test edge cases

**Automated testing:**
```bash
cd build
./mcrogueface --headless --exec ../tests/unit/my_test.py
```

### Document Results

Add findings to the relevant Gitea issue with:
- Baseline numbers
- Optimized numbers
- Improvement factor
- Test script name
- Commit hash
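
The improvement factor is the baseline frame time divided by the optimized frame time, and FPS is 1000 divided by the frame time in milliseconds. The sketch below computes both. `summarize` is a hypothetical helper, and the numbers in the usage example are made-up sample values, not measurements.

```python
def summarize(baseline_ms: float, optimized_ms: float) -> dict:
    """Return FPS figures and the improvement factor for two average frame times."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("frame times must be positive")
    return {
        "baseline_fps": 1000.0 / baseline_ms,
        "optimized_fps": 1000.0 / optimized_ms,
        "improvement_factor": baseline_ms / optimized_ms,
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    result = summarize(baseline_ms=40.0, optimized_ms=8.0)
    print(f"Baseline: {result['baseline_fps']:.1f} FPS")
    print(f"Optimized: {result['optimized_fps']:.1f} FPS")
    print(f"Improvement: {result['improvement_factor']:.1f}x")
```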

---

## When NOT to Optimize

**Don't optimize if:**
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere

Focus on correctness first, then profile to find real bottlenecks, and optimize only the hot paths.

---

## Related Documentation

- [[Performance-and-Profiling]] - Profiling tools reference
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Grid-System]] - Grid optimization opportunities
- [[Writing-Tests]] - Creating performance tests
@@ -1,494 +0,0 @@
# Performance Optimization Workflow

Systematic approach to identifying and resolving performance bottlenecks in McRogueFace.

## Quick Reference

**Related Systems:** [[Performance-and-Profiling]], [[Grid-System]]

**Tools:**
- F3 profiler overlay (in-game)
- `src/Profiler.h` - ScopedTimer
- `tests/benchmark_*.py` - Performance benchmarks

## The Optimization Cycle

```
1. PROFILE → 2. IDENTIFY → 3. INSTRUMENT → 4. OPTIMIZE → 5. VERIFY
     ↑                                                         |
     └─────────────────────────────────────────────────────────┘
```

---

## Step 1: Profile - Find the Bottleneck

### Using F3 Overlay

**Start the game and press F3:**

Look for:
- **Red frame times** (>33ms) - Unacceptable performance
- **Yellow frame times** (16-33ms) - Marginal performance
- **High subsystem times** - Which system is slow?
  - Grid rendering > 10ms? → Grid optimization needed
  - Entity rendering > 5ms? → Entity culling/SpatialHash needed
  - Python script time > 5ms? → Python callback optimization

**Example profiler output:**
```
Frame: 45.2ms (RED)
FPS: 22

Grid Render: 32.1ms ← BOTTLENECK!
Entity Render: 8.5ms
Python Script: 2.1ms
Animation: 1.2ms

Cells Rendered: 10000
Entities: 150 (visible: 50)
```

**Analysis:** Grid rendering is the bottleneck (32ms of 45ms total).

### Running Benchmarks

**Static Grid Benchmark:**
```bash
cd build
./mcrogueface --headless --exec ../tests/benchmark_static_grid.py
```

**Moving Entities Benchmark:**
```bash
./mcrogueface --headless --exec ../tests/benchmark_moving_entities.py
```

**Output:** Baseline performance metrics to measure improvement against.

---

## Step 2: Identify - Understand the Problem

### Common Performance Issues

**Issue: High Grid Render Time on Static Screens**
- **Symptom:** 20-40ms grid render, nothing changing
- **Cause:** Redrawing unchanged cells every frame
- **Solution:** Implement dirty flag system ([#116](../issues/116))

**Issue: High Entity Render Time with Many Entities**
- **Symptom:** 10-20ms entity render with 500+ entities
- **Cause:** O(n) iteration, no spatial indexing
- **Solution:** Implement SpatialHash ([#115](../issues/115))

**Issue: Slow Bulk Grid Updates from Python**
- **Symptom:** Frame drops when updating many cells
- **Cause:** Python/C++ boundary crossings for each cell
- **Solution:** Implement batch operations ([#113](../issues/113))

**Issue: High Python Script Time**
- **Symptom:** 10-50ms in Python callbacks
- **Cause:** Heavy computation in Python update loops
- **Solution:** Move hot paths to C++ or optimize Python

### Analyzing Call Stacks

**When F3 overlay isn't enough:**

```bash
# Build with debug symbols
make clean
cmake .. -DCMAKE_BUILD_TYPE=Debug
make

# Profile with gdb
gdb ./mcrogueface
(gdb) run
<trigger slow behavior>
(gdb) info threads
(gdb) bt  # Backtrace
```

---

## Step 3: Instrument - Measure Precisely

### Adding ScopedTimer

**Identify the slow function and wrap it:**

```cpp
#include "Profiler.h"

void MySystem::slowFunction() {
    // Add timer
    ScopedTimer timer(Resources::game->metrics.mySystemTime);

    // Your slow code here
    for (auto& item : items) {
        item->process();
    }
}
```

### Adding Custom Metrics

**1. Add metric field to ProfilingMetrics:**

`src/GameEngine.h`:
```cpp
struct ProfilingMetrics {
    // ... existing fields ...
    float mySystemTime = 0.0f;  // Add this
};
```

**2. Reset in resetPerFrame():**

```cpp
void ProfilingMetrics::resetPerFrame() {
    // ... existing resets ...
    mySystemTime = 0.0f;  // Add this
}
```

**3. Display in ProfilerOverlay:**

`src/ProfilerOverlay.cpp`:
```cpp
void ProfilerOverlay::update(const ProfilingMetrics& metrics) {
    // ... existing formatting ...
    ss << "My System: " << formatFloat(metrics.mySystemTime) << "ms\n";
}
```

**4. Rebuild and test:**

```bash
make
cd build
./mcrogueface
# Press F3 - see your metric!
```

### Creating Benchmarks

**Create `tests/benchmark_mysystem.py`:**

```python
import mcrfpy
import sys
import time

def benchmark():
    # Setup
    mcrfpy.createScene("bench")
    # ... create test scenario ...

    frame_times = []

    def measure(runtime_ms):
        frame_times.append(runtime_ms)
        if len(frame_times) >= 300:  # 5 seconds at 60fps
            # Report statistics
            avg = sum(frame_times) / len(frame_times)
            min_time = min(frame_times)
            max_time = max(frame_times)

            print(f"Average: {avg:.2f}ms")
            print(f"Min: {min_time:.2f}ms")
            print(f"Max: {max_time:.2f}ms")
            print(f"FPS: {1000/avg:.1f}")

            sys.exit(0)

    mcrfpy.setTimer("benchmark", measure, 16)  # Every frame

benchmark()
```

**Run:**
```bash
./mcrogueface --headless --exec ../tests/benchmark_mysystem.py
```

---

## Step 4: Optimize - Make It Faster

### Optimization Strategies

#### Strategy 1: Reduce Work

**Example: Grid Dirty Flags**

**Before:** Redraw all cells every frame
```cpp
for (int x = 0; x < grid_x; x++) {
    for (int y = 0; y < grid_y; y++) {
        renderCell(x, y);  // Always renders
    }
}
```

**After:** Only redraw when changed
```cpp
if (grid_dirty) {
    for (int x = 0; x < grid_x; x++) {
        for (int y = 0; y < grid_y; y++) {
            renderCell(x, y);
        }
    }
    grid_dirty = false;
}
```

**Expected:** 10-50x improvement for static scenes

#### Strategy 2: Reduce Complexity

**Example: SpatialHash for Entity Queries**

**Before:** O(n) search through all entities
```cpp
for (auto& entity : entities) {
    if (distanceTo(entity, target) < radius) {
        nearby.push_back(entity);
    }
}
```

**After:** O(1) hash lookup
```cpp
auto cell = spatialHash.getCell(target.x, target.y);
for (auto& entity : cell.entities) {
    nearby.push_back(entity);
}
```

**Expected:** 100x+ improvement for large entity counts

#### Strategy 3: Batch Operations

**Example: Grid Batch Updates**

**Before:** Multiple Python/C++ crossings
```python
for x in range(100):
    for y in range(100):
        grid.at((x, y)).tilesprite = 42  # 10,000 calls!
```

**After:** Single batch operation
```python
grid.fill_rect(0, 0, 100, 100, 42)  # 1 call!
```

**Expected:** 10-100x improvement for bulk updates

#### Strategy 4: Cache Results

**Example: Path Caching in Entities**

**Before:** Recompute path every frame
```cpp
void Entity::update() {
    path = computePathTo(target);  // Expensive!
    followPath(path);
}
```

**After:** Cache and reuse
```cpp
void Entity::update() {
    if (!cachedPath || targetMoved) {
        cachedPath = computePathTo(target);
    }
    followPath(cachedPath);
}
```

**Expected:** 10x+ improvement for pathfinding-heavy scenarios

### Optimization Checklist

Before optimizing:
- [ ] Profiled and identified real bottleneck
- [ ] Measured baseline performance
- [ ] Understood root cause

During optimization:
- [ ] Changed only one thing at a time
- [ ] Kept original code for comparison
- [ ] Added comments explaining optimization

After optimization:
- [ ] Measured improvement
- [ ] Verified correctness (no bugs introduced)
- [ ] Updated tests if needed

---

## Step 5: Verify - Measure Improvement

### Re-run Benchmarks

**Before optimization:**
```
Average: 45.2ms
Min: 38.1ms
Max: 62.3ms
FPS: 22.1
```

**After optimization:**
```
Average: 8.5ms ← 5.3x improvement!
Min: 7.2ms
Max: 12.1ms
FPS: 117.6
```

### Check Correctness

**Visual testing:**
1. Run game normally (not headless)
2. Verify visual output unchanged
3. Test edge cases (empty grids, max entities, etc.)

**Automated testing:**
```bash
# Run existing test suite
./mcrogueface --headless --exec tests/test_grid_operations.py
./mcrogueface --headless --exec tests/test_entity_movement.py
```

### Document Results

**Create issue comment:**

```markdown
## Performance Optimization Results

**Issue:** #116 (Dirty Flag System)

**Baseline:**
- Static grid (100x100): 32.1ms average
- FPS: 22

**After Optimization:**
- Static grid (100x100): 0.8ms average
- FPS: 118

**Improvement:** 40x faster for static scenes

**Test:** `tests/benchmark_static_grid.py`

**Commit:** abc123def
```

---

## Common Optimization Patterns

### Pattern 1: Early Exit

```cpp
// Check cheap conditions first
if (!visible) return;
if (opacity <= 0.0f) return;
if (!inViewport(bounds)) return;

// Then do expensive work
render();
```

### Pattern 2: Lazy Evaluation

```cpp
// Don't compute until needed
mutable bool fovComputed = false;
mutable std::vector<Point> visibleCells;

std::vector<Point>& getVisibleCells() {
    if (!fovComputed) {
        visibleCells = computeFOV();
        fovComputed = true;
    }
    return visibleCells;
}
```

### Pattern 3: Object Pooling

```cpp
#include <deque>
#include <vector>

// Reuse instead of allocate/deallocate
class EntityPool {
    // std::deque keeps pointers to existing elements valid when the pool grows;
    // a std::vector could reallocate and invalidate pointers returned earlier.
    std::deque<Entity> pool;
    std::vector<bool> active;

public:
    Entity* spawn() {
        for (size_t i = 0; i < pool.size(); i++) {
            if (!active[i]) {
                active[i] = true;
                return &pool[i];
            }
        }
        // Grow pool if needed
        pool.emplace_back();
        active.push_back(true);
        return &pool.back();
    }
};
```

### Pattern 4: Space-Time Tradeoff

```cpp
// Cache expensive computation
std::unordered_map<int, std::vector<Point>> pathCache;

std::vector<Point> getPathTo(int targetId) {
    if (pathCache.contains(targetId)) {
        return pathCache[targetId];  // O(1) lookup
    }

    auto path = computeExpensivePath(targetId);
    pathCache[targetId] = path;
    return path;
}
```

---

## When NOT to Optimize

**Don't optimize if:**
- Performance is already acceptable (< 16ms frame time)
- Optimization makes code significantly more complex
- You haven't profiled yet (no guessing!)
- The bottleneck is elsewhere (optimize hot paths first)

**Premature optimization is the root of all evil** - Donald Knuth

Focus on:
1. Correctness first
2. Profile to find real bottlenecks
3. Optimize hot paths only
4. Keep code maintainable

---

## Related Documentation

- [[Performance-and-Profiling]] - Profiling tools reference
- [[Grid-System]] - Grid optimization opportunities
- [[Writing-Tests]] - Creating performance tests

**Open Issues:**
- [#115](../issues/115) - SpatialHash Implementation
- [#116](../issues/116) - Dirty Flag System
- [#113](../issues/113) - Batch Operations for Grid
- [#117](../issues/117) - Memory Pool for Entities