Update Performance and Profiling
parent
6beb3ee39f
commit
0a8c349bae
1 changed files with 224 additions and 224 deletions
|
|
@ -1,225 +1,225 @@
|
||||||
# Performance and Profiling
|
# Performance and Profiling
|
||||||
|
|
||||||
Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.
|
Performance monitoring and optimization infrastructure for McRogueFace. Press F3 in-game to see real-time metrics, or use the benchmark API to capture detailed timing data to disk.
|
||||||
|
|
||||||
## Quick Reference
|
## Quick Reference
|
||||||
|
|
||||||
**Related Issues:**
|
**Related Issues:**
|
||||||
- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
|
- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
|
||||||
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
|
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
|
||||||
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
|
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
|
||||||
- [#115](../issues/115) - SpatialHash Implementation (Open)
|
- [#115](../issues/115) - SpatialHash Implementation (Open)
|
||||||
- [#113](../issues/113) - Batch Operations for Grid (Open)
|
- [#113](../issues/113) - Batch Operations for Grid (Open)
|
||||||
|
|
||||||
**Key Files:**
|
**Key Files:**
|
||||||
- `src/Profiler.h` - ScopedTimer RAII helper
|
- `src/Profiler.h` - ScopedTimer RAII helper
|
||||||
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
|
- `src/ProfilerOverlay.cpp` - F3 overlay visualization
|
||||||
- `src/GameEngine.h` - ProfilingMetrics struct
|
- `src/GameEngine.h` - ProfilingMetrics struct
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Benchmark API
|
## Benchmark API
|
||||||
|
|
||||||
The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.
|
The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.
|
||||||
|
|
||||||
### Basic Usage
|
### Basic Usage
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import mcrfpy
|
import mcrfpy
|
||||||
|
|
||||||
# Start capturing benchmark data
|
# Start capturing benchmark data
|
||||||
mcrfpy.start_benchmark()
|
mcrfpy.start_benchmark()
|
||||||
|
|
||||||
# ... run your test scenario ...
|
# ... run your test scenario ...
|
||||||
|
|
||||||
# Stop and get the output filename
|
# Stop and get the output filename
|
||||||
filename = mcrfpy.end_benchmark()
|
filename = mcrfpy.end_benchmark()
|
||||||
print(f"Benchmark saved to: {filename}")
|
print(f"Benchmark saved to: {filename}")
|
||||||
```
|
```
|
||||||
|
|
||||||
### Adding Log Messages
|
### Adding Log Messages
|
||||||
|
|
||||||
Mark specific events within the benchmark:
|
Mark specific events within the benchmark:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
mcrfpy.start_benchmark()
|
mcrfpy.start_benchmark()
|
||||||
|
|
||||||
# Your code...
|
# Your code...
|
||||||
mcrfpy.log_benchmark("Player spawned")
|
mcrfpy.log_benchmark("Player spawned")
|
||||||
|
|
||||||
# More code...
|
# More code...
|
||||||
mcrfpy.log_benchmark("Combat started")
|
mcrfpy.log_benchmark("Combat started")
|
||||||
|
|
||||||
filename = mcrfpy.end_benchmark()
|
filename = mcrfpy.end_benchmark()
|
||||||
```
|
```
|
||||||
|
|
||||||
### Headless Mode Note
|
### Headless Mode Note
|
||||||
|
|
||||||
In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:
|
In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import time
|
import time
|
||||||
|
|
||||||
start = time.perf_counter()
|
start = time.perf_counter()
|
||||||
# ... operation to measure ...
|
# ... operation to measure ...
|
||||||
elapsed = time.perf_counter() - start
|
elapsed = time.perf_counter() - start
|
||||||
print(f"Operation took {elapsed*1000:.2f}ms")
|
print(f"Operation took {elapsed*1000:.2f}ms")
|
||||||
```
|
```
|
||||||
|
|
||||||
The benchmark API works best with the normal game loop (non-headless mode).
|
The benchmark API works best with the normal game loop (non-headless mode).
|
||||||
|
|
||||||
### Output Format
|
### Output Format
|
||||||
|
|
||||||
The JSON file contains per-frame data:
|
The JSON file contains per-frame data:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"frames": [
|
"frames": [
|
||||||
{
|
{
|
||||||
"frame_number": 1,
|
"frame_number": 1,
|
||||||
"frame_time_ms": 12.5,
|
"frame_time_ms": 12.5,
|
||||||
"grid_render_time_ms": 8.2,
|
"grid_render_time_ms": 8.2,
|
||||||
"entity_render_time_ms": 2.1,
|
"entity_render_time_ms": 2.1,
|
||||||
"python_time_ms": 1.8,
|
"python_time_ms": 1.8,
|
||||||
"logs": ["Player spawned"]
|
"logs": ["Player spawned"]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"summary": {
|
"summary": {
|
||||||
"total_frames": 1000,
|
"total_frames": 1000,
|
||||||
"avg_frame_time_ms": 14.2,
|
"avg_frame_time_ms": 14.2,
|
||||||
"max_frame_time_ms": 28.5,
|
"max_frame_time_ms": 28.5,
|
||||||
"min_frame_time_ms": 8.1
|
"min_frame_time_ms": 8.1
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Processing Results
|
### Processing Results
|
||||||
|
|
||||||
```python
|
```python
|
||||||
import json
|
import json
|
||||||
|
|
||||||
def analyze_benchmark(filename):
|
def analyze_benchmark(filename):
|
||||||
with open(filename) as f:
|
with open(filename) as f:
|
||||||
data = json.load(f)
|
data = json.load(f)
|
||||||
|
|
||||||
frames = data["frames"]
|
frames = data["frames"]
|
||||||
slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]
|
slow_frames = [f for f in frames if f["frame_time_ms"] > 16.67]
|
||||||
|
|
||||||
print(f"Total frames: {len(frames)}")
|
print(f"Total frames: {len(frames)}")
|
||||||
print(f"Slow frames (>16.67ms): {len(slow_frames)}")
|
print(f"Slow frames (>16.67ms): {len(slow_frames)}")
|
||||||
print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")
|
print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")
|
||||||
|
|
||||||
for frame in slow_frames[:5]:
|
for frame in slow_frames[:5]:
|
||||||
print(f" Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
|
print(f" Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
|
||||||
if frame.get("logs"):
|
if frame.get("logs"):
|
||||||
print(f" Logs: {frame['logs']}")
|
print(f" Logs: {frame['logs']}")
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## F3 Profiler Overlay
|
## F3 Profiler Overlay
|
||||||
|
|
||||||
**Activation:** Press F3 during gameplay
|
**Activation:** Press F3 during gameplay
|
||||||
|
|
||||||
**Displays:**
|
**Displays:**
|
||||||
- Frame time (ms) with color coding:
|
- Frame time (ms) with color coding:
|
||||||
- Green: < 16ms (60+ FPS)
|
- Green: < 16ms (60+ FPS)
|
||||||
- Yellow: 16-33ms (30-60 FPS)
|
- Yellow: 16-33ms (30-60 FPS)
|
||||||
- Red: > 33ms (< 30 FPS)
|
- Red: > 33ms (< 30 FPS)
|
||||||
- FPS (averaged over 60 frames)
|
- FPS (averaged over 60 frames)
|
||||||
- Detailed breakdowns:
|
- Detailed breakdowns:
|
||||||
- Grid rendering time
|
- Grid rendering time
|
||||||
- Entity rendering time
|
- Entity rendering time
|
||||||
- Python script time
|
- Python script time
|
||||||
- Animation update time
|
- Animation update time
|
||||||
- Per-frame counts:
|
- Per-frame counts:
|
||||||
- Grid cells rendered
|
- Grid cells rendered
|
||||||
- Entities rendered (visible/total)
|
- Entities rendered (visible/total)
|
||||||
|
|
||||||
**Implementation:** `src/ProfilerOverlay.cpp`
|
**Implementation:** `src/ProfilerOverlay.cpp`
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Current Performance
|
## Current Performance
|
||||||
|
|
||||||
### Implemented Optimizations
|
### Implemented Optimizations
|
||||||
|
|
||||||
**Chunk-based Rendering** ([#123](../issues/123)):
|
**Chunk-based Rendering** ([#123](../issues/123)):
|
||||||
- Large grids divided into chunks
|
- Large grids divided into chunks
|
||||||
- Only visible chunks processed
|
- Only visible chunks processed
|
||||||
- 1000x1000+ grids render efficiently
|
- 1000x1000+ grids render efficiently
|
||||||
|
|
||||||
**Dirty Flag Caching** ([#148](../issues/148)):
|
**Dirty Flag Caching** ([#148](../issues/148)):
|
||||||
- Layers track changes per-chunk
|
- Layers track changes per-chunk
|
||||||
- Unchanged chunks reuse cached RenderTexture
|
- Unchanged chunks reuse cached RenderTexture
|
||||||
- Static scenes: near-zero CPU cost after initial render
|
- Static scenes: near-zero CPU cost after initial render
|
||||||
|
|
||||||
**Viewport Culling:**
|
**Viewport Culling:**
|
||||||
- Only cells/entities within viewport processed
|
- Only cells/entities within viewport processed
|
||||||
- Camera position and zoom respected
|
- Camera position and zoom respected
|
||||||
|
|
||||||
### Current Bottlenecks
|
### Current Bottlenecks
|
||||||
|
|
||||||
**Entity Spatial Queries** - O(n) iteration for large counts:
|
**Entity Spatial Queries** - O(n) iteration for large counts:
|
||||||
- Use `grid.entities_in_radius()` for proximity queries
|
- Use `grid.entities_in_radius()` for proximity queries
|
||||||
- SpatialHash planned in [#115](../issues/115)
|
- SpatialHash planned in [#115](../issues/115)
|
||||||
|
|
||||||
**Bulk Grid Updates** - Python/C++ boundary:
|
**Bulk Grid Updates** - Python/C++ boundary:
|
||||||
- Many individual `layer.set()` calls are slower than batch operations
|
- Many individual `layer.set()` calls are slower than batch operations
|
||||||
- Use `layer.fill()` for uniform values
|
- Use `layer.fill()` for uniform values
|
||||||
- Batch operations planned in [#113](../issues/113)
|
- Batch operations planned in [#113](../issues/113)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Optimization Workflow
|
## Optimization Workflow
|
||||||
|
|
||||||
1. **Profile**: Press F3, identify which metric is high
|
1. **Profile**: Press F3, identify which metric is high
|
||||||
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
|
2. **Benchmark**: Use `start_benchmark()` to capture detailed data
|
||||||
3. **Analyze**: Process JSON to find patterns in slow frames
|
3. **Analyze**: Process JSON to find patterns in slow frames
|
||||||
4. **Optimize**: Make targeted changes
|
4. **Optimize**: Make targeted changes
|
||||||
5. **Verify**: Re-run benchmark, compare results
|
5. **Verify**: Re-run benchmark, compare results
|
||||||
|
|
||||||
See [[Performance-Optimization-Workflow]] for the full methodology.
|
See [[Performance-Optimization-Workflow]] for the full methodology.
|
||||||
|
|
||||||
### Performance Targets
|
### Performance Targets
|
||||||
|
|
||||||
| Metric | Target | Notes |
|
| Metric | Target | Notes |
|
||||||
|--------|--------|-------|
|
|--------|--------|-------|
|
||||||
| Frame time | < 16.67ms | 60 FPS |
|
| Frame time | < 16.67ms | 60 FPS |
|
||||||
| Grid render | < 5ms | For typical game grids |
|
| Grid render | < 5ms | For typical game grids |
|
||||||
| Entity render | < 2ms | For < 200 entities |
|
| Entity render | < 2ms | For < 200 entities |
|
||||||
| Python callbacks | < 2ms | Keep logic light |
|
| Python callbacks | < 2ms | Keep logic light |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## C++ Instrumentation
|
## C++ Instrumentation
|
||||||
|
|
||||||
### ScopedTimer (RAII)
|
### ScopedTimer (RAII)
|
||||||
|
|
||||||
Automatic timing for code blocks:
|
Automatic timing for code blocks:
|
||||||
|
|
||||||
```cpp
|
```cpp
|
||||||
#include "Profiler.h"
|
#include "Profiler.h"
|
||||||
|
|
||||||
void expensiveFunction() {
|
void expensiveFunction() {
|
||||||
ScopedTimer timer(Resources::game->metrics.functionTime);
|
ScopedTimer timer(Resources::game->metrics.functionTime);
|
||||||
// ... code to measure ...
|
// ... code to measure ...
|
||||||
// Timer automatically records duration on destruction
|
// Timer automatically records duration on destruction
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Adding New Metrics
|
### Adding New Metrics
|
||||||
|
|
||||||
1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
|
1. Add field to `ProfilingMetrics` in `src/GameEngine.h`
|
||||||
2. Reset in `resetPerFrame()` if per-frame counter
|
2. Reset in `resetPerFrame()` if per-frame counter
|
||||||
3. Display in `src/ProfilerOverlay.cpp::update()`
|
3. Display in `src/ProfilerOverlay.cpp::update()`
|
||||||
4. Instrument code with ScopedTimer
|
4. Instrument code with ScopedTimer
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Related Systems
|
## Related Systems
|
||||||
|
|
||||||
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
|
- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
|
||||||
- [[Performance-Optimization-Workflow]] - Full optimization methodology
|
- [[Performance-Optimization-Workflow]] - Full optimization methodology
|
||||||
- [[Entity-Management]] - Entity performance considerations
|
- [[Entity-Management]] - Entity performance considerations
|
||||||
- [[Writing-Tests]] - Performance test creation
|
- [[Writing-Tests]] - Performance test creation
|
||||||
Loading…
Add table
Add a link
Reference in a new issue