Update "Performance-and-Profiling"

parent 070d5bb813 · commit 9c742d4f72 · 1 changed file with 225 additions and 226 deletions
Performance monitoring and optimization infrastructure for McRogueFace. Press F3 to toggle the profiler overlay.

- [#104](../issues/104) - Basic Profiling/Metrics (Closed - Implemented)
- [#148](../issues/148) - Dirty Flag RenderTexture Caching (Closed - Implemented)
- [#123](../issues/123) - Chunk-based Grid Rendering (Closed - Implemented)
- [#115](../issues/115) - SpatialHash Implementation (Open - Tier 1)
- [#113](../issues/113) - Batch Operations for Grid (Open - Tier 1)
- [#117](../issues/117) - Memory Pool for Entities (Open - Tier 1)

**Key Files:**

- `src/Profiler.h` - ScopedTimer RAII helper
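The C++ `ScopedTimer` measures a block's lifetime via RAII. For Python-side scripts, a rough analogue can be sketched as a context manager; this is an illustrative helper, not part of the mcrfpy API:

```python
import time
from contextlib import contextmanager

@contextmanager
def scoped_timer(label, sink=print):
    # Report the block's wall-clock time on exit, RAII-style
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        sink(f"{label}: {elapsed_ms:.2f}ms")
```

Usage: `with scoped_timer("spawn wave"): ...` reports the elapsed time even if the block raises.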

## Benchmark API

The benchmark API captures per-frame timing data to JSON files. C++ handles all timing; Python processes results afterward.

### Basic Usage
```python
import mcrfpy

# Start capturing per-frame timing data
mcrfpy.start_benchmark()

# ... run the scene to measure ...

# Stop and get the output filename
filename = mcrfpy.end_benchmark()
print(f"Benchmark saved to: {filename}")
# e.g., "benchmark_12345_20250528_143022.json"
```
### Adding Log Messages

```python
mcrfpy.start_benchmark()

# Attach a message to the current frame's data
mcrfpy.log_benchmark("Combat started")

filename = mcrfpy.end_benchmark()
```

Log messages appear in the `logs` array of each frame in the output JSON.
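Because the logs travel with each frame's timing data, messages can be lined up with frame numbers after capture. A minimal sketch, assuming the per-frame records live under a top-level `frames` array (key name assumed; field names per the Output Format section):

```python
import json

def frames_with_logs(path):
    # Pair each frame number with any log messages it carried
    with open(path) as f:
        data = json.load(f)
    return [(fr["frame_number"], fr["logs"])
            for fr in data["frames"] if fr.get("logs")]
```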
### Headless Mode Note

In `--headless` mode with `step()`, the benchmark API warns that step-based simulation bypasses the game loop. For headless performance measurement, use Python's `time` module:

```python
import time

start = time.perf_counter()
# ... operation to measure ...
elapsed = time.perf_counter() - start
print(f"Operation took {elapsed*1000:.2f}ms")
```

The benchmark API works best with the normal game loop (non-headless mode).
### Output Format

The JSON file contains per-frame data:

```json
{
  "frames": [
    {
      ...
      "entity_render_time_ms": 2.1,
      "python_time_ms": 1.8,
      "logs": ["Player spawned"]
    },
    ...
  ],
  "summary": {
    "total_frames": 1000,
    ...
  }
}
```
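The raw frames also support custom statistics beyond the built-in summary, for example a 95th-percentile frame time. A sketch, again assuming a top-level `frames` array holds the per-frame records:

```python
import json
import math

def p95_frame_time(path):
    # 95th-percentile frame time, nearest-rank method on sorted times
    with open(path) as f:
        times = sorted(fr["frame_time_ms"] for fr in json.load(f)["frames"])
    return times[math.ceil(0.95 * len(times)) - 1]
```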
### Processing Results

Since Python processes results *after* capture, timing overhead doesn't affect measurements:

```python
import json

def analyze_benchmark(filename):
    with open(filename) as f:
        data = json.load(f)

    slow_frames = [fr for fr in data["frames"] if fr["frame_time_ms"] > 16.67]

    print(f"Slow frames (>16.67ms): {len(slow_frames)}")
    print(f"Average: {data['summary']['avg_frame_time_ms']:.2f}ms")

    # Find what was happening during slow frames
    for frame in slow_frames[:5]:
        print(f"  Frame {frame['frame_number']}: {frame['frame_time_ms']:.1f}ms")
        if frame.get("logs"):
            print(f"    Logs: {frame['logs']}")
```
- Per-frame counts:
  - Grid cells rendered
  - Entities rendered (visible/total)
  - Draw calls

**Implementation:** `src/ProfilerOverlay.cpp`
### Implemented Optimizations

**Chunk-based Rendering** ([#123](../issues/123)):
- Large grids divided into chunks (~256 cells each)
- Only visible chunks processed
- 1000x1000+ grids render efficiently
### Current Bottlenecks

**Entity Spatial Queries** - O(n) iteration:
- Finding entities at a position requires checking all entities; becomes noticeable at 500+ entities
- Use `grid.entities_in_radius()` for proximity queries
- SpatialHash planned in [#115](../issues/115)

**Bulk Grid Updates** - Python/C++ boundary:
- Many individual `layer.set()` calls are slower than batch operations; each call crosses the Python/C++ boundary
- Use `layer.fill()` for uniform values
- Batch operations planned in [#113](../issues/113)

**Entity Allocation** - Memory fragmentation:
- Frequent spawn/destroy cycles fragment memory
- Memory pool planned in [#117](../issues/117)

---
3. **Analyze**: Process JSON to find patterns in slow frames
4. **Optimize**: Make targeted changes
5. **Verify**: Re-run benchmark, compare results
6. **Iterate**: Repeat until acceptable performance

See [[Performance-Optimization-Workflow]] for the full methodology.
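The Verify step's before/after comparison can be scripted against two benchmark files. A minimal sketch; the helper name is hypothetical, and the `summary`/`avg_frame_time_ms` fields follow the Output Format and Processing Results sections:

```python
import json

def compare_benchmarks(before_path, after_path):
    # Report the change in average frame time between two runs
    def avg(path):
        with open(path) as f:
            return json.load(f)["summary"]["avg_frame_time_ms"]
    before, after = avg(before_path), avg(after_path)
    return {"before_ms": before, "after_ms": after, "speedup": before / after}
```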
### Performance Targets
## Related Systems

- [[Grid-Rendering-Pipeline]] - Chunk caching and dirty flags
- [[Performance-Optimization-Workflow]] - Full optimization methodology
- [[Entity-Management]] - Entity performance considerations
- [[Writing-Tests]] - Performance test creation

---

*Last updated: 2025-11-29*