Sparse perspective writeback in UIEntity::updateVisibility (Phase 5.2 finding) #316

Open
opened 2026-04-18 10:46:49 +00:00 by john · 0 comments
Owner

Source: Phase 5.2 benchmark tests/benchmarks/fov_opt_bench.py (Kanboard #37). Baseline at tests/benchmarks/baseline/phase5_2/fov_opt_bench.json.

Finding

UIEntity::updateVisibility() (src/UIEntity.cpp:42) does two full-grid passes per call:

perspective_map->demoteVisible();           // walks all W*H cells
grid->computeFOV(x, y, fov_radius, ...);    // bounded -> O(radius^2-ish)
for (int gy = 0; gy < grid_h; gy++)         // walks all W*H cells again
    for (int gx = 0; gx < grid_w; gx++)
        if (grid->isInFOV(gx, gy)) buf[gy*W+gx] = VISIBLE;

On a 1000x1000 grid, the per-entity update is dominated by the two W*H walks (1M ops each), not the actual TCOD FOV computation:

Algorithm Radius grid.compute_fov() only + perspective writeback Writeback overhead
BASIC 8 2.71 ms/ent 27.70 ms/ent +24.99 ms
BASIC 16 2.79 ms/ent 29.84 ms/ent +27.05 ms
BASIC 32 2.96 ms/ent 28.93 ms/ent +25.98 ms
SHADOW 8 2.87 ms/ent 29.11 ms/ent +26.24 ms
SHADOW 16 2.88 ms/ent 30.25 ms/ent +27.36 ms
SHADOW 32 2.85 ms/ent 28.99 ms/ent +26.15 ms
SYMMETRIC_SHADOWCAST 8 23.99 ms/ent 48.72 ms/ent +24.73 ms
SYMMETRIC_SHADOWCAST 16 23.40 ms/ent 54.20 ms/ent +30.79 ms
SYMMETRIC_SHADOWCAST 32 24.42 ms/ent 60.90 ms/ent +36.48 ms

For 100 entities updating per turn this is ~3 seconds/turn on a 1000x1000 grid even with cheap algorithms.

Suggested optimization

Both passes should be bounded by an AABB around the entity sized to fov_radius:

int x0 = std::max(0, x - fov_radius - 1);
int y0 = std::max(0, y - fov_radius - 1);
int x1 = std::min(grid_w, x + fov_radius + 2);
int y1 = std::min(grid_h, y + fov_radius + 2);

Then walk only [y0..y1) x [x0..x1) for both demote and promote. Cells outside that window cannot transition state for this entity in this tick (they are not in FOV), so leaving them at their stored value is correct. For radius=32 on 1000x1000 this drops the cost from 1M to ~4225 cells per pass -- expected ~250x speedup on the writeback portion.

Caveat: with fov_radius=0 (unlimited) the bounded walk degenerates to the full grid; behavior is unchanged. The current full-grid loop is the right answer when radius is unlimited.

Validation

The bench in this commit can be re-run to confirm the speedup:

./mcrogueface --headless --exec ../tests/benchmarks/fov_opt_bench.py

Diff against tests/benchmarks/baseline/phase5_2/fov_opt_bench.json.

Labels

system:performance, system:grid, priority:tier2-foundation, Minor Feature, workflow:needs-benchmark

**Source:** Phase 5.2 benchmark `tests/benchmarks/fov_opt_bench.py` (Kanboard #37). Baseline at `tests/benchmarks/baseline/phase5_2/fov_opt_bench.json`. ## Finding `UIEntity::updateVisibility()` (src/UIEntity.cpp:42) does two full-grid passes per call: ```cpp perspective_map->demoteVisible(); // walks all W*H cells grid->computeFOV(x, y, fov_radius, ...); // bounded -> O(radius^2-ish) for (int gy = 0; gy < grid_h; gy++) // walks all W*H cells again for (int gx = 0; gx < grid_w; gx++) if (grid->isInFOV(gx, gy)) buf[gy*W+gx] = VISIBLE; ``` On a **1000x1000 grid**, the per-entity update is dominated by the two W*H walks (1M ops each), not the actual TCOD FOV computation: | Algorithm | Radius | grid.compute_fov() only | + perspective writeback | Writeback overhead | |-----------|--------|-------------------------|-------------------------|-------------------:| | BASIC | 8 | 2.71 ms/ent | 27.70 ms/ent | **+24.99 ms** | | BASIC | 16 | 2.79 ms/ent | 29.84 ms/ent | **+27.05 ms** | | BASIC | 32 | 2.96 ms/ent | 28.93 ms/ent | **+25.98 ms** | | SHADOW | 8 | 2.87 ms/ent | 29.11 ms/ent | **+26.24 ms** | | SHADOW | 16 | 2.88 ms/ent | 30.25 ms/ent | **+27.36 ms** | | SHADOW | 32 | 2.85 ms/ent | 28.99 ms/ent | **+26.15 ms** | | SYMMETRIC_SHADOWCAST | 8 | 23.99 ms/ent | 48.72 ms/ent | **+24.73 ms** | | SYMMETRIC_SHADOWCAST | 16 | 23.40 ms/ent | 54.20 ms/ent | **+30.79 ms** | | SYMMETRIC_SHADOWCAST | 32 | 24.42 ms/ent | 60.90 ms/ent | **+36.48 ms** | For 100 entities updating per turn this is ~3 seconds/turn on a 1000x1000 grid even with cheap algorithms. ## Suggested optimization Both passes should be bounded by an AABB around the entity sized to `fov_radius`: ```cpp int x0 = std::max(0, x - fov_radius - 1); int y0 = std::max(0, y - fov_radius - 1); int x1 = std::min(grid_w, x + fov_radius + 2); int y1 = std::min(grid_h, y + fov_radius + 2); ``` Then walk only `[y0..y1) x [x0..x1)` for both demote and promote. Cells outside that window cannot transition state for this entity in this tick (they are not in FOV), so leaving them at their stored value is correct. For radius=32 on 1000x1000 this drops the cost from 1M to ~4225 cells per pass -- expected ~250x speedup on the writeback portion. Caveat: with `fov_radius=0` (unlimited) the bounded walk degenerates to the full grid; behavior is unchanged. The current full-grid loop is the right answer when radius is unlimited. ## Validation The bench in this commit can be re-run to confirm the speedup: ``` ./mcrogueface --headless --exec ../tests/benchmarks/fov_opt_bench.py ``` Diff against `tests/benchmarks/baseline/phase5_2/fov_opt_bench.json`. ## Labels system:performance, system:grid, priority:tier2-foundation, Minor Feature, workflow:needs-benchmark
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
john/McRogueFace#316
No description provided.