Sparse perspective writeback in UIEntity::updateVisibility (Phase 5.2 finding) #316

New issue

Open

opened 2026-04-18 10:46:49 +00:00 by john · 0 comments

john commented

2026-04-18 10:46:49 +00:00

Owner

Source: Phase 5.2 benchmark tests/benchmarks/fov_opt_bench.py (Kanboard #37). Baseline at tests/benchmarks/baseline/phase5_2/fov_opt_bench.json.

Finding

UIEntity::updateVisibility() (src/UIEntity.cpp:42) does two full-grid passes per call:

perspective_map->demoteVisible();           // walks all W*H cells
grid->computeFOV(x, y, fov_radius, ...);    // bounded -> O(radius^2-ish)
for (int gy = 0; gy < grid_h; gy++)         // walks all W*H cells again
    for (int gx = 0; gx < grid_w; gx++)
        if (grid->isInFOV(gx, gy)) buf[gy*W+gx] = VISIBLE;

On a 1000x1000 grid, the per-entity update is dominated by the two W*H walks (1M ops each), not the actual TCOD FOV computation:

Algorithm	Radius	grid.compute_fov() only	+ perspective writeback	Writeback overhead
BASIC	8	2.71 ms/ent	27.70 ms/ent	+24.99 ms
BASIC	16	2.79 ms/ent	29.84 ms/ent	+27.05 ms
BASIC	32	2.96 ms/ent	28.93 ms/ent	+25.98 ms
SHADOW	8	2.87 ms/ent	29.11 ms/ent	+26.24 ms
SHADOW	16	2.88 ms/ent	30.25 ms/ent	+27.36 ms
SHADOW	32	2.85 ms/ent	28.99 ms/ent	+26.15 ms
SYMMETRIC_SHADOWCAST	8	23.99 ms/ent	48.72 ms/ent	+24.73 ms
SYMMETRIC_SHADOWCAST	16	23.40 ms/ent	54.20 ms/ent	+30.79 ms
SYMMETRIC_SHADOWCAST	32	24.42 ms/ent	60.90 ms/ent	+36.48 ms

For 100 entities updating per turn this is ~3 seconds/turn on a 1000x1000 grid even with cheap algorithms.

Suggested optimization

Both passes should be bounded by an AABB around the entity sized to fov_radius:

int x0 = std::max(0, x - fov_radius - 1);
int y0 = std::max(0, y - fov_radius - 1);
int x1 = std::min(grid_w, x + fov_radius + 2);
int y1 = std::min(grid_h, y + fov_radius + 2);

Then walk only [y0..y1) x [x0..x1) for both demote and promote. Cells outside that window cannot transition state for this entity in this tick (they are not in FOV), so leaving them at their stored value is correct. For radius=32 on 1000x1000 this drops the cost from 1M to ~4225 cells per pass -- expected ~250x speedup on the writeback portion.

Caveat: with fov_radius=0 (unlimited) the bounded walk degenerates to the full grid; behavior is unchanged. The current full-grid loop is the right answer when radius is unlimited.

Validation

The bench in this commit can be re-run to confirm the speedup:

./mcrogueface --headless --exec ../tests/benchmarks/fov_opt_bench.py

Diff against tests/benchmarks/baseline/phase5_2/fov_opt_bench.json.

Labels

system:performance, system:grid, priority:tier2-foundation, Minor Feature, workflow:needs-benchmark

**Source:** Phase 5.2 benchmark `tests/benchmarks/fov_opt_bench.py` (Kanboard #37). Baseline at `tests/benchmarks/baseline/phase5_2/fov_opt_bench.json`. ## Finding `UIEntity::updateVisibility()` (src/UIEntity.cpp:42) does two full-grid passes per call: ```cpp perspective_map->demoteVisible(); // walks all W*H cells grid->computeFOV(x, y, fov_radius, ...); // bounded -> O(radius^2-ish) for (int gy = 0; gy < grid_h; gy++) // walks all W*H cells again for (int gx = 0; gx < grid_w; gx++) if (grid->isInFOV(gx, gy)) buf[gy*W+gx] = VISIBLE; ``` On a **1000x1000 grid**, the per-entity update is dominated by the two W*H walks (1M ops each), not the actual TCOD FOV computation: | Algorithm | Radius | grid.compute_fov() only | + perspective writeback | Writeback overhead | |-----------|--------|-------------------------|-------------------------|-------------------:| | BASIC | 8 | 2.71 ms/ent | 27.70 ms/ent | **+24.99 ms** | | BASIC | 16 | 2.79 ms/ent | 29.84 ms/ent | **+27.05 ms** | | BASIC | 32 | 2.96 ms/ent | 28.93 ms/ent | **+25.98 ms** | | SHADOW | 8 | 2.87 ms/ent | 29.11 ms/ent | **+26.24 ms** | | SHADOW | 16 | 2.88 ms/ent | 30.25 ms/ent | **+27.36 ms** | | SHADOW | 32 | 2.85 ms/ent | 28.99 ms/ent | **+26.15 ms** | | SYMMETRIC_SHADOWCAST | 8 | 23.99 ms/ent | 48.72 ms/ent | **+24.73 ms** | | SYMMETRIC_SHADOWCAST | 16 | 23.40 ms/ent | 54.20 ms/ent | **+30.79 ms** | | SYMMETRIC_SHADOWCAST | 32 | 24.42 ms/ent | 60.90 ms/ent | **+36.48 ms** | For 100 entities updating per turn this is ~3 seconds/turn on a 1000x1000 grid even with cheap algorithms. ## Suggested optimization Both passes should be bounded by an AABB around the entity sized to `fov_radius`: ```cpp int x0 = std::max(0, x - fov_radius - 1); int y0 = std::max(0, y - fov_radius - 1); int x1 = std::min(grid_w, x + fov_radius + 2); int y1 = std::min(grid_h, y + fov_radius + 2); ``` Then walk only `[y0..y1) x [x0..x1)` for both demote and promote. Cells outside that window cannot transition state for this entity in this tick (they are not in FOV), so leaving them at their stored value is correct. For radius=32 on 1000x1000 this drops the cost from 1M to ~4225 cells per pass -- expected ~250x speedup on the writeback portion. Caveat: with `fov_radius=0` (unlimited) the bounded walk degenerates to the full grid; behavior is unchanged. The current full-grid loop is the right answer when radius is unlimited. ## Validation The bench in this commit can be re-run to confirm the speedup: ``` ./mcrogueface --headless --exec ../tests/benchmarks/fov_opt_bench.py ``` Diff against `tests/benchmarks/baseline/phase5_2/fov_opt_bench.json`. ## Labels system:performance, system:grid, priority:tier2-foundation, Minor Feature, workflow:needs-benchmark