Tracking down performance improvement opportunities #255

Open

opened 2026-03-04 04:32:07 +00:00 by john · 0 comments

john commented

2026-03-04 04:32:07 +00:00

Owner

∴ Thinking…                                           
                                                                                                                                                                             
  Now I have the full picture. Let me compile my analysis.                                                                                                                   
                                                                                                                                                                             
  Summary of per-frame work during walking                                                                                                                                   
                                                                                                                                                                             
  Steady state (during 0.35s walk animation):                                                                                                                              

  Animation system update (C++):
  - 5 active animations: entity.x, entity.y, grid.center_x, grid.center_y, entity.sprite_index
  - Each: lock weak_ptr → interpolate → applyValue → setProperty
  - entity.x setProperty → grid->markDirty() (markContentDirty propagates up)
  - entity.y setProperty → grid->markDirty() (again)
  - entity.sprite_index setProperty → grid->markDirty() (again)
  - grid.center_x setProperty → grid->markDirty() (on grid itself, propagates up)
  - grid.center_y setProperty → grid->markDirty() (on grid itself, propagates up)
  - Total: 5 markDirty calls per frame, propagating up the parent chain each time

  AnimationManager::update also:
  - cleanupPropertyLocks() - iterates unordered_map, removes expired entries
  - processQueue() - iterates animation queue (usually empty)

  Grid render (C++, every frame unconditionally): clears the renderTexture, iterates visible chunks per layer checking if they're dirty and drawing cached textures (chunks stay clean
  during animation since only grid-level gets marked dirty), loops through entities for frustum culling and pixel position calculation to draw them, then outputs the final
  sprite to the parent.

  Key autorepeat (Python): the OS sends PRESSED events periodically at varying rates, each triggering on_key() → move_player() which returns early if the player is already
  moving, keeping the overhead minimal despite involving the Python interpreter.

  Animation boundary frame (every 0.35s): when the entity.y animation completes, it triggers a callback that runs Python's _done_moving() method.

  Next movement cycle: once player.moving becomes False, the next autorepeat allows move_player() to proceed, querying the spatial hash for nearby entities, checking grid
  walkability, then calling player.anim() which does a dictionary lookup for facing direction and creates animation frames through list comprehension before setting up the
  movement animations.

  Now I'm looking at the root cause of those frame rate dips. The grid's render() method doesn't have any optimization to skip rendering when nothing's changed—it clears and
   redraws the renderTexture every frame regardless, including all visible layer chunks and entities, which accounts for the baseline 58/59 fps even when the player is
  standing still.

  The visible lag spikes happen every 0.35 seconds when animations complete and restart, because that single frame has to handle 5 animation completions with Python
  callbacks, execute the player movement logic with dictionary lookups and list comprehensions, create 5 new Animation objects with memory allocation, call
  entities_in_radius(), and access walkable properties—all at once.

  There's also a potential issue with markDirty() being called 5 times per frame during animation, where each call propagates up through the parent chain checking if the
  object was already marked dirty. The first call propagates. On the second call, was_dirty is already true from the first invocation, so the condition becomes false and propagation stops; the
  short-circuit works as intended. For the game side, I should recommend consolidating the entity animations into a single Vector2f instead of separate x and y calls,
  avoiding redundant grid camera animations by calculating the center position during rendering, and potentially pre-warming the animation system to reduce object creation
  overhead.

  On the engine side, I could add an early-out check in the grid render to skip full rendering when nothing's dirty and just redraw the cached texture. Rather than marking
  content dirty for entity position changes, I should use a separate composite-dirty flag since position updates don't invalidate the layer chunks themselves. A
  camera-follow mode would eliminate the need for separate center_x and center_y animations entirely. Though looking at this more carefully, the grid actually re-renders
  unconditionally regardless of dirty flags, so the optimization strategy needs to account for that.

  The real issue seems to be animation boundary frames when the player moves. If standing still hits 58-59fps but walking causes spikes, then consolidating animations would
  help—combining entity.x and entity.y into a single vector animation, and doing the same for camera center, would reduce from five animations down to three. Using a
  dedicated center_camera() method instead of animating the camera position separately would be more efficient.

  On the engine side, the grid's markDirty() call is too broad—it marks the entire texture for rebuild even though entity positions don't actually affect the layer chunk
  cache. A more granular dirty flag would help. The real culprit might be the renderTexture.clear() and display() calls forcing GPU flushes every frame, which could be
  causing the performance hiccup on certain hardware.

● Now I have a complete picture. Here's my analysis:    

  ---                                                                                                                                                                        
  Per-Frame Cost During Walking
                                                                                                                                                                             
  When holding WASD, here's what runs every frame during the 0.35s walk animation:                                                                                         
                                                                                                                                                                           
  C++ Animation System (per frame)

  5 concurrent animations update: entity.x, entity.y, grid.center_x, grid.center_y, entity.sprite_index. Each one:
  1. Locks weak_ptr, interpolates, calls setProperty()
  2. setProperty() calls grid->markDirty() → markContentDirty() → propagates to parent

  That's 5 markDirty calls per frame (though after the first, subsequent ones short-circuit parent propagation).
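
The short-circuit noted above can be modeled in a few lines. This is a Python sketch of the idea only, not the engine's C++ code: the `Node` class and its `propagations` counter are hypothetical names, and it assumes a node forwards to its parent only when its own flag flips from clean to dirty.

```python
class Node:
    """Hypothetical stand-in for a UI element with a parent chain."""

    def __init__(self, parent=None):
        self.parent = parent
        self.dirty = False
        self.propagations = 0  # times this node forwarded to its parent

    def mark_dirty(self):
        was_dirty = self.dirty
        self.dirty = True
        # Forward only when the flag actually changed; repeat calls stop here.
        if not was_dirty and self.parent is not None:
            self.propagations += 1
            self.parent.mark_dirty()


root = Node()
frame = Node(parent=root)
grid = Node(parent=frame)

# Five property writes in one frame, as in the walk animation.
for _ in range(5):
    grid.mark_dirty()

print(grid.propagations, frame.propagations, root.dirty)
```

Under that assumption only the first of the five writes walks the parent chain, so the repeated calls are unlikely to be the main per-frame cost.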

  C++ Grid Render (per frame, unconditionally)

  The grid's render() at UIGrid.cpp:140 has no early-out for clean state. Every frame it:
  1. Clears renderTexture
  2. Iterates visible layer chunks → checks chunk_dirty → draws cached textures (chunks are NOT dirty during animation, just the grid composite)
  3. Iterates all entities (just 1), frustum culls, calculates pixel position, draws
  4. Draws output sprite to screen

  Python (per key autorepeat event)

  on_key() → move_player() → if player.moving: return  # ~immediate
  Very cheap, but still enters the Python interpreter each time.

  ---
  The Spike: Animation Boundary Frame

  Every 0.35 seconds, all 5 animations complete and 5 new ones start. This single frame does significantly more work:

  1. 5 animation completions (including Python callback _done_moving)
  2. Python move_player() fully executes: entities_in_radius() query, grid[x,y].walkable check
  3. player.anim("walk", ...): dict lookup, list comprehension [i + 29*self.facing_dir for i in frames], 3 calls to self.animate()
  4. 2 grid.animate() calls
  5. 5 new Animation objects allocated in C++

  This is the frame that spikes.
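
One way to check that the slow frame really lines up with the 0.35s animation boundary is to log frame timestamps and look at the spacing between slow frames. The helper below is a sketch: `find_spikes` and its `factor` threshold are assumptions for illustration, and real timestamps would have to come from whatever per-frame hook the game already has. The data here is synthetic.

```python
from statistics import median


def find_spikes(timestamps, factor=1.5):
    """Return (time, duration) for frames slower than factor * median duration."""
    durations = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not durations:
        return []
    limit = factor * median(durations)
    return [(t, d) for t, d in zip(timestamps[1:], durations) if d > limit]


# Synthetic data: 16 ms frames with one 30 ms frame every 21st frame.
stamps = [0.0]
for i in range(1, 64):
    stamps.append(stamps[-1] + (0.030 if i % 21 == 0 else 0.016))

spikes = find_spikes(stamps)
gaps = [b[0] - a[0] for a, b in zip(spikes, spikes[1:])]
print(len(spikes), [round(g, 3) for g in gaps])
```

If the gaps measured in the real game cluster near 0.35s, that supports the boundary-frame explanation. If they do not, the spike has another source.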

  ---
  Recommendations

  Game-side (quick wins)

  1. Combine entity position animations into one callback frame

  Currently gamemap.py:132-134:
  if player.anim("walk", delta=delta, do_move=True):
      grid.animate("center_x", player.x+(delta[0]*16), duration=0.35)
      grid.animate("center_y", player.y+(delta[1]*16), duration=0.35)

  And shade_humanoid.py:525-528:
  self.animate('x', float(delta[0]), duration, delta=True)
  self.animate('y', float(delta[1]), duration, delta=True, callback=self._done_moving)

  The callback is on y only. If the player only moves in X (not Y), the y animation still runs (animating 0.0 delta) just to carry the callback. Consider putting the
  callback on whichever axis actually moves, or using a Timer instead.
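
A possible shape for "callback on whichever axis actually moves" is sketched below. `plan_move_animations` is a hypothetical helper, not part of the game or engine. It only builds the arguments, in the same order as the `self.animate(...)` calls quoted above, so it can be tried without the engine.

```python
def plan_move_animations(delta, duration, callback):
    """Build animate() arguments for the axes that actually move.

    Returns a list of (property, amount, duration, kwargs) entries; the
    completion callback is attached to the last entry only.
    """
    axes = [(name, float(d)) for name, d in zip(("x", "y"), delta) if d != 0]
    plan = []
    for i, (name, amount) in enumerate(axes):
        kwargs = {"delta": True}
        if i == len(axes) - 1:
            kwargs["callback"] = callback
        plan.append((name, amount, duration, kwargs))
    return plan


def done_moving(*args):
    """Placeholder for the real completion callback."""


# A step to the right needs one position animation instead of two.
plan = plan_move_animations((1, 0), 0.35, done_moving)
print(plan)
```

Inside the entity class this would be consumed as `for prop, amount, dur, kw in plan: self.animate(prop, amount, dur, **kw)`. A zero delta on both axes yields an empty plan, so the callback would never fire; the caller has to handle that case itself.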

  2. Use grid.center_camera() in the callback instead of parallel center animations

  Instead of running 2 extra animations alongside entity movement, snap the camera at the end:
  def _done_moving(self, *args):
      self.moving = False
      # Let the game's post-move hook handle camera
  And in gamemap, tie camera position to entity position via the callback:
  def on_move_complete(target, prop, value):
      grid.center = (player.x, player.y)  # or center_camera()
      player.moving = False

  # In Humanoid.anim():
  self.animate('y', float(delta[1]), duration, delta=True, callback=on_move_complete)
  This eliminates 2 animations (center_x, center_y), dropping from 5 to 3 per step. The camera will jump to final position on completion instead of animating, but at zoom
  2.0 with 0.35s steps the entity animation itself provides visual continuity.

  Or, if smooth camera is important, animate camera independently with a single x,y call and use the delta=True pattern just like the entity.

  Engine-side (root causes)

  3. Entity setProperty should use markCompositeDirty() not markContentDirty()

  At UIEntity.cpp:1130,1136,1141,1164:
  if (grid) grid->markDirty();  // calls markContentDirty()

  Entity position and sprite_index changes don't invalidate layer chunk caches. They only require the grid to re-composite its renderTexture. Using markCompositeDirty()
  would be more accurate and avoids propagating "content changed" up the tree unnecessarily.
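
The distinction between the two flags can be illustrated with a small model. This is a Python sketch under stated assumptions, not the engine's implementation: `GridModel` and its methods are hypothetical, and it assumes "content dirty" means a chunk's cached texture must be rebuilt while "composite dirty" means only the final image must be redrawn from existing caches.

```python
class GridModel:
    """Hypothetical model of a grid with cached layer chunks."""

    def __init__(self, chunk_count):
        self.chunk_dirty = [False] * chunk_count
        self.composite_dirty = False

    def mark_content_dirty(self, chunk_index):
        # A tile changed: that chunk's cache is stale, and the final
        # composite has to be redrawn as well.
        self.chunk_dirty[chunk_index] = True
        self.composite_dirty = True

    def mark_composite_dirty(self):
        # An entity moved or changed sprite: chunk caches stay valid.
        self.composite_dirty = True

    def chunks_to_rebuild(self):
        return sum(self.chunk_dirty)


grid = GridModel(chunk_count=4)
for _ in range(21):  # one walk step of entity position updates
    grid.mark_composite_dirty()
after_walk = (grid.composite_dirty, grid.chunks_to_rebuild())

grid.mark_content_dirty(2)  # a tile edit invalidates exactly one chunk
print(after_walk, grid.chunks_to_rebuild())
```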

  4. Grid render should have a dirty early-out when nothing changed

  UIGrid::render() unconditionally clears and re-renders its renderTexture every frame, even when standing still. Adding:
  if (!render_dirty && !composite_dirty) {
      // Just redraw cached output sprite
      output.setPosition(box.getPosition() + offset);
      target.draw(output);
      return;
  }
  This wouldn't help during animation (dirty every frame), but would eliminate ~1-2ms overhead when idle, likely recovering those missing fps to hit 60.
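
The effect of that early-out can be estimated with a toy model before touching the engine. The sketch below is written in Python so it runs in isolation. `CachedGrid` and its counters are hypothetical, the flag names mirror the C++ snippet above, and 21 frames per step assumes 60 fps with 0.35s steps.

```python
class CachedGrid:
    """Hypothetical grid that keeps its last composited frame."""

    def __init__(self):
        self.render_dirty = True  # nothing has been drawn yet
        self.composite_dirty = False
        self.full_renders = 0
        self.cached_draws = 0

    def render(self):
        if not self.render_dirty and not self.composite_dirty:
            # Early-out: reuse the texture from the previous frame.
            self.cached_draws += 1
            return
        # Full path: clear, draw chunks and entities, then present.
        self.full_renders += 1
        self.render_dirty = False
        self.composite_dirty = False


grid = CachedGrid()
for _ in range(60):  # one second standing still
    grid.render()
idle = (grid.full_renders, grid.cached_draws)

for _ in range(21):  # one walk step
    grid.composite_dirty = True  # set by the animation system each frame
    grid.render()

print(idle, grid.full_renders, grid.cached_draws)
```

In this model the idle second costs one full render and 59 cached draws, while every walking frame still takes the full path. That matches the expectation that the early-out helps the idle baseline but not the animation frames.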

  5. Consider a "camera follow" entity feature

  A grid.follow_entity = entity mode that automatically sets center_x/center_y from the entity's position each frame would eliminate the need for separate camera animations
  entirely.
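
A follow mode along those lines might look like the sketch below. Everything here is hypothetical: `FollowGrid`, `update_camera`, and the `scale` parameter are illustration names, and the conversion between entity coordinates and camera coordinates is an assumption that would need to match the engine's real units.

```python
class Entity:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class FollowGrid:
    """Hypothetical grid whose camera can track an entity."""

    def __init__(self, scale=1.0):
        self.scale = scale  # assumed entity-to-camera unit conversion
        self.center_x = 0.0
        self.center_y = 0.0
        self.follow_entity = None

    def update_camera(self):
        # Intended to run once per frame, before rendering.
        target = self.follow_entity
        if target is not None:
            self.center_x = target.x * self.scale
            self.center_y = target.y * self.scale


player = Entity(3.0, 5.0)
grid = FollowGrid()
grid.follow_entity = player

steps = 21
for i in range(1, steps + 1):
    player.x = 3.0 + i / steps  # stand-in for the entity's own x animation
    grid.update_camera()

print(grid.center_x, grid.center_y)
```

The camera then moves smoothly as a side effect of the entity animation, so the two `grid.animate()` calls per step are no longer needed.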

  ---
  TL;DR

  The spike is the animation boundary frame (every 0.35s) where 5 animations complete + 5 restart + Python callback + move_player() fully executes. Reduce from 5 to 3
  animations by eliminating separate camera animations. The baseline 58/59fps is from the grid unconditionally re-rendering to its renderTexture every frame -- an
  engine-side dirty-check early-out would fix that.

markCompositeDirty() and an early direct-render escape hatch for when all caches are valid have been added. Tracking down the remaining grid performance cost while animating entities and camera position is still on the to-do list.
