Timer fires with corrupted callback after Python object is garbage collected #251
Labels
No labels
Alpha Release Requirement
Bugfix
Demo Target
Documentation
Major Feature
Minor Feature
priority:tier1-active
priority:tier2-foundation
priority:tier3-future
priority:tier4-deferred
Refactoring & Cleanup
system:animation
system:documentation
system:grid
system:input
system:performance
system:procgen
system:python-binding
system:rendering
system:ui-hierarchy
Tiny Feature
workflow:blocked
workflow:needs-benchmark
workflow:needs-documentation
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
john/McRogueFace#251
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
When a
mcrfpy.Timeris created without storing the returned Python object, Python's garbage collector destroys it. However, the C++ timer system still holds a reference and continues to fire the callback — resulting in wrong argument counts and eventual segfaults.This is a footgun because the natural pattern
mcrfpy.Timer("name", callback, 100)(withoutt = ...) silently breaks.Minimal Reproduction
Actual output:
Repeated firings eventually segfault.
Expected behavior (option A — prevent GC):
Timer stays alive as long as it's active. The C++ timer system prevents Python GC by holding a strong reference to the Python Timer object.
Expected behavior (option B — clean teardown):
Timer is automatically cancelled/removed from the C++ timer system when the Python object is GC'd (via
tp_dealloc).Workaround
Store the Timer reference so it isn't garbage collected:
Analysis
The C++ timer system registers the timer and holds a raw reference to the Python callback. When Python GC destroys the Timer object (because no Python references remain), the C++ side still fires on schedule. The callback's Python function reference is stale/corrupted, causing the wrong number of arguments to be passed and eventually a segfault.
Option A (prevent GC) is probably what users expect — creating a Timer should keep it alive until
stop()is called or the scene changes. This could be implemented by having the C++ timer registry hold aPy_INCREF'd reference to the Python Timer object, released onstop()or scene teardown.Option B (clean teardown) is the minimal safety fix —
tp_deallocshould unregister the timer from the C++ system so it never fires a dead callback.Both options could be combined: prevent GC while active (A), and as a safety net, unregister on dealloc (B).
Root Cause Found:
tp_deallocunconditionally destroys callbacks on ALL drawable typesThe Timer bug reported here is one instance of a systemic issue across every UIDrawable type (Frame, Caption, Sprite, Grid, Circle, Arc, Line, Viewport3D).
The Mechanism
When you do
frame.on_click = callback:PyCallableconstructor callsPy_XNewRef(callback)— properly increments refcount ✅unique_ptr<PyClickCallable>on the C++UIDrawable✅shared_ptr✅But when the Python wrapper goes out of scope:
tp_deallocfires on the wrapperclick_unregister(),on_enter_unregister(), etc.unique_ptr, which callsPy_DECREFon the callbackThe comment in the code even said "Clear Python references to break cycles" — this is
tp_clearlogic that was copy-pasted intotp_dealloc, where it's premature destruction.The Fix
Guard callback unregistration on
shared_ptr::use_count():Applied to all 8 drawable types: UIFrame.h, UICaption.h, UISprite.h, UIGrid.h, UICircle.h, UIArc.h, UILine.h, Viewport3D.h.
tp_clear(cycle breaker) left as-is since that's specifically called by the cyclic GC to break reference cycles.Regression test:
tests/regression/issue_251_callback_gc_test.py— 6 tests covering Frame, Caption, Sprite, multi-callback survival, standalone cleanup.Affected Files
src/UIFrame.h(tp_dealloc)src/UICaption.h(tp_dealloc)src/UISprite.h(tp_dealloc)src/UIGrid.h(tp_dealloc + cell callbacks)src/UICircle.h(tp_dealloc)src/UIArc.h(tp_dealloc)src/UILine.h(tp_dealloc)src/3d/Viewport3D.h(tp_dealloc)Lessons Learned: Python Patterns That Were Silently Broken
This bug made perfectly reasonable Python patterns silently fail. Users would write natural code, see no error, and wonder why nothing happened. These all worked fine before this commit — they just silently lost their callbacks:
1. Factory functions returning UI trees
2. List comprehension UI building
3. Scene setup in a function
4. Temporary variable reuse
5. The "it works if you store it" mystery
Why it wasn't caught earlier
_and historyNote on Timer (original report)
The Timer issue is a related but slightly different variant:
mcrfpy.Timer()stores the callback in the C++ timer system, but the Python Timer wrapper is GC'd. The fix for Timers should follow Option A from the original report (C++ timer system holdsPy_INCREF'd reference to prevent GC while active). The drawable callback fix here addresses the broader class of the same problem for all UI elements.Timer GC Fix Implemented (Option A)
Active timers now hold a strong reference (
Py_INCREF) to their Python wrapper object, preventing garbage collection while the timer is registered in the engine.Mechanism
Added
PyObject* py_wrapperto the C++Timerclass withretainPyWrapper()/releasePyWrapper()methods:init(start=True),start(),restart(),set_active(True)stop(), one-shot fires,~Timer()destructorState invariant
py_wrapperis non-null if and only if the timer is active in the engine's timer map.Edge cases handled
releasePyWrapper()called alongsidestopped = trueinTimer::test().retainPyWrapper()checksif (py_wrapper == wrapper) returnto avoid double-INCREF.releasePyWrapper()as safety net.Files modified
src/Timer.h— addedpy_wrapper,retainPyWrapper(),releasePyWrapper()src/Timer.cpp— implemented retain/release, added tostop(), one-shot path, destructorsrc/PyTimer.cpp— callsretainPyWrapper()ininit(),start(),restart(),set_active()Patterns that now work
Test
tests/regression/issue_251_timer_gc_test.py— 6 tests covering unstored timers, correct callback args, one-shot timers, stop-from-callback safety, and restart behavior.Combined with the drawable callback fix from the previous comment, this fully resolves issue #251 for both UI callbacks and timers.