[Bugfix] PythonObjectCache::lookup() reads hash map without mutex #269

Closed
opened 2026-03-07 23:21:03 +00:00 by john · 0 comments
Owner

Summary

PythonObjectCache::lookup() reads the std::unordered_map<uint64_t, PyObject*> cache without acquiring the serial_mutex, while registerObject(), remove(), cleanup(), and clear() all mutate the map under the mutex. If any of those methods run concurrently with lookup(), the hash map read can race with a write, causing undefined behavior (iterator invalidation during rehash, corrupted bucket chains, etc.).

Root Cause

PythonObjectCache.cpp:37-52:

PyObject* PythonObjectCache::lookup(uint64_t serial) {
    if (serial == 0) return nullptr;

    // No mutex needed for read - GIL protects PyWeakref_GetRef
    auto it = cache.find(serial);    // UNSAFE: concurrent modification possible
    if (it != cache.end()) {
        PyObject* obj = nullptr;
        int result = PyWeakref_GetRef(it->second, &obj);
        ...
    }
    return nullptr;
}

The comment says "No mutex needed for read - GIL protects PyWeakref_GetRef" — but the GIL protects the PyWeakref call, not the std::unordered_map::find(). The GIL does happen to serialize most Python-level operations, but:

  1. assignSerial() uses std::atomic suggesting thread-safety was intended
  2. Any C++ code running outside the GIL (e.g., render thread, timer callbacks) that touches the cache would race
  3. registerObject() and remove() take the mutex, implying thread-safety is expected

Impact

In the current single-threaded architecture, the GIL likely prevents actual races. But:

  • The code is inconsistent: some methods lock, others don't
  • If McRogueFace ever uses background threads (audio, rendering), this becomes a crash
  • rehash during cache[serial] = weakref in registerObject() would invalidate the iterator in a concurrent lookup()

Fix

Add mutex acquisition to lookup():

PyObject* PythonObjectCache::lookup(uint64_t serial) {
    if (serial == 0) return nullptr;

    std::lock_guard<std::mutex> lock(serial_mutex);
    auto it = cache.find(serial);
    if (it != cache.end()) {
        PyObject* obj = nullptr;
        int result = PyWeakref_GetRef(it->second, &obj);
        if (result == 1 && obj) {
            return obj;
        }
    }
    return nullptr;
}

Or, if performance is a concern, use a std::shared_mutex with shared_lock for reads.

Severity

Medium — currently protected by GIL in practice, but the code contract is inconsistent and any future threading would expose a data race.

## Summary `PythonObjectCache::lookup()` reads the `std::unordered_map<uint64_t, PyObject*> cache` without acquiring the `serial_mutex`, while `registerObject()`, `remove()`, `cleanup()`, and `clear()` all mutate the map under the mutex. If any of those methods run concurrently with `lookup()`, the hash map read can race with a write, causing undefined behavior (iterator invalidation during rehash, corrupted bucket chains, etc.). ## Root Cause `PythonObjectCache.cpp:37-52`: ```cpp PyObject* PythonObjectCache::lookup(uint64_t serial) { if (serial == 0) return nullptr; // No mutex needed for read - GIL protects PyWeakref_GetRef auto it = cache.find(serial); // UNSAFE: concurrent modification possible if (it != cache.end()) { PyObject* obj = nullptr; int result = PyWeakref_GetRef(it->second, &obj); ... } return nullptr; } ``` The comment says "No mutex needed for read - GIL protects PyWeakref_GetRef" — but the GIL protects the _PyWeakref_ call, not the `std::unordered_map::find()`. The GIL does happen to serialize most Python-level operations, but: 1. `assignSerial()` uses `std::atomic` suggesting thread-safety was intended 2. Any C++ code running outside the GIL (e.g., render thread, timer callbacks) that touches the cache would race 3. `registerObject()` and `remove()` take the mutex, implying thread-safety is expected ## Impact In the current single-threaded architecture, the GIL likely prevents actual races. But: - The code is inconsistent: some methods lock, others don't - If McRogueFace ever uses background threads (audio, rendering), this becomes a crash - `rehash` during `cache[serial] = weakref` in `registerObject()` would invalidate the iterator in a concurrent `lookup()` ## Fix Add mutex acquisition to `lookup()`: ```cpp PyObject* PythonObjectCache::lookup(uint64_t serial) { if (serial == 0) return nullptr; std::lock_guard<std::mutex> lock(serial_mutex); auto it = cache.find(serial); if (it != cache.end()) { PyObject* obj = nullptr; int result = PyWeakref_GetRef(it->second, &obj); if (result == 1 && obj) { return obj; } } return nullptr; } ``` Or, if performance is a concern, use a `std::shared_mutex` with `shared_lock` for reads. ## Severity **Medium** — currently protected by GIL in practice, but the code contract is inconsistent and any future threading would expose a data race.
john closed this issue 2026-03-14 06:25:16 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
john/McRogueFace#269
No description provided.