-
Notifications
You must be signed in to change notification settings - Fork 2.3k
WIP: internals leak analysis and fixes #5961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Destruct internals during interpreter finalization to fix PyTypeObject leaks.
During subinterpreter destruction, the internals object stored in a capsule in the interpreter's state dict is not freed. The capsule destructor (internals_shutdown) only runs at interpreter shutdown, not when individual subinterpreters are destroyed, causing a memory leak of ~1.7 MB per subinterpreter cycle. This commit adds a reset() method to internals_pp_manager that explicitly resets the unique_ptr in the capsule before Py_EndInterpreter is called, ensuring the internals object is freed when the subinterpreter is destroyed. Changes: - Added reset() method to internals_pp_manager that acquires the GIL and resets the unique_ptr from the state dict - Updated destroy() to also reset the unique_ptr before deleting the cached pointer-to-pointer - Modified subinterpreter destructor to call reset() before Py_EndInterpreter Note: Initial testing shows the leak persists, suggesting the issue may be more complex. Further investigation with heaptrack is needed to identify the root cause.
- Added Py_CLEAR(instance_base) to internals destructor to ensure it's properly decref'd when internals is destroyed - Modified reset() to directly access the capsule from the state dict instead of using cached pointer, ensuring we reset the correct unique_ptr Note: Leak investigation continues - these changes don't fully resolve the ~1.7 MB per iteration leak yet.
The patients container stores PyObject* pointers that are INCREF'd when added (see add_patient in class.h). These need to be DECREF'd when the internals object is destroyed to prevent leaks. Note: This doesn't fully resolve the leak yet - investigation continues.
Check that the unique_ptr actually contains an object before resetting it to avoid unnecessary operations on null pointers.
Clarify that when called from reset() before Py_EndInterpreter, the interpreter is definitely alive, so cleanup should always happen.
Wrap get_python_state_dict() call in try-catch to handle potential exceptions gracefully. If we can't get the state dict or capsule, silently continue as the capsule destructor will clean up eventually.
Introduce a non-creating internals lookup and use it in pybind11_meta_dealloc so teardown paths don’t accidentally rebuild internals and their static types during Py_EndInterpreter. This prevents with_internals from triggering make_static_property_type while the interpreter is finalizing, which heaptrack showed as leaked allocations. Changes: - add is_interpreter_finalizing() helper - add internals_pp_manager::get_pp_if_exists() and a state-dict lookup helper that does not create capsules - switch reset() to use the non-creating lookup - update pybind11_meta_dealloc to operate only when internals exist, avoiding implicit internals creation during GC/finalization This reduces the leaked allocations tied to pybind11_meta_dealloc and keeps teardown from reinitializing internals after they were reset.
Summary @ commit 51a8187
What heaptrack shows now
Root-cause direction: obmalloc arenasControlled experiments with
This strongly suggests per-interpreter obmalloc arenas are not freed on Next steps I can take
|
|
@b-pass @oremanj @Skylion007 Please let me know any feedback. If you have ideas for how to direct Cursor, please post them here, I'll try them out. AFAICT Claude 4.5 Opus got stuck kind-of, therefore I switched to GPT-5.2 Codex Extra High. At first sight it looks like it got a lot deeper, but I didn't look in any depth myself. If we can trust the Summary @ commit 51a8187 above, and IIUC, the remaining problems appear to be more likely in CPython than in pybind11. But TBH, I don't know anything about |
|
This is meant to help jump-start work in other agents: pr5961_context_2026-01-17+2010.md It references a bunch of files I have on my workstation. Please let me know if you think any of those might help. |
Description
Currently all changes are Cursor-generated and unreviewed
The first commit is PR #5958 squashed: 0da9df0
Cursor-driven changes (oldest@top to newest@bottom)
Suggested changelog entry: