app queue devicestate high water mark issue 1972 #1973
Attention! This pull request may contain issues that could prevent it from being accepted. Please review the checklist below and take the recommended action. If you believe any of these are not applicable, just add a comment and let us know.
- The PR title does not match the commit title. This can cause confusion for reviewers and future maintainers. GitHub doesn't automatically update the PR title when you update the commit message so if you've updated the commit with a force-push, please update the PR title to match the new commit message body.
- The PR description does not match the commit message body. This can cause confusion for reviewers and future maintainers. GitHub doesn't automatically update the PR description when you update the commit message so if you've updated the commit with a force-push, please update the PR description to match the new commit message body.
Workflow Check failed
Additionally, if AI was used in the creation of patches or issues, its usage must be disclosed.
device_state_cb() iterated every queue and every member for each device state message on the devicestate:all topic. Reloading a queue with thousands of members floods that single subscription's taskprocessor (stasis/m:devicestate:all) with the per-member pause/avail hints app_queue publishes, tripping the 500 high water mark and raising the global taskprocessor congestion alert.

Maintain a reference-counted index of the device-state identifiers that queue members actually watch (via state_interface) and consult it at the top of device_state_cb(). Device states no member watches are dropped in O(1) instead of triggering an O(queues * members) scan, and the Queue:..._avail hints the callback republishes no longer re-enter it. Behavior for watched devices is unchanged.

Also fix a race in rt_handle_member_record(): when a realtime reload changes a member's state_interface, start watching the new device before storing it on the member (and before unwatching the old). Previously the member was pointed at the new interface first and only then added to the watcher set, leaving a brief window where m->state_interface referred to a device not yet watched. Watching before publishing closes the window: any device_state_cb() that passes the watch check then serializes on the queue lock and observes the committed state_interface.

Resolves: asterisk#1972

UserNote: app_queue now handles device-state changes efficiently when reloading queues with large member counts, avoiding a flood of the stasis/m:devicestate:all taskprocessor past its high water mark.

Co-authored-by: Thomas <1258170+ThomasSevestre@users.noreply.github.com>
0db05ad to f71e605
cherry-pick-to: 20
AI usage disclosure
This contribution was prepared with AI assistance (Claude Code / Anthropic Claude).
I have reviewed the change in full and understand it, and I can debug and own it.
Workflow Check completed successfully
app_queue.c: Index member device states to avoid scanning on every event
device_state_cb() iterated every queue and every member for each device
state message on the devicestate:all topic. Reloading a queue with
thousands of members floods that single subscription's taskprocessor
(stasis/m:devicestate:all) with the per-member pause/avail hints
app_queue publishes, tripping the 500 high water mark and raising the
global taskprocessor congestion alert.
Maintain a reference-counted index of the device-state identifiers that
queue members actually watch (via state_interface) and consult it at the
top of device_state_cb(). Device states no member watches are dropped in
O(1) instead of triggering an O(queues * members) scan, and the
Queue:..._avail hints the callback republishes no longer re-enter it.
Behavior for watched devices is unchanged.
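
To illustrate the idea of a reference-counted watch index, here is a minimal standalone sketch in plain C. It is not the code from this patch: the names `watch_index`, `watch_add`, `watch_remove`, and `watch_contains` are hypothetical, the hash table is a simple chained one chosen for brevity, and the locking that a real multi-threaded module would need is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WATCH_BUCKETS 257

/* One entry per distinct device name; refs counts the members watching it. */
struct watch_entry {
    struct watch_entry *next;
    unsigned int refs;
    char device[];              /* NUL-terminated device name (C99 flexible array) */
};

struct watch_index {
    struct watch_entry *buckets[WATCH_BUCKETS];
};

/* djb2-style string hash reduced to a bucket number. */
static unsigned int watch_hash(const char *s)
{
    unsigned int h = 5381;

    while (*s) {
        h = h * 33 + (unsigned char) *s++;
    }
    return h % WATCH_BUCKETS;
}

/* Take one reference on a device. Returns 0 on success, -1 if allocation fails. */
static int watch_add(struct watch_index *idx, const char *device)
{
    unsigned int b = watch_hash(device);
    struct watch_entry *e;
    size_t len;

    for (e = idx->buckets[b]; e; e = e->next) {
        if (!strcmp(e->device, device)) {
            e->refs++;          /* already watched by another member */
            return 0;
        }
    }

    len = strlen(device) + 1;
    e = malloc(sizeof(*e) + len);
    if (!e) {
        return -1;
    }
    memcpy(e->device, device, len);
    e->refs = 1;
    e->next = idx->buckets[b];
    idx->buckets[b] = e;
    return 0;
}

/* Drop one reference; the entry is freed when the last watcher goes away. */
static void watch_remove(struct watch_index *idx, const char *device)
{
    struct watch_entry **pp = &idx->buckets[watch_hash(device)];

    for (; *pp; pp = &(*pp)->next) {
        if (!strcmp((*pp)->device, device)) {
            struct watch_entry *e = *pp;

            if (--e->refs == 0) {
                *pp = e->next;
                free(e);
            }
            return;
        }
    }
}

/* Lookup that a callback could run first; average O(1) with a reasonable hash. */
static int watch_contains(const struct watch_index *idx, const char *device)
{
    const struct watch_entry *e;

    for (e = idx->buckets[watch_hash(device)]; e; e = e->next) {
        if (!strcmp(e->device, device)) {
            return 1;
        }
    }
    return 0;
}
```

In this sketch a callback would call `watch_contains()` with the device name from the incoming message and return immediately on 0, so only watched devices reach the per-queue, per-member scan. The reference count is what lets several members share one `state_interface` without the first removal unwatching the device for the others.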
Also fix a race in rt_handle_member_record(): when a realtime reload
changes a member's state_interface, start watching the new device before
storing it on the member (and before unwatching the old). Previously the
member was pointed at the new interface first and only then added to the
watcher set, leaving a brief window where m->state_interface referred to
a device not yet watched. Watching before publishing closes the window:
any device_state_cb() that passes the watch check then serializes on the
queue lock and observes the committed state_interface.
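
The ordering described above (watch the new device, then store it on the member, then unwatch the old one) can be sketched as follows. This is a single-threaded model of the ordering only, not the patch itself: `watch_set`, `watch`, `unwatch`, `is_watched`, and `member_change_interface` are hypothetical names, the fixed-size table stands in for whatever container the real code uses, device names are assumed to fit in `NAME_LEN`, and the queue lock mentioned in the commit message is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WATCHED 16
#define NAME_LEN 80

/* Tiny reference-counted watch set; a slot is free when refs is 0. */
struct watch_set {
    char device[MAX_WATCHED][NAME_LEN];
    unsigned int refs[MAX_WATCHED];
};

struct member {
    char state_interface[NAME_LEN];
};

/* Index of the slot holding this device, or -1 if it is not watched. */
static int find_slot(const struct watch_set *ws, const char *device)
{
    int i;

    for (i = 0; i < MAX_WATCHED; i++) {
        if (ws->refs[i] && !strcmp(ws->device[i], device)) {
            return i;
        }
    }
    return -1;
}

/* Take one reference. Returns 0 on success, -1 if the table is full. */
static int watch(struct watch_set *ws, const char *device)
{
    int i = find_slot(ws, device);

    if (i >= 0) {
        ws->refs[i]++;
        return 0;
    }
    for (i = 0; i < MAX_WATCHED; i++) {
        if (!ws->refs[i]) {
            snprintf(ws->device[i], NAME_LEN, "%s", device);
            ws->refs[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Drop one reference; the slot becomes free when refs reaches 0. */
static void unwatch(struct watch_set *ws, const char *device)
{
    int i = find_slot(ws, device);

    if (i >= 0) {
        ws->refs[i]--;
    }
}

static int is_watched(const struct watch_set *ws, const char *device)
{
    return find_slot(ws, device) >= 0;
}

/*
 * Change a member's interface without ever leaving it pointing at an
 * unwatched device. On failure the member is left untouched.
 */
static int member_change_interface(struct watch_set *ws, struct member *m,
    const char *new_iface)
{
    char old[NAME_LEN];

    /* Step 1: watch the new device before anyone can see it on the member. */
    if (watch(ws, new_iface)) {
        return -1;
    }

    /* Step 2: publish the new interface on the member. */
    snprintf(old, sizeof(old), "%s", m->state_interface);
    snprintf(m->state_interface, NAME_LEN, "%s", new_iface);
    assert(is_watched(ws, m->state_interface));

    /* Step 3: only now drop the reference on the old device. */
    if (old[0]) {
        unwatch(ws, old);
    }
    return 0;
}
```

Because the new reference is taken before the old one is dropped, the sequence also behaves correctly when the new and old interfaces are the same string: the count goes from 1 to 2 and back to 1, and the device is never momentarily unwatched.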
Resolves: #1972
UserNote: app_queue now handles device-state changes efficiently when
reloading queues with large member counts, avoiding a flood of the
stasis/m:devicestate:all taskprocessor past its high water mark.