
app queue devicestate high water mark issue 1972#1973

Open
PujaGediya wants to merge 1 commit into
asterisk:masterfrom
PujaGediya:app_queue-devicestate-high-water-mark-issue-1972



@PujaGediya PujaGediya commented Jun 4, 2026


app_queue.c: Index member device states to avoid scanning on every event

device_state_cb() iterated every queue and every member for each device
state message on the devicestate:all topic. Reloading a queue with
thousands of members floods that single subscription's taskprocessor
(stasis/m:devicestate:all) with the per-member pause/avail hints
app_queue publishes, tripping its high water mark of 500 and raising the
global taskprocessor congestion alert.
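To make the scaling concrete, here is a toy model of the pre-patch loop. It is not Asterisk code: `struct member`, `struct queue`, `naive_device_state_cb()` and `reload_cost()` are hypothetical stand-ins, and the assumption that a reload produces exactly one device-state event per member is made only for illustration. The point is that every event walks every member of every queue, so the work for one reload grows with (events) x (members).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for app_queue's queue and member structures. */
struct member {
	const char *state_interface; /* device whose state the member follows */
};

struct queue {
	const struct member *members;
	size_t nmembers;
};

/* Toy model of the pre-patch callback: for one device-state event, walk
 * every member of every queue and count how many follow that device.
 * Returns the number of comparisons performed, i.e. the work done. */
static size_t naive_device_state_cb(const struct queue *queues, size_t nqueues,
	const char *device, size_t *matches)
{
	size_t comparisons = 0;

	assert(queues != NULL || nqueues == 0);
	*matches = 0;
	for (size_t q = 0; q < nqueues; q++) {
		for (size_t m = 0; m < queues[q].nmembers; m++) {
			comparisons++;
			if (strcmp(queues[q].members[m].state_interface, device) == 0)
				(*matches)++;
		}
	}
	return comparisons;
}

/* Assumption for illustration: a reload publishes one device-state event per
 * member, and each event is delivered to the callback. Total work is then
 * (events) x (total members), i.e. quadratic for a single large queue. */
static size_t reload_cost(const struct queue *queues, size_t nqueues)
{
	size_t total = 0, matches;

	for (size_t q = 0; q < nqueues; q++)
		for (size_t m = 0; m < queues[q].nmembers; m++)
			total += naive_device_state_cb(queues, nqueues,
				queues[q].members[m].state_interface, &matches);
	return total;
}
```

Under that one-event-per-member assumption, a single 5000-member queue costs 5000 x 5000 = 25,000,000 comparisons per reload, all executed serially on the one devicestate:all subscription.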

Maintain a reference-counted index of the device-state identifiers that
queue members actually watch (via state_interface) and consult it at the
top of device_state_cb(). Device states no member watches are dropped in
O(1) instead of triggering an O(queues * members) scan; in particular, the
Queue:..._avail hints the callback itself republishes are now discarded
immediately instead of feeding back into another full scan.
Behavior for watched devices is unchanged.
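A minimal standalone sketch of such an index, assuming a plain chained hash table keyed by device name. The AI usage disclosure later in this thread calls the patch's index device_state_watchers, but its actual container type, locking and key comparison rules are not shown in this description. `struct watch_index`, `watch_device()`, `unwatch_device()`, `device_is_watched()` and `device_state_cb_sketch()` are therefore hypothetical, and the sketch is single-threaded (no locking).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WATCH_BUCKETS 257

/* One watched device name and how many members currently watch it. */
struct watch_entry {
	char *device;
	unsigned refs;
	struct watch_entry *next;
};

/* Hypothetical stand-in for a reference-counted device-state index. */
struct watch_index {
	struct watch_entry *buckets[WATCH_BUCKETS];
};

static unsigned long hash_device(const char *s)
{
	unsigned long h = 5381; /* djb2 string hash */

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % WATCH_BUCKETS;
}

/* Returns a pointer to the link holding the matching entry, or to the
 * NULL tail of the bucket chain if the device is not present. */
static struct watch_entry **find_slot(struct watch_index *idx, const char *device)
{
	struct watch_entry **pp = &idx->buckets[hash_device(device)];

	while (*pp && strcmp((*pp)->device, device) != 0)
		pp = &(*pp)->next;
	return pp;
}

/* One more member watches device. Returns 0 on success, -1 on OOM. */
static int watch_device(struct watch_index *idx, const char *device)
{
	struct watch_entry **pp = find_slot(idx, device);

	if (*pp) {
		(*pp)->refs++;
		return 0;
	}
	size_t len = strlen(device) + 1;
	struct watch_entry *e = malloc(sizeof(*e));
	if (!e || !(e->device = malloc(len))) {
		free(e);
		return -1;
	}
	memcpy(e->device, device, len);
	e->refs = 1;
	e->next = NULL;
	*pp = e;
	return 0;
}

/* One member stops watching device; the entry is freed at refcount 0. */
static void unwatch_device(struct watch_index *idx, const char *device)
{
	struct watch_entry **pp = find_slot(idx, device);
	struct watch_entry *e = *pp;

	assert(e != NULL); /* unwatch without a matching watch is a bug */
	if (!e)
		return;
	if (--e->refs == 0) {
		*pp = e->next;
		free(e->device);
		free(e);
	}
}

static int device_is_watched(struct watch_index *idx, const char *device)
{
	return *find_slot(idx, device) != NULL;
}

/* Fast path at the top of the callback: unwatched devices return at once. */
static int device_state_cb_sketch(struct watch_index *idx, const char *device)
{
	if (!device_is_watched(idx, device))
		return 0;
	/* The unchanged per-queue, per-member update would run here. */
	return 1;
}
```

Reference counting matters because several members (possibly in different queues) can share one state_interface; the entry must survive until the last of them stops watching it.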

Also fix a race in rt_handle_member_record(): when a realtime reload
changes a member's state_interface, start watching the new device before
storing it on the member (and before unwatching the old). Previously the
member was pointed at the new interface first and only then added to the
watcher set, leaving a brief window where m->state_interface referred to
a device not yet watched. Watching before publishing closes the window:
any device_state_cb() that passes the watch check then serializes on the
queue lock and observes the committed state_interface.
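The corrected ordering can be sketched as follows, assuming a pthread mutex stands in for the queue lock and a small fixed-capacity array stands in for the watch index. `struct watch_set`, `ws_watch()`, `ws_unwatch()`, `ws_is_watched()` and `set_state_interface()` are hypothetical names, not the code in rt_handle_member_record(). The invariant the ordering preserves is that whenever m->state_interface names a device, that device is already watched.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_WATCHED 16

/* Minimal hypothetical refcounted watch set (linear scan for brevity). */
struct watch_set {
	pthread_mutex_t lock;
	const char *device[MAX_WATCHED]; /* sketch: borrows caller strings */
	unsigned refs[MAX_WATCHED];      /* 0 means the slot is free */
};

/* Caller must hold ws->lock. */
static int watch_find(struct watch_set *ws, const char *dev)
{
	for (int i = 0; i < MAX_WATCHED; i++)
		if (ws->refs[i] && strcmp(ws->device[i], dev) == 0)
			return i;
	return -1;
}

static void ws_watch(struct watch_set *ws, const char *dev)
{
	pthread_mutex_lock(&ws->lock);
	int i = watch_find(ws, dev);
	if (i < 0) {
		for (i = 0; i < MAX_WATCHED && ws->refs[i]; i++)
			;
		assert(i < MAX_WATCHED); /* sketch: fixed capacity */
		ws->device[i] = dev;
	}
	ws->refs[i]++;
	pthread_mutex_unlock(&ws->lock);
}

static void ws_unwatch(struct watch_set *ws, const char *dev)
{
	pthread_mutex_lock(&ws->lock);
	int i = watch_find(ws, dev);
	if (i >= 0)
		ws->refs[i]--;
	pthread_mutex_unlock(&ws->lock);
}

static int ws_is_watched(struct watch_set *ws, const char *dev)
{
	pthread_mutex_lock(&ws->lock);
	int found = watch_find(ws, dev) >= 0;
	pthread_mutex_unlock(&ws->lock);
	return found;
}

struct member {
	const char *state_interface;
};

/* 1. watch the new device, 2. publish it on the member under the queue
 * lock, 3. only then drop the watch on the old device. At every point the
 * device named by m->state_interface is in the watch set. */
static void set_state_interface(struct watch_set *ws, pthread_mutex_t *queue_lock,
	struct member *m, const char *new_iface)
{
	const char *old_iface;

	ws_watch(ws, new_iface);            /* step 1 */

	pthread_mutex_lock(queue_lock);     /* step 2 */
	old_iface = m->state_interface;
	m->state_interface = new_iface;
	pthread_mutex_unlock(queue_lock);

	if (old_iface)
		ws_unwatch(ws, old_iface);      /* step 3 */
}
```

Swapping steps 1 and 2 would reopen the window described above: between the store and the watch, an event for the new device would be dropped by the fast path even though a member already follows it.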

Resolves: #1972

UserNote: app_queue now handles device-state changes efficiently when
reloading queues with large member counts, avoiding a flood of the
stasis/m:devicestate:all taskprocessor past its high water mark.


sangoma-oss-cla Bot commented Jun 4, 2026


CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot left a comment


Attention! This pull request may contain issues that could prevent it from being accepted. Please review the checklist below and take the recommended action. If you believe any of these are not applicable, just add a comment and let us know.

  • The PR title does not match the commit title. This can cause confusion for reviewers and future maintainers. GitHub doesn't automatically update the PR title when you update the commit message so if you've updated the commit with a force-push, please update the PR title to match the new commit message body.
  • The PR description does not match the commit message body. This can cause confusion for reviewers and future maintainers. GitHub doesn't automatically update the PR description when you update the commit message so if you've updated the commit with a force-push, please update the PR description to match the new commit message body.

Documentation:

@github-actions github-actions Bot added the has-pr-checklist label Jun 4, 2026

github-actions Bot commented Jun 4, 2026


Workflow Check failed
master-pjs5-check-506: FAILED TEST: channels/pjsip/subscriptions/rls/lists_of_lists/nominal/mwi/batched


jcolp commented Jun 4, 2026

Member

Additionally, if AI was used in the creation of patches or issues its usage must be disclosed.

Co-authored-by: Thomas <1258170+ThomasSevestre@users.noreply.github.com>
@PujaGediya PujaGediya force-pushed the app_queue-devicestate-high-water-mark-issue-1972 branch from 0db05ad to f71e605 on June 4, 2026 08:41
@PujaGediya

Author

cherry-pick-to: 20

@PujaGediya

Author

AI usage disclosure

This contribution was prepared with AI assistance (Claude Code / Anthropic Claude).
AI was used to:

  • investigate the root cause in device_state_cb() and the device-state stasis path, and confirm the taskprocessor high-water behavior;
  • draft the code change in apps/app_queue.c (the reference-counted device_state_watchers index and the device_state_cb() fast-path early return).

I have reviewed the change in full and understand it, and I can debug and own it.
It was built against [master / 20.19] and tested on my test server: with the
change, reloading a queue with 5000 members no longer pushes the
stasis/m:devicestate:all taskprocessor past its high water mark.


github-actions Bot commented Jun 4, 2026


Workflow Check completed successfully


Development

Successfully merging this pull request may close these issues.

[bug]: app_queue: device_state_cb floods stasis/m:devicestate:all taskprocessor on reload of queues with many members

2 participants