feat: introduce SerialIO dispatcher + runOnSerialIOIfBackgroundThreading helper #2643
Merged
Conversation
Pull request overview
This PR addresses an ANR risk by moving BackgroundManager lifecycle-triggered JobScheduler operations off the main thread, reducing the chance of long synchronous Binder stalls during foreground/background transitions (notably observed on some Xiaomi/MIUI devices).
Changes:
- Offload onFocus() cancellation of the background sync job to suspendifyOnIO.
- Offload onUnfocused() scheduling of background work to suspendifyOnIO.
- Add in-file documentation explaining why lifecycle-triggered JobScheduler operations are offloaded.
Contributor
📊 Diff Coverage Report (Changed Lines Only)
Gate: aggregate coverage on changed executable lines must be ≥ 80% (JaCoCo line data for lines touched in the diff).
Changed Files Coverage
Overall (aggregate gate): 48/52 touched executable lines covered (92.3%; requires ≥ 80%)
fadi-george
approved these changes
May 11, 2026
This was referenced May 12, 2026
47522c0 to
7d4a763
Compare
abdulraqeeb33
pushed a commit
that referenced
this pull request
May 12, 2026
…dk_background_threading FF
Wraps every IApplicationLifecycleHandler that does slow / blocking work on the main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All five handlers share one rollout knob, one ordering guarantee (the SerialIO single-thread executor), and one observable contract in tests.
The handlers + why they were ANR-ing
* BackgroundManager.onFocus / onUnfocused
Synchronous JobScheduler.cancel / .schedule on the main thread. Binder transactions to system_server that can block for many seconds on Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a 20,796 ms main-thread block on a 25078RA3EL / Android 15 device.
* NotificationsManager.onFocus
refreshNotificationState() drives NotificationRestoreWorkManager.beginEnqueueingWork, which lazily constructs WorkManager (opens / migrates the SQLite store at app_data/databases/androidx.work.workdb on first call) and then writes a WorkSpec row. OTel insertId 9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 / Android 15 device. Short-circuits on `restored = true` after the first call, so only the first focus event per process eats the SQLite stall.
* NotificationPermissionController polling lifecycle listener
onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval and calls pollingWaiter.wake(), which dispatches a coroutine resume onto the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold start that hits the OneSignalDispatchers lazy chain (executor + dispatcher + scope construction) on the calling thread - 26 / 500 main-thread ANRs in logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of pushing the polling interval to 1 day to effectively pause polling.
* FeatureFlagsRefreshService.onFocus / onUnfocused
onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO, same lazy chain stall - 18 / 500 ANRs in the same bucket. onUnfocused cancels the poll job; we route the cancellation through the same serial dispatcher so back-to-back focus -> unfocus stays globally ordered with onFocus's polling-job swap, and `synchronized(this)` is qualified as `synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on the service instance (the same monitor restartForegroundPolling takes) rather than the no-receiver lambda object.
* SessionService.onFocus / onUnfocused
sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the registered session-lifecycle handlers (operation repo, IAM trigger eval, etc.) synchronously, and the first one to touch OneSignalDispatchers pays the cold-init cost on the main thread - 25 / 500 ANRs in logs/2026-05-12 sit on this stack. session.startTime / session.focusTime / activeDuration accounting is preserved by capturing _time.currentTimeMillis on the caller's thread BEFORE the wrapper and passing it into the deferred handleOnFocus / handleOnUnfocused, so the timestamps reflect when Android delivered the event, not when the serial dispatcher ran the block.
Rollout matrix (uniform across all five handlers)
* FF on -> runOnSerialIOIfBackgroundThreading { ... } dispatches to OneSignalDispatchers.SerialIO (single-thread executor). Main thread returns from handleFocus immediately.
* FF off -> the block runs inline on the lifecycle main thread. Legacy behavior; retains the ANR for the control cohort so the A/B comparison stays clean.
Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on one path and won't bounce mid-run.
Worth flagging that the production ANR samples for every handler in this PR were on FF=ON - because all five previously bypassed every threading helper, the FF did not gate any of these codepaths. This PR is what introduces the gate.
Why the serial dispatcher specifically
All five handlers are invoked from the same main-thread fanout (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). A rapid focus burst on a multi-thread IO pool could interleave them with each other and with the BackgroundManager cancel/schedule pair. Pinning all five to the same single-thread executor keeps lifecycle work globally ordered on the main-thread submission order, and future per-event work added to any of these handlers (focus counters, notification analytics, session timing) inherits the ordering guarantee for free.
Tests (all new specs pass; existing specs unchanged)
* BackgroundManagerTests: existing tests + FF-on (dispatches through launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for both cancel and schedule. Includes a rapid unfocus -> focus burst test that pins both events through the serial dispatcher in submission order.
* NotificationsManagerTests: dispatch contract on onFocus + rapid focus burst preserves submission order. Lambda body is observable (the test stub invokes the captured block) so JaCoCo sees the refreshNotificationState() call covered.
* NotificationPermissionControllerTests: dispatch contract for the polling lifecycle listener on both onFocus and onUnfocused. Existing polling integration tests still pass under the FF-off default.
* FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through runOnSerialIOIfBackgroundThreading.
* SessionServiceTests: existing state-mutation assertions still pass under the FF-off default (the wrapper runs inline). New assertions for the dispatch contract on onFocus + onUnfocused + the rapid burst.
:OneSignal:core + :OneSignal:notifications detekt + full unit suites green.
Co-authored-by: Cursor <cursoragent@cursor.com>
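The SessionService timestamp handling described in the commit message above (read the clock on the caller's thread, then defer the mutation) can be sketched roughly as follows. This is only an illustration, not the SDK's code: `DemoSession`, `onFocusCapturingTime`, and the injected `clock` / `dispatch` parameters are hypothetical names standing in for the session model, `_time.currentTimeMillis`, and the serial-dispatch wrapper.

```kotlin
// Hypothetical session record; the real SessionService model is not shown here.
class DemoSession {
    var focusTime: Long = 0L
}

// Reads the clock on the caller's thread, then hands the state mutation to
// `dispatch`. Both `clock` and `dispatch` are injected (assumptions for this
// sketch) so the ordering can be checked without real threads.
fun onFocusCapturingTime(
    session: DemoSession,
    clock: () -> Long,
    dispatch: (() -> Unit) -> Unit
) {
    // Captured BEFORE the hand-off, so a slow or backed-up dispatcher
    // cannot shift the recorded event time.
    val eventTime = clock()
    dispatch { session.focusTime = eventTime }
}
```

If the clock were read inside the deferred block instead, the recorded value would drift by however long the block waited in the queue.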
abdulraqeeb33
pushed a commit
that referenced
this pull request
May 12, 2026
…old-start ANRs
ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading) shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO, all bottoming out in the same OneSignalDispatchers lazy chain:
ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
OneSignalDispatchers.IOScope.<init> (by lazy)
OneSignalDispatchers.IO (by lazy)
OneSignalDispatchers.ioExecutor (by lazy)
The first IO consumer in the process pays the executor + dispatcher + scope construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan on its thread. Under sdk_background_threading whichever main-thread caller wins the race eats 5-20s before the watchdog fires.
#2644 routes the five known onFocus / onUnfocused handlers through runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the deeper structural problem is the lazy chain itself - a future call site that slips past the FF gate (or a JobService delivered to main before init has run) hits the same stall.
OneSignalDispatchers.prewarm() spawns a dedicated short-lived "OneSignal-prewarm" daemon thread that submits one empty launch on each of IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end so the next production caller - even on the main thread - only sees the cheap "submit work to an already-constructed executor" cost.
* Idempotent: double-checked-locked prewarmStarted flag, so repeat calls from init / suspend init / SyncJobService.onStartJob no-op cheaply. An internal resetPrewarmForTest() lets specs exercise the "first call wins" branch independently.
* Fire-and-forget: failures log and swallow. The existing Dispatchers.IO / SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes wrong, so a failed prewarm just means the first real caller pays the original cost.
* Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit or starves UI work.
Called from:
* OneSignalImp.initWithContext(context, appId) (sync variant)
* OneSignalImp.initWithContextSuspend(context, appId) (suspend variant, used by re-entrant suspend callers)
* SyncJobService.onStartJob BEFORE suspendifyOnIO (JobService can fire before the host app init runs)
Tests (:core OneSignalDispatchersTests)
* prewarm returns immediately on the caller and the daemon thread brings IO / Default / SerialIO + their scopes to Active.
* prewarm is idempotent - second call does not spawn another OneSignal-prewarm thread (verified via thread-name scan).
Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500 main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading helpers (47 onFocus + 23 JobService + 25 SessionService).
:OneSignal:core detekt + full unit suite green.
Co-authored-by: Cursor <cursoragent@cursor.com>
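The prewarm shape described in the commit message above (idempotent double-checked flag, fire-and-forget daemon thread at lowered priority) might look roughly like this. It is a simplified sketch under stated assumptions, not the actual OneSignalDispatchers.prewarm(): `DemoPrewarmer`, `initCount`, and `use()` are hypothetical, and a single `by lazy` string stands in for the executor + dispatcher + scope chain.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a holder of lazily constructed dispatchers.
class DemoPrewarmer {
    @Volatile
    private var started = false
    private val lock = Any()

    // Counts how many times the expensive initializer actually ran.
    val initCount = AtomicInteger(0)

    // Stand-in for the lazy executor / dispatcher / scope chain.
    private val expensive: String by lazy {
        initCount.incrementAndGet()
        "ready"
    }

    // Any caller; pays the lazy-init cost only if nobody has paid it yet.
    fun use(): String = expensive

    // Returns the spawned thread on the first call, null on later calls.
    fun prewarm(): Thread? {
        if (started) return null            // cheap fast path
        synchronized(lock) {
            if (started) return null        // re-check under the lock
            started = true
        }
        val worker = Thread({
            try {
                use()                       // pay the lazy cost off the caller
            } catch (e: Throwable) {
                // Fire-and-forget: a real implementation would log here.
            }
        }, "Demo-prewarm")
        worker.isDaemon = true              // never blocks process exit
        worker.priority = Thread.NORM_PRIORITY - 2
        worker.start()
        return worker
    }
}
```

Returning the thread is purely a convenience for this sketch, so a caller or test can join it; the commit message describes prewarm as fire-and-forget.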
…ing helper
Introduces the threading infrastructure that the follow-up PRs depend on. This
PR adds the helpers and tests; it does not change any production call sites.
What it adds
* OneSignalDispatchers.SerialIO
A single-thread, named ("OneSignal-SerialIO") CoroutineDispatcher backed
by Executors.newSingleThreadExecutor with a SupervisorJob + CoroutineScope.
Falls back to Dispatchers.IO.limitedParallelism(1) if executor construction
fails. Submission order on the dispatcher == execution order on its single
worker, which is exactly the semantics the focus / unfocus lifecycle
handlers need (see the next PR).
Companion: launchOnSerialIO { ... } and a SerialIO entry in
OneSignalDispatchers.getPerformanceMetrics() / getStatus().
* ThreadUtils.suspendifyOnSerialIO { ... }
Always-on serial dispatch. Wraps OneSignalDispatchers.launchOnSerialIO and
is intentionally NOT gated on ThreadingMode.useBackgroundThreading - some
code paths need ordered off-main execution unconditionally.
* ThreadUtils.runOnSerialIOIfBackgroundThreading { ... }
FF-gated wrapper for non-suspending blocks. When
ThreadingMode.useBackgroundThreading is true the block is dispatched to
SerialIO; when false the block runs inline on the calling thread. This is
the call shape every subsequent focus / unfocus handler in this series
uses, so the rollout matrix stays one-knob simple.
Block is non-suspending on purpose: the FF-off branch executes on the
caller's thread, and a suspending block there would force a runBlocking,
which defeats the purpose of an A/B comparison.
* IOMockHelper stubs the new helpers
suspendifyOnSerialIO + launchOnSerialIO are tracked by awaitIO() so
existing specs stay deterministic. runOnSerialIOIfBackgroundThreading is
stubbed inline-on-test-thread by default so existing call-site specs keep
their observable behavior; specs that want to exercise the FF-on (offload)
branch can override the stub.
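As a rough illustration of the FF-gated call shape listed above, the sketch below uses a plain single-thread executor from java.util.concurrent instead of a CoroutineDispatcher, so it is a simplification rather than the SDK's implementation. `DemoSerialRunner`, `runOnSerialIfEnabled`, and `threadNameFor` are hypothetical names; the `flagOn` lambda plays the role of ThreadingMode.useBackgroundThreading.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for the FF-gated helper; the executor plays the
// role of the SerialIO dispatcher.
class DemoSerialRunner(private val flagOn: () -> Boolean) {
    private val serial: ExecutorService =
        Executors.newSingleThreadExecutor { r ->
            Thread(r, "Demo-SerialIO").apply { isDaemon = true }
        }

    // Flag on: queue the block on the single worker and return immediately.
    // Flag off: run the block inline on the calling thread.
    fun runOnSerialIfEnabled(block: () -> Unit) {
        if (flagOn()) serial.execute { block() } else block()
    }
}

// Reports which thread ran the block for a given flag value.
fun threadNameFor(flagOn: Boolean): String {
    val runner = DemoSerialRunner { flagOn }
    val done = CountDownLatch(1)
    var name = ""
    runner.runOnSerialIfEnabled {
        name = Thread.currentThread().name
        done.countDown()
    }
    done.await(5, TimeUnit.SECONDS)
    return name
}
```

Because the block parameter is a plain `() -> Unit`, the flag-off branch can call it directly without any blocking bridge, which matches the reasoning given above for keeping the block non-suspending.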
Tests
* OneSignalDispatchersTests: new SerialIO cases - construction, lazy chain
activates on first launch, getStatus reports Active + queue size, falls
back to the limitedParallelism(1) path if executor construction fails.
getStatus + getPerformanceMetrics are refactored to extract executorStatus
+ scopeStatus inline helpers to keep them under Detekt's LongMethod /
ComplexMethod thresholds.
* ThreadUtilsFeatureFlagTests: new cases that suspendifyOnSerialIO always
routes through the serial dispatcher (FF-agnostic), and that
runOnSerialIOIfBackgroundThreading routes through the serial dispatcher
when the FF is on and runs inline when the FF is off.
Why a dedicated serial dispatcher (not just suspendifyOnIO)
Multi-thread IO pools don't guarantee submission order = execution order. A
rapid focus burst (activity restart, share flow popping the activity back/
forth) could otherwise interleave cancel/schedule pairs or session-state
mutations. Pinning order-sensitive lifecycle work to a single executor keeps
it globally ordered, and future per-event work (focus counters, session
timing, analytics) inherits the guarantee for free.
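The ordering argument can be checked with a small stdlib-only experiment. The helper below is hypothetical (`executionOrder` is not part of the SDK) and uses java.util.concurrent executors rather than coroutine dispatchers: with one worker the recorded order always matches submission order, while with several workers every task still runs but the order is not guaranteed.

```kotlin
import java.util.Collections
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Submits `count` tasks from one thread to a pool of `threads` workers and
// returns the order in which the tasks actually ran.
fun executionOrder(threads: Int, count: Int): List<Int> {
    val pool = Executors.newFixedThreadPool(threads)
    val order = Collections.synchronizedList(mutableListOf<Int>())
    for (i in 0 until count) {
        pool.execute { order.add(i) }
    }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    return order.toList()
}
```

Calling `executionOrder(1, 1000)` yields 0 through 999 in order; `executionOrder(4, 1000)` yields the same elements, possibly permuted.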
:OneSignal:core detekt + full unit suite green. No production behavior change
in this PR; the follow-up PRs land the call-site offloads (#2644) and the
dispatcher prewarm (#2645).
Co-authored-by: Cursor <cursoragent@cursor.com>
7d4a763 to
6e22c2e
Compare
abdulraqeeb33
pushed a commit
that referenced
this pull request
May 12, 2026
…dk_background_threading FF Wraps every IApplicationLifecycleHandler that does slow / blocking work on the main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All five handlers share one rollout knob, one ordering guarantee (the SerialIO single-thread executor), and one observable contract in tests. The handlers + why they were ANR-ing BackgroundManager.onFocus / onUnfocused Synchronous JobScheduler.cancel / .schedule on the main thread. Binder transactions to system_server that can block for many seconds on Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a 20,796 ms main-thread block on a 25078RA3EL / Android 15 device. NotificationsManager.onFocus refreshNotificationState() drives NotificationRestoreWorkManager .beginEnqueueingWork, which lazily constructs WorkManager (opens / migrates the SQLite store at app_data/databases/androidx.work.workdb on first call) and then writes a WorkSpec row. OTel insertId 9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 / Android 15 device. Short-circuits on `restored = true` after the first call, so only the first focus event per process eats the SQLite stall. NotificationPermissionController polling lifecycle listener onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval and calls pollingWaiter.wake(), which dispatches a coroutine resume onto the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold start that hits the OneSignalDispatchers lazy chain (executor + dispatcher + scope construction) on the calling thread - 26 / 500 main-thread ANRs in logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of pushing the polling interval to 1 day to effectively pause polling. FeatureFlagsRefreshService.onFocus / onUnfocused onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO, same lazy chain stall - 18 / 500 ANRs in the same bucket. 
onUnfocused cancels the poll job; we route the cancellation through the same serial dispatcher so back-to-back focus -> unfocus stays globally ordered with onFocus's polling-job swap, and `synchronized(this)` is qualified as `synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on the service instance (the same monitor restartForegroundPolling takes) rather than the no-receiver lambda object. SessionService.onFocus / onUnfocused sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the registered session-lifecycle handlers (operation repo, IAM trigger eval, etc.) synchronously, and the first one to touch OneSignalDispatchers pays the cold-init cost on the main thread - 25 / 500 ANRs in logs/2026-05-12 sit on this stack. session.startTime / session.focusTime / activeDuration accounting is preserved by capturing _time.currentTimeMillis on the caller's thread BEFORE the wrapper and passing it into the deferred handleOnFocus / handleOnUnfocused, so the timestamps reflect when Android delivered the event, not when the serial dispatcher ran the block. Rollout matrix (uniform across all five handlers) FF on -> runOnSerialIOIfBackgroundThreading { ... } dispatches to OneSignalDispatchers.SerialIO (single-thread executor). Main thread returns from handleFocus immediately. FF off -> the block runs inline on the lifecycle main thread. Legacy behavior; retains the ANR for the control cohort so the A/B comparison stays clean. Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on one path and won't bounce mid-run. Worth flagging that the production ANR samples for every handler in this PR were on FF=ON - because all five previously bypassed every threading helper, the FF did not gate any of these codepaths. This PR is what introduces the gate. Why the serial dispatcher specifically All five handlers are invoked from the same main-thread fanout (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). 
A rapid focus burst on a multi-thread IO pool could interleave them with each other and with the BackgroundManager cancel/schedule pair. Pinning all five to the same single-thread executor keeps lifecycle work globally ordered on the main-thread submission order, and future per-event work added to any of these handlers (focus counters, notification analytics, session timing) inherits the ordering guarantee for free. Tests (all new specs pass; existing specs unchanged) * BackgroundManagerTests: existing tests + FF-on (dispatches through launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for both cancel and schedule. Includes a rapid unfocus -> focus burst test that pins both events through the serial dispatcher in submission order. * NotificationsManagerTests: dispatch contract on onFocus + rapid focus burst preserves submission order. Lambda body is observable (the test stub invokes the captured block) so JaCoCo sees the refreshNotificationState() call covered. * NotificationPermissionControllerTests: dispatch contract for the polling lifecycle listener on both onFocus and onUnfocused. Existing polling integration tests still pass under the FF-off default. * FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through runOnSerialIOIfBackgroundThreading. * SessionServiceTests: existing state-mutation assertions still pass under the FF-off default (the wrapper runs inline). New assertions for the dispatch contract on onFocus + onUnfocused + the rapid burst. :OneSignal:core + :OneSignal:notifications detekt + full unit suites green. Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33
pushed a commit
that referenced
this pull request
May 12, 2026
…old-start ANRs ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading) shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO, all bottoming out in the same OneSignalDispatchers lazy chain: ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch OneSignalDispatchers.IOScope.<init> (by lazy) OneSignalDispatchers.IO (by lazy) OneSignalDispatchers.ioExecutor (by lazy) The first IO consumer in the process pays the executor + dispatcher + scope construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan on its thread. Under sdk_background_threading whichever main-thread caller wins the race eats 5-20s before the watchdog fires. #2644 routes the five known onFocus / onUnfocused handlers through runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the deeper structural problem is the lazy chain itself - a future call site that slips past the FF gate (or a JobService delivered to main before init has run) hits the same stall. OneSignalDispatchers.prewarm() spawns a dedicated short-lived "OneSignal-prewarm" daemon thread that submits one empty launch on each of IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end so the next production caller - even on the main thread - only sees the cheap "submit work to an already-constructed executor" cost. * Idempotent: double-checked-locked prewarmStarted flag, so repeat calls from init / suspend init / SyncJobService.onStartJob no-op cheaply. An internal resetPrewarmForTest() lets specs exercise the "first call wins" branch independently. * Fire-and-forget: failures log and swallow. The existing Dispatchers.IO / SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes wrong, so a failed prewarm just means the first real caller pays the original cost. * Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit or starves UI work. 
Called from: * OneSignalImp.initWithContext(context, appId) (sync variant) * OneSignalImp.initWithContextSuspend(context, appId) (suspend variant, used by re-entrant suspend callers) * SyncJobService.onStartJob BEFORE suspendifyOnIO (JobService can fire before the host app init runs) Tests (:core OneSignalDispatchersTests) * prewarm returns immediately on the caller and the daemon thread brings IO / Default / SerialIO + their scopes to Active. * prewarm is idempotent - second call does not spawn another OneSignal-prewarm thread (verified via thread-name scan). Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500 main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading helpers (47 onFocus + 23 JobService + 25 SessionService). :OneSignal:core detekt + full unit suite green. Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33
pushed a commit
that referenced
this pull request
May 12, 2026
…dk_background_threading FF Wraps every IApplicationLifecycleHandler that does slow / blocking work on the main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All five handlers share one rollout knob, one ordering guarantee (the SerialIO single-thread executor), and one observable contract in tests. The handlers + why they were ANR-ing BackgroundManager.onFocus / onUnfocused Synchronous JobScheduler.cancel / .schedule on the main thread. Binder transactions to system_server that can block for many seconds on Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a 20,796 ms main-thread block on a 25078RA3EL / Android 15 device. NotificationsManager.onFocus refreshNotificationState() drives NotificationRestoreWorkManager .beginEnqueueingWork, which lazily constructs WorkManager (opens / migrates the SQLite store at app_data/databases/androidx.work.workdb on first call) and then writes a WorkSpec row. OTel insertId 9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 / Android 15 device. Short-circuits on `restored = true` after the first call, so only the first focus event per process eats the SQLite stall. NotificationPermissionController polling lifecycle listener onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval and calls pollingWaiter.wake(), which dispatches a coroutine resume onto the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold start that hits the OneSignalDispatchers lazy chain (executor + dispatcher + scope construction) on the calling thread - 26 / 500 main-thread ANRs in logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of pushing the polling interval to 1 day to effectively pause polling. FeatureFlagsRefreshService.onFocus / onUnfocused onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO, same lazy chain stall - 18 / 500 ANRs in the same bucket. 
onUnfocused cancels the poll job; we route the cancellation through the same serial dispatcher so back-to-back focus -> unfocus stays globally ordered with onFocus's polling-job swap, and `synchronized(this)` is qualified as `synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on the service instance (the same monitor restartForegroundPolling takes) rather than the no-receiver lambda object. SessionService.onFocus / onUnfocused sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the registered session-lifecycle handlers (operation repo, IAM trigger eval, etc.) synchronously, and the first one to touch OneSignalDispatchers pays the cold-init cost on the main thread - 25 / 500 ANRs in logs/2026-05-12 sit on this stack. session.startTime / session.focusTime / activeDuration accounting is preserved by capturing _time.currentTimeMillis on the caller's thread BEFORE the wrapper and passing it into the deferred handleOnFocus / handleOnUnfocused, so the timestamps reflect when Android delivered the event, not when the serial dispatcher ran the block. Rollout matrix (uniform across all five handlers) FF on -> runOnSerialIOIfBackgroundThreading { ... } dispatches to OneSignalDispatchers.SerialIO (single-thread executor). Main thread returns from handleFocus immediately. FF off -> the block runs inline on the lifecycle main thread. Legacy behavior; retains the ANR for the control cohort so the A/B comparison stays clean. Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on one path and won't bounce mid-run. Worth flagging that the production ANR samples for every handler in this PR were on FF=ON - because all five previously bypassed every threading helper, the FF did not gate any of these codepaths. This PR is what introduces the gate. Why the serial dispatcher specifically All five handlers are invoked from the same main-thread fanout (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). 
A rapid focus burst on a multi-thread IO pool could interleave them with each other and with the BackgroundManager cancel/schedule pair. Pinning all five to the same single-thread executor keeps lifecycle work globally ordered on the main-thread submission order, and future per-event work added to any of these handlers (focus counters, notification analytics, session timing) inherits the ordering guarantee for free. Tests (all new specs pass; existing specs unchanged) * BackgroundManagerTests: existing tests + FF-on (dispatches through launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for both cancel and schedule. Includes a rapid unfocus -> focus burst test that pins both events through the serial dispatcher in submission order. * NotificationsManagerTests: dispatch contract on onFocus + rapid focus burst preserves submission order. Lambda body is observable (the test stub invokes the captured block) so JaCoCo sees the refreshNotificationState() call covered. * NotificationPermissionControllerTests: dispatch contract for the polling lifecycle listener on both onFocus and onUnfocused. Existing polling integration tests still pass under the FF-off default. * FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through runOnSerialIOIfBackgroundThreading. * SessionServiceTests: existing state-mutation assertions still pass under the FF-off default (the wrapper runs inline). New assertions for the dispatch contract on onFocus + onUnfocused + the rapid burst. :OneSignal:core + :OneSignal:notifications detekt + full unit suites green. Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33
pushed a commit
that referenced
this pull request
May 12, 2026
…old-start ANRs

ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading) shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO, all bottoming out in the same OneSignalDispatchers lazy chain:

  ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
  CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
  OneSignalDispatchers.IOScope.<init> (by lazy)
  OneSignalDispatchers.IO (by lazy)
  OneSignalDispatchers.ioExecutor (by lazy)

The first IO consumer in the process pays the executor + dispatcher + scope construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan on its thread. Under sdk_background_threading whichever main-thread caller wins the race eats 5-20s before the watchdog fires.

#2644 routes the five known onFocus / onUnfocused handlers through runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the deeper structural problem is the lazy chain itself - a future call site that slips past the FF gate (or a JobService delivered to main before init has run) hits the same stall.

OneSignalDispatchers.prewarm() spawns a dedicated short-lived "OneSignal-prewarm" daemon thread that submits one empty launch on each of IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end so the next production caller - even on the main thread - only sees the cheap "submit work to an already-constructed executor" cost.

* Idempotent: double-checked-locked prewarmStarted flag, so repeat calls from init / suspend init / SyncJobService.onStartJob no-op cheaply. An internal resetPrewarmForTest() lets specs exercise the "first call wins" branch independently.
* Fire-and-forget: failures log and swallow. The existing Dispatchers.IO / SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes wrong, so a failed prewarm just means the first real caller pays the original cost.
* Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit or starves UI work.

Called from:
* OneSignalImp.initWithContext(context, appId) (sync variant)
* OneSignalImp.initWithContextSuspend(context, appId) (suspend variant, used by re-entrant suspend callers)
* SyncJobService.onStartJob BEFORE suspendifyOnIO (JobService can fire before the host app init runs)

Tests (:core OneSignalDispatchersTests)
* prewarm returns immediately on the caller and the daemon thread brings IO / Default / SerialIO + their scopes to Active.
* prewarm is idempotent - second call does not spawn another OneSignal-prewarm thread (verified via thread-name scan).

Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500 main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading helpers (47 onFocus + 23 JobService + 25 SessionService).

:OneSignal:core detekt + full unit suite green.

Co-authored-by: Cursor <cursoragent@cursor.com>
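The idempotent, fire-and-forget prewarm described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the SDK source: `PrewarmSketch`, `lazyChain`, `threadsSpawned`, `finished`, and `isWarm` are hypothetical names, and a plain `lazy` value stands in for the executor + dispatcher + scope chain.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch: names echo the commit message, but this is not the SDK source.
object PrewarmSketch {
    @Volatile
    private var prewarmStarted = false

    // Stand-in for the expensive lazy chain (executor + dispatcher + scope).
    private val lazyChain = lazy { "constructed" }

    val threadsSpawned = AtomicInteger(0)
    val finished = CountDownLatch(1)

    fun isWarm(): Boolean = lazyChain.isInitialized()

    fun prewarm() {
        if (prewarmStarted) return // fast path, no lock taken
        synchronized(this) {
            if (prewarmStarted) return // re-check under the lock
            prewarmStarted = true
        }
        threadsSpawned.incrementAndGet()
        val worker = Thread({
            try {
                lazyChain.value // pay the construction cost on this thread
            } catch (t: Throwable) {
                System.err.println("prewarm failed: $t") // log and swallow
            } finally {
                finished.countDown()
            }
        }, "OneSignal-prewarm")
        worker.isDaemon = true // never blocks process exit
        worker.priority = Thread.NORM_PRIORITY - 2
        worker.start()
    }
}

fun main() {
    PrewarmSketch.prewarm()
    PrewarmSketch.prewarm() // no-op: a prewarm has already started
    PrewarmSketch.finished.await(5, TimeUnit.SECONDS)
    println("threads=${PrewarmSketch.threadsSpawned.get()} warm=${PrewarmSketch.isWarm()}")
}
```

The caller only ever pays for a volatile read (or one short critical section on the first call); the construction cost lands on the daemon thread, so a later caller on the main thread finds the chain already built.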
One Line Summary
Introduce the threading infrastructure for the focus / unfocus ANR work: `OneSignalDispatchers.SerialIO`, `suspendifyOnSerialIO`, and the FF-gated `runOnSerialIOIfBackgroundThreading` helper. No production call sites move in this PR - that's #2644.

Linear: SDK-4505
Project: Android: refactor loading during initialization
Base branch: main

What this PR adds
* `OneSignalDispatchers.SerialIO` - a single-thread, named (`OneSignal-SerialIO`) `CoroutineDispatcher` backed by `Executors.newSingleThreadExecutor` with a `SupervisorJob` + `CoroutineScope`. Falls back to `Dispatchers.IO.limitedParallelism(1)` if executor construction fails. Submission order on the dispatcher == execution order on its single worker, which is exactly the semantics the lifecycle handlers in #2644 (fix: offload every main-thread onFocus / onUnfocused handler behind sdk_background_threading FF) need. Companion: `launchOnSerialIO { ... }` and a `SerialIO` entry in `OneSignalDispatchers.getPerformanceMetrics() / getStatus()`.
* `ThreadUtils.suspendifyOnSerialIO { ... }` - always-on serial dispatch. Wraps `OneSignalDispatchers.launchOnSerialIO` and is intentionally NOT gated on `ThreadingMode.useBackgroundThreading` - some code paths need ordered off-main execution unconditionally.
* `ThreadUtils.runOnSerialIOIfBackgroundThreading { ... }` - FF-gated wrapper for non-suspending blocks. When `ThreadingMode.useBackgroundThreading` is `true` the block is dispatched to `SerialIO`; when `false` the block runs inline on the calling thread. This is the call shape every lifecycle handler in #2644 uses, so the rollout matrix stays one-knob simple. The block is non-suspending on purpose: the FF-off branch runs on the caller's thread, and a suspending block there would force a `runBlocking`, which defeats the purpose of an A/B comparison.
* `IOMockHelper` stubs the new helpers - `suspendifyOnSerialIO` + `launchOnSerialIO` are tracked by `awaitIO()` so existing specs stay deterministic. `runOnSerialIOIfBackgroundThreading` is stubbed inline-on-test-thread by default so existing call-site specs keep their observable behavior; specs that want to exercise the FF-on (offload) branch can override the stub.

Why a dedicated serial dispatcher (not just `suspendifyOnIO`)
Multi-thread IO pools don't guarantee submission order == execution order. A rapid focus burst (activity restart, share flow popping the activity back/forth) could otherwise interleave `cancel` / `schedule` pairs or session-state mutations across worker threads. Pinning order-sensitive lifecycle work to a single executor keeps it globally ordered, and future per-event work (focus counters, session timing, analytics) inherits the guarantee for free.

Testing
Static
* `:OneSignal:core:detekt` - clean. `getStatus` + `getPerformanceMetrics` were refactored to extract `executorStatus` + `scopeStatus` inline helpers to keep them under Detekt's `LongMethod` / `ComplexMethod` thresholds.

Automated
* `:OneSignal:core:testReleaseUnitTest` - full suite green, including:
  * `OneSignalDispatchersTests` - new `SerialIO` cases (construction, lazy chain activates on first launch, `getStatus` reports `Active` + queue size, falls back to the `limitedParallelism(1)` path if executor construction fails).
  * `ThreadUtilsFeatureFlagTests` - new cases that `suspendifyOnSerialIO` always routes through the serial dispatcher (FF-agnostic), and that `runOnSerialIOIfBackgroundThreading` routes through the serial dispatcher when the FF is on and runs inline when the FF is off.

Scope
* `OneSignalDispatchers.SerialIO` + `launchOnSerialIO`.
* `ThreadUtils.suspendifyOnSerialIO` + `ThreadUtils.runOnSerialIOIfBackgroundThreading`.
* `IOMockHelper` mocks for the above.
* `OneSignalDispatchers.getStatus` / `getPerformanceMetrics` extract `executorStatus` / `scopeStatus` helpers (no behavior change; brings the methods under Detekt thresholds with the new SerialIO entry).

Follow-ups
* #2644: move the lifecycle handlers behind the helper (`BackgroundManager`, `NotificationsManager`, `NotificationPermissionController`, `FeatureFlagsRefreshService`, `SessionService`).
* `OneSignalDispatchers.prewarm()` to move the lazy-chain construction cost off the main thread on cold start.

Checklist
Overview
Testing
* `runOnSerialIOIfBackgroundThreading` covered, plus SerialIO lifecycle / fallback / status coverage
Final pass
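
The FF-gated call shape described under "What this PR adds" can be approximated with a small sketch. It is written against a plain `java.util.concurrent` executor instead of the coroutine-based `SerialIO` dispatcher, omits the `limitedParallelism(1)` fallback, and every `Sketch*` name plus `threadNameWhen` is hypothetical; only the on/off behavior of the gate is taken from the description above.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical flag holder standing in for ThreadingMode.useBackgroundThreading.
object SketchThreadingMode {
    @Volatile
    var useBackgroundThreading = false
}

object SketchThreadUtils {
    // One named daemon worker: tasks run in the order they were submitted.
    val serialIO: ExecutorService = Executors.newSingleThreadExecutor { r ->
        Thread(r, "Sketch-SerialIO").apply { isDaemon = true }
    }

    // Non-suspending on purpose: with the flag off the block runs on the caller's thread.
    fun runOnSerialIOIfBackgroundThreading(block: () -> Unit) {
        if (SketchThreadingMode.useBackgroundThreading) {
            serialIO.execute { block() } // FF on: offload to the serial worker
        } else {
            block() // FF off: legacy inline behavior
        }
    }
}

// Reports which thread ran the block for a given flag value.
fun threadNameWhen(flagOn: Boolean): String {
    SketchThreadingMode.useBackgroundThreading = flagOn
    val ranOn = CompletableFuture<String>()
    SketchThreadUtils.runOnSerialIOIfBackgroundThreading {
        ranOn.complete(Thread.currentThread().name)
    }
    return ranOn.get(5, TimeUnit.SECONDS)
}

fun main() {
    println("FF on  -> ${threadNameWhen(true)}")
    println("FF off -> ${threadNameWhen(false)}")
}
```

Keeping the block type as `() -> Unit` mirrors the stated design choice: the flag-off branch can call it directly without any blocking bridge, so both cohorts execute the same code and differ only in the thread that runs it.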