feat: introduce SerialIO dispatcher + runOnSerialIOIfBackgroundThreading helper#2643

Merged: abdulraqeeb33 merged 1 commit into main from ar/sdk-4505 on May 12, 2026
@abdulraqeeb33 abdulraqeeb33 commented May 11, 2026

One Line Summary

Introduce the threading infrastructure for the focus / unfocus ANR work: OneSignalDispatchers.SerialIO, suspendifyOnSerialIO, and the FF-gated runOnSerialIOIfBackgroundThreading helper. No production call sites move in this PR — that's #2644.

Linear: SDK-4505
Project: Android: refactor loading during initialization
Base branch: main

What this PR adds

  • OneSignalDispatchers.SerialIO: a single-thread, named (OneSignal-SerialIO) CoroutineDispatcher backed by Executors.newSingleThreadExecutor, with a SupervisorJob + CoroutineScope. Falls back to Dispatchers.IO.limitedParallelism(1) if executor construction fails. Submission order on the dispatcher == execution order on its single worker, which is exactly the semantics the lifecycle handlers in #2644 (fix: offload every main-thread onFocus / onUnfocused handler behind sdk_background_threading FF) need. Companion: launchOnSerialIO { ... } and a SerialIO entry in OneSignalDispatchers.getPerformanceMetrics() / getStatus().
  • ThreadUtils.suspendifyOnSerialIO { ... }: always-on serial dispatch. Wraps OneSignalDispatchers.launchOnSerialIO and is intentionally NOT gated on ThreadingMode.useBackgroundThreading, because some code paths need ordered off-main execution unconditionally.
  • ThreadUtils.runOnSerialIOIfBackgroundThreading { ... }: FF-gated wrapper for non-suspending blocks. When ThreadingMode.useBackgroundThreading is true the block is dispatched to SerialIO; when false the block runs inline on the calling thread. This is the call shape every lifecycle handler in #2644 uses, so the rollout matrix stays one-knob simple. The block is non-suspending on purpose: the FF-off branch runs on the caller's thread, and a suspending block there would force a runBlocking, which defeats the purpose of an A/B comparison.
  • IOMockHelper stubs the new helpers: suspendifyOnSerialIO + launchOnSerialIO are tracked by awaitIO() so existing specs stay deterministic. runOnSerialIOIfBackgroundThreading is stubbed inline-on-test-thread by default so existing call-site specs keep their observable behavior; specs that want to exercise the FF-on (offload) branch can override the stub.
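
For readers who want the call shape in code, here is a minimal sketch of the FF-gated helper described above. It is not the SDK's implementation: the real SerialIO is a CoroutineDispatcher with a scope and a fallback path, whereas this sketch uses a plain single-thread executor to stay dependency-free. `SketchThreadingMode`, `sketchSerialExecutor`, and `runOnSerialIfBackgroundThreadingSketch` are hypothetical names invented for illustration.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical stand-in for ThreadingMode.useBackgroundThreading.
object SketchThreadingMode {
    @Volatile
    var useBackgroundThreading: Boolean = false
}

// Hypothetical stand-in for SerialIO: one named daemon worker thread.
val sketchSerialExecutor: ExecutorService =
    Executors.newSingleThreadExecutor { runnable ->
        Thread(runnable, "Sketch-SerialIO").apply { isDaemon = true }
    }

// FF on: hand the block to the single worker and return immediately.
// FF off: run the block inline on the calling thread.
fun runOnSerialIfBackgroundThreadingSketch(block: () -> Unit) {
    if (SketchThreadingMode.useBackgroundThreading) {
        sketchSerialExecutor.execute { block() }
    } else {
        block()
    }
}
```

Because the block is a plain `() -> Unit`, the FF-off branch needs no blocking bridge; it is an ordinary function call on the caller's thread.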

Why a dedicated serial dispatcher (not just suspendifyOnIO)

Multi-thread IO pools don't guarantee submission order == execution order. A rapid focus burst (activity restart, share flow popping the activity back/forth) could otherwise interleave cancel/schedule pairs or session-state mutations across worker threads. Pinning order-sensitive lifecycle work to a single executor keeps it globally ordered, and future per-event work (focus counters, session timing, analytics) inherits the guarantee for free.
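
The ordering property can be illustrated with a small, self-contained sketch. It assumes a plain `java.util.concurrent` single-thread executor rather than the SDK's dispatcher, and `runBurstOnSingleThread` is a hypothetical helper name. A single worker draining a FIFO queue runs tasks in the order they were submitted, which is the guarantee the paragraph above relies on.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Submits `count` numbered tasks from one thread to a single-thread executor
// and returns the order in which the tasks actually ran.
fun runBurstOnSingleThread(count: Int): List<Int> {
    val executed = CopyOnWriteArrayList<Int>()
    val executor = Executors.newSingleThreadExecutor()
    for (i in 0 until count) {
        executor.execute { executed.add(i) }
    }
    // Stop accepting work, then wait for the queued tasks to drain.
    executor.shutdown()
    check(executor.awaitTermination(5, TimeUnit.SECONDS)) { "worker did not drain in time" }
    return executed.toList()
}
```

Swapping in a multi-thread pool (for example `Executors.newFixedThreadPool(4)`) removes the guarantee: tasks are still dequeued in order, but they may complete in any order.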

Testing

Static

  • :OneSignal:core:detekt — clean. getStatus + getPerformanceMetrics were refactored to extract executorStatus + scopeStatus inline helpers to keep them under Detekt's LongMethod / ComplexMethod thresholds.

Automated

  • :OneSignal:core:testReleaseUnitTest — full suite green, including:
    • OneSignalDispatchersTests — new SerialIO cases (construction, lazy chain activates on first launch, getStatus reports Active + queue size, falls back to the limitedParallelism(1) path if executor construction fails).
    • ThreadUtilsFeatureFlagTests — new cases that suspendifyOnSerialIO always routes through the serial dispatcher (FF-agnostic), and that runOnSerialIOIfBackgroundThreading routes through the serial dispatcher when the FF is on and runs inline when the FF is off.

Scope

  • New: OneSignalDispatchers.SerialIO + launchOnSerialIO.
  • New: ThreadUtils.suspendifyOnSerialIO + ThreadUtils.runOnSerialIOIfBackgroundThreading.
  • New: IOMockHelper mocks for the above.
  • Refactor: OneSignalDispatchers.getStatus / getPerformanceMetrics extract executorStatus / scopeStatus helpers (no behavior change; brings the methods under Detekt thresholds with the new SerialIO entry).
  • Unchanged: every production lifecycle handler still runs inline on the main thread. The call-site offloads land in #2644 (fix: offload every main-thread onFocus / onUnfocused handler behind sdk_background_threading FF).

Follow-ups

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing (introduce the threading helpers + tests)
  • No public API changes

Testing

  • Both branches of runOnSerialIOIfBackgroundThreading covered, plus SerialIO lifecycle / fallback / status coverage
  • Existing automated tests still pass for the touched modules

Final pass

  • Code is as readable as possible
  • I have reviewed this PR myself

Copilot AI review requested due to automatic review settings May 11, 2026 17:21

Copilot AI left a comment

Pull request overview

This PR addresses an ANR risk by moving BackgroundManager lifecycle-triggered JobScheduler operations off the main thread, reducing the chance of long synchronous Binder stalls during foreground/background transitions (notably observed on some Xiaomi/MIUI devices).

Changes:

  • Offload onFocus() cancellation of the background sync job to suspendifyOnIO.
  • Offload onUnfocused() scheduling of background work to suspendifyOnIO.
  • Add in-file documentation explaining why lifecycle-triggered JobScheduler operations are offloaded.


github-actions Bot commented May 11, 2026

📊 Diff Coverage Report

Diff Coverage Report (Changed Lines Only)

Gate: aggregate coverage on changed executable lines must be ≥ 80% (JaCoCo line data for lines touched in the diff).

Changed Files Coverage

  • OneSignalDispatchers.kt: 37/41 touched executable lines (90.2%) (77 touched lines in diff)
  • ThreadUtils.kt: 11/11 touched executable lines (100.0%) (32 touched lines in diff)
  • ⚠️ IOMockHelper.kt: Not in coverage report (may not be compiled/tested)

Overall (aggregate gate)

48/52 touched executable lines covered (92.3% — requires ≥ 80%)

abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…dk_background_threading FF

Wraps every IApplicationLifecycleHandler that does slow / blocking work on the
main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All
five handlers share one rollout knob, one ordering guarantee (the SerialIO
single-thread executor), and one observable contract in tests.

The handlers + why they were ANR-ing

  BackgroundManager.onFocus / onUnfocused
    Synchronous JobScheduler.cancel / .schedule on the main thread. Binder
    transactions to system_server that can block for many seconds on
    Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a
    20,796 ms main-thread block on a 25078RA3EL / Android 15 device.

  NotificationsManager.onFocus
    refreshNotificationState() drives NotificationRestoreWorkManager
    .beginEnqueueingWork, which lazily constructs WorkManager (opens /
    migrates the SQLite store at app_data/databases/androidx.work.workdb on
    first call) and then writes a WorkSpec row. OTel insertId
    9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 /
    Android 15 device. Short-circuits on `restored = true` after the first
    call, so only the first focus event per process eats the SQLite stall.

  NotificationPermissionController polling lifecycle listener
    onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval
    and calls pollingWaiter.wake(), which dispatches a coroutine resume onto
    the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold
    start that hits the OneSignalDispatchers lazy chain (executor + dispatcher
    + scope construction) on the calling thread - 26 / 500 main-thread ANRs in
    logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of
    pushing the polling interval to 1 day to effectively pause polling.

  FeatureFlagsRefreshService.onFocus / onUnfocused
    onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO,
    same lazy chain stall - 18 / 500 ANRs in the same bucket. onUnfocused
    cancels the poll job; we route the cancellation through the same serial
    dispatcher so back-to-back focus -> unfocus stays globally ordered with
    onFocus's polling-job swap, and `synchronized(this)` is qualified as
    `synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on
    the service instance (the same monitor restartForegroundPolling takes)
    rather than the no-receiver lambda object.

  SessionService.onFocus / onUnfocused
    sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the
    registered session-lifecycle handlers (operation repo, IAM trigger eval,
    etc.) synchronously, and the first one to touch OneSignalDispatchers
    pays the cold-init cost on the main thread - 25 / 500 ANRs in
    logs/2026-05-12 sit on this stack. session.startTime / session.focusTime
    / activeDuration accounting is preserved by capturing
    _time.currentTimeMillis on the caller's thread BEFORE the wrapper and
    passing it into the deferred handleOnFocus / handleOnUnfocused, so the
    timestamps reflect when Android delivered the event, not when the serial
    dispatcher ran the block.
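
The timestamp handling described for SessionService can be sketched as follows. This is an illustration under stated assumptions, not the SDK's code: `FocusTimestampSketch`, `clock`, and `serial` are hypothetical stand-ins for the handler, the SDK's time source, and the serial dispatcher.

```kotlin
import java.util.concurrent.ExecutorService

// Hypothetical handler: `clock` stands in for the SDK's time source and
// `serial` for the serial dispatcher.
class FocusTimestampSketch(
    private val clock: () -> Long,
    private val serial: ExecutorService
) {
    @Volatile
    var lastFocusTime: Long = -1L
        private set

    fun onFocus() {
        // Read the clock on the caller's thread, before the hand-off.
        val deliveredAt = clock()
        serial.execute { handleOnFocus(deliveredAt) }
    }

    private fun handleOnFocus(deliveredAt: Long) {
        // Use the captured value, not a fresh clock read on the worker.
        lastFocusTime = deliveredAt
    }
}
```

If the worker is busy when the event arrives, the deferred block may run much later, but the recorded time still reflects when the event was delivered.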

Rollout matrix (uniform across all five handlers)

  FF on  -> runOnSerialIOIfBackgroundThreading { ... } dispatches to
            OneSignalDispatchers.SerialIO (single-thread executor). Main
            thread returns from handleFocus immediately.
  FF off -> the block runs inline on the lifecycle main thread. Legacy
            behavior; retains the ANR for the control cohort so the A/B
            comparison stays clean.

Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on
one path and won't bounce mid-run. Worth flagging that the production ANR
samples for every handler in this PR were on FF=ON - because all five
previously bypassed every threading helper, the FF did not gate any of these
codepaths. This PR is what introduces the gate.

Why the serial dispatcher specifically

  All five handlers are invoked from the same main-thread fanout
  (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). A
  rapid focus burst on a multi-thread IO pool could interleave them with
  each other and with the BackgroundManager cancel/schedule pair. Pinning all
  five to the same single-thread executor keeps lifecycle work globally
  ordered on the main-thread submission order, and future per-event work
  added to any of these handlers (focus counters, notification analytics,
  session timing) inherits the ordering guarantee for free.

Tests (all new specs pass; existing specs unchanged)

  * BackgroundManagerTests: existing tests + FF-on (dispatches through
    launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for
    both cancel and schedule. Includes a rapid unfocus -> focus burst test
    that pins both events through the serial dispatcher in submission order.
  * NotificationsManagerTests: dispatch contract on onFocus + rapid focus
    burst preserves submission order. Lambda body is observable (the test
    stub invokes the captured block) so JaCoCo sees the
    refreshNotificationState() call covered.
  * NotificationPermissionControllerTests: dispatch contract for the polling
    lifecycle listener on both onFocus and onUnfocused. Existing polling
    integration tests still pass under the FF-off default.
  * FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through
    runOnSerialIOIfBackgroundThreading.
  * SessionServiceTests: existing state-mutation assertions still pass under
    the FF-off default (the wrapper runs inline). New assertions for the
    dispatch contract on onFocus + onUnfocused + the rapid burst.

:OneSignal:core + :OneSignal:notifications detekt + full unit suites green.

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…old-start ANRs

ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading)
shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO,
all bottoming out in the same OneSignalDispatchers lazy chain:

  ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
  CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
  OneSignalDispatchers.IOScope.<init>   (by lazy)
  OneSignalDispatchers.IO               (by lazy)
  OneSignalDispatchers.ioExecutor       (by lazy)

The first IO consumer in the process pays the executor + dispatcher + scope
construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan
on its thread. Under sdk_background_threading whichever main-thread caller wins
the race eats 5-20s before the watchdog fires.
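
The "first consumer pays" behavior follows from how Kotlin's `by lazy` works: the initializer runs on whichever thread first reads the property. A minimal sketch, in which `LazyChainSketch` and its members are hypothetical names and the real chain has more links (executor, dispatcher, scope):

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical stand-in for a lazily constructed dispatcher holder.
class LazyChainSketch {
    // Name of the thread that ended up running the initializer.
    @Volatile
    var initializedOn: String? = null
        private set

    // Construction is deferred until the first read of `executor`,
    // and happens on the thread that performs that read.
    val executor: ExecutorService by lazy {
        initializedOn = Thread.currentThread().name
        Executors.newSingleThreadExecutor()
    }
}
```

If the first read happens on the main thread, the main thread runs the whole initializer, which is the stall the stack above points at.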

#2644 routes the five known onFocus / onUnfocused handlers through
runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the
deeper structural problem is the lazy chain itself - a future call site that
slips past the FF gate (or a JobService delivered to main before init has run)
hits the same stall.

OneSignalDispatchers.prewarm() spawns a dedicated short-lived
"OneSignal-prewarm" daemon thread that submits one empty launch on each of
IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end
so the next production caller - even on the main thread - only sees the cheap
"submit work to an already-constructed executor" cost.

  * Idempotent: double-checked-locked prewarmStarted flag, so repeat calls
    from init / suspend init / SyncJobService.onStartJob no-op cheaply. An
    internal resetPrewarmForTest() lets specs exercise the "first call wins"
    branch independently.
  * Fire-and-forget: failures log and swallow. The existing Dispatchers.IO /
    SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes
    wrong, so a failed prewarm just means the first real caller pays the
    original cost.
  * Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit
    or starves UI work.
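
The three properties listed above (idempotent, fire-and-forget, low-priority daemon thread) can be sketched like this. It is an assumption-laden illustration rather than the actual `OneSignalDispatchers.prewarm()`: `PrewarmSketch`, `warmUp`, `spawned`, and the thread name are hypothetical, and the real method submits empty launches to the dispatchers instead of taking a callback.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of an idempotent, fire-and-forget prewarm.
object PrewarmSketch {
    @Volatile
    private var prewarmStarted = false
    private val lock = Any()

    // Counts how many prewarm threads were actually spawned.
    val spawned = AtomicInteger(0)

    // Returns the spawned thread, or null if a previous call already won.
    fun prewarm(warmUp: () -> Unit): Thread? {
        if (prewarmStarted) return null          // fast path, no lock taken
        synchronized(lock) {
            if (prewarmStarted) return null      // another caller won the race
            prewarmStarted = true
        }
        val thread = Thread({
            try {
                warmUp()
            } catch (t: Throwable) {
                // Fire-and-forget: report and swallow.
                System.err.println("prewarm failed: $t")
            }
        }, "Sketch-prewarm")
        thread.isDaemon = true                       // never blocks process exit
        thread.priority = Thread.NORM_PRIORITY - 2   // yields to UI work
        spawned.incrementAndGet()
        thread.start()
        return thread
    }
}
```

The volatile read before the lock keeps repeat calls cheap; the second check inside the lock is what makes "first call wins" safe under concurrent callers.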

Called from:
  * OneSignalImp.initWithContext(context, appId)            (sync variant)
  * OneSignalImp.initWithContextSuspend(context, appId)     (suspend variant,
                                                             used by re-entrant
                                                             suspend callers)
  * SyncJobService.onStartJob BEFORE suspendifyOnIO         (JobService can fire
                                                             before the host app
                                                             init runs)

Tests (:core OneSignalDispatchersTests)
  * prewarm returns immediately on the caller and the daemon thread brings
    IO / Default / SerialIO + their scopes to Active.
  * prewarm is idempotent - second call does not spawn another
    OneSignal-prewarm thread (verified via thread-name scan).

Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500
main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading
helpers (47 onFocus + 23 JobService + 25 SessionService).

:OneSignal:core detekt + full unit suite green.

Co-authored-by: Cursor <cursoragent@cursor.com>
@abdulraqeeb33 abdulraqeeb33 changed the title fix: SDK-4505: offload BackgroundManager lifecycle JobScheduler calls off the main thread feat: introduce SerialIO dispatcher + runOnSerialIOIfBackgroundThreading helper May 12, 2026
…ing helper

Introduces the threading infrastructure that the follow-up PRs depend on. This
PR adds the helpers and tests; it does not change any production call sites.

What it adds

  * OneSignalDispatchers.SerialIO
    A single-thread, named ("OneSignal-SerialIO") CoroutineDispatcher backed
    by Executors.newSingleThreadExecutor with a SupervisorJob + CoroutineScope.
    Falls back to Dispatchers.IO.limitedParallelism(1) if executor construction
    fails. Submission order on the dispatcher == execution order on its single
    worker, which is exactly the semantics the focus / unfocus lifecycle
    handlers need (see the next PR).

    Companion: launchOnSerialIO { ... } and a SerialIO entry in
    OneSignalDispatchers.getPerformanceMetrics() / getStatus().

  * ThreadUtils.suspendifyOnSerialIO { ... }
    Always-on serial dispatch. Wraps OneSignalDispatchers.launchOnSerialIO and
    is intentionally NOT gated on ThreadingMode.useBackgroundThreading - some
    code paths need ordered off-main execution unconditionally.

  * ThreadUtils.runOnSerialIOIfBackgroundThreading { ... }
    FF-gated wrapper for non-suspending blocks. When
    ThreadingMode.useBackgroundThreading is true the block is dispatched to
    SerialIO; when false the block runs inline on the calling thread. This is
    the call shape every subsequent focus / unfocus handler in this series
    uses, so the rollout matrix stays one-knob simple.

    Block is non-suspending on purpose: the FF-off branch executes on the
    caller's thread, and a suspending block there would force a runBlocking,
    which defeats the purpose of an A/B comparison.

  * IOMockHelper stubs the new helpers
    suspendifyOnSerialIO + launchOnSerialIO are tracked by awaitIO() so
    existing specs stay deterministic. runOnSerialIOIfBackgroundThreading is
    stubbed inline-on-test-thread by default so existing call-site specs keep
    their observable behavior; specs that want to exercise the FF-on (offload)
    branch can override the stub.

Tests

  * OneSignalDispatchersTests: new SerialIO cases - construction, lazy chain
    activates on first launch, getStatus reports Active + queue size, falls
    back to the limitedParallelism(1) path if executor construction fails.
    getStatus + getPerformanceMetrics are refactored to extract executorStatus
    + scopeStatus inline helpers to keep them under Detekt's LongMethod /
    ComplexMethod thresholds.
  * ThreadUtilsFeatureFlagTests: new cases that suspendifyOnSerialIO always
    routes through the serial dispatcher (FF-agnostic), and that
    runOnSerialIOIfBackgroundThreading routes through the serial dispatcher
    when the FF is on and runs inline when the FF is off.

Why a dedicated serial dispatcher (not just suspendifyOnIO)

  Multi-thread IO pools don't guarantee submission order = execution order. A
  rapid focus burst (activity restart, share flow popping the activity back/
  forth) could otherwise interleave cancel/schedule pairs or session-state
  mutations. Pinning order-sensitive lifecycle work to a single executor keeps
  it globally ordered, and future per-event work (focus counters, session
  timing, analytics) inherits the guarantee for free.

:OneSignal:core detekt + full unit suite green. No production behavior change
in this PR; the follow-up PRs land the call-site offloads (#2644) and the
dispatcher prewarm (#2645).

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…dk_background_threading FF

Wraps every IApplicationLifecycleHandler that does slow / blocking work on the
main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All
five handlers share one rollout knob, one ordering guarantee (the SerialIO
single-thread executor), and one observable contract in tests.

The handlers + why they were ANR-ing

  BackgroundManager.onFocus / onUnfocused
    Synchronous JobScheduler.cancel / .schedule on the main thread. Binder
    transactions to system_server that can block for many seconds on
    Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a
    20,796 ms main-thread block on a 25078RA3EL / Android 15 device.

  NotificationsManager.onFocus
    refreshNotificationState() drives NotificationRestoreWorkManager
    .beginEnqueueingWork, which lazily constructs WorkManager (opens /
    migrates the SQLite store at app_data/databases/androidx.work.workdb on
    first call) and then writes a WorkSpec row. OTel insertId
    9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 /
    Android 15 device. Short-circuits on `restored = true` after the first
    call, so only the first focus event per process eats the SQLite stall.

  NotificationPermissionController polling lifecycle listener
    onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval
    and calls pollingWaiter.wake(), which dispatches a coroutine resume onto
    the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold
    start that hits the OneSignalDispatchers lazy chain (executor + dispatcher
    + scope construction) on the calling thread - 26 / 500 main-thread ANRs in
    logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of
    pushing the polling interval to 1 day to effectively pause polling.

  FeatureFlagsRefreshService.onFocus / onUnfocused
    onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO,
    same lazy chain stall - 18 / 500 ANRs in the same bucket. onUnfocused
    cancels the poll job; we route the cancellation through the same serial
    dispatcher so back-to-back focus -> unfocus stays globally ordered with
    onFocus's polling-job swap, and `synchronized(this)` is qualified as
    `synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on
    the service instance (the same monitor restartForegroundPolling takes)
    rather than the no-receiver lambda object.

  SessionService.onFocus / onUnfocused
    sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the
    registered session-lifecycle handlers (operation repo, IAM trigger eval,
    etc.) synchronously, and the first one to touch OneSignalDispatchers
    pays the cold-init cost on the main thread - 25 / 500 ANRs in
    logs/2026-05-12 sit on this stack. session.startTime / session.focusTime
    / activeDuration accounting is preserved by capturing
    _time.currentTimeMillis on the caller's thread BEFORE the wrapper and
    passing it into the deferred handleOnFocus / handleOnUnfocused, so the
    timestamps reflect when Android delivered the event, not when the serial
    dispatcher ran the block.

Rollout matrix (uniform across all five handlers)

  FF on  -> runOnSerialIOIfBackgroundThreading { ... } dispatches to
            OneSignalDispatchers.SerialIO (single-thread executor). Main
            thread returns from handleFocus immediately.
  FF off -> the block runs inline on the lifecycle main thread. Legacy
            behavior; retains the ANR for the control cohort so the A/B
            comparison stays clean.

Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on
one path and won't bounce mid-run. Worth flagging that the production ANR
samples for every handler in this PR were on FF=ON - because all five
previously bypassed every threading helper, the FF did not gate any of these
codepaths. This PR is what introduces the gate.

Why the serial dispatcher specifically

  All five handlers are invoked from the same main-thread fanout
  (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). A
  rapid focus burst on a multi-thread IO pool could interleave them with
  each other and with the BackgroundManager cancel/schedule pair. Pinning all
  five to the same single-thread executor keeps lifecycle work globally
  ordered on the main-thread submission order, and future per-event work
  added to any of these handlers (focus counters, notification analytics,
  session timing) inherits the ordering guarantee for free.

Tests (all new specs pass; existing specs unchanged)

  * BackgroundManagerTests: existing tests + FF-on (dispatches through
    launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for
    both cancel and schedule. Includes a rapid unfocus -> focus burst test
    that pins both events through the serial dispatcher in submission order.
  * NotificationsManagerTests: dispatch contract on onFocus + rapid focus
    burst preserves submission order. Lambda body is observable (the test
    stub invokes the captured block) so JaCoCo sees the
    refreshNotificationState() call covered.
  * NotificationPermissionControllerTests: dispatch contract for the polling
    lifecycle listener on both onFocus and onUnfocused. Existing polling
    integration tests still pass under the FF-off default.
  * FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through
    runOnSerialIOIfBackgroundThreading.
  * SessionServiceTests: existing state-mutation assertions still pass under
    the FF-off default (the wrapper runs inline). New assertions for the
    dispatch contract on onFocus + onUnfocused + the rapid burst.

:OneSignal:core + :OneSignal:notifications detekt + full unit suites green.

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…old-start ANRs

ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading)
shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO,
all bottoming out in the same OneSignalDispatchers lazy chain:

  ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
  CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
  OneSignalDispatchers.IOScope.<init>   (by lazy)
  OneSignalDispatchers.IO               (by lazy)
  OneSignalDispatchers.ioExecutor       (by lazy)

The first IO consumer in the process pays the executor + dispatcher + scope
construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan
on its thread. Under sdk_background_threading whichever main-thread caller wins
the race eats 5-20s before the watchdog fires.
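
A minimal sketch of why the first consumer pays for the whole chain. This is not the SDK's code: LazyChainDemo, DemoScope and the thread names are hypothetical stand-ins, and the only assumption carried over from the message above is that each layer is a Kotlin `by lazy` property that touches the layer below it.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical stand-in for a scope that wraps an executor.
class DemoScope(val executor: ExecutorService)

// Hypothetical two-level lazy chain; names are illustrative only.
object LazyChainDemo {
    // Records which thread ran each initializer.
    val initThreads = ConcurrentHashMap<String, String>()

    val executor: ExecutorService by lazy {
        initThreads["executor"] = Thread.currentThread().name
        Executors.newSingleThreadExecutor { r ->
            Thread(r, "demo-io").apply { isDaemon = true }
        }
    }

    val scope: DemoScope by lazy {
        initThreads["scope"] = Thread.currentThread().name
        // Touching the scope forces the executor initializer to run as well,
        // on the same calling thread.
        DemoScope(executor)
    }
}

fun main() {
    // Whoever touches the scope first runs every initializer in the chain.
    val first = Thread(Runnable { LazyChainDemo.scope }, "first-caller")
    first.start()
    first.join()

    // A later access finds everything already constructed.
    LazyChainDemo.scope
    println(LazyChainDemo.initThreads)
    LazyChainDemo.executor.shutdown()
}
```

In this sketch both initializers are recorded against "first-caller"; if that caller were the main thread, the whole construction cost would land on it.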

#2644 routes the five known onFocus / onUnfocused handlers through
runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the
deeper structural problem is the lazy chain itself - a future call site that
slips past the FF gate (or a JobService delivered to main before init has run)
hits the same stall.

OneSignalDispatchers.prewarm() spawns a dedicated short-lived
"OneSignal-prewarm" daemon thread that submits one empty launch on each of
IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end
so the next production caller - even on the main thread - only sees the cheap
"submit work to an already-constructed executor" cost.

  * Idempotent: double-checked-locked prewarmStarted flag, so repeat calls
    from init / suspend init / SyncJobService.onStartJob no-op cheaply. An
    internal resetPrewarmForTest() lets specs exercise the "first call wins"
    branch independently.
  * Fire-and-forget: failures log and swallow. The existing Dispatchers.IO /
    SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes
    wrong, so a failed prewarm just means the first real caller pays the
    original cost.
  * Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit
    or starves UI work.
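
The three properties above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual OneSignalDispatchers.prewarm(): PrewarmSketch, worker, spawned and warmed are hypothetical names, and a plain single-thread executor stands in for the lazily built dispatchers.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

object PrewarmSketch {
    // Test hooks (hypothetical): how many prewarm threads were spawned,
    // and a latch released once the warm-up attempt has finished.
    val spawned = AtomicInteger(0)
    val warmed = CountDownLatch(1)

    // Stand-in for one lazily constructed dispatcher.
    val worker: ExecutorService by lazy {
        Executors.newSingleThreadExecutor { r ->
            Thread(r, "sketch-io").apply { isDaemon = true }
        }
    }

    @Volatile private var prewarmStarted = false
    private val lock = Any()

    fun prewarm() {
        if (prewarmStarted) return          // fast path, no lock taken
        synchronized(lock) {
            if (prewarmStarted) return      // second check under the lock
            prewarmStarted = true
        }
        spawned.incrementAndGet()
        Thread(Runnable {
            try {
                // Forces the lazy initializer and runs one empty task.
                worker.submit(Runnable { }).get()
            } catch (t: Throwable) {
                // Fire-and-forget: log and swallow.
                System.err.println("prewarm failed: $t")
            } finally {
                warmed.countDown()
            }
        }, "sketch-prewarm").apply {
            isDaemon = true                          // never blocks process exit
            priority = Thread.NORM_PRIORITY - 2      // yields to UI work
            start()
        }
    }
}

fun main() {
    repeat(3) { PrewarmSketch.prewarm() }            // repeat calls no-op
    PrewarmSketch.warmed.await(5, TimeUnit.SECONDS)
    println("prewarm threads spawned: ${PrewarmSketch.spawned.get()}")
}
```

The caller of prewarm() never touches `worker` itself, so the lazy-init cost is paid on the short-lived daemon thread rather than on whichever thread happened to call first.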

Called from:
  * OneSignalImp.initWithContext(context, appId)            (sync variant)
  * OneSignalImp.initWithContextSuspend(context, appId)     (suspend variant,
                                                             used by re-entrant
                                                             suspend callers)
  * SyncJobService.onStartJob BEFORE suspendifyOnIO         (JobService can fire
                                                             before the host app
                                                             init runs)

Tests (:core OneSignalDispatchersTests)
  * prewarm returns immediately on the caller and the daemon thread brings
    IO / Default / SerialIO + their scopes to Active.
  * prewarm is idempotent - second call does not spawn another
    OneSignal-prewarm thread (verified via thread-name scan).

Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500
main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading
helpers (47 onFocus + 23 JobService + 25 SessionService).

:OneSignal:core detekt + full unit suite green.

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…dk_background_threading FF

Wraps every IApplicationLifecycleHandler that does slow / blocking work on the
main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All
five handlers share one rollout knob, one ordering guarantee (the SerialIO
single-thread executor), and one observable contract in tests.

The handlers + why they were ANR-ing

  BackgroundManager.onFocus / onUnfocused
    Synchronous JobScheduler.cancel / .schedule on the main thread. Binder
    transactions to system_server that can block for many seconds on
    Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a
    20,796 ms main-thread block on a 25078RA3EL / Android 15 device.

  NotificationsManager.onFocus
    refreshNotificationState() drives NotificationRestoreWorkManager
    .beginEnqueueingWork, which lazily constructs WorkManager (opens /
    migrates the SQLite store at app_data/databases/androidx.work.workdb on
    first call) and then writes a WorkSpec row. OTel insertId
    9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 /
    Android 15 device. Short-circuits on `restored = true` after the first
    call, so only the first focus event per process eats the SQLite stall.

  NotificationPermissionController polling lifecycle listener
    onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval
    and calls pollingWaiter.wake(), which dispatches a coroutine resume onto
    the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold
    start that hits the OneSignalDispatchers lazy chain (executor + dispatcher
    + scope construction) on the calling thread - 26 / 500 main-thread ANRs in
    logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of
    pushing the polling interval to 1 day to effectively pause polling.

  FeatureFlagsRefreshService.onFocus / onUnfocused
    onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO,
    same lazy chain stall - 18 / 500 ANRs in the same bucket. onUnfocused
    cancels the poll job; we route the cancellation through the same serial
    dispatcher so back-to-back focus -> unfocus stays globally ordered with
    onFocus's polling-job swap, and `synchronized(this)` is qualified as
    `synchronized(this@FeatureFlagsRefreshService)` so the lambda explicitly
    locks on the service instance (the same monitor restartForegroundPolling
    takes) and keeps doing so even if the block is ever given a receiver.

  SessionService.onFocus / onUnfocused
    sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the
    registered session-lifecycle handlers (operation repo, IAM trigger eval,
    etc.) synchronously, and the first one to touch OneSignalDispatchers
    pays the cold-init cost on the main thread - 25 / 500 ANRs in
    logs/2026-05-12 sit on this stack. session.startTime / session.focusTime
    / activeDuration accounting is preserved by capturing
    _time.currentTimeMillis on the caller's thread BEFORE the wrapper and
    passing it into the deferred handleOnFocus / handleOnUnfocused, so the
    timestamps reflect when Android delivered the event, not when the serial
    dispatcher ran the block.
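
The timestamp handling described for SessionService can be sketched like this. It is a simplified illustration, not the real class: SessionSketch, MillisClock and the injected `dispatch` function are hypothetical, with `dispatch` standing in for the FF-gated wrapper.

```kotlin
// Hypothetical clock abstraction, standing in for the SDK's time source.
fun interface MillisClock {
    fun currentTimeMillis(): Long
}

class SessionSketch(
    private val clock: MillisClock,
    // Stand-in for the FF-gated wrapper: may run the block now or later.
    private val dispatch: (() -> Unit) -> Unit,
) {
    @Volatile
    var focusTime: Long = -1
        private set

    fun onFocus() {
        // Read the clock on the caller's thread, BEFORE handing off.
        val eventTime = clock.currentTimeMillis()
        dispatch { handleOnFocus(eventTime) }
    }

    private fun handleOnFocus(eventTime: Long) {
        // Uses the captured value, not the time at which the block runs.
        focusTime = eventTime
    }
}

fun main() {
    var fakeNow = 1_000L
    val pending = ArrayDeque<() -> Unit>()
    val session = SessionSketch(MillisClock { fakeNow }) { block -> pending.addLast(block) }

    session.onFocus()                 // event delivered at t = 1000
    fakeNow = 6_000L                  // the queue is slow; 5 s pass
    pending.removeFirst().invoke()    // deferred block finally runs

    println(session.focusTime)        // prints 1000
}
```

Reading the clock inside the deferred block instead would record 6000 here, which is the skew the commit avoids.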

Rollout matrix (uniform across all five handlers)

  FF on  -> runOnSerialIOIfBackgroundThreading { ... } dispatches to
            OneSignalDispatchers.SerialIO (single-thread executor). Main
            thread returns from handleFocus immediately.
  FF off -> the block runs inline on the lifecycle main thread. Legacy
            behavior; retains the ANR for the control cohort so the A/B
            comparison stays clean.

Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on
one path and won't bounce mid-run. Note that the production ANR samples for
every handler in this PR were collected with FF=ON: all five handlers
previously bypassed every threading helper, so the FF did not gate any of
these code paths. This PR is what introduces the gate.
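
The two rows of the rollout matrix can be sketched as a single branch. This is an assumption-laden illustration, not the helper from #2643: GatedRunner, runOnSerialIfEnabled and threadThatRan are hypothetical names, and the flag is passed in as a constructor Boolean to mimic a value latched at startup.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

class GatedRunner(private val useBackgroundThreading: Boolean) {
    // Single worker thread: tasks run one at a time, in submission order.
    private val serial: ExecutorService = Executors.newSingleThreadExecutor { r ->
        Thread(r, "sketch-serial").apply { isDaemon = true }
    }

    // Non-suspending block on purpose: the flag-off branch runs it inline.
    fun runOnSerialIfEnabled(block: () -> Unit) {
        if (useBackgroundThreading) {
            serial.execute { block() }   // FF on: caller returns immediately
        } else {
            block()                      // FF off: legacy inline behavior
        }
    }
}

// Returns the name of the thread that actually executed the block.
fun threadThatRan(enabled: Boolean): String {
    val runner = GatedRunner(enabled)
    val done = CountDownLatch(1)
    var name = ""
    runner.runOnSerialIfEnabled {
        name = Thread.currentThread().name
        done.countDown()
    }
    done.await(5, TimeUnit.SECONDS)
    return name
}

fun main() {
    println("FF on  -> ${threadThatRan(true)}")
    println("FF off -> ${threadThatRan(false)}")
}
```

With the flag on, the block reports the serial worker's name; with it off, it reports the caller's own thread, which is the behavior the control cohort keeps.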

Why the serial dispatcher specifically

  All five handlers are invoked from the same main-thread fanout
  (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). A
  rapid focus burst on a multi-thread IO pool could interleave them with
  each other and with the BackgroundManager cancel/schedule pair. Pinning all
  five to the same single-thread executor keeps lifecycle work globally
  ordered by main-thread submission order, and future per-event work
  added to any of these handlers (focus counters, notification analytics,
  session timing) inherits the ordering guarantee for free.
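
A small sketch of the ordering property this paragraph relies on, using a plain java.util.concurrent single-thread executor rather than the SDK's SerialIO dispatcher. runBurst and the event names are hypothetical; the sleep only simulates a slow first handler.

```kotlin
import java.util.Collections
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Submits each event to a single-thread executor and returns the order in
// which the events were actually executed.
fun runBurst(events: List<String>): List<String> {
    val serial = Executors.newSingleThreadExecutor()
    val executed = Collections.synchronizedList(mutableListOf<String>())
    for (event in events) {
        serial.execute {
            // Make the first event slow; on a multi-thread pool a later,
            // faster event could finish first.
            if (event == events.first()) Thread.sleep(30)
            executed += event
        }
    }
    serial.shutdown()
    serial.awaitTermination(5, TimeUnit.SECONDS)
    return executed.toList()
}

fun main() {
    // Rapid unfocus -> focus burst submitted from one thread.
    println(runBurst(listOf("unfocus", "focus")))   // prints [unfocus, focus]
}
```

Because there is only one worker, the slow "unfocus" task must finish before "focus" starts, so execution order matches submission order regardless of how long each task takes.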

Tests (all new specs pass; existing specs unchanged)

  * BackgroundManagerTests: existing tests + FF-on (dispatches through
    launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for
    both cancel and schedule. Includes a rapid unfocus -> focus burst test
    that pins both events through the serial dispatcher in submission order.
  * NotificationsManagerTests: dispatch contract on onFocus + rapid focus
    burst preserves submission order. Lambda body is observable (the test
    stub invokes the captured block) so JaCoCo sees the
    refreshNotificationState() call covered.
  * NotificationPermissionControllerTests: dispatch contract for the polling
    lifecycle listener on both onFocus and onUnfocused. Existing polling
    integration tests still pass under the FF-off default.
  * FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through
    runOnSerialIOIfBackgroundThreading.
  * SessionServiceTests: existing state-mutation assertions still pass under
    the FF-off default (the wrapper runs inline). New assertions for the
    dispatch contract on onFocus + onUnfocused + the rapid burst.

:OneSignal:core + :OneSignal:notifications detekt + full unit suites green.

Co-authored-by: Cursor <cursoragent@cursor.com>
@abdulraqeeb33 abdulraqeeb33 merged commit ebb93a3 into main May 12, 2026
5 checks passed
@abdulraqeeb33 abdulraqeeb33 deleted the ar/sdk-4505 branch May 12, 2026 15:04