fix: offload every main-thread onFocus / onUnfocused handler behind sdk_background_threading FF #2644

Merged: abdulraqeeb33 merged 2 commits into main from ar/sdk-4506-notifications-onfocus-anr on May 12, 2026

abdulraqeeb33 commented May 11, 2026

One Line Summary

Wrap every main-thread IApplicationLifecycleHandler.onFocus / onUnfocused that does slow / blocking work with runOnSerialIOIfBackgroundThreading (introduced in #2643). Five handlers, one rollout knob, one ordering guarantee.

Linear: SDK-4506
Project: Android: refactor loading during initialization
Base branch: ar/sdk-4505 (#2643 — adds the helper + serial dispatcher)

Motivation

Activity.onStart → ApplicationService.onActivityStarted → handleFocus → applicationLifecycleNotifier.fire { onFocus(...) } synchronously invokes every IApplicationLifecycleHandler on the main thread:

User action (background -> foreground)
  -> Activity.onStart                                  [main thread]
  -> Application.dispatchActivityStarted               [main thread]
  -> ApplicationService.onActivityStarted              [main thread]
  -> ApplicationService.handleFocus                    [main thread]
  -> applicationLifecycleNotifier.fire { onFocus(...) }[main thread]
  -> <every IApplicationLifecycleHandler.onFocus>      [main thread - WORK HAPPENS HERE]

Five lifecycle handlers do slow / blocking work on that main-thread fanout. ANR-dump analysis of logs/2026-05-12 (500 ANR entries, all on sdk_background_threading) plus two prior OTel samples attribute ~94 / 500 (~18.8 %) of main-thread ANRs to them.

| Handler | What it does on main | ANR signature | Observed |
| --- | --- | --- | --- |
| BackgroundManager.onFocus / onUnfocused | JobScheduler.cancel / .schedule (synchronous Binder to system_server) | JobSchedulerImpl.cancel | OTel ycae33cjpu6gcyut: 20,796 ms on Xiaomi 25078RA3EL / Android 15 |
| NotificationsManager.onFocus | NotificationRestoreWorkManager.beginEnqueueingWork → WorkManager SQLite init + enqueueUniqueWork | SQLiteConnection.nativeExecuteForLong | OTel 9qy5s0ta0cwqwmb0: 30,516 ms on vivo I2306 / Android 15 |
| NotificationPermissionController polling listener (onFocus / onUnfocused) | reads ConfigModel, calls pollingWaiter.wake() → IO pool dispatch | dispatcher / executor lazy chain | 26 / 500 in logs/2026-05-12 |
| FeatureFlagsRefreshService.onFocus / onUnfocused | restartForegroundPolling → OneSignalDispatchers.launchOnIO | dispatcher / executor lazy chain | 18 / 500 in logs/2026-05-12 |
| SessionService.onFocus / onUnfocused | sessionLifeCycleNotifier.fire { onSessionStarted / onSessionActive }, which runs every subscribed handler synchronously (operation repo, IAM trigger eval, etc.) | dispatcher / executor lazy chain | 25 / 500 in logs/2026-05-12 |

The three dispatcher-cold-start handlers (NPC + FFRS + SessionService) all bottom out in the OneSignalDispatchers lazy chain — the deeper structural fix for that is #2645 (prewarm). This PR is the safer first cut: keep all five handlers on one FF-gated knob so the rollout matrix stays simple.

Fix

Single shared helper from #2643:

fun runOnSerialIOIfBackgroundThreading(block: () -> Unit) {
    if (ThreadingMode.useBackgroundThreading) {
        suspendifyOnSerialIO { block() }
    } else {
        block()
    }
}
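
To make the two branches concrete, here is a minimal, self-contained sketch of the same gate. It is not the SDK's implementation: `GatedRunner`, its `useBackgroundThreading` constructor flag, and the plain `java.util.concurrent` single-thread executor are assumptions standing in for `ThreadingMode.useBackgroundThreading` and `suspendifyOnSerialIO`.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for the FF-gated helper. The real helper reads
// ThreadingMode.useBackgroundThreading and dispatches via suspendifyOnSerialIO.
class GatedRunner(private val useBackgroundThreading: Boolean) {
    // One worker thread: submitted blocks run one at a time, in submission order.
    private val serial: ExecutorService =
        Executors.newSingleThreadExecutor { r ->
            Thread(r, "sketch-serial-io").apply { isDaemon = true }
        }

    fun runOnSerialIfBackgroundThreading(block: () -> Unit) {
        if (useBackgroundThreading) {
            serial.execute { block() } // FF on: offload to the serial worker
        } else {
            block()                    // FF off: run inline on the caller's thread
        }
    }

    // Test-only convenience: drain the queue and stop the worker.
    fun awaitIdle() {
        serial.shutdown()
        serial.awaitTermination(5, TimeUnit.SECONDS)
    }
}

// Returns the name of the thread that actually executed the block.
fun runningThreadName(ffOn: Boolean): String {
    val runner = GatedRunner(ffOn)
    val name = AtomicReference("")
    runner.runOnSerialIfBackgroundThreading { name.set(Thread.currentThread().name) }
    runner.awaitIdle()
    return name.get()
}

fun main() {
    println("FF off ran on: " + runningThreadName(false)) // the calling thread
    println("FF on ran on:  " + runningThreadName(true))  // the serial worker
}
```

With the flag off the block never leaves the caller, which is what keeps the control cohort on the legacy path.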

Applied uniformly at every call site:

// BackgroundManager
override fun onFocus(firedOnSubscribe: Boolean) {
    runOnSerialIOIfBackgroundThreading { cancelSyncTask() }
}
override fun onUnfocused() {
    runOnSerialIOIfBackgroundThreading { scheduleBackground() }
}

// NotificationsManager
override fun onFocus(firedOnSubscribe: Boolean) {
    runOnSerialIOIfBackgroundThreading { refreshNotificationState() }
}

// NotificationPermissionController polling lifecycle listener
override fun onFocus(firedOnSubscribe: Boolean) {
    super.onFocus(firedOnSubscribe)
    runOnSerialIOIfBackgroundThreading {
        pollingWaitInterval = _configModelStore.model.foregroundFetchNotificationPermissionInterval
        pollingWaiter.wake()
    }
}
override fun onUnfocused() {
    super.onUnfocused()
    runOnSerialIOIfBackgroundThreading {
        pollingWaitInterval = _configModelStore.model.backgroundFetchNotificationPermissionInterval
    }
}

// FeatureFlagsRefreshService
override fun onFocus(firedOnSubscribe: Boolean) {
    runOnSerialIOIfBackgroundThreading { restartForegroundPolling() }
}
override fun onUnfocused() {
    runOnSerialIOIfBackgroundThreading {
        synchronized(this@FeatureFlagsRefreshService) {
            pollJob?.cancel(); pollJob = null; pollingAppId = null
        }
    }
}

// SessionService — timestamps captured on the main thread BEFORE the wrapper
override fun onFocus(firedOnSubscribe: Boolean) {
    val focusTimeMs = _time.currentTimeMillis
    runOnSerialIOIfBackgroundThreading { handleOnFocus(firedOnSubscribe, focusTimeMs) }
}
override fun onUnfocused() {
    val unfocusTimeMs = _time.currentTimeMillis
    runOnSerialIOIfBackgroundThreading { handleOnUnfocused(unfocusTimeMs) }
}

Notable per-handler details:

  • FeatureFlagsRefreshService.onUnfocused qualifies this as this@FeatureFlagsRefreshService so the lambda locks on the service instance — the same monitor restartForegroundPolling takes — rather than on the no-receiver lambda object.
  • SessionService captures _time.currentTimeMillis on the caller's thread BEFORE the wrapper, so session.startTime / session.focusTime / activeDuration reflect when Android delivered the event, not whenever the serial dispatcher ran the block.
  • NotificationPermissionController.onUnfocused is intentionally NOT a no-op — it pushes the polling interval to 1 day to effectively pause polling. With the wrapper, FF=ON moves that single-field assignment onto SerialIO and FF=OFF keeps it inline.
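
The SessionService timing detail can be illustrated in isolation. The sketch below assumes a busy serial queue (simulated with a sleep) and compares a timestamp captured on the caller's thread with one read inside the deferred block. `captureBeforeDispatch` is a hypothetical name used only for this illustration, and `System.nanoTime()` stands in for `_time.currentTimeMillis`.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicLong

// Hypothetical illustration, not SDK code.
// Returns (timestamp captured on the caller, timestamp read inside the block).
fun captureBeforeDispatch(queueDelayMs: Long): Pair<Long, Long> {
    val serial = Executors.newSingleThreadExecutor()
    val readInsideBlock = AtomicLong()

    // Simulate earlier lifecycle work that is still occupying the serial worker.
    serial.execute { Thread.sleep(queueDelayMs) }

    // Captured on the caller's thread, before the wrapper: this is the moment
    // the event was delivered.
    val capturedOnCaller = System.nanoTime()

    // The deferred block only runs once the queue ahead of it has drained.
    serial.execute { readInsideBlock.set(System.nanoTime()) }

    serial.shutdown()
    serial.awaitTermination(5, TimeUnit.SECONDS)
    return capturedOnCaller to readInsideBlock.get()
}

fun main() {
    val (onCaller, insideBlock) = captureBeforeDispatch(queueDelayMs = 50)
    val driftMs = (insideBlock - onCaller) / 1_000_000
    println("timestamp drift if read inside the block: ~$driftMs ms")
}
```

Any time the block spends waiting in the queue would be silently added to session timing if the clock were read inside the block, which is why the captured value is passed in as an argument instead.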

Gated rollout

| Cohort | onFocus / onUnfocused path | Behavior |
| --- | --- | --- |
| FF on | runOnSerialIOIfBackgroundThreading { ... } → OneSignalDispatchers.SerialIO (single-thread executor) | ANR-fixed |
| FF off | inline on the lifecycle main thread | Legacy behavior; retains the ANRs for the control cohort |

Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on one path and won't bounce mid-run. Worth flagging that the production ANR samples for every handler in this PR were on FF=ON — because all five previously bypassed every threading helper, the FF did not gate any of these codepaths. This PR is what introduces the gate.

Why the serial dispatcher (not suspendifyOnIO)

All five handlers are invoked from the same main-thread fanout (ApplicationService.handleFocus → applicationLifecycleNotifier.fire). A rapid focus burst on a multi-thread IO pool could interleave them with each other and with the BackgroundManager.cancel / schedule pair. Pinning all five to the same single-thread executor keeps lifecycle work globally ordered by main-thread submission order, and future per-event work added to any of these handlers inherits the ordering guarantee for free.
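
The ordering argument can be checked with a small, self-contained sketch. It assumes a plain `Executors.newSingleThreadExecutor()` as a stand-in for `OneSignalDispatchers.SerialIO`; `runBurstOnSerial` is a hypothetical helper written for this illustration.

```kotlin
import java.util.Collections
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical illustration: submit a burst of lifecycle events from one thread
// to a single-thread executor and record the order in which they execute.
fun runBurstOnSerial(events: List<String>): List<String> {
    val serial = Executors.newSingleThreadExecutor()
    val observed = Collections.synchronizedList(mutableListOf<String>())

    // One worker plus a FIFO queue: execution order equals submission order.
    for (event in events) {
        serial.execute { observed.add(event) }
    }

    serial.shutdown()
    serial.awaitTermination(5, TimeUnit.SECONDS)
    return observed.toList()
}

fun main() {
    val burst = listOf("unfocus", "focus", "unfocus", "focus")
    println(runBurstOnSerial(burst) == burst) // true
}
```

Swapping the executor for a multi-thread pool removes that guarantee: two blocks submitted back to back may then run concurrently or complete out of order.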

Scope

  • BackgroundManager: synchronous JobScheduler.cancel / schedule no longer on main when sdk_background_threading is on.
  • NotificationsManager.onFocus: WorkManager SQLite init + enqueue no longer on main.
  • NotificationPermissionController polling listener: onFocus + onUnfocused move to SerialIO.
  • FeatureFlagsRefreshService: onFocus + onUnfocused move to SerialIO (lock qualified).
  • SessionService: onFocus + onUnfocused move to SerialIO with timing capture preserved.
  • Unchanged: refreshNotificationState, NotificationRestoreWorkManager.beginEnqueueingWork (still synchronized + idempotent on restored), NotificationHelper.areNotificationsEnabled, setPermissionStatusAndFire, the body of restartForegroundPolling, and handleOnFocus / handleOnUnfocused's session-state mutation semantics.
  • NotificationsManager.onUnfocused is empty in production; not touched.
  • Deeper prewarm fix for the dispatcher / executor lazy chain itself lands in #2645 (fix: warm OneSignalDispatchers on init to avoid cold-start ANRs).

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Testing

Static

  • :OneSignal:core:detekt, :OneSignal:notifications:detekt — clean.
  • :OneSignal:core:compileReleaseKotlin, :OneSignal:notifications:compileReleaseKotlin, :OneSignal:testhelpers:compileReleaseKotlin — clean.

Automated

  • :OneSignal:core:testReleaseUnitTest — full suite green, including:
    • BackgroundManagerTests — FF=on dispatch via launchOnSerialIO in submission order on both cancel and schedule, FF=off inline, rapid unfocus -> focus burst routes through the serial dispatcher in submission order.
    • FeatureFlagsRefreshServiceTests: onFocus / onUnfocused route through runOnSerialIOIfBackgroundThreading.
    • SessionServiceTests — existing state-mutation assertions still pass under the FF-off default; new assertions for the dispatch contract on onFocus + onUnfocused + the rapid unfocus -> focus burst.
  • :OneSignal:notifications:testReleaseUnitTest — full suite green, including:
    • NotificationsManagerTests — dispatch contract + rapid-burst ordering, lambda body observable so JaCoCo sees the refreshNotificationState call covered.
    • NotificationPermissionControllerTests: onFocus + onUnfocused polling lifecycle listener dispatch contract. Existing polling integration tests still pass under the FF-off default since the wrapper inlines.

Manual

Will follow up with manual repro on a vivo device under simulated SQLite contention conditions.

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing (apply a single FF-gated helper at every order-sensitive lifecycle handler call site)
  • No public API changes

Testing

  • Test coverage on the dispatch contracts for all five handlers
  • Existing automated tests still pass for the touched modules
  • Manual repro on a vivo device pending

Final pass

  • Code is as readable as possible
  • I have reviewed this PR myself


github-actions Bot commented May 11, 2026

📊 Diff Coverage Report

Diff Coverage Report (Changed Lines Only)

Gate: aggregate coverage on changed executable lines must be ≥ 80% (JaCoCo line data for lines touched in the diff).

Changed Files Coverage

  • BackgroundManager.kt: 2/2 touched executable lines (100.0%) (6 touched lines in diff)
  • FeatureFlagsRefreshService.kt: 7/7 touched executable lines (100.0%) (13 touched lines in diff)
  • SessionService.kt: 13/13 touched executable lines (100.0%) (24 touched lines in diff)
  • NotificationsManager.kt: 0/1 touched executable lines (0.0%) (5 touched lines in diff)
    • 1 uncovered touched lines in this file
  • NotificationPermissionController.kt: 0/7 touched executable lines (0.0%) (12 touched lines in diff)
    • 7 uncovered touched lines in this file

Overall (aggregate gate)

22/30 touched executable lines covered (73.3% — requires ≥ 80%)

Per-file detail (informational; gate is aggregate above):

  • NotificationsManager.kt: 0.0% (1 uncovered touched lines)

  • NotificationPermissionController.kt: 0.0% (7 uncovered touched lines)

❌ Coverage Check Failed

Aggregate coverage on touched lines is 73.3% (minimum 80%).
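
The aggregate figure follows directly from the per-file counts above. The sketch below reproduces the arithmetic under the assumption that the gate is simply total covered lines divided by total touched executable lines. `FileCoverage`, `aggregatePercent`, and `passesGate` are hypothetical names, not part of the coverage tooling.

```kotlin
import java.util.Locale

// Hypothetical model of one row of the report above.
data class FileCoverage(val name: String, val covered: Int, val executable: Int)

// Aggregate gate: sum covered lines and executable lines across files first,
// then divide, so large files weigh more than small ones.
fun aggregatePercent(files: List<FileCoverage>): Double {
    val covered = files.sumOf { it.covered }
    val executable = files.sumOf { it.executable }
    require(executable > 0) { "no touched executable lines" }
    return covered * 100.0 / executable
}

fun passesGate(files: List<FileCoverage>, minimumPercent: Double = 80.0): Boolean =
    aggregatePercent(files) >= minimumPercent

// Counts copied from the report above.
val reported = listOf(
    FileCoverage("BackgroundManager.kt", 2, 2),
    FileCoverage("FeatureFlagsRefreshService.kt", 7, 7),
    FileCoverage("SessionService.kt", 13, 13),
    FileCoverage("NotificationsManager.kt", 0, 1),
    FileCoverage("NotificationPermissionController.kt", 0, 7),
)

fun main() {
    println(String.format(Locale.US, "%.1f%%", aggregatePercent(reported))) // 73.3%
    println(passesGate(reported)) // false
}
```

Covering at least two more of the eight uncovered lines (24 / 30 = 80.0%) would be enough to clear the gate.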


abdulraqeeb33 force-pushed the ar/sdk-4506-notifications-onfocus-anr branch from 3c92229 to 8d3904d on May 12, 2026 14:07
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…o avoid cold-start ANRs

ANR-dump analysis (logs/2026-05-12, 500 entries, all on
sdk_background_threading) attributes 47 / 500 ANRs (9.4%) to the SDK's
own background-threading helpers stalling the main thread on cold start.
The three call sites in the bucket are now all routed through
runOnSerialIOIfBackgroundThreading by the SDK-4505 / SDK-4506 work, but
the deeper root cause is that the very first IO consumer (whichever
caller wins the race) pays the cost of constructing the entire
OneSignalDispatchers lazy chain on its thread:

  ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
  CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
  OneSignalDispatchers.IOScope.<init>   (by lazy)
  OneSignalDispatchers.IO               (by lazy)
  OneSignalDispatchers.ioExecutor       (by lazy)

That includes the kotlinx.coroutines MainDispatcherFactory ServiceLoader
scan, executor + thread-factory construction, dispatcher wrapping, and
SupervisorJob / CoroutineScope wiring. Under sdk_background_threading
the first caller is typically an Activity-lifecycle handler or
JobService.onStartJob - both on the main thread - and OTel records
5-20s blocks before the watchdog fires.

OneSignalDispatchers.prewarm() spawns a dedicated short-lived
"OneSignal-prewarm" daemon thread that submits one empty launch on each
of IO / Default / SerialIO. That single thread pays the lazy-init cost
end-to-end so the next production caller - even on the main thread -
only sees the cheap "submit to already-constructed executor" path.

  * Idempotent: double-checked-locked prewarmStarted flag, so repeat
    calls from init / suspend init / JobService.onStartJob no-op cheaply.
  * Fire-and-forget: failures log and swallow; the existing
    Dispatchers.IO / SerialIO fallback paths in [IO] / [SerialIO] still
    apply if anything goes wrong, so a failed prewarm just means the
    first real caller pays the original cost.
  * Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process
    exit or starves UI work.

Called from:
  * OneSignalImp.initWithContext(context, appId) (sync variant)
  * OneSignalImp.initWithContextSuspend(context, appId) (suspend variant,
    used by re-entrant suspend callers)
  * SyncJobService.onStartJob BEFORE suspendifyOnIO, because the
    JobService can fire before the host app's initWithContext runs.

Tests (:core OneSignalDispatchersTests):
  * prewarm returns immediately on the caller and the daemon thread
    brings IO / Default / SerialIO + their scopes to Active.
  * prewarm is idempotent - second call does not spawn another
    OneSignal-prewarm thread (verified via thread-name scan).

Scope reduction: the remaining-onFocus-handlers part of the original
SDK-4507 change (NotificationPermissionController + FeatureFlagsRefreshService
runOnSerialIOIfBackgroundThreading wrappers) was moved up the stack to
SDK-4506 (#2644), where it sits next to NotificationsManager.onFocus
since all three share the same FF-gated rollout shape. This PR is now
focused purely on the prewarm fix.

:OneSignal:core detekt + full unit suite green.

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 changed the title from "fix: SDK-4506 offload NotificationsManager.onFocus WorkManager DB I/O off the main thread" to "fix: SDK-4506 offload main-thread onFocus handlers behind sdk_background_threading FF" on May 12, 2026
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…ing helper

Introduces the threading infrastructure that the follow-up PRs depend on. This
PR adds the helpers and tests; it does not change any production call sites.

What it adds

  * OneSignalDispatchers.SerialIO
    A single-thread, named ("OneSignal-SerialIO") CoroutineDispatcher backed
    by Executors.newSingleThreadExecutor with a SupervisorJob + CoroutineScope.
    Falls back to Dispatchers.IO.limitedParallelism(1) if executor construction
    fails. Submission order on the dispatcher == execution order on its single
    worker, which is exactly the semantics the focus / unfocus lifecycle
    handlers need (see the next PR).

    Companion: launchOnSerialIO { ... } and a SerialIO entry in
    OneSignalDispatchers.getPerformanceMetrics() / getStatus().

  * ThreadUtils.suspendifyOnSerialIO { ... }
    Always-on serial dispatch. Wraps OneSignalDispatchers.launchOnSerialIO and
    is intentionally NOT gated on ThreadingMode.useBackgroundThreading - some
    code paths need ordered off-main execution unconditionally.

  * ThreadUtils.runOnSerialIOIfBackgroundThreading { ... }
    FF-gated wrapper for non-suspending blocks. When
    ThreadingMode.useBackgroundThreading is true the block is dispatched to
    SerialIO; when false the block runs inline on the calling thread. This is
    the call shape every subsequent focus / unfocus handler in this series
    uses, so the rollout matrix stays one-knob simple.

    Block is non-suspending on purpose: the FF-off branch executes on the
    caller's thread, and a suspending block there would force a runBlocking,
    which defeats the purpose of an A/B comparison.

  * IOMockHelper stubs the new helpers
    suspendifyOnSerialIO + launchOnSerialIO are tracked by awaitIO() so
    existing specs stay deterministic. runOnSerialIOIfBackgroundThreading is
    stubbed inline-on-test-thread by default so existing call-site specs keep
    their observable behavior; specs that want to exercise the FF-on (offload)
    branch can override the stub.

Tests

  * OneSignalDispatchersTests: new SerialIO cases - construction, lazy chain
    activates on first launch, getStatus reports Active + queue size, falls
    back to the limitedParallelism(1) path if executor construction fails.
    getStatus + getPerformanceMetrics are refactored to extract executorStatus
    + scopeStatus inline helpers to keep them under Detekt's LongMethod /
    ComplexMethod thresholds.
  * ThreadUtilsFeatureFlagTests: new cases that suspendifyOnSerialIO always
    routes through the serial dispatcher (FF-agnostic), and that
    runOnSerialIOIfBackgroundThreading routes through the serial dispatcher
    when the FF is on and runs inline when the FF is off.

Why a dedicated serial dispatcher (not just suspendifyOnIO)

  Multi-thread IO pools don't guarantee submission order = execution order. A
  rapid focus burst (activity restart, share flow popping the activity back/
  forth) could otherwise interleave cancel/schedule pairs or session-state
  mutations. Pinning order-sensitive lifecycle work to a single executor keeps
  it globally ordered, and future per-event work (focus counters, session
  timing, analytics) inherits the guarantee for free.

:OneSignal:core detekt + full unit suite green. No production behavior change
in this PR; the follow-up PRs land the call-site offloads (#2644) and the
dispatcher prewarm (#2645).

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 force-pushed the ar/sdk-4506-notifications-onfocus-anr branch from 520ae2c to 0621414 on May 12, 2026 14:32
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…old-start ANRs

ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading)
shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO,
all bottoming out in the same OneSignalDispatchers lazy chain:

  ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
  CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
  OneSignalDispatchers.IOScope.<init>   (by lazy)
  OneSignalDispatchers.IO               (by lazy)
  OneSignalDispatchers.ioExecutor       (by lazy)

The first IO consumer in the process pays the executor + dispatcher + scope
construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan
on its thread. Under sdk_background_threading whichever main-thread caller wins
the race eats 5-20s before the watchdog fires.

#2644 routes the five known onFocus / onUnfocused handlers through
runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the
deeper structural problem is the lazy chain itself - a future call site that
slips past the FF gate (or a JobService delivered to main before init has run)
hits the same stall.

OneSignalDispatchers.prewarm() spawns a dedicated short-lived
"OneSignal-prewarm" daemon thread that submits one empty launch on each of
IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end
so the next production caller - even on the main thread - only sees the cheap
"submit work to an already-constructed executor" cost.

  * Idempotent: double-checked-locked prewarmStarted flag, so repeat calls
    from init / suspend init / SyncJobService.onStartJob no-op cheaply. An
    internal resetPrewarmForTest() lets specs exercise the "first call wins"
    branch independently.
  * Fire-and-forget: failures log and swallow. The existing Dispatchers.IO /
    SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes
    wrong, so a failed prewarm just means the first real caller pays the
    original cost.
  * Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit
    or starves UI work.
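
A minimal sketch of the prewarm shape described above, under stated assumptions: `SketchDispatchers` is a hypothetical stand-in for OneSignalDispatchers, the `expensive` lazy property stands in for the executor + dispatcher + scope chain, and `threadsSpawned` / `warmed` exist only so the behavior can be observed.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in; only the idempotent, fire-and-forget shape is sketched.
object SketchDispatchers {
    @Volatile private var prewarmStarted = false
    private val lock = Any()

    val threadsSpawned = AtomicInteger(0) // observable for the demo only
    val warmed = CountDownLatch(1)        // released once the lazy chain is built

    // Stands in for the lazily constructed executor / dispatcher / scope chain.
    private val expensive: String by lazy { "constructed" }

    fun prewarm() {
        if (prewarmStarted) return         // fast path: no lock once started
        synchronized(lock) {
            if (prewarmStarted) return     // second check under the lock
            prewarmStarted = true
        }
        threadsSpawned.incrementAndGet()
        Thread {
            try {
                expensive.hashCode()       // pay the lazy-init cost on this thread
            } catch (t: Throwable) {
                // Fire-and-forget: swallow, the first real caller pays instead.
            } finally {
                warmed.countDown()
            }
        }.apply {
            name = "sketch-prewarm"
            isDaemon = true                         // never blocks process exit
            priority = Thread.NORM_PRIORITY - 2     // stays below UI work
            start()
        }
    }
}

fun main() {
    SketchDispatchers.prewarm()
    SketchDispatchers.prewarm() // no-op: the flag is already set
    SketchDispatchers.warmed.await(5, TimeUnit.SECONDS)
    println(SketchDispatchers.threadsSpawned.get()) // 1
}
```

The flag is flipped before the thread starts, so a second caller arriving mid-prewarm returns immediately rather than waiting for the warm-up to finish.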

Called from:
  * OneSignalImp.initWithContext(context, appId)            (sync variant)
  * OneSignalImp.initWithContextSuspend(context, appId)     (suspend variant,
                                                             used by re-entrant
                                                             suspend callers)
  * SyncJobService.onStartJob BEFORE suspendifyOnIO         (JobService can fire
                                                             before the host app
                                                             init runs)

Tests (:core OneSignalDispatchersTests)
  * prewarm returns immediately on the caller and the daemon thread brings
    IO / Default / SerialIO + their scopes to Active.
  * prewarm is idempotent - second call does not spawn another
    OneSignal-prewarm thread (verified via thread-name scan).

Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500
main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading
helpers (47 onFocus + 23 JobService + 25 SessionService).

:OneSignal:core detekt + full unit suite green.

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 changed the title from "fix: SDK-4506 offload main-thread onFocus handlers behind sdk_background_threading FF" to "fix: offload every main-thread onFocus / onUnfocused handler behind sdk_background_threading FF" on May 12, 2026
abdulraqeeb33 force-pushed the ar/sdk-4506-notifications-onfocus-anr branch from 0621414 to 6dc8889 on May 12, 2026 14:43
…ing helper
abdulraqeeb33 force-pushed the ar/sdk-4506-notifications-onfocus-anr branch from 6dc8889 to be6f168 on May 12, 2026 14:44
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…old-start ANRs
…dk_background_threading FF

Wraps every IApplicationLifecycleHandler that does slow / blocking work on the
main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All
five handlers share one rollout knob, one ordering guarantee (the SerialIO
single-thread executor), and one observable contract in tests.

The handlers + why they were ANR-ing

  BackgroundManager.onFocus / onUnfocused
    Synchronous JobScheduler.cancel / .schedule on the main thread. Binder
    transactions to system_server that can block for many seconds on
    Xiaomi / MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a
    20,796 ms main-thread block on a 25078RA3EL / Android 15 device.

  NotificationsManager.onFocus
    refreshNotificationState() drives NotificationRestoreWorkManager
    .beginEnqueueingWork, which lazily constructs WorkManager (opens /
    migrates the SQLite store at app_data/databases/androidx.work.workdb on
    first call) and then writes a WorkSpec row. OTel insertId
    9qy5s0ta0cwqwmb0 shows a 30,516 ms main-thread block on a vivo I2306 /
    Android 15 device. Short-circuits on `restored = true` after the first
    call, so only the first focus event per process eats the SQLite stall.

  NotificationPermissionController polling lifecycle listener
    onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval
    and calls pollingWaiter.wake(), which dispatches a coroutine resume onto
    the IO pool via channel.trySend -> ThreadPoolExecutor.execute. On cold
    start that hits the OneSignalDispatchers lazy chain (executor + dispatcher
    + scope construction) on the calling thread - 26 / 500 main-thread ANRs in
    logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of
    pushing the polling interval to 1 day to effectively pause polling.

  FeatureFlagsRefreshService.onFocus / onUnfocused
    onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO,
    same lazy chain stall - 18 / 500 ANRs in the same bucket. onUnfocused
    cancels the poll job; we route the cancellation through the same serial
    dispatcher so back-to-back focus -> unfocus stays globally ordered with
    onFocus's polling-job swap, and `synchronized(this)` is qualified as
    `synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on
    the service instance (the same monitor restartForegroundPolling takes)
    rather than the no-receiver lambda object.

  SessionService.onFocus / onUnfocused
    sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the
    registered session-lifecycle handlers (operation repo, IAM trigger eval,
    etc.) synchronously, and the first one to touch OneSignalDispatchers
    pays the cold-init cost on the main thread - 25 / 500 ANRs in
    logs/2026-05-12 sit on this stack. session.startTime / session.focusTime
    / activeDuration accounting is preserved by capturing
    _time.currentTimeMillis on the caller's thread BEFORE the wrapper and
    passing it into the deferred handleOnFocus / handleOnUnfocused, so the
    timestamps reflect when Android delivered the event, not when the serial
    dispatcher ran the block.
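
The timestamp capture described for SessionService can be sketched as below.
This is a minimal illustration, not the SDK's code: SessionSketch, the clock
lambda, and shutdownAndAwait are hypothetical names, and a plain single-thread
executor stands in for the serial dispatcher.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for session state; not the real SessionService.
class SessionSketch(private val clock: () -> Long) {
    private val serial = Executors.newSingleThreadExecutor()

    @Volatile
    var focusTime: Long = 0L
        private set

    fun onFocus() {
        // Read the clock on the caller's thread, before any dispatch.
        val eventTime = clock()
        // The deferred block only ever sees the captured value.
        serial.execute { handleOnFocus(eventTime) }
    }

    private fun handleOnFocus(eventTime: Long) {
        focusTime = eventTime
    }

    // Drains the queue so the result can be observed deterministically.
    fun shutdownAndAwait() {
        serial.shutdown()
        serial.awaitTermination(5, TimeUnit.SECONDS)
    }
}

// The clock moves on after the event is submitted; the recorded time does not.
fun captureDemo(): Long {
    var now = 1_000L
    val session = SessionSketch { now }
    session.onFocus()
    now = 9_999L
    session.shutdownAndAwait()
    return session.focusTime
}
```

Reading the clock inside the deferred block instead would record whenever the
executor got around to running it, which is the skew the capture avoids.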

Rollout matrix (uniform across all five handlers)

  FF on  -> runOnSerialIOIfBackgroundThreading { ... } dispatches to
            OneSignalDispatchers.SerialIO (single-thread executor). Main
            thread returns from handleFocus immediately.
  FF off -> the block runs inline on the lifecycle main thread. Legacy
            behavior; retains the ANR for the control cohort so the A/B
            comparison stays clean.
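
A rough sketch of the two rollout paths, assuming a boolean flag latched at
startup. The real helper was introduced in #2643 and may look different;
LifecycleGate, the "sketch-serial-io" thread name, and threadUsedBy are
illustrative only.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical gate; only mirrors the FF on / FF off behavior described above.
class LifecycleGate(private val backgroundThreadingEnabled: Boolean) {
    private val serialIO: ExecutorService =
        Executors.newSingleThreadExecutor { r -> Thread(r, "sketch-serial-io") }

    fun runOnSerialIOIfBackgroundThreading(block: () -> Unit) {
        if (backgroundThreadingEnabled) {
            // FF on: enqueue and return to the caller immediately.
            serialIO.execute { block() }
        } else {
            // FF off: legacy path, run inline on the calling thread.
            block()
        }
    }

    fun shutdownAndAwait() {
        serialIO.shutdown()
        serialIO.awaitTermination(5, TimeUnit.SECONDS)
    }
}

// Reports which thread actually ran the block for a given flag value.
fun threadUsedBy(flagOn: Boolean): String {
    val gate = LifecycleGate(flagOn)
    var name = ""
    gate.runOnSerialIOIfBackgroundThreading { name = Thread.currentThread().name }
    gate.shutdownAndAwait()
    return name
}
```

With the flag off the block runs on the caller, which is why the control
cohort keeps its original behavior.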

Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on
one path and won't bounce mid-run. Note that the production ANR samples for
every handler in this PR were captured with FF=ON. All five handlers previously
bypassed every threading helper, so the FF did not gate any of these codepaths
and the flag state had no effect on them. This PR is what introduces the gate.

Why the serial dispatcher specifically

  All five handlers are invoked from the same main-thread fanout
  (ApplicationService.handleFocus -> applicationLifecycleNotifier.fire). A
  rapid focus burst on a multi-thread IO pool could interleave them with
  each other and with the BackgroundManager cancel/schedule pair. Pinning all
  five to the same single-thread executor keeps lifecycle work globally
  ordered by main-thread submission order, and future per-event work
  added to any of these handlers (focus counters, notification analytics,
  session timing) inherits the ordering guarantee for free.
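
The ordering property relied on here comes from the executor itself: a
single-thread executor runs tasks one at a time in the order they were
submitted. A small sketch, with replayBurst as a hypothetical helper name:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Submits each event from the calling thread and records the order in which
// the single worker thread actually ran them.
fun replayBurst(events: List<String>): List<String> {
    val serial = Executors.newSingleThreadExecutor()
    val observed = mutableListOf<String>() // only touched by the one worker
    for (event in events) {
        serial.execute { observed.add(event) }
    }
    serial.shutdown()
    serial.awaitTermination(5, TimeUnit.SECONDS)
    return observed
}
```

On a multi-thread pool the same loop gives no such guarantee, since two
submitted tasks may run concurrently or complete out of order.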

Tests (all new specs pass; existing specs unchanged)

  * BackgroundManagerTests: existing tests + FF-on (dispatches through
    launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for
    both cancel and schedule. Includes a rapid unfocus -> focus burst test
    that pins both events through the serial dispatcher in submission order.
  * NotificationsManagerTests: dispatch contract on onFocus + rapid focus
    burst preserves submission order. Lambda body is observable (the test
    stub invokes the captured block) so JaCoCo sees the
    refreshNotificationState() call covered.
  * NotificationPermissionControllerTests: dispatch contract for the polling
    lifecycle listener on both onFocus and onUnfocused. Existing polling
    integration tests still pass under the FF-off default.
  * FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through
    runOnSerialIOIfBackgroundThreading.
  * SessionServiceTests: existing state-mutation assertions still pass under
    the FF-off default (the wrapper runs inline). New assertions for the
    dispatch contract on onFocus + onUnfocused + the rapid burst.

:OneSignal:core + :OneSignal:notifications detekt + full unit suites green.

Co-authored-by: Cursor <cursoragent@cursor.com>
abdulraqeeb33 pushed a commit that referenced this pull request May 12, 2026
…old-start ANRs

ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading)
shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob -> suspendifyOnIO,
all bottoming out in the same OneSignalDispatchers lazy chain:

  ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
  CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
  OneSignalDispatchers.IOScope.<init>   (by lazy)
  OneSignalDispatchers.IO               (by lazy)
  OneSignalDispatchers.ioExecutor       (by lazy)

The first IO consumer in the process pays for the executor + dispatcher + scope
construction, plus the kotlinx.coroutines MainDispatcherFactory ServiceLoader
scan, on its own thread. Under sdk_background_threading, whichever main-thread
caller wins the race stalls for 5-20 s before the watchdog fires.
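
To make the "first consumer pays" behavior concrete, here is a miniature
`by lazy` chain. The names are placeholders, not the SDK's properties; the
point is only that nested lazy initializers all run on whichever thread
touches the outermost one first.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList

// Hypothetical three-level lazy chain, loosely shaped like
// executor -> dispatcher -> scope.
class LazyChainSketch {
    // Records which thread ran the innermost initializer.
    val initThreads = CopyOnWriteArrayList<String>()

    val executor: String by lazy {
        initThreads.add(Thread.currentThread().name)
        "executor"
    }
    val dispatcher: String by lazy { "dispatcher over $executor" }
    val scope: String by lazy { "scope over $dispatcher" }
}

// Touches the chain from a named thread first, then again from the caller.
// Returns the name of the only thread that ran the initializer.
fun firstTouchThread(): String {
    val chain = LazyChainSketch()
    val first = Thread({ chain.scope }, "first-caller")
    first.start()
    first.join()
    // Second read is served from the cached value; no initializer runs.
    require(chain.scope == "scope over dispatcher over executor")
    return chain.initThreads.single()
}
```

If the first toucher is the main thread, the whole chain's construction cost
lands there, which matches the stack shape listed above.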

#2644 routes the five known onFocus / onUnfocused handlers through
runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the
deeper structural problem is the lazy chain itself - a future call site that
slips past the FF gate (or a JobService delivered to main before init has run)
hits the same stall.

OneSignalDispatchers.prewarm() spawns a dedicated short-lived
"OneSignal-prewarm" daemon thread that submits one empty launch on each of
IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end
so the next production caller - even on the main thread - only sees the cheap
"submit work to an already-constructed executor" cost.

  * Idempotent: double-checked-locked prewarmStarted flag, so repeat calls
    from init / suspend init / SyncJobService.onStartJob no-op cheaply. An
    internal resetPrewarmForTest() lets specs exercise the "first call wins"
    branch independently.
  * Fire-and-forget: failures log and swallow. The existing Dispatchers.IO /
    SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes
    wrong, so a failed prewarm just means the first real caller pays the
    original cost.
  * Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit
    or starves UI work.
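
A sketch of the three properties above (idempotent, fire-and-forget, low
priority daemon), under the assumption that the warm-up work can be passed in
as a lambda. PrewarmSketch, threadsSpawned, and awaitDone are illustrative and
not part of the SDK.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical prewarm holder; warmUp stands in for the empty launches.
class PrewarmSketch(private val warmUp: () -> Unit) {
    @Volatile
    private var prewarmStarted = false
    private val done = CountDownLatch(1)

    // Exposed only so the sketch can show that one thread is spawned.
    val threadsSpawned = AtomicInteger(0)

    fun prewarm() {
        if (prewarmStarted) return // cheap fast path for repeat callers
        synchronized(this) {
            if (prewarmStarted) return // lost the race; first call wins
            prewarmStarted = true
        }
        threadsSpawned.incrementAndGet()
        Thread({
            try {
                warmUp()
            } catch (t: Throwable) {
                // Fire-and-forget: log and swallow.
                System.err.println("prewarm failed: $t")
            } finally {
                done.countDown()
            }
        }, "sketch-prewarm").apply {
            isDaemon = true                     // never blocks process exit
            priority = Thread.NORM_PRIORITY - 2 // yields to UI work
            start()
        }
    }

    // Waits for the warm-up thread to finish (success or failure).
    fun awaitDone(): Boolean = done.await(5, TimeUnit.SECONDS)
}
```

Callers never wait on the warm-up; awaitDone exists here only so the sketch
can be checked.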

Called from:
  * OneSignalImp.initWithContext(context, appId)            (sync variant)
  * OneSignalImp.initWithContextSuspend(context, appId)     (suspend variant,
                                                             used by re-entrant
                                                             suspend callers)
  * SyncJobService.onStartJob BEFORE suspendifyOnIO         (JobService can fire
                                                             before the host app
                                                             init runs)

Tests (:core OneSignalDispatchersTests)
  * prewarm returns immediately on the caller and the daemon thread brings
    IO / Default / SerialIO + their scopes to Active.
  * prewarm is idempotent - second call does not spawn another
    OneSignal-prewarm thread (verified via thread-name scan).
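
One way a thread-name scan like the one mentioned above could be written,
using only the JDK. The helper names are hypothetical and the actual spec may
do this differently.

```kotlin
import java.util.concurrent.CountDownLatch

// Counts live threads with the given name, as seen by the JVM at call time.
fun liveThreadsNamed(name: String): Int =
    Thread.getAllStackTraces().keys.count { it.name == name }

// Parks one named daemon thread, scans while it is alive, then releases it.
fun scanWhileParked(name: String): Int {
    val started = CountDownLatch(1)
    val release = CountDownLatch(1)
    val parked = Thread({
        started.countDown()
        release.await()
    }, name)
    parked.isDaemon = true
    parked.start()
    started.await() // make sure the thread is running before scanning
    val seen = liveThreadsNamed(name)
    release.countDown()
    parked.join()
    return seen
}
```

Because the prewarm thread is short-lived, a scan of this kind is only
meaningful while that thread is still alive.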

Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500
main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading
helpers (47 onFocus + 23 JobService + 25 SessionService).

:OneSignal:core detekt + full unit suite green.

Co-authored-by: Cursor <cursoragent@cursor.com>
@abdulraqeeb33 abdulraqeeb33 force-pushed the ar/sdk-4506-notifications-onfocus-anr branch from be6f168 to abe5633 Compare May 12, 2026 14:48
Base automatically changed from ar/sdk-4505 to main May 12, 2026 15:04
@abdulraqeeb33 abdulraqeeb33 merged commit 9f6df5e into main May 12, 2026
2 of 3 checks passed
@abdulraqeeb33 abdulraqeeb33 deleted the ar/sdk-4506-notifications-onfocus-anr branch May 12, 2026 15:04