fix: [SDK-4474] self-heal SDK-4388-stuck push subscriptions on session start#2636
Open
sherwinski wants to merge 1 commit intoar/sdk-4388-android-sdk-subscription-state-permanently-stuck-at-neverfrom
Conversation
…n start
The SDK-4388 fix prevents new installs from getting stuck (POST -> PATCH
flip in SubscriptionOperationExecutor), but already-stuck users on older
SDK builds will not self-heal on a plain in-place upgrade: the buggy POST
already returned 200 {} and the local op queue is empty by the time the
fixed SDK runs.
Add a divergence check to RefreshUserOperationExecutor.getUser. The SDK
already calls GET /users/by/onesignal_id/{osid} on every session start,
and the response carries the server view of `enabled` /
`notification_types`. The "device is source of truth" policy for push
subscriptions is preserved (we still don't hydrate the server's push
subscription into the local model), but when the cached local model
resolves to enabled-and-opted-in while the server's record of the same
subscription id reports disabled, we now emit a follow-up
UpdateSubscriptionOperation. That op runs through the standard executor
path and re-asserts local truth via PATCH /subscriptions/{id} on the
next session start - one extra request per stuck device, no consumer
code changes, no migration script, no toggle.
Tests cover the four boundary cases in RefreshUserOperationExecutorTests:
divergence (server stuck-disabled, local enabled) emits the follow-up
op; healthy (server matches local) does not; locally-opted-out does not
(opt-out is intentional); local NO_PERMISSION does not (OS denial is
real).
Refs: SDK-4388
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
One Line Summary
On session start, detect when the server's record of the device's push subscription is disabled while the local model says enabled-and-opted-in, and re-assert local truth via
PATCH /subscriptions/{id}so users already stuck by SDK-4388 self-heal on next launch — no consumer code changes, no migration script, no toggle.Details
Motivation
The fix in #2627 (parent PR) prevents new installs from getting stuck by flipping
POST /subscriptionstoPATCH /subscriptions/{id}inSubscriptionOperationExecutorwhen the subscription id is server-assigned. That covers the forward path, but it does not rescue the cohort that already triggered the bug on an older SDK build:POST /subscriptionsreturned200 {}(server-side no-op for an existing id).create-subscription + update-subscription(SUBSCRIBED)group was therefore acked and dropped from the local op queue.enabled:false, notification_types:0permanently.I confirmed this empirically with the SDK-4388 demo-app repro: force-stop on the buggy build (server stuck), upgrade to the fixed build, relaunch — no
PATCH /subscriptions/{id}is ever observed.Scope
RefreshUserOperationExecutor.getUser. On every session start the SDK already issuesGET /users/by/onesignal_id/{osid}, whose response carriesenabled/notification_typesper subscription. We now inspect the entry whoseidmatches the device's cached push subscription id, and iflocal.enabled && !server.enabled, we emit a follow-upUpdateSubscriptionOperationreturned viaExecutionResponse.operations. That op runs through the standard executor path and produces onePATCH /subscriptions/{id}carrying the device's true state.SubscriptionModelStore(the existingif (subscriptionModel.type != SubscriptionType.PUSH)filter is preserved). The new code only enforces that policy across the wire when divergence is detected.optedIn=false, statusUNSUBSCRIBE) or one withNO_PERMISSIONdoes not trigger a reconcile — those server-side disables are correct, not stuck.Cost
One extra
PATCH /subscriptions/{id}per stuck device on its first foreground session under the new SDK build. Healthy devices pay nothing. Once the PATCH succeeds, server matches local and no further reconciles fire.Testing
Unit testing
Added 4 cases to
RefreshUserOperationExecutorTests:enabled:false, notification_types:0+ local opted-in + SUBSCRIBED + non-empty token → returns exactly oneUpdateSubscriptionOperationwithenabled=true, status=SUBSCRIBED, correct ids.enabled:true, notification_types:1+ local opted-in → no follow-up op.enabled:false, notification_types:-2+ localoptedIn=false→ no follow-up op (UNSUBSCRIBE is intentional).enabled:false, notification_types:0+ localstatus=NO_PERMISSION→ no follow-up op (OS denial is real).The existing happy-path test
refresh user is successful and models are hydrated properlywas updated to setnotificationTypes = 1on the cached push subscription so it stays on the new healthy path.Test gate:
./gradlew :OneSignal:core:testReleaseUnitTest --tests \"...RefreshUserOperationExecutorTests\"→ 10/10 PASS. Two unrelatedSDKInitTestsfailures exist on the parent branch and are not introduced by this PR.Detekt + production assemble (
:OneSignal:core:detekt,:OneSignal:core:assembleRelease) clean.Manual testing
End-to-end repro and validation are intended to run on the demo app shipped with the parent SDK-4388 branch (which has the "Reproduce SDK-4388" + "Self-heal test" sections):
enabled:false, notification_types:0on dashboard).GET /users/by/onesignal_id/{osid}followed by a singlePATCH /subscriptions/{id}withenabled:true, notification_types:1.GET(no further PATCH) — divergence check is a no-op once reconciled.Affected code checklist
PATCH /subscriptions/{id}per stuck device on first session post-upgradeChecklist
Overview
RefreshUserOperationExecutor.getUserTesting
SDKInitTestsfailures are unrelated to this change)Final pass
buildPushSelfHealOperationOrNullwith explanatory comments referencing SDK-4388 / SDK-4474Branching: This PR targets the SDK-4388 fix branch (#2627 parent) so the two changes can land together as a complete "prevent new + heal old" pair. If the team prefers to ship the SDK-4388 fix first and follow up with this independently, retarget to
main.Refs: SDK-4474, SDK-4388