Skip to content

Fix gateway not receiving delete events for artifact removal#1602

Merged
Krishanx92 merged 8 commits intowso2:mainfrom
dushaniw:main-gw-restart
Mar 31, 2026
Merged

Fix gateway not receiving delete events for artifact removal#1602
Krishanx92 merged 8 commits intowso2:mainfrom
dushaniw:main-gw-restart

Conversation

@dushaniw
Copy link
Copy Markdown
Contributor

@dushaniw dushaniw commented Mar 31, 2026

Problem

When following the standard deletion flow (undeploy → delete deployment → delete resource), the gateway never receives the WebSocket delete event. This causes stale "undeployed" artifacts to remain on the gateway, blocking future deployments of the same handle with a "configuration already exists" error.
Root cause: Deleting a deployment cascade-deletes its deployment_status row. When the resource is subsequently deleted, GetDeployedGatewayIDs() queries deployment_status and finds zero rows, so no delete event is sent to any gateway.
Additionally, when a deployment is deleted (but the resource still exists) and the gateway restarts, the sync treats the artifact as an orphan and removes it along with its API keys and subscriptions. If the user later redeploys the same resource, the keys are lost until the next gateway restart.

Changes

Issue 1 — Resource delete now broadcasts to all org gateways:

All four resource delete services (REST API, LLM Provider, LLM Proxy, MCP Proxy) now use gatewayRepo.GetByOrganizationID() instead of deploymentRepo.GetDeployedGatewayIDs() to find target
gateways. This ensures delete events are always delivered regardless of deployment status. The gateway delete handler safely no-ops if the artifact doesn't exist locally.

  • Added gatewayRepo dependency to LLMProviderService, LLMProxyService, and MCPProxyService (REST API service already had it)
  • Updated all four delete methods and their constructors
  • Updated wiring in server initialization

Issue 3 — New /artifacts/exists endpoint for sync-time orphan detection:

Added POST /api/internal/v1/artifacts/exists endpoint that accepts a list of artifact UUIDs and returns whether each still exists on the platform. The gateway calls this during startup sync
before deleting orphaned artifacts. Artifacts that still exist (but have no active deployment) are retained, preserving their API keys and subscriptions.

  • New repository method ExistsByUUIDs for bulk artifact existence check
  • New service method, handler, DTO types, and route registration
  • Added path to JWT skip paths for API key authentication
  • Gateway calls the endpoint during sync, filtering orphans before deletion
  • Skipped for on-prem control planes (endpoint not available, and on-prem only has deployed state)
  • Updated OpenAPI spec with the new endpoint, request/response schemas

Test plan

  • Deploy all 4 artifact types → undeploy → delete deployment → delete resource → verify gateway returns 404 (not stale undeployed)
  • After resource delete, recreate with same handle → deploy → verify gateway accepts it
  • Deploy → undeploy → delete deployment (resource still exists) → restart gateway → verify artifact retained with keys intact on gateway
  • Redeploy same resource after restart → verify keys are present
  • Verify on-prem gateway skips /artifacts/exists call and deletes orphans normally

Summary by CodeRabbit

  • Improvements

    • Deployment sync now verifies artifact existence (skipping deletions if check fails or artifacts still on platform).
    • API key sync reads configs from the database and removes orphaned keys when configs are missing; deletion handlers now log at Info and emit DELETE events.
    • Deletion/event notifications now target all gateways in an organization.
  • New Features

    • Added internal POST /api/internal/v1/artifacts/exists to report per-artifact existence (with API docs).
  • Config

    • Authentication skip paths updated to include the artifact-existence endpoint.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 31, 2026

Walkthrough

Gateway controller now queries the platform API to check artifact existence before deleting deployments and loads artifact configs from the DB for API-key reconciliation. Platform API adds an internal POST /api/internal/v1/artifacts/exists endpoint, repository support for multi-UUID existence checks, and rewires services to broadcast deletions to all organization gateways.

Changes

Cohort / File(s) Summary
Gateway Controller — utils, sync, client
gateway/gateway-controller/pkg/utils/api_utils.go, gateway/gateway-controller/pkg/controlplane/sync.go, gateway/gateway-controller/pkg/controlplane/client.go
Added APIUtilsService.CheckArtifactsExist. syncDeployments conditionally calls it (skips on-prem) and filters diff.toDelete to only artifacts reported missing on platform; on API error skips deletions. syncAPIKeysForExistingArtifacts now reads configs from DB via GetAllConfigs() and on DB error aborts with log; LLM provider/proxy deleted handlers remove orphaned API keys and publish DELETE events when config not found.
Platform API — internal handler, DTOs, OpenAPI
platform-api/src/internal/handler/gateway_internal.go, platform-api/src/internal/dto/gateway_internal.go, platform-api/src/resources/gateway-internal-api.yaml
Added CheckArtifactsExist handler and route POST /api/internal/v1/artifacts/exists; introduced DTOs ArtifactsExistRequest, ArtifactExistenceInfo, ArtifactsExistResponse; added corresponding OpenAPI schemas and endpoint.
Platform API — repository & interfaces
platform-api/src/internal/repository/artifact.go, platform-api/src/internal/repository/interfaces.go
Added ArtifactRepository.ExistsByUUIDs(uuids []string, orgUUID string) ([]string, error) and implementation to return which provided UUIDs exist for an organization.
Platform API — service layer & wiring
platform-api/src/internal/service/gateway_internal.go, platform-api/src/internal/service/llm.go, platform-api/src/internal/service/mcp.go, platform-api/src/internal/service/api.go, platform-api/src/internal/server/server.go
Wired artifactRepo into GatewayInternalAPIService and added CheckArtifactsExist. Added gatewayRepo dependency to LLM/LLMProxy/MCP services and changed deletion broadcasting to iterate all gateways from gatewayRepo.GetByOrganizationID (use gateway.ID) instead of deployment-derived gateway IDs. Updated server startup wiring and API deletion broadcast targeting.
Platform API — config, tests
platform-api/src/config/config.go, platform-api/src/internal/service/llm_test.go
Appended /api/internal/v1/artifacts to JWT SKIP_PATHS default. Updated unit tests to pass the new constructor argument (additional nil) for LLM services.

Sequence Diagram

sequenceDiagram
    actor GC as Gateway Controller
    participant APIUTIL as APIUtilsService
    participant HANDLER as GatewayInternalHandler
    participant SVC as GatewayInternalAPIService
    participant REPO as ArtifactRepository
    participant DB as Database

    GC->>APIUTIL: CheckArtifactsExist(artifactIDs)
    APIUTIL->>HANDLER: POST /api/internal/v1/artifacts/exists {artifactIds}
    HANDLER->>SVC: CheckArtifactsExist(orgID, artifactIDs)
    SVC->>REPO: ExistsByUUIDs(artifactIDs, orgUUID)
    REPO->>DB: SELECT uuid FROM artifacts WHERE uuid IN (...) AND organization_uuid=?
    DB-->>REPO: existing UUIDs
    REPO-->>SVC: []string (existing UUIDs)
    SVC-->>HANDLER: ArtifactsExistResponse{artifacts}
    HANDLER-->>APIUTIL: 200 {artifacts}
    APIUTIL-->>GC: []string (existing IDs)
    GC->>GC: filter deletions (remove IDs reported existing)
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • Krishanx92

Poem

🐰 I hopped through tables, checked each seed,
Which artifacts sprout and which recede.
Gateways trumpet, events take flight,
Deletions pruned with careful might,
Hooray — the sync is tidy tonight! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and accurately summarizes the main purpose of the PR: fixing the gateway not receiving delete events when artifacts are removed, which is the primary issue being addressed.
Description check ✅ Passed The PR description comprehensively covers all required template sections: Problem statement with root cause analysis, detailed explanation of changes across four services, new endpoint implementation details, test plan with multiple scenarios, and clear reasoning for design decisions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@gateway/gateway-controller/pkg/controlplane/sync.go`:
- Around line 103-107: The current error path logs a failure to call
/artifacts/exists and then proceeds to delete everything in diff.toDelete;
instead, make this path conservative: when the artifacts-existence check returns
an error (the branch where c.logger.Warn is called), do NOT hard-delete
candidates in diff.toDelete — either abort the deletion step (return/skip) or
filter diff.toDelete to only include items explicitly confirmed absent; update
the code around the existence-check call and the deletion consumer that uses
diff.toDelete so it skips deletions on error. Apply the same change to the other
identical block around lines handling diff.toDelete (the block referenced in the
review at 128-129) so both places avoid destructive fallback on transient
errors.

In `@platform-api/src/internal/dto/gateway_internal.go`:
- Around line 90-104: The ArtifactsExistRequest can contain an unbounded
ArtifactIDs slice which causes ArtifactRepo.ExistsByUUIDs to build a single
large IN query and hit DB parameter limits; update the handler/service that
processes ArtifactsExistRequest to either enforce a hard cap on len(ArtifactIDs)
(e.g., return 400 when > MAX_IDS) or batch calls to ArtifactRepo.ExistsByUUIDs
in fixed-size chunks (e.g., CHUNK_SIZE = 500–1000) and merge results into
ArtifactsExistResponse.Artifacts; locate the logic that consumes
ArtifactsExistRequest and modify it to split ArtifactIDs into chunks, call
ArtifactRepo.ExistsByUUIDs for each chunk, collect ArtifactsExistenceInfo
entries, and return the aggregated response (or validate and reject oversized
requests) to avoid single huge IN (...) queries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 02691465-206f-4d6a-adef-4ab5e2620f00

📥 Commits

Reviewing files that changed from the base of the PR and between d2e1c6e and 700a488.

📒 Files selected for processing (15)
  • gateway/gateway-controller/pkg/controlplane/client.go
  • gateway/gateway-controller/pkg/controlplane/sync.go
  • gateway/gateway-controller/pkg/utils/api_utils.go
  • platform-api/src/config/config.go
  • platform-api/src/internal/dto/gateway_internal.go
  • platform-api/src/internal/handler/gateway_internal.go
  • platform-api/src/internal/repository/artifact.go
  • platform-api/src/internal/repository/interfaces.go
  • platform-api/src/internal/server/server.go
  • platform-api/src/internal/service/api.go
  • platform-api/src/internal/service/gateway_internal.go
  • platform-api/src/internal/service/llm.go
  • platform-api/src/internal/service/llm_test.go
  • platform-api/src/internal/service/mcp.go
  • platform-api/src/resources/gateway-internal-api.yaml

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b6e5ca5d-2a7c-443c-9c11-bcd24b3ba688

📥 Commits

Reviewing files that changed from the base of the PR and between 0ceffd9 and 20a488f.

📒 Files selected for processing (3)
  • gateway/gateway-controller/pkg/controlplane/client.go
  • gateway/gateway-controller/pkg/controlplane/llm_deletion_test.go
  • gateway/gateway-controller/pkg/controlplane/sync.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/gateway-controller/pkg/controlplane/client.go

@dushaniw
Copy link
Copy Markdown
Contributor Author

Gateway Delete Fix — Test Results

Date: 2026-03-31
Branch: main-gw-restart

Fix Verification Tests

Issue 1 — Resource delete sends WS event (live)

Deploy → undeploy → delete deployment → delete resource → verify gateway returns 404

Artifact Result
REST API PASS
LLM Provider PASS
LLM Proxy PASS
MCP Proxy PASS

Issue 3 — Artifact retained after deployment delete + restart

Deploy → undeploy → delete deployment (resource stays) → restart GW → verify artifact retained as undeployed → redeploy → verify deployed

Step Result
Artifact retained as undeployed after restart PASS
Sync logs show "Retaining artifact" via /artifacts/exists PASS
Redeploy same resource after restart PASS
Route accessible after redeploy PASS

Phase 1: Live Tests (WebSocket path)

Test REST API LLM Provider LLM Proxy MCP Proxy
Deploy PASS PASS PASS PASS
Undeploy PASS PASS PASS PASS
Restore PASS PASS PASS PASS
Delete (from deployed) PASS PASS PASS PASS
Delete (from undeployed) PASS PASS PASS PASS

Phase 2: Sync Tests (Gateway restart)

Test REST API LLM Provider LLM Proxy MCP Proxy
Fresh deploy while GW down PASS PASS PASS PASS
Undeploy while GW down PASS PASS PASS PASS
Restore while GW down PASS PASS PASS PASS
Delete (from deployed) while GW down PASS PASS PASS PASS
Delete (from undeployed) while GW down PASS PASS PASS PASS
No change (deployedAt unchanged) PASS N/A N/A N/A

@Krishanx92 Krishanx92 merged commit 8618e32 into wso2:main Mar 31, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants