Fix gateway not receiving delete events for artifact removal #1602
Krishanx92 merged 8 commits into wso2:main
Walkthrough
Gateway controller now queries the platform API to check artifact existence before deleting deployments and loads artifact configs from the DB for API-key reconciliation. Platform API adds an internal POST /api/internal/v1/artifacts/exists endpoint, repository support for multi-UUID existence checks, and rewires services to broadcast deletions to all organization gateways.
Sequence Diagram
sequenceDiagram
actor GC as Gateway Controller
participant APIUTIL as APIUtilsService
participant HANDLER as GatewayInternalHandler
participant SVC as GatewayInternalAPIService
participant REPO as ArtifactRepository
participant DB as Database
GC->>APIUTIL: CheckArtifactsExist(artifactIDs)
APIUTIL->>HANDLER: POST /api/internal/v1/artifacts/exists {artifactIds}
HANDLER->>SVC: CheckArtifactsExist(orgID, artifactIDs)
SVC->>REPO: ExistsByUUIDs(artifactIDs, orgUUID)
REPO->>DB: SELECT uuid FROM artifacts WHERE uuid IN (...) AND organization_uuid=?
DB-->>REPO: existing UUIDs
REPO-->>SVC: []string (existing UUIDs)
SVC-->>HANDLER: ArtifactsExistResponse{artifacts}
HANDLER-->>APIUTIL: 200 {artifacts}
APIUTIL-->>GC: []string (existing IDs)
GC->>GC: filter deletions (remove IDs reported existing)
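The last step of the diagram (filtering the deletion candidates against the IDs the platform reports as existing) can be sketched as follows. This is a minimal illustration, not the PR's code: `filterDeletions` and the sample IDs are hypothetical names introduced here.

```go
package main

import "fmt"

// filterDeletions returns the candidates that the platform did NOT report as
// existing. The function name is illustrative and not taken from the PR.
func filterDeletions(candidates, existing []string) []string {
	// Build a set of existing IDs for constant-time lookups.
	exists := make(map[string]struct{}, len(existing))
	for _, id := range existing {
		exists[id] = struct{}{}
	}
	toDelete := make([]string, 0, len(candidates))
	for _, id := range candidates {
		if _, ok := exists[id]; !ok {
			toDelete = append(toDelete, id)
		}
	}
	return toDelete
}

func main() {
	candidates := []string{"a1", "b2", "c3"} // local artifacts with no deployment
	existing := []string{"b2"}               // IDs returned by the existence check
	fmt.Println(filterDeletions(candidates, existing)) // prints [a1 c3]
}
```

Using a set keeps the filter linear in the number of candidates, which matters if a gateway holds many artifacts at startup.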
Estimated Code Review Effort: 4 (Complex) | ~45 minutes
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/gateway-controller/pkg/controlplane/sync.go`:
- Around line 103-107: The current error path logs a failure to call
/artifacts/exists and then proceeds to delete everything in diff.toDelete;
instead, make this path conservative: when the artifacts-existence check returns
an error (the branch where c.logger.Warn is called), do NOT hard-delete
candidates in diff.toDelete — either abort the deletion step (return/skip) or
filter diff.toDelete to only include items explicitly confirmed absent; update
the code around the existence-check call and the deletion consumer that uses
diff.toDelete so it skips deletions on error. Apply the same change to the
identical block that handles diff.toDelete around lines 128-129, so that neither
place falls back to destructive deletion on transient errors.
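A conservative error path of the kind requested above might look like the sketch below. It is written under assumptions: `existenceChecker`, `confirmedAbsent`, and `errUnavailable` are hypothetical names, and the real code in sync.go may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// existenceChecker is a hypothetical stand-in for the /artifacts/exists call.
// It returns the subset of ids that still exist on the platform.
type existenceChecker func(ids []string) ([]string, error)

// errUnavailable simulates a transient platform failure in the example below.
var errUnavailable = errors.New("platform API unavailable")

// confirmedAbsent returns only the candidates the platform did not report as
// existing. If the check fails it returns no candidates at all, so the caller
// skips deletion instead of falling back to deleting everything.
func confirmedAbsent(candidates []string, check existenceChecker) ([]string, error) {
	if len(candidates) == 0 {
		return nil, nil
	}
	existing, err := check(candidates)
	if err != nil {
		return nil, fmt.Errorf("existence check failed, skipping deletions: %w", err)
	}
	stillExists := make(map[string]struct{}, len(existing))
	for _, id := range existing {
		stillExists[id] = struct{}{}
	}
	var absent []string
	for _, id := range candidates {
		if _, ok := stillExists[id]; !ok {
			absent = append(absent, id)
		}
	}
	return absent, nil
}

func main() {
	failing := func([]string) ([]string, error) { return nil, errUnavailable }
	toDelete, err := confirmedAbsent([]string{"a1", "b2"}, failing)
	fmt.Println(len(toDelete), err != nil) // nothing is deleted on error

	healthy := func([]string) ([]string, error) { return []string{"b2"}, nil }
	toDelete, err = confirmedAbsent([]string{"a1", "b2"}, healthy)
	fmt.Println(toDelete, err)
}
```

Returning an empty candidate list together with the error keeps the destructive step opt-in: a caller that ignores the error still deletes nothing.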
In `@platform-api/src/internal/dto/gateway_internal.go`:
- Around line 90-104: The ArtifactsExistRequest can contain an unbounded
ArtifactIDs slice which causes ArtifactRepo.ExistsByUUIDs to build a single
large IN query and hit DB parameter limits; update the handler/service that
processes ArtifactsExistRequest to either enforce a hard cap on len(ArtifactIDs)
(e.g., return 400 when > MAX_IDS) or batch calls to ArtifactRepo.ExistsByUUIDs
in fixed-size chunks (e.g., CHUNK_SIZE = 500–1000) and merge results into
ArtifactsExistResponse.Artifacts; locate the logic that consumes
ArtifactsExistRequest and modify it to split ArtifactIDs into chunks, call
ArtifactRepo.ExistsByUUIDs for each chunk, collect ArtifactsExistenceInfo
entries, and return the aggregated response (or validate and reject oversized
requests) to avoid single huge IN (...) queries.
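The batching option described above can be sketched as follows. This is an illustration under assumptions: `existsFunc` stands in for ArtifactRepo.ExistsByUUIDs with a guessed signature, the chunk size of 500 is taken from the low end of the range the review suggests, and the result is a plain list of existing UUIDs rather than the actual response DTO.

```go
package main

import "fmt"

// chunkSize is an assumed value from the 500-1000 range suggested in the review.
const chunkSize = 500

// existsFunc is a hypothetical stand-in for ArtifactRepo.ExistsByUUIDs; the
// real signature may differ.
type existsFunc func(ids []string, orgUUID string) ([]string, error)

// existsInChunks splits ids into fixed-size chunks, runs one query per chunk,
// and merges the existing UUIDs, so no single IN (...) clause grows unbounded.
func existsInChunks(ids []string, orgUUID string, size int, query existsFunc) ([]string, error) {
	if size <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", size)
	}
	var existing []string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		found, err := query(ids[start:end], orgUUID)
		if err != nil {
			return nil, err
		}
		existing = append(existing, found...)
	}
	return existing, nil
}

func main() {
	ids := make([]string, 1200)
	for i := range ids {
		ids[i] = fmt.Sprintf("artifact-%d", i)
	}
	calls := 0
	// Fake repository call that reports every ID in the chunk as existing.
	query := func(chunk []string, orgUUID string) ([]string, error) {
		calls++
		return chunk, nil
	}
	existing, err := existsInChunks(ids, "org-1", chunkSize, query)
	fmt.Println(len(existing), calls, err) // 1200 IDs are covered by 3 queries
}
```

A hard cap with a 400 response is the simpler alternative; chunking avoids rejecting a gateway that legitimately holds many artifacts.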
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 02691465-206f-4d6a-adef-4ab5e2620f00
📒 Files selected for processing (15)
- gateway/gateway-controller/pkg/controlplane/client.go
- gateway/gateway-controller/pkg/controlplane/sync.go
- gateway/gateway-controller/pkg/utils/api_utils.go
- platform-api/src/config/config.go
- platform-api/src/internal/dto/gateway_internal.go
- platform-api/src/internal/handler/gateway_internal.go
- platform-api/src/internal/repository/artifact.go
- platform-api/src/internal/repository/interfaces.go
- platform-api/src/internal/server/server.go
- platform-api/src/internal/service/api.go
- platform-api/src/internal/service/gateway_internal.go
- platform-api/src/internal/service/llm.go
- platform-api/src/internal/service/llm_test.go
- platform-api/src/internal/service/mcp.go
- platform-api/src/resources/gateway-internal-api.yaml
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b6e5ca5d-2a7c-443c-9c11-bcd24b3ba688
📒 Files selected for processing (3)
- gateway/gateway-controller/pkg/controlplane/client.go
- gateway/gateway-controller/pkg/controlplane/llm_deletion_test.go
- gateway/gateway-controller/pkg/controlplane/sync.go
🚧 Files skipped from review as they are similar to previous changes (1)
- gateway/gateway-controller/pkg/controlplane/client.go
Gateway Delete Fix: Test Results
Date: 2026-03-31
Fix Verification Tests
Issue 1: Resource delete sends WS event (live)
Deploy → undeploy → delete deployment → delete resource → verify gateway returns 404
Issue 3: Artifact retained after deployment delete + restart
Deploy → undeploy → delete deployment (resource stays) → restart GW → verify artifact retained as undeployed → redeploy → verify deployed
Phase 1: Live Tests (WebSocket path)
Phase 2: Sync Tests (Gateway restart)
Problem
When following the standard deletion flow (undeploy → delete deployment → delete resource), the gateway never receives the WebSocket delete event. This causes stale "undeployed" artifacts to remain on the gateway, blocking future deployments of the same handle with a "configuration already exists" error.
Root cause: Deleting a deployment cascade-deletes its deployment_status row. When the resource is subsequently deleted, GetDeployedGatewayIDs() queries deployment_status and finds zero rows, so no delete event is sent to any gateway.
Additionally, when a deployment is deleted (but the resource still exists) and the gateway restarts, the sync treats the artifact as an orphan and removes it along with its API keys and subscriptions. If the user later redeploys the same resource, the keys are lost until the next gateway restart.
Changes
Issue 1 — Resource delete now broadcasts to all org gateways:
All four resource delete services (REST API, LLM Provider, LLM Proxy, MCP Proxy) now use gatewayRepo.GetByOrganizationID() instead of deploymentRepo.GetDeployedGatewayIDs() to find target gateways. This ensures delete events are always delivered regardless of deployment status. The gateway delete handler safely no-ops if the artifact doesn't exist locally.
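The broadcast approach can be sketched as below. The interfaces are assumptions made for illustration: `gatewayLister` mirrors the role of gatewayRepo.GetByOrganizationID() with a guessed signature, and `deleteSender` is a hypothetical abstraction over the WebSocket event channel.

```go
package main

import "fmt"

type gateway struct{ ID string }

// gatewayLister mirrors the role of gatewayRepo.GetByOrganizationID in the PR;
// the signature here is an assumption.
type gatewayLister interface {
	GetByOrganizationID(orgID string) ([]gateway, error)
}

// deleteSender is a hypothetical abstraction over the WebSocket event channel.
type deleteSender interface {
	SendDelete(gatewayID, artifactID string) error
}

// broadcastDelete sends the delete event to every gateway in the organization,
// regardless of deployment status, and returns how many sends succeeded.
func broadcastDelete(repo gatewayLister, sender deleteSender, orgID, artifactID string) (int, error) {
	gateways, err := repo.GetByOrganizationID(orgID)
	if err != nil {
		return 0, fmt.Errorf("list gateways for org %s: %w", orgID, err)
	}
	delivered := 0
	for _, gw := range gateways {
		if err := sender.SendDelete(gw.ID, artifactID); err != nil {
			continue // one unreachable gateway should not block the others
		}
		delivered++
	}
	return delivered, nil
}

// staticGateways and recordingSender are in-memory fakes for the example.
type staticGateways []gateway

func (s staticGateways) GetByOrganizationID(string) ([]gateway, error) { return s, nil }

type recordingSender struct{ sent []string }

func (r *recordingSender) SendDelete(gatewayID, artifactID string) error {
	r.sent = append(r.sent, gatewayID+":"+artifactID)
	return nil
}

func main() {
	sender := &recordingSender{}
	repo := staticGateways{{ID: "gw-1"}, {ID: "gw-2"}}
	n, err := broadcastDelete(repo, sender, "org-1", "artifact-9")
	fmt.Println(n, sender.sent, err)
}
```

Because the target list no longer depends on deployment_status rows, a gateway that never held the artifact also receives the event, which is why the no-op behavior of the gateway delete handler matters.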
Issue 3 — New /artifacts/exists endpoint for sync-time orphan detection:
Added POST /api/internal/v1/artifacts/exists endpoint that accepts a list of artifact UUIDs and returns whether each still exists on the platform. The gateway calls this during startup sync before deleting orphaned artifacts. Artifacts that still exist (but have no active deployment) are retained, preserving their API keys and subscriptions.
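A rough sketch of the request and response shapes for this endpoint follows. Only the `artifactIds` and `artifacts` JSON keys appear in the review's sequence diagram; the per-entry fields `id` and `exists`, and all Go type and function names here, are hypothetical and may not match the DTOs in gateway_internal.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// existsRequest carries the UUIDs the gateway wants checked.
type existsRequest struct {
	ArtifactIDs []string `json:"artifactIds"`
}

// existenceInfo is a hypothetical per-artifact entry; field names are assumed.
type existenceInfo struct {
	ID     string `json:"id"`
	Exists bool   `json:"exists"`
}

type existsResponse struct {
	Artifacts []existenceInfo `json:"artifacts"`
}

// buildResponse answers the request from an in-memory set of known artifacts,
// standing in for the repository lookup on the platform side.
func buildResponse(req existsRequest, known map[string]bool) existsResponse {
	resp := existsResponse{Artifacts: make([]existenceInfo, 0, len(req.ArtifactIDs))}
	for _, id := range req.ArtifactIDs {
		resp.Artifacts = append(resp.Artifacts, existenceInfo{ID: id, Exists: known[id]})
	}
	return resp
}

// retained lists the IDs the gateway should keep during startup sync.
func retained(resp existsResponse) []string {
	var keep []string
	for _, a := range resp.Artifacts {
		if a.Exists {
			keep = append(keep, a.ID)
		}
	}
	return keep
}

func main() {
	var req existsRequest
	if err := json.Unmarshal([]byte(`{"artifactIds":["a1","b2"]}`), &req); err != nil {
		panic(err)
	}
	resp := buildResponse(req, map[string]bool{"b2": true})
	out, err := json.Marshal(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(retained(resp))
}
```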
Test plan