Scheduled & batched Lagoon redeploys (upgrade rollouts) #205
Open
dan2k3k4 wants to merge 2 commits into
0690ee7 to 4d32c02
0690ee7 to 2609166
Comment on lines +91 to +107

```php
            $client = $this->lagoon->getAuthenticatedClient();
            $result = $client->bulkDeployEnvironments(
                environments: $environments,
                name: 'Polydock redeploy '.$run->uuid,
                buildVariables: $buildVariables,
            );
        } catch (\Throwable $e) {
            Log::error('Redeploy trigger failed', ['run' => $run->uuid, 'error' => $e->getMessage()]);
            $this->failRun($run, 'Trigger failed: '.$e->getMessage());

            return $run;
        }

        if (isset($result['error'])) {
            $error = is_array($result['error']) ? json_encode($result['error']) : (string) $result['error'];
            Log::error('Redeploy trigger returned error', ['run' => $run->uuid, 'error' => $error]);
            $this->failRun($run, 'Trigger error: '.$error);
```
bulk_chunk_size config is defined but never applied
config/polydock.php documents polydock.deploy.bulk_chunk_size as "Max environments per bulkDeployEnvironments mutation (avoids one huge call)", and the feature plan docs describe chunking by this value. However, PolydockDeploymentService::redeploy() sends all environments in a single bulkDeployEnvironments call without reading the config key or chunking the $environments array. With default settings (max_per_run = 50, bulk_chunk_size = 50) this is benign, but raising POLYDOCK_DEPLOY_MAX_PER_RUN via env var would produce oversized single mutations. The chunk config is effectively dead code.
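Below is a minimal sketch of what honouring the key might look like. It is written in plain PHP so it runs outside Laravel. It rests on these assumptions:

- The chunk size is read as a positive integer, for example from `config('polydock.deploy.bulk_chunk_size')`.
- Each chunk becomes its own `bulkDeployEnvironments` call.
- The `triggerInChunks()` helper, the `$trigger` callable and the environment array shape are all hypothetical.

In the service, `$trigger` would wrap the existing `$client->bulkDeployEnvironments(environments: $chunk, ...)` call from the excerpt above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: call $trigger once per chunk of at most $chunkSize
 * environments and collect the per-chunk results in order.
 *
 * @param list<array<string, string>> $environments
 * @param callable(list<array<string, string>>, int): mixed $trigger
 * @return list<mixed>
 */
function triggerInChunks(array $environments, int $chunkSize, callable $trigger): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('bulk_chunk_size must be at least 1');
    }

    $results = [];
    // array_chunk() re-indexes each chunk from 0, so every chunk is a plain list.
    foreach (array_chunk($environments, $chunkSize) as $index => $chunk) {
        $results[] = $trigger($chunk, $index);
    }

    return $results;
}

// Usage: 120 environments with a chunk size of 50 produce three calls.
$environments = [];
for ($i = 1; $i <= 120; $i++) {
    // Hypothetical shape; the real Lagoon input type is not shown in the PR.
    $environments[] = ['project' => "app-{$i}", 'branch' => 'main'];
}

$sizes = triggerInChunks(
    $environments,
    50,
    fn (array $chunk, int $index): int => count($chunk),
);
// $sizes === [50, 50, 20]
```

Chunking does change the run model. Each chunk returns its own bulk ID, so a run that today polls a single `bulk_id` would need to store several, or the service would need to create one run per chunk.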
9ab756b to 850b27c

- docs: add scheduled Lagoon redeploy feature plans
- feat(deploy): deployment-tracking data model (plan 001)
- feat(deploy): redeploy service + poll job (plan 002)
- feat(deploy): scheduled cadence redeploy dispatch (plan 003)
- feat(deploy): admin UI + permission for redeploys (plan 004)
- test(deploy): update trigger-deploy command test for service refactor

850b27c to 91c2740
Roll out Lagoon redeploys to running app instances and track them in the admin panel. Redeploys can be triggered manually in bulk or automatically on a per-app cadence. This is redeploy-latest only: instances stay in `RUNNING_HEALTHY_*`, and the UPGRADE state machine is not used.

Built from `plans/feature/` (001–004; 005 deployment-windows is deferred). One commit per plan.

What's included
- `polydock_deployment_runs` table + model, cached per-instance deploy fields + indexed `next_redeploy_at`, store-app cadence columns, `user_groups.is_beta`.
- `PolydockDeploymentService` (single deploy code path: filters eligible/non-in-flight instances, makes one `bulkDeployEnvironments` call, claims instances only after a successful trigger; the poll maps deployments back by project+branch, and a failed build never mutates instance status).
- `PollDeploymentRunJob` + `polydock:deployments:poll` (every 5m).
- `polydock:dispatch-scheduled-redeploys` (every 10m): due selection, per-run cap, group-by-store-app, beta cadence override, deterministic jitter, trials excluded.
- `manage_polydock_deployments` permission, instance list Last Deploy / Next Redeploy columns, gated "Redeploy selected" bulk action, store-app "Redeploy Schedule" form + "Redeploy all", `is_beta` toggle, read-only Deployments dashboard.

Tests
Full suite green (316 passed); ~30 new tests cover eligibility, idempotency, failed-build safety, poll transitions, cadence/beta/jitter, and the admin gate.
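The scheduled-dispatch bullet lists deterministic jitter but does not say how it is derived. The sketch below shows one common approach; it is an assumption, not the PR's implementation. It hashes the instance ID into a fixed offset within a window. Instances that share a cadence then do not all fall due in the same 10-minute tick, and each instance keeps the same offset from run to run. `jitterMinutes()`, `nextRedeployAt()` and their parameters are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical: stable per-instance offset in [0, $windowMinutes).
 * crc32() is deterministic, so an instance ID always maps to the same offset.
 */
function jitterMinutes(int $instanceId, int $windowMinutes): int
{
    if ($windowMinutes <= 0) {
        return 0;
    }

    // crc32() returns a non-negative int on 64-bit PHP builds.
    return crc32('instance:'.$instanceId) % $windowMinutes;
}

/**
 * Hypothetical: next due time = $from + cadence + per-instance jitter.
 * A beta cadence override would simply pass a different $cadenceHours.
 */
function nextRedeployAt(
    int $instanceId,
    DateTimeImmutable $from,
    int $cadenceHours,
    int $windowMinutes
): DateTimeImmutable {
    $offset = jitterMinutes($instanceId, $windowMinutes);

    return $from->add(new DateInterval(sprintf('PT%dH%dM', $cadenceHours, $offset)));
}

// Usage: weekly cadence, jitter spread over one hour.
$from = new DateTimeImmutable('2025-01-01 00:00:00', new DateTimeZone('UTC'));
$due = nextRedeployAt(42, $from, 24 * 7, 60);
echo $due->format(DATE_ATOM), PHP_EOL;
```

Hashing instead of calling `random_int()` keeps the schedule reproducible, which makes a cadence/jitter test straightforward to assert.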
Deploy notes
- Run `php artisan db:seed --class=SuperAdminRoleSeeder` to create/grant the new permission.
- `redeploy_enabled` defaults off; opt store apps in per app. Recommended: exercise it on the pre-warm pool before enabling it for claimed apps.

Greptile Summary
This PR introduces scheduled and manual bulk Lagoon redeploys ("upgrade rollouts") for running app instances. Runs are tracked via a new `polydock_deployment_runs` table and surfaced in the admin panel. The PR does not use the existing UPGRADE state machine; instances remain in their `RUNNING_HEALTHY_*` statuses throughout.

- `polydock_deployment_runs` table + model, cached deploy columns on `polydock_app_instances` with a `next_redeploy_at` index, cadence columns on `polydock_store_apps`, and `user_groups.is_beta`.
- `PolydockDeploymentService` is the single redeploy code path. It filters ineligible/in-flight instances, claims instances only after a successful trigger, and tracks run status via `PollDeploymentRunJob`. A failed build never mutates instance lifecycle status.
- `polydock:dispatch-scheduled-redeploys` (every 10 min) selects due non-trial instances, caps per-run volume, groups by store app, applies beta cadence overrides, and spreads schedules with deterministic per-instance jitter.
- `manage_polydock_deployments` permission gate on the bulk "Redeploy selected" action, the store-app cadence form and the "Redeploy all" action; also an `is_beta` toggle on user groups and a read-only Deployments dashboard.

Confidence Score: 4/5
The feature is well-structured and safe to merge for initial rollout with `redeploy_enabled` off by default. One config key documents a chunking behaviour that was never implemented.

The core redeploy path is solid. Post-trigger instance claiming prevents ghost in-flight locks, failed builds never mutate instance lifecycle status, and the poll loop is correctly bounded.

One gap stands out. `config/polydock.php` defines `bulk_chunk_size` with an explicit comment that it prevents oversized single mutations to the Lagoon API, and the feature plans describe chunking by that value. However, `PolydockDeploymentService::redeploy()` never reads the key and sends all environments in a single call. With the default config (`max_per_run = 50`, `bulk_chunk_size = 50`) this is invisible. Operators who raise `POLYDOCK_DEPLOY_MAX_PER_RUN` via env var, though, would get single mutations as large as the new per-run cap, with no chunk limit applied.

- app/Services/PolydockDeploymentService.php (missing bulk chunk loop)
- app/Filament/Admin/Resources/PolydockStoreAppResource.php (missing `with('deploymentRun')` in the `redeploy_all` action)
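The ordering credited above (filter, trigger once, then claim) can be sketched without Laravel as follows. Plain arrays stand in for Eloquent models. The `eligible`, `in_flight` and `run_id` keys, `redeployBatch()` and the `$trigger` callable are hypothetical. The property illustrated is that instances are written to only after the trigger succeeds. A thrown exception or an `error` payload leaves every instance untouched, so nothing looks in flight.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical "filter, trigger, then claim" flow over plain arrays.
 *
 * @param array<int, array{eligible: bool, in_flight: bool, run_id: ?string}> $instances keyed by instance ID
 * @param callable(list<int>): array<string, mixed> $trigger
 * @return array{0: array<int, array{eligible: bool, in_flight: bool, run_id: ?string}>, 1: list<int>}
 */
function redeployBatch(array $instances, string $runId, callable $trigger): array
{
    // 1. Keep only eligible instances that have no deployment in flight.
    $targets = array_keys(array_filter(
        $instances,
        fn (array $instance): bool => $instance['eligible'] && !$instance['in_flight'],
    ));

    if ($targets === []) {
        return [$instances, []];
    }

    // 2. One trigger for the whole batch; any failure returns before claiming.
    try {
        $result = $trigger($targets);
    } catch (Throwable $e) {
        return [$instances, []];
    }

    if (isset($result['error'])) {
        return [$instances, []];
    }

    // 3. Claim only after a successful trigger.
    foreach ($targets as $id) {
        $instances[$id]['run_id'] = $runId;
        $instances[$id]['in_flight'] = true;
    }

    return [$instances, $targets];
}

$instances = [
    1 => ['eligible' => true, 'in_flight' => false, 'run_id' => null],
    2 => ['eligible' => false, 'in_flight' => false, 'run_id' => null],
    3 => ['eligible' => true, 'in_flight' => true, 'run_id' => 'older-run'],
];

[$after, $claimed] = redeployBatch($instances, 'run-1', fn (array $ids): array => ['bulk_id' => 'bulk-1']);
// $claimed === [1]; instance 3 keeps 'older-run'
```

In the real service, the early returns correspond to the `failRun()` calls in the excerpt under review. The claim step corresponds to `UPDATE instances SET deployment_run_id` in the sequence diagram.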
Important Files Changed
- `bulk_chunk_size` config is documented but never read; all environments go in a single Lagoon API call. `refresh()` in the confirmation loop is an N+1 pattern; a single batch query would be cleaner.
- `isRedeployEligible()`, `hasInFlightDeployment()`, and the `deploymentRun` BelongsTo relation. All new behaviour is additive and non-breaking.
- `redeploy_all` action. Instance fetch for the action is missing `with('deploymentRun')`, triggering N+1 queries via `hasInFlightDeployment()` in the service.
- `currentUserCanManage()`. Clean implementation.
- `poll_attempts < maxAttempts` guard and `last_polled_at` backoff. Correct and well-bounded.
- `polydock_deployment_runs` table: appropriate nullable FKs with `nullOnDelete`, composite index on `(status, last_polled_at)` for the poll command query, and reversible `down()`.
- `deployment_run_id` FK to `polydock_app_instances` with `nullOnDelete`. Composite index on `(polydock_store_app_id, next_redeploy_at)` supports the cadence query. Fully reversible.
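The poll-job entry credits a `poll_attempts < maxAttempts` guard and `last_polled_at` backoff. The sequence diagram shows the run collapsing to COMPLETED / PARTIAL_FAILED / FAILED once every deployment is terminal. Below is a hedged sketch of both decisions. The PR text names only the fields and the resulting run statuses, so the following are assumptions:

- the exponential schedule (60 s base, 900 s cap);
- the lowercase Lagoon status strings;
- the function names.

```php
<?php

declare(strict_types=1);

// Assumed Lagoon deployment statuses; anything else counts as still running.
const SUCCESS_STATUSES = ['complete'];
const FAILURE_STATUSES = ['failed', 'error', 'cancelled'];

/**
 * Hypothetical guard: 'exhausted' once the attempt budget is spent (run -> FAILED),
 * 'wait' until the backoff since last_polled_at has elapsed, otherwise 'poll'.
 */
function pollDecision(
    int $pollAttempts,
    int $maxAttempts,
    ?DateTimeImmutable $lastPolledAt,
    DateTimeImmutable $now,
    int $baseDelaySeconds = 60,
    int $maxDelaySeconds = 900
): string {
    if ($pollAttempts >= $maxAttempts) {
        return 'exhausted';
    }
    if ($lastPolledAt === null) {
        return 'poll';
    }

    // Exponential backoff: base * 2^attempts, capped (exponent clamped to avoid overflow).
    $delay = min($maxDelaySeconds, $baseDelaySeconds * (2 ** min($pollAttempts, 20)));
    $elapsed = $now->getTimestamp() - $lastPolledAt->getTimestamp();

    return $elapsed >= $delay ? 'poll' : 'wait';
}

/**
 * Hypothetical reduction of per-deployment statuses to a run status;
 * null means not every deployment is terminal yet, so keep polling.
 *
 * @param list<string> $statuses
 */
function aggregateRunStatus(array $statuses): ?string
{
    if ($statuses === []) {
        return null;
    }

    $succeeded = 0;
    $failed = 0;
    foreach ($statuses as $status) {
        $status = strtolower($status);
        if (in_array($status, SUCCESS_STATUSES, true)) {
            $succeeded++;
        } elseif (in_array($status, FAILURE_STATUSES, true)) {
            $failed++;
        } else {
            return null;
        }
    }

    if ($failed === 0) {
        return 'COMPLETED';
    }

    return $succeeded === 0 ? 'FAILED' : 'PARTIAL_FAILED';
}

// Usage
$now = new DateTimeImmutable('2025-01-01 12:00:00', new DateTimeZone('UTC'));
$decision = pollDecision(2, 10, $now->modify('-120 seconds'), $now);
// $decision === 'wait' (after 2 attempts the delay is 60 * 2^2 = 240 s)
$runStatus = aggregateRunStatus(['complete', 'failed']);
// $runStatus === 'PARTIAL_FAILED'
```

An `exhausted` result maps to the diagram's `poll exhausted` branch, where the run is marked FAILED. A `null` aggregate means the run stays non-terminal and is picked up again by the next poll.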
```mermaid
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Scheduler as Cron / Admin UI
    participant Cmd as DispatchScheduledRedeploysCommand
    participant Svc as PolydockDeploymentService
    participant Lagoon as Lagoon API
    participant DB as Database

    Scheduler->>Cmd: polydock:dispatch-scheduled-redeploys (every 10m)
    Cmd->>DB: SELECT eligible, due, non-trial instances (limit max_per_run)
    DB-->>Cmd: instances grouped by store_app_id
    loop per store-app group
        Cmd->>Svc: redeploy(group, SCHEDULED)
        Svc->>Svc: filter ineligible + in-flight
        Svc->>DB: INSERT polydock_deployment_runs (RUNNING)
        Svc->>Lagoon: bulkDeployEnvironments(environments[])
        Lagoon-->>Svc: bulkDeployEnvironmentLatest (bulk_id)
        Svc->>DB: UPDATE instances SET deployment_run_id
        Svc->>DB: dispatch PollDeploymentRunJob
        Svc-->>Cmd: PolydockDeploymentRun
        Cmd->>DB: UPDATE next_redeploy_at for claimed instances
    end

    note over Scheduler: polydock:deployments:poll (every 5m)
    Scheduler->>DB: SELECT non-terminal runs due for poll
    DB-->>Scheduler: in-flight runs
    loop per run
        Scheduler->>Lagoon: getDeploymentsByBulkId(bulk_id)
        Lagoon-->>Scheduler: deployment statuses
        Scheduler->>DB: UPDATE instances cached deploy state
        alt all terminal
            Scheduler->>DB: UPDATE run status COMPLETED / PARTIAL_FAILED / FAILED
        else poll exhausted
            Scheduler->>DB: UPDATE run status FAILED
        end
    end
```

Reviews (1): Last reviewed commit: "feat(deploy): add deploy management"