Skip to content

[New Permission 5/5] smartcontract: enforce topology/resource/index permission flags#3943

Merged
juan-malbeclabs merged 13 commits into
mainfrom
jo/permission-trindex-enforce
Jun 30, 2026
Merged

[New Permission 5/5] smartcontract: enforce topology/resource/index permission flags#3943
juan-malbeclabs merged 13 commits into
mainfrom
jo/permission-trindex-enforce

Conversation

@juan-malbeclabs

@juan-malbeclabs juan-malbeclabs commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Permission rollout — stacked PR series. Review/merge order:

  1. #3206 — enforce Permission-based authorization in existing instructions
  2. #3942 — define topology/resource/index permission flags (3/4) — stacked on [New Permission 3/5] smartcontract: enforce Permission-based authorization in existing instructions #3206
  3. #3943 — enforce topology/resource/index permission flags (4/4) — stacked on [New Permission 4/5] smartcontract: define topology/resource/index permission flags #3942

Retarget each PR's base to main as its upstream merges.
👉 You are here: #3943 (PR 4/4).

Summary

  • Replaces the direct foundation_allowlist checks with authorize() in topology (create / delete / clear / assign-node-segments) → TOPOLOGY_ADMIN, resource (create / allocate / deallocate / close) → RESOURCE_ADMIN, and index (create / delete) → INDEX_ADMIN.
  • The Permission account is read as the optional trailing account; the variable-length clear and assign-node-segments layouts detect it by PDA match before consuming their link/device lists.
  • Backward compatible: with no Permission account the legacy foundation path still applies, so existing callers (controlplane, current SDK) keep working.
  • Switches the corresponding SDK commands to execute_authorized_transaction so the payer's Permission PDA is appended when it exists onchain.

Behavior change: the topology instructions now reject unauthorized callers with NotAllowed (Custom(8)) instead of Unauthorized (Custom(22)), since authorize() is the single rejection path; affected tests updated.

Depends on the 3/4 PR (stacked).

Testing Verification

  • New end-to-end test test_topology_create_with_permission_account_allowed: a non-foundation key holding TOPOLOGY_ADMIN creates a topology via its Permission account — exercises the new authorization path through a real processor.
  • topology / index / resource / permission integration suites pass; topology error-code assertions updated UnauthorizedNotAllowed.

Reference: smartcontract/programs/doublezero-serviceability/PERMISSION.md

@juan-malbeclabs juan-malbeclabs changed the title [New Permission 4/4] smartcontract: enforce topology/resource/index permission flags [New Permission 5/5] smartcontract: enforce topology/resource/index permission flags Jun 27, 2026
@bgm-malbeclabs

Copy link
Copy Markdown
Contributor

Review

This swaps the direct foundation_allowlist.contains(payer) checks for the shared authorize() helper gated on TOPOLOGY_ADMIN/RESOURCE_ADMIN/INDEX_ADMIN, and points the SDK commands at execute_authorized_transaction so the payer's Permission PDA is appended when it exists. Backward-compat (no Permission account → legacy foundation path) looks right.

The fixed-layout processors (topology/create+delete, index/create+delete, all four resource/*) are correct: each consumes system_program in its prologue before calling authorize(accounts_iter, …), so authorize reads the optional trailing Permission account in the right slot. index/delete correctly adds the _system_program consume that was previously missing. The variable-length assign_node_segments rewrite is also correct — it collects all_remaining and parses payer/system/permission from the end, which matches what build_and_send produces.

1. clear reads payer/system_program from fixed front positions, but the SDK appends them after the variable link list

Unlike assign_node_segments (which parses the trailing accounts from the end), clear reads payer at [2] and system_program at [3] in its prologue, then treats everything after as links + the optional trailing Permission. But the production path is SDK clear (fixed = [topology, globalstate, payer] + links)build_and_send appends [payer, system, permission?], yielding on-wire:

[topology, globalstate, payer, link1..linkN, payer, system, permission?]

So the processor reads link1 as _system_program, and the link-processing loop then hits the trailing payer/system accounts and runs validate_program_account!(…, "Link") on them → fails (a wallet/system account is not a program-owned Link). Effect: doublezero topology clear against a topology with ≥1 link reverts; even with 0 links the appended payer/system get validated as links and revert.

assign_node_segments is the model to follow — parse payer/system/permission off the tail of all_remaining. (The redundant payer already sitting in the SDK clear command's fixed_accounts at [2], duplicated by build_and_send, is the tell that this layout never lined up.) This likely predates the PR, but it now layers the new Permission-detection logic on top of the broken layout and documents [last] permission … after the links as if it were correct.

2. The integration-test helper builds a different account order than the production SDK, which is why #1 isn't caught

create_transaction_with_extra_accounts builds [accounts, payer, system, extra_accounts] — placing payer/system before the link accounts, matching the clear processor's prologue assumption but not build_and_send's [accounts, payer, system, permission] (extras before trailing). For create the two orders coincide (no variable accounts), so the new test_topology_create_with_permission_account_allowed test is valid; for clear/assign they diverge, so clear passes in tests while failing through the CLI. Worth an end-to-end test that exercises clear (and the new Permission path) through execute_authorized_transaction rather than the hand-rolled helper.


Everything else checks out: the authorize() OR-semantics, the Unauthorized (22)NotAllowed (8) migration and updated assertions, the Option::into_iter() adapter feeding authorize for the variable-length processors, and the test_topology_delete_fails_when_link_references_it change (dropping the stray link account that authorize would now reject as a non-matching Permission PDA).

I'd treat #1 as the one to confirm before merge — run a clear with at least one link through the actual CLI/SDK.

- Add execute_authorized_transaction_quiet to DoubleZeroClient trait and
  DZClient impl, restoring quiet mode for ban and closeaccount commands
  that was lost when switching to execute_authorized_transaction
- Restore onchain allocation support in CreateSubscribeUserCommand SDK
  command (feature-flag-gated ResourceExtension PDA logic removed in
  permission enforcement refactor)
- Restore atomic path tests and fixture resource extension PDA fields in
  create_subscribe_user_test.rs
- SDK: route execute_transaction and execute_authorized_transaction through a
  single execute_transaction_inner, restoring the protocol-max compute-budget /
  heap-frame requests, skip_preflight, and structured error handling that the
  build_and_send detour had dropped for the whole SDK (C1/H1).
- SDK: append the Permission PDA as the trailing account so it stays after
  payer+system, matching authorize()'s read order; the quiet authorized variant
  now shares the same builder (H2/H3/L3).
- SDK: memoize the Permission PDA lookup and retry it on transient RPC errors,
  removing the per-send un-retried get_account (M3).
- SDK: add DZClient-level tests asserting compute-budget presence and the
  trailing Permission account, with and without a permission account (M2).
- authorize: drop the retired activator authority from USER_ADMIN/NETWORK_ADMIN/
  MULTICAST_ADMIN legacy fallbacks; document the ACCESS_PASS_ADMIN expansion (M1).
- authorize: verify program ownership before inspecting account data (L1).
- accesspass/set: document the optional-account count invariant (M4).
- CHANGELOG: correct the SDK entry to match the unified builder.
…delete/ban

DeleteUserCommand and RequestBanUserCommand authorize the final instruction
with USER_ADMIN, but first strip the user's multicast roles through
UpdateMulticastGroupRoles, whose processor only accepted the access-pass owner
or a foundation member. A USER_ADMIN-only payer was therefore blocked on the
prerequisite cleanup.

Allow the role-update processor to accept USER_ADMIN for the removal-only case
(no roles being granted), and route UpdateMulticastGroupRolesCommand through
execute_authorized_transaction so the payer's Permission account is appended
when present. Adding roles still requires the owner or foundation.
…to-injection

The SDK auto-appends the payer's Permission PDA whenever it exists on-chain, so
a foundation member whose own Permission account was suspended, under-privileged,
or uninitialized was routed through the Some branch of authorize() and denied
the PERMISSION_ADMIN instruction needed to repair it — the None-branch lockout
fallback never ran.

Extract foundation_permission_recovery() and apply it in the Some branch too:
when the supplied Permission account does not grant the flag, a foundation
member requesting PERMISSION_ADMIN is still allowed. PDA-address and ownership
checks remain hard errors; recovery is scoped to PERMISSION_ADMIN only.
…rmission flags

Add TOPOLOGY_ADMIN (1<<15), RESOURCE_ADMIN (1<<16), and INDEX_ADMIN (1<<17)
to permission_flags, with legacy authorization mapping each to the foundation
allowlist in authorize() (plus unit tests). Expose the new names in the
serviceability CLI (topology-admin/resource-admin/index-admin) and add the
matching constants to the Go, TypeScript, and Python SDKs.

Documents the flags in PERMISSION.md and the serviceability README. This
defines the permissions; enforcement in the processors lands in 4/4.
@juan-malbeclabs juan-malbeclabs force-pushed the jo/permission-trindex-flags branch from 696cfc3 to 0c9e787 Compare June 30, 2026 13:23
…ermission flags

Replace the direct foundation_allowlist checks in the topology
(create/delete/clear/assign-node-segments), resource
(create/allocate/deallocate/close), and index (create/delete) instructions with
authorize() calls requesting TOPOLOGY_ADMIN/RESOURCE_ADMIN/INDEX_ADMIN. The
Permission account is read as the optional trailing account; the variable-length
clear and assign-node-segments layouts detect it by PDA match before consuming
their link/device lists. Backward compatible: with no Permission account the
legacy foundation path still applies.

Switch the corresponding SDK commands to execute_authorized_transaction so the
payer's Permission PDA is appended when it exists, and add an end-to-end test
covering topology creation via a TOPOLOGY_ADMIN Permission account.
@juan-malbeclabs juan-malbeclabs force-pushed the jo/permission-trindex-enforce branch from 1a8d0db to 929958f Compare June 30, 2026 13:33
… accounts

The clear processor read payer/system_program at fixed front positions [2]/[3]
while the SDK client appends them (plus the optional Permission account) after
the variable-length link list, so doublezero topology clear reverted whenever
links were passed. Parse payer/system/permission off the tail instead, mirroring
assign_node_segments, and drop the redundant payer from the SDK fixed_accounts.

Add a production-layout test helper and a TOPOLOGY_ADMIN permission clear test so
the real on-wire layout is exercised (the previous helper placed payer/system
before the extras, masking the mismatch).
@juan-malbeclabs

juan-malbeclabs commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review, both findings are valid and I've pushed a fix in 77e86d4.

#1 ClearTopology account layout. Confirmed. The processor read payer/system_program at fixed positions [2]/[3], but assemble_instructions appends [payer, system, permission?] after the caller's accounts (i.e. after the variable-length link list), so the on-wire layout is [topology, globalstate, payer, link1..N, payer, system, permission?]. The processor read link1 as system_program and then validated the trailing payer/system as Link accounts → revert (even with 0 links). As you noted, this predated this PR, but the PR layered the Permission-detection logic on top of the broken layout. Fix mirrors assign_node_segments: parse payer/system/permission off the tail of the remaining accounts, and drop the redundant payer from the SDK clear fixed_accounts (the duplicate was the tell that the layout never lined up).

#2 test helper masked it. Confirmed. create_transaction_with_extra_accounts builds [accounts, payer, system, extra_accounts], placing payer/system before the extras, which matched the broken prologue but not production. Added create_authorized_transaction (mirrors assemble_instructions exactly), reworked the clear_topology test helper to use it so the existing clear tests now exercise the real on-wire order, and added test_topology_clear_with_permission_account_allowed a non-foundation TOPOLOGY_ADMIN holder clearing a topology from a link with the Permission PDA as the trailing account, ≥1 link, through the production layout.

create was unaffected (no variable accounts, so the two orders coincide) that test stays valid.

juan-malbeclabs added a commit that referenced this pull request Jun 30, 2026
…ation in existing instructions (#3206)

> **Permission rollout — stacked PR series.** Review/merge order:
> 1. [#3206 — enforce Permission-based authorization in existing
instructions](#3206)
> 2. [#3942 — define topology/resource/index permission flags
(3/4)](#3942) — stacked on
#3206
> 3. [#3943 — enforce topology/resource/index permission flags
(4/4)](#3943) — stacked on
#3942
>
> Retarget each PR's base to `main` as its upstream merges.
> **👉 You are here: #3206 (base of the follow-up stack).**

## Summary of Changes
- Wire `authorize()` into the privileged instruction processors that
gate on it: `accesspass/{close,set}` (`ACCESS_PASS_ADMIN`) and
`user/{delete,requestban}` (`USER_ADMIN`). Each appends the caller's
Permission PDA as an optional trailing account, read last by
`authorize()`. The user owner can still delete their own account without
a Permission account.
- Add `execute_authorized_transaction` (and its `_quiet` variant) to the
Rust SDK. They share the existing `execute_transaction_inner` builder,
so compute-budget, preflight, and error-reporting behavior are identical
to `execute_transaction`; the only difference is the optional trailing
Permission PDA, which is resolved at most once per client (retried on
transient RPC errors, then memoized).
- Switch the affected SDK commands (`accesspass/{close,set}`,
`user/{delete,requestban}`, `tenant/delete`, and the `permission/*` CRUD
commands) to `execute_authorized_transaction` so the Permission account
is included transparently.
- Drop the retired activator authority from the
`USER_ADMIN`/`NETWORK_ADMIN`/`MULTICAST_ADMIN` legacy fallbacks.
`ACCESS_PASS_ADMIN` legacy authority (foundation + sentinel + feed) is
applied uniformly across its instructions and documented in
`authorize.rs`.

## Testing Verification
- New DZClient-level tests assert the assembled instruction list carries
the protocol-max compute-budget/heap-frame requests and that the
Permission account is appended after `payer`+`system`, with and without
an on-chain permission account.
- `authorize` unit tests cover the legacy fallbacks (including the
activator now being denied for the admin roles) and the
Permission-account path (PDA, ownership, discriminator, status, and flag
checks).
- `cargo test -p doublezero_sdk` and `cargo test -p
doublezero-serviceability` pass.

Related RFC/series: the Permission account and `authorize()` mechanism
land in the earlier PRs of this series; this PR enforces them in
existing instructions.
@elitegreg

Copy link
Copy Markdown
Contributor

🚫 Blocking (verified): topology clear reverts via the SDK/CLI — account ordering mismatch

ClearTopologyCommand and process_topology_clear disagree on where payer/system sit, and the SDK/CLI path reverts. This appears pre-existing (introduced with the topology commands in RFC-18 PR 2/5), but clear is one of the four topology instructions this PR migrates to authorize(), and the PR states existing callers keep working — so flagging it here.

Mismatch

  • process_topology_clear expects [topology, globalstate, payer, system, links..] — payer/system before the link list.
  • ClearTopologyCommand builds fixed_accounts = [topology, globalstate, payer] + links, then execute_authorized_transactionassemble_instructions appends [payer, system, permission?] at the end. Runtime layout:
    [topology, globalstate, payer, links.., payer, system, permission?]
    
    payer appears twice and system lands after the links, so the builder-appended accounts fall into the link-validation loop.

Why the existing tests don't catch it: the integration tests build accounts via create_transaction_with_extra_accounts, which inserts [payer, system] before the extras — matching the processor, not the SDK. The CLI/SDK unit tests use a mock client and never run the processor.

Verified by feeding the exact SDK ordering to the real processor through the banks-client harness:

thread 'solBankForksCli' panicked at processors/topology/clear.rs:87:9:
assertion `left == right` failed: Invalid Link Account Owner
  left: 11111111111111111111111111111111   (system program)
=> Err(TransactionError(InstructionError(0, ProgramFailedToComplete)))

Contrast with assign_node_segments (which is correct): its command omits payer from fixed_accounts and its processor reads payer/system/permission from the end of the account list, fully consistent with the builder.

Suggested fix: align clear with assign_node_segments — drop payer from ClearTopologyCommand::fixed_accounts, and have process_topology_clear read [links.., payer, system, permission?] from the end (same end-detection it already uses for the optional permission account) instead of expecting payer/system at indices 2/3.

The rest of the PR looks good — the seven fixed-layout processors all consume system before authorize(), assign_node_segments permission detection is sound, and the legacy/foundation fallback preserves existing behavior.

@elitegreg elitegreg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock the stack. One blocking-quality issue noted in a separate comment: topology clear reverts via the SDK/CLI due to an account-ordering mismatch in ClearTopologyCommand (verified against the real processor). It looks pre-existing (RFC-18 PR 2/5) rather than introduced here, so approving — but please address it (here or a fast follow-up) since this PR migrates clear to the new authorization path. Everything else looks correct.

UPDATE: I think this was basically Ben's comment and you fixed it.

Base automatically changed from jo/permission-trindex-flags to main June 30, 2026 15:43
juan-malbeclabs added a commit that referenced this pull request Jun 30, 2026
…rmission flags (#3942)

> **Permission rollout — stacked PR series.** Review/merge order:
> 1. [#3206 — enforce Permission-based authorization in existing
instructions](#3206)
> 2. [#3942 — define topology/resource/index permission flags
(3/4)](#3942) — stacked on
#3206
> 3. [#3943 — enforce topology/resource/index permission flags
(4/4)](#3943) — stacked on
#3942
>
> Retarget each PR's base to `main` as its upstream merges.
> **👉 You are here: #3942 (PR 3/4).**

## Summary

- Adds three permission flags — `TOPOLOGY_ADMIN` (`1<<15`),
`RESOURCE_ADMIN` (`1<<16`), `INDEX_ADMIN` (`1<<17`) — so segment-routing
topologies, `ResourceExtension` accounts, and internal `Index` accounts
can be delegated without granting the broad `FOUNDATION` flag (today
they are foundation-only).
- Maps each flag to the `foundation_allowlist` in `authorize()`'s legacy
path, so authorization behavior is unchanged until a `Permission`
account is supplied.
- Exposes the names (`topology-admin` / `resource-admin` /
`index-admin`) in the serviceability CLI for `permission set --add` /
`--remove`, and adds matching constants to the Go, TypeScript, and
Python SDKs.
- Documents the flags in `PERMISSION.md` and the serviceability
`README.md`.

This PR is definition-only; processor enforcement lands in the follow-up
4/4. Stacked on #3206.

## Testing Verification

- `authorize()` legacy-mapping unit tests for each new flag (allowed via
a foundation member, denied for others).
- CLI permission name↔bitmask round-trip tests cover the three new
names.

Reference:
`smartcontract/programs/doublezero-serviceability/PERMISSION.md`
…auth fallback

Extract the duplicated trailing-account parsing in topology clear and
assign-node-segments into a shared authorize::split_trailing_permission
helper, so the variable-length [leading.., payer, system, permission?]
layout is parsed in one tested place.

Also broaden authorize(): when a payer's auto-injected Permission account
exists but does not grant the requested flag, fall back to the legacy
allowlist check as long as RequirePermissionAccounts is off. This makes the
Some/None branches symmetric until strict mode is enabled, so a foundation
(or other legacy-authorized) key is not locked out of an instruction merely
because it also holds an unrelated, under-privileged Permission account.
@juan-malbeclabs

Copy link
Copy Markdown
Contributor Author

Follow-up: shared trailing-account helper + broadened legacy auth fallback

Two changes pushed in 641791e3:

1. Extract split_trailing_permission helper

The variable-length processors (topology clear, assign-node-segments) each carried an identical ~20-line block that peels payer / system_program / optional permission off the tail of the account list (the [leading.., payer, system, permission?] layout the SDK emits). This is subtle, security-sensitive parsing, so I moved it into a single tested helper, authorize::split_trailing_permission, and replaced both copies with one call. Any future layout fix now happens in one place.

2. Broaden the legacy fallback in authorize() while strict mode is off

Because the SDK auto-injects the payer's Permission PDA whenever one exists on-chain, a payer who holds a Permission account that doesn't grant the requested flag was routed through the Some branch and denied — even when RequirePermissionAccounts is off and the legacy allowlist would have authorized them. Concretely, a foundation member who also holds an unrelated, under-privileged Permission account would lose access to topology/resource/index management once these instructions enforce authorize().

The Some branch now falls back to the full legacy allowlist check (check_legacy_any) when the supplied Permission account is insufficient and RequirePermissionAccounts is off — mirroring the None branch. The two branches are now symmetric until strict mode is enabled; once it is on, only the Permission account (or the foundation PERMISSION_ADMIN recovery) authorizes.

Tests

  • Updated test_permission_account_foundation_recovery_only_for_permission_admin to run in strict mode, where its "recovery is PERMISSION_ADMIN-only" assertion is the one that actually holds.
  • Added test_permission_account_insufficient_falls_back_to_legacy_when_flag_off (allowed with flag off) and test_permission_account_insufficient_denied_when_flag_on (denied with flag on).
  • Full doublezero-serviceability suite green; authorize unit tests 60/60.

@juan-malbeclabs juan-malbeclabs enabled auto-merge (squash) June 30, 2026 18:04
@juan-malbeclabs juan-malbeclabs merged commit bb5c354 into main Jun 30, 2026
38 of 40 checks passed
@juan-malbeclabs juan-malbeclabs deleted the jo/permission-trindex-enforce branch June 30, 2026 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants