Skip to content

cranelift(aarch64): share base+index across offset loads via uimm12 amode#13766

Open
darmie wants to merge 3 commits into
bytecodealliance:mainfrom
darmie:aarch64-amode-shared-base-cse
Open

cranelift(aarch64): share base+index across offset loads via uimm12 amode#13766
darmie wants to merge 3 commits into
bytecodealliance:mainfrom
darmie:aarch64-amode-shared-base-cse

Conversation

@darmie

@darmie darmie commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

When several loads share base + (extended index) and differ only by a
uimm12-scaled offset (e.g. fields read through a computed index), AArch64 amode
selection emitted a separate add base, #offset per load, defeating CSE of the
shared base + index.

This folds the offset into the load's immediate and keeps base + index as one
value, so a single add base, index, uxtw is materialized and reused. A zero
offset keeps the single-instruction RegExtended form, so it is
neutral-or-better: a cluster of N such loads drops N address-adds to 1, a lone
load is unchanged.

Tested: new aarch64 precise-output filetest + a stack round-trip runtest
(aarch64/x86_64/s390x); the full aarch64 filetest and runtest suites pass.

…mode

When several loads share `base + (extended index)` and differ only by a
uimm12-scaled offset, keep `base + index` as one CSE-able value and fold each
offset into the load's immediate, so a single `add base, index, uxtw` is
reused instead of a per-load `add base, #offset`. A zero offset keeps the
single-instruction RegExtended form.
@darmie darmie requested a review from a team as a code owner June 30, 2026 08:04
@darmie darmie requested review from alexcrichton and removed request for a team June 30, 2026 08:04
The amode CSE folds constant offsets into load/store immediates and keeps
base+index as one add. Single-use cases are neutral (offset moves into the
immediate); debug-exceptions sheds 2 instructions by reusing base+index
across four offsets.
@darmie darmie requested a review from a team as a code owner June 30, 2026 08:25
@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. labels Jun 30, 2026
@alexcrichton

Copy link
Copy Markdown
Member

Thanks for this! I'm wary of the duplication being added here though, and amode rules are some of the most sensitive in Cranelift so we ideally want as little duplication as possible. For example the offset=0 rules are duplicates of above rules, and even using AMode.UnsignedOffset is a duplicate of prior rules as well. Would it be possible to shuffle rules around and add if-let clauses perhaps to tweak existing rules? For example rules could use if-let or some sort of extraction matcher to ensure that offset is 0. Either that or the rule using AMode.UnsignedOffset could be given higher priority.

Address review feedback: the offset==0 rules only existed to beat the new
UnsignedOffset rules for a zero offset, duplicating rules 4/5. Replace them
with a `uimm12_scaled_nonzero_from_i64` guard on rules 8/9 so a zero offset
falls through to the existing RegExtended rules instead. The UnsignedOffset
rules simply take higher priority than 4/5 for non-zero offsets.

Codegen and all aarch64 filetests are unchanged.
@darmie

darmie commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

For example the offset=0 rules are duplicates of above rules, and even using AMode.UnsignedOffset is a duplicate of prior rules as well. Would it be possible to shuffle rules around and add if-let clauses perhaps to tweak existing rules? For example rules could use if-let or some sort of extraction matcher to ensure that offset is 0. Either that or the rule using AMode.UnsignedOffset could be given higher priority.

I have removed the duplication. Also added uimm12_scaled_nonzero_from_i64 (+ its verifier spec) so rules 8/9's UnsignedOffset only matches non-zero offsets and takes higher priority than 4/5; a zero offset falls through to the existing RegExtended rules naturally.

@github-actions github-actions Bot added the isle Related to the ISLE domain-specific language label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

Subscribe to Label Action

cc @cfallin, @fitzgen

Details This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton

Copy link
Copy Markdown
Member

This helps remove some duplication, yeah, but this is still effectively a duplication of rule (2), however, and it's also a duplication of 3/4/5 in terms of matching pattern. My thinking is that what you want can be achieved by raising the priority of rule 2 because the premise here is that the static offset should have the highest priority for getting folded into the instruction itself while leaving all other computations in a CSE-able form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants