JIT: extend loop cloning for span+stride>1 and ±const limits #129309

Open
AndyAyersMS wants to merge 3 commits into dotnet:main from AndyAyersMS:loop-clone-non-unit-iv

Conversation

@AndyAyersMS

@AndyAyersMS AndyAyersMS commented Jun 11, 2026

Member

Enable cloning of loops over Span when the stride s is greater than 1, guarded by a runtime limit <= INT_MAX - s + 1 condition on the increasing var-limit case. Other forms are already implicitly safe.
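A minimal sketch of why the `limit <= INT_MAX - s + 1` bound is enough for an increasing loop `for (i = 0; i < limit; i += s)`: the largest `i` that enters the body is at most `limit - 1`, so the next increment stays at or below `INT_MAX` exactly when the bound holds. The helper names below are hypothetical and are not taken from the JIT sources; the model uses 64-bit arithmetic only to observe the would-be overflow.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: the runtime guard from the description,
// limit <= INT_MAX - s + 1, for a positive stride s.
bool strideGuardHolds(int limit, int s)
{
    assert(s > 0);
    return limit <= INT_MAX - s + 1; // INT_MAX - s + 1 cannot wrap for s > 0
}

// Hypothetical helper: would `for (i = 0; i < limit; i += s)` ever compute
// an i + s that does not fit in a 32-bit int?
bool incrementOverflows(int limit, int s)
{
    assert(s > 0);
    if (limit <= 0)
    {
        return false; // zero-trip loop, no increment is executed
    }
    // Largest multiple of s that is still below limit (last i in the body).
    int64_t last = (static_cast<int64_t>(limit) - 1) / s * s;
    return last + s > INT_MAX;
}
```

With `s = 2`, a limit of `INT_MAX` fails the guard and the model reports an overflowing increment, while a limit of `INT_MAX - 1` passes the guard and does not overflow. The guard is conservative: it can reject limits for which the loop happens not to overflow.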

Extend MatchLimit to peel a constant offset off the limit so loops like

  for (int i = 0; i < span.Length - K; i += K) { ... }

become clonable. Add a LimitOffset to NaturalLoopIterInfo that feeds into LC_Ident, the zero-trip / per-access / NE / overflow guards, and a new arr.Length + offset >= 0 guard (for very short array lengths and negative offsets).
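To make the `base + offset` view of a limit concrete, here is a small model of `span.Length - K` as a length plus a constant offset, together with the `arr.Length + offset >= 0` guard. `OffsetLimit` and the function names are hypothetical and do not mirror `NaturalLoopIterInfo` or `LC_Ident`; the combined check is illustrative only, since the real cloning conditions include more than these two tests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a loop limit written as `Length + offset`;
// `span.Length - K` corresponds to offset = -K.
struct OffsetLimit
{
    int length; // runtime value of arr.Length / span.Length
    int offset; // constant peeled off the limit (may be negative)

    // Computed in 64 bits so the sum itself cannot wrap.
    int64_t value() const { return static_cast<int64_t>(length) + offset; }
};

// The guard named in the description: arr.Length + offset >= 0.
bool limitIsNonNegative(const OffsetLimit& lim)
{
    return lim.value() >= 0;
}

// For an increasing loop `i < limit` that reads arr[i], every index below
// the limit is in bounds when the limit does not exceed the length.
bool limitWithinLength(const OffsetLimit& lim)
{
    return lim.value() <= lim.length;
}

// Illustrative combination; a failing check selects the slow, fully
// bounds-checked copy of the loop.
bool canUseFastPath(const OffsetLimit& lim)
{
    assert(lim.length >= 0);
    return limitIsNonNegative(lim) && limitWithinLength(lim);
}
```

For a length of 3 and `K = 4` the limit is `-1`, so the non-negativity guard fails, which is the "very short array, negative offset" case the description mentions.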

Enable cloning of loops over Span<T> when the stride is greater than 1,
guarded by a runtime `limit <= INT_MAX - s + 1` condition on the
increasing var-limit case (other forms are already implicitly safe).

Extend MatchLimit to peel a constant offset off the limit so loops like
`for (i; i < arr.Length - K; i += K)` -- the common Vector<T>.Count
vectorization warm-up -- become clonable. NaturalLoopIterInfo gains a
LimitOffset that flows through LC_Ident (Var and ArrAccess gain an
offset), the zero-trip / per-access / NE / overflow guards, and a new
`arr.Length + offset >= 0` guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 20:35
@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 11, 2026
@dotnet-policy-service

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS

Member Author

@EgorBo PTAL
fyi @dotnet/jit-contrib

Impacts ~200 methods across SPMI. If you know of specific cases that should be handled in BCL / etc let me know and I'll verify.

@AndyAyersMS

Member Author

Also want to see if IV opts can now understand the IV does not overflow in the fast path.

Copilot AI left a comment

Contributor

Pull request overview

This PR extends CoreCLR JIT loop cloning’s ability to reason about loop limits and spans by (1) modeling limitBase ± const limits as a base + offset and (2) enabling span-based loop cloning for non-unit strides with an added overflow safety guard. It also adds new JIT tests that exercise non-unit stride span loops and offset limits like Length - K.

Changes:

  • Add NaturalLoopIterInfo::LimitOffset plus LimitBase() to represent and consume base ± const loop limits.
  • Extend LC_Ident to carry an offset for Var and ArrAccess, and plumb it through condition generation.
  • Update loop cloning condition derivation to support span stride > 1 (increasing loops) with an overflow bound, and add a guard for negative arr.Length + offset limits; add new tests for these scenarios.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/coreclr/jit/compiler.h | Adds LimitOffset and declares LimitBase() on NaturalLoopIterInfo. |
| src/coreclr/jit/flowgraph.cpp | Implements limit peeling into LimitOffset, adds debug printing, and implements LimitBase(); updates limit accessors to use it. |
| src/coreclr/jit/loopcloning.h | Extends LC_Ident with offset and updates equality/printing/constructors. |
| src/coreclr/jit/loopcloning.cpp | Materializes LC_Ident offsets, enables span stride>1 cloning with an overflow guard, and threads LimitOffset into emitted conditions. |
| src/tests/JIT/opt/Cloning/SpanNonUnitStride.csproj | New JIT test project for span non-unit stride cloning scenarios. |
| src/tests/JIT/opt/Cloning/SpanNonUnitStride.cs | New tests covering span loops with stride 2/3 and various comparison forms. |
| src/tests/JIT/opt/Cloning/OffsetLimit.csproj | New JIT test project for limit-offset recognition scenarios. |
| src/tests/JIT/opt/Cloning/OffsetLimit.cs | New tests covering Length - K, SIMD-count offsets, and related limit forms. |

Comment thread src/coreclr/jit/flowgraph.cpp Outdated
Morph normally folds `x + 0` / `x - 0` before MatchLimit runs, but if
such a tree slips through the peel would set HasArrayLengthLimit /
HasInvariantLocalLimit based on the peeled base while LimitOffset
stayed 0 -- then LimitBase() would short-circuit to the original
GT_ADD tree and trip the GT_ARR_LENGTH / GT_LCL_VAR asserts in
ArrLenLimit / VarLimit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tannergooding

Member

Will this also work for cases like:

  for (int i = 0; i >= 0 && i < arr.Length; i += K) { ... }
// or its functional equivalent
  for (int i = 0; (uint)i < (uint)arr.Length; i += K) { ... }

In both cases overflow can happen, but the guard check ensures that i cannot be negative, so the loop must also be safe and the arr.Length - K form shouldn't be required.

@AndyAyersMS

AndyAyersMS commented Jun 11, 2026

Member Author

We already handled arrays with K less than 58, because the maximum array length bounds when overflow can happen (I'll double check for your cases just to be sure).
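The "K less than 58" figure can be reproduced with a little arithmetic, assuming the maximum .NET array length is `0x7FFFFFC7` (the value of `Array.MaxLength`; treat that constant as an assumption here, since the thread does not state it). The helper name is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Assumption: maximum array length in .NET (Array.MaxLength).
const int64_t kMaxArrayLength = 0x7FFFFFC7;

// Largest stride K for which `for (i = 0; i < length; i += K)` can never
// overflow a 32-bit i, given only that length <= maxLength.
// The largest i that enters the body is maxLength - 1, and the increment
// is safe when (maxLength - 1) + K <= INT_MAX.
int64_t maxSafeStride(int64_t maxLength)
{
    assert(maxLength > 0 && maxLength <= INT_MAX);
    return INT_MAX - (maxLength - 1);
}
```

Under that assumption `maxSafeStride(kMaxArrayLength)` is 57, which matches the K <= 57 bound quoted in this thread. A span length can be as large as `INT_MAX`, for which the same formula gives 1, so non-unit strides over spans need the extra runtime guard.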

This PR enables something similar for span where there aren't those same guarantees. (will edit the commit comment).

@AndyAyersMS

Member Author

Note

AI-generated comment.

Prototyped the IV no-overflow analysis as a follow-up (draft branch iv-no-overflow-from-dom). The new code does let SCEV prove the fast clone's IV cannot overflow using the cloning guard, but it does not yet unblock strength reduction or downcounting on stride>1 loops: SCEV's trip-count formula at scev.cpp:2093 still requires |step| == 1 (the standing TODO about adding a division operator), and a secondary lower-bound proof (step <= rhs + 1) goes via optRelopImpliesRelop which doesn't bridge additive transforms. Parking the change until those two are addressed.
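As a rough illustration of why a trip-count formula for `|step| > 1` needs a division, here is the closed form for an increasing loop `for (i = start; i < limit; i += step)` under the assumption that `i` never overflows. This is not SCEV's formula from `scev.cpp`; `tripCount` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of iterations of
//   for (i = start; i < limit; i += step)
// for step > 0, assuming i does not overflow.
int64_t tripCount(int64_t start, int64_t limit, int64_t step)
{
    assert(step > 0);
    if (start >= limit)
    {
        return 0; // zero-trip loop
    }
    // Ceiling of (limit - start) / step; for step == 1 the division
    // disappears and the count is simply limit - start.
    return (limit - start + step - 1) / step;
}
```

For example, `tripCount(0, 10, 3)` counts the iterations `i = 0, 3, 6, 9`.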

@AndyAyersMS

Member Author

Turns out we can't yet handle

  for (int i = 0; i >= 0 && i < arr.Length; i += K) { ... }

because the loop has two exit edges when we analyze it for cloning. Do you think this pattern is common?

The other case is handled provided K <= 57. I suppose we could now extend it (and arrays in general) to larger K with extra cloning checks, like we're doing here for spans.

@AndyAyersMS

Member Author

The other case is handled provided K <= 57. I suppose we could now extend it (and arrays in general) to larger K with extra cloning checks, like we're doing here for spans.

I will do that as a follow-on PR.

@tannergooding

Member

Just to be clear, I don't think this is pressing for this PR, more just an interest as to what we are and aren't covering at this point.

because the loop has two exit edges when we analyze it for cloning. Do you think this pattern is common?

I'm not sure exactly how common it is, but it's one of the patterns that I've seen users write for SIMD code.

In general I'd prefer if users didn't have to write stuff like (uint)i < (uint)span.Length, as I think it's functionally harder to reason about than i >= 0 && i < span.Length, where you don't have to think about "this is two's complement, so negatives become large positives, so it is safe".
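The two spellings of the range check can be compared directly. The sketch below uses C++ fixed-width integers to stand in for C# `int` and `uint`; the function names are hypothetical, and the equivalence relies on the length being non-negative.

```cpp
#include <cassert>
#include <cstdint>

// The readable form: two signed comparisons.
bool inRangeTwoCompares(int32_t i, int32_t length)
{
    return i >= 0 && i < length;
}

// The casting form: a negative i converts to a value of at least 2^31,
// which is larger than any non-negative 32-bit length, so one unsigned
// comparison covers both tests.
bool inRangeUnsigned(int32_t i, int32_t length)
{
    assert(length >= 0); // equivalence only holds for non-negative lengths
    return static_cast<uint32_t>(i) < static_cast<uint32_t>(length);
}
```

Both functions agree for every `i` as long as `length >= 0`, which array and span lengths always satisfy.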

But, due to historical JIT pessimizations, a lot of our own code uses the casting pattern and we have pushed users towards it as well. I think we even have a pass in the JIT that transforms the i >= 0 && i < span.Length check into (uint)i < (uint)span.Length, but it might be happening after we do the loop checks.

Comment thread src/coreclr/jit/flowgraph.cpp Outdated
Comment thread src/coreclr/jit/flowgraph.cpp Outdated

@tannergooding tannergooding left a comment

Member

Changes look good/correct to me. Left a small comment about a potential future opt (not a priority) and a question about whether a given handling path is necessary since morph should canonicalize most trees.

@AndyAyersMS

Member Author

diffs

More like 300 methods impacted. More cloning, more "unprofitable rejected" cloning.

Morph canonicalizes commutative ops to put any constant on op2 and folds base + 0,
so the cns + base branch in the limit-offset peel is unreachable. Remove it and
update the comment to cite the morph invariants we're relying on.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
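A simplified sketch of the limit-offset peel described in this commit message, written against a toy expression node rather than the JIT's `GenTree`. All type and function names are hypothetical. It only looks for a constant in the second operand, relying on the canonical form cited in the message (constants on op2, `base + 0` already folded).

```cpp
#include <cassert>
#include <climits>

// Hypothetical, heavily simplified expression node (not the JIT's GenTree).
enum class Oper { Const, ArrLength, LclVar, Add, Sub };

struct Node
{
    Oper        oper;
    int         value; // meaningful only when oper == Const
    const Node* op1;   // operands of Add / Sub, otherwise nullptr
    const Node* op2;
};

struct PeeledLimit
{
    const Node* base;   // what remains after removing the constant
    int         offset; // signed constant that was peeled off
};

// Peels `base + cns` or `base - cns` into (base, +/-cns). Anything else is
// returned unchanged with offset 0. No `cns + base` case is handled, since
// the canonical form places the constant second.
PeeledLimit peelLimit(const Node* limit)
{
    assert(limit != nullptr);
    bool isAddOrSub = (limit->oper == Oper::Add) || (limit->oper == Oper::Sub);
    if (isAddOrSub && (limit->op2 != nullptr) && (limit->op2->oper == Oper::Const))
    {
        int cns = limit->op2->value;
        if (limit->oper == Oper::Add)
        {
            return {limit->op1, cns};
        }
        if (cns != INT_MIN) // negating INT_MIN would overflow, so skip the peel
        {
            return {limit->op1, -cns};
        }
    }
    return {limit, 0};
}
```

Peeling `Length - 4` yields the length node with offset `-4`, while a bare length node comes back unchanged with offset `0`.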
Copilot AI review requested due to automatic review settings June 12, 2026 01:33

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

@@ -1401,12 +1465,14 @@ bool Compiler::optDeriveLoopCloningConditions(FlowGraphNaturalLoop* loop, LoopCl
if (isIncreasingLoop)
{
// For increasing loop, the limit value needs to be checked against the array length
int want = ExpectedIncLe(0, n - 5, 2, a);
Assert.Equal(want, got);
}
}