JIT: extend loop cloning for span+stride>1 and ±const limits#129309
JIT: extend loop cloning for span+stride>1 and ±const limits#129309AndyAyersMS wants to merge 3 commits into
Conversation
Enable cloning of loops over Span<T> when the stride is greater than 1, guarded by a runtime `limit <= INT_MAX - s + 1` condition on the increasing var-limit case (other forms are already implicitly safe). Extend MatchLimit to peel a constant offset off the limit so loops like `for (i; i < arr.Length - K; i += K)` -- the common Vector<T>.Count vectorization warm-up -- become clonable. NaturalLoopIterInfo gains a LimitOffset that flows through LC_Ident (Var and ArrAccess gain an offset), the zero-trip / per-access / NE / overflow guards, and a new `arr.Length + offset >= 0` guard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
@EgorBo PTAL Impacts ~200 methods across SPMI. If you know of specific cases that should be handled in BCL / etc let me know and I'll verify. |
|
Also want to see if IV opts can now understand the IV does not overflow in the fast path. |
There was a problem hiding this comment.
Pull request overview
This PR extends CoreCLR JIT loop cloning’s ability to reason about loop limits and spans by (1) modeling limitBase ± const limits as a base + offset and (2) enabling span-based loop cloning for non-unit strides with an added overflow safety guard. It also adds new JIT tests that exercise non-unit stride span loops and offset limits like Length - K.
Changes:
- Add
NaturalLoopIterInfo::LimitOffsetplusLimitBase()to represent and consumebase ± constloop limits. - Extend
LC_Identto carry anoffsetforVarandArrAccess, and plumb it through condition generation. - Update loop cloning condition derivation to support span stride > 1 (increasing loops) with an overflow bound, and add a guard for negative
arr.Length + offsetlimits; add new tests for these scenarios.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/compiler.h | Adds LimitOffset and declares LimitBase() on NaturalLoopIterInfo. |
| src/coreclr/jit/flowgraph.cpp | Implements limit peeling into LimitOffset, adds debug printing, and implements LimitBase(); updates limit accessors to use it. |
| src/coreclr/jit/loopcloning.h | Extends LC_Ident with offset and updates equality/printing/constructors. |
| src/coreclr/jit/loopcloning.cpp | Materializes LC_Ident offsets, enables span stride>1 cloning with an overflow guard, and threads LimitOffset into emitted conditions. |
| src/tests/JIT/opt/Cloning/SpanNonUnitStride.csproj | New JIT test project for span non-unit stride cloning scenarios. |
| src/tests/JIT/opt/Cloning/SpanNonUnitStride.cs | New tests covering span loops with stride 2/3 and various comparison forms. |
| src/tests/JIT/opt/Cloning/OffsetLimit.csproj | New JIT test project for limit-offset recognition scenarios. |
| src/tests/JIT/opt/Cloning/OffsetLimit.cs | New tests covering Length - K, SIMD-count offsets, and related limit forms. |
Morph normally folds `x + 0` / `x - 0` before MatchLimit runs, but if such a tree slips through the peel would set HasArrayLengthLimit / HasInvariantLocalLimit based on the peeled base while LimitOffset stayed 0 -- then LimitBase() would short-circuit to the original GT_ADD tree and trip the GT_ARR_LENGTH / GT_LCL_VAR asserts in ArrLenLimit / VarLimit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Will this also work for cases like: for (int i = 0; i >= 0 && i < arr.Length; i += K) { ... }
// or its functional equivalent
for (int i = 0; (uint)i < (uint)arr.Length; i += K) { ... }In both cases, overflow can happen but the guard check ensures that |
|
We already handled arrays with K less than 58 because the max array length limits limit when overflow can happen (I'll double check for your cases just to be sure). This PR enables something similar for span where there aren't those same guarantees. (will edit the commit comment). |
|
Note AI-generated comment. Prototyped the IV no-overflow analysis as a follow-up (draft branch |
|
Turns out we can't yet handle because the loop has two exit edges when we analyze it for cloning. Do you think this pattern is common? The other case is handled provided K <= 57. I suppose we could now extend it (and arrays in general to larger K with extra cloning checks like we're doing here for spans). |
I will do that as a follow-on PR. |
|
Just to be clear, I don't think this is pressing for this PR, more just an interest as to what we are and aren't covering at this point.
I'm not sure exactly how common it is, but it's one of the patterns that I've seen users write for SIMD code. In general I'd prefer if users didn't have to write stuff like But, due to historical JIT pessimizations, we have a lot of our own code and have pushed users towards the casting pattern instead. I think we even have a pass that transforms the |
tannergooding
left a comment
There was a problem hiding this comment.
Changes look good/correct to me. Left a small comment about a potential future opt (not a priority) and a question about whether a given handling path is necessary since morph should canonicalize most trees.
|
More like 300 methods impacted. More cloning, more "unprofitable rejected" cloning. |
Morph canonicalizes commutative ops to put any constant on op2 and folds base + 0, so the cns + base branch in the limit-offset peel is unreachable. Remove it and update the comment to cite the morph invariants we're relying on. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| @@ -1401,12 +1465,14 @@ bool Compiler::optDeriveLoopCloningConditions(FlowGraphNaturalLoop* loop, LoopCl | |||
| if (isIncreasingLoop) | |||
| { | |||
| // For increasing loop, thelimit value needs to be checked against the array length | |||
| int want = ExpectedIncLe(0, n - 5, 2, a); | ||
| Assert.Equal(want, got); | ||
| } | ||
| } |
Enable cloning of loops over Span when the stride
sis greater than 1, guarded by a runtimelimit <= INT_MAX - s + 1condition on the increasing var-limit case. Other forms are already implicitly safe.Extend MatchLimit to peel a constant offset off the limit so loops like
become clonable. Add a LimitOffset to NaturalLoopIterInfo that feeds into LC_Ident, the zero-trip / per-access / NE / overflow guards, and a new
arr.Length + offset >= 0guard (for very short array lengths and negative offsets).