Skip to content

[SPARK-57171][SQL] Simplify Slice codegen by extracting index arithmetic into a static Java helper#56221

Open
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-slice-codegen
Open

[SPARK-57171][SQL] Simplify Slice codegen by extracting index arithmetic into a static Java helper#56221
gengliangwang wants to merge 1 commit into
apache:masterfrom
gengliangwang:spark-slice-codegen

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Add ArrayExpressionUtils.sliceStartIndex(int start, int numElements, String functionName) and ArrayExpressionUtils.sliceLength(int length, int numElements, int startIdx, String functionName), and route Slice's codegen through them.

Slice.doGenCode previously emitted ~17 lines of inline, element-type-independent index arithmetic (1-based -> 0-based start resolution, the start == 0 / length < 0 validations, and the result-length clamp). It now emits two helper calls. The eval path reuses sliceStartIndex for the shared start resolution.

Unlike the earlier SPARK-56908 sub-tasks, this is neither ANSI-specific nor a try/catch wrapper -- it is a plain, type-independent block of generated logic, which is exactly the kind of boilerplate the umbrella aims to deduplicate into static Java helpers.

Why are the changes needed?

Part of SPARK-56908 (umbrella). Moving the fixed index arithmetic out of the generated Java shrinks the per-stage source for every plan that uses slice, helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes. The codegen path keeps its existing result-length clamp (sliceLength); the eval path keeps its existing data.slice(...) length handling unchanged.

How was this patch tested?

build/sbt "catalyst/testOnly *CollectionExpressionsSuite"

59/59 pass, including Slice (exercised both with and without whole-stage codegen).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

…tic into a static Java helper

### What changes were proposed in this pull request?

Add `ArrayExpressionUtils.sliceStartIndex(int start, int numElements, String
functionName)` and `sliceLength(int length, int numElements, int startIdx,
String functionName)`, and route `Slice`'s codegen through them.
`Slice.doGenCode` previously emitted ~17 lines of inline, element-type-independent
index arithmetic (1-based -> 0-based start resolution, the `start == 0` /
`length < 0` validations, and the result-length clamp). It now emits two helper
calls. The eval path reuses `sliceStartIndex` for the shared start resolution.

Unlike the existing SPARK-56908 sub-tasks, this is neither ANSI-specific nor a
try/catch wrapper -- it is a plain, type-independent block of generated logic,
which is exactly the kind of boilerplate the umbrella aims to deduplicate into
static Java helpers.

### Why are the changes needed?

Part of SPARK-56908 (umbrella). Moving the fixed index arithmetic out of the
generated Java shrinks the per-stage source for every plan that uses `slice`.

### Does this PR introduce _any_ user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.
The codegen path keeps its existing result-length clamp (`sliceLength`); the eval
path keeps its existing `data.slice(...)` length handling unchanged.

### How was this patch tested?

```
build/sbt "catalyst/testOnly *CollectionExpressionsSuite"
```

59/59 pass, including `Slice` (exercised both with and without whole-stage codegen).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant