[AURON #1891] Implement randn() function #1938
robreeves wants to merge 20 commits into apache:master
Pull request overview
This PR implements the randn() function to improve Spark function coverage in Auron. The function generates random values from a standard normal distribution with optional seed support.
Changes:
- Added Rust implementation of the `spark_randn` function with seed handling
- Registered the new function in the Scala converter and Rust function registry
- Added `rand_distr` dependency for normal distribution sampling
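To illustrate what "seed handling" means for a function like this, here is a minimal, std-only sketch of seeded standard-normal sampling. It is not the code in this PR: the PR uses the `rand` and `rand_distr` crates, whereas this sketch swaps in a hand-written xorshift generator and the Marsaglia polar method so it has no dependencies. The names `XorShift64` and `randn_column` are hypothetical, and the output is not bit-compatible with Spark. Mixing the partition id into the seed is an assumption modeled on how Spark seeds its per-partition generator.

```rust
/// Small 64-bit xorshift generator (illustrative only; not Spark's XORShiftRandom).
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Scramble the seed with a SplitMix64-style step so that small seeds
        // (including 0) still produce a well-mixed, nonzero starting state.
        let mut z = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^= z >> 31;
        Self { state: if z == 0 { 1 } else { z } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits of the next output.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }

    /// One standard-normal sample via the Marsaglia polar method.
    pub fn next_gaussian(&mut self) -> f64 {
        loop {
            let u = 2.0 * self.next_f64() - 1.0;
            let v = 2.0 * self.next_f64() - 1.0;
            let s = u * u + v * v;
            // Reject points outside the unit circle (and the origin).
            if s > 0.0 && s < 1.0 {
                return u * (-2.0 * s.ln() / s).sqrt();
            }
        }
    }
}

/// Hypothetical helper: one normal sample per row for a single partition.
/// Assumption: the effective seed is `seed + partition_id`.
pub fn randn_column(seed: u64, partition_id: u64, num_rows: usize) -> Vec<f64> {
    let mut rng = XorShift64::new(seed.wrapping_add(partition_id));
    (0..num_rows).map(|_| rng.next_gaussian()).collect()
}
```

Calling `randn_column(42, 0, n)` twice returns identical vectors, which is the reproducibility property an explicit seed is meant to provide; a different seed or partition id yields a different sequence.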
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| spark-extension/src/main/scala/org/apache/spark/sql/auron/NativeConverters.scala | Added case handler for Randn expression to route to native implementation |
| native-engine/datafusion-ext-functions/src/spark_randn.rs | New implementation of randn function with seed handling and unit tests |
| native-engine/datafusion-ext-functions/src/lib.rs | Registered Spark_Randn function in the extension function factory |
| native-engine/datafusion-ext-functions/Cargo.toml | Added rand and rand_distr dependencies |
| Cargo.toml | Added rand_distr workspace dependency |
| Cargo.lock | Updated lock file with rand_distr package metadata |
Resolve conflicts between randn and spark_partition_id features:
- Proto: spark_partition_id_expr at 20101, randn_expr at 20102
- Planner: include both expression handlers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@robreeves Nice work! LGTM.
Resolved conflicts by assigning separate IDs to randn and monotonically_increasing_id:
- MonotonicIncreasingIdExprNode: ID 20102
- RandnExprNode: ID 20103

Both expressions are now supported in the proto definitions and planner.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add test to AuronFunctionSuite to verify randn functionality with seeds. The test validates that Auron's native randn implementation produces the same reproducible results as Spark's baseline when using explicit seeds.

Test covers:
- randn with seed 42
- randn with seed 100
- Validates against Spark baseline using checkSparkAnswerAndOperator

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
I added a |
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@richox can you run the PR checks again?
@cxzl25 can you run the PR checks? Thanks
ShreyeshArangath left a comment
Changes LGTM, just one comment about the naming of this Rust function.
…ntion Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…nces Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Which issue does this PR close?
Closes #1891
Rationale for this change
This improves function coverage in Auron by creating a native randn implementation.
What changes are included in this PR?
Adds a native randn implementation.
Are there any user-facing changes?
Yes, it adds the randn function.
How was this patch tested?
Added unit tests and manually tested in spark-shell.
Output: