feat(python/sedonadb): add aggregate expression methods on Expr #882
jiayuasu wants to merge 1 commit
First step toward grouping/aggregation (per the reprioritization on apache#791). Adds aggregate builder methods to `Expr`:

- `col("x").sum()` -> `Expr(sum(x))`
- `col("x").count()` -> `Expr(count(x))`
- `col("x").mean()` -> `Expr(avg(x))`
- `col("x").min()` -> `Expr(min(x))`
- `col("x").max()` -> `Expr(max(x))`

These follow the operator-method pattern already on `Expr` (matching Polars / PySpark) rather than free functions. `mean` maps to DataFusion's `avg` aggregate but is named to match the pandas/Polars vocabulary. Each wraps the corresponding helper from `datafusion::functions_aggregate::expr_fn`.

The resulting `Expr` is an aggregate AST node, valid only inside an aggregation context, which arrives in the follow-up `DataFrame.agg` / `group_by().agg()` PRs. This PR ships the builders and repr-level tests in isolation, the same way the original `Expr` foundation (apache#807) landed before DataFrame integration existed.

Tests: parametrized exact-repr check for all five aggregates plus an aggregate-over-compound-expression case.
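The builder pattern described above can be illustrated with a small pure-Python stand-in. This is only a sketch: the `Expr` class, the `_aggregate` helper, and `col` below are hypothetical toys written for illustration, not sedonadb's actual implementation (which, per the description, delegates to DataFusion through PyO3). It shows how method-style builders can return a new expression node whose `repr()` names the aggregate, including the `mean` to `avg` mapping.

```python
from __future__ import annotations


class Expr:
    """Toy stand-in for an expression node; NOT sedonadb's real Expr."""

    def __init__(self, sql: str) -> None:
        self._sql = sql

    def __repr__(self) -> str:
        return f"Expr({self._sql})"

    def __add__(self, other: Expr) -> Expr:
        # Operator method: builds a compound (non-aggregate) expression.
        return Expr(f"{self._sql} + {other._sql}")

    def _aggregate(self, func: str) -> Expr:
        # Hypothetical helper: wrap this expression in an aggregate call.
        return Expr(f"{func}({self._sql})")

    def sum(self) -> Expr:
        return self._aggregate("sum")

    def count(self) -> Expr:
        return self._aggregate("count")

    def mean(self) -> Expr:
        # Named for the pandas/Polars vocabulary; the node itself is `avg`.
        return self._aggregate("avg")

    def min(self) -> Expr:
        return self._aggregate("min")

    def max(self) -> Expr:
        return self._aggregate("max")


def col(name: str) -> Expr:
    """Toy column reference."""
    return Expr(name)


print(repr(col("x").mean()))              # Expr(avg(x))
print(repr((col("x") + col("y")).sum()))  # Expr(sum(x + y))
```

Because each builder returns a fresh node rather than evaluating anything, the result can be inspected and tested through `repr()` alone, which is why the builders can ship before any DataFrame integration exists.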
Pull request overview
Adds aggregate expression builder methods (sum, count, mean, min, max) to the Python Expr type, wrapping DataFusion's aggregate function helpers. This is a small, isolated step toward the upcoming DataFrame.agg / group_by().agg() integration tracked in #791.
Changes:
- Rust `PyExpr` gains five aggregate methods that wrap `datafusion::functions_aggregate::expr_fn` helpers.
- Python `Expr` exposes corresponding methods with docstring examples; `mean` maps to DataFusion's `avg`.
- Tests verify `repr()` output for each aggregate and for an aggregate over a compound expression.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| python/sedonadb/src/expr.rs | Adds sum/count/mean/min/max PyO3 methods on PyExpr delegating to DataFusion aggregate helpers. |
| python/sedonadb/python/sedonadb/expr/expression.py | Adds the Python-side Expr aggregate methods with docstring examples. |
| python/sedonadb/tests/expr/test_expression.py | Parametrized repr tests for the five aggregates plus a compound-expression case. |
Closing in favor of the approach in #885 — exposing scalar and aggregate UDFs through a registry-walking
Once #885 lands I'll build
First step toward grouping/aggregation, per the reprioritization recorded on #791 (aggregation/join/UDFs ahead of the remaining small schema ops).
What's new
Aggregate builder methods on `Expr`: `sum`, `count`, `mean`, `min`, `max`.

- Methods on `Expr` (matching Polars / PySpark) rather than free functions, consistent with the operator-method pattern already on `Expr`.
- `mean` maps to DataFusion's `avg` aggregate but is named for the pandas/Polars vocabulary.
- Each wraps the corresponding helper from `datafusion::functions_aggregate::expr_fn`.

Scope
This PR ships only the expression builders. The resulting `Expr` is an aggregate AST node, valid only inside an aggregation context (`DataFrame.agg(...)` / `group_by().agg()`), which comes in follow-up PRs. Shipping the builders in isolation mirrors how the original `Expr` foundation (#807) landed before any DataFrame integration existed; tests are repr-level.

Planned follow-ups:

- `df.agg(*exprs)`: global (ungrouped) aggregation.
- `df.group_by(*keys).agg(*exprs)`: the `GroupedDataFrame` layer.

Test plan

6 tests in `tests/expr/test_expression.py`:

- Parametrized exact `repr()` check for all five aggregates.
- Aggregate over a compound expression (`(col("x") + col("y")).sum()` → `Expr(sum(x + y))`).

Local: 54 expression tests + 15 doctests + `ruff format` + `ruff check` all clean.
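The shape of the repr-level test plan above can be sketched as a table-driven check. Everything here is an assumption for illustration: the toy `Expr`, `col`, `_make_agg`, and `check_aggregate_reprs` are hypothetical and exist only so the example runs on its own; the real tests live in `tests/expr/test_expression.py` and exercise sedonadb's actual `Expr`. In a pytest suite, a table like `CASES` would typically feed `pytest.mark.parametrize`.

```python
class Expr:
    """Minimal toy node so the checks run standalone; not sedonadb's Expr."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"Expr({self.text})"

    def __add__(self, other):
        return Expr(f"{self.text} + {other.text}")


def _make_agg(func):
    # Hypothetical factory: builds a method that wraps `self` in `func(...)`.
    def method(self):
        return Expr(f"{func}({self.text})")
    return method


# Python-facing method name -> aggregate name shown in the repr.
for _name, _func in [("sum", "sum"), ("count", "count"), ("mean", "avg"),
                     ("min", "min"), ("max", "max")]:
    setattr(Expr, _name, _make_agg(_func))


def col(name):
    return Expr(name)


# One row per aggregate: (method name, exact expected repr).
CASES = [
    ("sum", "Expr(sum(x))"),
    ("count", "Expr(count(x))"),
    ("mean", "Expr(avg(x))"),
    ("min", "Expr(min(x))"),
    ("max", "Expr(max(x))"),
]


def check_aggregate_reprs():
    """Return a list of (label, actual, expected) for every mismatch."""
    failures = []
    for method, expected in CASES:
        actual = repr(getattr(col("x"), method)())
        if actual != expected:
            failures.append((method, actual, expected))
    # Sixth check: an aggregate over a compound expression.
    actual = repr((col("x") + col("y")).sum())
    if actual != "Expr(sum(x + y))":
        failures.append(("compound", actual, "Expr(sum(x + y))"))
    return failures


print(check_aggregate_reprs())  # [] when every repr matches
```

Comparing exact `repr()` strings keeps the tests independent of any execution engine, at the cost of coupling them to the display format of the underlying expression.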