[IOTDB-17797] Support lateral column aliases in table SELECT list #17960

Open
DaZuiZui wants to merge 6 commits into apache:master from DaZuiZui:feat/select-list-lca
@DaZuiZui

Contributor

Description

Support lateral column aliases in table SELECT lists

This PR implements Part 2 of #17797 for the table-model analyzer: later SingleColumn SELECT items can reference aliases explicitly defined by earlier SingleColumn items in the same SELECT list.

Examples now supported:

SELECT s1 AS x, x + 1 AS y FROM table1;
SELECT s1 AS x, x + 1 AS y FROM table1 ORDER BY y;
SELECT s1 AS x, row_number() OVER (ORDER BY x) AS rn FROM table1;
SELECT s1 AS x, avg(s2) OVER (PARTITION BY x) AS a FROM table1;

Name-resolution behavior

The LCA rewrite is left-to-right and only applies to unqualified identifiers in later SELECT items. It does not rewrite qualified references such as t.x, dereference expressions such as x.y, or identifiers inside subqueries.
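A minimal sketch of the left-to-right rewrite described above, using a toy expression tree rather than IoTDB's real AST. All class and method names here (`LcaExpr`, `LcaIdent`, `LcaDeref`, `LcaBinary`, `LcaSketch`) are hypothetical and exist only for illustration. Source-column priority and subquery handling are deliberately left out to keep the example small.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy expression tree; illustrative only, not IoTDB's AST. */
abstract class LcaExpr {}

/** Unqualified identifier such as x. */
final class LcaIdent extends LcaExpr {
  final String name;
  LcaIdent(String name) { this.name = name; }
  @Override public String toString() { return name; }
}

/** Qualified reference such as t.x; never rewritten. */
final class LcaDeref extends LcaExpr {
  final String base;
  final String field;
  LcaDeref(String base, String field) { this.base = base; this.field = field; }
  @Override public String toString() { return base + "." + field; }
}

/** Binary operation such as x + 1. */
final class LcaBinary extends LcaExpr {
  final String op;
  final LcaExpr left;
  final LcaExpr right;
  LcaBinary(String op, LcaExpr left, LcaExpr right) {
    this.op = op; this.left = left; this.right = right;
  }
  @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

final class LcaSketch {
  /** Returns a rewritten copy; the input tree is never mutated. */
  static LcaExpr rewrite(LcaExpr e, Map<String, LcaExpr> visible) {
    if (e instanceof LcaIdent) {
      LcaExpr target = visible.get(((LcaIdent) e).name);
      return target != null ? target : e;
    }
    if (e instanceof LcaBinary) {
      LcaBinary b = (LcaBinary) e;
      return new LcaBinary(b.op, rewrite(b.left, visible), rewrite(b.right, visible));
    }
    return e; // qualified references and anything else are left untouched
  }

  /**
   * Walks SELECT items (alias -> expression) left to right. An alias becomes
   * visible only after its own item has been rewritten, so it can be seen by
   * later items but never by itself or by earlier ones.
   */
  static Map<String, LcaExpr> analyze(Map<String, LcaExpr> selectItems) {
    Map<String, LcaExpr> visible = new LinkedHashMap<>();
    for (Map.Entry<String, LcaExpr> item : selectItems.entrySet()) {
      visible.put(item.getKey(), rewrite(item.getValue(), visible));
    }
    return visible;
  }
}
```

With items `s1 AS x`, `x + s2 AS y`, `y * t.x AS z`, this sketch expands `y` to `(s1 + s2)` and `z` to `((s1 + s2) * t.x)`: the alias chain is followed, while the qualified `t.x` is kept as written.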

Resolution order is:

  1. local source column
  2. visible previous SELECT aliases
  3. existing analyzer resolution

If multiple previous aliases with the same canonical name are visible and no local source column wins, the analyzer raises a clear ambiguity error. AllColumns and COLUMNS(...) do not register reusable LCA aliases.
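The resolution order and the ambiguity rule can be sketched roughly as follows. `AliasResolutionSketch` is a hypothetical class, not the resolver added by this PR, and the lower-casing in `canonical` is an assumption that ignores delimited identifiers. Results are returned as tagged strings purely so the chosen branch is easy to see.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver illustrating the three-step order; not the PR's actual class. */
final class AliasResolutionSketch {
  // Canonical names of the columns provided by the FROM clause.
  private final Set<String> sourceColumns;
  // Canonical alias name -> expressions of every earlier SELECT item that defined it.
  private final Map<String, List<String>> aliases = new HashMap<>();

  AliasResolutionSketch(Set<String> sourceColumns) {
    this.sourceColumns = sourceColumns;
  }

  // Assumption: canonical form is lower case, as for undelimited identifiers.
  private static String canonical(String name) {
    return name.toLowerCase(Locale.ROOT);
  }

  void registerAlias(String alias, String expression) {
    aliases.computeIfAbsent(canonical(alias), k -> new ArrayList<>()).add(expression);
  }

  String resolve(String identifier) {
    String key = canonical(identifier);
    if (sourceColumns.contains(key)) {
      return "column:" + key; // 1. a local source column always wins
    }
    List<String> candidates = aliases.get(key);
    if (candidates != null) {
      if (candidates.size() > 1) {
        // Several earlier aliases share the canonical name and no column won.
        throw new IllegalStateException("Column alias '" + identifier + "' is ambiguous");
      }
      return "alias:" + candidates.get(0); // 2. exactly one visible earlier alias
    }
    return "fallback:" + key; // 3. defer to the existing analyzer resolution
  }
}
```

Note that an alias which shadows a source column name (for example `s1 + 1 AS s2` when `s2` is also a column) never triggers the ambiguity error in this sketch, because step 1 returns before the alias list is consulted.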

GROUP BY alias reuse uses the LCA-rewritten SELECT expression, while ORDER BY keeps the existing output-alias precedence. WHERE and HAVING still do not see SELECT aliases.

For CAST(value AS type), only value participates in LCA rewriting. Type names are not treated as alias references.

Implementation notes

The original AST is kept unchanged. The analyzer records each SingleColumn's semantic expression after LCA rewriting and uses it for type analysis, source-column tracking, output expression analysis, and GROUP BY alias reuse. Output field names without explicit aliases are still derived from the original SELECT item text rather than from the rewritten expression.

Inline window specifications in later SELECT items are rewritten and registered against the rewritten FunctionCall nodes before expression analysis. Referencing an alias whose expression contains a window function is rejected explicitly.
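As a rough illustration of the window-alias rejection only, the check amounts to a recursive scan of the alias's defining expression. `WNode` and `WindowAliasCheck` are hypothetical names, the tree is a toy stand-in for the analyzer's AST, and the error message is invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Toy node: an identifier or call with children; not IoTDB's AST. */
final class WNode {
  final String label;
  final boolean hasOverClause; // true for calls such as row_number() OVER (...)
  final List<WNode> children;

  WNode(String label, boolean hasOverClause, WNode... children) {
    this.label = label;
    this.hasOverClause = hasOverClause;
    this.children = Arrays.asList(children);
  }
}

final class WindowAliasCheck {
  /** True if the node or any descendant is a window function call. */
  static boolean containsWindow(WNode node) {
    if (node.hasOverClause) {
      return true;
    }
    for (WNode child : node.children) {
      if (containsWindow(child)) {
        return true;
      }
    }
    return false;
  }

  /** Throws if a later SELECT item tries to reuse an alias defined over a window function. */
  static void checkReusable(String alias, WNode definition) {
    if (containsWindow(definition)) {
      throw new IllegalArgumentException(
          "Alias '" + alias + "' is defined by a window function and cannot be referenced");
    }
  }
}
```

Under this sketch, `row_number() OVER (...) + 1 AS rn` makes `rn` non-reusable, because the scan finds the window call nested inside the addition.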

Fixes #17797


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the code coverage threshold is met.

Key changed/added classes (or packages if there are too many classes) in this PR
  • StatementAnalyzer: SELECT-list LCA rewrite, rewritten SELECT expression tracking, output-scope name handling, and window metadata registration for rewritten expressions.
  • ExpressionRewriter / ExpressionTreeRewriter: support for leaf expression rewrite hooks used by LCA expression copying.
  • SelectAliasReuseTest: coverage for LCA chaining, collisions, ambiguity, aggregates, GROUP BY/ORDER BY reuse, window specs, CAST type-name collisions, WHERE/HAVING isolation, and output field names.
  • relational/analyzer/README.md: documented table-model SELECT alias resolution rules.

Tests

./mvnw spotless:apply -pl iotdb-core/datanode
./mvnw test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest

@DaZuiZui

Contributor Author

I have completed the implementation for Issue #17797 Part 2: supporting lateral column aliases in the table-model SELECT list.

The PR is available here:

#17960

This PR adds left-to-right SELECT-list alias resolution, keeps local input columns at higher priority than SELECT aliases, avoids rewriting qualified expressions and subqueries, preserves WHERE/HAVING semantics, and keeps the existing ORDER BY alias behavior. It also includes analyzer tests and a plan-level test to ensure reused aliases do not cause duplicated projection computation.

I have verified the changes with:

./mvnw spotless:apply -pl iotdb-core/datanode
./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest

The PR is ready for review. It will close #17797 after being merged.

outputPosition++;
}
} else {
SelectAliasLookup visibleAliases = visibleAliasBuilder.build();

Contributor

Nit / follow-up suggestion: this builds a fresh immutable SelectAliasLookup for every SingleColumn. Since the SelectAliasLookup constructor copies the whole current alias map and each alias list, analyzing a very wide SELECT list with many reusable aliases ends up doing repeated prefix copies, i.e. roughly O(N^2) alias-map copying.

This is probably fine for normal SELECT lists, but it may be worth avoiding the repeated snapshots by using an incremental/shared immutable lookup structure, or by exposing a read-only lookup view over the builder whose visible state grows left-to-right.
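To make the second option concrete, here is a minimal sketch of a read-only view over a growing builder, next to the deep-copy snapshot it would replace. `AliasLookupView` and `GrowingAliasBuilder` are hypothetical names, not the PR's classes, and alias expressions are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Read-only lookup shared by the live view and the snapshot (hypothetical name). */
interface AliasLookupView {
  List<String> lookup(String canonicalName);
}

final class GrowingAliasBuilder {
  private final Map<String, List<String>> aliases = new HashMap<>();

  void add(String canonicalName, String expression) {
    aliases.computeIfAbsent(canonicalName, k -> new ArrayList<>()).add(expression);
  }

  /** O(1): a live, read-only window onto the builder; later add() calls show through. */
  AliasLookupView view() {
    return name -> {
      List<String> found = aliases.get(name);
      return found == null
          ? Collections.<String>emptyList()
          : Collections.unmodifiableList(found);
    };
  }

  /** O(total aliases): a deep copy, intended to be taken once after the SELECT list is walked. */
  AliasLookupView snapshot() {
    Map<String, List<String>> copy = new HashMap<>();
    for (Map.Entry<String, List<String>> e : aliases.entrySet()) {
      copy.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
    }
    return name -> copy.getOrDefault(name, Collections.<String>emptyList());
  }
}
```

Calling `view()` once and reusing it across all N SELECT items costs O(N) in total, whereas calling `snapshot()` per item copies a prefix of growing size each time, which is where the roughly O(N^2) behaviour comes from. The trade-off is that the view is only safe while the caller controls when `add()` runs, as in a single left-to-right walk.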

Could you also add IT/regression coverage for this area? In particular, it would be useful to cover a wide SELECT-list LCA chain/reuse case, plus the edge cases around delimited/quoted alias case sensitivity, named WINDOW definitions not seeing SELECT aliases, DISTINCT with LCA and no ORDER BY, and LCA references in window frame bounds.

Contributor Author

Thanks for the suggestion. Addressed in 3bd3a8d: SELECT-list LCA rewriting now reads from a builder-backed SelectAliasResolver while walking the SELECT list, so it avoids rebuilding immutable snapshots for each SingleColumn and only snapshots once for later clauses. I also added regression coverage for a wide SELECT-list LCA chain, delimited alias case sensitivity, named WINDOW definitions not seeing SELECT aliases, DISTINCT with LCA and no ORDER BY, and LCA references in window frame bounds.

Verified with:

./mvnw spotless:apply -pl iotdb-core/datanode
./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest

@DaZuiZui

Contributor Author

Hi @JackieTien97,

Thanks for the careful review and helpful suggestions.

I have addressed the latest review comments in PR #17960. The SELECT-list LCA alias lookup now avoids rebuilding immutable snapshots for each SingleColumn, and I added regression coverage for the wide SELECT-list LCA chain, delimited alias case sensitivity, named WINDOW definitions, DISTINCT without ORDER BY, and LCA references in window frame bounds.

I verified the changes with:

./mvnw spotless:apply -pl iotdb-core/datanode
./mvnw -nsu test -pl iotdb-core/datanode -Dtest=SelectAliasReuseTest

Thanks again for the review.

Development

Successfully merging this pull request may close these issues.

[Feature] Allow referencing SELECT column aliases in GROUP BY / ORDER BY (and support lateral column alias in SELECT)

2 participants