[SPARK-56125][SQL] Simplify schema calculation for Merge Into Schema Evolution by szehon-ho · Pull Request #54934 · apache/spark

szehon-ho · 2026-03-21T00:54:19Z

What changes were proposed in this pull request?

Replace 'sourceSchemaForSchemaEvolution' with a simpler 'pendingChanges'. Also reduce path-based comparisons where possible (that is, where the resolved type from the target/source is already available).

Why are the changes needed?

This was suggested by @aokolnychyi after the initial PR was merged. The 'sourceSchemaForSchemaEvolution' is confusing: it is supposed to be a view of the source schema, pruned to the fields actually referenced by the MERGE INTO statement. The subsequent logic compares it with the target table schema, but the concept is hard to explain.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes, Cursor.


szehon-ho · 2026-03-21T01:45:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala

+   * @param root type schema
+   * @param path name segments
+   */
+  def fieldExistsAtPath(


Unfortunately this is still needed, but only for the top-level unresolved reference case.

xiaoxuandev · 2026-03-22T20:25:51Z

sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala

+  private def fieldExistsAtPathInternal(
+      dt: DataType,
+      parts: Seq[String]): Boolean = {
+    def checkAndRecurse(


This looks correct.
nit: checkAndRecurse seems unnecessary. Can we inline the logic? Also, can we consider rewriting the recursion using pattern matching on parts, so the base case is handled in one place?
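To make the suggestion concrete, here is a rough sketch of what matching on the (type, remaining path) pair could look like. It is not the PR's code: the type definitions are toy stand-ins for Spark's DataType hierarchy so the snippet runs on its own, the object name PathLookup is hypothetical, names are compared with plain case-sensitive equality instead of a resolver, and treating an empty path as "exists" is an assumption about the intended base case.

```scala
// Toy stand-ins for Spark's DataType hierarchy (assumption: simplified for illustration).
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

object PathLookup {
  // Hypothetical rewrite: every case, including the base case, is one arm of a single match.
  def fieldExistsAtPath(dt: DataType, parts: Seq[String]): Boolean = (dt, parts) match {
    // Base case (assumed): the whole path has been consumed.
    case (_, Seq()) => true
    // Struct: look up the next segment by name, then continue with its type.
    case (StructType(fields), head +: tail) =>
      fields.find(_.name == head) match {
        case Some(field) => fieldExistsAtPath(field.dataType, tail)
        case None => false
      }
    // Array and map: only the synthetic `element` / `key` / `value` segments are valid.
    case (ArrayType(elementType), "element" +: tail) => fieldExistsAtPath(elementType, tail)
    case (MapType(keyType, _), "key" +: tail) => fieldExistsAtPath(keyType, tail)
    case (MapType(_, valueType), "value" +: tail) => fieldExistsAtPath(valueType, tail)
    // Anything else: the path walks into a leaf type or uses an unknown segment.
    case _ => false
  }
}
```

With this shape there is no inner helper, and the fall-through arm covers every mismatch between the path and the type.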

xiaoxuandev · 2026-03-22T20:26:52Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala

+   * @param valueType type of the assignment value at this path (typically source column)
+   * @param changes accumulator for [[TableChange]] instances
+   * @param fieldPath qualified path segments for nested columns (`element` / `key` / `value`
+   *                  under arrays and mapss)


typo? mapss → maps

xiaoxuandev · 2026-03-22T20:27:19Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala

+    val changes = mutable.LinkedHashSet.empty[TableChange]
+    val failIncompatible: () => Nothing = () =>
+      throw QueryExecutionErrors.failedToMergeIncompatibleSchemasError(
+        originalTarget, originalSource, null)


nit: failIncompatible passes null as the cause, so the error only shows the full target/source schemas with no hint about which field path actually conflicts. Since fieldPath, keyType, and valueType are already available at the call site, should we include them in the exception? That would make debugging much easier for deeply nested schemas.
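One possible shape, sketched with stand-in types so it runs outside Spark: instead of passing null, build a cause that names the conflicting path and the two types. IncompatibleSchemaException, MergeErrors, and the string-typed parameters are all hypothetical placeholders, not Spark's actual error API; the real change would go through QueryExecutionErrors.

```scala
// Hypothetical stand-in for the exception raised on incompatible schemas.
final class IncompatibleSchemaException(msg: String, cause: Throwable)
  extends RuntimeException(msg, cause)

object MergeErrors {
  // Sketch: carry the conflicting path and both types in the cause instead of null.
  // All parameters are plain strings here (assumption, to keep the example standalone).
  def failIncompatible(
      targetSchema: String,
      sourceSchema: String,
      fieldPath: Seq[String],
      keyType: String,
      valueType: String): Nothing = {
    val detail = new IllegalArgumentException(
      s"Conflict at '${fieldPath.mkString(".")}': key type $keyType, value type $valueType")
    throw new IncompatibleSchemaException(
      s"Failed to merge incompatible schemas $targetSchema and $sourceSchema", detail)
  }
}
```

The top-level message stays the same, so existing error matching would not change; the path detail is only visible through the cause.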

szehon-ho · 2026-04-01T00:23:34Z

Actually, I will close this PR: #54488 adds another dependency on the method computeSchemaChanges() (it is now called by both MERGE INTO and INSERT), so we can't refactor it like this. I may target a smaller refactor later.

szehon-ho · 2026-04-01T01:57:11Z

Closing in favor of #55124.

[SPARK-56125][SQL] Simplify schema calculation code for Merge Into Sc…

4ba9dba

…hema Evolution

szehon-ho changed the title ~~[SPARK-56125][SQL] Simplify schema calculation code for Merge Into Schema Evolution~~ [SPARK-56125][SQL] Simplify schema calculation for Merge Into Schema Evolution Mar 21, 2026

Some more cleanup

58e3283


szehon-ho closed this Apr 1, 2026

szehon-ho wants to merge 2 commits into apache:master from szehon-ho:SPARK-55690-merge-pending-schema-changes-refactor
