[SPARK-56177][SQL] V2 file bucketing write support by LuciferYang · Pull Request #55128 · apache/spark

LuciferYang · 2026-04-01T03:54:47Z

What changes were proposed in this pull request?

Enable bucketed writes for V2 file tables via catalog BucketSpec.

Changes:

- FileWrite: add bucketSpec field, use V1WritesUtils.getWriterBucketSpec() instead of hardcoded None
- FileTable.createFileWriteBuilder: extract catalogTable.bucketSpec and pass to the write pipeline
- FileDataSourceV2.getTable: use collect to skip BucketTransform (handled via catalogTable.bucketSpec)
- FileWriterFactory: use DynamicPartitionDataConcurrentWriter for bucketed writes since V2's RequiresDistributionAndOrdering cannot express hash-based ordering
- All 6 format Write/Table classes (Parquet, ORC, CSV, JSON, Text, Avro) updated with BucketSpec parameter
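
The `collect` change in FileDataSourceV2.getTable can be illustrated with a minimal, self-contained sketch. The `Transform`, `IdentityTransform`, and `BucketTransform` types below are hypothetical stand-ins for Spark's connector expressions, not the real classes, and `identityColumns` is an illustrative helper name rather than code from this patch.

```scala
// Hypothetical, simplified stand-ins for Spark's connector transform expressions.
sealed trait Transform
final case class IdentityTransform(column: String) extends Transform
final case class BucketTransform(numBuckets: Int, columns: Seq[String]) extends Transform

object PartitionColumns {
  // `collect` applies a partial function and drops every element the function
  // is not defined for, so a BucketTransform is skipped instead of having to be
  // handled (or rejected) by a catch-all case.
  def identityColumns(partitioning: Seq[Transform]): Seq[String] =
    partitioning.collect { case IdentityTransform(column) => column }
}
```

Under these assumptions, a partitioning of `year`, a 4-way bucket on `id`, and `month` yields only the two identity columns; the bucket information would then come from catalogTable.bucketSpec, as the description states.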

Why are the changes needed?

After SPARK-56171 removed the V2 file write gate, INSERT INTO a bucketed file table goes through the V2 write path. Without this change, WriteJobDescription.bucketSpec is always None, so bucketed tables produce non-bucketed files.
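
To make the effect of a missing bucket spec concrete, here is a rough sketch of how a writer might assign a row to a bucket and encode the bucket ID in the output file name. It is a simplified model, not the patch's code: `BucketedFileNames`, `bucketIdFor`, and `fileName` are hypothetical names, and `scala.util.hashing.MurmurHash3` is used as a stand-in hash that is not expected to produce the same bucket IDs as Spark.

```scala
import scala.util.hashing.MurmurHash3

object BucketedFileNames {
  // Non-negative modulo, so that negative hash values still map to a valid bucket.
  def pmod(hash: Int, n: Int): Int = ((hash % n) + n) % n

  // Stand-in hash (assumption): illustrates hash-then-modulo bucket assignment only.
  def bucketIdFor(key: String, numBuckets: Int): Int =
    pmod(MurmurHash3.stringHash(key), numBuckets)

  // With Some(bucketId) the name carries a zero-padded "_NNNNN" suffix;
  // with None (no bucket spec) the file name has no bucket marker at all.
  def fileName(prefix: String, bucketId: Option[Int], ext: String): String = {
    val suffix = bucketId.map(id => f"_$id%05d").getOrElse("")
    s"$prefix$suffix$ext"
  }
}
```

In this model, `fileName("part-00000-abc", Some(3), ".parquet")` gives `part-00000-abc_00003.parquet`, while passing `None` gives `part-00000-abc.parquet`, which mirrors the non-bucketed output described above.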

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new tests in FileDataSourceV2WriteSuite:

- Bucketed write with bucket ID verification via BucketingUtils.getBucketId
- Partitioned + bucketed write with partition directory and bucket ID verification
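
The kind of check these tests perform can be sketched without Spark. The object below is a simplified stand-in for BucketingUtils.getBucketId, not the real implementation; the regular expression reflects my assumption that the bucket ID is the digit run after the last underscore, before any file extension.

```scala
object BucketIdParsing {
  // Assumed naming scheme: "<anything>_<digits>[.<extension...>]".
  private val BucketedFileName = """.*_(\d+)(?:\..*)?$""".r

  // Returns the bucket ID encoded in the file name, or None for a non-bucketed file.
  def bucketIdOf(fileName: String): Option[Int] = fileName match {
    case BucketedFileName(id) => Some(id.toInt)
    case _ => None
  }
}
```

A test along these lines would list the written files and assert that each name parses to a bucket ID within the table's bucket count, and that files without the suffix parse to `None`.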

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

…Frame API writes and delete FallBackFileSourceV2

Key changes:
- FileWrite: added partitionSchema, customPartitionLocations, dynamicPartitionOverwrite, isTruncate; path creation and truncate logic; dynamic partition overwrite via FileCommitProtocol
- FileTable: createFileWriteBuilder with SupportsDynamicOverwrite and SupportsTruncate; capabilities now include TRUNCATE and OVERWRITE_DYNAMIC; fileIndex skips file existence checks when userSpecifiedSchema is provided (write path)
- All file format writes (Parquet, ORC, CSV, JSON, Text, Avro) use createFileWriteBuilder with partition/truncate/overwrite support
- DataFrameWriter.lookupV2Provider: enabled FileDataSourceV2 for non-partitioned Append and Overwrite via df.write.save(path)
- DataFrameWriter.insertInto: V1 fallback for file sources (TODO: SPARK-56175)
- DataFrameWriter.saveAsTable: V1 fallback for file sources (TODO: SPARK-56230, needs StagingTableCatalog)
- DataSourceV2Utils.getTableProvider: V1 fallback for file sources (TODO: SPARK-56175)
- Removed FallBackFileSourceV2 rule
- V2SessionCatalog.createTable: V1 FileFormat data type validation

…catalog table loading, and gate removal

Key changes:
- FileTable extends SupportsPartitionManagement with createPartition, dropPartition, listPartitionIdentifiers, partitionSchema
- Partition operations sync to catalog metastore (best-effort)
- V2SessionCatalog.loadTable returns FileTable instead of V1Table, sets catalogTable and useCatalogFileIndex on FileTable
- V2SessionCatalog.getDataSourceOptions includes storage.properties for proper option propagation (header, ORC bloom filter, etc.)
- V2SessionCatalog.createTable validates data types via FileTable
- FileTable.columns() restores NOT NULL constraints from catalogTable
- FileTable.partitioning() falls back to userSpecifiedPartitioning or catalog partition columns
- FileTable.fileIndex uses CatalogFileIndex when catalog has registered partitions (custom partition locations)
- FileTable.schema checks column name duplication for non-catalog tables only
- DataSourceV2Utils.getTableProvider: removed FileDataSourceV2 gate
- DataFrameWriter.insertInto: enabled V2 for file sources
- DataFrameWriter.saveAsTable: V1 fallback (TODO: SPARK-56230)
- ResolveSessionCatalog: V1 fallback for FileTable-backed commands (AnalyzeTable, AnalyzeColumn, TruncateTable, TruncatePartition, ShowPartitions, RecoverPartitions, AddPartitions, RenamePartitions, DropPartitions, SetTableLocation, CREATE TABLE validation, REPLACE TABLE blocking)
- FindDataSourceTable: streaming V1 fallback for FileTable (TODO: SPARK-56233)
- DataSource.planForWritingFileFormat: graceful V2 handling

…ion to FileScan

Enable bucketed writes for V2 file tables via catalog BucketSpec.

Key changes:
- FileWrite: add bucketSpec field, use V1WritesUtils.getWriterBucketSpec() instead of hardcoded None
- FileTable: createFileWriteBuilder passes catalogTable.bucketSpec to the write pipeline
- FileDataSourceV2: getTable uses collect to skip BucketTransform (handled via catalogTable.bucketSpec instead)
- FileWriterFactory: use DynamicPartitionDataConcurrentWriter for bucketed writes since V2's RequiresDistributionAndOrdering cannot express hash-based ordering
- All 6 format Write/Table classes updated with BucketSpec parameter

Note: bucket pruning and bucket join (read-path optimization) are not included in this patch (tracked under SPARK-56231).

LuciferYang · 2026-04-01T03:55:33Z

677a482 represents the actual change in the current patch, which is the 5th patch in SPARK-56170.

LuciferYang added 5 commits March 26, 2026 14:55

[SPARK-56174][SQL] Complete V2 file write path for DataFrame API

ef5bc4b

[SPARK-56176][SQL] V2-native ANALYZE TABLE/COLUMN with stats propagat…

ccd7272

…ion to FileScan

LuciferYang marked this pull request as draft April 1, 2026 03:54

LuciferYang wants to merge 5 commits into apache:master from LuciferYang:SPARK-56177
