
fix(spark): improve error message when precombine field value is null#18061

Open
prashantwason wants to merge 6 commits into apache:master from prashantwason:pw_oss_null_precombine_fix

Conversation

@prashantwason
Member

Describe the issue this Pull Request addresses

Closes #18060

When records have null values in the precombine field, Hudi jobs fail with a cryptic error message that makes it difficult for users to diagnose the root cause:

org.apache.hudi.exception.HoodieException: Could not create payload for class: org.apache.hudi.common.model.DefaultHoodieRecordPayload
Caused by: org.apache.hudi.exception.HoodieException: Ordering value is null for record: ...

This error provides no actionable information about which field has the null value, which record is problematic, or how to remediate the issue.

Summary and Changelog

This change adds explicit null-checking with a clear, actionable error message before attempting payload creation in HoodieCreateRecordUtils.scala.

Changes:

  • Added null-check for ordering field values in HoodieCreateRecordUtils.scala
  • The new error message identifies:
    • The specific precombine/ordering field that has a null value
    • The record key to help locate the problematic record
    • Suggested remediation options (fix data or use OverwriteWithLatestAvroPayload)
  • Added test cases in TestHoodieCreateRecordUtils.scala to verify the improved error handling

Example of improved error message:

Precombine/ordering field 'ts' has null value for record key 'abc123'. Please ensure all records have non-null values for the precombine field, or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).
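The check behind this message can be sketched as a small standalone snippet (a minimal sketch of the idea; the real code lives in HoodieCreateRecordUtils.scala and works with Hudi's HoodieKey, which is simplified away here):

```scala
// Minimal sketch of the proposed null-check. `field`, `value`, and
// `recordKey` are simplified stand-ins for the values Hudi extracts
// from the record and its HoodieKey.
object OrderingFieldCheck {
  def checkOrderingValue(field: String, value: Any, recordKey: String): Unit = {
    if (value == null) {
      throw new IllegalArgumentException(
        s"Precombine/ordering field '$field' has null value for record key '$recordKey'. " +
          "Please ensure all records have non-null values for the precombine field, " +
          "or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).")
    }
  }
}
```

The point is that the field name and the record key are interpolated into the message, so the user can locate the offending record directly.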

Impact

No public API changes. This is a usability improvement that provides clearer error messages when data quality issues occur during ingestion.

Risk Level

low - The change only affects error handling and adds a null check before an operation that would fail anyway. The behavior for valid data remains unchanged.

Documentation Update

none - No new configs or features added. This is an error message improvement.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Jan 30, 2026
throw new IllegalArgumentException(
s"Precombine/ordering field '$field' has null value for record key '${hoodieKey.getRecordKey}'. " +
s"Please ensure all records have non-null values for the precombine field, " +
s"or use a payload class that doesn't require ordering (e.g., OverwriteWithLatestAvroPayload).")
Contributor


In previous versions of Hudi, it's common for users to configure a precombine field even w/ OverwriteWithLatestAvroPayload. But in later versions, we fully relaxed the constraint: it's totally fine to not have a precombine field configured, or to have one configured while some records have null values, assuming the payload is OverwriteWithLatestAvroPayload or the merge mode is RecordMergeMode.COMMIT_TIME_ORDERING.

So, if we were to throw an exception here, at least we need to check the payload and merge mode and then throw the exception accordingly.
For e.g., in case of OverwriteWithLatestAvroPayload or RecordMergeMode.COMMIT_TIME_ORDERING, we don't want to throw an exception.

Member Author


Thanks for the feedback @nsivabalan, you're correct.

I'll update the PR to check the merge mode and payload class before throwing the exception. The null check should only throw when ordering is actually required, specifically:

  • Skip the exception for RecordMergeMode.COMMIT_TIME_ORDERING
  • Skip the exception for OverwriteWithLatestAvroPayload

For these cases, I'll use OrderingValues.getDefault() as a fallback instead of throwing, which aligns with how OverwriteWithLatestAvroPayload handles missing ordering values in its constructor.

Will push an updated commit shortly.
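The plan above might look roughly like the following (a hedged sketch only: the merge-mode names mirror the discussion, and DefaultOrdering stands in for OrderingValues.getDefault(); none of these stand-ins are the real Hudi API):

```scala
// Hedged sketch of the merge-mode-aware fallback described above.
object OrderingValueResolver {
  sealed trait RecordMergeMode
  case object CommitTimeOrdering extends RecordMergeMode
  case object EventTimeOrdering extends RecordMergeMode

  val OverwriteWithLatestPayload =
    "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload"
  // Stand-in for OrderingValues.getDefault().
  val DefaultOrdering: Comparable[_] = java.lang.Long.valueOf(0L)

  def resolve(fieldValue: Any, field: String, recordKey: String,
              mergeMode: RecordMergeMode, payloadClass: String): Comparable[_] = {
    if (fieldValue != null) {
      fieldValue.asInstanceOf[Comparable[_]]
    } else if (mergeMode == CommitTimeOrdering ||
               payloadClass == OverwriteWithLatestPayload) {
      // Ordering is not required in these cases, so fall back to the
      // default ordering value instead of failing the write.
      DefaultOrdering
    } else {
      throw new IllegalArgumentException(
        s"Ordering field '$field' has null value for record key '$recordKey'.")
    }
  }
}
```

The exception is now the last resort, reached only when ordering is actually required for merging.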

@github-actions github-actions Bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Feb 2, 2026
@prashantwason
Member Author

@hudi-bot run azure

@prashantwason
Member Author

Updated the error message to use "Ordering field" instead of "Precombine/ordering field", since the precombine notion has been deprecated since version 1.1.

@hudi-bot run azure

Before:
field => HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
  consistentLogicalTimestampEnabled).asInstanceOf[Comparable[_]]))
After:
field => {
  val fieldVal = HoodieAvroUtils.getNestedFieldVal(avroRec, field, false,
Contributor


This is becoming too fat.
Can we move this to a private method?

Contributor


Should we also add similar logic to the SPARK record type?

prashantwason and others added 5 commits February 23, 2026 23:18
…l during payload creation

When records have null values in the precombine field, Hudi jobs fail with
a cryptic error message "Ordering value is null for record" that makes it
difficult for users to diagnose the root cause.

This change adds explicit null-checking with a clear, actionable error
message before attempting payload creation. The new error message:
- Identifies the specific precombine/ordering field that has a null value
- Provides the record key to help locate the problematic record
- Suggests remediation options (fix data or use OverwriteWithLatestAvroPayload)

Closes apache#18060

Co-Authored-By: Claude (claude-opus-4-5) <noreply@anthropic.com>
…dataWriterTableVersionSix

Added unit tests to verify the behavior of shouldInitializeFromFilesystem() method
which allows metadata table bootstrap when pending commits are being rolled back:

- Test initialization allowed when no pending data instants
- Test initialization allowed when only pending instant is current inflight
- Test initialization blocked when blocking instants have no rollbacks
- Test initialization allowed when all blocking instants have pending rollbacks
- Test initialization blocked when only some blocking instants have rollbacks
- Test mixed scenario with current inflight and rollbacks for other instants
- Test that completed rollbacks do not count as pending rollbacks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…erwriteWithLatestAvroPayload

- Check merge mode and payload class before throwing exception for null precombine
- Skip exception for RecordMergeMode.COMMIT_TIME_ORDERING
- Skip exception for OverwriteWithLatestAvroPayload
- Use OrderingValues.getDefault() as fallback for null values when ordering not required
- Add tests for both scenarios

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The precombine notion is deprecated since version 1.1, so use
"Ordering field" instead of "Precombine/ordering field" in the
error message for consistency.
Address review comment: Move the ordering value creation logic with
null-checking to a private createOrderingValue() method to reduce code
complexity in the main method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@prashantwason prashantwason force-pushed the pw_oss_null_precombine_fix branch from 666a57a to 8de9e14 Compare February 24, 2026 07:20
@prashantwason
Member Author

Rebased with latest master and addressed review comments:

1. Extracted to private method:
Moved the ordering value creation logic to a new private createOrderingValue() method. This reduces the complexity of the main code path and makes the logic reusable.

2. Regarding SPARK record type:
After reviewing the code, the SPARK record type (HoodieSparkRecord) doesn't use OrderingValues in its construction - it uses a different code path that handles ordering through the payload mechanism differently. The HoodieSparkRecord constructor takes (key, targetRow, dataFileStructType, false) without an ordering value parameter. So the null-check logic isn't directly applicable to SPARK records in the same way.

If you think we should add similar validation for SPARK records, please let me know the specific code path where this should be added.

@hudi-bot run azure

Contributor

@nsivabalan nsivabalan left a comment


just 1 minor ask

* or COMMIT_TIME_ORDERING merge mode, null values are allowed and a default ordering value is used.
* For other cases, throws IllegalArgumentException if any ordering field has a null value.
*/
private def createOrderingValue(orderingFields: java.util.List[String],
Contributor


Minor: how about getOrderingValue(...)?

- Renamed createOrderingValue() to getOrderingValue() per reviewer suggestion
- Fixed return type from OrderingValues to Comparable[_] to match
  OrderingValues.create() return type, fixing compilation errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@prashantwason
Member Author

Addressed remaining review comments:

  1. @nsivabalan's naming suggestion: Renamed createOrderingValue() to getOrderingValue() as suggested.

  2. Fixed compilation error: The method was declared with return type OrderingValues but OrderingValues.create() returns Comparable[_]. Fixed the return type to Comparable[_], which also matches the createHoodieRecord() parameter type. This was causing all spark CI builds to fail.
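In Scala this amounts to declaring the wildcard type as the return type. A tiny illustrative sketch (getOrderingValue here is a simplified stand-in for the actual Hudi method, and taking the head of the list stands in for OrderingValues.create(...)):

```scala
// The real method returns the result of OrderingValues.create(...),
// which conforms to the wildcard Comparable[_]; declaring the concrete
// OrderingValues type as the return type did not compile.
object OrderingValueHelper {
  def getOrderingValue(orderingFieldValues: List[Any]): Comparable[_] =
    // Stand-in for OrderingValues.create(...): take the first field's value.
    orderingFieldValues.head.asInstanceOf[Comparable[_]]
}
```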

@hudi-bot run azure

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.23%. Comparing base (18ee6cd) to head (0575544).
⚠️ Report is 38 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (18ee6cd) and HEAD (0575544).

HEAD has 15 uploads less than BASE
Flag BASE (18ee6cd) HEAD (0575544)
spark-scala-tests 6 0
spark-java-tests 9 0
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18061       +/-   ##
=============================================
- Coverage     57.28%   45.23%   -12.06%     
+ Complexity    18525     8418    -10107     
=============================================
  Files          1944     1186      -758     
  Lines        106137    60856    -45281     
  Branches      13121     6492     -6629     
=============================================
- Hits          60799    27526    -33273     
+ Misses        39619    30342     -9277     
+ Partials       5719     2988     -2731     
Flag Coverage Δ
hadoop-mr-java-client 45.23% <ø> (-0.19%) ⬇️
spark-java-tests ?
spark-scala-tests ?

Flags with carried forward coverage won't be shown.
see 1395 files with indirect coverage changes


@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build


Labels

size:L PR with lines of changes in (300, 1000]


Development

Successfully merging this pull request may close these issues.

Improve error message when precombine field value is null during payload creation

4 participants