HDDS-15348. OmMultipartPartKeyCodec should not use UTF8.decode(..) by Russole · Pull Request #10347 · apache/ozone

Russole · 2026-05-23T09:12:31Z

What changes were proposed in this pull request?

Replace StandardCharsets.UTF_8.decode(...) in OmMultipartPartKey with StringCodec for decoding the persisted upload ID bytes.
The round-trip UTF-8 validation is needed because StringCodecBase.decode(...) catches strict decoding exceptions and falls back to compatibility decoding, which may use replacement characters. Re-encoding the decoded upload ID and comparing it with the original bytes lets us detect non-lossless decoding and reject malformed UTF-8 input.
Add a unit test to verify malformed UTF-8 upload IDs throw IllegalArgumentException.
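
The round-trip check described above can be illustrated outside Ozone with plain JDK classes. This is a minimal sketch, not the PR's code: the class name `Utf8RoundTripDemo` and its helpers are hypothetical, and `new String(bytes, UTF_8)` stands in for any lenient decoder that substitutes U+FFFD for malformed input, which is assumed here to be comparable to the fallback path in StringCodecBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it does not use any Ozone types.
public class Utf8RoundTripDemo {

  /** Lenient decode: malformed sequences are replaced with U+FFFD instead of failing. */
  public static String lenientDecode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Returns true only if decoding and re-encoding reproduces the original bytes. */
  public static boolean isLossless(byte[] bytes) {
    String decoded = lenientDecode(bytes);
    return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    byte[] valid = "upload-123".getBytes(StandardCharsets.UTF_8);
    // 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte.
    byte[] malformed = {'i', 'd', (byte) 0xC3, (byte) 0x28};

    System.out.println("valid round-trips: " + isLossless(valid));
    System.out.println("malformed round-trips: " + isLossless(malformed));
  }
}
```

The malformed input fails the comparison because the replacement character re-encodes to three bytes (EF BF BD) rather than the original 0xC3.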

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15348

How was this patch tested?

Added/updated unit test:
- TestOmMultipartPartKey#testDecodeRejectsMalformedUtf8UploadId
Ran locally:
- mvn -pl :ozone-common -Dtest=TestOmMultipartPartKey test
All CI checks passed:
- https://github.com/Russole/ozone/actions/runs/26320555387

szetszwo · 2026-05-23T17:45:32Z

+      String uploadId = StringCodec.get().fromPersistedFormat(uploadIdBytes);
+      if (!Arrays.equals(uploadIdBytes,
+          uploadId.getBytes(StandardCharsets.UTF_8))) {
+        throw new IllegalArgumentException(
+            "Invalid multipart part key: malformed UTF-8 uploadId");
+      }


@Russole, thanks for working on this! We should throw CodecException here instead of IllegalArgumentException. That needs a new variant of StringCodec, so I filed HDDS-15355.

BTW, we should also change the other IllegalArgumentException in OmMultipartPartKeyCodec to CodecException.

szetszwo

@Russole, thanks for the update. The change looks good.

We should also update toCodecBuffer(..) and toPersistedFormat(..) to use StringCodec.getCodecNoFallback(), since String.getBytes(..) can silently replace unsupported characters.
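
To illustrate the encoding concern, here is a minimal JDK-only sketch. The class `StrictEncodeDemo` and its methods are hypothetical and are not the StringCodec API; the sketch only contrasts `String.getBytes(..)` with a `CharsetEncoder` configured to report errors, using an unpaired surrogate as the character that UTF-8 cannot encode.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use any Ozone types.
public class StrictEncodeDemo {

  /** Encodes to UTF-8 and throws instead of substituting a replacement byte. */
  public static byte[] strictEncode(String s) throws CharacterCodingException {
    ByteBuffer out = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .encode(CharBuffer.wrap(s));
    byte[] bytes = new byte[out.remaining()];
    out.get(bytes);
    return bytes;
  }

  /** Convenience wrapper: true if the string encodes without any substitution. */
  public static boolean encodesStrictly(String s) {
    try {
      strictEncode(s);
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String bad = "bad\uD800"; // ends with an unpaired high surrogate

    // String.getBytes(..) does not fail; it writes '?' for the surrogate.
    byte[] lenient = bad.getBytes(StandardCharsets.UTF_8);
    System.out.println("lenient last byte: " + (char) lenient[lenient.length - 1]);

    System.out.println("strict accepts it: " + encodesStrictly(bad));
  }
}
```

With the lenient path, the stored bytes no longer correspond to the original string, and nothing signals the loss to the caller.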

szetszwo

+1 the change looks good.

adoroszlai

Thanks @Russole for the patch.

adoroszlai · 2026-05-28T19:33:51Z

+      byte[] uploadIdBytes = new byte[uploadIdBuffer.remaining()];
+      uploadIdBuffer.get(uploadIdBytes);
+      String uploadId = StringCodec.getCodecNoFallback()
+          .fromPersistedFormat(uploadIdBytes);


@szetszwo Can we use StringCodecBase.decodeNoFallback(ByteBuffer) directly, to avoid buffer copy?

ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/db/StringCodecBase.java
Lines 199 to 200 in cfb8ade

    public String fromPersistedFormat(byte[] bytes) throws CodecException {
      return decodeNoFallback(ByteBuffer.wrap(bytes));

ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/db/StringCodecBase.java
Lines 115 to 117 in cfb8ade

    String decodeNoFallback(ByteBuffer buffer) throws CodecException {
      try {
        return newDecoder().decode(buffer.asReadOnlyBuffer()).toString();

szetszwo · May 30, 2026

@adoroszlai, this is a good point! However, it will need more changes to make it work properly, and I believe the other codecs have similar problems. So how about we do it separately?
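
For context on the copy-avoidance idea, the following is a rough JDK-only sketch of strictly decoding the remaining bytes of a ByteBuffer without first copying them into a byte[]. The class `DirectDecodeDemo` and its methods are hypothetical. The sketch mirrors the `asReadOnlyBuffer()` call in the quoted snippet, but it is not the Ozone implementation and does not address whatever additional changes the reviewers have in mind.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use any Ozone types.
public class DirectDecodeDemo {

  /**
   * Strictly decodes the bytes between position and limit.
   * The read-only view shares content with the caller's buffer but has its own
   * position, so the caller's buffer is not consumed and no byte[] is allocated.
   */
  public static String strictDecode(ByteBuffer buffer) throws CharacterCodingException {
    return StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(buffer.asReadOnlyBuffer())
        .toString();
  }

  /** Convenience wrapper: true if the remaining bytes are well-formed UTF-8. */
  public static boolean isStrictlyDecodable(ByteBuffer buffer) {
    try {
      strictDecode(buffer);
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) throws CharacterCodingException {
    // Two leading bytes stand in for some other field that precedes the upload ID.
    ByteBuffer key = ByteBuffer.wrap("xxupload-123".getBytes(StandardCharsets.UTF_8));
    key.position(2);

    System.out.println(strictDecode(key));
    System.out.println("position after decode: " + key.position());
  }
}
```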

HDDS-15348. OmMultipartPartKeyCodec should not use UTF8.decode(..)

b3a862b

Russole requested a review from szetszwo May 23, 2026 09:12

ivandika3 requested a review from spacemonkd May 23, 2026 11:39

szetszwo reviewed May 23, 2026

szetszwo mentioned this pull request May 23, 2026

HDDS-15355. Support StringCodec without fallback. #10349

Merged

Russole added 2 commits May 27, 2026 23:40

Merge branch 'master' into HDDS-15348

9193fb4

Throw CodecException in OmMultipartPartKeyCodec

6b105f3

Russole requested a review from szetszwo May 27, 2026 17:43

szetszwo reviewed May 27, 2026

Use strict StringCodec in OmMultipartPartKeyCodec

73851ff

Russole requested a review from szetszwo May 28, 2026 17:08

szetszwo approved these changes May 28, 2026

adoroszlai reviewed May 28, 2026

Russole wants to merge 4 commits into apache:master from Russole:HDDS-15348

