HDDS-15348. OmMultipartPartKeyCodec should not use UTF8.decode(..)#10347
HDDS-15348. OmMultipartPartKeyCodec should not use UTF8.decode(..)#10347Russole wants to merge 4 commits into
Conversation
| String uploadId = StringCodec.get().fromPersistedFormat(uploadIdBytes); | ||
| if (!Arrays.equals(uploadIdBytes, | ||
| uploadId.getBytes(StandardCharsets.UTF_8))) { | ||
| throw new IllegalArgumentException( | ||
| "Invalid multipart part key: malformed UTF-8 uploadId"); | ||
| } |
There was a problem hiding this comment.
@Russole , thanks for working on this! We should throw CodecException. It needs a new variant of StringCodec. Filed HDDS-15355.
BTW, we should also change the other IllegalArgumentException in OmMultipartPartKeyCodec to CodecException.
szetszwo
left a comment
There was a problem hiding this comment.
@Russole , thanks for the update. The change looks good.
We should also update toCodecBuffer(..) and toPersistedFormat(..) to use StringCodec.getCodecNoFallback() since String.getBytes(..) could replace the unsupported character silently.
szetszwo
left a comment
There was a problem hiding this comment.
+1 the change looks good.
adoroszlai
left a comment
There was a problem hiding this comment.
Thanks @Russole for the patch.
| byte[] uploadIdBytes = new byte[uploadIdBuffer.remaining()]; | ||
| uploadIdBuffer.get(uploadIdBytes); | ||
| String uploadId = StringCodec.getCodecNoFallback() | ||
| .fromPersistedFormat(uploadIdBytes); |
There was a problem hiding this comment.
@szetszwo Can we use StringCodecBase.decodeNoFallback(ByteBuffer) directly, to avoid buffer copy?
There was a problem hiding this comment.
@adoroszlai , this is a good point! However, it will need more changes to make it work properly. I believe the other codecs have similar problems. So, how about we do it separately?
What changes were proposed in this pull request?
StandardCharsets.UTF_8.decode(...)inOmMultipartPartKeywithStringCodecfor decoding the persisted upload ID bytes.StringCodecBase.decode(...)catches strict decoding exceptions and falls back to compatibility decoding, which may use replacement characters. Re-encoding the decoded upload ID and comparing it with the original bytes lets us detect non-lossless decoding and reject malformed UTF-8 input.IllegalArgumentException.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15348
How was this patch tested?
TestOmMultipartPartKey#testDecodeRejectsMalformedUtf8UploadIdmvn -pl :ozone-common -Dtest=TestOmMultipartPartKey test