-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[JAVA-6033] ServerHeartbeatSucceededEvent is not fired for initial POLL monitoring #1856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
driver-core/src/main/com/mongodb/internal/connection/DefaultServerMonitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - if evergreen's happy, I'm happy
…rverMonitor.java Co-authored-by: Ross Lawley <[email protected]>
rozza
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| if (!shouldStreamResponses && connection != null && !connection.isClosed() | ||
| && currentServerDescription.equals(connection.getInitialServerDescription())) { | ||
| continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we send two hello commands back-to-back but emit only one sequence of events as if only one heartbeat was sent.
When establishing a new connection, the flow looks like this:
- At line 286,
setupNewConnectionAndGetInitialDescription()callslogAndNotifyHeartbeatStarted()and setsalreadyLoggedHeartBeatStarted = true. - We receive the handshake result and return it as the
ServerDescription. - We detect that this is the initial connection and immediately loop again.
- On the next loop iteration, we hit line 264, but
alreadyLoggedHeartBeatStartedis still true, so we don’t log another “started” event.
We then call doHeartbeat(), which sends a second hello command, without waiting for the selection heartbeat timeout. When it succeeds at line 322, we log the “succeeded” event and reset the flag alreadyLoggedHeartBeatStarted = false.
As a result we get:
- 1 ServerHeartbeatStartedEvent (for the first hello / handshake)
- 2 hello commands sent back-to-back
- 1 ServerHeartbeatSucceededEvent (for the second hello)
This conflicts with the SDAM spec, which requires a correlation between started and succeeded/failed events, and also seem to require treating an initial handshake as a heartbeat event:
“The driver MUST guarantee that every ServerHeartbeatStartedEvent has either a correlating ServerHeartbeatSucceededEvent or ServerHeartbeatFailedEvent.”
“The driver MUST publish a ServerHeartbeatStartedEvent before the driver opens a connection for the next heartbeat to a server…”
Judging by the above, the initial handshake appears to be treated as the first heartbeat. The unified tests confirm this expectation, for example, the serverMonitoringMode.yml unified test explicitly expects this sequence in streaming mode:
serverHeartbeatStartedEvent (awaited: false) - handshake
serverHeartbeatSucceededEvent (awaited: false) - handshake
serverHeartbeatStartedEvent (awaited: false) - next poll after waiting
If we run this test with the current implementation, we get:
HEARTBEAT STARTED connectionId{localValue:7} awaited: false
HEARTBEAT STARTED connectionId{localValue:7, serverValue:242} awaited: true
So the correlation between ServerHeartbeatStartedEvent and ServerHeartbeatSucceededEvent (in terms of awaited) does not match the expectations, and the test fails. These tests are currently ignored here:
https://github.com/vbabanin/mongo-java-driver/blob/5cb23ba8e2a9da83857f3e2f7bbb7e5cfc8dcbe2/driver-sync/src/test/functional/com/mongodb/client/unified/UnifiedTestModifications.java#L410-L412 - the associated ticket is already closed, and I wasn’t able to find additional details on the reason for skipping these tests.
Given the above, it seems we should:
- Align the behavior with the spec (that we missed before this PR) so that the handshake acts as the first heartbeat and we don’t send an extra hello.
- Re-enable the ignored SDAM unified tests.
What do you think?
P.S: We send a ServerHeartbeatSucceededEvent right after receiving the regular heartbeat response. We could do the same for the initial connection: emit the succeeded event right after open, since open already includes receiving the handshake.
connection.open(operationContextFactory.create());
roundTripTimeSampler.addSample(connection.getInitialServerDescription().getRoundTripTimeNanos());
logAndNotifyHeartbeatSucceeded(.....)
JAVA-6033