fix: make crawler terminal status message reliably delivered#3733
fix: make crawler terminal status message reliably delivered#3733metalwarrior665 wants to merge 2 commits into
Conversation
The terminal "Finished! Total N requests..." status message was sent fire-and-forget (`void this.setStatusMessage(...)`) at the end of `run()`. When followed by `Actor.exit()` -> `process.exit()`, the un-awaited request often lost the race and never reached the platform, so the run was finalized with the platform's generic default terminal message instead. Which message won was effectively random. Race the terminal status write against `sleep(1)` so the event loop gets a single tick to flush the request before `run()` resolves, while still not blocking on the full round-trip. This mirrors the bounded wait the Apify SDK uses in `Actor.exit()`.
| } succeeded, ${this.stats.state.requestsFailed} failed.`, | ||
| { isStatusMessageTerminal: true, level: 'INFO' }, | ||
| ), | ||
| sleep(1), |
There was a problem hiding this comment.
I'd give it a bit more, to ensure socket buffers are flushed and message sent. imo even max 50 ms wait is good price for ensuring status message is right
There was a problem hiding this comment.
The whole awaited status message takes around 30ms so 50ms race is unnecessary
barjin
left a comment
There was a problem hiding this comment.
The implementation feels a bit hacky. Any chance we could, e.g., add a separate timeout option for BasicCrawler.setStatusMessage() and deal with this inside the method? 🙏
|
btw the 1 ms delay doesn't seem to be enough; we have that in the SDK, and it's still not writing the status message properly (see Slack thread) |
|
Ok, so there are 2 separate problems: Current BasicCrawler - This is a clear bug; we need to sleep at least 1ms to send the request because there might not be any other await after. So it must be at least 1ms. 1ms sleep and SDK - The 1ms sleep only works if there is already an open connection to api.apify.com. That is true in 99.99% cases but not in the contrived example from Slack where we do no API calls at all and exit. There is also chance that the connection would just die on the server side but likely super rare. Doing some deep HTTP inspection to see if we flushed the request would be overkill (unless you have a quick solution). I'm inclined to just put the await back in both places but it tears my heart to waste 60ms for mostly nothing. The 1ms will work until someone else does no HTTP Actor :) |
|
How about invoking |
In current codebase, this would mean quite a bit of refactoring in sensitive part so not worth. Actually, I have run 500 more test runs, and it worked every time because the keep-alive connection is there (pretty much impossible that there will be 0 API calls the whole time). I would go with the 1s, it might fail 1 out of a million, and it will be ok. And I will resolve the SDK case separately. |
|
sounds cool, cheers Lukas |
Fixes https://apify.slack.com/archives/CD0SF6KD4/p1780936657459989. If there is no other code after the crawler.run, it could just never spin the event loop to send the HTTP for status message.
I tested the fix on a few hundred runs, and it worked every time.
https://claude.ai/code/session_01PnJVep34w4Yzupcb3JzLJu