Skip to content

feat: add is_finished method to RequestQueueClient interface#1982

Open
Mantisus wants to merge 6 commits into
apify:masterfrom
Mantisus:queue-client-is-finished
Open

feat: add is_finished method to RequestQueueClient interface#1982
Mantisus wants to merge 6 commits into
apify:masterfrom
Mantisus:queue-client-is-finished

Conversation

@Mantisus

Copy link
Copy Markdown
Collaborator

Description

  • Adds an is_finished method to the RequestQueueClient interface. It's not a breaking change. The base class defaults is_finished to is_empty when it isn't overridden. Splits the queue predicates. is_empty means there are no requests available to fetch. is_finished means all requests are fully processed (nothing to fetch and nothing in progress).
  • Implements is_finished in all built-in RequestQueueClient.
  • Updates is_empty in all built-in RequestQueueClient to match the new meaning.
  • Affects the AutoscaledPool. It now skips scheduling new tasks when no requests are available to fetch. For example, while the only request is being processed.

Issues

Testing

  • Added a test for RequestQueue that checks the is_empty and is_finished values through the queue lifecycle. It covers all built-in clients.
  • Added a regression test that checks the AutoscaledPool doesn't schedule new tasks while the only request is being processed.

@Mantisus Mantisus requested review from janbuchar and vdusek June 21, 2026 21:23
@Mantisus Mantisus self-assigned this Jun 21, 2026

@janbuchar janbuchar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sound, the only thing I'm mildly concerned about is the removal of the is_empty return value caching — I'd feel better if I knew some historical context and why it's safe to do this.

Comment thread src/crawlee/storage_clients/_file_system/_request_queue_client.py

@vdusek vdusek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you please implement apify/apify-sdk-python#987 as well?

That way, we can verify that it works on the platform too.

I'd like this to be part of SDK v4, built on top of Crawlee v1.8 with this included.

@Mantisus

Mantisus commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks! Could you please implement apify/apify-sdk-python#987 as well?

Sure.

@janbuchar janbuchar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks about right to me. @vdusek please review it too 🙂

@vdusek vdusek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments

Comment thread src/crawlee/storage_clients/_base/_request_queue_client.py
Comment thread src/crawlee/storage_clients/_redis/_request_queue_client.py Outdated
Comment thread src/crawlee/storage_clients/_file_system/_request_queue_client.py Outdated
Comment thread src/crawlee/storage_clients/_memory/_request_queue_client.py Outdated
Comment thread tests/unit/crawlers/_basic/test_basic_crawler.py Outdated
Mantisus and others added 3 commits June 23, 2026 14:35
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
@Mantisus Mantisus requested a review from vdusek June 23, 2026 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an is_finished method to the StorageClient interface Crawler can get softlocked when consuming the RequestQueue

4 participants