
Add TaskDirReader for harbor datasets#588

Open
pan-x-c wants to merge 3 commits into agentscope-ai:main from pan-x-c:feature/harbor_dataset


pan-x-c (Collaborator) commented Jun 23, 2026

Description

This pull request adds support for reading tasksets organized as directories (folder-style tasksets), such as those used in Harbor-style benchmarks. It introduces a new TaskDirReader class, integrates it into the buffer reading system, and adds tests covering reading, indexing, and resuming. It also generalizes the task scheduler to accept any BufferReader implementation, not just file-based readers.

New folder-style taskset reader:

  • Added the TaskDirReader class in trinity/buffer/reader/task_dir_reader.py, allowing Trinity to read datasets in which each task is represented by a directory. The reader supports optional indexing and batch reading, and integrates with selectors.
  • Registered the new "task_dir" storage type in the buffer reader registry (trinity/buffer/reader/__init__.py) and added TASK_DIR to the StorageType enum (trinity/common/constants.py).
  • Updated the optional dependencies comment in pyproject.toml to mention the harbor package for Harbor dataset support.
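The list above describes a reader that treats each directory as one task, reads in batches, and is looked up through a storage-type registry. The following is a minimal, self-contained sketch of that idea, not the PR's actual code. Everything except the `TASK_DIR` enum member and the `"task_dir"` value is an assumption for illustration: the `with_index` flag, the `read`, `state_dict`, and `load_state_dict` methods, and the `READER_REGISTRY` / `get_reader` helpers are hypothetical, and selector integration is not modeled.

```python
import enum
import os
from typing import Dict, List, Type


class StorageType(enum.Enum):
    # Only the member added by this PR is shown; other members are omitted.
    TASK_DIR = "task_dir"


class TaskDirReader:
    """Hypothetical sketch: every subdirectory of `path` is one task."""

    def __init__(self, path: str, with_index: bool = True) -> None:
        # Keep only directories and sort them so task order is deterministic.
        self.task_dirs = sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.isdir(os.path.join(path, name))
        )
        self.with_index = with_index  # hypothetical "optional indexing" switch
        self.offset = 0  # position of the next unread task

    def __len__(self) -> int:
        return len(self.task_dirs)

    def read(self, batch_size: int) -> List[Dict]:
        """Return up to `batch_size` tasks; raise StopIteration when exhausted."""
        if self.offset >= len(self.task_dirs):
            raise StopIteration
        end = min(self.offset + batch_size, len(self.task_dirs))
        batch = []
        for i in range(self.offset, end):
            task: Dict = {"task_dir": self.task_dirs[i]}
            if self.with_index:
                task["index"] = i
            batch.append(task)
        self.offset = end
        return batch

    def state_dict(self) -> Dict:
        # Saving the offset is enough to resume from the same position later.
        return {"offset": self.offset}

    def load_state_dict(self, state: Dict) -> None:
        self.offset = state["offset"]


# Hypothetical registry mapping storage types to reader classes.
READER_REGISTRY: Dict[StorageType, Type[TaskDirReader]] = {
    StorageType.TASK_DIR: TaskDirReader,
}


def get_reader(storage_type: StorageType, path: str) -> TaskDirReader:
    return READER_REGISTRY[storage_type](path)
```

Sorting the directory names is one simple way to make batches reproducible, which is what makes an offset-based resume meaningful.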

Testing and validation:

  • Added a test suite for TaskDirReader in tests/buffer/task_dir_reader_test.py, covering reading, indexing, and resume functionality.
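One way to structure such tests is to build a throwaway folder-style taskset and check that resuming after a partial read yields the same batches as one uninterrupted pass. The sketch below is illustrative only: the `make_fake_taskset`, `read_in_batches`, and `check_resume_equivalence` helpers are hypothetical, and the `instruction.md` file name is an assumption, not a confirmed part of the Harbor task format.

```python
import os
from typing import List


def make_fake_taskset(root: str, names: List[str]) -> List[str]:
    """Create one directory per task, each holding a placeholder file.

    The `instruction.md` name is an assumption for illustration only.
    """
    paths = []
    for name in names:
        task_dir = os.path.join(root, name)
        os.makedirs(task_dir, exist_ok=True)
        with open(os.path.join(task_dir, "instruction.md"), "w") as f:
            f.write(f"Solve {name}\n")
        paths.append(task_dir)
    return paths


def read_in_batches(task_dirs: List[str], batch_size: int, start: int = 0) -> List[List[str]]:
    """Split the tasks from `start` onward into batches, as a resumed reader would."""
    return [task_dirs[i:i + batch_size] for i in range(start, len(task_dirs), batch_size)]


def check_resume_equivalence(root: str, batch_size: int = 2) -> bool:
    """A full pass must equal "first batch, then resume from the saved offset"."""
    task_dirs = sorted(make_fake_taskset(root, ["t3", "t1", "t2", "t4", "t5"]))
    full = read_in_batches(task_dirs, batch_size)
    first = full[0]  # batch consumed before the simulated interruption
    resumed = read_in_batches(task_dirs, batch_size, start=len(first))
    return [first] + resumed == full
```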

Generalization and integration:

  • Generalized the TaskScheduler to accept any BufferReader implementation, not just FileReader, making it compatible with the new TaskDirReader.
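The generalization amounts to having the scheduler depend on an abstract reader interface instead of a concrete file reader. The sketch below shows that pattern under stated assumptions: the single-method `BufferReader` interface, the `ListReader` stand-in, and the `TaskScheduler.next_batch` method are hypothetical simplifications, and Trinity's real classes may have different signatures.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BufferReader(ABC):
    """Hypothetical minimal interface; the real BufferReader may differ."""

    @abstractmethod
    def read(self, batch_size: int) -> List[Dict]:
        ...


class ListReader(BufferReader):
    """Stand-in for FileReader or TaskDirReader: serves tasks from a list."""

    def __init__(self, tasks: List[Dict]) -> None:
        self.tasks = tasks
        self.offset = 0

    def read(self, batch_size: int) -> List[Dict]:
        if self.offset >= len(self.tasks):
            raise StopIteration
        batch = self.tasks[self.offset:self.offset + batch_size]
        self.offset += len(batch)
        return batch


class TaskScheduler:
    """Depends only on the BufferReader interface, not on a concrete reader."""

    def __init__(self, readers: Dict[str, BufferReader]) -> None:
        self.readers = readers

    def next_batch(self, name: str, batch_size: int) -> List[Dict]:
        return self.readers[name].read(batch_size)
```

Because the scheduler only calls `read`, a directory-based reader can be plugged in without any scheduler changes.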

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

pan-x-c changed the title from "Add harbor reader" to "Add TaskDirReader for harbor datasets" on Jun 23, 2026.