Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
776ae71
openai api server record experience data
pan-x-c Jun 24, 2026
5fe9053
simplify
pan-x-c Jun 25, 2026
180b416
add config
pan-x-c Jun 25, 2026
f27988b
use api key as session id
pan-x-c Jun 25, 2026
40b53e8
clean stale header
pan-x-c Jun 25, 2026
d7b60e0
explorer: in-vLLM recording path with task_id batch join (coexists wi…
pan-x-c Jun 25, 2026
4ae9392
simplify code
pan-x-c Jun 25, 2026
4f0f0a7
unify vllm experience recording
pan-x-c Jun 25, 2026
8ce1f4b
add tests
pan-x-c Jun 25, 2026
02eb4e4
add log
pan-x-c Jun 25, 2026
db49262
fix middleware
pan-x-c Jun 25, 2026
e6ae7dc
update interface
pan-x-c Jun 25, 2026
63a5404
fix prompt text
pan-x-c Jun 25, 2026
5ff21b2
fix streaming recorder
pan-x-c Jun 25, 2026
6a0ad44
add delta stream
pan-x-c Jun 25, 2026
bb2f22c
refactor history recording
pan-x-c Jun 25, 2026
bf931de
sglang self recording experiences
pan-x-c Jun 25, 2026
a274ed3
add sglang tests
pan-x-c Jun 25, 2026
35952ae
fix sglang tests
pan-x-c Jun 26, 2026
6d6ce69
remove enable recording
pan-x-c Jun 26, 2026
034fb96
Merge branch 'feature/model_self_record_experience' of github.com:pan…
pan-x-c Jun 26, 2026
82d75ba
fix models
pan-x-c Jun 26, 2026
5cc3f05
add recording server
pan-x-c Jun 26, 2026
936da53
fix sglang
pan-x-c Jun 26, 2026
2228d65
remove redundant fields
pan-x-c Jun 26, 2026
3122b55
fix vllm test
pan-x-c Jun 26, 2026
579d189
fix tests
pan-x-c Jun 26, 2026
052562a
add store
pan-x-c Jun 26, 2026
d8f3de6
refactor store
pan-x-c Jun 26, 2026
8b91f8a
Merge prefix-matched recorded experiences
pan-x-c Jun 29, 2026
fcca4a6
finish merger
pan-x-c Jun 29, 2026
e283fcf
add docstring
pan-x-c Jun 29, 2026
3607157
simplify recorder_key sample id
pan-x-c Jun 29, 2026
b4d916d
optimize memory store
pan-x-c Jun 29, 2026
6d47fb4
simplify doc
pan-x-c Jun 29, 2026
464e566
refactor workflow interface
pan-x-c Jun 29, 2026
9ad8797
remove query layer
pan-x-c Jun 29, 2026
d42c27d
clean workflow interface
pan-x-c Jun 29, 2026
3342225
finish model wrapper interface
pan-x-c Jun 29, 2026
e255c18
refactor workflow interface
pan-x-c Jun 30, 2026
d422bbd
fix instance
pan-x-c Jun 30, 2026
880b8ca
Merge branch 'main' into feature/model_self_record_experience
pan-x-c Jun 30, 2026
81f308c
optimizer prefix merger
pan-x-c Jun 30, 2026
6b7f4dd
fix sub prefix
pan-x-c Jun 30, 2026
f3f13e0
fix recording tests
pan-x-c Jun 30, 2026
1939b31
fix workflow test
pan-x-c Jun 30, 2026
570b927
fix eval batch_id
pan-x-c Jun 30, 2026
d23db4d
add new workflow tests
pan-x-c Jun 30, 2026
ca402f7
fix workflow tests
pan-x-c Jun 30, 2026
d6c00c8
fix workflow tests
pan-x-c Jun 30, 2026
74bc51e
clean code
pan-x-c Jun 30, 2026
1223285
fix scheduler tests
pan-x-c Jun 30, 2026
b2708d0
fix scheduler tests
pan-x-c Jun 30, 2026
a43d6e1
fix coordinator tests
pan-x-c Jun 30, 2026
881f9af
fix tests
pan-x-c Jun 30, 2026
7c14499
fix sglang default key
pan-x-c Jun 30, 2026
8f69ffd
remove legacy history
pan-x-c Jun 30, 2026
14722e3
fix prompt path
pan-x-c Jun 30, 2026
85d753c
fix enable_history
pan-x-c Jun 30, 2026
aee8aa7
fix sglang auth
pan-x-c Jun 30, 2026
47c755d
simplify client side args
pan-x-c Jul 1, 2026
b8a1c34
fix vllm tests
pan-x-c Jul 1, 2026
0084f0f
fix vllm tests
pan-x-c Jul 1, 2026
52c2296
record
pan-x-c Jul 1, 2026
9c57277
simplify pipeline
pan-x-c Jul 1, 2026
ed4d2d8
fix logprobs
pan-x-c Jul 1, 2026
8864c36
fix reasoning parser
pan-x-c Jul 1, 2026
0d359a4
fix toolcall
pan-x-c Jul 1, 2026
14b7262
update recording context
pan-x-c Jul 1, 2026
892cefc
fix pre-commit
pan-x-c Jul 1, 2026
da9b0f1
clean scheduler
pan-x-c Jul 1, 2026
82b1544
clean coordinator
pan-x-c Jul 1, 2026
98430b4
block finished batch
pan-x-c Jul 1, 2026
1146972
update workflow doc
pan-x-c Jul 1, 2026
0ccf77c
clean build experience
pan-x-c Jul 1, 2026
815bd46
clean build experience
pan-x-c Jul 1, 2026
0571edb
fix model version
pan-x-c Jul 1, 2026
14d5bdc
fix model version drift
pan-x-c Jul 1, 2026
740805a
fix workflow reset
pan-x-c Jul 1, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .codex/AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,8 @@
## Repository Convention

Treat `docs/agents/` as the single source of truth for agent-facing process and navigation documents.

## Local Python Environment

- Always use the repository virtual environment for Python commands: `.venv/bin/python`.
- Run Python tools through that interpreter, for example `.venv/bin/python -m pytest ...`, instead of relying on globally installed commands.
9 changes: 7 additions & 2 deletions docker/sync.sh
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,6 @@ ssh_port="${port_override:-$TRINITY_REMOTE_SSH_PORT}"
rsync_args=(
-az
--itemize-changes
--files-from=-
--from0
-e "ssh -p ${ssh_port} -o StrictHostKeyChecking=accept-new"
)
Expand All @@ -83,6 +82,12 @@ if [[ -n "$untracked" ]]; then
echo "" >&2
fi

# Write file list to a temp file to avoid the "Bad file descriptor" race
# condition that occurs when rsync reads --files-from stdin via a pipe.
tmpfile="$(mktemp -t trinity-sync-XXXXXX)"
trap 'rm -f "$tmpfile"' EXIT
git -C "$PROJECT_DIR" ls-files -z > "$tmpfile"

dest="${TRINITY_REMOTE_HOST}:${TRINITY_REMOTE_WORKSPACE}/"
echo "Syncing git-tracked files: ${PROJECT_DIR}/ -> ${dest}"
git -C "$PROJECT_DIR" ls-files -z | rsync "${rsync_args[@]}" "${PROJECT_DIR}/" "$dest"
rsync "${rsync_args[@]}" --files-from="$tmpfile" "${PROJECT_DIR}/" "$dest"
634 changes: 343 additions & 291 deletions docs/sphinx_doc/source/tutorial/develop_workflow.md

Large diffs are not rendered by default.

4 changes: 0 additions & 4 deletions docs/sphinx_doc/source/tutorial/trinity_configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -421,8 +421,6 @@ explorer:
engine_type: vllm
engine_num: 1
tensor_parallel_size: 1
enable_history: false
enable_openai_api: false
nnodes: 1
auxiliary_models:
- model_path: Qwen/Qwen2.5-7B-Instruct
Expand Down Expand Up @@ -460,8 +458,6 @@ explorer:
- `external`: Use external API-based model engine.
- `rollout_model.engine_num`: Number of inference engines.
- `rollout_model.tensor_parallel_size`: Degree of tensor parallelism.
- `rollout_model.enable_history`: Whether to enable model call history recording. If set to `true`, the model wrapper automatically records the return experiences of model calls. Please periodically extract the history via `extract_experience_from_history` to avoid out-of-memory issues. Default is `false`.
- `rollout_model.enable_openai_api`: Whether to enable the openai API provided by Explorer. Default is `false`.
- `rollout_model.nnodes`: Number of nodes for each engine. Default is `1`. Only take effect when `rollout_model.engine_type` is `vllm` or `sglang`. When `nnodes` is greater than `1`, each engine instance will exclusively occupy the GPU resources of the full `nnodes` nodes (`nnodes * cluster.gpu_per_node`); sharing nodes with other instances is not supported.
- `auxiliary_models`: Additional models used for custom workflows, which has the same configuration options as `rollout_model`.
- `eval_interval`: Interval (in steps) for evaluating the model.
Expand Down
530 changes: 294 additions & 236 deletions docs/sphinx_doc/source_zh/tutorial/develop_workflow.md

Large diffs are not rendered by default.

4 changes: 0 additions & 4 deletions docs/sphinx_doc/source_zh/tutorial/trinity_configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -418,8 +418,6 @@ explorer:
engine_type: vllm
engine_num: 1
tensor_parallel_size: 1
enable_history: false
enable_openai_api: false
nnodes: 1
auxiliary_models:
- model_path: Qwen/Qwen2.5-7B-Instruct
Expand Down Expand Up @@ -456,8 +454,6 @@ explorer:
- `external`: 使用外部 API 引擎。
- `rollout_model.engine_num`: 推理引擎实例的数量。
- `rollout_model.tensor_parallel_size`: 每个实例的张量并行度。
- `rollout_model.enable_history`: 是否启用模型调用历史记录功能。若设为 `True`,模型会自动记录调用返回的 experience。请定期通过 `extract_experience_from_history` 提取历史,以避免内存溢出。默认为 `False`。
- `rollout_model.enable_openai_api`: 是否启用 OpenAI API 推理服务。默认为 `False`。
- `rollout_model.nnodes`: 部署每个推理引擎实例所需的节点数。默认为 `1`。仅在 `rollout_model.engine_type` 为 `vllm` 或 `sglang` 时生效。当 `nnodes` 大于 `1` 时,每个引擎实例将会占用完整的 `nnodes` 个节点的 GPU 资源 (`nnodes * cluster.gpu_per_node`),不支持与其他实例共享节点。
- `auxiliary_models`: 用于自定义工作流的辅助模型,配置与 `rollout_model` 相同。
- `eval_interval`: 模型评估的间隔(以 step 为单位)。
Expand Down
156 changes: 156 additions & 0 deletions tests/buffer/memory_store_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
import unittest
import uuid

import torch

from trinity.buffer.store import ExperienceUpdate, MemoryStore
from trinity.common.experience import EID, Experience


def get_dummy_experience(num: int, request_id: str | None = None):
request_id = request_id or uuid.uuid4().hex[:6]
return [
Experience(
eid=EID(suffix=request_id if num == 1 else f"{request_id}:{i}"),
tokens=torch.zeros(5),
prompt_length=2,
info={
"sample_index": i,
"model_version": 0,
},
)
for i in range(num)
]


class MemoryStoreTest(unittest.TestCase):
def test_add_update_get_remove(self):
store = MemoryStore()
key = "0/task_a/1"
experiences = get_dummy_experience(3, request_id="req_a")

store.add(key, experiences)
self.assertEqual(len(store), 3)

store.update(
key,
update=ExperienceUpdate(reward=1.0, info={"source": "reward_model"}),
sample_ids=None,
)
result = store.get(key)
self.assertEqual(len(result), 3)
for exp in result:
self.assertEqual(exp.reward, 1.0)
self.assertEqual(exp.info["source"], "reward_model")
self.assertEqual(exp.eid.batch, "0")
self.assertEqual(exp.eid.task, "task_a")
self.assertEqual(exp.eid.run, 1)

removed = store.remove(key)
self.assertEqual(len(removed), 3)
self.assertEqual(store.get(key), [])
self.assertEqual(len(store), 0)

def test_update_subset_by_sample_ids(self):
store = MemoryStore()
key = "0/task_a/1"
experiences = get_dummy_experience(2, request_id="req_b")

store.add(key, experiences)
teacher_logprobs = torch.ones(3)
store.update(
key,
update=ExperienceUpdate(reward=2.0, teacher_logprobs=teacher_logprobs),
sample_ids=["req_b:1"],
)

result = store.get(key)
self.assertIsNone(result[0].reward)
self.assertEqual(result[1].reward, 2.0)
self.assertEqual(result[1].eid.batch, "0")
self.assertEqual(result[1].eid.task, "task_a")
self.assertEqual(result[1].eid.run, 1)
torch.testing.assert_close(result[1].teacher_logprobs, teacher_logprobs)

def test_overwrite_replaces_existing_records(self):
store = MemoryStore()
key = "0/task_a/1"

store.add(key, get_dummy_experience(2, request_id="old"))
store.overwrite(key, get_dummy_experience(1, request_id="new"))

result = store.get(key)
self.assertEqual(len(result), 1)
self.assertEqual(result[0].eid.suffix, "new")

def test_prefix_get_and_remove(self):
store = MemoryStore()
store.add("0/task_a/0", get_dummy_experience(1, request_id="a0"))
store.add("0/task_a/1", get_dummy_experience(2, request_id="a1"))
store.add("0/task_b/0", get_dummy_experience(1, request_id="b0"))

self.assertEqual(len(store.get("0/task_a")), 3)
self.assertEqual(len(store.remove("0/task_a")), 3)
self.assertEqual(len(store.get("0")), 1)
self.assertEqual(store.keys(), ["0/task_b/0"])

def test_complete_key_required_for_mutations(self):
store = MemoryStore()
with self.assertRaises(ValueError):
store.add("0/task_a", get_dummy_experience(1))
with self.assertRaises(ValueError):
store.overwrite("0/task_a", get_dummy_experience(1))
with self.assertRaises(ValueError):
store.update("0/task_a", update=ExperienceUpdate(reward=1.0), sample_ids=None)
with self.assertRaises(ValueError):
store.add("0/task_a/not_int", get_dummy_experience(1))

def test_duplicate_sample_id_is_rejected(self):
store = MemoryStore()
exp = get_dummy_experience(1, request_id="dup")
store.add("0/task_a/0", exp)
with self.assertRaises(ValueError):
store.add("0/task_a/1", exp)

def test_blocked_prefix_drops_add_and_overwrite(self):
store = MemoryStore()
key = "0/task_a/0"
store.add(key, get_dummy_experience(1, request_id="pre"))
self.assertFalse(store.is_prefix_blocked("0"))

# Real flow: block the batch, then delete its existing records.
store.block_prefix("0")
self.assertTrue(store.is_prefix_blocked("0"))
store.remove(key)
self.assertEqual(store.get(key), [])

# A late add on a fresh key under the blocked batch is dropped.
store.add("0/task_a/1", get_dummy_experience(2, request_id="post"))
self.assertEqual(store.get("0/task_a/1"), [])
self.assertNotIn("0/task_a/1", store.keys())

# A late overwrite is also dropped: _drop_key is a no-op (records were
# already deleted) and add is blocked, so nothing reappears.
store.overwrite(key, get_dummy_experience(1, request_id="overwrite"))
self.assertEqual(store.get(key), [])
self.assertNotIn(key, store.keys())

def test_blocked_prefix_does_not_affect_other_batches(self):
store = MemoryStore()
store.block_prefix("0")
store.add("1/task_a/0", get_dummy_experience(1, request_id="other"))
self.assertEqual(len(store.get("1/task_a/0")), 1)

def test_blocked_prefix_keeps_get_and_remove_working(self):
store = MemoryStore()
key = "0/task_a/0"
store.add(key, get_dummy_experience(2, request_id="keep"))
store.block_prefix("0")
# Reads and removes still work on already-stored records.
self.assertEqual(len(store.get(key)), 2)
self.assertEqual(len(store.remove(key)), 2)
self.assertEqual(store.get(key), [])


if __name__ == "__main__":
unittest.main()
103 changes: 0 additions & 103 deletions tests/common/experience_extraction_test.py

This file was deleted.

Loading
Loading