vLLM OpenAI API server supports recording experience data #591
Draft
pan-x-c wants to merge 29 commits into
…th legacy)

Refactor experience production so that heavy data (tokens, logprobs, routed_experts) no longer travels runner -> scheduler -> coordinator as serialized bytes. The vLLM recorder now captures it in-process into a MemoryStore keyed by task_id, and the coordinator pulls it at finalize time via /records/consume_task. Runners ship only a small reward map. Both paths coexist behind `explorer.use_recorded_experience` (default: off, i.e. the legacy path).

Recording module (trinity/common/models/vllm_patch/recording/):
- store: drop SqlStore; MemoryStore.update_reward_by_task_id stamps reward/run/task on a whole task-id group, then pops and returns it (the in-memory replacement for the SQL HistoryRecorder join).
- recorder: track in-flight record tasks; add flush() (await pending tasks, then queue.join()) so that a consume sees a quiesced store; honor skip_recording_ctx.
- models: build_experience emits one Experience per completion (n>1) with info["sample_index"]; eid.suffix=request_id is kept for traceability.
- context: add skip_recording_ctx; task_id already flows via api_key (RecordingIdentityMiddleware) and now also via VLLMModel.chat (the Ray entry point).
- query: POST /records/consume_task (flush -> update_reward_by_task_id -> serialize_many); drop the SqlStore 503 branch.
- config/server: remove RecordingConfig entirely; the logprob width is a recorder-internal constant (we store only the chosen token, which vLLM force-includes at logprobs=1). No static config is threaded through launch.

task_id propagation (Ray entry, same contextvar as the HTTP middleware):
- vllm_model: chat/generate accept task_id_key and set task_id_ctx around _generate_internal; logprobs sets skip_recording_ctx (it is an auxiliary forward pass).
- model: ModelWrapper.chat/chat_async forward task_id_key; SGLang.chat accepts and ignores it (recording is vLLM-only).
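The record-then-consume flow of the recording module can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `Record`, `MemoryStore.add`, and the exact signature of `update_reward_by_task_id` are hypothetical; only the names `MemoryStore`, `update_reward_by_task_id`, `task_id_ctx`, and `skip_recording_ctx` come from the description above, and their types and defaults here are assumptions.

```python
import contextvars
import threading
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Context variables named after those in the commit message; the real types
# and defaults are assumptions made for this sketch.
task_id_ctx: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar("task_id_ctx", default=None)
skip_recording_ctx: contextvars.ContextVar[bool] = contextvars.ContextVar("skip_recording_ctx", default=False)


@dataclass
class Record:
    """Hypothetical recorded completion (stand-in for an Experience)."""
    tokens: List[int]
    logprobs: List[float]
    info: Dict[str, Any] = field(default_factory=dict)
    reward: Optional[float] = None


class MemoryStore:
    """Minimal in-memory store that groups records by task_id (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_task: Dict[str, List[Record]] = defaultdict(list)

    def add(self, record: Record) -> bool:
        # Skip records produced outside a task or inside an auxiliary forward pass.
        task_id = task_id_ctx.get()
        if task_id is None or skip_recording_ctx.get():
            return False
        with self._lock:
            self._by_task[task_id].append(record)
        return True

    def update_reward_by_task_id(self, task_id: str, reward: float, run: int, task: Any) -> List[Record]:
        # Pop the whole group first so it can only be consumed once,
        # then stamp reward/run/task on every record in it.
        with self._lock:
            group = self._by_task.pop(task_id, [])
        for rec in group:
            rec.reward = reward
            rec.info["run"] = run
            rec.info["task"] = task
        return group
```

Because the group is popped on consume, a second consume for the same task_id returns an empty list, so the same records cannot be handed to the pipeline twice.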
Coordinator + runner + workflow:
- rollout_coordinator: add _resolve_rank_urls (ray.get_actor per engine) and a recording-mode finalize that fans out /records/consume_task to each engine, deserializes the results, and feeds the objects to the pipeline (no re-serialization).
- experience_pipeline: add process_experiences(exps) as a public entry point that accepts experience objects.
- workflow_runner: in recording mode, return a pickled reward map keyed by the per-sample task_id_key the workflow stamped; the legacy path is unchanged.
- workflow: in recording mode, SimpleWorkflow/AsyncSimpleWorkflow run a per-sample n=1 loop (a distinct task_id_key per sample, i.e. one reward unit per sample for GRPO); the legacy single-call path with n=repeat_times is unchanged.
- config: add the ExplorerConfig.use_recorded_experience flag.

SQL path removal (MemoryStore only):
- delete proxy/recorder.py (HistoryRecorder) and proxy_test.py;
- proxy service/app: drop /feedback, /commit, record_feedback, submit_experiences, and ready_experiences (keep allocate_model and weight sync);
- the allocator no longer fills record_db_url;
- drop InferenceModelConfig.record_db_url and the dead ExplorerConfig.db_url field;
- delete RecordingConfig.

Serve-mode external reward reporting is intentionally left unimplemented in this version (the proxy /feedback and /commit endpoints are removed); the affected serve integration tests (TestServeWithTrainer, ServeTest) are skipped with a pointer to the recording refactor plan. Redirecting convert_messages_to_experience (multi-turn) is deferred, with TODOs at its call sites.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
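The recording-mode handoff between workflow, runner, and coordinator can be sketched as below, assuming one consume call per engine that receives the whole reward map. The transport is injected as a coroutine (`consume`) so the sketch runs without a server; `make_task_id_keys`, `runner_result`, `finalize`, and the `"<task_id>/<i>"` key format are hypothetical, and the actual /records/consume_task request shape is not specified in this PR description.

```python
import asyncio
import pickle
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical transport standing in for POST <engine_url>/records/consume_task:
# it receives the engine URL and the reward map and returns experience objects.
ConsumeFn = Callable[[str, Dict[str, float]], Awaitable[List[Any]]]


def make_task_id_keys(task_id: str, repeat_times: int) -> List[str]:
    # One distinct key per n=1 sample, so each sample is its own reward unit.
    # The "<task_id>/<i>" format is an assumption for illustration.
    return [f"{task_id}/{i}" for i in range(repeat_times)]


def runner_result(rewards: Dict[str, float]) -> bytes:
    # Recording mode: the runner ships only this small pickled reward map.
    return pickle.dumps(rewards)


async def finalize(payload: bytes, engine_urls: List[str], consume: ConsumeFn) -> List[Any]:
    rewards: Dict[str, float] = pickle.loads(payload)
    # Fan out one consume call per engine concurrently; an engine that never
    # served a key is assumed to return nothing for it.
    batches = await asyncio.gather(*(consume(url, rewards) for url in engine_urls))
    # Flatten into one list of objects for the pipeline (no re-serialization).
    return [exp for batch in batches for exp in batch]
```

In this shape only the reward map crosses the runner boundary; tokens and logprobs stay in each engine's store until the coordinator consumes them.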
…-x-c/Trinity-RFT into feature/model_self_record_experience
Description
As the title says
Checklist
Please check the following items before code is ready to be reviewed.