Add create_or_update_text_document agent tool#1808
Conversation
Lets corpus-level agents author or version-up a text-based document
in the active corpus. Scope is intentionally limited to text formats
(text/plain, text/markdown, application/txt) for the initial release;
binary formats still require the parsing pipeline.
The tool derives the corpus path from `title` (matching Corpus.add_document
sanitisation), so a second call with the same title in the same corpus
hits the same path and the dual-tree versioning architecture
(documents/versioning.py::import_document) version-ups the existing doc.
Returns status ("created" | "updated"), document_id, version_number,
path, byte_count, etc. Wired into the registry as a CORPUS-category
tool with requires_approval=True / requires_write_permission=True
(HITL gate, mirroring move_document).
Code Review: Add
|
| Scenario | Covered |
|---|---|
| Create at corpus root | yes |
| Create in folder | yes |
| Version-up (same title) | yes |
text/markdown accepted |
yes |
| Binary MIME rejected | yes |
| Empty title rejected | yes |
None content rejected |
yes |
| Empty string content | no |
| Nonexistent user | yes |
| IDOR on foreign corpus | yes |
| Read-only corpus PermissionError | yes |
| Folder in wrong corpus | yes |
| Path sanitisation | yes |
| Quota exceeded | no |
| Async wrapper smoke | yes |
Summary
The security posture (IDOR prevention, HITL flag, quota delegation) is correct, the async wrapping follows the _db_sync_to_async pattern, and the test suite covers the main happy and error paths well. The two actionable items before merge are: fix the empty-content validation gap (either add the check or remove the docstring claim) and rename _path_record to path_record. The path-collision docstring clarification and quota test are lower-priority but recommended.
Claude review (#1808): - Reject empty / whitespace-only content (was: 0-byte doc created) - Rename _path_record to path_record (used three lines down) - Document path-collision behavior in docstring + tool_registry desc - Add tests for empty content, quota-exceeded, and both aliases Discovered bug in opencontractserver/documents/versioning.py: the update-existing-path branch did 'effective_txt_file = txt_file or old_doc.txt_extract_file', causing a version-up to silently inherit the previous version's stored file when the caller passed content bytes but no explicit File object. Aligned with the create-path logic (lines 324-343): always materialise the content bytes into a fresh ContentFile when no File object is supplied.
|
Test comment - please ignore |
Code Review: Add create_or_update_text_document agent toolOverviewThis PR adds a well-scoped corpus-level LLM tool for creating or versioning-up text documents, along with a meaningful bug fix in versioning.py that prevented version-up from writing new content. The design is correct and the test coverage is broad. There are a few issues worth addressing before merge. Critical: Service Layer Invariant ViolationFile: The line Per CLAUDE.md, LLM tools with a user context must reach models through Fix: use the corpus service, e.g. Medium: Possible Breaking API Change in versioning.pyFile: The old fallback Ask: confirm no other caller of Minor: Async Test Signal Management is FragileFile:
Recommended pattern: disconnect in Minor: aupload_text_document Absent from FUNCTION_MAPFile: The Minor: No Content Size Guard
Positive Notes
|
Removed the per-class disconnect/reconnect of process_doc_on_create_atomic. conftest.py already disables document processing signals session-wide via the disable_document_processing_signals autouse fixture, so the class-level override was redundant — and worse, the tearDown reconnect *re-enabled* the signal mid-session, causing 33 unrelated tests (test_nested_approval_gates, test_llm_tools, test_blob_retention, test_extract_tasks, ...) that follow on the same xdist worker to fail with IsADirectoryError when their Document.objects.create() calls fired the now-reconnected parser pipeline.
Code Review —
|
…nt-versioning-JlyzS # Conflicts: # CHANGELOG.md
- Route corpus visibility + UPDATE check through BaseService.get_or_none / user_has so the LLM tool no longer composes Tier-0 visible_to_user / user_can inline (CLAUDE.md rule 7). - Cap content at MAX_FILE_UPLOAD_SIZE_BYTES before encoding to prevent in-memory runaway from a misbehaving agent. - Widen content parameter annotation to str | None to match the None-rejection test contract. - Document path-derivation expectation inline in test_path_derivation_collapses_unsafe_chars and add a size-cap test.
Summary
Adds a new corpus-scoped LLM tool,
create_or_update_text_document, that lets a corpus-level agent author or version-up a text-based document inside the active corpus.text/plain(default),text/markdown,application/txt(theTEXT_MIMETYPESset). Binary formats (PDF/DOCX) still require the parsing pipeline and are intentionally out of scope for this iteration.titleusing the same sanitisation asCorpus.add_document(truncate toMAX_FILENAME_LENGTH, non-alphanumeric →_,DEFAULT_DOCUMENT_PATH_PREFIX). A second call with the same title in the same corpus hits the same path and the dual-tree versioning architecture (documents/versioning.py::import_document) version-ups the existing doc — sameversion_tree_id,version_number + 1, old row flipped offis_current.requires_approval=True+requires_write_permission=True+ToolCategory.CORPUS, mirroringmove_document.PermissionTypes.UPDATEon the corpus (raisesPermissionErrorotherwise). User / corpus / folder lookups all return the same"does not exist or is not accessible"message regardless of whether the row is missing or unreachable. Quota check routes throughDocumentService.check_user_upload_quotaso capped users get a cleanValueErrorinstead of a half-created doc. The author is granted CRUD on the resulting document.{status: "created" | "updated", document_id, corpus_id, path, version_number, file_type, byte_count, message}.The tool exposes both
acreate_or_update_text_document(async, used by the agent runtime) and a synccreate_or_update_text_documentfor tests.upload_text_document/aupload_text_documentare exported as aliases for callers that prefer the verb-first name.Files
opencontractserver/llms/tools/core_tools/text_document_import.py— new tool module (sync + async).opencontractserver/llms/tools/core_tools/__init__.py— re-export the new symbols.opencontractserver/llms/tools/tool_registry.py—ToolDefinition(with parameter metadata for the GraphQL "available tools" surface) +FUNCTION_MAPentry (withupload_text_documentalias).opencontractserver/tests/test_text_document_import_tool.py— covers create, in-folder placement, second-call version-up,text/markdownacceptance,application/pdfrejection, empty-title /None-content rejection, nonexistent-user rejection, IDOR-on-other-user-corpus, read-only-corpus →PermissionError, folder-in-wrong-corpus rejection, and an async-wrapper smoke test.CHANGELOG.md— Unreleased / Added entry.Test plan
docker compose -f test.yml run django pytest opencontractserver/tests/test_text_document_import_tool.py -n 4 --dist loadscopedocker compose -f test.yml run django pytest opencontractserver/tests/test_tool_registry.py opencontractserver/tests/test_move_document_tool.py -n 4 --dist loadscope(regression: registry + sibling tool still pass)pre-commit run --all-filesGenerated by Claude Code