Fix/sub thread routing and apitoken access #78
Merged
Conversation
Mark CosmosImport and PostgreSqlImport tools as IsPackable=false to fix NU5019 errors during dotnet pack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Map ApiToken nodeType to Permission.Api in CreateNodePermissionAttribute (same satellite pattern as Thread/Comment)
- Set IsSatelliteType=true on ApiToken node, add validation cache
- Fix delegation null delivery check and add cancellation registration to prevent infinite hang on sub-thread routing failures
- Add tests for ApiToken creation and delegation failure handling
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RoutingGrain was passing deliveries to grains without updating the target address when path resolution split prefix/remainder. This caused routing loops for deeply nested sub-thread paths (6+ segments) because the grain received a delivery whose target didn't match its hub address. Now mirrors RoutingServiceBase behavior: sets UnifiedPath property and updates delivery target to the resolved prefix address. Also adds InternalsVisibleTo for MeshWeaver.Hosting.Orleans to access WithTarget, and Orleans tests for sub-thread routing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… leak

Organization instances (e.g., PartnerRe) were visible to all authenticated users in search because of three layers of public read access:
- ConfigureNodeTypeAccess(WithPublicRead) bypassed partition access in SQL
- WithPublicRead() on hub config allowed unauthenticated hub reads
- Access rule returned true for all reads
Now Organization instances require partition-level permissions for read access. The Organization type definition itself remains visible (it's nodeType=NodeType, which has its own WithPublicRead). Routing is unaffected, as MeshCatalog path resolution is unprotected by design.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GenerateAccessControlClause was using OR between public_read and partition_access, meaning any node type with public_read=true (Markdown, User, Organization) was visible to all authenticated users across ALL partitions. This leaked cross-partition data in search results. Now: partition_access is always required for schema-qualified queries. public_read only skips node-level permission checks within accessible partitions, not the partition check itself. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Updates the stored procedure so partition_access is always required. public_read only skips node-level permission checks, not the partition check. Prevents cross-partition data leakage in global search. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
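The corrected semantics can be modeled in a few lines. This is a hypothetical sketch (the function and predicate names are illustrative; the real GenerateAccessControlClause emits SQL inside a stored procedure): partition_access is always required, and public_read only drops the node-level permission check.

```csharp
using System;

static class AccessClause
{
    // Sketch of the corrected clause composition: the partition check is
    // unconditional; public_read only waives the node-level check.
    public static string Build(bool publicRead)
    {
        const string partition = "partition_access";
        return publicRead
            ? partition
            : partition + " AND node_permission";
    }
}
```

Under the old OR semantics, public_read=true made the whole clause true regardless of partition_access, which is exactly the cross-partition leak described above.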
Tests now reflect that public_read does not bypass partition_access.
- GlobalAdmin tests: grant partition_access to all org schemas
- PublicRead test: verifies no results without partition_access
- CrossPartition access test: asserts other orgs are excluded
- Renamed PartnerRe references to FutuRe in test data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AsyncLocal doesn't flow through the AI framework's async streaming and tool invocation chain, so MeshPlugin tool calls (Get, Search, Create, Update, Patch, Delete) ran without user identity. This caused "Access denied" when agents tried to update nodes in partitions the user had access to. Now each tool call explicitly restores the user's AccessContext from ThreadExecutionContext.UserAccessContext via SwitchAccessContext. Also fixes FutuRe schema reference in CrossPartitionSearchTests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies that Get and Patch work when AsyncLocal context is cleared (simulating AI framework tool invocation). The plugin must restore the user's identity from ThreadExecutionContext.UserAccessContext. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SetContext directly instead of SwitchAccessContext — no await needed, no disposal needed. The AsyncLocal is scoped to the thread's InvokeAsync async flow so setting it once per tool call is sufficient. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The delegation tool calls meshService.CreateNode(subThreadNode) which requires Permission.Thread. Without access context (AsyncLocal lost in AI framework's tool invocation), this fails silently → delegation returns error → AI retries infinitely creating endless delegation attempts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests end-to-end delegation: agent calls delegate_to_agent tool, which creates a sub-thread and submits to it. Verifies access context flows through the AI tool invocation chain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- RoutingGrain: log resolution details at Info level for debugging
- Delegation: guard against depth >= 3 to prevent infinite sub-threads
- ThreadPathResolutionTest: verifies PostgreSQL correctly resolves deeply nested _Thread paths via satellite table (all 5 tests pass)
- OrleansDelegationFlowTest: skeleton for end-to-end delegation test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-tool RestoreUserContext() with AccessContextAIFunction (DelegatingAIFunction) wrapper applied to ALL tools in CreateAgentCore. Every tool invocation — MeshPlugin, delegation, PlanStorage, etc. — now automatically restores the user's identity from ThreadExecutionContext.UserAccessContext before executing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
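The wrapper idea can be sketched without the actual Microsoft.Extensions.AI types. In this self-contained model (stand-in delegates; AccessContextAIFunction's real base class and signatures are not reproduced here), every tool is wrapped so the user's identity is re-established immediately before the tool body runs, because the ambient AsyncLocal does not survive the framework's tool invocation chain.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class AccessContextWrapper
{
    // Stand-in for the AsyncLocal-backed access context that gets lost
    // across the AI framework's streaming/tool-invocation boundary.
    public static readonly AsyncLocal<string?> Current = new();

    // Wrap a tool so the stored user context is restored per invocation.
    public static Func<Task<object?>> Wrap(string userContext, Func<Task<object?>> tool)
        => async () =>
        {
            Current.Value = userContext;   // restore identity for this call
            return await tool();
        };
}
```

Applying the wrapper centrally (as the commit does in CreateAgentCore) means no individual tool can forget to restore the context.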
StreamingCompact was recursively embedding sub-thread StreamingArea via LayoutAreaControl when tc.Result == null. This caused infinite grain activations when the sub-thread didn't exist (CreateNode failed due to missing access context) — each failed activation triggered another embed attempt. Now delegation links are static with status indicators (dot/checkmark). No recursive LayoutAreaControl embedding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- AccessContextToolCallTest: verifies tool calls restore user identity from ThreadExecutionContext, even when AsyncLocal is cleared
- StreamingRecursionTest: verifies delegation ToolCalls don't trigger recursive LayoutAreaControl embedding
- DelegationDepthGuard: verifies depth >= 3 is detected correctly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous guard counted _Thread segments which didn't detect Worker→Worker→Worker recursion. Now counts segments after _Thread/ to determine real delegation depth (each level adds msgId/subId = 2 segments). Maximum depth = 2 (one delegation level). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
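One plausible reconstruction of the fixed guard, counting threads along the path rather than _Thread markers (helper names and the exact formula are guesses from the description above, not the MeshWeaver source): the thread id is one segment after _Thread/, and each delegation level appends a msgId/subThreadId pair, i.e. two more segments.

```csharp
using System;

static class DelegationDepthGuard
{
    // Depth = number of nested threads: root thread is depth 1, each
    // delegation level (msgId/subThreadId pair) adds one.
    public static int Depth(string path)
    {
        const string marker = "/_Thread/";
        var idx = path.IndexOf(marker, StringComparison.Ordinal);
        if (idx < 0) return 0;                     // not a thread path
        var tail = path[(idx + marker.Length)..];
        var segments = tail.Split('/', StringSplitOptions.RemoveEmptyEntries);
        return (segments.Length + 1) / 2;          // threadId alone => 1
    }

    public static bool TooDeep(string path) => Depth(path) >= 3;
}
```

With this counting, Worker→Worker→Worker recursion is caught because the second delegation pushes the path to depth 3, whereas counting _Thread markers (the old guard) saw only one marker.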
- ToolStatusFormatter: show agent name (without Agent/ prefix) + task preview instead of "Delegating to Agent/Worker"
- appsettings: set MeshWeaver.AI and RoutingGrain to Information level so delegation and routing traces appear in App Insights
- Fix delegation depth guard to count actual nesting from path segments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delegation entries now use <details> (same as regular tool calls) so users can expand to see the full task and result. Removed the recursive LayoutAreaView embed for in-progress delegations that caused stack overflow via cascading grain activations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NotifyParentCompletion was using DelegationTracker (static in-memory dictionary) which can't work across Orleans silos. Now posts a second SubmitMessageResponse with Status=ExecutionCompleted back through the hub, which the parent's RegisterCallback receives to resolve the delegation TCS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that SubmitMessageRequest produces both CellsCreated and ExecutionCompleted responses via RegisterCallback. This is the exact pattern used by the delegation tool — without the second response, the parent thread hangs forever. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ration

Root cause: RegisterCallback removes callbacks after first invocation. The CellsCreated response consumed the callback, leaving nothing for ExecutionCompleted → parent thread hung forever.
Fix:
- HandleSubmitMessage registers CompletionCallbacks[threadPath] closure that posts ResponseFor(originalDelivery) on the thread hub
- NotifyParentCompletion invokes the callback to send ExecutionCompleted
- Delegation tool re-registers callback after CellsCreated response
DelegationCompletionTest verifies both responses arrive via RegisterCallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
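The one-shot semantics can be modeled in miniature (class and method names here are illustrative, not the actual RegisterCallback API): a callback registry that removes entries on first delivery drops the second response unless the caller re-registers in between.

```csharp
using System;
using System.Collections.Generic;

sealed class OneShotCallbacks
{
    private readonly Dictionary<string, Action<string>> callbacks = new();

    public void Register(string key, Action<string> cb) => callbacks[key] = cb;

    // One-shot delivery: the callback is removed before it is invoked,
    // so a second response for the same key finds nothing.
    public bool Deliver(string key, string response)
    {
        if (!callbacks.Remove(key, out var cb)) return false;
        cb(response);
        return true;
    }
}
```

This is why the delegation tool must re-register its callback after consuming CellsCreated: otherwise ExecutionCompleted is silently dropped and the parent's TCS never resolves.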
StreamingView now shows:
- Thread title as clickable link
- Executing message's Overview (bubble with text + tool calls)
For executing delegations in the bubble, embeds the sub-thread's StreamingView (bounded by delegation depth guard, max 2 levels). For completed delegations, shows expandable details.
No infinite recursion: StreamingView → Overview → Streaming is bounded by the max delegation depth (2), not by rendering depth.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
StreamingView: if the thread has an executing cell, return its default area; otherwise null. No title, no wrapping — simple passthrough.
StreamingCompact delegation rendering:
- Running (Result==null): show name + embed sub-thread's Streaming area
- Completed (Result!=null): show title with link (checkmark)
Recursion is bounded by the delegation depth guard (max 2 levels): StreamingView → Overview → StreamingCompact → sub-thread Streaming → sub-thread Overview → done (no further delegation at max depth).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…alls

When recovering a stale executing thread after restart, delegation tool calls now check their sub-thread's status:
- Sub-thread completed (IsExecuting=false): mark as done
- Sub-thread still running: mark as cancelled (parent can't re-subscribe)
- Non-delegation: mark as cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…de API

- AzureClaudeChatClient: handle DataContent as base64 document/image blocks in the Claude API format
- AgentChatClient: detect content: prefix with binary extensions (.pdf, .png, .jpg, etc.), load via IContentService as Stream, create DataContent and include in ChatMessage.Contents
- Path resolution: local (content:file.pdf → context path) or absolute (@OrgA/Doc/content:file.pdf)
- ChatMessage supports mixed content: TextContent + DataContent
- Tests for serialization, path parsing, and content type detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Foundation for document format conversion pipeline:
- IContentTransformer interface for binary-to-markdown conversion
- ContentCollection.GetContentAsTextAsync uses registered transformers
- DocSharp.Markdown package added for docx → markdown conversion
Next: restore ContentPlugin with xlsx/docx/pdf readers, wire into content browser and agent attachments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…space streams

History was loaded via workspace.GetRemoteStream().Current, which returns null in Orleans (workspace streams don't propagate). Now queries IMeshService directly (reads from persistence), which is reliable. This fixes agents losing context between messages and not knowing what was discussed or what nodes to update.
Also reduced streaming throttle from 1s to 3s to prevent grain scheduler overload (Orleans messages were expiring before delivery).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ompts
- Load conversation history reactively: CombineLatest on remote streams
with 10s timeout and per-thread cache for re-submissions
- Reduce streaming throttle from 1s to 3s to prevent grain scheduler overload
- Worker prompt: mandatory read→adapt→write workflow, max 3 Gets, must Patch
- Orchestrator: prescriptive Worker delegation ("Get X, change Y, Patch it")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single DB query (namespace:{threadPath} nodeType:ThreadMessage) gets all
messages reliably. Remote streams to child grains never connected in time.
Results ordered by Thread.Messages list. Cached for re-submissions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Optimistic rendering: pending cells render instantly (no skeletons)
- Thread creation: BuildThreadWithMessages + AutoExecutePendingMessage
- History: GetDataRequest to each ThreadMessage node via CombineLatest
- Agent reuse: AgentCache per thread path
- Proper ChatMessage list passed to GetStreamingResponseAsync
- Retry with error on API 500, ReduceToMeshNode fallback
- Resubmit: click handler creates output cell, server skips creation
- Worker/Orchestrator prompts: mandatory write-back workflow
- Orleans test: ColdStart_AgentSeesAllPreviousMessages (FAILING - repro): response never routes back from grain to test client
- Monolith test: ThreeMessages_AgentSeesFullHistory (PASSING)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- XUnitLogger: fall back to XUnitFileOutputRegistry.GetAnyActiveOutputHelper() when testOutputHelperAccessor.OutputHelper is null (silo-side logging)
- SharedOrleansFixture: register test client on SILO's routing service via reflection (InProcessSiloHandle.SiloHost.Services). Without this, response routing tried to activate a grain for the client address → failed.
- AccessContextGrainCallFilter: swallow NullRef for Orleans internal Stop/Close
- Result: silo logs now visible, SubmitMessageRequest routes + responds, history loads 5/5 messages, agent receives 6 messages. Test times out on QueryAsync polling (persistence flush delay) — execution itself works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SharedOrleansFixture: register client on silo's IRoutingService via reflection (siloHost.Services). Fixes response routing back to client.
- OrleansChatHistoryTest: use completion callback instead of QueryAsync polling
- XUnitLogger: fall back to static GetAnyActiveOutputHelper() for silo logs
- Test passes: 5 history messages assembled, 6 sent to agent, completes <1s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… history)

ColdStart_AgentSeesAllPreviousMessages now correctly fails with: Expected "I received 1 messages" to contain "6 messages"
Root cause: AgentChatClient.BuildMessageWithContextAsync() merges all ChatMessage objects into a single text prompt. The 6 messages (4 history + 1 input cell + 1 new user) become 1 flattened string.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Call agent.ChatClient.GetStreamingResponseAsync(allMessages) directly, bypassing AgentChatClient.BuildMessageWithContextAsync, which merged all 6 ChatMessage objects into 1 text blob. The ChatClientAgent already has the system prompt in its instructions; FunctionInvokingChatClient handles tools.
- Add AgentChatClient.GetAgent() to expose the ChatClientAgent
- ThreadExecution: call agent.ChatClient directly with full message list
- Orleans test: GREEN (agent sees 6 messages: system + 4 history + 1 new)
- Monolith tests: GREEN (2→4→6 messages across 3 turns)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- AgentChatClient: prepend agent.Instructions as system message, pass all messages as separate turns to agent.ChatClient (includes FunctionInvokingChatClient)
- ThreadExecution: history = all messages EXCEPT last 2 (current input + output)
- All tests GREEN: Orleans ColdStart + monolith ThreeMessages + TwoMessages
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Assistant messages now include tool call details (name, args, truncated result) so the agent knows what it did in previous turns. Without this, the agent lost context about data it read or actions it took. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Check hub.RunLevel before creating streams. Catch ObjectDisposedException if hub disposes between the check and stream creation (race during F5/nav). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BuildThreadNode/BuildThreadWithMessages added the /_Thread/ partition unconditionally. For delegations, contextPath already contains /_Thread/ (it's inside a thread). This created paths like:
User/rbuergi/_Thread/thread-id/msg-id/_Thread/sub-thread-id
Now detects an existing _Thread in the path and skips the partition.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
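The fix can be sketched as a small helper (the name and signature are illustrative, not the actual BuildThreadNode code): add the /_Thread/ partition only when the context path is not already inside a thread.

```csharp
using System;

static class ThreadPaths
{
    // Append the thread id, inserting the /_Thread/ partition only for
    // top-level threads; delegation context paths already contain it.
    public static string Build(string contextPath, string threadId)
        => contextPath.Contains("/_Thread/", StringComparison.Ordinal)
            ? $"{contextPath}/{threadId}"
            : $"{contextPath}/_Thread/{threadId}";
}
```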
- ThreadExecution: state-driven execution via WatchForExecution (watches workspace stream for IsExecuting=true, replaces command-driven flow)
- HandleSubmitMessage: thin state updater only (no cell creation, no execution start)
- AgentChatClient: pass tools via ChatOptions from FunctionInvokingChatClient.AdditionalTools (was null → Claude never saw tool definitions)
- GUI: remove pendingCells, render LayoutAreaView for every message from the start. New thread flow: create thread → verify → create cells → verify → submit → navigate on response
- WatchForExecution: idempotent cell creation (handles both GUI-initiated and delegation sub-threads)
- MarkdownView: create kernel node before posting code submissions (fixes Orleans grain activation for interactive markdown)
- HandleSubmitMessage: deduplicate Messages with Contains check
- Fix $type serialization: PendingAttachments → ToImmutableList()
- Fix ThreadsCatalog test for nested _Thread path changes
- Update all thread tests to create cells before SubmitMessageRequest
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- HandleSubmitMessage: don't set PendingUserMessage (reserved for delegations)
- WatchForExecution: PendingUserMessage=null → StartExecution() directly (no slow meshService.CreateNode for existing cells)
- ExecuteMessageAsync: include current user cell in history (count-1 instead of count-2), don't add UserMessageText separately when history has it
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReadOnlyListConverter now always uses JsonDocument per-element parsing instead of Deserialize<T[]>. Old data in PostgreSQL may have $type not as the first property, which crashes the array deserializer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On hub init, both RecoverStaleExecutingThread and WatchForExecution see IsExecuting=true. Recovery clears it, but WatchForExecution already captured the stale state and starts a doomed execution. Fix: skip the first stream emission — recovery handles stale state, WatchForExecution only reacts to new state changes from HandleSubmitMessage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tivation

Portal and client hubs are not grains — they register as memory stream subscribers in OrleansRoutingService.RegisterStreamAsync. The RoutingGrain now publishes to the stream for portal/ and client/ addresses instead of trying grain activation (which fails with "node not found").
This fixes cross-silo response routing: portal on silo A receives responses from grains on silo B via the Orleans memory stream.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip-first-emission broke delegation sub-threads (their first emission IS the legitimate trigger). Instead, check ExecutionStartedAt: if older than 2 minutes, it's stale (recovery handles it). Fresh executions (delegations, HandleSubmitMessage) proceed normally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RecoverStaleExecutingThread was killing fresh delegation sub-threads by clearing IsExecuting before WatchForExecution could trigger. Now both recovery and watch use the same 2-minute age threshold on ExecutionStartedAt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
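The shared staleness rule is small enough to sketch (names and signature are illustrative): both recovery and WatchForExecution compare ExecutionStartedAt against the same 2-minute threshold, so a fresh delegation sub-thread is executed rather than recovered, while a thread left executing by a crash is recovered rather than re-run.

```csharp
using System;

static class ExecutionStaleness
{
    public static readonly TimeSpan Threshold = TimeSpan.FromMinutes(2);

    // An execution is stale only if it actually started and did so more
    // than Threshold ago; a missing timestamp is treated as not stale.
    public static bool IsStale(DateTimeOffset? startedAt, DateTimeOffset now)
        => startedAt is { } t && now - t > Threshold;
}
```

Keeping the predicate in one place is the point of the fix: the earlier bug came from recovery and watch applying different rules to the same state.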
The delegation tool now creates both ThreadMessage cells BEFORE creating the thread node. Previously only the thread was created, relying on WatchForExecution to create cells — which was unreliable (race with recovery, cross-silo timing). Now cells exist when the grain activates. Added info-level logging around delegation cell/thread creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies the exact delegation flow: create user cell → create response cell → create thread with IsExecuting=true → WatchForExecution triggers → execution completes → response cell has agent text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HandleSubmitMessage now starts execution directly after updating state, avoiding the unreliable meshService.CreateNode round-trip inside the WatchForExecution subscription (which hangs due to reentrancy). - GUI flow (client provides cell IDs): respond + start immediately - Server flow (no IDs): create cells fire-and-forget, then start; respond with error if cell creation fails - WatchForExecution: only handles BuildThreadWithMessages auto-execute (Take(1) on hub startup, delegation flow) Fixes 12 CI test failures (10 threading + 2 security). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thread message cells must have MainNode = the thread's content node (e.g., "PartnerRe/AIConsulting"), not the thread path. This is required for SatelliteAccessRule to delegate read permissions correctly. - HandleSubmitMessage: read MainNode from thread workspace node - WatchForExecution: read MainNode from stream node - ChatClientAgentFactory delegation: use execCtx.ContextPath - ThreadChatView (existing threads): don't fall back to threadPath - SetThreadHubIdentity: set hub access context from thread.CreatedBy Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thread message nodes created from the UI had MainNode set to the thread
path (e.g., "Org/_Thread/thread-id") instead of the content node ("Org").
This caused "Access denied" when SatelliteAccessRule delegated read
permissions to MainNode.
Fix: set main_node = split_part(main_node, '/_Thread/', 1) for all
ThreadMessage nodes whose main_node contains /_Thread/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a silo goes down, grain delivery throws OrleansMessageRejectionException. Without the stream fallback, this returned delivery.Failed() which caused cascading DeliveryFailureExceptions across all UI clients. Restoring the fallback to Orleans memory stream ensures graceful degradation during silo restarts instead of hard crashes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>