diff --git a/.agents/skills/deepgram-java-audio-intelligence/SKILL.md b/.agents/skills/deepgram-java-audio-intelligence/SKILL.md
index edccbd2..56dd872 100644
--- a/.agents/skills/deepgram-java-audio-intelligence/SKILL.md
+++ b/.agents/skills/deepgram-java-audio-intelligence/SKILL.md
@@ -7,15 +7,10 @@ description: Use when writing or reviewing Java code in this repo that enables D
 
 Audio intelligence is not a separate client in this SDK. It is the **Listen V1 REST request surface** with additional analysis fields enabled.
 
-## When to use this product
-
-- You have **audio** and want transcript + analysis together.
-- REST is the main path; the Java WebSocket client only exposes the real-time subset.
-
 **Use a different skill when:**
 
-- You want plain transcription only → `deepgram-java-speech-to-text`.
-- You already have text and only need text analysis → `deepgram-java-text-intelligence`.
-- You need turn-aware conversational streaming → `deepgram-java-conversational-stt`.
+- Plain transcription only → `deepgram-java-speech-to-text`.
+- Text (not audio) analysis → `deepgram-java-text-intelligence`.
+- Turn-aware conversational streaming → `deepgram-java-conversational-stt`.
 
 ## Authentication
@@ -46,24 +41,22 @@ ListenV1RequestUrl request = ListenV1RequestUrl.builder()
 MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(request);
 ```
 
-The concrete repo example (`examples/listen/AdvancedOptions.java`) demonstrates the same pattern for enabling higher-value Listen options via the builder.
-
-## What else the REST request surface supports
+The concrete repo example (`examples/listen/AdvancedOptions.java`) demonstrates the same pattern for enabling higher-value Listen options via the builder. Always check the response for the intelligence fields you requested:
 
-The generated `ListenV1RequestUrl` and `MediaTranscribeRequestOctetStream` classes also expose these verified analysis fields in this checkout:
-
-- `sentiment`
-- `summarize`
-- `topics`
-- `customTopic`
-- `customTopicMode`
-- `intents`
-- `customIntent`
-- `customIntentMode`
-- `detectEntities`
-- `detectLanguage`
-- `diarize`
-- `redact`
+```java
+result.visit(new MediaTranscribeResponse.Visitor() {
+    @Override
+    public Void visit(ListenV1Response response) {
+        response.getResults().getSentiments().ifPresent(s -> System.out.println("Sentiment: " + s));
+        return null;
+    }
+    @Override
+    public Void visit(com.deepgram.types.ListenV1AcceptedResponse accepted) {
+        System.out.println("Async accepted: " + accepted.getRequestId());
+        return null;
+    }
+});
+```
 
 ## Quick start — WebSocket subset
 
@@ -94,29 +87,19 @@ In this Java checkout, the WebSocket connect options include `diarize`, `detectE
 
 ## API reference (layered)
 
-1. **In-repo source of truth**: `src/main/java/com/deepgram/resources/listen/v1/media/requests/` and `src/main/java/com/deepgram/resources/listen/v1/websocket/` plus `examples/listen/AdvancedOptions.java`. `reference.md` is absent here.
+1. **In-repo source of truth**: `src/main/java/com/deepgram/resources/listen/v1/media/requests/` and `src/main/java/com/deepgram/resources/listen/v1/websocket/` plus `examples/listen/AdvancedOptions.java`.
 2. **Canonical OpenAPI (REST)**: https://developers.deepgram.com/openapi.yaml
 3. **Canonical AsyncAPI (WSS subset)**: https://developers.deepgram.com/asyncapi.yaml
-4. **Context7**: `/llmstxt/developers_deepgram_llms_txt`
-5. **Product docs**:
-   - https://developers.deepgram.com/docs/stt-intelligence-feature-overview
-   - https://developers.deepgram.com/docs/summarization
-   - https://developers.deepgram.com/docs/topic-detection
-   - https://developers.deepgram.com/docs/intent-recognition
-   - https://developers.deepgram.com/docs/sentiment-analysis
-   - https://developers.deepgram.com/docs/language-detection
-   - https://developers.deepgram.com/docs/redaction
-   - https://developers.deepgram.com/docs/diarization
+4. **Product docs**: https://developers.deepgram.com/docs/stt-intelligence-feature-overview (links to individual feature docs for summarization, topics, intents, sentiment, language detection, redaction, diarization).
 
 ## Gotchas
 
-1. **There is no separate “audio intelligence client”.** Everything hangs off Listen V1.
-2. **Most intelligence fields are REST-only in this SDK surface.** The WebSocket connect options do not expose `summarize`, `topics`, `intents`, or `detectLanguage`.
-3. **`summarize` on Listen V1 is its own generated type.** Do not assume the Read API shape is identical.
-4. **The repo example only demonstrates diarization-level options.** There is no dedicated example file for sentiment/topics/intents in this checkout.
-5. **`redact` is currently a single `String` field on the REST builders.** Do not assume Python-style string-or-list support here.
-6. **Model support matters.** The examples consistently use `NOVA3`; follow that unless you have verified another model supports the overlays you need.
-7. **These fields live on both URL and byte-upload request builders.** Pick the builder that matches your input source.
+1. **No separate “audio intelligence client”.** Everything hangs off Listen V1 request builders.
+2. **Most intelligence fields are REST-only.** WebSocket connect options do not expose `summarize`, `topics`, `intents`, or `detectLanguage`.
+3. **`summarize` on Listen V1 has its own generated type.** Do not assume the Read API shape is identical.
+4. **`redact` is a single `String` field** on the REST builders, not a list as in the Python SDK.
+5. **Use the `NOVA3` model** unless you have verified another model supports the overlays you need.
+6. **Both URL and byte-upload builders expose intelligence fields.** Pick the builder that matches your input source.
 
 ## Example files in this repo
 
@@ -126,10 +109,4 @@ In this Java checkout, the WebSocket connect options include `diarize`, `detectE
 
 ## Central product skills
 
-For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
-
-```bash
-npx skills add deepgram/skills
-```
-
-This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
+For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
diff --git a/.agents/skills/deepgram-java-management-api/SKILL.md b/.agents/skills/deepgram-java-management-api/SKILL.md
index b2dc313..2a71435 100644
--- a/.agents/skills/deepgram-java-management-api/SKILL.md
+++ b/.agents/skills/deepgram-java-management-api/SKILL.md
@@ -7,15 +7,9 @@ description: Use when writing or reviewing Java code in this repo that calls Dee
 
 Administrative REST APIs for project metadata, project-scoped resources, and model discovery.
 
-## When to use this product
-
-- List or inspect projects.
-- Manage project keys, members, invites, usage, or billing.
-- Discover public or project-scoped STT/TTS models.
-
 **Use a different skill when:**
 
-- You want to run a live agent session → `deepgram-java-voice-agent`.
-- You want speech/text inference rather than project administration → use the product skills for STT, TTS, or Read.
+- Live agent session → `deepgram-java-voice-agent`.
+- Speech/text inference → use the STT, TTS, or Read product skills.
 
 ## Authentication
@@ -49,8 +43,6 @@ for (ListProjectsV1ResponseProjectsItem project : projects) {
 
 ## Quick start — project models / keys
 
-Pick a project from the list above. New accounts may have zero projects — guard against that before indexing.
-
 ```java
 if (projects.isEmpty()) {
     throw new IllegalStateException("No Deepgram projects are visible to this API key.");
@@ -63,21 +55,38 @@ client.manage().v1().projects().members().list(projectId);
 client.manage().v1().projects().members().invites().list(projectId);
 client.manage().v1().projects().usage().get(projectId);
 client.manage().v1().projects().billing().balances().list(projectId);
+
 ```
 
-## Key parameters / API surface
+## Destructive operations — validate-then-act
 
-- Top-level public models: `client.manage().v1().models().list()` and `.get(modelId)`
-- Projects: `projects().list()`, `get(projectId)`, `update(projectId, ...)`, `delete(projectId)`, `leave(projectId)`
-- Keys: `projects().keys().list/create/get/delete`
-- Members: `projects().members().list/delete`
-- Invites: `projects().members().invites().list/create/delete`
-- Project models: `projects().models().list(projectId)`
-- Usage: `projects().usage().get(projectId)`
-- Billing: `projects().billing().balances().list(projectId)`
-- Requests: `projects().requests()` subtree exists in the generated API surface
-- Agent think-model discovery: `client.agent().v1().settings().think().models().list()`
-- Most clients expose `withRawResponse()` alongside typed methods
+```java
+// 1. Verify the key exists before deleting
+try {
+    var key = client.manage().v1().projects().keys().get(projectId, keyId);
+    // 2. Confirm identity before proceeding
+    System.out.printf("Deleting key: %s%n", key.getApiKeyId());
+    client.manage().v1().projects().keys().delete(projectId, keyId);
+} catch (Exception e) {
+    System.err.println("Key not found or delete failed: " + e.getMessage());
+}
+```
+
+## API surface (all under `client.manage().v1()`)
+
+| Resource | Methods |
+|----------|---------|
+| `models()` | `list()`, `get(modelId)` |
+| `projects()` | `list()`, `get`, `update`, `delete`, `leave` |
+| `projects().keys()` | `list`, `create`, `get`, `delete` |
+| `projects().members()` | `list`, `delete` |
+| `projects().members().invites()` | `list`, `create`, `delete` |
+| `projects().models()` | `list(projectId)` |
+| `projects().usage()` | `get(projectId)` |
+| `projects().billing().balances()` | `list(projectId)` |
+| `projects().requests()` | subtree in generated surface |
+
+Also: `client.agent().v1().settings().think().models().list()` for think-model discovery. Most clients expose `withRawResponse()` variants.
 
 ## API reference (layered)
 
@@ -92,12 +101,10 @@ client.manage().v1().projects().billing().balances().list(projectId);
 
 ## Gotchas
 
-1. **Use an API key, not a temporary JWT, for Manage APIs.** The token-grant endpoint explicitly says those JWTs do not work here.
-2. **Some example files are intentionally excluded from Gradle `compileExamples`.** `manage/ListModels.java`, `manage/MemberPermissions.java`, and `manage/UsageBreakdown.java` are currently excluded in `build.gradle`.
-3. **Many manage examples are read-only by default.** Create/delete snippets are commented out to avoid destructive calls.
-4. **Project-scoped model discovery and global model discovery are different.** `models().list()` returns public models; `projects().models().list(projectId)` returns what a project can use.
-5. **This checkout does not expose the Python-style persisted voice-agent configuration client.** Do not promise `voice_agent.configurations.*` here.
-6. **The SDK is highly nested.** For invites, the path is `projects().members().invites()`, not a top-level `invites()` client.
+1. **Some example files are excluded from Gradle `compileExamples`** (`ListModels.java`, `MemberPermissions.java`, `UsageBreakdown.java`).
+2. **Global and project-scoped model discovery differ.** `models().list()` returns public models; `projects().models().list(projectId)` returns what a project can use.
+3. **No Python-style persisted voice-agent configuration client** in this checkout. Do not promise `voice_agent.configurations.*`.
+4. **The SDK is highly nested.** For invites: `projects().members().invites()`, not a top-level `invites()` client.
 
 ## Example files in this repo
 
@@ -112,10 +119,4 @@ client.manage().v1().projects().billing().balances().list(projectId);
 
 ## Central product skills
 
-For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
-
-```bash
-npx skills add deepgram/skills
-```
-
-This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
+For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
diff --git a/.agents/skills/deepgram-java-speech-to-text/SKILL.md b/.agents/skills/deepgram-java-speech-to-text/SKILL.md
index ecc25c1..6a74fb3 100644
--- a/.agents/skills/deepgram-java-speech-to-text/SKILL.md
+++ b/.agents/skills/deepgram-java-speech-to-text/SKILL.md
@@ -7,15 +7,10 @@ description: Use when writing or reviewing Java code in this repo that calls Dee
 
 Basic transcription for prerecorded audio over REST or live audio over WebSocket via `/v1/listen`.
 
-## When to use this product
-
-- **REST (`media().transcribeUrl` / `transcribeFile`)** — one-shot transcription of a complete URL or byte array.
-- **WebSocket (`v1WebSocket()`)** — live streaming transcription with interim/final results.
-
 **Use a different skill when:**
 
-- You want summaries, sentiment, topics, intents, diarization, redaction, or language detection overlays on the same endpoint → `deepgram-java-audio-intelligence`.
-- You need turn-aware conversational streaming on `/v2/listen` → `deepgram-java-conversational-stt`.
-- You need a full interactive assistant with TTS + LLM orchestration → `deepgram-java-voice-agent`.
+- Summaries, sentiment, topics, diarization, or redaction overlays → `deepgram-java-audio-intelligence`.
+- Turn-aware conversational streaming (`/v2/listen`) → `deepgram-java-conversational-stt`.
+- Full interactive assistant with TTS + LLM → `deepgram-java-voice-agent`.
 
 ## Authentication
@@ -62,25 +57,12 @@ MediaTranscribeResponse result = client.listen().v1().media().transcribeUrl(requ
 result.visit(new MediaTranscribeResponse.Visitor() {
     @Override
     public Void visit(ListenV1Response response) {
-        // Guard channels + alternatives against empty results (matches examples/listen/TranscribeUrl.java).
-        String transcript = "";
-        java.util.List channels = response.getResults().getChannels();
-        if (channels != null && !channels.isEmpty()) {
-            java.util.List alternatives = response.getResults()
-                .getChannels().get(0)
-                .getAlternatives().orElse(java.util.Collections.emptyList());
-            if (!alternatives.isEmpty()) {
-                transcript = response.getResults()
-                    .getChannels().get(0)
-                    .getAlternatives().orElse(java.util.Collections.emptyList())
-                    .get(0)
-                    .getTranscript().orElse("");
-            }
-        }
+        String transcript = response.getResults().getChannels().get(0)
+            .getAlternatives().orElse(java.util.Collections.emptyList())
+            .get(0).getTranscript().orElse("");
         System.out.println(transcript);
         return null;
     }
-
     @Override
     public Void visit(com.deepgram.types.ListenV1AcceptedResponse accepted) {
         System.out.println("Request accepted: " + accepted.getRequestId());
@@ -129,9 +111,14 @@ wsClient.onResults(result -> {
         System.out.printf("%s %s%n", isFinal ? "[final]" : "[interim]", transcript);
     }
 });
+wsClient.onError(err -> System.err.println("WebSocket error: " + err.getMessage()));
 
-wsClient.connect(V1ConnectOptions.builder().model(ListenV1Model.NOVA3).build())
-    .get(10, TimeUnit.SECONDS);
+try {
+    wsClient.connect(V1ConnectOptions.builder().model(ListenV1Model.NOVA3).build())
+        .get(10, TimeUnit.SECONDS);
+} catch (Exception e) {
+    throw new RuntimeException("Failed to connect STT WebSocket", e);
+}
 
 // send raw audio chunks here
 // wsClient.sendMedia(okio.ByteString.of(audioChunk));
@@ -199,10 +186,4 @@ The async REST clients return `CompletableFuture`. WebSocket clients are alre
 
 ## Central product skills
 
-For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
-
-```bash
-npx skills add deepgram/skills
-```
-
-This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
+For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
diff --git a/.agents/skills/deepgram-java-text-to-speech/SKILL.md b/.agents/skills/deepgram-java-text-to-speech/SKILL.md
index 2ca45b9..d6059aa 100644
--- a/.agents/skills/deepgram-java-text-to-speech/SKILL.md
+++ b/.agents/skills/deepgram-java-text-to-speech/SKILL.md
@@ -7,13 +7,8 @@ description: Use when writing or reviewing Java code in this repo that calls Dee
 
 Convert text to audio with REST or stream audio back incrementally over WebSocket via `/v1/speak`.
 
-## When to use this product
-
-- **REST (`audio().generate`)** — one-shot synthesis when you already have the full text.
-- **WebSocket (`v1WebSocket()`)** — lower-latency synthesis while text arrives in chunks.
-
 **Use a different skill when:**
 
-- You need the system to listen, think, and speak in one session → `deepgram-java-voice-agent`.
+- Full interactive assistant (listen + think + speak) → `deepgram-java-voice-agent`.
 
 ## Authentication
@@ -41,8 +36,9 @@ SpeakV1Request request = SpeakV1Request.builder()
     .build();
 
 InputStream audioStream = client.speak().v1().audio().generate(request);
-Files.copy(audioStream, Path.of("output.mp3"), StandardCopyOption.REPLACE_EXISTING);
+long bytes = Files.copy(audioStream, Path.of("output.mp3"), StandardCopyOption.REPLACE_EXISTING);
 audioStream.close();
+if (bytes == 0) throw new RuntimeException("TTS returned empty audio");
 ```
 
 REST returns an `InputStream`, not JSON.
@@ -132,9 +128,8 @@ CompletableFuture future = asyncClient.speak().v1().audio().generat
 2. **Flush before close on WebSocket.** The example sends `Flush` before `Close` so the tail of the audio is not lost.
 3. **Streaming audio arrives as binary `ByteString`.** Convert to bytes before writing or playback.
 4. **WebSocket options are narrower than REST.** `container` and `bitRate` are REST request fields, not WebSocket connect options in this checkout.
-5. **TTS defaults are minimal unless you set them.** The example only sets `text`; pick an explicit model/encoding when output format matters.
-6. **There is no Java `TextBuilder` helper in this repo.** That Python helper does not exist here.
-7. **Async REST is `CompletableFuture`.** You still need to close the stream after the future resolves.
+5. **TTS defaults are minimal.** Pick an explicit model/encoding when output format matters.
+6. **Async REST is `CompletableFuture`.** You still need to close the stream after the future resolves.
 
 ## Example files in this repo
 
@@ -144,10 +139,4 @@ CompletableFuture future = asyncClient.speak().v1().audio().generat
 
 ## Central product skills
 
-For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
-
-```bash
-npx skills add deepgram/skills
-```
-
-This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
+For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.
diff --git a/.agents/skills/deepgram-java-voice-agent/SKILL.md b/.agents/skills/deepgram-java-voice-agent/SKILL.md
index 3c957b5..39de249 100644
--- a/.agents/skills/deepgram-java-voice-agent/SKILL.md
+++ b/.agents/skills/deepgram-java-voice-agent/SKILL.md
@@ -7,16 +7,10 @@ description: Use when writing or reviewing Java code in this repo that builds an
 
 Run a full-duplex voice agent over a single WebSocket: user audio in, agent events + audio out.
 
-## When to use this product
-
-- You want a live conversational agent.
-- You need STT + think-provider + TTS orchestration in one session.
-- You may need message injection, prompt updates, or function-call handling.
-
 **Use a different skill when:**
 
-- You only need transcription → `deepgram-java-speech-to-text` or `deepgram-java-conversational-stt`.
-- You only need speech synthesis → `deepgram-java-text-to-speech`.
-- You only need project/admin endpoints → `deepgram-java-management-api`.
+- Transcription only → `deepgram-java-speech-to-text` or `deepgram-java-conversational-stt`.
+- Speech synthesis only → `deepgram-java-text-to-speech`.
+- Project/admin endpoints → `deepgram-java-management-api`.
 
 ## Authentication
@@ -32,13 +26,11 @@ The agent WebSocket uses the SDK's `agent` environment URL and the same auth hea
 
 ## Quick start
 
+Workflow: 1) create the client, 2) register handlers (including `onWelcome`), 3) connect, 4) when `onWelcome` fires, call `sendSettings`, 5) verify `onSettingsApplied`, 6) stream audio via `sendMedia`.
+
 ```java
-import com.deepgram.resources.agent.v1.types.AgentV1Settings;
-import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgent;
-import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgentThink;
-import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgentThinkOneItem;
-import com.deepgram.resources.agent.v1.types.AgentV1SettingsAgentThinkOneItemProvider;
-import com.deepgram.resources.agent.v1.types.AgentV1SettingsAudio;
+// imports from com.deepgram.resources.agent.v1.types.* and com.deepgram.types.*
+import com.deepgram.resources.agent.v1.types.*;
 import com.deepgram.resources.agent.v1.websocket.V1WebSocketClient;
 import com.deepgram.types.OpenAiThinkProvider;
 import java.util.List;
@@ -66,8 +58,14 @@ wsClient.onWelcome(welcome -> {
 wsClient.onConversationText(text -> System.out.printf("[%s] %s%n", text.getRole(), text.getContent()));
 wsClient.onAgentStartedSpeaking(event -> System.out.println(">> Agent started speaking"));
 wsClient.onAgentV1Audio(audioData -> System.out.printf("Received %d bytes%n", audioData.size()));
-
-wsClient.connect().get(10, java.util.concurrent.TimeUnit.SECONDS);
+wsClient.onErrorMessage(err -> System.err.println("Agent error: " + err));
+wsClient.onWarning(warn -> System.err.println("Agent warning: " + warn));
+
+try {
+    wsClient.connect().get(10, java.util.concurrent.TimeUnit.SECONDS);
+} catch (Exception e) {
+    throw new RuntimeException("Failed to connect to voice agent", e);
+}
 ```
 
 ## Message injection / control
@@ -109,7 +107,7 @@ wsClient.sendInjectAgentMessage(com.deepgram.resources.agent.v1.types.AgentV1Inj
 2. **Send settings first.** The repo examples wait for `onWelcome(...)` and immediately call `sendSettings(...)`.
 3. **Audio is binary `ByteString`.** Playback/output is your responsibility.
 4. **`sendMedia(...)` is raw audio bytes.** Match whatever audio settings you configured.
-5. **Use the provider wrapper/union types rather than raw JSON.** Constructors like `OpenAiThinkProvider.of(...)`, `AnthropicThinkProvider.of(...)`, `GoogleThinkProvider.of(...)` package the provider into the think/listen/speak union the SDK expects. The underlying payload is still an `Object` (so provider-field mistakes won't be caught at compile time), but the wrappers keep routing correct and ensure you pick the right variant of the sealed union.
+5. **Use provider wrapper types** (`OpenAiThinkProvider.of(...)`, `AnthropicThinkProvider.of(...)`, `GoogleThinkProvider.of(...)`) rather than raw JSON. The underlying payload is `Object`, so provider-field mistakes are not caught at compile time.
 6. **There is no persisted agent-configuration management client shown in this checkout.** This repo exposes live agent runtime plus think-model discovery.
 7. **Closing is connection-level.** The examples call `disconnect()`; there is no separate close-message flow like Speak/Listen.
@@ -122,10 +120,4 @@ wsClient.sendInjectAgentMessage(com.deepgram.resources.agent.v1.types.AgentV1Inj
 
 ## Central product skills
 
-For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
-
-```bash
-npx skills add deepgram/skills
-```
-
-This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).
+For cross-language Deepgram product knowledge, run `npx skills add deepgram/skills`.