- New TTS provider using OpenAI's API for text-to-speech synthesis
- Caches synthesized audio files to disk
- Requires openaiKey setting in settings.json
- Supports language detection and voice customization
Pull request overview
This PR adds OpenAI as a new text-to-speech (TTS) provider, alongside the existing Google, macOS say, ElevenLabs, AWS Polly, VoiceRSS, and Microsoft providers. The new provider calls the OpenAI /v1/audio/speech API, caches synthesized MP3 files to disk, and requires an openaiKey in settings.json.
Changes:
- New lib/tts-providers/openai.js implementing the OpenAI TTS provider with disk caching
- Updated README.md with setup instructions and usage examples for the OpenAI provider
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| lib/tts-providers/openai.js | Core provider implementation: hashes phrases for cache keys, downloads from OpenAI API, resolves file duration |
| README.md | Documents the OpenAI provider, its required API key, and example usage |
| 'use strict'; | ||
| const crypto = require('crypto'); | ||
| const fs = require('fs'); | ||
| const http = require('http'); |
The http module is imported on this line but is never used anywhere in openai.js. Only https is used. This unused import should be removed to avoid confusion.
| const http = require('http'); |
| headers: { | ||
| 'Authorization': `Bearer ${settings.openaiKey}`, | ||
| 'Content-Type': 'application/json', | ||
| 'Content-Length': postData.length |
Content-Length is set using postData.length, which returns the number of JavaScript characters (UTF-16 code units), not the byte length of the UTF-8 encoded string. When phrase contains multi-byte characters (e.g., accented characters, CJK characters, emoji), postData.length will under-count the actual byte size sent on the wire. This should use Buffer.byteLength(postData) to get the correct byte count.
| 'Content-Length': postData.length | |
| 'Content-Length': Buffer.byteLength(postData) |
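To illustrate the difference this comment describes, here is a minimal sketch that compares the character count with the UTF-8 byte count of a JSON request body. The sample phrase and the variable names are assumptions chosen for illustration, not taken from the PR.

```javascript
// Illustrative only: a sample body containing multi-byte characters.
const postData = JSON.stringify({ input: 'Café ☕' });

// Number of UTF-16 code units in the string.
const charCount = postData.length;

// Number of bytes when the string is encoded as UTF-8 (the default encoding).
const byteCount = Buffer.byteLength(postData);

console.log(`characters: ${charCount}, bytes: ${byteCount}`);
```

Here 'é' takes two bytes and '☕' takes three in UTF-8, so the byte count exceeds the character count, and a Content-Length based on `postData.length` would be too small.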
| const settings = require('../../settings'); | ||
| const logger = require('sonos-discovery/lib/helpers/logger'); | ||
|
|
||
| function openai(phrase, language, voice = 'alloy', model = 'tts-1') { |
The voice and model parameters have default values ('alloy' and 'tts-1'), but there is no way for users to override them. The provider is called by try-download-tts.js with only (phrase, language), so these parameters will always use their defaults. Other providers read override values from settings (e.g., settings.aws.name for AWS Polly). Consider reading voice and model overrides from settings (e.g., settings.openaiVoice, settings.openaiModel) so users can customize them without code changes, and document this in the README.
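One way this could look is sketched below. The helper name `resolveOpenAiOptions` is hypothetical, and the keys `openaiVoice` and `openaiModel` are only the example names suggested in this comment, not settings that exist in the project today.

```javascript
// Hypothetical helper: pick voice and model from a settings object,
// falling back to the defaults currently hard-coded in the PR.
function resolveOpenAiOptions(settings = {}) {
  return {
    voice: settings.openaiVoice || 'alloy', // assumed settings key
    model: settings.openaiModel || 'tts-1', // assumed settings key
  };
}

// With no overrides, the PR's defaults apply.
console.log(resolveOpenAiOptions({}));

// With an override for the voice only.
console.log(resolveOpenAiOptions({ openaiVoice: 'nova' }));
```

Resolving the options in one place would also make it straightforward to document both keys in the README next to openaiKey.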
| resolve(expectedUri); | ||
| }); | ||
| } else { | ||
| reject(new Error(`Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}`)); |
When the response status code is not in the 2xx range, the response body is never consumed (neither read nor destroyed). This can cause the underlying TCP socket to remain open and stall, potentially preventing further requests. The response stream should be consumed and drained (e.g., by calling res.resume()) before rejecting the promise, and ideally the error body from OpenAI should be included in the rejection error for better diagnostics.
| reject(new Error(`Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}`)); | |
| const chunks = []; | |
| res.on('data', (chunk) => { | |
| chunks.push(chunk); | |
| }); | |
| res.on('end', () => { | |
| const body = chunks.length ? Buffer.concat(chunks).toString('utf8') : ''; | |
| const message = `Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}` + | |
| (body ? `, body: ${body}` : ''); | |
| reject(new Error(message)); | |
| }); | |
| res.on('error', (err) => { | |
| // Ensure the stream is not left hanging on error | |
| res.resume(); | |
| reject(err); | |
| }); |
|
|
||
| // Construct a filesystem neutral filename | ||
| const phraseHash = crypto.createHash('sha1').update(phrase).digest('hex'); | ||
| const filename = `openai-${phraseHash}-${language}.mp3`; |
The cache filename includes language (e.g., openai-${phraseHash}-${language}.mp3), but as documented in the README, the language code has no effect on the OpenAI TTS output — OpenAI determines the language from the text content itself. Including language in the cache key means the same phrase will be cached multiple times when called with different language codes (or without a language code vs. with 'en'), wasting disk space and defeating the cache. The filename should only include the phrase hash, voice, and model, as those are the actual parameters that affect the output.
| const filename = `openai-${phraseHash}-${language}.mp3`; | |
| const filename = `openai-${phraseHash}-${voice}-${model}.mp3`; |
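As a rough sketch of the suggested cache key, the function below builds the filename from the phrase hash, voice, and model only. The function name `cacheFilename` is an assumption for illustration; the hashing mirrors the sha1 call shown in the diff.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a cache filename from the parameters that
// actually affect the synthesized audio (phrase, voice, model).
function cacheFilename(phrase, voice = 'alloy', model = 'tts-1') {
  const phraseHash = crypto.createHash('sha1').update(phrase).digest('hex');
  return `openai-${phraseHash}-${voice}-${model}.mp3`;
}

// The same phrase maps to one file regardless of any language code,
// while a different voice produces a different file.
console.log(cacheFilename('Hello world'));
console.log(cacheFilename('Hello world', 'nova'));
```

Voice and model names containing characters that are unsafe in filenames would need sanitizing; this sketch assumes simple names such as 'alloy' and 'tts-1'.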
| const settings = require('../../settings'); | ||
| const logger = require('sonos-discovery/lib/helpers/logger'); | ||
|
|
||
| function openai(phrase, language, voice = 'alloy', model = 'tts-1') { |
The openai function is missing a guard at the top to check whether settings.openaiKey is configured. Every other TTS provider (e.g., aws-polly.js, voicerss.js, elevenlabs.js) returns Promise.resolve() immediately if its required settings key is absent, allowing the provider chain in try-download-tts.js to fall through to the next provider. Without this guard, the openai function will always attempt to call the OpenAI API, regardless of whether an API key has been configured, resulting in failed requests with an authorization error for every TTS invocation when no key is set.
| function openai(phrase, language, voice = 'alloy', model = 'tts-1') { | |
| function openai(phrase, language, voice = 'alloy', model = 'tts-1') { | |
| if (!settings.openaiKey) { | |
| logger.warn('OpenAI TTS disabled: settings.openaiKey is not configured'); | |
| return Promise.resolve(); | |
| } |
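To show why resolving with no value matters, here is a simplified sketch of a provider chain that falls through to the next provider. This is an assumption about the general pattern only; the real try-download-tts.js may be structured differently, and `firstAvailable` and the stub providers are hypothetical names.

```javascript
// Hypothetical chain: try each provider in order and return the first URI.
async function firstAvailable(providers, phrase, language) {
  for (const provider of providers) {
    const uri = await provider(phrase, language);
    if (uri) {
      return uri; // this provider handled the request
    }
    // A provider that resolved with undefined is skipped.
  }
  return undefined;
}

// Stub providers for illustration: the first is unconfigured, the second works.
const unconfigured = () => Promise.resolve();
const working = (phrase) => Promise.resolve(`/tts/${phrase}.mp3`);

firstAvailable([unconfigured, working], 'hello', 'en').then((uri) => {
  console.log(uri);
});
```

Under this pattern, a provider that calls the API without a key and rejects would break the chain with an error instead of letting the next provider run.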
Incorporates PR jishi#902 from jishi/node-sonos-http-api.
Adds OpenAI as a text-to-speech provider with support for language detection and voice customization. The provider caches synthesized audio files to disk and requires an openaiKey setting in settings.json. Includes comprehensive documentation for setup and usage.