Add OpenAI TTS provider #3

Merged
tkhduracell merged 1 commit into master from clip-zones-fix on Mar 5, 2026

Conversation

@tkhduracell
Owner

Incorporates PR jishi#902 from jishi/node-sonos-http-api.

Adds OpenAI as a text-to-speech provider with support for language detection and voice customization. The provider caches synthesized audio files to disk and requires an openaiKey setting in settings.json. Includes comprehensive documentation for setup and usage.

- New TTS provider using OpenAI's API for text-to-speech synthesis
- Caches synthesized audio files to disk
- Requires openaiKey setting in settings.json
- Supports language detection and voice customization
Copilot AI review requested due to automatic review settings March 5, 2026 15:11
@tkhduracell tkhduracell merged commit 987c2e1 into master Mar 5, 2026
4 checks passed

Copilot AI left a comment

Pull request overview

This PR adds OpenAI as a new text-to-speech (TTS) provider, alongside the existing Google, macOS say, ElevenLabs, AWS Polly, VoiceRSS, and Microsoft providers. The new provider calls the OpenAI /v1/audio/speech API, caches synthesized MP3 files to disk, and requires an openaiKey in settings.json.

Changes:

  • New lib/tts-providers/openai.js implementing the OpenAI TTS provider with disk-caching
  • Updated README.md with setup instructions and usage examples for the OpenAI provider

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| lib/tts-providers/openai.js | Core provider implementation: hashes phrases for cache keys, downloads from OpenAI API, resolves file duration |
| README.md | Documents the OpenAI provider, its required API key, and example usage |
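
As a rough illustration of the request the overview describes, the sketch below builds (but does not send) a POST to the /v1/audio/speech endpoint. The helper name `buildSpeechRequest` is hypothetical and not taken from the PR; the header layout follows the diff context quoted in the review, and the defaults mirror the `openai` function signature shown there.

```javascript
// Sketch only: builds the request for OpenAI's speech endpoint without sending it.
// buildSpeechRequest is a hypothetical helper name, not part of the PR.
function buildSpeechRequest(phrase, apiKey, voice = 'alloy', model = 'tts-1') {
  // The endpoint takes a JSON body with the model, the input text, and the voice.
  const postData = JSON.stringify({ model, input: phrase, voice });
  const options = {
    hostname: 'api.openai.com',
    path: '/v1/audio/speech',
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
      // Byte length of the UTF-8 body, not the character count.
      'Content-Length': Buffer.byteLength(postData)
    }
  };
  return { options, postData };
}
```

The result could then be passed to `https.request(options, callback)`, followed by `req.write(postData)` and `req.end()`.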


'use strict';
const crypto = require('crypto');
const fs = require('fs');
const http = require('http');

Copilot AI Mar 5, 2026

The http module is imported on this line but is never used anywhere in openai.js. Only https is used. This unused import should be removed to avoid confusion.

Suggested change
- const http = require('http');

Copilot uses AI. Check for mistakes.
headers: {
'Authorization': `Bearer ${settings.openaiKey}`,
'Content-Type': 'application/json',
'Content-Length': postData.length

Copilot AI Mar 5, 2026

Content-Length is set using postData.length, which returns the number of JavaScript characters (UTF-16 code units), not the byte length of the UTF-8 encoded string. When phrase contains multi-byte characters (e.g., accented characters, CJK characters, emoji), postData.length will under-count the actual byte size sent on the wire. This should use Buffer.byteLength(postData) to get the correct byte count.

Suggested change
- 'Content-Length': postData.length
+ 'Content-Length': Buffer.byteLength(postData)
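
To make the difference concrete, here is a small sketch comparing `String.prototype.length` with `Buffer.byteLength` for ASCII, accented, and emoji text. The helper name `lengths` and the sample strings are illustrative only.

```javascript
// Compare UTF-16 code units (.length) with UTF-8 bytes (Buffer.byteLength).
function lengths(text) {
  return { codeUnits: text.length, utf8Bytes: Buffer.byteLength(text, 'utf8') };
}

const ascii = lengths('hello');          // ASCII: one byte per character
const accented = lengths('h\u00e9llo');  // U+00E9 takes two bytes in UTF-8
const emoji = lengths('hi \u{1F600}');   // U+1F600: two code units, four bytes

console.log(ascii, accented, emoji);
```

For pure ASCII input the two values agree, which is why the bug would only show up for phrases containing non-ASCII characters.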

const settings = require('../../settings');
const logger = require('sonos-discovery/lib/helpers/logger');

function openai(phrase, language, voice = 'alloy', model = 'tts-1') {

Copilot AI Mar 5, 2026

The voice and model parameters have default values ('alloy' and 'tts-1'), but there is no way for users to override them. The provider is called by try-download-tts.js with only (phrase, language), so these parameters will always use their defaults. Other providers read override values from settings (e.g., settings.aws.name for AWS Polly). Consider reading voice and model overrides from settings (e.g., settings.openaiVoice, settings.openaiModel) so users can customize them without code changes, and document this in the README.
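
One possible shape for such an override, as a minimal sketch: explicit arguments win, then settings, then the existing defaults. The keys `settings.openaiVoice` and `settings.openaiModel` are only the names floated in this comment, not settings that exist in the project, and `resolveVoiceOptions` is a hypothetical helper.

```javascript
// Sketch: resolve voice and model from arguments, then settings, then defaults.
// openaiVoice / openaiModel are assumed setting names, not existing ones.
function resolveVoiceOptions(settings, voice, model) {
  return {
    voice: voice || settings.openaiVoice || 'alloy',
    model: model || settings.openaiModel || 'tts-1'
  };
}
```

With this approach the function signature would drop its default parameter values, since the defaults move into the helper.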

resolve(expectedUri);
});
} else {
reject(new Error(`Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}`));

Copilot AI Mar 5, 2026

When the response status code is not in the 2xx range, the response body is never consumed (neither read nor destroyed). This can cause the underlying TCP socket to remain open and stall, potentially preventing further requests. The response stream should be consumed and drained (e.g., by calling res.resume()) before rejecting the promise, and ideally the error body from OpenAI should be included in the rejection error for better diagnostics.

Suggested change
- reject(new Error(`Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}`));
+ const chunks = [];
+ res.on('data', (chunk) => {
+   chunks.push(chunk);
+ });
+ res.on('end', () => {
+   const body = chunks.length ? Buffer.concat(chunks).toString('utf8') : '';
+   const message = `Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}` +
+     (body ? `, body: ${body}` : '');
+   reject(new Error(message));
+ });
+ res.on('error', (err) => {
+   // Ensure the stream is not left hanging on error
+   res.resume();
+   reject(err);
+ });
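
The same idea can be factored into a small helper, sketched here under the assumption that `res` behaves like a Node.js `http.IncomingMessage` (an event emitter with `data`, `end`, and `error` events). The name `rejectWithBody` is hypothetical. Attaching a `data` listener puts the stream into flowing mode, so the body is consumed without an explicit `res.resume()`.

```javascript
// Sketch: consume a non-2xx response body and reject with it attached.
// rejectWithBody is a hypothetical helper, not part of the PR.
function rejectWithBody(res, reject) {
  const chunks = [];
  // Listening for 'data' switches the stream to flowing mode and drains it.
  res.on('data', (chunk) => chunks.push(chunk));
  res.on('end', () => {
    const body = Buffer.concat(chunks).toString('utf8');
    const suffix = body ? `, body: ${body}` : '';
    reject(new Error(
      `Download from OpenAI TTS failed with status ${res.statusCode}, ${res.statusMessage}${suffix}`
    ));
  });
  // Surface stream-level failures as well.
  res.on('error', reject);
}
```

Inside the provider's response callback, the `else` branch would then reduce to a single call such as `rejectWithBody(res, reject)`.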

// Construct a filesystem neutral filename
const phraseHash = crypto.createHash('sha1').update(phrase).digest('hex');
const filename = `openai-${phraseHash}-${language}.mp3`;

Copilot AI Mar 5, 2026

The cache filename includes language (e.g., openai-${phraseHash}-${language}.mp3), but as documented in the README, the language code has no effect on the OpenAI TTS output — OpenAI determines the language from the text content itself. Including language in the cache key means the same phrase will be cached multiple times when called with different language codes (or without a language code vs. with 'en'), wasting disk space and defeating the cache. The filename should only include the phrase hash, voice, and model, as those are the actual parameters that affect the output.

Suggested change
- const filename = `openai-${phraseHash}-${language}.mp3`;
+ const filename = `openai-${phraseHash}-${voice}-${model}.mp3`;
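
A self-contained sketch of the suggested cache key, keyed on phrase hash, voice, and model. The helper name `cacheFilename` is illustrative; the hashing call matches the SHA-1 hex digest used in the diff context.

```javascript
const crypto = require('crypto');

// Sketch: derive the cache filename from the parameters that affect the audio.
// cacheFilename is a hypothetical helper name.
function cacheFilename(phrase, voice = 'alloy', model = 'tts-1') {
  // SHA-1 hex digest keeps the name filesystem-neutral regardless of phrase content.
  const phraseHash = crypto.createHash('sha1').update(phrase).digest('hex');
  return `openai-${phraseHash}-${voice}-${model}.mp3`;
}
```

The same phrase with the same voice and model always maps to one file, while changing the voice or model produces a separate cache entry.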

const settings = require('../../settings');
const logger = require('sonos-discovery/lib/helpers/logger');

function openai(phrase, language, voice = 'alloy', model = 'tts-1') {

Copilot AI Mar 5, 2026

The openai function is missing a guard at the top to check whether settings.openaiKey is configured. Every other TTS provider (e.g., aws-polly.js, voicerss.js, elevenlabs.js) returns Promise.resolve() immediately if its required settings key is absent, allowing the provider chain in try-download-tts.js to fall through to the next provider. Without this guard, the openai function will always attempt to call the OpenAI API, regardless of whether an API key has been configured, resulting in failed requests with an authorization error for every TTS invocation when no key is set.

Suggested change
- function openai(phrase, language, voice = 'alloy', model = 'tts-1') {
+ function openai(phrase, language, voice = 'alloy', model = 'tts-1') {
+   if (!settings.openaiKey) {
+     logger.warn('OpenAI TTS disabled: settings.openaiKey is not configured');
+     return Promise.resolve();
+   }
