
Text-to-Speech

ModelGates supports text-to-speech (TTS) via a dedicated /api/v1/audio/speech endpoint that is compatible with the OpenAI Audio Speech API. Send text and receive a raw audio byte stream in your chosen format.

Model Discovery

You can find TTS models in several ways:

Via the API

Use the output_modalities query parameter on the Models API to discover TTS models:

bash

# List only TTS models
curl "https://modelgates.ai/api/v1/models?output_modalities=speech"
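The same query can be issued from Python. This is a minimal sketch using only the standard library. It assumes the Models API returns JSON with a top-level `data` list whose entries carry an `id` field; that response shape is not documented on this page, and the helper names are hypothetical. Add an `Authorization` header if the endpoint requires one in your setup.

```python
import json
import urllib.parse
import urllib.request

MODELS_URL = "https://modelgates.ai/api/v1/models"


def build_models_url(output_modality: str = "speech") -> str:
    """Return the Models API URL filtered by output modality."""
    query = urllib.parse.urlencode({"output_modalities": output_modality})
    return f"{MODELS_URL}?{query}"


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model slugs out of a Models API response.

    Assumption: the payload has a top-level "data" list of objects
    that each carry an "id" key. Entries without "id" are skipped.
    """
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_tts_models() -> list[str]:
    """Fetch the slugs of models that list speech as an output modality."""
    with urllib.request.urlopen(build_models_url()) as resp:
        return extract_model_ids(json.load(resp))
```

Calling `list_tts_models()` performs the network request; the two helpers above it are pure functions and can be checked offline.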

On the Models Page

Visit the Models page and filter by output modalities to find models capable of speech synthesis. Look for models that list "speech" in their output modalities.

API Usage

Send a POST request to /api/v1/audio/speech with the text you want to synthesize. The response is a raw audio byte stream — not JSON — so you can pipe it directly to a file or audio player.

Basic Example

typescript

import { ModelGates } from '@modelgates/sdk';
import fs from 'fs';

const modelgates = new ModelGates({
  apiKey: '<MODELGATES_API_KEY>',
});

const stream = await modelgates.tts.createSpeech({
  model: '<MODEL_SLUG>', // e.g. 'openai/gpt-4o-mini-tts-2025-12-15'
  input: 'Hello! This is a text-to-speech test.',
  voice: 'alloy',
  responseFormat: 'mp3',
});

// Collect the audio stream and save to a file
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

// Join the chunks into one contiguous buffer
const totalLength = chunks.reduce((sum, c) => sum + c.length, 0);
const buffer = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
  buffer.set(chunk, offset);
  offset += chunk.length;
}

await fs.promises.writeFile('output.mp3', buffer);
console.log('Audio saved to output.mp3');

python

from openai import OpenAI

client = OpenAI(
    base_url="https://modelgates.ai/api/v1",
    api_key="<MODELGATES_API_KEY>",
)

# Stream the audio straight to disk as it arrives
with client.audio.speech.with_streaming_response.create(
    model="<MODEL_SLUG>",  # e.g. "openai/gpt-4o-mini-tts-2025-12-15"
    input="Hello! This is a text-to-speech test.",
    voice="alloy",
    response_format="mp3",
) as response:
    response.stream_to_file("output.mp3")

python

import requests

API_KEY = "<MODELGATES_API_KEY>"

response = requests.post(
    url="https://modelgates.ai/api/v1/audio/speech",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "<MODEL_SLUG>",  # e.g. "openai/gpt-4o-mini-tts-2025-12-15"
        "input": "Hello! This is a text-to-speech test.",
        "voice": "alloy",
        "response_format": "mp3",
    },
)
# Error responses carry a JSON body, not audio, so fail before writing the file
response.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(response.content)

generation_id = response.headers.get("X-Generation-Id")
print(f"Audio saved. Generation ID: {generation_id}")

Request Parameters

Parameter	Type	Required	Description
`model`	string	Yes	The TTS model to use (e.g., `openai/gpt-4o-mini-tts-2025-12-15`, `mistralai/voxtral-mini-tts-2603`)
`input`	string	Yes	The text to synthesize into speech
`voice`	string	Yes	Voice identifier. Available voices vary by model; see each model's entry on the Models page for supported voices
`response_format`	string	No	Audio output format: `mp3` or `pcm`. Defaults to `pcm`
`speed`	number	No	Playback speed multiplier. Only used by models that support it (e.g., OpenAI TTS). Ignored by other providers. Defaults to `1.0`
`provider`	object	No	Provider-specific passthrough configuration
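The table above can be turned into a small request-body builder that rejects missing required fields before anything is sent. This is only a sketch: `build_speech_request` and `ALLOWED_FORMATS` are hypothetical names, and the validation mirrors just what the table states (three required strings, `mp3` or `pcm` as formats).

```python
from typing import Any, Optional

# Formats listed in the parameter table above.
ALLOWED_FORMATS = {"mp3", "pcm"}


def build_speech_request(
    model: str,
    text: str,
    voice: str,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /api/v1/audio/speech."""
    # model, input and voice are marked as required in the table.
    for name, value in (("model", model), ("input", text), ("voice", voice)):
        if not value:
            raise ValueError(f"{name!r} is required")

    body: dict[str, Any] = {"model": model, "input": text, "voice": voice}

    # Optional fields are only sent when set, so server defaults apply otherwise.
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(
                f"response_format must be one of {sorted(ALLOWED_FORMATS)}"
            )
        body["response_format"] = response_format
    if speed is not None:
        body["speed"] = speed
    return body
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the basic example.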

Provider-Specific Options

You can pass provider-specific options using the provider parameter. Options are keyed by provider slug, and only the options for the matched provider are forwarded:

json

{
  "model": "openai/gpt-4o-mini-tts-2025-12-15",
  "input": "Hello world",
  "voice": "alloy",
  "provider": {
    "options": {
      "openai": {
        "instructions": "Speak in a warm, friendly tone."
      }
    }
  }
}
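When request bodies are assembled in code, the nesting under `provider.options.<slug>` is easy to get wrong. The following sketch shows one way to attach options for a given provider slug without mutating the original body; `with_provider_options` is a hypothetical helper, not part of any SDK.

```python
import copy
from typing import Any


def with_provider_options(
    body: dict[str, Any], provider_slug: str, options: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of body with options nested under provider.options.<slug>."""
    result = copy.deepcopy(body)  # leave the caller's dict untouched
    per_provider = result.setdefault("provider", {}).setdefault("options", {})
    per_provider.setdefault(provider_slug, {}).update(options)
    return result


base = {
    "model": "openai/gpt-4o-mini-tts-2025-12-15",
    "input": "Hello world",
    "voice": "alloy",
}
request_body = with_provider_options(
    base, "openai", {"instructions": "Speak in a warm, friendly tone."}
)
```

Here `request_body` has the same structure as the JSON example above, and `base` still has no `provider` key.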

Response Format

The TTS endpoint returns a raw audio byte stream, not JSON. The response includes the following headers:

Header	Description
`Content-Type`	The MIME type of the audio. `audio/mpeg` for `mp3` format, `audio/pcm` for `pcm` format
`X-Generation-Id`	The unique generation ID for the request, useful for tracking and debugging

Output Formats

Format	Content-Type	Description
`mp3`	`audio/mpeg`	Compressed audio, smaller file size. Good for storage and playback
`pcm`	`audio/pcm`	Uncompressed raw audio. Lower latency, suitable for real-time streaming pipelines
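Raw `pcm` output has no header, so most players cannot open it directly. One option is to wrap the bytes in a WAV container with Python's standard `wave` module. The sketch below assumes 24 kHz, 16-bit, mono samples; this page does not state the PCM parameters, so treat those defaults as placeholders and confirm the real values for your model before relying on them. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 24_000,  # assumption: not documented on this page
    channels: int = 1,          # assumption: mono
    sample_width: int = 2,      # assumption: 16-bit samples (2 bytes)
) -> bytes:
    """Wrap headerless PCM samples in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

If the audio plays too fast, too slow, or as noise, the assumed sample rate or sample width is the first thing to check.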

Pricing

TTS models are priced per character of input text. Pricing varies by model and provider. You can check the per-character cost for each model on the Models page or via the Models API.
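Because billing is per input character, a rough cost estimate is a single multiplication. The price used below is purely hypothetical, and counting characters with `len()` (Unicode code points, whitespace included) is an assumption about how input is metered.

```python
def estimate_tts_cost(text: str, price_per_character: float) -> float:
    """Estimate the cost of synthesizing text at a per-character price."""
    # Assumption: every code point, including whitespace, is billed.
    return len(text) * price_per_character


# Hypothetical price for illustration only; look up the real rate per model.
HYPOTHETICAL_PRICE = 0.000015

estimate = estimate_tts_cost("a" * 1000, HYPOTHETICAL_PRICE)
```

With these made-up numbers, 1,000 characters cost roughly 0.015 in the account currency.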

OpenAI SDK Compatibility

The TTS endpoint is fully compatible with the OpenAI SDK. You can use the OpenAI client libraries by pointing them at ModelGates's base URL:

python

from openai import OpenAI

client = OpenAI(
    base_url="https://modelgates.ai/api/v1",
    api_key="<MODELGATES_API_KEY>",
)

# Non-streaming: get the full audio response
response = client.audio.speech.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    input="The quick brown fox jumps over the lazy dog.",
    voice="nova",
    response_format="mp3",
)
response.write_to_file("output.mp3")

# Streaming: process audio chunks as they arrive
with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    input="The quick brown fox jumps over the lazy dog.",
    voice="nova",
    response_format="mp3",
) as response:
    response.stream_to_file("output.mp3")

Best Practices

Choose the right format: Use mp3 for storage and general playback. Use pcm for real-time streaming pipelines where latency matters
Voice selection: Different providers offer different voices. Check the model's documentation or experiment with available voices to find the best fit for your use case
Input length: For very long texts, consider splitting the input into smaller segments and concatenating the audio output. This can improve reliability and reduce latency for the first audio chunk
Speed parameter: The speed parameter is only supported by certain providers (e.g., OpenAI). It is silently ignored by providers that don't support it
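The input-length advice can be sketched as a splitter that prefers sentence boundaries and hard-wraps only when a single sentence exceeds the limit. The 1,000-character default is an arbitrary illustration, not a documented limit, and `split_text` is a hypothetical helper.

```python
import re


def split_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into segments of at most max_chars characters.

    Segments end at sentence boundaries where possible; a sentence longer
    than max_chars is cut into fixed-size pieces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a sentence that cannot fit in one segment.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each segment can then be sent as its own request. Headerless `pcm` responses can be joined with `b"".join(parts)` as long as every request used the same model and voice; joining `mp3` responses byte-for-byte often plays back but is not guaranteed to produce a clean file.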

Troubleshooting

Empty or corrupted audio file?

Verify the response_format matches how you're saving the file (e.g., don't save pcm output with a .mp3 extension)
Check the response status code — non-200 responses return JSON error bodies, not audio
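Both checks can be applied before any bytes reach disk. The sketch below takes the status code, `Content-Type` header, and body of a response and returns a matching file extension, raising if the response is an error or not one of the audio types listed earlier. `SpeechRequestError` and `audio_extension` are hypothetical names, and using `.pcm` as the extension for raw audio is a convention chosen here, not something the API prescribes.

```python
import json

# MIME types from the response header table, mapped to file extensions.
EXTENSIONS = {"audio/mpeg": ".mp3", "audio/pcm": ".pcm"}


class SpeechRequestError(Exception):
    """Raised when a TTS response does not contain usable audio."""


def audio_extension(status_code: int, content_type: str, body: bytes) -> str:
    """Validate a TTS response and return the file extension to save it with."""
    if status_code != 200:
        # Error responses are JSON; fall back to raw bytes if parsing fails.
        try:
            detail = json.loads(body)
        except ValueError:
            detail = body[:200]
        raise SpeechRequestError(f"HTTP {status_code}: {detail}")

    # Drop any parameters such as "; charset=..." before matching.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime not in EXTENSIONS:
        raise SpeechRequestError(f"unexpected Content-Type: {content_type!r}")
    if not body:
        raise SpeechRequestError("response body is empty")
    return EXTENSIONS[mime]
```

With `requests`, this would be called as `audio_extension(resp.status_code, resp.headers.get("Content-Type", ""), resp.content)` and the result appended to the output file name.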

Model not found?

Use the Models page to find available TTS models
Verify the model slug is correct (e.g., openai/gpt-4o-mini-tts-2025-12-15, not gpt-4o-mini-tts)

Voice not available?

Available voices vary by provider. Check the provider's documentation for supported voice identifiers
Each model has its own set of voices. See the model's entry on the Models page for the full list