Doubao TTS — Raycast Extension
Select any text on macOS and read it aloud with Raycast, powered by Volcengine Doubao TTS V3 WebSocket streaming.
Why Doubao TTS?
Doubao TTS is a high-quality Chinese AI speech synthesis engine with natural voices, emotional expression, and broad Chinese/English voice coverage. This extension makes it practical to listen to papers, articles, notes, documentation, and everyday selected text directly from Raycast.
Features
- Quick Read: select text and read it aloud instantly without opening a view.
- Voice Selection: browse 160+ voices organized by category, including the official Doubao TTS 2.0 voice catalog.
- Select Quick Read Voice: choose and preview the voice used by Quick Read.
- Stop Reading: stop playback anytime, or trigger Quick Read again to toggle playback off.
- Smart Chunking: split long text by sentence and punctuation.
- Pipelined Playback: synthesize the next text chunk while the current chunk is playing.
- Model Switching: supports Doubao TTS 2.0, TTS 1.0, and voice clone resource IDs.
- Flexible Auth: uses the current Volcengine `X-Api-Key` flow, with legacy App ID and Access Key fallback.
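The Pipelined Playback feature can be illustrated with a minimal sketch. The `synthesize` and `play` callbacks are hypothetical stand-ins, since this README does not show the extension's real functions; the sketch only shows the overlap of synthesis and playback, not the actual implementation.

```typescript
// Hypothetical signatures: stand-ins for the extension's real synthesis and playback calls.
type Synthesize = (text: string) => Promise<Uint8Array>;
type Play = (audio: Uint8Array) => Promise<void>;

// Plays chunks in order while prefetching the audio for the next chunk.
async function playPipelined(
  chunks: string[],
  synthesize: Synthesize,
  play: Play,
): Promise<void> {
  if (chunks.length === 0) return;

  // Start synthesizing the first chunk before entering the loop.
  let pending: Promise<Uint8Array> = synthesize(chunks[0]);

  for (let i = 0; i < chunks.length; i++) {
    const audio = await pending;

    // Request the next chunk before playback starts,
    // so network latency overlaps with audio output.
    if (i + 1 < chunks.length) {
      pending = synthesize(chunks[i + 1]);
      // Prevent an unhandled rejection if playback fails first;
      // the error still surfaces at the next `await pending`.
      pending.catch(() => {});
    }

    await play(audio);
  }
}
```

Only one chunk is prefetched at a time, which keeps memory use small and avoids wasted requests when playback is stopped early.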
Installation
Prerequisites
- Raycast installed
- A Volcengine account with Doubao TTS enabled
Setup
- Install Doubao TTS from the Raycast Store.
- Open the extension preferences.
- Enter the API Key from the current Volcengine Doubao Speech console, or keep using your existing App ID and Access Key.
- Bind a hotkey to Quick Read Selected Text for the fastest workflow.
Configuration
| Setting | Description | Required |
|---|---|---|
| API Key | Volcengine Doubao Speech API Key. Preferred for new users. | Optional |
| App ID | Legacy Volcengine TTS App ID, used only when API Key is empty. | Optional |
| Access Key | Legacy Volcengine TTS Access Key, used only when API Key is empty. | Optional |
| Model Version | TTS model/resource ID. Defaults to Doubao TTS 2.0. | Optional |
| Default Voice | Voice used by Quick Read when no override is selected. | Optional |
| Speech Rate | Playback speed from 0.5x to 2.0x. | Optional |
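The credential fallback in the table, combined with the header names listed under Technical Details, could be sketched as follows. The `TtsPreferences` shape and the `buildAuthHeaders` helper are hypothetical names chosen for illustration; the resource ID and connect ID are passed in rather than guessed.

```typescript
// Hypothetical preference shape mirroring the settings table; field names are assumptions.
interface TtsPreferences {
  apiKey?: string;
  appId?: string;
  accessKey?: string;
  resourceId: string; // value selected by the Model Version setting
}

// Builds the connection headers, preferring API Key over the legacy pair.
function buildAuthHeaders(
  prefs: TtsPreferences,
  connectId: string, // unique per connection, for example a UUID
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-Api-Resource-Id": prefs.resourceId,
    "X-Api-Connect-Id": connectId,
  };

  const apiKey = prefs.apiKey?.trim();
  if (apiKey) {
    headers["X-Api-Key"] = apiKey;
    return headers;
  }

  // Legacy credentials are used only when API Key is empty.
  const appId = prefs.appId?.trim();
  const accessKey = prefs.accessKey?.trim();
  if (appId && accessKey) {
    headers["X-Api-App-Id"] = appId;
    headers["X-Api-Access-Key"] = accessKey;
    return headers;
  }

  throw new Error("Set either API Key, or both App ID and Access Key.");
}
```

Although each credential field is individually optional, this sketch treats a configuration with neither an API Key nor a complete legacy pair as an error.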
Usage
Quick Read
- Select text in any macOS app.
- Open Raycast and run Quick Read Selected Text.
- Trigger the command again to stop playback.
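The toggle behavior relies on the shared PID file mentioned under Technical Details. A minimal sketch of how such a toggle might work in Node.js is shown below; `stopIfPlaying` and `recordPlayer` are hypothetical helper names, and the extension's actual logic may differ.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Path from the Technical Details section: $TMPDIR/doubao-tts.pid.
// On macOS, os.tmpdir() resolves to the TMPDIR environment variable when set.
const PID_FILE = path.join(os.tmpdir(), "doubao-tts.pid");

// Stops the recorded player process, if any.
// Returns true when a running process was signaled.
function stopIfPlaying(pidFile: string = PID_FILE): boolean {
  let raw: string;
  try {
    raw = fs.readFileSync(pidFile, "utf8");
  } catch {
    return false; // no PID file: nothing is playing
  }
  fs.rmSync(pidFile, { force: true });

  const pid = Number.parseInt(raw.trim(), 10);
  if (!Number.isInteger(pid) || pid <= 0) return false; // unreadable contents

  try {
    process.kill(pid, "SIGTERM");
    return true;
  } catch {
    return false; // stale PID: the process has already exited
  }
}

// Records the PID of a newly started player, for example an afplay child process.
function recordPlayer(pid: number, pidFile: string = PID_FILE): void {
  fs.writeFileSync(pidFile, String(pid), "utf8");
}
```

A command built on these helpers would call `stopIfPlaying()` first and start new playback only when it returns `false`. A stale PID could in principle belong to an unrelated process, so a more careful version would verify the process name before signaling.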
Select the Quick Read Voice
- Run Select Quick Read Voice.
- Search or browse voices compatible with the selected model.
- Press Enter to set the selected voice.
- Use Preview Voice to audition a voice with selected or clipboard text.
Read with a Specific Voice
- Select text.
- Run Read with Voice Selection.
- Pick a voice and press Enter to read.
Technical Details
- API: Volcengine Doubao TTS V3 WebSocket bidirectional streaming
- Auth: `X-Api-Key` or legacy `X-Api-App-Id` + `X-Api-Access-Key`, plus `X-Api-Resource-Id` and per-connection `X-Api-Connect-Id`
- Response: binary V3 WebSocket frames with streamed MP3 audio payloads
- Audio: MP3, 24000 Hz
- Chunking: smart split by punctuation, up to 4096 UTF-8 bytes per chunk
- Playback: macOS built-in `afplay`
- Stop Control: shared PID file at `$TMPDIR/doubao-tts.pid`
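The chunking rule above (split at punctuation, at most 4096 UTF-8 bytes per chunk) can be approximated with the sketch below. The set of break characters and the function names are assumptions for illustration, not the extension's actual code.

```typescript
// Limit from the Technical Details list.
const MAX_CHUNK_BYTES = 4096;

// Assumed set of break characters; the extension's real list may differ.
const BREAK_CHARS = new Set(["。", "！", "？", "；", ".", "!", "?", ";", "\n"]);

// UTF-8 size in bytes of a single code point.
function utf8Size(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// UTF-8 size in bytes of a whole string.
function utf8Length(text: string): number {
  let total = 0;
  for (const ch of text) total += utf8Size(ch.codePointAt(0)!);
  return total;
}

// Splits text into pieces that end at a break character, or at the byte
// limit when a single sentence is longer than maxBytes.
function splitSentences(text: string, maxBytes: number): string[] {
  const pieces: string[] = [];
  let current = "";
  let bytes = 0;
  for (const ch of text) {
    const size = utf8Size(ch.codePointAt(0)!);
    if (current && bytes + size > maxBytes) {
      pieces.push(current);
      current = "";
      bytes = 0;
    }
    current += ch;
    bytes += size;
    if (BREAK_CHARS.has(ch)) {
      pieces.push(current);
      current = "";
      bytes = 0;
    }
  }
  if (current) pieces.push(current);
  return pieces;
}

// Greedily packs sentences into chunks of at most maxBytes UTF-8 bytes.
// maxBytes is assumed to be at least 4, the largest single code point.
function chunkText(text: string, maxBytes: number = MAX_CHUNK_BYTES): string[] {
  const chunks: string[] = [];
  let current = "";
  let bytes = 0;
  for (const piece of splitSentences(text, maxBytes)) {
    const size = utf8Length(piece);
    if (current && bytes + size > maxBytes) {
      chunks.push(current);
      current = "";
      bytes = 0;
    }
    current += piece;
    bytes += size;
  }
  if (current) chunks.push(current);
  // Drop chunks that contain only whitespace; there is nothing to read aloud.
  return chunks.filter((chunk) => chunk.trim().length > 0);
}
```

Measuring bytes rather than characters matters for Chinese text, where most characters take three bytes in UTF-8, so a 4096-byte chunk holds roughly 1365 of them.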
License
MIT
Chinese documentation: README.zh.md