State-of-the-art Chinese & English TTS — directly from your keyboard, 100% free during the public beta.
Read any selected or clipboard text aloud, design a brand-new voice from a single sentence, or clone any voice from a 10-second sample — without leaving Raycast.
Xiaomi's MiMo platform launched the MiMo-V2.5-TTS series and V2.5-ASR as a single full-stack speech suite, and the pricing page currently lists the entire TTS family as free for a limited time:
| Model | What it does | Beta price |
|---|---|---|
| mimo-v2.5-tts | Preset-voice synthesis with full style control | Free |
| mimo-v2.5-tts-voicedesign | Generate a new voice from one sentence of description | Free |
| mimo-v2.5-tts-voiceclone | Clone any voice from a small mp3/wav sample | Free |
| mimo-v2-tts | Legacy voices | Free |
(Source: platform.xiaomimimo.com/docs/zh-CN/price/pay-as-you-go, as of 2026-05-27)
If you've been priced out of OpenAI Speech, ElevenLabs, or Azure Neural TTS for long-form reading or scripted character work, MiMo is currently the cheapest route to top-tier TTS: the whole family is free while the beta lasts.
On 27 May 2026, Xiaomi shipped a permanent price cut across the MiMo-V2.5 LLM line: up to 99% off, Token Plan quotas multiplied by 5–8× at the same price, and a full credit reset for existing subscribers. (Announcement) So even after the TTS free beta ends, the underlying inference stack and billing engine will already have had a generational cost reduction. SGLang HiCache with SWA cut KV-cache traffic to roughly 1/7 and raised the number of cacheable tokens by about 5×, so unit cost is structurally lower, not just promotional.
MiMo-V2.5-TTS is positioned by Xiaomi as a model that "doesn't just read — it performs." All three creation modes share the same instruction-following surface:
(轻笑), (深呼吸), (粤语), (唱歌), (suppressed anger) mid-text for surgical control. Mix Chinese and English tags. Singing mode included. (The Chinese tags above mean light laugh, deep breath, Cantonese, and singing.)

Most "Chinese TTS" services are awful at English; most "English TTS" services give you mechanical Mandarin. MiMo trains both at native quality.
Type a 1–4 sentence description and get a brand new voice synthesized on the spot:
"A weathered Northern Chinese grandfather, slow and steady pace, slightly raspy and time-worn, like he's telling old stories."
The optional optimize_text_preview flag lets the model rewrite your sample text to match the persona; you can even submit an empty text body and let MiMo write the script for you. Other commercial TTS services charge per generated voice ID; here it's one HTTP call.
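
As a rough illustration of what that single call might carry, here is a minimal sketch of a request builder. Only the model ID `mimo-v2.5-tts-voicedesign`, the 1-4 sentence limit, and the `optimize_text_preview` flag come from this README; the `voice_description` and `text` field names, the sentence-counting heuristic, and the choice to switch the flag on automatically for an empty script are all assumptions, not MiMo's documented request schema.

```typescript
// Hypothetical request shape. Field names other than optimize_text_preview
// are assumptions made for illustration only.
interface VoiceDesignRequest {
  model: "mimo-v2.5-tts-voicedesign";
  voice_description: string; // assumed field name
  text: string; // assumed field name
  optimize_text_preview: boolean;
}

// Count sentences by splitting on common English and Chinese terminators.
function countSentences(description: string): number {
  return description
    .split(/[.!?。！？]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0).length;
}

function buildVoiceDesignRequest(description: string, text = ""): VoiceDesignRequest {
  const sentences = countSentences(description);
  if (sentences < 1 || sentences > 4) {
    throw new Error(`Expected a 1-4 sentence description, got ${sentences}`);
  }
  const script = text.trim();
  return {
    model: "mimo-v2.5-tts-voicedesign",
    voice_description: description.trim(),
    text: script,
    // Assumed policy: with an empty script, ask the model to write one.
    optimize_text_preview: script.length === 0,
  };
}
```

Validating the description length locally keeps an obviously malformed prompt from costing a network round trip.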
Drop in any mp3 or wav (≤10 MB after base64). MiMo replicates the timbre and the cloned voice keeps the full control surface: director prompts, inline tags, dialect switching, singing mode. No upload step, no separate voice-management dashboard — each clone is a one-shot inline call.
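
Because the 10 MB limit applies after base64 encoding, the raw file has to be noticeably smaller. The sketch below works out the encoded size arithmetically (base64 emits 4 characters for every 3 input bytes, padded up). It assumes "10 MB" means 10 × 1024 × 1024 bytes, which the README does not specify; the function and constant names are illustrative and not taken from the extension.

```typescript
// Assumption: "10 MB" is interpreted as 10 MiB.
const MAX_BASE64_BYTES = 10 * 1024 * 1024;

// Padded base64 output length for a payload of rawBytes bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// True when a non-empty sample still fits the limit once encoded.
function fitsCloneLimit(rawBytes: number): boolean {
  return rawBytes > 0 && base64Length(rawBytes) <= MAX_BASE64_BYTES;
}

// Largest raw file size that still fits after encoding.
const MAX_RAW_BYTES = Math.floor(MAX_BASE64_BYTES / 4) * 3;
```

Under that assumption the practical ceiling for the source file is 7.5 MiB, so a check like this can reject an oversized sample before any encoding work is done.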
afplay PID.

This extension is published on the Raycast Store as MiMo TTS. Search for it in Raycast → Store, then:
tp-...) from https://platform.xiaomimimo.com/. (Pay-as-you-go sk- keys go to a different endpoint and are not used by this extension.)

| Command | Mode | What it does |
|---|---|---|
| Quick Read | no-view | Read selected or clipboard text with your default voice. Trigger again to stop. |
| Read with Voice | view | Browse voices and read selection / clipboard with the chosen one. |
| Set Quick Read Voice | view | Pick and preview the voice that Quick Read uses. |
| TTS Studio | view | Long-form composer with voice, speed, opening style, emotion / rhythm / vocal-texture / expression tags, performance presets, and a free-form director prompt. |
| Design Voice | view | Generate a brand-new voice from a one-sentence description (MiMo-V2.5-TTS-VoiceDesign). |
| Clone Voice | view | Replicate any voice from an mp3/wav file (MiMo-V2.5-TTS-VoiceClone). |
| Setup Voice Defaults | view | Persist a per-session override for model / voice / rate / style prompt / Token Plan base URL. |
| Stop Reading | no-view | Stop the current playback immediately. |
| Speed up Reading / Slow Down Reading | no-view | Adjust playback speed by ±0.25× for the next chunk; persists globally. |
| Reading Status | menu-bar | Now-playing status with playback / speed controls. |
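
The ±0.25× speed step used by Speed up Reading / Slow Down Reading can be sketched as a small clamped adjustment. The step size comes from the table above; the 0.5× and 2.0× bounds and the function name are assumptions for illustration, since the README does not state the extension's actual limits.

```typescript
const SPEED_STEP = 0.25; // from the command description
const MIN_SPEED = 0.5; // assumed lower bound
const MAX_SPEED = 2.0; // assumed upper bound

function adjustSpeed(current: number, direction: "up" | "down"): number {
  const delta = direction === "up" ? SPEED_STEP : -SPEED_STEP;
  // Clamp so repeated presses cannot leave the supported range.
  const clamped = Math.min(MAX_SPEED, Math.max(MIN_SPEED, current + delta));
  // Round to two decimals to avoid floating-point drift over many presses.
  return Math.round(clamped * 100) / 100;
}
```

The returned value is what would be stored and applied to the next chunk.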
The TTS Studio command exposes everything MiMo-V2.5-TTS supports:
唱歌 overrides others to enter singing mode).

| Model ID | Used by | What it does |
|---|---|---|
| mimo-v2.5-tts | Quick Read · Read with Voice · Set Quick Read Voice · TTS Studio | Preset voices with style controls. |
| mimo-v2.5-tts-voicedesign | Design Voice | Generate a voice from a 1–4 sentence description. Optional optimize_text_preview. |
| mimo-v2.5-tts-voiceclone | Clone Voice | Replicate a voice from a base64-encoded mp3/wav (≤10 MB). |
| mimo-v2-tts | optional, via Setup Voice Defaults | Legacy V2 voices. |
Quick Read uses one keystroke to start and stop. When something is already playing, running Quick Read again terminates the afplay process, clears the now-playing state, and shows a stop HUD. The menu-bar status, dedicated Stop Reading command, and cmd+. from any view command all trigger the same stop path.
Cross-extension PID isolation: this extension uses raycast-mimo-tts.pid / .stop in tmpdir so it doesn't fight my multi-provider AI Voice Studio extension over the same afplay process.
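
The start/stop toggle and the PID-file isolation described above can be sketched as one small function. This is not the extension's actual code: the `PlaybackBackend` interface and all of its method names are hypothetical, standing in for whatever reads and writes raycast-mimo-tts.pid in tmpdir and spawns or kills afplay. Keeping those side effects behind an interface makes the toggle logic easy to follow and to test.

```typescript
// Hypothetical abstraction over the PID file and process control.
interface PlaybackBackend {
  readPid(): number | null; // e.g. parse raycast-mimo-tts.pid in tmpdir
  isAlive(pid: number): boolean; // is that afplay process still running?
  kill(pid: number): void; // terminate the afplay process
  clearPid(): void; // remove the PID file / now-playing state
  start(): number; // spawn afplay and return its PID
  writePid(pid: number): void; // record the new PID
}

type ToggleResult = "started" | "stopped";

function toggleQuickRead(backend: PlaybackBackend): ToggleResult {
  const pid = backend.readPid();
  if (pid !== null && backend.isAlive(pid)) {
    // Something is playing: the same keystroke stops it.
    backend.kill(pid);
    backend.clearPid();
    return "stopped";
  }
  // Missing or stale PID file: clean up, then start fresh playback.
  backend.clearPid();
  const newPid = backend.start();
  backend.writePid(newPid);
  return "started";
}
```

Because the PID file name is specific to this extension, a backend built this way only ever kills a process it started itself, which is the point of the isolation.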
The extension reads selected text and the clipboard only when you trigger a command. Synthesized audio plays via the system afplay binary. No data is persisted beyond per-session Raycast LocalStorage (voice override, speech-rate override, now-playing state). The API key never leaves Raycast Preferences. Voice-clone samples are sent inline to MiMo's endpoint for that single request — no upload service is used and nothing is cached server-side per their docs.
Originally part of AI Voice Studio, a multi-provider Raycast TTS extension covering Qwen-TTS, MiniMax, MiMo, and OpenAI. This standalone version extracts the MiMo provider so users who only want MiMo TTS get a focused, smaller surface — no Qwen / MiniMax / OpenAI code paths.
Open source on GitHub: https://github.com/xwzhangSZU/Raycast-Mimo-TTS.