Free On-Device TTS With 31-Language Support – Supertonic v3

An on-device TTS that reads financial text and phone numbers correctly where ElevenLabs and OpenAI fail. Free, open-source, 31 languages.

Supertonic is a free, open-source, on-device text-to-speech model from Supertone Inc. that runs entirely on local hardware through ONNX Runtime.

The latest release, Supertonic v3, supports 31 languages and ships public ONNX assets, a Python package, browser inference, and example projects for eleven runtimes (Python, Node.js, browser, Java, C++, C#, Go, Swift, iOS, Rust, and Flutter).

It’s ideal for developers who need private, low-latency TTS for desktop tools, mobile apps, edge devices, and offline reading workflows.

Features

  • Runs entirely on-device with no network connection required after the initial model download.
  • Supports 31 languages as of Supertonic 3.
  • Ships public ONNX assets totaling approximately 99 million parameters.
  • Provides WebGPU and WASM support for browser-side inference.
  • Includes 10+ preset voices, including Alex, James, Robert, Sam, Daniel, Sarah, Lily, Jessica, Olivia, and Emily.
  • Handles financial expressions, phone numbers, and technical unit abbreviations.
  • Supports expression tags, including <laugh>, <breath>, and <sigh>.
  • Outputs 16-bit WAV audio files.
  • Supports batch inference.
  • Runs on Raspberry Pi and e-reader hardware in airplane mode.
  • Includes a Voice Builder tool for creating a custom edge-native TTS voice from personal recordings.
  • Comes with a Chrome extension that reads webpages through on-device TTS.

Use Cases

  • Build a private text-to-speech feature inside a local desktop app.
  • Add offline narration to an e-reader, note app, or accessibility tool.
  • Generate multilingual speech from text inside a Python workflow.
  • Test browser-based TTS with WebGPU or WASM inference.
  • Add low-latency speech output to Raspberry Pi or edge-device projects.

Example Results and Deployment Scenarios

Supertonic 3 adds 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets.

| Example Area | What It Shows |
|---|---|
| Reading accuracy | Supertonic 3 stays within a competitive WER/CER range against larger open TTS systems across measured languages. |
| Supertonic 2 to Supertonic 3 | Supertonic 3 expands language support from 5 languages to 31 languages. |
| Repeat and skip failures | Supertonic 3 reduces repeat and skip failures compared with Supertonic 2. |
| Runtime footprint | Supertonic 3 targets fast CPU inference and lower memory use than larger GPU-based baselines. |
| Model size | Supertonic 3 uses about 99M parameters across the public ONNX assets. |
| Raspberry Pi demo | Supertonic runs on Raspberry Pi for on-device real-time speech synthesis. |
| E-reader demo | Supertonic runs on an Onyx Boox Go 6 e-reader in airplane mode with zero network dependency. |
| Chrome extension demo | TLDRL turns webpages into audio through local text-to-speech. |

Reading Accuracy Results

Supertonic 3 handles three text normalization categories that cause failures in several major commercial TTS systems. The comparison below uses real text inputs with no phonetic pre-processing on any system.

| Category | Input Challenge | Supertonic 3 | ElevenLabs Flash v2.5 | OpenAI TTS-1 | Gemini 2.5 Flash TTS | Microsoft |
|---|---|---|---|---|---|---|
| Financial Expression | “$5.2M” and “$450K” with decimal magnitude abbreviations | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
| Phone Number | “(212) 555-0142 ext. 402” with area code and extension | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |
| Technical Unit | “2.3h” and “30kph” with decimal abbreviated units | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail |

The financial expression “$5.2M” must read as “five point two million dollars,” and “$450K” as “four hundred fifty thousand dollars.” All four competing systems failed this. The technical unit “2.3h” must read as “two point three hours” and “30kph” as “thirty kilometers per hour.” All four competitors also failed this category.
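
To make the expected readings concrete, here is a minimal Python sketch of rule-based expansion for the financial and unit examples above. It is an illustration only and says nothing about how Supertonic normalizes text internally. The function names, the regular expressions, and the small lookup tables are all assumptions made for this example, and it only handles amounts below 1,000.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical lookup tables for this sketch only.
MAGNITUDES = {"K": "thousand", "M": "million", "B": "billion"}
UNITS = {  # abbreviation -> (singular, plural)
    "h": ("hour", "hours"),
    "kph": ("kilometer per hour", "kilometers per hour"),
}


def int_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + int_to_words(rest) if rest else "")


def number_to_words(token):
    """Spell out a decimal string such as '5.2' or '450'."""
    whole, _, frac = token.partition(".")
    words = int_to_words(int(whole))
    if frac:
        # Digits after the decimal point are read one by one.
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def expand_money(match):
    amount, magnitude = match.groups()
    return f"{number_to_words(amount)} {MAGNITUDES[magnitude]} dollars"


def expand_unit(match):
    amount, unit = match.groups()
    singular, plural = UNITS[unit]
    unit_word = singular if amount == "1" else plural
    return f"{number_to_words(amount)} {unit_word}"


def normalize(text):
    """Expand '$5.2M'-style amounts and '2.3h'-style units into words."""
    text = re.sub(r"\$(\d{1,3}(?:\.\d+)?)([KMB])\b", expand_money, text)
    # "kph" is listed before "h" so the longer abbreviation is tried first.
    text = re.sub(r"\b(\d{1,3}(?:\.\d+)?)(kph|h)\b", expand_unit, text)
    return text
```

With these rules, normalize("$450K") gives "four hundred fifty thousand dollars" and normalize("30kph") gives "thirty kilometers per hour". A hand-written table like this breaks as soon as an unlisted abbreviation appears, which is why built-in handling in the model is useful.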

Supertonic 3 keeps its word error rate (WER) and character error rate (CER) in a competitive range against much larger open TTS systems such as VoxCPM2 across its supported languages, while running on CPU with substantially less memory.
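
For readers unfamiliar with the two metrics: TTS reading accuracy is typically measured by transcribing the synthesized audio with a speech recognizer and comparing the transcript with the input text. The sketch below shows the standard edit-distance definitions of WER and CER in plain Python. The evaluation setup behind the figures above is not described in this article, so treat this as the textbook formula, not the benchmark code; the function names are chosen for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, each edit costing 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, if the reference is "five point two million dollars" and the transcript says "five point two billion dollars", one of five words is wrong, so the WER is 0.2.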

How to Use It

Install the Python package

pip install supertonic

The SDK downloads ONNX model assets from Hugging Face automatically on the first run.

Generate speech with Python

from supertonic import TTS
tts = TTS(auto_download=True)
style = tts.get_voice_style(voice_name="M1")
text = "A gentle breeze moved through the open window while everyone listened to the story."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")
tts.save_audio(wav, "output.wav")
print(f"Generated {duration:.2f}s of audio")

This outputs a 16-bit WAV file. The lang parameter accepts a two-letter language code such as "en", "fr", "ja", or "ar".
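
If you need audio for several texts or languages, the same three calls can be wrapped in a loop. The helper below is a sketch that assumes the TTS object behaves exactly as in the example above; it only calls get_voice_style, synthesize, and save_audio with the arguments shown there. The helper name, the file naming scheme, and the outputs directory are choices made for this example. It is a plain sequential loop, not the library's batch inference feature, whose API is not covered in this article.

```python
from pathlib import Path


def synthesize_batch(tts, items, voice_name="M1", out_dir="outputs"):
    """Synthesize (lang, text) pairs one at a time and save each as a WAV file.

    `tts` is expected to behave like the supertonic TTS object used above.
    Returns a list of (output_path, duration) tuples.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    # Load the voice style once and reuse it for every item.
    style = tts.get_voice_style(voice_name=voice_name)

    results = []
    for index, (lang, text) in enumerate(items):
        wav, duration = tts.synthesize(text, voice_style=style, lang=lang)
        target = out_path / f"{index:03d}_{lang}.wav"
        tts.save_audio(wav, str(target))
        results.append((target, duration))
    return results


# Usage (requires `pip install supertonic` and the model download):
#   from supertonic import TTS
#   tts = TTS(auto_download=True)
#   synthesize_batch(tts, [("en", "Good morning."), ("fr", "Bonjour.")])
```

Passing the TTS object in as an argument keeps the helper easy to test with a stand-in object and avoids reloading the model for each call.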

Try the live demo

The live demo on Hugging Face runs entirely in a browser. Select a speaker, choose a language, enter text, set quality steps (default: 8) and speech speed (default: 1.00x), then click Generate Speech. No installation is required.


Install from GitHub for other runtimes

Install Git LFS. macOS users can run:

brew install git-lfs && git lfs install

Clone the repository and model assets:

git clone https://github.com/supertone-inc/supertonic.git
cd supertonic
git clone https://huggingface.co/Supertone/supertonic-3 assets

The commands in each section below assume you start from the root of the cloned supertonic directory.

Run the Python ONNX example

cd py
uv sync
uv run example_onnx.py

This generates outputs/output.wav using the default preset voice.
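
To confirm what the example wrote, you can read the file header with Python's standard wave module. This is a small sketch, not part of the Supertonic repository; the helper name is made up for this example, and it assumes the file is standard PCM WAV, which matches the 16-bit output described in this article.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            # getsampwidth() returns bytes per sample, so 2 means 16-bit.
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Usage: info = describe_wav("outputs/output.wav")
```

A 16-bit file should report bits_per_sample as 16. If wave.open raises wave.Error, the file is not in a PCM format the module can read.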

Run the Node.js example

cd nodejs
npm install
npm start

Run the browser example

cd web
npm install
npm run dev

Run the Java example

Java requires a JDK. macOS users can install one with brew install openjdk@17.

cd java
mvn clean install
mvn exec:java

Run the C# example

C# requires .NET 9 or newer.

cd csharp
dotnet restore
dotnet run

Run the Go example

Go requires the ONNX Runtime C library. macOS users can install it with brew install onnxruntime.

cd go
go mod download
go run example_onnx.go helper.go

Run the Rust example

cd rust
cargo build --release
./target/release/example_onnx

Run the iOS example

cd ios/ExampleiOSApp
xcodegen generate
open ExampleiOSApp.xcodeproj

In Xcode, go to Targets → ExampleiOSApp → Signing, select your Team, choose your iPhone as the run destination, then build.

Install the Chrome extension

The TLDRL extension is available on the Chrome Web Store. It converts any webpage to audio in under one second using on-device inference.

Use Voice Builder

The Voice Builder lets you convert your own recorded voice into a deployable, edge-native TTS voice with permanent ownership.

Supported Runtimes

| Runtime | Path | Notes |
|---|---|---|
| Python | py/ | ONNX Runtime inference; pip install supertonic |
| Node.js | nodejs/ | Server-side JavaScript |
| Browser | web/ | WebGPU/WASM inference |
| Java | java/ | JDK required; JRE alone is not sufficient |
| C++ | cpp/ | High-performance native inference |
| C# | csharp/ | .NET 9 or newer |
| Go | go/ | Requires ONNX Runtime C library |
| Swift | swift/ | macOS applications |
| iOS | ios/ | Native iOS via Xcode |
| Rust | rust/ | Memory-safe systems |
| Flutter | flutter/ | Cross-platform with macOS support |

Model Specifications

| Property | Value |
|---|---|
| Parameter count | ~99M (public ONNX assets) |
| Runtime | ONNX Runtime |
| GPU requirement | None for fixed-voice open-weight setting |
| Code license | MIT |
| Model license | OpenRAIL-M |

Supported Languages (31)

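| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| en | English | ko | Korean | ja | Japanese |
| ar | Arabic | bg | Bulgarian | cs | Czech |
| da | Danish | de | German | el | Greek |
| es | Spanish | et | Estonian | fi | Finnish |
| fr | French | hi | Hindi | hr | Croatian |
| hu | Hungarian | id | Indonesian | it | Italian |
| lt | Lithuanian | lv | Latvian | nl | Dutch |
| pl | Polish | pt | Portuguese | ro | Romanian |
| ru | Russian | sk | Slovak | sl | Slovenian |
| sv | Swedish | tr | Turkish | uk | Ukrainian |
| vi | Vietnamese | | | | |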
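
If your application takes the language code from user input, it can help to check it against this list before calling the synthesizer. The sketch below copies the 31 codes from the table into a Python dictionary. The constant and function names are made up for this example, and how the library itself reacts to an unlisted code is not covered in this article.

```python
# Language codes copied from the table above (31 entries).
SUPPORTED_LANGUAGES = {
    "en": "English", "ko": "Korean", "ja": "Japanese",
    "ar": "Arabic", "bg": "Bulgarian", "cs": "Czech",
    "da": "Danish", "de": "German", "el": "Greek",
    "es": "Spanish", "et": "Estonian", "fi": "Finnish",
    "fr": "French", "hi": "Hindi", "hr": "Croatian",
    "hu": "Hungarian", "id": "Indonesian", "it": "Italian",
    "lt": "Lithuanian", "lv": "Latvian", "nl": "Dutch",
    "pl": "Polish", "pt": "Portuguese", "ro": "Romanian",
    "ru": "Russian", "sk": "Slovak", "sl": "Slovenian",
    "sv": "Swedish", "tr": "Turkish", "uk": "Ukrainian",
    "vi": "Vietnamese",
}


def require_supported(lang):
    """Return the lowercased code, or raise ValueError if it is not listed."""
    code = lang.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {lang!r}")
    return code
```

Calling require_supported(" EN ") returns "en", while a code that is not in the table, such as "zh", raises ValueError before any synthesis work starts.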

Pros

  • No signup or API key required.
  • Free and open-source.
  • Runs on CPU with no GPU required.
  • 31 languages in one model.
  • Eleven runtime environments supported.
  • Works on edge hardware like Raspberry Pi.
  • Browser inference via WebGPU/WASM.
  • Reads financial expressions, phone numbers, and technical units correctly where four major commercial TTS systems failed in the comparison above.

Cons

  • Setup for the non-Python runtimes requires Git LFS and a manual Hugging Face model clone.
  • Model is licensed under OpenRAIL-M, not a fully permissive license.
  • Audio output is 16-bit WAV only.
  • No GUI desktop app for non-developers.

FAQs

Q: Does Supertonic require an internet connection?
A: Only for the initial model download from Hugging Face. All inference runs on-device after that.

Q: Can Supertonic run in a browser?
A: Yes. Supertonic supports WebGPU and WASM inference for browser-side TTS with no server required. The TLDRL Chrome extension uses this path to read webpages aloud in under one second.

Q: What hardware does Supertonic run on?
A: Supertonic runs on desktop computers, laptops, Raspberry Pi, e-readers, and mobile devices. The open-weight fixed-voice setting runs on CPU with no GPU required.

Q: Does Supertonic send text to the cloud?
A: No, not for synthesis. Supertonic runs text-to-speech inference locally through ONNX Runtime, so text is processed on the device. The only network use described here is the initial model download.

Q: How does Supertonic compare to ElevenLabs or OpenAI TTS in voice quality?
A: The preset voices are clean and stable but lack the prosody range of large commercial models. Supertonic’s main advantages are local execution, privacy, and correct reading of tricky text formats.
