Chatterbox TTS Hackernews Viewer

Chatterbox TTS

670 points by pinter69 11 June 2025 | 188 comments

Comments

Mizza 11 June 2025

Demos here: https://resemble-ai.github.io/chatterbox_demopage/ (not mine)

This is a good release if they're not too cherry-picked!

I say this every time it comes up, and it's not as sexy to work on, but in my experiments voice AI is really held back by transcription, not TTS. Unless that's changed recently.

xnx 11 June 2025

You can run it for free here: https://huggingface.co/spaces/ResembleAI/Chatterbox

travisvn 12 June 2025

Chatterbox is fantastic.

I created an API wrapper that also makes installation easier (Dockerized as well) https://github.com/travisvn/chatterbox-tts-api/

Best voice cloning option available locally by far, in my experience.

teraflop 11 June 2025

> Every audio file generated by Chatterbox includes Resemble AI's Perth (Perceptual Threshold) Watermarker - imperceptible neural watermarks that survive MP3 compression, audio editing, and common manipulations while maintaining nearly 100% detection accuracy.

Am I misunderstanding, or can you trivially disable the watermark by simply commenting out the call to the apply_watermark function in tts.py? https://github.com/resemble-ai/chatterbox/blob/master/src/ch...

I thought the point of this sort of watermark was that it was embedded somehow in the model weights, so that it couldn't easily be separated out. If you're going to release an open-source model that adds a watermark as a separate post-processing step, then why bother with the watermark at all?

pryelluw 11 June 2025

Silly question: what’s the lowest-spec hardware this will run on?

ineedasername 11 June 2025

The emotional exaggeration is interesting, though I don't think I've come across anything quite so versatile and easy to "sculpt" as Elevenlabs and its ability to generate a voice on the basis of a description of how you want the voice to sound. SparkTTS allows some additional parameters, and its project on GitHub has placeholders in its code that indicate the model might be refined for more fine-grained emotional control. As it is, I've had some success with it and other models by trying to influence prosody and tonality with some heavy-handed cues in the text, which can then be used with VC to get closer to desired results, but it's a much more cumbersome process than Eleven.

nmstoker 11 June 2025

I've found it excellent with really common accents, but with other accents (that are pretty common too) it can easily get stuck picking a different accent. For instance, several Scottish recordings ended up Australian, as did a fairly mild Yorkshire accent.

abraxas 11 June 2025

Are these things good enough to narrate a book convincingly or does the voice lose coherence after a few paragraphs being spoken?

iambateman 12 June 2025

Just a regular reminder to tell your friends and family to be extra skeptical about phone conversations.

It’s becoming much more likely that the friend who desperately needs a gift card to Walmart isn’t the friend at all. :(

audiala 12 June 2025

What is the current state of the art for open source multilingual TTS? I have found Kokoro to be great for English as well, but am still searching for a good solution for French, Japanese, German...

philipkiely 12 June 2025

Example implementation with sample inference code + voice cloning example:

https://github.com/basetenlabs/truss-examples/tree/main/chat...

Still working on streaming

tevon 12 June 2025

I just tested it out locally, really excellent quality, the server was easy to set up and well documented.

I'd love to get to real-time generation if that's in the pipeline? Would like to use it along with Home Assistant.

stevage 11 June 2025

Interesting demo. A few observations, having uploaded a snippet of my own voice, and testing with some of my own text:

- the output had some of the qualities of my voice, but wasn't super similar. (Then again, the fact it could even do this from such a tiny snippet was impressive)

- increasing "CFG/pace" (whatever CFG is) even a little bit often just breaks down into total gibberish

- it was very inconsistent whether it would come out with a kind of British accent or an American one. (My accent is Australian...)

- the emotional exaggeration was interesting, but it seemed to vary a lot exactly what kind of emotion would come out
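On the "whatever CFG is" point: in generative models, CFG usually stands for classifier-free guidance, where the model's prediction with conditioning is blended with its prediction without conditioning, and a weight controls how far the result is pushed toward the conditioned one. Whether the demo's "CFG/pace" slider maps exactly onto this is an assumption; the sketch below only illustrates the general technique with toy numbers, and `apply_cfg` is a hypothetical helper, not part of Chatterbox.

```python
def apply_cfg(cond, uncond, weight):
    """Classifier-free guidance on toy prediction vectors.

    Uses the common form: guided = uncond + weight * (cond - uncond).
    weight = 0 gives the unconditional prediction, weight = 1 the
    conditional one, and larger weights extrapolate past it.
    """
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


# Toy stand-ins for a model's two predictions (not real model outputs).
cond = [2.0, 0.5, -1.0]
uncond = [1.0, 0.5, 0.0]

for w in (0.0, 1.0, 3.0):
    print(w, apply_cfg(cond, uncond, w))
```

With a large weight the guided values move well outside the range of either prediction, which is one plausible reason small increases can degrade output, though that explanation is a guess for this particular model.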

j2kun 11 June 2025

They should put the meaning of "TTS" in the readme somewhere, probably near the top. Or their website.

lukeinator42 12 June 2025

Does anyone know of an open-source TTS like this that can also encode speech to do voice conversion alongside TTS? i.e. a model that would take speech as input and convert it to one of the pretrained TTS voices.

causality0 11 June 2025

Anyone know how this compares to Kokoro? I've found Kokoro very useful for generating audiobooks but it almost always pronounces words with paired vowels incorrectly. Daisy becomes die-zee, leave becomes lay-ve, etc.

pzo 12 June 2025

It's only for English sadly

palmfacehn 12 June 2025

Has anyone developed a way to annotate the input to provide emotional context?

In the past I've used different samples from the same speaker for this.

racecar789 12 June 2025

I’d sign up for a service that calls a pharmacy on my behalf to refill prescriptions. In certain situations, pharmacies will not list prescriptions on their websites, even though they have the prescriptions on file, which forces the customer to call by phone — a frustrating process.

I do feel bad for pharmacists; their job is challenging in so many ways.

Shopper0552 11 June 2025

Anyone know a good free open source speech to text? Looking for something for my laptop which is running Fedora KDE plasma.

MrThoughtful 12 June 2025

How do you set the voice?

On the Huggingface demo, there seems to be no option for it.

It has a female voice. Any way to set it to a male voice?

DHolzer 13 June 2025

I love chatterbox, it's my favourite. While the generation speed is quick, I wonder what performance optimization I could try on my 3090 to improve throughput. It's not quite enough for realtime.

ipsum2 12 June 2025

The voice cloning is okay, not as good as Eleven Labs. There's a Rick (from Rick and Morty) voice example, and the generated audio sounds muffled and low quality. I appreciate that it's open source though.

kiririn7 11 June 2025

definitely worse than the new elevenlabs model (v3). that model is really good

andy_xor_andrew 11 June 2025

in my experience, TTS has been a "pick two" situation:

- fast / cheap to run

- can clone voices

- sounds super realistic

from what I can tell, Chatterbox is the first that apparently lets you pick 3! (have not tried it myself yet, this is just what I can deduce)

SV_BubbleTime 12 June 2025

Fun stuff... I don't know how or why, but connecting Bluetooth while on this site made all of the audio clips play at once (Firefox, Linux). Not the best listening experience.

bachittle 12 June 2025

I always have issues with TTS models that do not allow you to send large chunks of text. Seems this one does not resolve this either. Always has a limit of like 2-3 sentences.
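A common workaround for a per-request length limit is to split the text on sentence boundaries, synthesize each chunk separately, and join the audio afterwards. The sketch below covers only the splitting step; `split_into_chunks` and the 300-character default are hypothetical choices, not anything taken from Chatterbox, and the regex-based sentence split is deliberately naive (it will mis-split abbreviations such as "e.g.").

```python
import re


def split_into_chunks(text, max_chars=300):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own
    chunk rather than being cut mid-sentence.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two! Three? Four.", max_chars=10):
        print(repr(chunk))
```

Each chunk would then be passed to the TTS call and the resulting audio concatenated. Prosody can shift at the chunk joins, so splitting at paragraph boundaries first may sound more natural.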

andymcsherry 12 June 2025

Here's an open-source serving implementation: https://lightning.ai/bhimrajyadav/studios/build-a-production...

Also, a deployable model: https://lightning.ai/bhimrajyadav/ai-hub/temp_01jwr0adpqf055...

3ds 12 June 2025

There are only English voices, even in the paid version. Using them in other languages results in an accent.

ojw0816 13 June 2025

Looks good! What is the difference between the open-source version and the priced version?

az226 11 June 2025

How does one train a TTS model with an LLM backbone? Practically, how does this work?

monksy 13 June 2025

How would I install this alongside librechat or ollama using docker?

init0 12 June 2025

Chatterbox CLI https://pypi.org/project/voice-forge/

decide1000 11 June 2025

How does it perform on multi-lingual tasks?

benob 12 June 2025

Watermarking is easily disabled in the code. I am wondering when they will release model weights with embedded watermarking.

pradeepodela 12 June 2025

What is the latency?

andrewstuart 12 June 2025

There’s been surprisingly little advancement in TTS after a rapid leap forward three years ago or so.

There’s eleven labs which is quite good but not incredible and very expensive.

Everything else... all the big AI companies... have TTS systems that are kinda meh.

Everything else in AI has advanced in leaps and bounds, but TTS remains deep in the uncanny valley.

tuananh 12 June 2025

for this, what does it take to support another language?

ash1224 13 June 2025

wow! 200ms, very good!

internet_points 12 June 2025

> Supported Lanugage

> Currenlty only English.

meh

_andrei_ 12 June 2025

very cherry picked

hsavit1 12 June 2025

another TTS that is only supporting English. This really irritates me

gardnr 11 June 2025

Previously, on Hacker News:

https://news.ycombinator.com/item?id=44120204

https://news.ycombinator.com/item?id=44144155

https://news.ycombinator.com/item?id=44195105

https://news.ycombinator.com/item?id=44230867

https://news.ycombinator.com/item?id=44172134

https://news.ycombinator.com/item?id=44221910

https://news.ycombinator.com/item?id=44145564

andyferris 12 June 2025

It took me ages to understand what TTS means!