The first LLM for text-to-speech. While other TTS systems just “read” words, Octave grasps their meaning. Create any AI voice with a descriptive prompt, guide its emotional delivery (angrier! more sarcasm!), and bring your stories to life with human-like expression.
Hey Product Hunt! I’m Alan Cowen, CEO and Chief Scientist at Hume AI.
We're launching Octave, the first of a new generation of text-to-speech models. Traditional TTS models focus on the mechanical process of turning letters into sounds. Octave isn't a traditional TTS model, but a voice-enabled LLM, trained on 1000x more language. As a result, it understands the cognitive and emotional aspects of human speech. It reads your script like a human actor, delivering realistic emotions, sarcasm, pace, word emphasis, and more.
And unlike any other TTS system, it can take explicit instructions to generate any voice you describe and modify its emotional tone and speaking style.
Octave is made possible by Hume's research. We're leading the space in voice-enabled LLMs, and we run large-scale psychology studies to help fine-tune our models to generate the right voices at the right time, drawing on a decade of research at the intersection of emotion science and AI.
We’re launching both a platform for creators and an API for developers. We're also launching the Expressive TTS Arena (arena.hume.ai)—a new public benchmark for evaluating emotion-rich, long-form speech generation with instructions.
Ready to try it?
Try Octave: hume.ai
Join our Discord: https://link.hume.ai/discord
Follow our updates: x.com/hume_ai
I’ll be here all day to answer questions and discuss how this technology evolved from our emotion research. Thank you for checking out Octave!
@achume Excited to see how this transforms content creation and AI interactions!
Spiritory
What sets Octave apart from traditional TTS models, and how does its training on 1000x more language enhance its performance? Congratulations!
Lancepilot
Can you describe how Octave's ability to understand the cognitive and emotional aspects of human speech improves its text-to-speech output?