Mati Staniszewski

I'm the ElevenLabs CEO - what do you want to do with voice AI but can't? (AMA)

Hi Everyone!

Solving AI audio end-to-end means tackling both generation and understanding - from text-to-speech to speech-to-text and everything in between. At ElevenLabs, we’re working on breakthroughs in AI audio that bridge research and real-world use.

Ask me anything about what we’re building, the challenges of scaling AI speech models, and where this space is headed. Also keen to hear what you’ve built with ElevenLabs! 

Ralph Lasry

When will AI advance enough to generate a course with my voice and appearance, making it indistinguishable from me actually presenting? (course is just an example use case)

Rajiv Ayyangar

@ralphlasry There's an argument to be made that it doesn't need to be indistinguishable unless the target is people who know you well and intimately, and that all it needs to be is realistic and high quality. In that case, we might be there now?

Kwindla Kramer

I'd love to hear any thoughts you have about architectures for next-generation models. This is maybe a bit of a more long-range question, but for conversational voice it would be nice to move past the approach of "triggering inference" and towards a more natively streaming approach to inference.

We now have the ability to maintain long-lived connections to APIs like your Conversational AI API. This is really exciting! I'd love to be able to rely on the model to decide whether to respond based on something other than "turn detection." For example, if I'm talking to my personal assistant agent, it might let me talk for a while about my todo list items, silently collecting data, before deciding to respond.
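
To make the idea concrete, here is a minimal sketch of "the model decides when to respond" as opposed to responding on every detected end of turn. It is a toy under stated assumptions, not how ElevenLabs or any real API works: `ListeningAgent`, `on_fragment`, and `todo_policy` are hypothetical names, and the keyword check stands in for whatever a streaming model would actually learn.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ListeningAgent:
    """Toy agent: buffers streamed transcript fragments and lets a policy
    function (a stand-in for the model) decide when to speak."""

    # Hypothetical policy: sees all buffered fragments, returns a reply or None.
    should_respond: Callable[[List[str]], Optional[str]]
    buffer: List[str] = field(default_factory=list)

    def on_fragment(self, text: str) -> Optional[str]:
        """Called for every fragment, regardless of pauses in the audio."""
        self.buffer.append(text)
        reply = self.should_respond(self.buffer)
        if reply is not None:
            self.buffer.clear()  # start collecting afresh after speaking
        return reply


def todo_policy(fragments: List[str]) -> Optional[str]:
    """Stay silent while the user lists items; reply once they say they are done."""
    if "that's all" in fragments[-1].lower():
        items = fragments[:-1]
        return f"Got it, I noted {len(items)} items."
    return None


agent = ListeningAgent(should_respond=todo_policy)
replies = [
    agent.on_fragment(fragment)
    for fragment in ["Buy milk", "Email the landlord", "Book flights", "OK, that's all"]
]
print(replies)
```

The point of the sketch is only the control flow: silence between fragments never triggers a reply by itself, and the decision to speak lives in the policy, which is where a natively streaming model would sit.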

Rajiv Ayyangar

@kwindla +1 would love to hear Mati's thoughts here! I'm imagining the next generation of conversational AI interfaces that have a much more fluid and flexible feel of conversation. And maybe even handle conversations between multiple humans and an AI better (and multiple AIs?).

steve beyatte

Using ElevenLabs feels like magic. It's phenomenal and deserves all of the attention it gets. It seems like it would take momentous engineering not only for the actual tech, but also to build for this scale so quickly. What was the hardest technical challenge in building out the platform? Did you ever think it wasn't going to be possible? Any good stories around the technical feats that were required?

Yan Bingbing

Is it possible to record my current voice and then generate a voice from childhood or old age?

Or even more fun, to combine the voices of a male and a female to predict the voices of their offspring?

Nika

@onbing This is an interesting idea. :)

Artin Bogdanov

Hi Mati,


I'm currently building an app called SUN, which allows curious minds to create deep-dive audio-courses on any topic with an integrated Q&A capability. I tested ElevenLabs' product, uploaded and trained voices, and I have to acknowledge, you've created an incredible product. Training a new voice requires less than a minute of audio, and the quality is arguably the best on the market.


However, after calculating unit economics, it became clear that using ElevenLabs wouldn't be a viable option for my startup. This is unfortunate because the experience I envision is voice-rich, incorporating nuanced accents, first-person perspectives, third-person perspectives, and more.


I'm curious about ElevenLabs' strategy regarding product costs. Are there any partnership programs on the roadmap for startups like mine? I’d love for SUN to be a long-term partner of ElevenLabs, but the current cost structure makes it unfeasible.

P.S. I hate to be a guy who complains about prices, but that's the challenge my startup faces in using ElevenLabs today.

Javier Fandos

Hey!
I've just built something to generate audio stories for kids, letting parents clone their voices!

Now I want to build something to "talk" with people who have died. I mean, you clone the voice of someone who has died.
Is this even legal? The person who died can't give permission to clone their voice...
What are your thoughts on this? Could I use ElevenLabs for it?