Daily.co
Real-time voice, video, and AI for developers
Rajiv Ayyangar
RTVI-AI Open Standard — Make an AI voice chat app in 21 lines of JavaScript
RTVI-AI is a new open standard for Real-time Voice and Video Inference. Open source reference JavaScript and React SDKs are available today, with iOS, Android, and other platform SDKs coming soon.
Replies
Rajiv Ayyangar
I'm a fan of anything that enables builders to build better, faster, and more expressively. This seems promising in that regard. I know @kwindla and the Daily.co team have many decades of combined experience with the WebRTC community and other open source projects. It's exciting to see them setting a new standard for real-time AI inference.

From @kwindla:
---
Today we’re announcing an open standard for Real-time Voice and Video Inference: RTVI-AI.

The RTVI abstractions and data structures define how client applications communicate with inference services. These are the “real-time APIs” for use cases like:
- Voice chat with LLMs
- Enterprise voice workflows such as healthcare patient intake
- Video avatars and immersive experiences
- Voice-driven user interfaces
- Voice conversational apps for education, customer support, and games
- High-framerate image generation and streaming generative video

We’re shipping open source reference JavaScript and React SDKs today, with iOS, Android, and other platform SDKs coming soon.

This first release has been several months in the making, and incorporates work and insights from Groq, Deepgram, fal, Cartesia, Cerebrium, Vapi, and Daily.

With RTVI, a “hello world” voice-to-voice AI chat app in JavaScript is 21 lines of code.

If you want to build real-time AI applications, implement infrastructure for real-time inference, or implement your own SDKs that leverage the RTVI standard, you are more than welcome to join this project. We welcome all contributions and ideas!
Kwindla Kramer
Thanks, @rajiv_ayyangar! Really fun to see this on Product Hunt. We've been building a lot of real-time voice and video AI apps, and there's so much potential to do useful, interesting new things.

There's a live demo here: https://demo.rtvi.ai/
And lots of good discussion on the Discord here: https://discord.com/invite/pipecat

Our goal with RTVI is to make it easy to build AI voice-to-voice and real-time video applications.
* Application developers should be able to write code that can use any inference service.
* Inference services should be able to leverage open source for the complicated, client-side developer tooling needed for real-time multimedia.
* Any developer should be able to trivially stand up real-time AI infrastructure for small-scale use, testing, or prototyping.
Andriy Semenets
Congratulations on the launch! How does the video part work here? Does it use the same WebRTC standard? Thanks!
varun
@semanser Yes, in the example above, it is sending both audio and video, but only receiving audio. It is possible to manipulate the video within Pipecat (the server side) and send it back. We will have demo code for this shortly on GitHub! And yes, Pipecat supports and defaults to Daily's WebRTC transport, so you get all the benefits of WebRTC's low latency and Daily's Global Mesh-SFU infrastructure.
Pavel Bocharov
Wow this is so cool! Congrats on the launch! I already see a couple of ideas to implement with this, upvoted!
Hassaan Raza
Another amazing launch from the team at Daily. Appreciate all the great work y'all do @kwindla !
blank
Wow, this is super exciting stuff! 🚀 Kudos to @kwindla and the Daily.co team for pushing the boundaries of real-time AI inference! I love how RTVI-AI opens up so many possibilities for builders to create innovative solutions. The use cases listed are mind-blowing, especially voice chat with LLMs and immersive video experiences! Can't wait to see the iOS and Android SDKs roll out too. It’s great to know that it’s open-source, making it so accessible for developers! Definitely looking forward to trying out that 21-line "hello world" app. This feels like just the beginning. Let’s build some awesome stuff together!
Kyrylo Silin
Hey Rajiv, How does RTVI-AI handle scalability for large-scale applications? Are there any performance benchmarks available? Congrats on the launch!
varun
@kyrylosilin The goal of RTVI is to let you write client-side code without worrying about the underlying infrastructure. In theory, the infrastructure should be swappable. The current RTVI implementation uses Pipecat bots, which use WebRTC and the @dailyco infrastructure. The daily.co infrastructure can manage tens of millions of simultaneous calls, and we have a global footprint: 15 geographic locations around the world, including us-east, us-west, canada, london, frankfurt, middle-east, mumbai, singapore, seoul, sydney, capetown, and saopaulo. That being said, since RTVI is open source, it's possible to add other types of transports or services.
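To make the "swappable infrastructure" idea above concrete, here is a rough TypeScript sketch. It does not use the RTVI SDK: the `Transport` interface, both fake transport classes, and `VoiceClientSketch` are hypothetical names invented for illustration. The point is only that client code written against an interface does not change when the transport underneath does.

```typescript
// Hypothetical interface, NOT the RTVI API: it only illustrates the idea
// that client code can stay the same while the transport is swapped.
interface Transport {
  readonly name: string;
  // Sends a message and returns the (fake) reply from the inference service.
  send(message: string): string;
}

// Stand-in for a WebRTC-based transport (assumption: no real networking here).
class FakeWebRtcTransport implements Transport {
  readonly name = "fake-webrtc";
  send(message: string): string {
    return `[${this.name}] echo: ${message}`;
  }
}

// Stand-in for some other transport a contributor might add.
class FakeWebSocketTransport implements Transport {
  readonly name = "fake-websocket";
  send(message: string): string {
    return `[${this.name}] echo: ${message}`;
  }
}

// Client code depends only on the Transport interface.
class VoiceClientSketch {
  private transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  say(text: string): string {
    return this.transport.send(text);
  }
}

// The same client code runs against either transport.
const overWebRtc = new VoiceClientSketch(new FakeWebRtcTransport());
const overWebSocket = new VoiceClientSketch(new FakeWebSocketTransport());
console.log(overWebRtc.say("hello"));    // prints "[fake-webrtc] echo: hello"
console.log(overWebSocket.say("hello")); // prints "[fake-websocket] echo: hello"
```

In a real deployment the transport would carry audio and video frames rather than strings; the sketch keeps everything synchronous and in memory so it can run anywhere.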
jonathan ander
This sounds like a valuable tool for developers working with real-time voice and video. The open source approach and upcoming platform SDKs are impressive.
Toshit Garg
Congratulations on the launch on PH...
Tracey glen
It's great to see open source solutions like this in the AI and audio space. Looking forward to the iOS and Android SDKs as well.
Rudi Skogman
Looks super powerful! Good job!
Jayesh Gohel
What is RTVI-AI? Is it a new way to use AI for real-time voice and video? Can developers use it easily with JavaScript and React now? Will there be tools for other platforms like phones soon?
varun
@jpgohil93 Yes, we launched with the React and Web/JS SDKs today. We are working on iOS and Android SDKs, which will be announced shortly. The way to think about this is that RTVI is the client-side implementation, which is open source and can essentially connect to any server-side RTVI implementation. Today, the server-side implementation is pipecat.ai, which coordinates with the configured speech-to-text, LLM, and text-to-speech services.
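As a rough picture of what "coordinates with the configured speech-to-text, LLM, and text-to-speech" could mean on the server side, here is a minimal TypeScript sketch of one conversational turn. This is not Pipecat or RTVI code: `PipelineConfig`, `runTurn`, and the three fake services are hypothetical, and audio is represented as plain strings so the example stays self-contained.

```typescript
// Hypothetical types, NOT Pipecat or RTVI APIs. Audio is modeled as a string
// with an "audio:" prefix purely so the sketch can run without real services.
type SpeechToText = (audio: string) => string;
type LanguageModel = (prompt: string) => string;
type TextToSpeech = (text: string) => string;

// The three configured services for one bot.
interface PipelineConfig {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

// One turn: user audio -> transcript -> model reply -> reply audio.
function runTurn(config: PipelineConfig, audioIn: string): string {
  const transcript = config.stt(audioIn);
  const reply = config.llm(transcript);
  return config.tts(reply);
}

// Fake services standing in for real providers (assumptions for illustration).
const fakeStt: SpeechToText = (audio) => audio.replace(/^audio:/, "");
const fakeLlm: LanguageModel = (prompt) => `You said: ${prompt}`;
const fakeTts: TextToSpeech = (text) => `audio:${text}`;

const config: PipelineConfig = { stt: fakeStt, llm: fakeLlm, tts: fakeTts };
console.log(runTurn(config, "audio:hello")); // prints "audio:You said: hello"
```

Swapping a provider would then mean replacing one field in `config`, which is the property the comment above describes for the configured services.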