Banana.dev
Serverless GPUs for Machine Learning Inference
Blake Peeling
Banana provides inference hosting for ML models in three easy steps and a single line of code.

Stop paying for idle GPU time and deploy models to production instantly with our serverless GPU infrastructure.

Use Banana for scale. 🍌
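To make the "single line of code" claim concrete, here is a minimal sketch of what an inference call could look like. The package name, function signature, and key names are illustrative assumptions, not API details confirmed on this page:

```python
# Hypothetical sketch only: the package name, function, and argument names
# below are assumptions for illustration, not confirmed API details.
import banana_dev as banana

api_key = "YOUR_API_KEY"      # account credential (placeholder)
model_key = "YOUR_MODEL_KEY"  # ID of your deployed model (placeholder)
model_inputs = {"prompt": "write a haiku about bananas"}

# The single line: the platform provisions a GPU, runs inference,
# and scales back down so you don't pay for idle time.
output = banana.run(api_key, model_key, model_inputs)
print(output)
```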
Replies
Ryan Hoover
Did you invest, @turnernovak?
Erik Dunteman
@turnernovak @rrhoover Not sure he could handle as much potassium as us
Ashley Porciuncula
Genius! Congratulations on the launch!
elena silenok
Can’t wait to explore!
Blake Peeling
@silenok thanks Elena!
David Banys
On-demand + autoscaling + minimal cold start + GPUs?? The dream!
Erik Dunteman
@david_banys all those things :) thanks David
Derek Pankaew
Wish this existed when we were doing GPU-heavy stuff!
Erik Dunteman
@derekpankaew yes, I'm sure this would have been solid for the YOLO models you were running at Next Fitness! With models that small, they would have scaled up in a matter of a couple seconds too :)
Larry Arlene
Super useful tool! Thank you for sharing :)
Greg Priday
This looks seriously impressive. I'll be trying it out soon. Seems like this will be a huge step up over Google Cloud Run in terms of speed. I see on your roadmap that you're planning on moving to beefier GPUs in the future. Which GPUs are you running now? Also, from a technical perspective, I assume this works by moving models from CPU to GPU at inference time? Trying to wrap my head around how you're getting such fast cold starts.
Kyle Morris
@gregpriday Thanks for the support! The roadmap is now! We used to only do T4 GPUs, but now we also support A100 GPUs, which are yielding faster cold boots + inference + download speeds. To understand how Banana works, it may be easier to think of us as a compiler company: when you send us a model, we do stuff under the hood to make it run faster/cheaper. CPU/GPU memory hacks are definitely involved (how we load memory, where, when). A key point is that none of our optimizations affect model outputs. This means we don't do weight quantization or dynamic layer/node pruning, which yield way smaller/faster models but do affect output.
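For readers wondering about the CPU-to-GPU idea Greg raises, here is a minimal PyTorch sketch of one general technique: staging weights in CPU RAM and paying the host-to-device copy on the first request. It illustrates the concept only, not Banana's actual internals, and like the optimizations described above it leaves model outputs unchanged:

```python
# Minimal sketch of "stage on CPU, move to GPU at inference time".
# Illustrative technique only; not Banana's actual implementation.
import torch
import torch.nn as nn

class LazyGpuModel:
    """Keep weights warm in CPU RAM; pay the GPU copy on the first call."""

    def __init__(self, model: nn.Module):
        self.model = model.eval()  # inference mode; weights start on CPU
        self.on_gpu = False

    @torch.no_grad()
    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        if not self.on_gpu and torch.cuda.is_available():
            # First request pays the host-to-device transfer; later ones don't.
            self.model.to("cuda")
            self.on_gpu = True
        device = "cuda" if self.on_gpu else "cpu"
        return self.model(x.to(device))

# Wrap any module; the GPU copy happens lazily on the first inference.
model = LazyGpuModel(nn.Linear(512, 512))
output = model(torch.randn(1, 512))
```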
I have ideas that involve ML, and seeing products launched that make it easier for our dev community to deploy and run models encourages me to take my ideas seriously. Thank you! I shall give it a try for sure.
Максим Шевяков
That's useful!
Eric Jung
Are there any metrics around the cold boot? Does it depend on the model size, etc? And what does the quota look like on the max number of concurrency, model size, etc?
Erik Dunteman
@eric_j1 Yes! Cold boots vary based on model size, but a GPT-J model (which usually takes 20 minutes to load to GPU) comes live on our platform in 10 seconds. Most customers see 1-5s cold boots right now. We just upped max concurrency! We're provisioned for the average model (8 GB of GPU RAM) to spike to 200x concurrency, but we do have a soft cap of 10 to prevent customers from accidentally overscaling. We can adjust that for anyone who needs more :)
Joe Speiser
This rocks, sharing with all my dev friends now!
Sahil Chaudhary
@joe_speiser1 Thanks for the share!
Nader Khalil
This is awesome!!! Serverless GPUs make so much sense
Blake Peeling
@naderlikeladder heck yea! thanks Nader :)
Abhishek Bhargava
Love banana!! super excited to be a user soon :)
Erik Dunteman
@abhishek_bhargava do it, we've got the fastest servers in the whole wild west. any specific models you're looking at?
Abhishek Bhargava
@erikdoingthings mostly large language models! + inference using k-means clustering / indexing & potentially running LSTMs on the embeddings :)
Erik Dunteman
@abhishek_bhargava We like them large! When you do jump into Banana, if you implement in PyTorch you'll get some killer speedups on cold boot.
Blake Peeling
@abhishek_bhargava we can't wait to have you on Banana! thank you for the support :)
Kai McKinney
Incredible work, team! This just makes sense.
Blake Peeling
@kai_mckinney we appreciate that Kai :)
Roman Puliyan
Interesting
Morgan Gallant
Been using Banana in production for a good bit now, nothing but great things to share! A few notes:
- Product is insanely good. Specifically, we use it for indexing jobs requiring a good bit of GPU compute. These jobs are huge, sometimes involving up to 1M inferences of a large NL model. Banana is perfect for this use case, as we can burst up to 10+ GPUs, only pay for the compute we use, and quickly scale back down to near zero.
- Team is very strong, super responsive to questions, and they're experts at deploying & scaling ML models. We often get advice and recommendations from their team on how to best do something, and it's been really appreciated!
- Lastly, velocity / speed of iteration has been ridiculous. They're moving really quick, have an ambitious roadmap, and ship new features and improvements daily. It's been really cool to watch.
Would highly recommend anyone check them out!
Kyle Morris
@gallantlabs Thank you for your kind words & support, Morgan! Fantastic having you as a customer and inspiring seeing your progress as a team! Always a message away :)
Erik Dunteman
@gallantlabs Morgan is an awesome customer! Thanks for the love
Sarim Malik
Good luck on the launch, team.
Blake Peeling
@sarimmalik thank you Sarim!
Pranav Teegavarapu
this is awesome!!
Erik Dunteman
@pranavnt means a lot from the Kobra founder! Give it a whirl
Blake Peeling
@pranavnt thanks! we appreciate the <3
Oren Leung
GPU cold starts are a tough problem to solve! Glad Banana is taking on this challenge! Definitely a super cool product, especially for hobby projects that require GPUs, where you don't want to pay an arm and a leg to host a demo
Natalie Sydorenko
ML is a new reality. Guys, your product opens the door to the future. Well done!
Sahil Chaudhary
@natalie_sydorenko1 Thanks! We hope this can help democratise ML by making hosting affordable.