We live in a world where everything is being automated. But catching and understanding software problems still takes a lot of manual work. There's a better way! Let machine learning catch software problems and tell you what happened.
Gavin and the Zebrium team, congrats on the product launch!
Can you help us understand how Zebrium is different from the likes of DataDog or AppDynamics? I understand from the previous comments (from Ajay) that Zebrium eliminates the need for dedicated resources to analyze data and build rules - you automate it based on ML. Can you share examples to illustrate the magnitude of this automation?
Hi @sakthi_chandra - thanks for the comments. It's a very exciting time for us! You're correct - the biggest difference is that we are ML driven and take away the human work involved with monitoring. A couple of examples: Soon after installing, one of our customers saw a Zebrium alert that pointed at a bug in their code. To cut a long story short, we uncovered an issue related to certificate handling that would have caused a major outage if it had gone unnoticed. Another example was a customer where we caught a malicious login attempt. In other cases we have detected infrastructure issues, database issues, networking issues, etc. The list is endless. The key thing is that no-one built rules to catch any of these problems - it was all done 100% by our machine learning with zero human supervision.
@sakthi_chandra Having used traditional cloud logging tools in the past, this is a huge breath of fresh air. No more having to manually create alert rules, graphs, and other configurations that require frequent updating. A tool that will automatically tell me something is wrong based on what it learns from my logs is tremendously helpful. On top of all that - providing a root cause helps us avoid searching in the dark.
Hey, Product Hunters!
I'm excited to tell you about Zebrium, an AUTONOMOUS monitoring platform. Zebrium is built to identify "Unknown Unknowns" using machine learning - problems you don't have alert rules built for - and show you root-cause, even the FIRST time you hit them.
Are you tired of "Unknown Unknowns" biting you in production? Tired of the fire drills, the slogging through logs and metrics? Tired of having to figure out what just broke, with hundreds of customers waiting on you? We need a new weapon to defeat complexity and slash resolution time. I believe autonomous monitoring is that new weapon.
*** How To Get Started ***
1.) Go to our website and follow the instructions. We'll email you a URL with login details.
2.) If you're using k8s, it's a one-line chart install. Once you log in and set your password, you can cut and paste the install command which includes your API key.
That's it! You're signed up and set up for our free service, with autonomous monitoring on your side.
Here's what you WON'T have to do:
Training, connectors, parsers, configuration, waiting, alert rules, etc.
Works immediately on any app or stack!
*** How This All Began ***
I founded Zebrium because I was tired of writing data parsers and alert rules and then maintaining them, just to keep an eye on deployed software. The most frustrating part was the long tail of always-new failure modes. There was always the next fire drill.
Zebrium means "elemental pattern", and the idea was straightforward: use ML to structure telemetry from deployed software, to extract regular and anomalous features from this data, and to cross-correlate these features to detect incidents with root-cause. We suspected there were fundamental ways software behaves when it breaks - elemental patterns - and that we could exploit these patterns for incident detection.
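To make the idea above concrete, here is a toy sketch of the three steps: structure the logs, find anomalous events, and cross-correlate them with a metric. It is only an illustration under simplifying assumptions, not Zebrium's actual implementation. The function names, the sample log lines, and the "3x the mean" threshold are all made up for this example.

```python
import re
from collections import Counter

def template(line: str) -> str:
    """Structure a log line: collapse variable tokens so similar lines share one event type."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line)

def rare_events(baseline, window, max_seen=0):
    """Return lines in `window` whose event type appeared at most `max_seen` times in `baseline`."""
    seen = Counter(template(l) for l in baseline)
    return [l for l in window if seen[template(l)] <= max_seen]

def metric_spike(history, value, factor=3.0):
    """Flag a metric value that exceeds `factor` times the historical mean (hypothetical rule)."""
    mean = sum(history) / len(history)
    return value > factor * mean

def detect_incident(baseline, window, metric_history, metric_value):
    """Cross-correlate: report an incident only when rare log events coincide with a metric spike."""
    rare = rare_events(baseline, window)
    if rare and metric_spike(metric_history, metric_value):
        return {"incident": True, "candidate_root_cause": rare[0]}
    return {"incident": False, "candidate_root_cause": None}

# Invented sample data: normal traffic, then a window containing a never-seen event.
baseline = [
    "GET /api/users 200 in 12ms",
    "GET /api/users 200 in 15ms",
    "worker 4 heartbeat ok",
]
window = [
    "GET /api/users 200 in 14ms",
    "TLS handshake failed: certificate expired",
    "worker 7 heartbeat ok",
]
result = detect_incident(baseline, window, metric_history=[1, 2, 1, 2], metric_value=40)
print(result)
```

Note that nothing here is a hand-written alert rule for certificates: the TLS line stands out only because its event type never appeared in the baseline, and it is reported only because an error-rate-style metric spiked in the same window.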
Experience with hundreds of real-world incidents and dozens of applications has proven that we were right! It turns out that, most of the time, we can detect important incidents automatically. We can usually surface root-cause too, if the logs and metrics reflect it. Zebrium works so well that I would want to use it, if I weren't me; so, I imagine you'll want to use it, too. :)
I've been on the "short end of the stick" for 20+ years: building rules, maintaining them, and managing dozens of engineers building and maintaining monitoring solutions and pipelines. More than once I've been heads-down at Zebrium when I hear the familiar Slack ding, only to see that our own software has just alerted us to the root cause of a problem in our own software!!! It's frikin brilliant!
I'll tell you, it makes me giddy every time I use the UI or see one of those Slack alerts in action.
Great approach @zebriumlearns - I love the fresh idea. I'm a big fan and multi-year user of your previous genius idea leveraged at Nimble Storage. I'm eager to see how I can leverage Zebrium, and want to see about giving it a go.
Love the super-simplicity of installing, configuring, and maintaining Zebrium's autonomous monitoring platform. Similar to other technologies (e.g., K8s & Terraform), it's a declarative approach that tells me WHAT I want to know about my staging and prod environments without having to specify the HOW. Frees me up to do the more important stuff. Kudos to the Z-team.
Congratulations on the launch @zebriumlearns ! Very innovative and impactful product. With your machine learning approach, I'm pretty sure the time to resolution will drop tenfold while reducing the investment required to set up those tedious alerts. It's a rare win-win for any business looking to keep a check on critical services and deploys! Looking forward to success stories from Zebrium!
@zebriumlearns @sandeepk Thanks so much for the encouragement and your comment, Sandeep. We already have a lot of success stories and we're looking forward to a whole lot more!
Having been in the monitoring space for 6 years and also founded a company in this space, I don't believe the future of monitoring is staring at walls of dashboards and continuously configuring alerts. This space is ripe for disruption by Machine Learning, and Zebrium has the most advanced solution in this space by far. Unlike other "AIOps" tools, which basically act as band-aids on top of your existing monitoring solutions to reduce your alert noise, Zebrium ingests the raw metrics and logs, so it has complete visibility into incidents as they happen, with root cause - very powerful!
Larry, you are a proven wizard in predictive and autonomous monitoring! Zebrium is a guaranteed success and you will make such a difference in the world!
With the explosive growth of logs coming from so many distributed services, it is indeed very hard for humans to go through them, find all the issues, create alerts for each one manually, and maintain those regexes across software changes.
If your software detects issues automatically, that is very cool. Finding problems with logs and correlating them with metrics anomalies is great too.
Gavin, Ajay, Rod and team - congrats on the product launch! Very exciting, and extends the value of ML that was established with InfoSight/Nimble Storage. It will be great to see where this can go as a platform. Kudos!
@matt_miller6 Thanks for the comment. Things are going really well here. We have a growing installed base, and more importantly, they're getting huge value from our platform.
This. This is what has been needed in the Kubernetes ecosystem for a while now. A lot of the issues we deal with come from the dynamic way in which Kubernetes orchestrates applications.
I like this term "unknown unknowns". There is this surprise that I'm sure most folks in the Kubernetes space have encountered, where we couldn't have possibly anticipated the reason some workloads fail. I'm excited to have found a product that addresses this issue.
I'm going to try this out!
Hi @sidhartha_mani - thanks for the comment. Yep - K8s is the perfect ecosystem for us. Manual hunting through logs and dashboards is really hard when you have a distributed app with hundreds or thousands of microservices. But worse is the number of possible failure modes in these types of architectures - thus the importance of being able to detect "unknown unknowns" :-)
Can't wait to get your feedback once you have a chance to try it.
ML and AI technologies are the future. More industries and functions are using these technologies to keep performance up. Not only do advanced technologies solve problems faster, but they also help you work smarter. DevOps engineers should try the Zebrium solution to experience an amazing new way to monitor platforms. Autonomous monitoring first or you will be last...
@usedigital Hi Franck - I love what you said at the end. We'll have to start using that as a tag line :-) But seriously we couldn't agree more - we totally believe that the only way to catch and help solve problems in the age of cloud native (lots of complexity) is to leverage machine learning. We'd love you to try it and give us feedback.
Rod, Gavin, Larry - Congratulations on the launch. What a great disrupter of all the manual processes we have all created over the years to deal with something that's been biting us for years.
@caeli_collins Thanks so much, Caeli. Thanks also for all your feedback and advice when our product was still in the very early stages. We're super-happy with where the product ended up.