Deepchecks Monitoring
p/deepchecks-monitoring
Open Source Monitoring for AI & ML
Kevin William David
Deepchecks LLM Evaluation β€” Validate, monitor, and safeguard LLM-based apps
Continuously validate LLM-based applications, catching hallucinations, tracking performance metrics, and flagging potential pitfalls throughout the entire lifecycle, from pre-deployment and internal experimentation to production. 🚀
Replies
Shir Chorev
Thanks @kevin for hunting our LLM Evaluation solution 😊 πŸ‘‹ Hey, ProductHunt community I am Shir, co-founder and CTO of Deepchecks. At Deepchecks, we’ve built a pretty special solution for LLM Evaluation and are thrilled to launch it today on ProductHunt! When we launched our open-source testing package last year, we quickly received an overwhelming response with over 3K stars 🌟 and more than 900K downloads! After the launch of our NLP package in June, we noticed that an incredible amount of the feedback calls we were having about the NLP package were asking for help with evaluating LLM-based apps. 🀯 After creating an initial POC and getting feedback from various companies, we gained the confidence we needed to dive deeply into the LLM Evaluation space. And yes, turns out it’s a pretty big deal. πŸš€ As we began working on the LLM Evaluation module, we’ve arrived at some important learnings that teams are struggling to figure out answers to these questions while deploying their LLM apps: - Is it good? πŸ‘ (accuracy, relevance, usefulness, grounded in context, etc.) - Is it not problematic? πŸ‘Ž (bias, toxicity, PII leakage, straying from company policy, etc.) - Evaluating and comparing versions (that differ in their prompts, basemodels, or any other change in the pipeline) - Efficiently building a process for automatically estimating the quality of the LLM interactions and annotating them - Deployment lifecycle management from experimentations/development, staging/beta testing, to production. Deepchecks LLM Evaluation solution helps with- βœ… Simply and clearly assess "How good is your LLM application?" πŸ”€ Track and compare different combinations of prompts, models, and code. πŸ” Gain direct visibility into the functioning of your LLM-based application. ⚠️ Reduce the risk during the deployment of LLM-based applications. πŸ› Simplify compliance with AI-related policies and regulations. 
We're also hosting a launch event today at 8:30 AM PST. Feel free to sign up to interact with the Deepchecks team and see a live demo: https://www.linkedin.com/events/... Apply for Deepchecks LLM Evaluation access: https://deepchecks.com/solutions... 😊 Would appreciate any questions, and hope to see you there!
philip tannor
@hay_day3 thanks my friend!
Divyansh ChaurasiaπŸ‘¨πŸ»β€πŸ’»
Excited for the launch! πŸŽ‰
Shir Chorev
@asdivyansh Such a pleasure to have you with us on this journey
philip tannor
@asdivyansh yup it’s a big deal ❀️
Akanksha Bhasin
Congratulations Deepchecks team on the launch! πŸš€ It is truly an impressive solution in the world of LLMs!
Shir Chorev
@akankshabhasin thanks for the kind words and support
philip tannor
@akankshabhasin thank you so much!
Alex Gavril
An innovative approach to evaluating language models. The detailed insights it provides are invaluable for improving model performance. Congrats on the launch! πŸ‘
philip tannor
@alex_gavril1 thanks, you rock!
Sergei Sherman
Great stuff! We are using Deepchecks for our internal LLM evaluation; it takes just a couple of minutes to get big insights!
philip tannor
@sergei2020 thanks a million my friend!
Sinan
I've been experimenting with LLM evaluation metrics on my own for a while now. This is a pretty good solution, will definitely try it out. How do you imagine the future of CI/CD for LLM applications?
philip tannor
@sakameister great question! This has been an open question for testing classic ML as well. I can imagine a process kind of like GitHub Actions that runs suites of tests, where some tests may need to verify that certain manual annotations happened.
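A minimal sketch of what such a CI-style suite could look like, purely as an illustration of the idea above. Everything here (the fake model call, the checks, the thresholds) is a hypothetical assumption, not a Deepchecks API:

```python
# Hypothetical CI test suite for an LLM app: run a set of checks against a
# model output and fail the pipeline if any check fails.

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch is runnable."""
    return "Paris is the capital of France."

def check_contains(output: str, required: str) -> bool:
    """Pass if the expected substring appears in the output."""
    return required.lower() in output.lower()

def check_max_length(output: str, limit: int) -> bool:
    """Pass if the output is not longer than the limit."""
    return len(output) <= limit

def run_suite(prompt: str) -> dict:
    """Run every check against one model output and collect the results."""
    output = fake_llm(prompt)
    return {
        "contains_answer": check_contains(output, "Paris"),
        "under_length_limit": check_max_length(output, 200),
    }

results = run_suite("What is the capital of France?")
# In CI, a failed assertion here would block the merge, just like a
# failing unit test in a GitHub Actions workflow.
assert all(results.values()), f"LLM test suite failed: {results}"
```

In a real pipeline, some checks would be automatic properties and others would assert that a human annotation exists for a sampled set of interactions.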
Khalid Idbouhou
can't wait to try this
Shir Chorev
@khalid_idbouhou Can't wait to hear your experience using it!
Vivek
Congratulations Div and team.
Congrats team Deepchecks LLM Evaluation on the launch!
Shir Chorev
@manmohit Appreciate it!
philip tannor
@manmohit thanks my friend!
Matan Mishan
Nice! Does it support RAG, and if so, how?
Shir Chorev
@matan_mishan Thanks for your question. Indeed, this is one of our most popular use cases :-) Question answering, customer support, etc. We enable logging the various steps in the interaction (e.g. the input, the information retrieval step, the output, etc.), and common issues we find include the output not being based on the retrieved information (an indication of hallucination), or the retrieved info not being relevant to the question asked.
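To illustrate the idea of logging RAG steps and flagging an ungrounded answer, here is a minimal sketch. The function names and the token-overlap heuristic are assumptions made for this example, not the actual Deepchecks implementation:

```python
# Log each step of a RAG interaction (input, retrieved context, output),
# then score how well the output overlaps with the retrieved context.
# A low overlap can hint at a hallucination.

def log_interaction(user_input: str, retrieved: list, output: str) -> dict:
    """Record each pipeline step so it can be inspected later."""
    return {"input": user_input, "retrieved": retrieved, "output": output}

def grounding_score(output: str, retrieved: list) -> float:
    """Fraction of output tokens that appear in the retrieved context."""
    context_tokens = set(" ".join(retrieved).lower().split())
    out_tokens = output.lower().split()
    if not out_tokens:
        return 0.0
    hits = sum(1 for tok in out_tokens if tok in context_tokens)
    return hits / len(out_tokens)

interaction = log_interaction(
    user_input="When was the company founded?",
    retrieved=["The company was founded in 2019 in Tel Aviv."],
    output="The company was founded in 2019.",
)
score = grounding_score(interaction["output"], interaction["retrieved"])
# Flag the interaction for review when the answer overlaps poorly
# with the retrieved context.
flagged = score < 0.5
```

A production system would use far stronger grounding signals (e.g. NLI-style entailment between the output and the retrieved passages), but the logging shape, per-step records that can be scored afterwards, is the key idea.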
Mahmudul Hasan
Congratulations πŸŽ‰πŸŽ‰πŸŽ‰
Ariel Biller
Another quality product from Deepchecks. You've been kicking ass this year!
philip tannor
@lstmeow thank you so much my friend!
Nilay Jayswal
Congrats on the launch team!
Shir Chorev
@nilay1101 thanks my friend!
Shai Yanovski
Congratulations on launching the Deepchecks LLM assessment! This is an incredible achievement and a testament to your team's dedication to the field. I can see how this will be a game-changer for many projects. Keep up the great work!
Shir Chorev
@shai_yanovski Thanks so much. Appreciate your support throughout our journey! And looking forward to our next random meeting on bikes in the park πŸ˜…
Ofer Hermoni
Looking forward to learning more in the webinar today!
Shir Chorev
@ofer_hermoni Hope you did! Feel free to drop us a note for any thoughts or questions
Andrey Cheptsov
Congratulations on the launch! It's an amazing and much-needed product.
Shir Chorev
@andrey_cheptsov happy to hear your thoughts :)
Yael Barsheshet
Congratulations!!
Shir Chorev
@yael_barsheshet1 thanks for your support!
Saroj
@shirch : Congrats on the launch team, the product looks amazing.
On Freund
I've been loving every release from this team. Can't wait to try this one out.
Shir Chorev
@on thanks so much! Looking forward to hearing your thoughts :)
philip tannor
@on can’t wait for the feedback!