Trustworthy Language Model (TLM) - Add trust to any LLM at scale.

TLM solves the biggest problem with productionizing GenAI: reliability and hallucinations. Get more accurate outputs than GPT-4, along with trustworthiness scores, enabling reliable LLM applications like text generation, data enrichment, and RAG at scale.

Replies

Anish Athalye
Hi Product Hunt! We’re super excited to share our new solution for LLM reliability with you.

LLMs show great promise as a key component of new applications like AI-powered customer service chatbots, coding assistants, data transformation, structured data extraction, and more. However, while nearly every enterprise is experimenting with LLMs, only a small fraction have successfully deployed them in production because of a key issue with today’s LLMs: their unreliability and tendency to produce “hallucinations”, or bogus outputs.

These hallucinations are a show-stopper for many applications, and early adopters of LLMs in production have been bitten by them. Air Canada’s rogue AI chatbot promised customers refunds against airline policies, and a court ruled that the airline must honor the promise (https://thehill.com/business/447...). A lawyer used ChatGPT to help prepare for a court case and now has to answer for its bogus citations (https://www.nytimes.com/2023/05/...).

We built the Trustworthy Language Model (TLM) to close this gap, addressing the key challenge for deploying LLMs. TLM builds on top of existing LLMs by improving their accuracy and providing a trustworthiness score for each output, enabling production AI applications. Through extensive benchmarking, we’ve shown that TLM gives higher accuracy than existing LLMs like GPT-4, and that its trustworthiness scores are well-calibrated. Learn more about how and why we built TLM in our blog post: https://cleanlab.ai/blog/trustwo...

We’re excited to see what the community builds with TLM! Happy to answer any questions you have about TLM, LLM reliability, or data curation / data-centric AI more broadly.

—Anish, on behalf of the Cleanlab Team
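To make the production pattern concrete, here is a minimal sketch (not Cleanlab's actual API; the function names and threshold are illustrative assumptions) of how a trustworthiness score enables reliable deployment: only answer automatically when the score clears a threshold, and escalate to a human otherwise.

```python
# Illustrative sketch, NOT Cleanlab's real API: gate an LLM response on
# its trustworthiness score, escalating low-trust answers to a human.

def answer_with_fallback(prompt, tlm_call, threshold=0.8):
    """tlm_call is assumed to return (response, trustworthiness_score)."""
    response, score = tlm_call(prompt)
    if score >= threshold:
        return {"answer": response, "escalated": False, "score": score}
    # Below the trust threshold: withhold the answer and escalate.
    return {"answer": None, "escalated": True, "score": score}

# Hypothetical stub standing in for a TLM backend, for demonstration only.
def fake_tlm(prompt):
    if "refund" in prompt:
        return ("You may be eligible; an agent will confirm.", 0.55)
    return ("Checked bags up to 23 kg are included on this fare.", 0.93)

print(answer_with_fallback("What is the baggage allowance?", fake_tlm))
print(answer_with_fallback("Can I get a refund after my flight?", fake_tlm))
```

The threshold trades off automation rate against risk: a higher threshold escalates more queries but makes the automated answers safer.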
Albert
Congratulations on the launch of Trustworthy Language Model (TLM), Anish, Cris, and Emily! It's exciting to see a solution tackling the reliability challenges of LLMs head-on. The issue of 'hallucinations' in AI outputs is indeed a critical one, and TLM seems like a game-changer in this space. I'm curious to know more about the benchmarking process for TLM. How did you ensure that the trustworthiness scores are well-calibrated, and what measures were taken to achieve higher accuracy compared to existing LLMs like GPT-4? Looking forward to exploring the potential of TLM in enabling more reliable AI applications.
Emily Barry
@mashy Thanks Albert! We ran benchmarks with 5 Q&A datasets across different domains (world knowledge, school exams, math, medical diagnosis, …) and measured the ability of TLM trustworthiness scores to detect bad LLM responses with high precision/recall, as well as the accuracy of LLM vs TLM responses. You can find more details/results in our research blog, especially if you go through the Appendix: https://cleanlab.ai/blog/trustwo...
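As a toy illustration of the evaluation Emily describes (the scores and labels below are made up for demonstration, not data from the Cleanlab benchmark), one can threshold the trustworthiness scores to flag suspect responses and compute precision/recall against ground-truth labels of which responses were bad:

```python
# Toy precision/recall evaluation of trustworthiness scores.
# Responses scoring below the threshold are flagged as likely bad.

def precision_recall(scores, is_bad, threshold):
    flagged = [s < threshold for s in scores]
    tp = sum(f and b for f, b in zip(flagged, is_bad))        # bad, flagged
    fp = sum(f and not b for f, b in zip(flagged, is_bad))    # good, flagged
    fn = sum((not f) and b for f, b in zip(flagged, is_bad))  # bad, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical trustworthiness scores and "bad response" labels.
scores = [0.95, 0.20, 0.88, 0.45, 0.70, 0.15]
is_bad = [False, True, False, False, True, True]

p, r = precision_recall(scores, is_bad, threshold=0.5)
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.67 recall=0.67
```

Sweeping the threshold traces out the full precision/recall trade-off, which is how one checks that higher scores reliably correspond to better responses.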
Avkash Kakdiya
I came across Trustworthy Language Model (TLM) on Product Hunt and wanted to extend my congratulations on its launch.