An open-source implementation of Anthropic's Computer Use to perform basic tasks using AI agents. Currently supports LangChain, Azure OpenAI models, and Gemini models. Contributions and support to improve the functionality are more than welcome.
Hi PHers 👋🏻,
Anthropic has taken the world by storm with their latest Computer Use model. It can perform arbitrary actions on the system using text prompts. This is one big feat in AI Development 🤯
❓ Why Clevrr Computer?
Recent developments in such models got me thinking: could other models do the same? Clevrr Computer is a LangChain ReAct agent that uses multi-modal models like Google's Gemini 1.5 Pro and OpenAI's GPT-4o to perform actions on the computer, such as mouse movements, web search, native app controls, etc. The possibilities are endless with such a technology and this is just the beginning.
💻 What can you do with it?
You can use Clevrr Computer to reply to your Mom on WhatsApp, start a fun AI-to-AI conversation with ChatGPT on the web, or do research and analysis with basic web searches.
⚙️ How does it work?
A multi-modal AI agent runs in the background and continuously captures screenshots to understand what is on the screen. It then directs the main action agent accordingly, and that agent uses Python to perform the actions automagically.
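For the curious, here is a rough sketch of what such a screenshot-observe-act loop can look like in Python. To be clear, this is not the actual Clevrr Computer code: `run_agent`, `Action`, and the four callables are hypothetical names made up for illustration. The idea is that you plug in your own screen capture, vision model call, planner, and input controller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical structure)."""
    kind: str                      # illustrative labels: "click", "type", "done"
    payload: dict = field(default_factory=dict)


def run_agent(
    task: str,
    capture_screen: Callable[[], bytes],
    describe_screen: Callable[[bytes, str], str],
    choose_action: Callable[[str, str, List[Action]], Action],
    perform_action: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: take a screenshot, describe it, pick an action, perform it.

    All four callables are assumptions supplied by the caller:
    - capture_screen: returns the current screen as image bytes
    - describe_screen: asks a multi-modal model what is on the screen
    - choose_action: the main agent decides the next step
    - perform_action: moves the mouse, types, etc.
    Returns the list of actions that were actually performed.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()
        observation = describe_screen(screenshot, task)
        action = choose_action(task, observation, history)
        if action.kind == "done":
            break  # the agent believes the task is finished
        perform_action(action)
        history.append(action)
    return history


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without a model or a real screen.
    scripted = [
        Action("click", {"x": 100, "y": 200}),
        Action("type", {"text": "Hi Mom!"}),
        Action("done"),
    ]
    performed = run_agent(
        task="Reply to Mom",
        capture_screen=lambda: b"fake-image-bytes",
        describe_screen=lambda image, task: "a chat window is open",
        choose_action=lambda task, obs, history: scripted[len(history)],
        perform_action=lambda action: print("performing", action.kind),
    )
    print(len(performed), "actions performed")
```

Passing the pieces in as functions keeps the loop itself independent of any particular model or automation library, and the `max_steps` cap stops the agent from looping forever if it never decides it is done.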
⛔ Does it have limitations?
Of course it does (and it should, honestly). It has limitations in terms of screen access, code complexity, performance on some tasks, etc.
🧪 Where can I try it?
Check out the full steps to download and use the agent on your computer at https://github.com/Clevrr-AI/Cle...
I would love for you to contribute to make this project perfect and fast-forward the era of AI agents! Contributions are more than welcome. 💖
This is awesome! Computer use is definitely the next step with AI. Quick question: can you command the AI with voice prompts, or text prompts only? Congratulations on your launch, team!
Huge congrats to the Clevrr Computer team on today's launch! I love how you've democratized access to AI-powered task automation with open-source goodness. Here's a curious question: What's the most creative/basic task (e.g., email sorting, content gen?) you've seen users automate so far with Clevrr Computer's Langchain, Azure OpenAI, or Gemini models?