Upgrading to GPT-4o: Four founders share their experience

We interviewed four startups who recently upgraded to GPT-4o. Here’s what they had to say.

Sarah Wright
May 27th, 2024
tl;dr
  • GPT-4o is so much more cost-effective that makers can add it to more features while keeping overall costs low enough to keep their business running.
  • GPT-4o's improved context capabilities also greatly improve the user experience of existing features.
  • Makers have explored, or are using, other models such as Claude 3 for different tasks where they need performance over speed.
  • Makers are exploring the multi-modal capabilities of GPT-4o. Initial results are promising.

Features made possible by GPT-4o

Q: Could you provide an example of how GPT-4o has improved a particular aspect of your product?
reap 

reap repurposes long videos (podcasts, webinars, interviews) into engaging social content (Reels, Shorts) with just a click.
🔖 Chapter identification. "GPT-4o could evaluate video transcriptions and captioned scenes with such precision that the chapters it identified seemed as though a human had curated them. The context was clear, the topics were relevant, and the overall quality of the clips soared."
🔁 Captions in multiple languages. "GPT-4o processes multiple languages out of the box with less context, more precision, and faster speed. This capability has allowed us to cater to a large global audience from day one."
CustomerIQ

CustomerIQ synthesizes your sales touchpoints (calls, tickets, and CRM notes), extracts insights, and creates revenue-driving content for marketing, sales, and product.
💁 Fast assistant. The biggest difference is in the speed of the responses, which has made for a really nice user experience. We now use GPT-4o for our AI Assistant. The Assistant uses customer highlights we extract from calls, meetings, tickets, etc. to answer questions and automate data entry for sales, marketing, and product teams.
🎯 Hyper-personalization. As a result of GPT-4o, we're now providing ways for our users to extract more specific data based on their customers' pain points or requests. These fields are used to populate marketing outreach or sales emails, which results in hyper-personalized interactions with customers based on their unique needs. We think the second- and third-order effects of these models are ultimately more personalized customer experiences.
🗂️ Image organization (coming soon): A new frontier we're excited about is how we can use GPT-4o's multi-modal capabilities to let our users organize not just text but also images, as in the case of social media posts. We have beta users testing this with us to identify high-performing content in large online communities. If all goes well we'll have more exciting things to launch here on Product Hunt in the next quarter :)
Nowadays

Nowadays takes the hassle out of organizing corporate events, like contacting venues and dealing with negotiations for you.
🌉 Image Classification: We use GPT-4o for image classification and QA of the venue images that are shown on customer dashboards. Images in the event planning space are SO important. Just like Airbnb… photos can immediately turn someone away from or towards a venue! [GPT-4o] lets us know if there are blurry or unattractive images, which we then remove to make sure that great images show. So far it’s been pretty accurate for us, but obviously it hasn’t been that long since the release. This is so much more cost-effective now with GPT-4o and much quicker than having somebody check the images manually.
🏖️ Personalization: If our customer asks for a venue that’s on the beach, ideally the venue cover photos relate to that and include some image of a beach rather than a random image of a hotel room bed or meeting space. We’ve been testing GPT-4o to filter through images for what’s most important for a user. Maybe the user specifically asked for a venue with a golf course and pool onsite - then these images should be prioritized. We also have an Explore page where we’ve used GPT-4o for some simple classification tasks. We actually built that page entirely the day before the launch so that PH users could have something to explore even if they didn’t have any current events.
Emails: We’ve also started using GPT-4o for writing emails. The generated text sounds super natural, and it’s much harder to tell that it’s AI. However, we only use it for simpler reasoning rather than analytical, complex tasks.
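The image prioritization Nowadays describes above can be pictured as a simple ranking step. This is a minimal sketch, not Nowadays' implementation: it assumes each venue image already carries a set of labels (for example, from an earlier classification pass), and the names `VenueImage` and `prioritize_images` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VenueImage:
    # Hypothetical record: a URL plus labels assumed to come from a prior classification step.
    url: str
    labels: set[str] = field(default_factory=set)


def prioritize_images(images: list[VenueImage], requested: set[str]) -> list[VenueImage]:
    """Order images so that those matching more requested features come first."""
    # sorted() is stable, so images with equal overlap keep their original order.
    return sorted(images, key=lambda img: len(img.labels & requested), reverse=True)


# Made-up example: the user asked for a golf course and a pool.
images = [
    VenueImage("bed.jpg", {"hotel room"}),
    VenueImage("pool.jpg", {"pool"}),
    VenueImage("course.jpg", {"golf course", "pool"}),
]
ranked = prioritize_images(images, {"golf course", "pool"})
print([img.url for img in ranked])  # ['course.jpg', 'pool.jpg', 'bed.jpg']
```

Counting label overlap is only one possible scoring rule; it is used here because it keeps the example short.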
Buffup

Buffup is a browser extension. It lets you access GPT-4o and Claude from your browser, recognizing the intent of your browsing.
Accuracy on intent: We want to give users an easy-to-use AI tool. In particular, you shouldn't need to write a very precise prompt to get the information you want. In the latest update, we made two changes: first, we infer what the user is likely to ask the AI from the content of the webpage as soon as they open it; second, we predict the user's next question from the context as soon as the AI finishes answering. We were lucky that the GPT-4o update arrived when it did, because it brought more intelligent judgment. The success rate of intent prediction is basically over 92% (a prediction counts as successful if one of the three intent options we offer, based on the page content, matches what the user wants to know).
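The success criterion above amounts to a top-3 hit rate. Here is a minimal sketch of that calculation, assuming each evaluation record pairs the three suggested intents with the intent the user actually wanted. The function name and the sample data are hypothetical; they are not Buffup's evaluation code or results.

```python
def intent_success_rate(samples: list[tuple[list[str], str]]) -> float:
    """Fraction of samples where the expected intent is among the suggested options."""
    if not samples:
        return 0.0
    hits = sum(1 for suggested, expected in samples if expected in suggested)
    return hits / len(samples)


# Made-up illustration data: (suggested intents, intent the user wanted).
samples = [
    (["summarize", "translate", "explain"], "summarize"),
    (["compare prices", "find reviews", "summarize"], "find reviews"),
    (["translate", "explain", "summarize"], "extract contacts"),
    (["explain", "summarize", "translate"], "explain"),
]
print(intent_success_rate(samples))  # 3 of 4 samples are hits -> 0.75
```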

Other models the founders considered

Q: What other models or approaches did you consider? Did you face any hurdles or decision-making trees while making the switch?
Reap: During our testing, we found that while LLaMA 3 gave better results than Gemini 1.5 and GPT-4, hosting and maintaining the model ourselves would have significantly slowed down our process. As we were testing LLaMA 3, GPT-4o was released, offering much better and more reliable results. Gemini 1.5, on the other hand, produced generic, repetitive results. Though better than GPT-3.5, Gemini 1.5’s outputs were still not usable for our needs.
CustomerIQ: We use a family of models in CustomerIQ, so we're always experimenting with new models for different use cases behind the product. For example, we use GPT-3.5 for simpler, behind-the-scenes tasks due to its speed and low cost. And we use fine-tuned models for more defined tasks like classification. The way we think about design/implementation with these generations of models is: use the "smartest" model anywhere the user wants to reason over data, and use the most performant one where we have well-defined tasks.
Nowadays: At least for our use cases, GPT has never been super accurate for our data extraction. We use Claude 3 Opus for these tasks instead (reading through venue emails and quotes and parsing relevant data) - we’ve done several tests and it was way better at understanding nuanced venue information, which requires more complex reasoning. We realized that AI is not always accurate, so we put internal checksums and guardrails in place to make sure that our data is accurate and to notify us whenever a human needs to be in the loop to verify the data (e.g. a venue quote with a nightly rate of $29 is really low and most likely not true for these corporate events, so we re-input the data into the model to have it check its own work).
Buffup: We considered Gemini and Claude 3. Neither of them is as good as GPT-4 in terms of capability, but I think they would be more cost-effective, especially Claude 3.
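The guardrail Nowadays describes above (flag an implausible value, have the model re-check it, then escalate to a person) can be sketched as follows. This is a minimal sketch under assumptions: the plausibility bounds, the `extract_rate` callable standing in for the model call, and the function names are all hypothetical, not Nowadays' actual checks.

```python
from typing import Callable, Optional

# Assumed plausibility bounds for a nightly rate in USD; illustrative values only.
MIN_NIGHTLY_RATE = 75.0
MAX_NIGHTLY_RATE = 2000.0


def is_plausible(rate: float) -> bool:
    return MIN_NIGHTLY_RATE <= rate <= MAX_NIGHTLY_RATE


def checked_nightly_rate(
    quote_text: str,
    extract_rate: Callable[[str], float],
    max_attempts: int = 2,
) -> Optional[float]:
    """Return a plausible rate, or None to signal that a human should review the quote."""
    for _ in range(max_attempts):
        rate = extract_rate(quote_text)  # stand-in for a call to the extraction model
        if is_plausible(rate):
            return rate
    return None  # every attempt looked implausible: hand off to a person


# Stub extractor that misreads the quote once, then returns a corrected value.
answers = iter([29.0, 290.0])
result = checked_nightly_rate("Rooms from $290/night", lambda text: next(answers))
print(result)  # 290.0
```

Returning `None` rather than raising keeps the human-review path explicit for the caller; a real system might instead queue the quote for manual verification.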

At a glance

We extracted some of the key themes. Here's where GPT-4o delivers or falls short. 
Reap: GPT-4o made personalization affordable with faster processing and higher-quality outputs... GPT-4o delivers efficient and superior results. It handles more context in-depth, understanding users’ video styles from scene captioning to editing clips, creating a large context window for personalization and closely matching the actual video style.
CustomerIQ: As the costs of these models decrease we're able to apply them to more specific areas of our user's "job to be done."
Nowadays: GPT-4o has been best for us for classification tasks. At least for our use cases, GPT has never been super accurate for our data extraction. We use Claude 3 Opus for [data extraction].
Buffup: GPT-4o is half the price of GPT-4, much smarter, and easy to switch to, so there was no question that we would use GPT-4o. We have used GPT-3.5 and GPT-4 as our base models from the beginning. However, the output of GPT-3.5 is poor for this new feature of intent judgment, and GPT-4 is too costly. Our page intent judgment is free for all users, and we start charging only for answering questions (free users also get plenty of free uses). We have more than 1,000 active users per month, so you can imagine that our cost pressure is still high.

About this article 

Makers let us know which products make their own products great when they launch on Product Hunt. You can see the most-loved products on the Shoutouts Leaderboard.
Look out for more stories and trends as we share more about popular products in our Weekly Digest newsletter.
Are you a maker? Make sure to shout out the products that made yours possible when you launch. We instantly feature shoutouts in our Daily Digest and dive deeper across our weekly newsletters.