How hard is it to find a dataset to train an AI model?
Sagar Patel
9 replies
I am curious about the experience of other founders who have built their own AI. How hard was finding a dataset for your specific use case?
Did you build it from scratch, buy it, or outsource it?
Replies
Jake Harrison@jakeharr
Free Essay Checker AI
You can check Kaggle.
Maruti.io
I was actually thinking about this question yesterday. I think if you want your product to stand out, you have to give it a huge database and then make it understand and process that data. However, it is hard, because only big companies can get large databases, and you are just the owner of a small startup. I mean, cooperation is based on the premise that there are benefits for both parties, and clearly at this stage you can offer basically nothing.
Maruti.io
@sylvia_sheng true! The landscape favors those with big datasets and already-captured market share! I think startups with ex-Meta or ex-big-name-company founders have an edge. They can get funding to close the gap. The way I am closing the gap is by leveraging my network and connections!
@sagarpatel10 Yesss, I think the same way. But sometimes connections aren't everything. A big database is also a digital asset, so I think startups must think altruistically and imagine what they can bring to the table.
What type of AI model is it? Usually if you know the domain closely enough to understand the problem, you'll have access to at least one dataset either through knowledge of public domain problems or private data. If you don't understand the problem domain, you're unlikely to be ready to produce an AI model for it.
Maruti.io
@david_rawlinson I agree! Domain expertise is important for building a functional model! As for me, I am using datasets from Hugging Face and Kaggle. This got me wondering, though: where do other people find datasets, if not on one of these two?
@sagarpatel10 I don't know where to start, to be honest. There are so many places. What type of dataset/problem are you looking for?
Maruti.io
@david_rawlinson An assistant-type dataset. You know, like an AI assistant. There are quite a few on Hugging Face, so I’m good. I was just wondering what other developers use, and whether there are any higher-quality datasets out there that they would recommend!