Web Scraping 🔍🔥

3yr ago

Scraping public data from the web, transforming it, and using it for a new product can become a very successful business. What kind of web scraping projects have you worked on and which tools did you use?

Replies

Bertha Kgokong

KaraboAI

(1) Scraping job listing websites and creating your own product, mailing list, etc. for job hunters. Tools: Python, Selenium, Beautiful Soup.
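
A minimal sketch of the parsing step in a job scraper like the one described above. It uses Python's built-in html.parser as a dependency-free stand-in for Beautiful Soup, and the markup, the class names (job, title, company) and the sample data are all assumptions for illustration, not taken from any real job site.

```python
from html.parser import HTMLParser

# Hypothetical markup: each job is a <div class="job"> containing
# <h2 class="title"> and <span class="company">. Real sites differ.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Data Engineer</h2><span class="company">Acme</span></div>
<div class="job"><h2 class="title">QA Analyst</h2><span class="company">Globex</span></div>
"""


class JobParser(HTMLParser):
    """Collects one dict per job card from the assumed markup above."""

    FIELDS = {"title", "company"}

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "job" in classes:
            self.jobs.append({})  # start a new job record
        for name in self.FIELDS:
            if name in classes and self.jobs:
                self._field = name

    def handle_data(self, data):
        if self._field and data.strip():
            self.jobs[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None  # stop capturing once any element closes


def parse_jobs(html):
    parser = JobParser()
    parser.feed(html)
    return parser.jobs


jobs = parse_jobs(SAMPLE_HTML)
```

Here jobs ends up as a list of dicts, one per listing, which could then feed a mailing list or a search page. Pages that render listings with JavaScript would still need a browser driver such as Selenium to produce the HTML first.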

3yr ago

Nik Hazell

Zappi Ad Predictor

I never finished it - but I started a Strava scraping project. I think there's a ton of suuuuper interesting data in there, although I did it for interest's sake rather than to monetise it. And yep, like @berthakgokong says - Python, Beautiful Soup, etc.

3yr ago

David Gregorian

@berthakgokong @nik_hazell Also pretty cool. I think collecting data for a while and then figuring out what to do with it later is also not a bad idea. The value of data in general will be rising in the future. Have you tried Puppeteer?

3yr ago

Nik Hazell

Zappi Ad Predictor

@berthakgokong @david_gregorian I haven't - would you recommend?

3yr ago

David Gregorian

@berthakgokong @nik_hazell You should check it out. The usability is pretty good, especially if you use it with TypeScript. It is based on Chromium. All in all, there are some quirks when controlling a headless browser engine, but I think that's not the fault of Puppeteer itself.

3yr ago

Amirali Nurmagomedov

AnnounceKit

I remember my rookie days at coding. I was usually doing a lot of parsing, mostly bots fetching videos from various web sources. Everything was done with the preg_match function in PHP 🥲
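
For readers who never saw the regex style of scraping, here is a rough sketch of the idea in Python, with re.findall standing in for PHP's preg_match / preg_match_all. The page fragment, the URLs and the pattern are made up for illustration, and the pattern only handles this one simple shape of markup.

```python
import re

# Hypothetical page fragment; real pages are rarely this tidy.
PAGE = (
    '<video><source src="https://example.com/a.mp4" type="video/mp4"></video>'
    '<video><source src="https://example.com/b.webm"></video>'
    '<img src="https://example.com/poster.png">'
)

# Capture src="..." values that end in a common video extension.
VIDEO_SRC = re.compile(r'src="([^"]+\.(?:mp4|webm))"', re.IGNORECASE)


def find_video_urls(html):
    """Return every captured video URL, in document order."""
    return VIDEO_SRC.findall(html)


urls = find_video_urls(PAGE)
```

This works until the markup changes (single quotes, extra attributes, URLs built by JavaScript), which is a big part of why parser-based tools took over.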

3yr ago

David Gregorian

@amirali_nurmagomedov Damn that's old school :P How long ago was that?

3yr ago

Victor G. Björklund

Job websites, company databases, Google SERPs, booking sites, etc. Mostly using google scrapy.

3yr ago

David Gregorian

@victorbjorklund What do you mean by google scrapy?

3yr ago

Renat Gabitov

Bardeen

Funny thing, I scraped the "Top Most Upvoted Products" using Bardeen.ai (our tool). It worked really nicely. BUT I wanted to figure out which month is the best to launch, and it turns out they haven't updated that page, so now I gotta scrape all the products. https://www.producthunt.com/e/50... Let's see where this takes me.
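
Once the products are scraped, the "best month to launch" question is a small aggregation. A minimal sketch, assuming each record is a (launch date, upvotes) pair; the records below are made up, and averaging upvotes per calendar month is just one possible way to define "best".

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up records standing in for scraped products: (launch date, upvotes).
PRODUCTS = [
    (date(2021, 1, 12), 900),
    (date(2021, 1, 30), 1100),
    (date(2021, 3, 5), 1500),
    (date(2022, 3, 18), 700),
]


def average_upvotes_by_month(products):
    """Group by calendar month (1-12) and average the upvote counts."""
    buckets = defaultdict(list)
    for launched, upvotes in products:
        buckets[launched.month].append(upvotes)
    return {month: mean(votes) for month, votes in sorted(buckets.items())}


by_month = average_upvotes_by_month(PRODUCTS)
best_month = max(by_month, key=by_month.get)
```

With real data it would be worth checking how many products fall into each month before trusting the averages.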

3yr ago

David Gregorian

@renat_gabitov Haha I also thought about it once. Can't you use the GraphQL API of Product Hunt? I think it is not public...

3yr ago

Michael Silber

Product Hunt

@renat_gabitov @david_gregorian You can for sure use our public API for projects https://api.producthunt.com/v2/docs

3yr ago

Balázsi Róbert

I'm building a no-code web scraping tool called https://datagrab.io.

3yr ago

David Gregorian

@balazsi_robert Looks pretty dope! Did you create a chrome add-on?

3yr ago

Jared Wright

Maasive (gets you jobs)

https://Metaheads.xyz - search engine for fb comments. nodejs + selenium :)

3yr ago

David Gregorian

@jawerty Looks awesome! Does it store all the scraped data on a custom db? Or is there something happening on the fly, when doing a search?

3yr ago

Naimur Rahman

I worked with Node.js and Puppeteer to scrape many complex sites for clients, but now I want to make software/tools as a side business. Any ideas for me, guys?

3yr ago

David Gregorian

@naimur103 If you are so experienced with scraping stuff, maybe you could develop a no-code tool for creating custom scrapers :) Through a SaaS

3yr ago

james smith

We at ejobsitesoftware used to receive many queries for the jobs database. So we have built a custom job scraper in Laravel using Goutte. Check the screenshot - http://cricketu.com/web-scrap/

3yr ago

David Gregorian

@jobboardsoftware That looks pretty cool James! Did you think about publishing it? (Paid or open source)

3yr ago

james smith

@david_gregorian We plan to use it along with Job Board Software - https://www.ejobsitesoftware.com - and provide a job database to job board owners.

3yr ago

Fabian Maume

Warmup Inbox

QApop is built using Node.js, Puppeteer and AWS Lambda. I also have some side income from consulting around Phantombuster.

3yr ago

David Gregorian

@fabian_maume QApop looks really good! Thanks for sharing :) Did you already think about applying the same to other (famous) forums?

3yr ago

Stefan Morris

I had a website that scraped automotive listings and looked at the year, model, mileage, options and price to determine if it was a good deal (this was before everyone was doing it). I found the whole process of scraping messy and a bit shady (listing sites really wanted to protect their data), so I eventually abandoned it. Data ownership is a very messy subject, which I decided to avoid completely. I decided to build a CMS instead - no reliance on external data :) It is currently in private release, and I think it offers quite a few features that separate it from the competition.
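
To make the "good deal" idea concrete, here is a rough sketch of one way such a check could work, not how the original site did it. It flags listings priced well below the median for the same year and model; the listings, the field layout and the 0.85 threshold are all made-up assumptions, and mileage and options are ignored to keep it short.

```python
from statistics import median

# Made-up listings: (year, model, mileage, price).
LISTINGS = [
    (2015, "Civic", 90_000, 11_000),
    (2015, "Civic", 95_000, 10_500),
    (2015, "Civic", 88_000, 8_900),
    (2015, "Civic", 92_000, 11_500),
]


def good_deals(listings, threshold=0.85):
    """Return listings priced below threshold x the median price of
    listings that share the same year and model."""
    prices_by_group = {}
    for year, model, _mileage, price in listings:
        prices_by_group.setdefault((year, model), []).append(price)

    deals = []
    for listing in listings:
        year, model, _mileage, price = listing
        if price < threshold * median(prices_by_group[(year, model)]):
            deals.append(listing)
    return deals


deals = good_deals(LISTINGS)
```

A more realistic version would also adjust for mileage and options, for example by comparing only listings within a similar mileage band.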

3yr ago

David Gregorian

@stefan_morris Yes it can be messy. Especially the data ownership. But it's not illegal in general. It really depends on the use-case. With which tech stack are you building the CMS?

3yr ago

Stefan Morris

@david_gregorian I agree, it's not necessarily illegal, but depending on the site, it can break their Terms of Use agreement, which is where it can get messy. My CMS is a SaaS platform built with Vue/Nuxt and MongoDB. I'm still ramping up, but there's a bit of information on my website (check out the docs) at https://shustudios.com and I'm currently looking for a few beta testers.

3yr ago

David Gregorian

@stefan_morris Is your CMS completely headless? For example like Contentful?

3yr ago

Stefan Morris

@david_gregorian Yes, it is! It uses a REST API, but you can define the endpoints yourself in the CMS, as well as what data it should return. This gives you the best of both worlds between a REST API and a GraphQL API in my opinion.

3yr ago

Scott K Wilder

I would like to scrape LinkedIn comments from a post. How can I do this?

3yr ago

David Gregorian

@scott_k_wilder If you are skilled with JavaScript, try out Puppeteer. It is a package which you can use with Node.js. There are plenty of tutorials for it :)

3yr ago

Scott K Wilder

Thank you. Appreciate it. Will have to find someone to help with JS.

3yr ago

I've used fetch() (:

3yr ago

@itspablo so you fetched the raw HTML, right? Did you transform it somehow afterwards to be able to traverse the DOM?
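
For context on what "transforming raw HTML so you can traverse it" can look like, here is a minimal sketch in Python using only the standard library's html.parser. The Node and TreeBuilder classes, the find_all helper and the sample HTML are all hypothetical, and the tree is far simpler than a real browser DOM (no error recovery for badly nested markup).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so we must not descend into them.
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""

    def find_all(self, tag):
        """Depth-first search for descendants with the given tag name."""
        found = []
        for child in self.children:
            if child.tag == tag:
                found.append(child)
            found.extend(child.find_all(tag))
        return found


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Only climb back up when the closing tag matches the open element.
        if self._current.parent is not None and self._current.tag == tag:
            self._current = self._current.parent

    def handle_data(self, data):
        self._current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    return builder.root


RAW = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
links = [(a.attrs["href"], a.text) for a in parse(RAW).find_all("a")]
```

In practice a parsing library does this job (and handles broken markup), but the principle is the same: fetch a string, build a tree, then query the tree.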

3yr ago

Metehan Çetinkaya

I scrape local websites from various countries. I use Python as the programming language, the Beautiful Soup library, which is really easy, and https://scrape.do as the proxy gateway (I couldn't get the job done without them, since the local websites I scrape usually require local residential IPs).
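
A minimal sketch of routing requests through a proxy gateway with Python's standard urllib. The proxy address, credentials and User-Agent string below are placeholders, not scrape.do's actual endpoint or API; check your provider's documentation for the real connection details.

```python
import urllib.request

# Placeholder gateway address and credentials (assumptions for illustration).
PROXY_URL = "http://user:password@proxy.example.com:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", "example-scraper/0.1")]
    return opener


opener = build_proxy_opener(PROXY_URL)
# A real request would then look like this (not executed here):
# html = opener.open("https://example.com/", timeout=10).read()
```

The fetched HTML can then be handed to Beautiful Soup or any other parser as usual.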

3yr ago

David Gregorian

@metehan_cetinkaya Thanks for the hint with scrape.do! I'll definitely check it out for proxy rotations next time :)

3yr ago

Tony paul

I've been working in web scraping for almost 10 years. In terms of the volume of data scraped, the most demand we've seen is from the e-commerce industry. The common use cases are price monitoring, competitive intelligence, reputation monitoring, etc. Another hot use case is extracting data from LinkedIn. If I had to count the use cases our data scraping has supported, it would be more than 100 very different ones across 20+ industries. Initially, we started with Python frameworks like Scrapy and then built our own tools internally. I'm the founder of Datahut (https://datahut.co/), a data-as-a-service provider for web-scraped data.
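
As an illustration of the price monitoring use case, here is a minimal sketch that compares two scraped price snapshots. It is not Datahut's tooling; the {sku: price} snapshot format, the SKUs, the prices and the 1% threshold are assumptions made up for the example.

```python
def price_changes(previous, current, min_pct=1.0):
    """Compare two {sku: price} snapshots and report changes of at least
    min_pct percent as (sku, old_price, new_price, pct_change) tuples."""
    changes = []
    for sku, new_price in current.items():
        old_price = previous.get(sku)
        if old_price is None or old_price == 0:
            continue  # new product or unusable baseline, nothing to compare
        pct = (new_price - old_price) / old_price * 100
        if abs(pct) >= min_pct:
            changes.append((sku, old_price, new_price, round(pct, 1)))
    return changes


# Made-up snapshots from two scraping runs.
yesterday = {"A1": 20.00, "B2": 50.00, "C3": 9.99}
today = {"A1": 18.00, "B2": 50.00, "D4": 5.00}
changes = price_changes(yesterday, today)
```

Products that disappear or appear between runs (C3 and D4 here) are skipped in this sketch, although a real monitor would probably report them too.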

3yr ago

David Gregorian

@tonypaul_hb Sounds interesting. Are you still using Python, or did you switch to another tech stack in the meantime?

3yr ago

Michael Hood

Web scraping, a technique used to extract data from websites, has become an integral part of the data collection strategy for many companies. With the vast amount of information available online, companies use it to gather insights, monitor competitors, and automate various processes. I came across ZenRows, a tool aimed at scaling web scraping. It allows developers and companies to extract data from websites quickly and, with its interface and feature set, simplifies the process so that users can focus on using the extracted data rather than on the technical complexities of scraping.

11mo ago