Golden Hill Software
Developer of Unread and Feed Hawk
John Brayton
Webpage Text API — Get the HTML content of a webpage without the junk.
The Webpage Text API is a cloud service that lets you easily retrieve the HTML for the content of a webpage without the junk (chrome, navigation, ads, and scripts) that tends to clutter modern webpages.
Replies
Aravs
This is great. Thanks for building this. I'd love to see a free tier to try it out and a pay-as-you-go pricing model. The pricing seems high.
John Brayton
@aravs7 Thank you. I am happy to give trial codes for folks to start developing against it, and to hold off charging until a customer starts using it in a production environment.
Aravs
@john_brayton That's great. How do I get the trial code?
John Brayton
@aravs7 Contact me at sales@goldenhillsoftware.com. I will need to know which price plan you want to try and the product/service name you are working on. (This can be changed later.)
John Brayton
The Webpage Text API has been powering the webpage text feature of Unread, my RSS reader, since February 2020. It is perfect for RSS readers, read-later services, browser extensions, newsbots, and other applications where the user wants the content of the webpage without the cruft.
I started developing the Webpage Text API for Unread in 2018, before Mercury Parser went open source. At the time, Unread had webpage text retrieval capabilities powered by Readability.js. That worked well, but I needed the ability to cache webpage text and associated images ahead of time. It was impractical to generate webpage text for thousands of articles at a time on-device, so I researched server-based options.
At that time, Mercury Reader provided an API and generously made it available for free. However, their terms of service would not allow Unread to aggressively cache webpage text for articles ahead of time, and the Mercury Parser source code had not yet been made public. I looked into commercial options, but none fit my needs.
So I started writing my own server-based system. I began by incorporating the heuristics used by Readability.js, then added test cases from hundreds of different websites to improve the webpage text quality.
After Mercury Parser went open source, I evaluated whether it would be more suitable for generating webpage text for Unread. I found that I got higher-quality results from my own Webpage Text API than I would from Mercury Parser. This inspired me to continue improving the Webpage Text API, and to now offer it as a commercial product.
John Alex
How can I embed this feature in my website? My website address: https://speakingbusiness.co.uk