Great idea to offer image downloads and filtering with GPT!
I built a similar tool last year that doesn't have those features:
https://url2text.com/
Apologies if the UI is slow - you can see some example output on the homepage.
The API it's built on is Urlbox's website screenshot API, which performs far better when used directly. You can request markdown along with JS-rendered HTML, metadata and a screenshot all in one go:
https://urlbox.com/extracting-text
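For anyone wondering what "all in one go" might look like in practice, here is a rough Python sketch that builds such a request without sending it. The endpoint path and the option names (`save_markdown`, `save_html`, `save_metadata`) are assumptions on my part and are not confirmed anywhere in this thread, so check the docs linked above for the real ones. `build_render_request` is just a hypothetical helper name.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Urlbox documentation before use.
RENDER_ENDPOINT = "https://api.urlbox.com/v1/render/sync"


def build_render_request(target_url, api_secret):
    """Build (but do not send) a POST asking for several outputs at once."""
    # Option names below are assumptions for illustration only.
    options = {
        "url": target_url,
        "format": "png",        # the screenshot itself
        "save_markdown": True,  # assumed flag: also return markdown
        "save_html": True,      # assumed flag: also return rendered HTML
        "save_metadata": True,  # assumed flag: also return page metadata
    }
    return urllib.request.Request(
        RENDER_ENDPOINT,
        data=json.dumps(options).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_render_request("https://example.com", "YOUR_SECRET")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` and parsing the JSON response, whose shape I have not tried to guess here.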
You can even have it all saved directly to your S3-compatible storage:
https://urlbox.com/s3
I've been running over 1 million renders per month using Urlbox's markdown feature for a side project. It's so much better using markdown like this for embeddings and in prompts.
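As a rough illustration of why markdown is convenient for embeddings: the headings give you natural chunk boundaries. The sketch below is not how my side project does it, just a minimal example; `split_markdown_by_heading` is a hypothetical helper, and it naively treats any line starting with `#` as a heading, so it would also split on `#` lines inside fenced code blocks.

```python
import re


def split_markdown_by_heading(markdown):
    """Split markdown into chunks, starting a new chunk at each ATX heading."""
    chunks = []
    current = []
    for line in markdown.splitlines():
        # A line like "# Title" or "## Section" starts a new chunk.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that were only whitespace.
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    doc = "# Intro\nSome text.\n\n## Details\nMore text."
    for chunk in split_markdown_by_heading(doc):
        print(repr(chunk))
```

Each chunk can then be embedded on its own or dropped into a prompt with its heading still attached for context.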
If you want to scrape whole websites like this you might also want to check out this new tool by dctanner:
https://usescraper.com/
Looks nice, but url2text doesn't seem to have an API, and urlbox doesn't seem to have an option to skip the screenshot if you only want the text. And for just the text, it looks to be really expensive.
Sorry the pricing isn't a good fit for you. Urlbox has been running for over 11 years. We're bootstrapped and profitable with a team of 3 (plus a few contractors). We're priced to be sustainable so our customers can depend on us in the long term. We automatically give volume discounts as your usage grows.
Results can also be delivered by webhook: https://urlbox.com/webhooks