
You can use Microlink to turn a PDF into HTML, and combine it with other services to process the text data.

Here's an example that turns an arXiv paper into real text:

https://api.microlink.io/?data.html.selector=html&embed=html...

It looks like a PDF, but if you open devtools you can see it's actually a very precise HTML representation.
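
For anyone who wants to build that kind of request from a script, here's a rough sketch in Python. It only assembles the URL and does not call the API. The `data.html.selector=html` parameter comes from the link above; the `url` parameter name, the `embed=html` value (the link above is truncated, so the real value may be longer), the helper name `build_pdf_to_html_url`, and the example.com address are all assumptions for illustration.

```python
from urllib.parse import urlencode

MICROLINK_API = "https://api.microlink.io/"


def build_pdf_to_html_url(pdf_url: str) -> str:
    """Build a Microlink request URL for a PDF (hypothetical helper)."""
    params = {
        # Assumption: the target document is passed as `url`.
        "url": pdf_url,
        # From the link above: select the whole <html> element.
        "data.html.selector": "html",
        # Assumption: the original link is truncated after "embed=html",
        # so the real value may differ.
        "embed": "html",
    }
    # urlencode percent-encodes the PDF address so it survives as one value.
    return f"{MICROLINK_API}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder address, not a real paper.
    print(build_pdf_to_html_url("https://example.com/paper.pdf"))
```

Open the printed URL in a browser (or fetch it with whatever HTTP client you like) and pass the HTML on to your text-processing step.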


