
Another option is to use Parsoid.

https://github.com/wikimedia/parsoid

That is MediaWiki's official off-wiki parser, which can convert wikitext to HTML and HTML back to wikitext. It would be reasonably simple to hook into its API and use it for data extraction instead.
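
A rough sketch of what that could look like, in Python. It assumes the Parsoid HTML is fetched from the Wikimedia REST endpoint `/api/rest_v1/page/html/{title}` (an assumption; a self-hosted Parsoid may expose a different route) and that templates appear as elements with `typeof="mw:Transclusion"` carrying a JSON `data-mw` attribute. The helper names `parsoid_html_url`, `TemplateCollector`, and `extract_templates` are made up for illustration.

```python
import json
from html.parser import HTMLParser
from urllib.parse import quote


def parsoid_html_url(title, domain="en.wikipedia.org"):
    """Build the URL for a page's Parsoid HTML (endpoint layout is an assumption)."""
    slug = quote(title.replace(" ", "_"), safe="")
    return f"https://{domain}/api/rest_v1/page/html/{slug}"


class TemplateCollector(HTMLParser):
    """Collect (template name, params) pairs from Parsoid-style HTML."""

    def __init__(self):
        super().__init__()
        self.templates = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Only elements marked as transclusions carry template data.
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        data_mw = json.loads(attr.get("data-mw") or "{}")
        for part in data_mw.get("parts", []):
            # Parts can also be plain strings (text between templates); skip those.
            if isinstance(part, dict) and "template" in part:
                tpl = part["template"]
                name = tpl["target"]["wt"].strip()
                params = {
                    key: value.get("wt", "").strip()
                    for key, value in tpl.get("params", {}).items()
                }
                self.templates.append((name, params))


def extract_templates(html):
    """Return every template invocation found in a Parsoid HTML string."""
    collector = TemplateCollector()
    collector.feed(html)
    return collector.templates
```

The HTML would be fetched from `parsoid_html_url("Ada Lovelace")` with any HTTP client and passed to `extract_templates`. The parameter values come back as raw wikitext, so links and nested templates inside them would still need handling.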



Is converting Wikitext to HTML/RDFa really going to help with this task? I'd say it's actually clearer how to get the data out of the original Wikitext.
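
For comparison, a deliberately naive sketch of pulling template parameters straight from the wikitext. It assumes the common layout of one `| key = value` pair per line and does not handle nested templates, multi-line values, or parameters written inline; `template_params` is a hypothetical helper, not part of any library.

```python
def template_params(wikitext, template_name):
    """Read `| key = value` lines from the first use of a template (naive)."""
    start = wikitext.find("{{" + template_name)
    if start == -1:
        return {}
    params = {}
    # Skip the opening "{{Name" line, then read until the closing braces.
    for line in wikitext[start:].splitlines()[1:]:
        line = line.strip()
        if line.startswith("}}"):
            break
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            params[key.strip()] = value.strip()
    return params
```

Calling `template_params(text, "Infobox person")` on a page's source would return a dict of the infobox fields as raw wikitext strings.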



