you could just render it as if it were being shown to a person. Thats where the fuzzy logic of AI would come it. It could be trained to identify what an article looks like, based on the layout and size and other inferences (that it figures out, hence AI).
For a while now, I've thought of the idea of outsourcing this to a 3rd world sweatshop. Basically pay people to click on the scummiest ad-loaded pages all day, saving copies of just the content, and re-hosting it as, say, web 1.0 content, text and pictures, nothing more. Whether they used copy-paste, or just "save page" and then pipe it through another program, who cares. just extract the content and host it as web 1.0 that would load super fast but maybe have the same font or formatting as the original.
For a while now, I've thought of the idea of outsourcing this to a 3rd world sweatshop. Basically pay people to click on the scummiest ad-loaded pages all day, saving copies of just the content, and re-hosting it as, say, web 1.0 content, text and pictures, nothing more. Whether they used copy-paste, or just "save page" and then pipe it through another program, who cares. just extract the content and host it as web 1.0 that would load super fast but maybe have the same font or formatting as the original.