
Yeah, kudos to Scribd for perhaps the maximum LOC codebase for "how to make plain text completely fucking unusable".

There's real irony there in it being an article about delivering value to society through making useful technology.



OK, I was able to clean it up into plain text and post it to Pastebin: http://pastebin.com/vcEe9KWP

Used the $$ selector in Chrome to find Scribd's .ff0 elements, copied them into Sublime Text, and did some regex work to clean out the tags (a span element for every line? With absolute positioning? Really, Scribd?). Paragraphs had been jammed together with no whitespace between the last punctuation mark and the start of the next paragraph, so I exploited that with another regex to auto-break the paragraphs.
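
For anyone who wants to repeat the cleanup, here is a rough sketch of the two regex passes in Python. It is not the exact set of expressions used above: the function names and the sample markup are made up for illustration, and it assumes every paragraph starts with a capital letter. It will also wrongly split abbreviations like "U.S.A", so expect to fix a few breaks by hand.

```python
import html
import re

# Matches any HTML tag, e.g. <span class="ff0" style="..."> or </span>.
TAG_RE = re.compile(r"<[^>]+>")

# Sentence-ending punctuation followed immediately by a capital letter,
# with no whitespace in between. Assumed here to mark a paragraph boundary.
JAMMED_RE = re.compile(r"([.!?])(?=[A-Z])")


def strip_tags(markup: str) -> str:
    """Remove all tags and decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub("", markup))


def break_paragraphs(text: str) -> str:
    """Insert a blank line wherever two paragraphs were jammed together."""
    return JAMMED_RE.sub(r"\1\n\n", text)


if __name__ == "__main__":
    # Hypothetical sample, loosely shaped like the copied .ff0 spans.
    sample = (
        '<span class="ff0" style="left:0px">First paragraph ends here.</span>'
        '<span class="ff0">Next one starts. It has two sentences.</span>'
    )
    print(break_paragraphs(strip_tags(sample)))
```

Ordinary sentence breaks inside a paragraph are left alone because they already have a space after the punctuation; only the jammed ones get split.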

Not pretty, but should be readable.


I wish somebody had created some kind of universal markup language for text, that would let us easily share it...


There must be a framework for that ...


Need a new standard...


Your work is awesome. Thank you so much. Here, I pasted it into a gist so that it's not in a typewriter font. I thought it would also fix the lines being 25 words long. It didn't, but it's still an improvement: https://gist.github.com/anonymous/3ea317a1f71bbfeca6df5d8469...


Some prettiness to go with what you already accomplished: https://cdn.rawgit.com/JeffreyBPetersen/5a3198c940ccca39ff63...

Unfortunately the transcript is missing past a point. I haven't checked to see if that's the case in the original document.

edit - Looks like I should've refreshed the thread first: https://news.ycombinator.com/item?id=12514873


Oh, so much better. Thank you jesus!

In other communities, to circumvent asinine ploys at lock-in, someone would just brute-force the PDF file with a burner Facebook login and repost it. But then you'd still be stuck with a PDF.


Thanks for this. Plain text is so much easier!


Thank you very much.


...and copy-pasting it into a text editor gives you an unintelligible wall of text, and copy-pasting it into LibreOffice causes it to hang.

Ha, that's actually kind of impressive, in a really perverse sort of way.

I'm so glad Scribd is helping with the effort to carve the open web into a series of uncooperative fiefdoms.


Scribd is one of the most garbage technologies I have ever seen. I have no idea what they do other than fuck up what should be plain text.


Especially ironic considering sama/YC funded Scribd. Really shows the quality of their work!


LOC?


lines of code



