Depending on your perspective, you can take away either of the two points.
The first iteration of the project created a library from scratch, from the tests all the way to 100% test coverage. So even without the second iteration, it's still possible to create something new.
In an attempt to speed it up, I (with a coding agent) rewrote it again based on html5ever's code structure. It's far from a clean port, because html5ever is heavily optimized Rust code that can't be ported directly to Python (Rust macros, for example). And it still depended on a lot of iteration and rerunning tests to get anywhere.
I'm not pushing any agenda here, you're free to take what you want from it!
Thank you for the clarification, that was not entirely clear to me from the post.
You also mention that the current "optimised" version is "good enough" for everyday use (I use `bs4` for working with HTML). Was the first iteration also usable in that way? Did you look at `html5ever` because the LLM hit a wall trying to speed it up?
It was usable! Yeah, the handler-based architecture that I had built on was very dependent on object lookups and method calls, and my hunch was that I had hit a wall trying to optimize the speed. I was still slower than html5lib, so I decided to go with another "code architecture" (html5ever) that was closer to the metal. That worked out, getting me ~60% faster than html5lib.
As for bs4, if you don't change the default, you get the stdlib html.parser, which doesn't implement HTML5. It only works reliably for valid HTML.
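To make the html.parser point concrete, here is a minimal sketch using only the standard library. The `TagLogger` class and `tokenize` helper are names made up for this illustration, not part of bs4 or any library. It shows that html.parser just reports the tags it sees and does none of the HTML5 tree fix-ups (no implied `<html>`/`<body>`, no auto-closing of the first `<p>`). Note also that which parser bs4 ends up using when you don't name one can depend on what is installed, so passing the parser name explicitly is the safer habit.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records the raw events html.parser emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def tokenize(markup):
    """Feed markup to the stdlib parser and return the recorded events."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Two unclosed paragraphs: an HTML5 parser would close the first <p>
# and wrap everything in html/head/body; html.parser does neither.
events = tokenize("<p>one<p>two")
start_tags = [tag for kind, tag in events if kind == "start"]
print(start_tags)
```

With bs4 itself, the equivalent choice is the second argument to the constructor, e.g. `BeautifulSoup(markup, "html.parser")` versus `BeautifulSoup(markup, "html5lib")` (the latter needs the html5lib package installed).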
Made possible in turn by giving safe haven for user content on the big social networks. Turned out to be a double-edged sword.
When Rupert tried to lie about voting machines, he was fined a couple of hundred million. All the social network mouthpiece accounts spouting nonsense suffer no repercussions whatsoever.
This is the old dichotomy: either you don't censor and are just a medium (like electricity), or you do censor some things and then you are responsible for what is published. Social media seems to want to censor while not being responsible.
Section 230 of the Communications Decency Act explicitly gave these companies this power, on purpose. Unmoderated online spaces are mostly useful to scammers and spammers.
If somebody kept using the same phone line to trigger bombs, do you think that the phone company doesn't have an obligation to shut that line down? Let's say the police came to the phone company and said "we know that if you shut this phone line down, so and so won't be able to trigger the bomb they have planted in XYZ space." Do you think the phone company should do nothing?
What about a courier that knows it is delivering bombs? We should look past that too?
I think that when GP stated "All the social networks mouthpiece accounts spouting nonsense suffer no repercussions whatsoever." they were referring to the people lying and not the social networks themselves.
these examples look ridiculous but you have to remember that people are used to Chinese characters and can't easily recognize if a URL written in Latin characters is right or wrong. this is made even worse by the fact that even official websites are not always hosted on an official domain, and even when they are they use ridiculous hostnames, because again whoever is setting up the site just sees a sequence of letters that they are not closely familiar with.
There is that and there is the fact that 50-60 years ago China was coming out of a Cultural Revolution that had shut down the education system, and places like Shenzhen were fishing villages with dirt roads well within living memory.
It is not exactly surprising that, at such a breakneck pace of development, some people did not get up to speed at the same rate.
———
I will also say I think that China’s embrace of super apps and the quasi-app-internet is not helping with online literacy.
yes, i have TAOCP on the shelf and yes guilty of "one day i'll read it".
but every now and then i open them up and just flip thru that magnificent typography.
Knuth has not just written up all these things, he has developed an entire typesetting system (complete with fonts) to bring technical publishing kicking and screaming into the 20th century (when other software thought kerning and hyphenation were creatures from space). it's the only program deserving a version number approximating pi.
CF Workers (which run WebAssembly) are all over the place. They may not run the main logic (not the actual nginx or DNSSEC code) but they are used for several maintenance tasks.
not a heavy :! user, what i used there worked, but afaik neovim recommends :te . that's one of the bigger differences. neovim was very proud to have a fully integrated terminal
"You either die a hero or you live long enough to see yourself become the villain."
People change all the time, and things need to be reevaluated from time to time.
So another skill is to disengage from our heroes when our values start to misalign.