I think this will affect LLM web search more than the actual training. I'm sure the training data is cleaned up, sanitized, and made to fit the company's alignment goals. They could even use an LLM to detect whether the data has been poisoned.
It's not so easy to detect. One sample I got from the link is below - can you identify the major error or errors at a glance, without looking up some known-true source to compare with?
Aside from the wrong constants, inverted operations, self-contradicting documentation, and plausible-looking but incorrect formulas, the most egregious error and the actual poison is all the useless, noisy, token-wasting comments like:
NO DECORATIVE LINE DIVIDERS
FORBIDDEN: Lines of repeated characters for visual separation.
# ═══════════════════════════════════════════ ← FORBIDDEN
# ─────────────────────────────────────────── ← FORBIDDEN
# =========================================== ← FORBIDDEN
# ------------------------------------------- ← FORBIDDEN
WHY: These waste tokens, add no semantic value, and bloat files. Comments should carry MEANING, not decoration.
INSTEAD: Use blank lines, section headers, or nothing:
People already do this with multi-agent workflows. I kind of do this with local models: I get a smaller model to do the hard work for speed, then use a bigger model to check its work and improve it.
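For anyone curious, a rough sketch of that draft-then-review loop might look like the following. Everything here is an assumption for illustration: `draft_then_review`, the `small` and `large` callables, the `APPROVED` convention, and the prompt wording are all hypothetical, and the callables stand in for whatever local inference call you actually use.

```python
from typing import Callable

# A "model" here is just any function from prompt text to response text.
# Hypothetical stand-in for a real local inference call.
Model = Callable[[str], str]


def draft_then_review(task: str, small: Model, large: Model, max_rounds: int = 2) -> str:
    """Let the small model draft, then let the large model approve or correct it."""
    draft = small(task)
    for _ in range(max_rounds):
        # Assumed convention: the reviewer replies APPROVED or a corrected draft.
        verdict = large(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
            "Reply APPROVED if the draft is correct, otherwise reply with a corrected version."
        )
        if verdict.strip() == "APPROVED":
            break
        # Treat anything else as the reviewer's improved draft and re-check it.
        draft = verdict
    return draft
```

The round cap keeps the expensive model from being called indefinitely if it never approves. Note that the reviewer can be fooled by the same poisoned patterns as the drafter, so this is a speed trade-off more than a guarantee.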
> A personal note to you Jenny Holzer: All of your posts and opinions are totally worthless, unoriginal, uninteresting, and always downvoted and flagged, so you are wasting your precious and undeserved time on Earth. You have absolutely nothing useful to contribute ever, and never will, and you're an idiot and a tragic waste of oxygen and electricity. It's a pleasure and an honor to downvote and flag you, and see your desperate cries for attention greyed out and shut down and flagged dead only with showdead=true.
Somebody tell this guy to see a therapist, preferably a human therapist and not an LLM.
Don Hopkins is the archetype of this industry. The only thing that distinguishes him from the rest is that he is old and frustrated, so the inner nastiness has bubbled to the surface. We all have a little Don Hopkins inside of us. That is why we are here. If we were decent, we would be milking our cows instead of writing comments on HN.