My new TV wouldn't work unless I agreed to it recording me and uploading those recordings to its servers, where they may be stored temporarily while the audio is transcribed to text for more permanent storage.
My TV is forcing me into an employment agreement where I generate data to train their models or otherwise 'improve the service'.
Data is so valuable that companies are risking a huge PR backlash. Data collection is the business model, and I assume the same ethos will make its way into open source.
I believe 2021 was the tipping point after which most text content is AI generated, so to avoid training your LLM on other LLMs' output they restrict the data cutoff to 2021.
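A minimal sketch of that kind of cutoff filter, assuming documents carry a publication date (the field names here are made up for illustration):

    from datetime import date

    # Hypothetical corpus filter: keep only documents published before the
    # cutoff, on the assumption that later text is increasingly LLM-generated.
    CUTOFF = date(2021, 1, 1)

    def pre_cutoff(docs):
        return [d for d in docs if d["published"] < CUTOFF]

    corpus = [
        {"text": "older article", "published": date(2019, 6, 1)},
        {"text": "newer article", "published": date(2023, 2, 14)},
    ]
    print(len(pre_cutoff(corpus)))  # -> 1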
I suppose because there is a theory that a native application can provide a better video experience than WebRTC. I've no idea whether that is true for Zoom.
This is a very old project; glad to see it get the credit it deserves.
One interesting thing: if you show someone this for a random sentence and ask them what they think, they always say, "How did it copy my writing?" Everyone thinks it looks like their own writing, likely because it's the average of all people's writing. Try it out.
Another reason these diagrams should be expressed in some sort of code is that LLMs are currently not well versed in architecture, because it's missing from the training set.
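As a rough illustration of what "diagrams as code" could look like (the components here are made up), a diagram can be plain text that sits in a repo and could end up in a training set:

    # A made-up four-component system described as data, then emitted as
    # Graphviz DOT text -- plain text rather than a picture.
    components = ["web", "api", "queue", "db"]
    edges = [("web", "api"), ("api", "queue"), ("api", "db"), ("queue", "db")]

    def to_dot(nodes, edges):
        lines = ["digraph architecture {"]
        lines += [f'  "{n}";' for n in nodes]
        lines += [f'  "{a}" -> "{b}";' for a, b in edges]
        lines.append("}")
        return "\n".join(lines)

    print(to_dot(components, edges))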
That is correct. Still, it's enough to establish that you (linked with your driving licence/passport) visited Pornhub 37 times on the 12th of December. They don't store exactly what you were doing, but obviously it's not like the public cares about that. And Pornhub is a very mild example; let's say a website you visited pulled some resource from 4chan (a website known to harbour terrorists and pedophiles! /s) or from somewhere that sounds like a terrorist organisation. Or even Hacker News (are you a hacker? that's illegal, you know). You have no control over your DNS lookups (by default), so you don't actually know what gets written in those logs, nor have any way to inspect them.
And specifically because the logs are so crap and don't actually contain any information beyond the domain name, they can be used to infer pretty much anything the prosecutors might want to see. Otherwise, why even keep them?
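To make the point concrete: a DNS query only ever carries the hostname, never the URL path or what you did on the site, so a resolver-side log holds little more than a timestamp, a client, and a domain. A small sketch (the log line format below is invented, not any particular resolver's):

    import socket

    # A DNS lookup resolves a bare hostname to addresses. The resolver sees
    # only the hostname, so its log can record little more than something like
    #   2023-12-12T14:03:07Z  192.0.2.10  news.ycombinator.com  A
    host = "news.ycombinator.com"
    addrs = {info[4][0] for info in socket.getaddrinfo(host, 443)}
    print(host, "->", addrs)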
Use an LLM to generate comments, maybe fine-tuned on HN.
When real people start showing up they will see comments and assume it's not a wasteland. As real people arrive, reduce the bot comments.
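A rough sketch of that ramp-down, where generate_comment is a hypothetical wrapper around an LLM fine-tuned on HN-style comments and the activity threshold is an arbitrary assumption:

    import random

    def bot_comment_probability(real_comments_last_day, target=50):
        # Falls linearly toward zero as real activity approaches the target.
        return max(0.0, 1.0 - real_comments_last_day / target)

    def maybe_post_bot_comment(thread, real_comments_last_day, generate_comment):
        # generate_comment is a hypothetical callable wrapping the LLM.
        if random.random() < bot_comment_probability(real_comments_last_day):
            thread.append(generate_comment(thread))

    # Usage sketch with a stubbed-out generator:
    thread = ["Show HN: my new thing"]
    maybe_post_bot_comment(thread, real_comments_last_day=5,
                           generate_comment=lambda t: "Neat, how does it handle scale?")
    print(thread)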