Hot damn, fed it part of an unpublished blog post I wrote, and it got me immediately.
I'm not famous or anything. I've written some academic papers and had a couple blog posts trend on HN, which are surely in the training set.
It was able to identify me based on my style (at least according to its explanation). The way I approached the topic and some of the notation I used point to a particular academic lineage, and the general style reflected my previous blog posts.
That said, I gave it part of an (unpublished) personal essay, and it had no idea. But I have no writing in that style that's published, so it makes sense. Still impressed.
It's not at all comparable. ADS-B is opt-in: you place an ADS-B out transmitter in your plane and turn it on to transmit your position [1]. You're perfectly free to turn your ADS-B off if you like. It's not even required for most of the US, so some planes may not have it installed (ADS-B out is only required within 30 nautical miles of a Class B airport [big airport], inside Class C airspace [medium-size airport], or in Class A airspace [18,000 ft above sea level]).
Military aircraft don't use ADS-B out a lot of the time. Spy planes are obviously not going to transmit their locations. A civilian plane with an electrical failure might stop transmitting ADS-B out. Being able to identify planes via satellite is a whole separate capability.
[1] In all the planes I've flown, ADS-B is configured to transmit whenever the master electrical switch is turned on, but it can be configured to be turned on and off at will. See this video on a mid-air collision involving an eccentric character who liked to fly with ADS-B out turned off: https://youtu.be/G5y3JiOEnVs?si=rs5gNMurZ9ssUloS. If I recall correctly, he had his ADS-B wired to his nav lights so he could turn it on and off at will.
From my experience in general aviation, I've never met anyone who intentionally turned off their ADSB. It is generally wired into the transponder, and the bulk of air traffic worldwide happens where a transponder is practically required. Yes, it is technically not needed but you can't get help from ATC, can't fly instruments in the clouds, and you can't fly high.
Sure, it can happen but these are edge cases. Space-based ADSB solves this problem with a fraction of the effort and much better data. Spooks might need this for military stuff, but for the bulk of planes, it doesn't make sense.
You are NOT at all (legally) free to arbitrarily turn off your ADSB on an aircraft equipped with it. 91.225(f) [1].
> Except as prohibited in [unmanned aircraft section], each person operating an aircraft equipped with ADS-B Out must operate this equipment in the transmit mode at all times unless [authorized by FAA or ATC].
A common way to add ADSB to an aircraft not originally equipped is replacing one of the lights with a uAvionix skyBeacon[2], which has an LED light + ADSB-out transmitter. So the nav light switch would control it, but you'd also now be required to have them on at all times.
ADS-B is mandatory in many jurisdictions and for all commercial flights basically everywhere. Obviously, like any transponder, you can pull the breaker, but turning it off is likely to be the end of your commercial piloting career.
That video is great, thanks. Though the ML solution doesn't seem to claim to be able to identify individual aircraft, just do daily counts of aircraft at specific airfields. Which I guess works for military aircraft, but I guess if you're a nation state actor you've already got this sort of technology/intelligence?
There are a number of issues with these studies, one being that the way the sock puppet bots interact with content is not exactly organic. Typically they search for content in a conditioning phase, followed by random scrolling during which the recommended videos are collected and classified by an LLM. Modern recommendation algorithms famously work by examining how long and how users engage with content, and there's none of that going on here. Still, the methodology itself and the use of LLMs to classify content is clever and probably about the best we can get.
Also, even if there _is_ a bias, it doesn't tell us why. Are the recommendations intentionally spiked, or is this simply the recommendation strategy that maximizes profit? (Or that the recommendation model thinks will maximize profit?) It's very difficult to tell, which is part of what makes these models dangerous and also part of what makes them difficult to regulate.
On a sidenote, TikTok (and presumably other content platforms) _really_ does not like these studies, as demonstrated by them nerfing search functionality after the second study above was released to prevent researchers using these techniques in the future. I haven't read the study in detail yet, but it will be interesting to see how the team at NYU Abu Dhabi adapted their methodology.
While I am skeptical of what reasonable conclusions can be drawn from a study like this, they explain the methodology in the article. You said:
>Typically [...] followed by random scrolling during which the recommended videos are collected and classified [...] Modern recommendation algorithms famously work by examining how long and how users engage with content, and there's none of that going on here.
But they claim that videos are watched, not just collected from the recommendation page.
"The accounts watched 10 videos, followed by a one-hour pause, and repeated this process for six days"
Perhaps I should have been more clear. It's TikTok, so of course the only way to collect recommendations is to watch videos. Some studies watch the whole video, some just watch part of it, but it's TikTok, so fundamentally you're watching a video.
I might just not be reading it properly. I've never used TikTok, I assumed by your description that they scraped video titles/transcripts/etc. from the recommendation page without any engagement on the video. (I suppose I should read the study you linked!)
When you say "how users engage with content, and there's none of that going on here", by "none of that", do you just mean likes/comments, that sort of thing?
I would usually consider watching as engaging with content, but if you mean additional engagement (as I would call it, anyways), that would make a lot more sense to me.
I think the key metric missing here is how long the user watches each video. Likes and replies are probably helpful too, but when I've used short-form video content apps like TikTok, Reels, and YouTube Shorts before, they've gotten a pretty good measure of me without me ever liking, replying, or following.
With the current methodology, the bot either watches the whole video, a fixed duration of it, or a random duration before swiping. The bot doesn't organically watch or swipe based on its interests like a human user would.
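To make the gap concrete, here is a minimal sketch of the three watch policies described above. The policy names, the `fixed` default, and the function itself are hypothetical, not taken from any of the studies. None of the policies looks at what the video contains, so watch time carries no signal about interest.

```python
import random

def watch_seconds(policy, video_length, fixed=5.0, rng=random):
    """Return how long a sock-puppet bot watches a video, in seconds.

    policy: "full", "fixed", or "random" (hypothetical names).
    The result never depends on the video's content.
    """
    if policy == "full":
        return video_length
    if policy == "fixed":
        # Cannot watch longer than the video lasts.
        return min(fixed, video_length)
    if policy == "random":
        return rng.uniform(0.0, video_length)
    raise ValueError(f"unknown policy: {policy!r}")

if __name__ == "__main__":
    for policy in ("full", "fixed", "random"):
        print(policy, watch_seconds(policy, 30.0))
```

A human's watch time would instead be a function of the content, which is exactly the signal the recommender is trying to learn from.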
I think this is a valid point and a really interesting question. If that's the standard, we need to regulate all recommendation algorithms. (i.e. put limits on Twitter, Instagram, and YouTube as well.)
How could we regulate this? I can think of two ways:
- Results-based enforcement, i.e., companies are free to use whatever recommendation model they like, but have to recommend content within ideological bounds (e.g., you can't favor one side of the partisan divide by more than X%). There's some precedent for this with the equal-time rule (https://en.wikipedia.org/wiki/Equal-time_rule) and FCC fairness doctrine (https://en.wikipedia.org/wiki/Fairness_doctrine).
- Algorithm-based enforcement. i.e., there are limits on the algorithm itself. Perhaps you have to present your algorithm to a government agency and provide a proof that it obeys certain properties. But the enforcement here is analytical rather than empirical.
IMO the interesting question is not whether an individual platform is biased and what its biases are, but rather how we might regulate recommendations given that there is always a risk of bias.
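As a rough illustration of what a results-based check could look like, here is a minimal sketch. It assumes an auditor already has a sample of recommended videos, each tagged with a partisan label. The labels, function names, and the 10% threshold are all hypothetical.

```python
from collections import Counter

def partisan_shares(labels):
    """Fraction of sampled recommendations carrying each label."""
    if not labels:
        raise ValueError("need at least one labeled recommendation")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def within_bounds(labels, side_a, side_b, max_gap):
    """True if the share gap between the two sides is at most max_gap."""
    shares = partisan_shares(labels)
    gap = abs(shares.get(side_a, 0.0) - shares.get(side_b, 0.0))
    return gap <= max_gap

if __name__ == "__main__":
    # Hypothetical audit sample of 200 classified recommendations.
    sample = ["left"] * 40 + ["right"] * 50 + ["neutral"] * 110
    print(partisan_shares(sample))
    print(within_bounds(sample, "left", "right", max_gap=0.10))
```

The hard parts are everything this sketch takes as given: who labels the content, how the sample is drawn, and what X should be.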
The problem is that the algorithms are programmed to show people what they want to see OR what the platform wants you to see.
If it's organic then this is no different than any other form of organic popularity. Seeing as Trump won the popular vote and the Electoral College, people were interested in Republican content. By the same token, it's very easy to astroturf and claim it was organic. From my personal conversations in meatspace, I lean organic.
Is there funny business going on? Absolutely, all the time everywhere in every way. Can we say this was funny business? Not without the code.
TLDR: Popularity algorithms push popular content algorithmically.
Yeah, I'd say this is exactly what's going on at big tech companies. Add to that the need for an interview process that is very standardized for (1) scalability in a large company and (2) legal protection against discrimination (at least nominally), and it makes a load of sense.
Yes! My perennial problem is that most Chinese TV content has excellent subtitles, but they're burned in to the videos. So if I want to watch a Chinese show with a friend who doesn't read Chinese, there's no option even for auto-translated subtitles. I've often thought of writing a script to generate subtitle files using text recognition, but haven't gotten around to it.
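For what it's worth, the subtitle-file half of such a script is small. Below is a minimal sketch that assumes some OCR step (not shown, and not tied to any particular library) has already produced a list of (time, text) samples from the video frames. All function names here are hypothetical.

```python
def format_timestamp(seconds):
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def frames_to_cues(samples, frame_interval):
    """Merge consecutive samples with identical text into (start, end, text) cues.

    samples: time-ordered (time_in_seconds, recognized_text) pairs;
    an empty string means no subtitle was on screen.
    """
    cues = []
    current = None  # [start, end, text] of the cue being built
    for t, text in samples:
        text = text.strip()
        if current and text == current[2]:
            current[1] = t + frame_interval  # same line still on screen
            continue
        if current:
            cues.append(tuple(current))
        current = [t, t + frame_interval, text] if text else None
    if current:
        cues.append(tuple(current))
    return cues

def cues_to_srt(cues):
    """Serialize cues as SRT text: index, time range, subtitle, blank line."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{i}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

if __name__ == "__main__":
    # Hypothetical OCR output, sampled every 0.5 s.
    samples = [(0.0, ""), (0.5, "你好"), (1.0, "你好"), (1.5, ""), (2.0, "再见")]
    print(cues_to_srt(frames_to_cues(samples, frame_interval=0.5)))
```

Real OCR output is noisy, so exact string equality between frames is probably too strict. A fuzzy match would be the first thing to swap in.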
Ah, but one thing I miss about the burned-in subtitles was how, on VHS at least, one could fast forward like 2-3x and still be able to read the captions. I double-lazied my way out of more than one high school book review where a boring book was made into a boring movie.
There's also the problem of content which doesn't have subtitles to begin with---YouTube's automatic subtitle generation is great, but could be improved and expanded to languages other than English.
It is expanded, at least to Japanese. What's also fun is you can then have it auto translate the auto generated subs. The results are... not typically great, but the potential!
A long time ago, I tried the "autosub" feature with anime and it replaced understanding with unintentional (dry) humour. The output had phrases that resembled a news transcript, which is probably what they were training the system on.
Becoming more and more surprised by intentionally misleading headlines these days....
Shocker---you might be required to be vaccinated in order to live in a facility for ultra-sick kids, many of whom are immunocompromised. I guess that's an "insult to the Canadian anthem" somehow.
Pretty interesting! I tried this both English -> French and French -> English.
English -> French seemed to work best, with the AI output having a very similar timbre to my real voice. Not hyperrealistic for me, but decent enough given I gave it a ~20s sample.
French -> English was less good in terms of the timbre and pitch of the voice---way higher than my real voice. It did have a bit of a Canadian accent, though, which is funny because I speak French with a Quebec accent. Maybe that's what I would sound like if I had a Canadian accent in English?
Funnily, I (native American English speaker who learned French in QC, and whose accent in French indicates this) tried it both ways. I think the accent is basically built in both ways, which makes sense, although it would be more interesting if it based your accent in the output off the phonology in the input.
Unfortunately probably none, unless you live pretty far north. We're pretty close to the speed of light limit* when it comes to latency between the US and Asia, which makes sense as a straight line is over the Pacific anyway. (e.g., I get ~1.1x the theoretical min latency pinging Tokyo from my Fiber connection in the SF Bay Area)
Where it seems like we're pretty far from the theoretical min is connections within the continental US. Latency is pretty bad (e.g. I get like 1.5x - 4x the theoretical min latency from my Fiber connection in the SF Bay Area, depending on endpoint). I assume part of this is indirect connections (you don't have a direct fiber connection between every pair of cities, because that would be dumb) and some of it is routing overhead (a connection to Asia goes a long distance, but often has way fewer hops).
* Note that this is the theoretical min latency based on the speed of light through fiber, which is slower than the speed of light through air, so the latency is a bit higher (about 1.5x). New fiber optic tech might help this at some point.
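For anyone who wants to redo the arithmetic, here is a minimal sketch of the theoretical-minimum calculation. It assumes a fiber refractive index of about 1.47 (a typical figure for silica, not a measurement of any specific cable), a perfectly straight great-circle path, and approximate city coordinates.

```python
import math

C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum
FIBER_INDEX = 1.47           # assumed refractive index of silica fiber

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def min_rtt_ms(distance_km, index=FIBER_INDEX):
    """Round-trip time over a perfectly straight path of the given length."""
    speed_km_s = C_VACUUM_KM_S / index
    return 2 * distance_km / speed_km_s * 1000

# Approximate coordinates (assumed): San Francisco and Tokyo.
SF = (37.77, -122.42)
TOKYO = (35.68, 139.69)

if __name__ == "__main__":
    d = great_circle_km(*SF, *TOKYO)
    print(f"distance:          {d:.0f} km")
    print(f"min RTT in fiber:  {min_rtt_ms(d):.1f} ms")
    print(f"min RTT in vacuum: {min_rtt_ms(d, index=1.0):.1f} ms")
```

Dividing a measured ping by the fiber figure gives the "x times theoretical min" ratios quoted above.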
Your intuition is correct, undersea cables are close to direct connections while overland routes are interconnects so you'll have tons of hops. What we have for overland gets the job done so it's hard to justify the massive cost of installation. Plus there's likely competing business interests at play.
I'm extremely excited about this project. If the product reviews end up as positive as the pre-production reviews, my next laptop is definitely going to be a Framework Laptop.