We [0] index YouTube actively and see way over 5.5B videos [1] at this point. We catch a lot of unlisted videos, and in the past we tried to figure out how that is possible.
It seems that a lot of users upload a video that is published with the default settings and is therefore visible from the outside. Even if they change the settings fairly quickly, automated systems like ours will already know that the video exists.
There could be other reasons but this seems the most likely, especially as a video that is being uploaded can be published fairly swiftly.
It sounds like you are aware that you are scraping videos that are later re-labeled as "unlisted", but you don't mention what you do to mitigate this problem.
Even if it may not be illegal, at the very least it would seem unethical to link to private videos like this. It would also seem trivial for you to "re-scrape" your database every now and then to check whether any existing videos have changed from listed -> unlisted, and to remove them if they have.
This logic would require them to re-scrape every video forever, which is unreasonable.
I think a better approach for everyone involved would be to only store references to videos which were posted more than x minutes ago. I'm not sure if they have that information when scraping though.
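A rough sketch of that "older than x minutes" rule, assuming the scraper does get a publish timestamp for each video. The `published_at` field, the 30-minute value and the function names are all made up for illustration, not anything Pex is known to use:

```python
from datetime import datetime, timedelta, timezone

# The "x minutes" threshold; 30 is an arbitrary illustrative value.
MIN_AGE = timedelta(minutes=30)


def old_enough(published_at, now=None, min_age=MIN_AGE):
    """Return True if the video was published at least min_age ago."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published_at >= min_age


def filter_storable(videos, now=None, min_age=MIN_AGE):
    """Keep only the scraped records that are old enough to store.

    Each record is assumed to be a dict with a timezone-aware
    'published_at' datetime (a hypothetical field name).
    """
    return [v for v in videos if old_enough(v["published_at"], now, min_age)]
```

Anything younger than the threshold is simply skipped on this pass and would be picked up by a later crawl, if it is still visible by then.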
>It seems that a lot of users will upload video which is by default published [and then they change it to private] //
So to avoid that sort of unexpected public-ing (i.e. publishing), only one extra scrape would be needed. Or, if they knew the period over which the setting was normally changed, they could just delay the scrape until most would have already been changed.
I imagine though that, in part, the 'fun' is catching inadvertent publication, and morality is not considered.
It actually has nothing to do with "fun". As I mentioned in my other comment, we don't expose our database publicly and nobody but us can see that a video is unlisted.
It would defeat the purpose of our service if we delayed our identification, and introducing such capabilities into our system would require significant engineering effort, with a significant economic impact on our business.
Has "unlisted" ever been known to mean "private"? I never assumed it was - rather it was just a video that would not appear in searches or recommendations on YouTube.
[0] https://pex.com
[1] https://blog.pex.com/what-content-dominates-on-youtube-39081...