As I understand it, site data like the URL or even the content of the website is fed into an ML model that outputs a cohort value. That cohort value can change with every individual website you visit, or with the content on that website. So it's possible that some sensitive data you'd rather keep private lands in the hands of Google. It's arguably made even worse by the fact that websites can change their content and thereby influence the ML model, teasing out data that would otherwise not have been available.
> The browser uses machine learning algorithms to develop a cohort based on the sites that an individual visits. The algorithms might be based on the URLs of the visited sites, on the content of those pages, or other factors.
https://github.com/WICG/floc
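To make that concrete, here's a minimal sketch of how such a cohort could be derived. It assumes a SimHash-style scheme over visited domains (which is reportedly what Chrome's FLoC origin trial used), but as the quoted text says, the spec leaves the algorithm open; the names here are made up for illustration.

```python
import hashlib

def simhash_cohort(visited_domains, bits=16):
    """Toy SimHash over visited domains: each domain votes on each
    output bit, and the majority sign per bit forms the cohort ID.
    Illustrative only -- the real algorithm is unspecified."""
    counts = [0] * bits
    for domain in visited_domains:
        # Hash each domain to a 64-bit integer.
        h = int.from_bytes(hashlib.sha256(domain.encode()).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    cohort = 0
    for i in range(bits):
        if counts[i] > 0:
            cohort |= 1 << i
    return cohort

# A single additional site in the history can flip bits of the cohort,
# so the cohort value leaks information about what you visited:
history = ["news.example", "shop.example"]
print(simhash_cohort(history))
print(simhash_cohort(history + ["health-condition.example"]))
```

The point of the sketch is that the cohort is a deterministic function of your browsing history, so anyone who can observe your cohort and knows (or probes) the mapping can make inferences about which sites fed into it.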