I mean, you can probably do something clever with this, like rainbow tables on fingerprints or something similarly probabilistic, so you never store individual fingerprints. Would be interesting to know what the solution is.
Sure, but any probabilistic approach is either relatively inaccurate, which reduces its usefulness for this use case, or accurate, which raises the same identifiability concerns cookies would introduce. I guess my point was: this has been thought about already. (Pseudo) anonymized attribution is a bit of a solved problem and you can do it with or without cookies. That's mostly an implementation detail rather than a distinguishing feature.
> (Pseudo) anonymized attribution is a bit of a solved problem and you can do it with or without cookies.
How is it typically done without cookies then?
> any probabilistic approach is either relatively inaccurate, which reduces its usefulness for this use case, or accurate, which raises the same identifiability concerns cookies would introduce
How so? Even if it's accurate, you wouldn't be storing any information (random ID or fingerprint) for the individual user, so you would only be able to answer with reasonable certainty whether you saw the user before or not. You can't identify anyone from that (other than as a new vs. returning user), so there is no identifiability concern, unless of course one thinks that constitutes a concern in itself, which I don't think the GDPR does.
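To make "only answer whether you saw the user before" concrete, here is a minimal sketch using a Bloom filter. That structure is my assumption, not something named above, and the class name, sizes and hash choice are all hypothetical. It stores only bits set by several hashes of the fingerprint, never the fingerprint itself, and it can answer "probably seen" or "definitely not seen". Whether that is anonymous enough in a legal sense is a separate question this sketch does not settle.

```python
import hashlib


class SeenBefore:
    """Tiny Bloom filter: 'probably seen' or 'definitely not seen'."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One shared bit array; individual fingerprints are never stored.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fingerprint: str):
        # Derive num_hashes bit positions by hashing the fingerprint
        # together with a counter prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{fingerprint}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint: str) -> None:
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_seen(self, fingerprint: str) -> bool:
        # False means definitely new; True may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


seen = SeenBefore()
seen.add("fingerprint-of-visitor-a")
print(seen.probably_seen("fingerprint-of-visitor-a"))  # returning visitor
print(seen.probably_seen("fingerprint-of-visitor-b"))  # almost certainly new
```

The trade-off from the parent comment shows up in the parameters: fewer bits means more false positives (less accurate counts), while more bits makes the "seen before" answer close to exact.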
Maybe we're talking past each other. Cookies are just a standardised way to communicate a small key/value set between client/browser and server through HTTP headers. They're not inherently (in)secure, sensitive, etc. There are zero things you can do with cookies that you cannot do without them, and there are no inherent differences in security; they're just very convenient if you're in HTTP world.
And yes what you said is exactly right; you're allowed to fingerprint a unique user and track data with that fingerprint as the sole unique identifier without any PII legislation (GDPR, CCPA, etc.) compliance issues. You just cannot store any information that allows linking PII data to that fingerprint in either direction. In other words, attribution to a random UUID that just happens to represent an anonymous user is not an issue.
Circling back to the original comment: there is no (good) argument against cookies if you're basically doing exactly what cookies are doing. Umami using it as a USP is, at best, a little odd.
> you're allowed to fingerprint a unique user and track data with that fingerprint as the sole unique identifier without any PII legislation (GDPR, CCPA, etc.) compliance issues.
I don't think this is correct, or at the least it's unfortunately phrased. If your fingerprint is so specific that it can distinguish unique users, it is covered under GDPR compliance. I don't know too much about the CCPA so not sure if it's the same there.
Yes, you are allowed to collect device statistics such as form factor, viewport size etc. But if you can distinguish between two different users with identical devices accessing your site at the same time, under GDPR you have an obligation to inform [14]. And if you can recognize a returning user across sessions, you also need consent.
If the random user ID is truly anonymous (so, cannot be linked back to an identifiable person even with other data you have), it is not personal data under GDPR and there is no obligation to inform or obtain consent. If the data processor stores any information that makes PII attribution possible then, and only then, does it fall under GDPR, CCPA, etc. That random ID being persisted on the device, allowing for subsequent attribution, is still not PII-sensitive unless/until the aforementioned identifiability barrier is breached. This is exactly why prominent analytics platforms (Plausible, Matomo, Mixpanel if configured correctly, etc.) all offer data hygiene barriers.
I suspect what's happening is that the word "user" is making things ambiguous here. It was meant in the sense of an attributable session, not the data subject in GDPR language, for example.
I don't know about Umami, but Plausible describes how they solved this here: https://plausible.io/data-policy, under the section "How we count unique users without cookies"
TL;DR: They derive an identifier from the IP address and User Agent using a hash, allowing them to have a tracking identifier without storing personal identifiers (the IP address).
They salt the values and compute id = hash(daily_salt + IP + UA). Then they remove those every 24 hours. I think it sounds like a perfectly reasonable solution.
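A minimal sketch of that formula, assuming SHA-256 as the hash and a random 16-byte salt (the comment above only says "hash" and "daily_salt", so both choices and the function names are mine, not Plausible's actual implementation):

```python
import hashlib
import secrets


def new_daily_salt() -> bytes:
    # Assumption: a fresh random salt is generated once per day and the
    # previous one is discarded, so old IDs cannot be recomputed.
    return secrets.token_bytes(16)


def visitor_id(daily_salt: bytes, ip: str, user_agent: str) -> str:
    # id = hash(daily_salt + IP + UA); only this digest is stored,
    # never the raw IP address or User Agent.
    h = hashlib.sha256()
    h.update(daily_salt)
    h.update(ip.encode())
    h.update(b"\x00")  # separator so ("1.2.3.4", "5UA") != ("1.2.3.45", "UA")
    h.update(user_agent.encode())
    return h.hexdigest()


today = new_daily_salt()
tomorrow = new_daily_salt()

# Same visitor, same day: the IDs match, so page views can be grouped.
print(visitor_id(today, "203.0.113.7", "ExampleBrowser/1.0")
      == visitor_id(today, "203.0.113.7", "ExampleBrowser/1.0"))

# Same visitor after the salt rotates: a different ID.
print(visitor_id(today, "203.0.113.7", "ExampleBrowser/1.0")
      == visitor_id(tomorrow, "203.0.113.7", "ExampleBrowser/1.0"))
```

Once the salt rotates, the same IP and User Agent map to an unrelated ID, so nothing links one day's visits to the next.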
If they remove those every 24 hours, then doesn’t that mean if I made two visits, separated by more than 24 hrs, it would count as 2 unique visits rather than 1?
Yes. I am still not aware of a way to track returning visitors while staying within some privacy framework. Unfortunately, on HN the default answer is to say not to track at all.
I was under the impression that this is the exact kind of thing that violates the GDPR. That is, processing an identifier (IP address) to do something more (track user actions across multiple requests) than what is required (route traffic to the server).