It does not really matter anymore how bulk usage data collection is called or whether it is "privacy-preserving".
Looking at the current developments in AI, I am concerned that AI models could easily de-anonymize and identify end users when fed "telemetry data" from hundreds of thousands of clients.
I hear and read a lot of software and hardware vendors saying that "telemetry" is supposed to somehow magically improve the user experience in the long run, but in actuality the software tends to get worse, less stable, and less useful.
So, I would like to know how exactly any telemetry data from Fedora Linux clients is going to help them, or how is it going to improve anything.
I disagree wholeheartedly. Privacy-preserving technologies, including privacy-preserving AI (e.g., federated learning, homomorphic encryption) and privacy-preserving data linkage/fusion, are really important. They're crucial in my day-to-day work in aviation safety, for example.
And telemetry is important. We have limited resources. How do we determine the number of users impacted by a bug or security vulnerability? Do we have a bug in our updater or localization? Are we maintaining code paths that aren't actually used? Telemetry doesn't magically improve user experience, but I'd rather make decisions based on real data rather than based on the squeakiest wheel in the bug tracker.
We can certainly make flawed decisions based on data, but I'd argue that we're more likely to make flawed decisions with no data.
> We can certainly make flawed decisions based on data, but I'd argue that we're more likely to make flawed decisions with no data.
What I've seen in practice so far is that the use of telemetry has harmed software quality more than helped. It often leads developers to optimize for the wrong things and make poor design decisions. This happens because they tend to think that "the data never lies", ignoring the fact that telemetry always gives a skewed and incomplete picture.
Do you have some specific examples of this playing out?
I’ve been a product manager for products that had no telemetry, and that can be a rather undesirable place to operate, especially if you’re in the enterprise space where product changes can impact the operations of businesses.
I think it’s certainly possible to focus on the wrong things, but I don’t see that as an outcome of telemetry itself as much as an outcome of a product team that doesn’t understand the problem space or customer base.
The attributes to capture are presumably based on what teams understand to be key indicators about their app/service. I think confident incorrectness armed with bad data is just a slightly different version of a complete lack of data. Such a team was operating on whatever they imagined to be important before, and they continue to do so after, albeit with greater conviction.
But good telemetry in the hands of a good product team can be immensely beneficial for decision making and can protect customers from bad decisions. Anecdotally, my ability to pull numbers about certain attributes has been key to my ability to shut down executive pressure to make changes that would have drastically impacted customers if not for the direct evidence that it would.
I’m also not claiming that downsides don’t exist, and privacy is always my primary concern, but there are a range of outcomes based on the maturity of a team/company, and as long as the PM understands that data is not an alternative to having a relationship with customers, I think data is pretty important.
> Do you have some specific examples of this playing out?
Honestly, I can't actually remember specific examples. It's not something I dwell on. But I know it's happened several times that software has been made much less useful to me because features have been removed on the basis of being rarely used, ignoring the fact that even though they're rarely needed, when they are needed, they're indispensable.
More often, though, the bad telemetry-based decisions I've seen are around UI changes. Things like a laser focus on reducing the number of clicks it takes to do something, even though sometimes reducing the number of clicks adversely impacts its usability.
For bad telemetry-based UI decisions, my standout example is Firefox, although that's hardly the only one.
> But good telemetry in the hands of a good product team can be immensely beneficial for decision making and can protect customers from bad decisions.
This was actually my point of view a few years back, when telemetry started to become popular. And, as a dev who sells software commercially, I totally understand the value on that side. My experience with products that have used it, though, has shifted my view.
All that said, I do agree that it's possible to use telemetry in a way that is good for users. But I don't think it's common, and I think the reason for that is economics and human nature.
Once you start measuring a thing, that tends to become a goal rather than just a data point. And since the industry is all about maximizing velocity, that effect is even stronger. Doing proper usability studies is a slow and expensive process. Telemetry can be a useful thing as part of that process, but the tendency is to make it pretty much the entire process. That does a disservice to everybody.
> the PM understands that data is not an alternative to having a relationship with customers, I think data is pretty important.
Not just the PM. The entire company. But a relationship should be consensual, not forced. I have zero issues with opt-in telemetry. When it's not opt-in, though, it's an invasion and adversarial. I presume that's not the sort of relationship a good PM wants.
I think this is a good view and explanation. At the end of the day though, the only thing that matters is what you pointed out:
> But a relationship should be consensual, not forced. I have zero issues with opt-in telemetry. When it's not opt-in, though, it's an invasion and adversarial.
That's it. No matter how many ways you dice it, collecting data without consent or forcing opt-out is an invasion. As more and more of our lives shift to being online, our privacy and our sense of autonomy in a digital world are increasingly paramount.
Ideally, yes. In practice, and especially in larger shops, it’s the PM’s job to own this relationship and to make sure the important players have this understanding.
> But a relationship should be consensual, not forced.
So, you use telemetry to figure out why planes are [nearly] crashing?
Do you work for Boeing or something?
When I've worked on mission-critical (so, in practice, safety-critical) systems, we made sure the probability of catching a failure in testing was 100x the chance of catching it in production.
Modern software development techniques like fault injection and fuzzing make this pretty easy to achieve.
Close enough. I work for MITRE and the FAA, leading our efforts to identify aviation safety hazards and also improve aeromedical certification, so I do work closely with the airlines, OEMs, unions, trade orgs, and other stakeholders.
We use de-identified voluntary safety reports filed by pilots, air traffic controllers, and others, along with flight telemetry data from the aircraft and other data to identify and study potential safety issues in the national airspace. Privacy-preserving techniques ensure that we can collaborate on safety and trust that the data stays non-attributional (and thus, non-punitive since participation is voluntary) despite competing interests.
We can't really do fault injection or fuzzing for real-world systems to understand, say, the impact of false low altitude alerts on risk of undesired aircraft states (e.g., controlled flight into terrain) at a certain airport.
I get your point, I really do. And it got me thinking. I suppose, you're right concerning bugs and safety aspects. But, usage pattern collection is a huge problem, not only privacy-wise. This is what I do not understand:
Why are there code paths nobody uses, that have to be maintained, in the first place?
Unused code paths were once used or it was thought that they would be used. If you don't know if something is used, it's safest to assume it is being used or is there for some reason (Chesterton's Fence). If it's being used, you need to maintain it lest there be a regression.
For example, let's say we implement support for a web standard or vendor prefix. We can mark it as deprecated.
But how do we know the code path is no longer in use? Are people still using this CSS property (e.g., a vendor prefix)? Are people still using gopher or this one configuration variable? The more configuration options you have, the more combinations you need to test and maintain.
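One common answer (not something spelled out in the Fedora proposal; this is just a generic illustration) is an aggregate "use counter": the code path bumps an anonymous counter whenever a deprecated feature is hit, so maintainers learn "this was triggered N times" without recording who triggered it. A minimal sketch, with hypothetical names:

    from collections import Counter

    # Aggregate counts only: no user ID, no timestamp, no session linkage.
    feature_use_counts = Counter()

    def record_feature_use(feature: str) -> None:
        feature_use_counts[feature] += 1

    def apply_css_property(name: str, value: str) -> None:
        # Hypothetical deprecated vendor prefix we are considering removing.
        if name.startswith("-oldvendor-"):
            record_feature_use("css:" + name)
        # ... normal property handling would continue here ...

    apply_css_property("-oldvendor-border-radius", "4px")
    print(feature_use_counts.most_common())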
I'm working with monitoring systems, and the thing I keep hearing is that people rely on having a 'number' but, when pressed, admit that the number isn't reliable.
Data doesn't always help! It can lead to assumptions, often really bad ones. And IBM isn't to be trusted with it.
If you read the comment thread for that proposal you'll see that the person writing the proposal doesn't care about such niceties and has a pretty loose definition of 'privacy preserving'.
> Do we have a bug in our updater or localization?
It can be done with error reporters like the "System program problem detected. Do you want to report the problem now?" popup in Ubuntu. In my experience, many users are willing to send error reports, and they're extremely useful, although 90% of reports are garbage.
You really don't need AI to do this. Collect enough data points and you can fingerprint basically anyone using very old fashioned techniques. AI doesn't really bring anything new to the table.
What AI can sometimes add is automatic feature extraction, rather than having a human explicitly think "I bet we can identify people via $x" and write bespoke code to do it. E.g., this has been the big deal with AI in medical diagnostics: it can look at the same data a very skilled human doctor sees and still manage to discover something the human couldn't, due to unnoticed features.
AI would need far fewer data points to achieve an acceptable result, and might even be capable of fingerpointing, not just fingerprinting. Just wait and see how the data broker bros start selling AI-assisted data mining for ads. Big tech is doing that already. It's mind-boggling how everyone seems to be just OK with that.
You're not wrong. While they do not yet have a list of metrics they'd like to collect (per their initial mailing list post [1]), it's stated as an idea in there:
> We also want to know how frequently panels in gnome-control-center are visited to determine which panels could be consolidated or removed, because there are other settings we want to add, but our usability research indicates that the current high quantity of settings panels already makes it difficult for users to find commonly-used settings.
Personally, I'd like to see more transparency in their usability research, because GNOME is best known for removing features, which is what they'd like to do this time around as well.
I agree that the usability research should be transparent and data-driven. As a statistician, I'd rather have something I can actually critique instead of "we just didn't think it was user-friendly."
Because they're very much the victims of web designers with the same mindset, and if anyone should recognize how unfair throwing minority user groups under the bus can be, it's them.
It also doesn't really make strategic sense to focus on the lowest common denominator. Chrome already has that group. The one place they could eke out a loyal userbase is specifically the users that Chrome fails to capture because they have unusual needs or requirements.
Actually when I think about it, it's even worse than this. Firefox has been on the receiving end of this type of discrimination more or less for as long as it's existed. It was the state of affairs when IE was the challenger too.
You have to be just mindbogglingly oblivious to not see how this has been one of their biggest problems the last 20 years.
They spent some of that budget on getting a freaking sneaker designer to make time-limited theme colo-- sorry, time-limited colorways and did a bunch of popups and cringe copy on it.
Meanwhile, Brave's vertical tabs were done by a single developer.
In addition, this kind of thinking will attrit some of those 5% of users, so repeated application will shrink their market share even further. It's the Excel problem: a huge number of features will each be used by only a minority of users, but a majority of users use at least one of those features.
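A rough illustration of that effect, assuming (unrealistically) that the niche features are used independently of one another:

    # 20 niche features, each used by only 5% of users:
    share_using_at_least_one = 1 - 0.95 ** 20
    print(round(share_using_at_least_one, 2))  # ~0.64: cut them all and you touch ~64% of users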
Did you read my comment? If we had to remove one of two features and one of them was used by 5% of users and one of them was used by 95% of users wouldn't you like to know which one is which?
Without data you can invest your time into the wrong features.
Good, unfortunately. Developer time is a zero-sum game. They're not saying "ah, we can just kick back and do less work then", they're saying "how do we allocate our limited resources to maximise benefit to users?". And when it's voluntary FOSS work, honestly, it isn't unreasonable if it is the former situation.
Differential privacy techniques are provably impossible to de-anonymize, if implemented correctly. Correct implementation is possible, but fraught with opportunity for error or manipulation.
This is the answer. The person above can speculate and fearmonger what magic "AI" is going to be able to do, but if there is no personal data there to begin with, or if you use math like in differential privacy, there's not going to be a way to identify individuals.
That is, if you suspect they'd change their minds and start trying to deanonymise previously collected data anyway — remember that open source distributions (I don't know fedora-the-organisation specifically) are generally made up of volunteers like you and me. Notable exceptions obviously exist, like for-profit Canonical; that's not the org type I mean or trust.
> if there is no personal data there to begin with
Which is a very large "if". There is a tendency for people to think that the only personal data consists of what legally counts as PII, when in fact there is much more personally identifying information than is covered in those definitions.
You need police and a judicial system and you fix them whenever they break. But you don't need telemetry, it's entirely optional and shoddy implementations translate into unnecessary risk.
Given it is open source software, I'd say we have not only recourse but self-determination here. It's also transparent what is being collected and whom it is being sent to. If you want to know what they store about you, it should take one email and a GDPR reference to find out.
I'm no fan of privacy invasion. I always bother clicking through the banner to find the reject-cookies option, and I've sent out plenty of GDPR requests, whereas almost nobody else I know has ever sent one. I'm not in favor of tracking, but when it comes to collecting anonymous statistics, especially when they open with "privacy-preserving" and the business is not Facebook or Google or the like, where we know shit is about to hit the fan, the cynicism and mistrust in this thread baffles me. Nobody minds when they browse the web and every site keeps access logs invisibly, but oh boy, if someone announces keeping a visitor counter for a configuration screen to see if people can find their way to it...
> it should take one email and a GDPR reference to find out.
I am not in Europe. The GDPR doesn't help me.
> collecting anonymous statistics, especially when they open with "privacy-preserving"
I am far from convinced that such statistics are gathered in a "privacy preserving" way, but that's neither here nor there.
> the business is not Facebook or Google or so where we know there's shit about to hit the fan
The problem is that you can't just trust the current devs. You also have to trust all future devs and companies that may buy the thing. It's not Facebook or Google now, but it could be in the future. And this is Fedora, which is connected to Red Hat, which is connected to IBM.
And it's also not just about privacy. It's also about impact on product development. It's not exactly rare that software has been made much worse as a result of decision-making based on telemetry data.
> Nobody minds when they browse the web and every site keeps access logs invisibly
No? I think quite a lot of people mind this. But there's nothing that can be done about that. It's still worth trying to keep everything from getting even worse, though.
That's not true. Differential privacy just says that the output of the ML model will not change much if you add or remove a particular user, so you can't reverse the training process to infer what training data a given user provided.
It says nothing about whether or not you can join the output of multiple ML models with other telemetry to build a deanonymization model.
Differential privacy satisfies a post-processing guarantee. It says that if you take the output of a differentially private process and do any amount of processing and combining with outside information, then you don't learn any more than you would have gotten with the outside information alone (up to epsilon).
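To make that concrete, here is a minimal sketch (purely illustrative, not anything from the Fedora proposal) of randomized response, one of the simplest differentially private mechanisms: each client lies with a calibrated probability, the aggregator inverts the known noise to get an accurate population estimate, and no post-processing of an individual report reveals more than the epsilon budget allows.

    import math
    import random

    def randomized_response(truth, epsilon):
        # Report the true bit with probability e^eps / (1 + e^eps), else flip it.
        p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        return truth if random.random() < p_truth else not truth

    def estimate_true_fraction(reports, epsilon):
        # E[observed] = p*f + (1-p)*(1-f); solve for f, the real share of "yes".
        p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
        observed = sum(reports) / len(reports)
        return (observed - (1.0 - p)) / (2.0 * p - 1.0)

    # Example: 100,000 users, 30% actually use some hypothetical feature, eps = 1.0
    random.seed(0)
    true_values = [random.random() < 0.30 for _ in range(100_000)]
    reports = [randomized_response(v, 1.0) for v in true_values]
    print(round(estimate_true_fraction(reports, 1.0), 3))  # close to 0.30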
While true, the requestor had never heard of differential privacy techniques and that is not what they were planning on implementing. This was brought up in the discussion thread.
> I am concerned that AI models can easily de-anonymize and guess end point users when being fed with "telemetry data" of hundreds of thousands clients.
You don't need AI for this. This is done by real humans right now, using data points correlated from multiple sources.
People keep saying "you don't need ai for this." Sure. But to do it at scale, and to intelligently connect disparate kinds of data contextually?
That's time-consuming and expensive without AI, so you can't do it at scale to a comprehensive degree. That hasn't been practical until now. It still isn't quite cost-effective to do this for every human, everywhere, but soon it will be. Give it 5-10 years.
That's definitely a legitimate fear, as seen with the AOL controversy [1], but if they're just collecting aggregate statistics it's much less of a risk. I.e.
User ANON-123 with default font x and locale y and screen resolution z installed package x1
is clearly a big hazard, but statistics on which fonts, locales, and resolutions are in use are not really. Even combinations to answer questions like "what screen resolutions and fonts are most used in $locale?" should be safe as long as the entropy is kept low. It is less useful, since you have to decide on your queries a priori rather than being able to do arbitrary queries on historical data, but ethics and safety > convenience.
Combine ANON-123 with information from their browser, which has default font x, locale y, screen resolution Z, and package x1, and that anonymous data just became much more rich.
It doesn't take very many bits of information to deanonymize someone once you start combining databases.
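For a sense of scale (back-of-the-envelope; the per-attribute figures below are made up for illustration): uniquely identifying one person among roughly eight billion takes only about 33 bits, and roughly independent attributes add their bits together.

    import math

    print(math.log2(8e9))  # ~32.9 bits singles out one person on Earth

    # Hypothetical, illustrative entropies for combined attributes:
    # locale ~5 bits + screen resolution ~4 bits + font set ~10 bits
    # + a few uncommon installed packages ~15 bits => ~34 bits combined
    print(5 + 4 + 10 + 15)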
> Looking at the current developments in AI, I am concerned that AI models can easily de-anonymize and guess end point users when being fed with "telemetry data" of hundreds of thousands clients.
I can almost guarantee you that the US government has a tool where you can input a few posts from a person on an anonymous network and get back all of their public profiles elsewhere. Fingerprinting tools beat all forms of VPNs and the like. Our privacy and anonymity died like maybe two years ago, there is no stopping it.
That may be true, but it doesn't mean there's no value in protecting your privacy from others anyway. Personally, I'm much more worried about private entities collecting information about me than I am about the government doing so.
Instead of collecting more data, why not do something with the data we already have? A quick look at the Fedora Bugzilla or the GNOME GitLab issues tab suggests the bottleneck doesn't lie in data collection, but in processing.
Apples and oranges. Bug reports are filed by a specific type of user and don't give a comprehensive view of all bugs. Statistics can also cover a lot more than bugs: "is the number of MIPS users proportional to the amount of extra effort we need to put in to support them?" is not a data point you'll find in Bugzilla or other tickets.
Adding to this, bug reporting in the Red Hat ecosystem is an extremely painful process. Multiple accounts, no feedback, mandatory detailed system information collection if you want to use the built in bug report (which also needs an account). No automatic aggregation, processing, or association with similar issues.
It's a black box with no incentive to participate unless you're one of those specific types of users that is dedicated enough to put up with all of that, or users that have never done it before and are trying hard to contribute back.
Yup, I absolutely think that should be fixed before they try this telemetry option. Anonymous bug reporting (just crash details) is probably more palatable than this proposal, but that would be more work on the bug-report side to aggregate and filter out garbage, and they want something that is minimal effort on their side (stated directly in the discussion).
Because management can impose these new data collection policies more easily than fixing known issues. It then gives them the potential to find new, easier work for the engineers to implement, thus making it seem like they are being effective. Meanwhile, it can be unclear how these metrics relate to overall software quality.
Some metrics like startup time and crash counts lead to clear improvement, while others like pointer heatmaps and even more invasive focus tracking are highly dubious in my opinion.
On a related note, I’m coming to the opinion that A/B testing is harder to pull off than many think. And serving a single user both A and B at any point can confuse them and get in the way of their trusting the consistency of the software. Much like how when you search for something twice and get different results in Apple Maps. OK, now I’m just ranting…
They moved to the CADT model twenty years ago, so the bug reports will never be read.
Now, with telemetry, they can say quantifiable things like "we've driven catastrophic root filesystem loss and permanent loss of network connectivity to 0% of installs!", and prioritize any contrary bug reports away in a data-driven, quantifiable way.
(Because, of course, weak telemetry signals are more valuable than actual humans taking the time to give you feedback on your product.)
> the bottleneck doesn't lie in data collection, but in processing
I created a bug report [1] for tigervnc-server in Fedora because the Fedora documentation [2] for setting up a VNC server didn't match any more what was coming from dnf.
In the bug report I provided the info that would need to be fixed in the documentation. Now after two months, seemingly nothing has been done to fix the situation.
If anything there often appears to be a negative correlation with increased data collection and product quality, in my experience.
I figure it must be due to an abdication of responsibility-- absent information, the product must at least appeal to someone working on it who is making decisions about what is good and what isn't, and so it will also appeal to people who share their preferences. But with the power of DATA we can design products for the 'average user' which can be a product that appeals to no single person at all!
Imagine that you were making shirts. To try to appeal to the most number of people, you make a shirt sized for the average person. But if the distribution of sizes is multimodal or skewed the mean may be a size that fits few or even absolutely no one. You would have done better picking a random person from the factory and making shirts that fit them.
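A toy version of the shirt example, with made-up numbers:

    # Half the users want size 36, half want size 52; nobody is near the mean.
    sizes = [36] * 500 + [52] * 500
    mean = sum(sizes) / len(sizes)
    print(mean)                                    # 44.0
    print(sum(abs(s - mean) <= 2 for s in sizes))  # 0 users are within two sizes of it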
When your problem has many dimensions like system functionality, the number of ways you can target an average but then fit no one as a result increases exponentially.
Pre-corporatized open source usually worked like fitting the random factory worker: developers made software that worked for them. It might not be great for everyone, but it was great for people with similar preferences. If it didn't fit you well you could use a different piece of software.
In corporatized open source, huge amounts of funding go into particular solutions, and they end up tightly integrated. Support for alternatives is defunded (or just eclipsed by better funded but soulless rivals). You might not want to use gnome, but if you use KDE, you may find fedora's display subsystem crashes out any time you let your monitor go to sleep, or you may find yourself unable to configure your network interfaces (to cite some real examples of problems my friends have experienced)-- you end up stuck spending your life essentially creating your own distribution, rather than saving the time that you hoped to save by running one made by someone else.
Of course, people doing product design aren't idiots, and some places make an effort to capture multimodality through things like targeting "personas"-- which are inevitably stereotyped, patronizing, and overly simplified (like assuming a teacher can't learn to use a command prompt or a bug tracker). Or through efforts like user studies, but these are almost always done with very unrepresentative users, people with nothing better to do than get paid $50 to try something out, and you learn only about the experience of people with no experience and no real commitment or purpose to their usage (driving you to make an obscenely dumbed down product). ... or by things like telemetry, which even at their best will fail to capture things like "I might not use the feature often, but it's a huge deal in the rare events I need it," or get distorted by the preferences of day-0 users, some large percentage of which will decide the whole thing isn't for them no matter what you do.
So why would non-idiots do things that don't have good results? As sibling posts note, people are responding to the incentives in their organizations which favor a lot of wheel spinning on stuff that produces interesting reports. People wisely apply their efforts towards their incentives-- their definition of a good result doesn't need to have much relation to any external definition of good.
> Optional telemetry, of course, but again - this creates a selective and unpredictable reality. Ordinary people don't care either way, and nerds will always make deliberate choices that often have nothing to do with product or profit or anything else entirely.
So, by that logic, users who opt out of telemetry are aware of why they are doing it. People who don't care share their chaotic usage patterns. This creates a false picture of usage reality, and makes software worse. In conclusion, there are only two remaining choices:
1. Make telemetry non-optional
2. Ditch telemetry and rely on QA studies
But then users who care about this issue will just block the application's ability to phone home, or use a different one that doesn't spy (my definition of spying is any time data about me or my machines is collected without my informed consent).
You just need to make the application complex enough that phoning home is convenient and nice-to-have functionality is deeply intertwined with tracking. Like what Google Chrome does.
> So, by that logic, users who opt out of telemetry are aware of why they are doing it.
Ambiguous: they're aware of why who is doing it? I opt out of telemetry because I don't know why they're doing it - the data collectors. I mean, I know why they say they're doing it, but I don't know if it's true.
I also don't want my computing resources and network bandwidth used to further a goal that I might not support. Even if the only reason for collecting data really is to "improve the product", perhaps that'll result in them making the product dependent on systemd, which from my POV would be an adverse outcome.
The reason they collect telemetry usually is the reason they say they do: to improve the product. If it's commercial software then, sure, the ulterior motive is to get more money out of you, but they're doing that by making you like the product more, so... great?
And if you're worried that your use case (e.g. not-systemd) will be deprecated, then that's a good reason to keep it on and be represented. I bet people who used Firefox's RSS feature were more likely than usual to turn off telemetry, yet that didn't save their feature. If anything, it may have made Mozilla more confident in removing it.
> The reason they collect telemetry usually is why they say they do - to improve the product.
Even if that's true, organizations like to keep this stuff for longer than they should and are not immune to hacks or changing business priorities (like AI). If there was serious liability attached to leaking or "off-label" use of this data then I'd feel better about it.
Also, using telemetry to justify removing features is not about making the product better but about cutting costs. How is Firefox better without RSS support? The fact that only a small number of users use it tells you nothing about how passionate or influential those users are, nor how such a change will affect wider perception of your products. See also: how pissed off many on HN still are about the death of Google Reader even though it was a "niche" product.
> Also, using telemetry to justify removing features is not about making the product better but about cutting costs. How is Firefox better without RSS support?
Maintaining stuff has a cost in time and/or money. Adding new features or fixing bugs has a cost in time and/or money. Time and/or money is limited, so decisions must be made on what is the best use of it. Removing features can (sometimes) result in a lot of savings in the form of a simplified codebase (which means improved developer productivity), reduced surface area for bugs, etc. I (really) hate when products remove features, and I think it's often done assuming a lot more savings than actually happens, but there is some logic to it.
I don't know what Firefox used the RSS-support savings to do, but if they used it to build containers (which I use constantly) then for me Firefox is much better. If they squandered that time, then of course it's not better off. But without having access to roadmaps and planning documents, it's difficult to know which tradeoffs were made.
That phrase can be viewed in many different ways, depending upon who the decision makers setting the goals are.
> If anything, it may have made Mozilla more confident in removing it ...
Of course. People know this, that's a good part of why people are against metrics.
It leads to dumbed down software, as there's always a 5% of features that can be cut (on every budget or dev iteration) to "focus more on the majority uses!".
If you deploy telemetry and go on to assume that it's giving you accurate information, that's a mistake. Blaming it on me for not letting them have their telemetry is victim-blaming.
Yes, you're right. I meant, users who choose to opt out are more competent users (or just more paranoid) and might sooner understand what telemetry means and/or does. Usually, and I am assuming, competent software users tend to have much more unique and specialized usage patterns than the average joe.
As someone who's used Fedora for 20 years (since before it existed, ie: RH6), this decision would be a show stopper. No, Fedora, you're doing just fine without telemetry! If you try to force it, you'll lose a ton of loyal users. Find out whoever is pushing this idiocy and fire them, quick.
Maybe Debian testing?
Although I don't think this is as big of a deal, as sending the data is still opt-in: it collects the data by default but doesn't send it.
Debian has opt-in minimal telemetry: popularity-contest, which submits installed packages and when they were (approximately) last used. It also has lots of other privacy issues inherited from upstream open source projects.
I think telemetry collection is a symptom of deeper organizational issues.
For instance, I've never worked with a competent release manager who said "we need more field telemetry!"
Instead, the good ones invariably want improved data mining of the bugtracker, and want to increase the percentage of regression bugs that are caught in automated testing. They also generally want to increase the percentage of automated test failures that are root-caused.
> I can speak as a GNOME developer—though not on behalf of the GNOME project as a community—and say: GNOME has not been “fine” without telemetry. It’s really, really hard to get actionable feedback out of users, especially in the free and open source software community, because the typical feedback is either “don’t change anything ever” or it comes with strings attached. Figuring out how people use the system, and integrate that information in the design, development, and testing loop is extremely hard without metrics of some form. Even understanding whether or not a class of optimisations can be enabled without breaking the machines of a certain amount of users is basically impossible: you can’t do a user survey for that.
The problem being that inevitably the 'improvement' gets measured through the same telemetry figures which are being optimised, so of course it's perceived by developers as helping them improve things.
To add to the point: Alphabet probably has more data than any other company (except FB, I suppose) and they still can't release a good product that people will actually use, no matter how much data they have.
There's this anecdote (which might have been made up entirely to make a point):
Restaurant management wanted to compare different soup offerings by counting orders of each soup to determine which ones were more popular. They selected the two most popular offerings; the rest were scrapped in order to save money on ingredients. Soon after, not only did the order numbers of those two soup offerings drop, the total number of soup orders dropped. How come? Well, maybe nobody had asked the customers if the offered soup was tasty at all. A quick survey revealed that customers make the popular choice, find out it's crap, and then never order soup again in that restaurant, or in rarer cases give the other one a shot. It turned out the most popular offering was basically cheap crap nobody wanted to eat, and when there's nothing else, they keep ordering the same, or never visit that restaurant again.
Telemetry doesn't tell you anything about user preferences. Whoever is selling you that idea doesn't either.
Alphabet markets two linux-based operating systems for consumer devices that are extremely popular. Telemetry from the field makes android and chromeos better operating systems.
I don't think the fact that Google has two popular OSes with telemetry means that telemetry makes them better OSes. For Android, their only real competitor is iOS. Their big advantage there is cost and the reason there isn't really another competitor is the difficulty of creating an OS which requires a number of resources. For ChromeOS, the situation is similar with regards to the time to create an OS. I think there, their main competitors are small Linux distributions, but, in addition to manpower, Google has the money to procure and sell laptops with their software already on them at scale. So, I'm not convinced telemetry is actually creating a better product, their products happen to have telemetry.
This is HN, so you are allowed to hold forth out of pure ignorance if you want to. Android-wide and ChromeOS-wide profiling produce binaries, including the Linux kernel, that are peak-optimized for actual conditions in the field and real use cases. They ship the only profile-optimized Linux kernels you can get from any distribution. It is a demonstrably better product through telemetry.
Got a link to any information on this, especially the collection of the profile information from user's devices? The only reference I can see to PGO on android is the support they have for a more traditional flow (create special instrumented binary, run a 'representative' workload or two, get profile data for PGO). It would be especially interesting if there's any info on how much of a performance improvement this yielded.
Annoying. I have used Fedora for many years now and I hadn't planned to stop. Even with the slim chance that they don't go through with this, it tells me that they have lost their bearings. Oh well, it will be nostalgic doing a bit of distro hopping again.
Where are you planning to distro hop to? I find Fedora has a special balance between bleeding edge and stable that no other distribution I tried achieves.
That's what I was going to suggest; the SUSE people do good work. If Fedora becomes something I want to leave... that's where I'm going.
I don't really need or want this pressure on Fedora -- 'the premiere desktop... blah, barf'. This is how we got Canonical and their brazen licensing
I'm happy enough with it basically being RHEL-next. Package the misc things (GNOME, KDE, Sway, etc) with the appropriate SELinux policies and bam, a decent desktop OS.
One minor quality-of-life suggestion that I found with SUSE, is after you install things make a soft link from their "zypper" package manager utility to "zyp".
That way, you can just type "zyp ..." for stuff which is pretty easy, whereas "zypper" I always found to be more of a pain (for typing). "zypper" seems to attract typos, for me anyway. ;)
You could do the same thing with an alias too I guess... :)
> if I want a simple installer where everything just works
I actually feel like the archinstall[0] tool included in the official Arch ISOs really nails easy installation. It's an official way to install Arch that is incredibly user friendly and fast, in my opinion.
thank you for sharing this. one of my hesitations around Arch was the effort to set it up -- I'm too old and too busy to deal with doing it all by hand.
I ran Gentoo back in college, did LFS once -- all good learning experiences -- but Fedora or Ubuntu can get me a usable system in 20 minutes and I don't have to think too hard about anything except basic partitioning and a password.
This looks like it'll get me something working, but not commit me to a full-on Manjaro install.
Absolutely! Archinstall is so nice to just get a system up and running quickly. And not many installers (i.e. on other distros) give you so many options for desktop environments. archinstall is a great tool!
It's so tedious every time people ascribe some action of Fedora or Red Hat to IBM. This is nothing to do with IBM. Red Hat can and does make these decisions all by itself.
But these things happened after IBM acquired Red Hat and IBM certainly has the power to instruct Red Hat to steer clear from any actions that might turn the FOSS community against them.
I work at Red Hat and interact daily with the people making the decisions and it has nothing to do with IBM! In fact they run Red Hat (surprisingly) hands-off. And this particular proposal comes from GNOME who have been making terrible decisions long before IBM acquired Red Hat.
Then maybe IBM should get involved, but in a positive way to let it be known that they risk brand and image damage from these stupid moves and kindly ask for them to stop.
You're being somewhat annoying in this thread; that's the second time that you've purposefully misinterpreted a comment. I really don't understand what it is you're trying to achieve, but if it is of any help: you are making RedHat look pretty bad and insensitive to end-user concerns.
Really, given that "One of the main goals of metrics collection is to analyze whether Red Hat is achieving its goal to make Fedora Workstation the premier developer platform for cloud software development." is right there in the text, it seems pretty clear to me that the 'Fedora Community' (insofar as one exists) is just Red Hat in disguise serving its own interests here, at arm's length to see what the response is. Think of it as a trial balloon which will certainly be followed by more clicks of the ratchet.
That's pretty disingenuous. Are we supposed to pretend that IBM bought redhat to do exactly nothing with it? It's not exactly far fetched to assume that redhat's owners influence redhat. We can't prove that every decision is directly coming from IBM but the point is moot. Redhat is IBM.
> Are we supposed to pretend that IBM bought redhat to do exactly nothing with it?
They bought Red Hat because Red Hat was profitable, the stock was going up, sales were in the early stages of a massive swing upward, and IBM was dropping stock price and waning in market power. It's the same playbook they've been using for decades.
It seems more far fetched to me to believe that IBM bought Red Hat so they could destroy it.
I actually work at Red Hat and IBM are very hands off. I'm quite surprised myself in fact, as I predicted that IBM would absorb Red Hat in one way or another. However you believe whatever you want.
>I actually work at Red Hat and IBM are very hands off.
Are you the CEO or within the decision-making structure? Otherwise it really just sounds like you aren't in on the backchannel where these instructions are issued. Oh, and inb4 "There is no backchannel!": IBM is subject to Sarbanes–Oxley; there's a backchannel.
Do you really think IBM cares enough about Fedora to mandate the insertion of telemetry, possibly upsetting everybody at Red Hat that works on it?
It seems so much more likely to me that the Fedora team just wants telemetry data (which is very standard in the industry) for decision making, exactly as they said.
> Do you really think IBM cares enough about Fedora to mandate the insertion of telemetry, possibly upsetting everybody at Red Hat that works on it?
Yes, I think that's well within the realm of possibility. IBM bought Red Hat for some reason, so they obviously care quite a lot.
I'm not saying they're actually doing this, of course. I don't know. But we're talking about large corporations here, so suspicion seems warranted.
> It seems so much more likely to me that the Fedora team just wants telemetry data (which is very standard in the industry) for decision making, exactly as they said.
I think this is true! Both things can be true.
Standard in the industry, though? No. It's standard in the commercial software world (which is one of the reasons why I avoid commercial software), but it's not standard in the FOSS world.
The reason why I object so strongly to these sorts of moves from Fedora (and other such outfits) is because there's clearly a strong effort being made to normalize this in the FOSS world, and I think that would be tragic.
Does 'Red Hat' actually exist as a separate entity?
Is there any 'relative autonomy' entity for Red Hat, like a board of directors or a council of any kind. Or is Red Hat simply a department in a larger structure?
This question is not meant to be rhetorical or challenging.
As someone who currently works for Red Hat, yes. Red Hat is more than just a "department within a larger structure", it retains a separate CEO (who reports to IBM, yes), a separate human resources department, separate marketing events, we (apart from maybe a few dozen people) do not have IBM email addresses nor do we use their internal systems or vendors (except for e.g. Employee Stock Purchase Plan). There is almost no interaction between Red Hat and IBM employees apart from the highest levels.
Most of the decisions/changes which people attribute to IBM fall into one of a few categories:
* What tends to happen to any company when it doubles / triples in size to >20,000 employees, as Red Hat did from 2017 to 2023.
* Leadership shuffles, such as when Jim was replaced by Paul when Jim became President of IBM. (This happened at the same time as the acquisition, but just because one CEO made choices differently than another might have doesn't make IBM directly responsible for those decisions).
* The rise of Amazon, Google, Microsoft / PaaS and SaaS / containers sucking a lot of air out of the traditional Red Hat market segment. RHEL is still an important cornerstone of the company, but remaining a successful company 10 years from now will require finding additional niches and creating value in ways that are not as vulnerable to entities 100x our size.
I'm struggling to think of any company, anywhere, that has that kind of arrangement after an acquisition. Can you point one out?
>Some services not merged (yet).
No services have been merged as far as I know, other than the ones that are literally impossible to not merge, such as the employee stock purchase plan. Please do not twist what I said.
Health insurance is separate, 401k is separate, IT systems are separate. As far as I know the expense system is separate (I haven't needed to use it in years, and I'm on vacation so I can't check).
>Redundancies in a landscape that is changing.
What is even the point of saying this? You asked if there was separation and then dismiss evidence of separation as "redundancies" [that won't last]. I can't predict the future, but the way HN talks about Red Hat you would think the entity known as Red Hat no longer exists (or only barely so). The separation has remained constant for 4 years so far. Not sure what else to say.
I'm at a point now where I think this has just reached "conspiracy" levels. No amount of reasonable evidence could dissuade the people who think that IBM is micromanaging Red Hat, to the point where they are forcing (over the objection of the people who work on Fedora) the addition of some sort of privacy preserving telemetry to the product. When you get to the point of "all the Red Hat people that say Red Hat is independent are lying! They don't know what really goes on, but I (completely unaffiliated and viewing from a distance) know the truth!" I don't think there's anything more you can do.
Sorry this probably did seem like it was implicitly referring to you, but it wasn't my intention at all. I find you very reasonable :-)
It's kind of the culmination of dozens of threads on Hacker News over the last few years and increasing frustration on my end. Mainly the frustration is because I have criticisms of Red Hat, but every conversation seems to jump straight to some variation of IBM and it drowns out the (IMHO) reasonable discussion, or it's (rarely but still happens) a Red Hat person who doesn't think a single decision they've made is bad.
It wasn't my intention to twist your words at all so I apologise for not communicating effectively. What you refer to as separation obviously exists on a practical level that you experience day to day.
My reference to redundancies arose from my own experience (at massively smaller scale) when organisations merge. Duplicated functions are removed over time.
I think I've said enough for this topic and I hope you enjoy the rest of your holiday.
Do you think it's some giant conspiracy that everyone who works (or worked, in my case) at Red Hat has to go around claiming autonomy while secretly having their puppet strings pulled from IBM?
No. That is why I asked the poster above about structures that provide relative autonomy.
For instance, in this particular case, the proposal by the Red Hat Display Systems Team may well be considered by the Fedora Engineering Steering Committee and the Fedora Council which sounds like what I mean by 'relative autonomy'.
However it appears that Red Hat itself does not have any 'corporate body' that is distinct from IBM.
I suppose I disagree that a distinct corporate body is required for autonomy, but I would definitely agree that without such the risk of the autonomy disappearing in the future is ever-present. I think Red Hat has tons of autonomy now, but at some point IBM could start changing that. It's also possible they have, but having worked at Red Hat and knowing many of the people there, I would expect a lot of screaming, whistleblowing, and resigning should such a thing ever happen. Red Hat is a remarkably "speak your mind" culture, and people do even though it sometimes starts shit storms (such as when people call out the CEO for something publicly on the company-wide mailing list).
> One of the main goals of metrics collection is to analyze whether Red Hat is achieving its goal to make Fedora Workstation the premier developer platform for cloud software development. Accordingly, we want to know things like which IDEs are most popular among our users, and which runtimes are used to create containers using Toolbx.
How is that any of their business? Usually there is a lot more misdirection in what potential benefits telemetry will yield. Not, “We would really like to know what software is running in their machine, there is probably something we can do with it”
I find it hard not to be cynical about this after everything IBM has been doing. I wonder if this is an attempt to identify exactly which organizations are using Fedora in order to upsell or do another CentOS-like rug pull?
Luckily, OpenSuse Tumbleweed looks to be a pretty good alternative to Fedora. There’s even an immutable version of it, like Silverblue!
> We believe an open source community can ethically collect limited aggregate data on how its software is used
For me the big question is why? Proprietary software needs telemetry because the user is not in control and features are only added by the owner of the software, thus the centralized owner needs to know what features to add.
Open source is different. It is decentralized. Anybody can tweak the system to make it better for themselves. In addition, as opposed to proprietary software which sells licenses, the most common open source monetary model seems to be selling support in which case the people buying support can ask for the feature without telemetry.
To put it another way, you need telemetry for a cathedral since decisions are made centrally. A bazaar doesn’t need telemetry, since decisions are decentralized.
>Anybody can tweak the system to make it better for themselves.
99% of users just use whatever default packages their OS gives them. Practically none of them are digging into the code and making changes.
>as opposed to proprietary software which sells licenses, the most common open source monetary model seems to be selling support in which case the people buying support can ask for the feature without telemetry.
Proprietary software that is sold is in the business of creating products with enough value that people are willing to pay actual money for them. It is smart to look at what these businesses are doing, because they are incentivized to make good software, so they get to be efficient about doing so. Meanwhile, most open source programs are given out for free and do not care about user experience or providing features for the mass audience.
Asking for features publicly is just one, biased signal. There is more to telemetry, like seeing which features are actually used. Which investments should be doubled down on. Which features should be made clearer to use. Which crashes are the worst. Which parts of your program need to be optimized. Etc.
> For me the big question is why? Proprietary software needs telemetry because the user is not in control and features are only added by the owner of the software, thus the centralized owner needs to know what features to add.
Open source projects can use telemetry to prioritize bug fixes and feature enhancements. You can't fix every bug, but with telemetry you can focus development on the issues and features that affect the most users. Here are some use cases that come to mind (with a rough sketch of the first one after the list).
AVX-512 is complicated; there are 21 extensions. What percent of users have some AVX-512 support and which extensions are popular?
Which buggy devices, clients, and applications are worth adding workarounds for?
Which protocol settings can we safely turn on by default?
What are the most popular GPUs? Where would we get the largest benefit from performance improvements, testing, and bug fixes?
What percentage of users are affected by a new Intel CPU bug? How should we prioritize developing a workaround? Does it need to be deployed this week or this month?
Which GNOME applications and configuration settings are frequently used?
What are the most common display resolutions?
What are the most popular installed packages? Should we add them to our base system?
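As a sketch of the first item (purely illustrative, not anything from the proposal): a hardware-capability metric could boil down to reading which AVX-512 flags the kernel exposes and counting them in aggregate.

    # Rough sketch: which AVX-512 extensions does the local CPU report?
    # (Flag names are a subset; only the first CPU's "flags" line is read.)
    AVX512_FLAGS = {"avx512f", "avx512dq", "avx512cd", "avx512bw", "avx512vl",
                    "avx512ifma", "avx512vbmi", "avx512_vnni"}

    def local_avx512_flags(path="/proc/cpuinfo"):
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return sorted(flags & AVX512_FLAGS)
        return []

    print(local_avx512_flags())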
You may have answered your own question. Perhaps centralization is the point as we move closer to the “extinguish” phase of open source as we have known it.
This makes no sense. To elaborate: It's obvious that my comment referred to RedHat. IBM owns RedHat and I'm sure they can influence this sort of thing but they should realize that RedHat is where it is because it is a company that in the past could be relied upon not to do such stupid things. Telemetry doesn't serve the users, and the kind of users that are customers of RedHat will certainly not appreciate this sort of bait-and-switch and may well leave for other platforms. IBM won't lose any business on account of this but they're burning up value that they paid good money for when they acquired RedHat.
Fedora has a very close relationship with RedHat, far more so than other community projects and RedHat is in turn owned by IBM. To all intents and purposes the Fedora community is on a leash. Many of the Fedora maintainers are RedHat employees. There are financial and infrastructure ties as well.
Their legal page even refers to 'Red Hat’s Fedora Project' so as far as I'm concerned they're one and the same and can be dealt with as such. I'm not sure I am following your cleverness, it might be more productive to spell it out, and when making quotes in French you should at least try to do it right ;).
They're in business because they eschew observability of their software? Seems like a poor strategy. I'll take the company which is actually paying attention to how their software is used in order to improve it, and getting prompt notification of bugs
> Opt-in telemetry is garbage. I'm going to stop responding to comments that are requesting opt-in because I've made my position clear: users who opt-in are not a representative sample, and that opt-in data will not be accurate or useful.
Accurately summarised to "Fuck off dickheads, your privacy is getting in the way of us doing development!".
=== What data might we collect? ===
We are not proposing to collect any [...] particular metrics
just yet, because a process for Fedora community approval of
metrics to be collected does not yet exist. That said, in the
interests of maximum transparency, we wish to give you an idea
of what sorts of metrics we might propose to collect in the
future.
One of the main goals of metrics collection is to analyze
whether Red Hat is achieving its goal to make Fedora Workstation
the premier developer platform for cloud software development.
Accordingly, we want to know things like which IDEs are most
popular among our users, and which runtimes are used to create
containers using Toolbx.
Metrics can also be used to inform user interface design
decisions. For example, we want to collect the clickthrough
rate of the recommended software banners in GNOME Software to
assess which banners are actually useful to users. We also want
to know how frequently panels in gnome-control-center are
visited to determine which panels could be consolidated or
removed, because there are other settings we want to add, but
our usability research indicates that the current high quantity
of settings panels already makes it difficult for users
to find commonly-used settings.
Metrics can help us understand the hardware we should be
optimizing Fedora for. For example, our boot performance on hard
drives dropped drastically when systemd-readahead was removed.
Ubuntu has maintained its own readahead implementation, but
Fedora does not because we assume that not many users use Fedora
on hard drives. It would be nice to collect a metric that
indicates whether primary storage is a solid state drive or a
hard disk, so we can see actual hard drive usage instead of
guessing. We would also want to collect hardware information
that would be useful for collaboration with hardware vendors
(such as Lenovo), such as laptop model ID.
Other Fedora teams may have other metrics they wish to collect.
For example, Fedora localization wishes to count users of
particular locales to evaluate which locales are in poorer shape
relative to their usage.
This is only a small sample of what we might want to know; no
doubt other community members can think of many more interesting
data points to collect.
That last piece "no doubt other community members can think of many more interesting data points
to collect" sounds pretty bad for telemetry that's enabled by default, with people having to opt out of it. :(
> A new metrics collection setting will be added to the privacy page in
gnome-initial-setup and also to the privacy page in
gnome-control-center. This setting will be a toggle that will enable
or disable metrics collection for the entire system. We want to ensure
that metrics are never submitted to Fedora without the user's
knowledge and consent, so the underlying setting will be off by
default in order to ensure metrics upload is not unexpectedly turned
on when upgrading from an older version of Fedora. However, we also
want to ensure that the data we collect is meaningful, so
gnome-initial-setup will default to displaying the toggle as enabled,
even though the underlying setting will initially be disabled. (The
underlying setting will not actually be enabled until the user
finishes the privacy page, to ensure users have the opportunity to
disable the setting before any data is uploaded.) This is to ensure
the system is opt-out, not opt-in. This is essential because we know
that opt-in metrics are not very useful. Few users would opt in, and
these users would not be representative of Fedora users as a whole. We
are not interested in opt-in metrics.
Those last three lines essentially add insult to injury. I have no problem with telemetry; I do have a problem with telemetry being on by default, as it puts the burden of safeguarding privacy squarely on the user. As an individual you have to make sure the toggle is set to "off" yourself, and then trust that telemetry really is switched off.
Back in February, the Go project proposed adding opt-out telemetry to the Go toolchain, and they were almost torn in half over a weekend.
> This is to ensure the system is opt-out, not opt-in.
It totally fails to ensure that. The underlying setting is off, but if you don't take deliberate action on installation, it gets switched on. It might as well be on unless you act to switch it off.
The meaning of "opt-in" was settled a long time ago; it means that if you don't take deliberate (and informed) action to opt-in, then you haven't opted in.
[Edit] Sorry, mis-read parent comment. They're trying to say it's opt-in, but the proposal is to make an opt-out UI, with an underlying off setting that gets set to $SOMETHING at installation, defaulting to on.
Personally, I feel very differently about collection of static information (like the CPU model, how much memory I have, or whether I'm still using a slow rotating HDD) versus collection of dynamic information (which tabs of the settings app I use most, whether I actually use GNOME Software at all, etc.). The former mostly tells how much money I have to spend on cool toys, and not much more than that; the latter feels as if someone is constantly looking over my shoulder, which makes me uncomfortable.
>> Yet another Linux distribution is becoming adware.
IBM / RH Salesperson: "But think of the opportunity! GNOME Software could be a REAL app store[0] just like the Apple App Store or Google Play and bring the Linux desktop into the 21st century. We could even call the payments 'donations' so that users feel good about paying. We will need Flatpak integration to allow less technical users to get the software we are moving to community support, like LibreOffice. With payment account setup and processing, we could easily add commercial software and offer vendors more customers with minimal effort on their part."
"Once we get telemetry enabled and users setup their online accounts in GNOME Online Accounts[1][2], we can make deals with the online account providers for targeted advertising metrics. We won't show any ads in GNOME for now. But after users are used to seeing some promotions for Red Hat products and services, it is just a slight change to show other promotions."
Ah yes, always hunting for reasons to remove functionality from GNOME. I wonder if they would even try to collect data on how many users enable options that aren't accessible from the GUI, or use extensions to add back basic functionality.
> For example, we want to collect the clickthrough
rate of the recommended software banners in GNOME Software to
assess which banners are actually useful to users.
OK, so they need telemetry to measure the effectiveness of their ads. I think the rest is padding.
There's no such thing as privacy-preserving telemetry. How do you retrieve the telemetry from the device? Via networking, right? BAM, that's an IP address leak which is PII. We don't need to go any further than that.
They need to give this more consideration if their answer is to tie it to gnome-initial-setup... unless I misunderstand and this is purely a GNOME-cooked thing.
I fear what that means for where the preference is saved, and how spins (or users who simply choose not to have GNOME) can feasibly opt out.
Where's the demarcation? Is this some dconf thing that a timer will read, a service, or what?
I lack trust in their handling of certain matters. For example, every Fedora device 'phones home' for AP checks:
I see people complaining about the instinctive reactions to the word "telemetry", but those reactions are justified. People have them for very good reasons, even with this specific proposal. If you read the discussion post, the following becomes clear:
* The proposer has clearly not done any research on how to actually collect anonymous data (they'd never heard of differential privacy, for example)
* They want a plug-and-play solution (they specifically say they don't want to do more work than that)
* They are not open to discussing privacy regulations such as GDPR
* They are not willing to bend on the most contentious points of their proposal
* The system they want to use collects invasive metrics that can be de-anonymized and has only been used by a niche distribution
Because the de-anonymization bit might not be clear, let me summarize some of the things that the Endless OS metrics collect:
* Country
* Location based on IP address to within 1 degree lat/long
* Your specific hardware profile
* A daily report that includes your hardware profile, along with the number of times check-ins have occurred in the past
* Detailed program usage (every start / stop)
* An unspecified series of additional metrics that can be sent from anywhere else on the system via a dbus interface
Additionally, this proposal explicitly wants to collect:
* What packages and versions of such are installed
* Specific application usage metrics (the example they give is the gnome settings panel)
They discard the IP address, but how hard do you think it is to differentiate users based on the combination of hardware profile, location to within +/- 1 degree, and their specific set of packages (especially since the package manager already knows the history of installs and uninstalls)? The proposal doesn't meet its stated intention of being anonymous, and the proposer clearly understands that users don't want this but believes their desire for the metrics overrides the end users' desire not to be tracked.
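To make the linkage risk concrete, here is a toy sketch with synthetic data (nothing like Fedora's actual schema or user numbers, which are assumptions here) showing how individually coarse attributes combine into a near-unique fingerprint:

```python
# Toy de-anonymization sketch: synthetic users, coarse attributes.
import random
from collections import Counter

random.seed(0)
N = 100_000

def user():
    hw = f"laptop-{random.randint(0, 199)}"                   # hardware model ID
    loc = (random.randint(35, 60), random.randint(-10, 30))   # 1-degree lat/long bucket
    pkgs = frozenset(random.sample(range(5000), 40))           # a handful of "unusual" packages
    return hw, loc, pkgs

population = [user() for _ in range(N)]

def unique_fraction(keys):
    """Fraction of users who are alone in their attribute bucket."""
    counts = Counter(keys)
    return sum(1 for c in counts.values() if c == 1) / N

print("hw model only:           ", unique_fraction(u[0] for u in population))
print("hw model + location:     ", unique_fraction((u[0], u[1]) for u in population))
print("hw + location + packages:", unique_fraction(population))
```

With just a hardware model and a 1-degree location bucket, a large share of the synthetic users are already alone in their bucket; adding the installed-package set makes essentially everyone unique, IP address or not.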
This approach is a half-baked idea and an unforced error. If I were contributing to another distro, I would say I was building a general differential-privacy and zk-SNARK library and accompanying services stack that developers could use for whatever they found interesting. Then, once it had some burn-in, I'd launch a limited beta where it was trustworthy enough that participants could get useful data from the rest of the cluster without exposing the other participants to risk.
Maybe we need a participatory privacy stack that produces valuable anonymous data and also contributes to it. You might be able to do it with homomorphic arithmetic that increments defined counters (like the hash of a package or version), and we already have distributed ledgers for collecting and distributing the data. Queries could be done with differential privacy and zk-SNARKs.
It's not a viable product, because people who actually use data want the raw data (that discretion is power to them), but as a tool for coordinating a cooperative effort we need to build something new and say: this is how we do things now.
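To make the idea less abstract, here is a minimal sketch of one building block such a stack could use: randomized response, a textbook local differential-privacy mechanism. This is purely illustrative and not Fedora's or Endless's design; the "HDD usage" question, the epsilon value, and the population numbers are all assumptions.

```python
# Randomized response: each client lies about a yes/no fact with a known
# probability, so no single report can be trusted, yet the aggregate count
# can still be estimated accurately by the collector.
import math
import random

def client_report(truth: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if random.random() < p_truth else not truth

def estimate_true_count(reports: list, epsilon: float) -> float:
    """Invert the known noise rate to estimate how many clients really said 'yes'."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    yes = sum(reports)
    n = len(reports)
    return (yes - n * (1 - p)) / (2 * p - 1)

if __name__ == "__main__":
    random.seed(1)
    n, epsilon = 100_000, 1.0
    truths = [random.random() < 0.12 for _ in range(n)]   # assume 12% really use HDDs
    reports = [client_report(t, epsilon) for t in truths]
    print("actual:   ", sum(truths))
    print("estimated:", round(estimate_true_count(reports, epsilon)))
```

Because each individual report is deliberately noisy, the server learns a usable aggregate without being able to trust, or de-anonymize, any single client's answer.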
The detailed proposal is pretty comprehensive. I'm confident that if Fedora does go through with this then it will be well designed.
A few years ago Ars Technica had a site redesign. When it first came out, it didn't have a dark mode, and the comments on the article announcing the change (after the design had been rolled out) were full of people who were upset with the lack of dark mode. It turns out that many of the subscribers and power users who used dark mode had adblock and the like that blocked metrics collection, leaving the Ars web designers with the impression that basically nobody used it. Since then I've opted in to telemetry in programs I trust to be good stewards of that data, so that I can at least do a little to ensure that whatever weird setup I might be using, like dark mode, is seen as relevant to the project.
But that does bring up the point that just relying on telemetry doesn't always present an accurate picture of what's going on with all of your users. Probably the best way to see what your users are doing would be a combination of telemetry (to see what the average user is doing), surveys (to see what the enthusiastic user who self-selects into completing the survey is doing), and user studies (for specific design decisions). I'd like to see what Fedora's policy for data collection in the last two categories is and if they'll integrate all of it together for more comprehensive decisions.
Also, I'm not surprised by the instinctive reactions that people have to the word "telemetry" in the headline, but the proposal is really well done. It addresses many of the complaints I see in these comments, like if it should be opt-in or opt-out, if it's legal under GDPR ("Fedora Legal has determined that if we collect any personally-identifiable data, the entire metrics system must be opt-in. Since we are only interested in opt-out metrics due to the low value of opt-in metrics, we must accordingly never collect any personally-identifiable data."), and the like. I think that this proposal was done by a person or team that genuinely sat down and thought through what a real privacy preserving telemetry implementation would look like, rather than the typical corporate claim "it's totally privacy preserving (also our product is not available in the EU)!!!"
It's open source, so I'm just going to modify the telemetry program to send nonsense and publish my changes on shithub so anyone else who wants to monkey-wrench Fedora's telemetry can do so (even if they don't actually run Fedora).
This solves the problem only for you, and for those users who are aware of the problem, have enough understanding to know that it is a problem, and also have the technical know-how to use your published code. What percentage of the user base does that cover? Are you OK with implicitly recommending Red Hat/Fedora (by using it) when you know that most people will get caught by this?
That's assuming that gnu8's telemetry spoofing program only generates one machine's worth of spoofed telemetry data. If he thinks bigger than that (generate data from thousands of fake machines, launder the traffic through residential IP proxies, etc), he could be the source of the majority of the Fedora telemetry data. That would give him the ability to make Red Hat engineers do his bidding, which seems like a fun power to have.
The world in general and open source in particular depends on bad actors like yourself being rooted out and chased away. Nothing to do with the specific issue here.
I’m not the bad actor, those who want to add telemetry to their open source software are. To the extent that jamming telemetry systems with false or inaccurate data discourages Fedora and others from adding telemetry to their software, I am making a positive contribution to OSS.
Stop trying to treat my computers like they are a part of your test lab.
I think in principle this is a good idea; getting actual usage patterns, not just from a loud minority, seems very useful to the people making the OS. As long as they don't collect egregious stuff, I'm very okay with this.
I'm strongly in favour of removing all telemetry and studies except when they're opt-in. Then I'm always going to opt in. I would like the population being optimized for to lean more towards me.
Ideally, the only statistics you have are from me and my Sybils, and all of society's energy is dedicated to improving life for me.
I'm not sure but it can be changed to anything the user wants:
> all of the components of the server (discussed below) are open source, and we will provide instructions for how to run a simple server yourself and view its metrics database. You can redirect metrics from Fedora’s server to your own by changing a URL in a configuration file.
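For what it's worth, the "run a simple server yourself" part really can be trivial. Here is a minimal illustrative sink; it is not Fedora's actual server software, and the port, endpoint path, and payload format are all assumptions:

```python
# Minimal local metrics sink: accepts POSTed submissions and appends them to a
# file so you can inspect exactly what your machine would have uploaded.
from http.server import BaseHTTPRequestHandler, HTTPServer

class MetricsSink(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with open("metrics.log", "ab") as f:
            f.write(body + b"\n")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Point the client's configured upload URL at http://localhost:8080/
    # instead of Fedora's server (the config file name/key is not specified here).
    HTTPServer(("127.0.0.1", 8080), MetricsSink).serve_forever()
```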
I'm fine with these changes as long as they are transparent.
So, as long as they use a specific, dedicated (and known) FQDN, I can just block the whole telemetry by adding an entry for it in the /etc/hosts file, can't I?
And you need to remember to do that every time, on every host that runs Fedora. In practice, after a while you'll stop doing that. That's the beauty of opt-out: eventually nobody opts out anymore.
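For concreteness, the override being discussed amounts to a single hosts-file entry. A sketch of adding one follows, assuming a purely hypothetical hostname, since the proposal hasn't named a final endpoint:

```python
# Sketch only: the hostname below is a made-up placeholder, not Fedora's real
# telemetry endpoint. Appends a null-route entry to /etc/hosts (run as root).
from pathlib import Path

TELEMETRY_HOST = "metrics.example.fedoraproject.org"  # hypothetical placeholder
HOSTS = Path("/etc/hosts")
entry = f"0.0.0.0 {TELEMETRY_HOST}\n"

text = HOSTS.read_text()
if TELEMETRY_HOST not in text:
    HOSTS.write_text(text + entry)
    print(f"blocked {TELEMETRY_HOST}")
else:
    print(f"{TELEMETRY_HOST} already present in {HOSTS}")
```

As the comment above notes, the catch is remembering to do this on every installation; that friction is exactly what opt-out relies on.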
Whether it is useful or not skates pretty close to a utilitarian argument, which has little to do with privacy: it is quite possible to violate privacy and get some use out of it, but it shouldn't happen anyway.
The privacy-respecting part here seems to be handled carefully. I'm not sure whether it is opt-in or not (in my opinion it should be), and the payloads appear to be benign, but that could still go off the rails if there is a bug in ABRT.
In principle I want my machines (desktops, servers) only to initiate network calls and to respond to network calls that I allow. Outbound firewall rules are there for a reason.
> I wish they didn't use the name "telemetry" for this; it's going to produce a lot of instinctive reactions from people that don't lend themselves to a productive conversation.
It's a perfectly accurate description of what they're proposing. Even if they called it something else, people would immediately recognise it for what it is (and then be even more upset at the weasel-wording).