MDD - Opt-in vs Opt-out

I tend to be distracted; not for nothing do I sometimes fall when jumping.

2 Likes

Been eating catnip again, eh? :stuck_out_tongue:

2 Likes

Let me explain why data collection is useful for this project and for the Linux ecosystem in general. When I ask someone for funding, or for support for an application on an ecosystem other than Windows, one of the first questions from that person or company is almost always:

  • How many users have your system installed?
  • What is the monthly usage of your OS?
  • In which region is your OS used?
  • What kind of hardware is used to run your OS?

Those questions are always very hard to answer when you don’t know anything about how your product, in this case Manjaro Linux, is used. If you start guessing, you will almost certainly lose any potential deal you are trying to win, or the support they could otherwise provide.

So let’s break down the metrics you can find on our dashboard. I will share screenshots to illustrate them here. The data is collected from our test users, who participate of their own free will to improve the tooling, and doesn’t represent the current real usage of Manjaro Linux. We hope to roll out into production soon, so that more of our community members and users of Manjaro Linux can share their data and we get a better picture of how our OS is used.

Manjaro Devices / Installation counting and monthly usage

image

This graph records how many users have executed mdd.py so far. Thanks to the unique ID, it counts a machine only once, even if you executed mdd.py multiple times on that day. When you run it again the next day, or at any later point in time, you are in effect saying: yes, I’m still using Manjaro. This is then recorded in the monthly usage graph.

The data needed for those graphs is the following:

mdd --disable-telemetry
Welcome to MDD - The Manjaro Data Donor
Preparing data submission...

------------------------------------------
        Sending the following data
------------------------------------------
{
    "meta": {
        "version": 1,
        "timestamp": "2024-11-13T03:37:43.667169+00:00",
        "device_id": "4c695e2c-b62c-50df-872f-7618d1d1b7f5",
        "distro_id": "manjaro"
    }
}
------------------------------------------

Succesful sent at 2024-11-13 10:37:44
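To illustrate the counting described above, here is a minimal sketch of how the two graphs could be derived from the `meta` block alone. This is not the actual dashboard code; the function name `count_devices` and the idea of feeding it a plain list of submissions are assumptions made for the example.

```python
from datetime import datetime


def count_devices(submissions):
    """Return (total unique devices, {"YYYY-MM": unique devices seen that month}).

    Hypothetical helper: only reads meta.device_id and meta.timestamp,
    the two fields shown in the minimal submission above.
    """
    all_devices = set()
    monthly = {}
    for sub in submissions:
        meta = sub["meta"]
        device_id = meta["device_id"]
        month = datetime.fromisoformat(meta["timestamp"]).strftime("%Y-%m")
        # Sets ignore repeated IDs, so running mdd several times
        # still counts as one machine.
        all_devices.add(device_id)
        monthly.setdefault(month, set()).add(device_id)
    return len(all_devices), {m: len(ids) for m, ids in monthly.items()}
```

A device that reports twice in November and once in December is counted once in the installation total and once in each month's usage figure.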

The other method also sends telemetry data, like this:

mdd
Welcome to MDD - The Manjaro Data Donor
Preparing data submission...

------------------------------------------
        Sending the following data
------------------------------------------
{
    "meta": {
        "version": 1,
        "timestamp": "2024-11-13T03:40:23.900203+00:00",
        "device_id": "4c695e2c-b62c-50df-872f-7618d1d1b7f5",
        "distro_id": "manjaro",
        "release": "24.1.2",
        "inxi": true
    },
    "system": {
        "kernel": "6.6.60-1-MANJARO",
        "form_factor": "laptop",
        "install_date": "2024-05-05T09:46:53+00:00",
        "product_name": "HERO-RPL-RTX",
        "product_family": "RPL",
        "sys_vendor": "SLIMBOOK",
        "board_name": "HERO-RPL-RTX"
    },
    "boot": {
        "uefi": true,
        "uptime_seconds": 66502
    },
    "cpu": {
        "arch": "x86_64",
        "model": "13th Gen Intel Core i7-13620H",
        "cores": 10,
        "threads": 16
    },
    "memory": {
        "ram_gb": 15.36336898803711,
        "swap_gb": 16.901763916015625
    },
    "graphics": {
        "comp": "xfwm4",
        "dri": "iris",
        "gpus": [
            {
                "vendor": "AIstone Global",
                "model": "Intel Raptor Lake-P [UHD Graphics]",
                "driver": "i915"
            },
            {
                "vendor": "AIstone Global",
                "model": "NVIDIA AD107M [GeForce RTX 4060 Max-Q / Mobile]",
                "driver": "nvidia"
            }
        ],
        "outputs": [
            {
                "model": "BOE Display 0x0b40",
                "res": "2560x1440",
                "refresh": 165.0,
                "dpi": 189.0,
                "size": "344x194",
                "mapped": "eDP-1"
            }
        ]
    },
    "audio": {
        "servers": [
            {
                "name": "PipeWire",
                "active": true
            }
        ]
    },
    "disk": {
        "disks": [
            {
                "size_gb": 465.7617416381836,
                "root": {
                    "size_gb": 448.56261587142944,
                    "fstype": "ext4",
                    "crypt": false
                },
                "home": null
            }
        ],
        "windows": false
    },
    "locale": {
        "region": "en_US.UTF-8",
        "language": "en",
        "timezone": "Asia/Ho_Chi_Minh"
    },
    "package": {
        "last_update": "2024-11-12T12:19:08+07:00",
        "branch": "unstable",
        "pkgs": 1871,
        "foreign_pkgs": 28,
        "pkgs_update_pending": 4,
        "flatpaks": 0,
        "pacman_mirrors": {
            "total": 7,
            "ok": 7,
            "country_config": ""
        }
    },
    "desktop": {
        "cli": "/bin/bash",
        "gui": "Xfce",
        "dm": "LightDM",
        "wm": "xfwm4",
        "display": "x11",
        "display_with": "Xwayland"
    }
}
------------------------------------------

Succesful sent at 2024-11-13 10:40:28

Let’s break down each data set sent that way:

System information

    "system": {
        "kernel": "6.6.60-1-MANJARO",
        "form_factor": "laptop",
        "install_date": "2024-05-05T09:46:53+00:00",
        "product_name": "HERO-RPL-RTX",
        "product_family": "RPL",
        "sys_vendor": "SLIMBOOK",
        "board_name": "HERO-RPL-RTX"
    },

Here we have the kernel in use, the form factor of your device, the date Manjaro was installed on the device, plus some manufacturer data. You will find this data in the following graphs:

image

CPU & Memory

    "cpu": {
        "arch": "x86_64",
        "model": "13th Gen Intel Core i7-13620H",
        "cores": 10,
        "threads": 16
    },
    "memory": {
        "ram_gb": 15.36336898803711,
        "swap_gb": 16.901763916015625
    },

Here we collect basic data about the CPU used in your system and how much memory you have installed.

image

Here we see that we have more AMD CPUs and a tendency towards CPUs with higher core counts on Manjaro systems.

image
The installed RAM shows us that most systems have a mid-range 16 GB to 32 GB installed.
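The reported `ram_gb` is the usable amount (15.36 in the example), not the nominal module size, so a graph has to group values somehow. One possible way, purely as an assumption and not necessarily what the dashboard does, is to round up to the next power of two. The helper name `ram_bucket_gb` is made up for this sketch.

```python
import math


def ram_bucket_gb(ram_gb):
    """Round usable RAM (in GB) up to the next power of two.

    Assumed bucketing rule for illustration only. It would put a
    24 GB machine into the 32 GB bucket, which may not be wanted.
    """
    if ram_gb <= 0:
        return 0
    return 2 ** math.ceil(math.log2(ram_gb))


print(ram_bucket_gb(15.36336898803711))  # prints 16
```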

Graphics, Outputs and Desktop

    "graphics": {
        "comp": "xfwm4",
        "dri": "iris",
        "gpus": [
            {
                "vendor": "AIstone Global",
                "model": "Intel Raptor Lake-P [UHD Graphics]",
                "driver": "i915"
            },
            {
                "vendor": "AIstone Global",
                "model": "NVIDIA AD107M [GeForce RTX 4060 Max-Q / Mobile]",
                "driver": "nvidia"
            }
        ],
        "outputs": [
            {
                "model": "BOE Display 0x0b40",
                "res": "2560x1440",
                "refresh": 165.0,
                "dpi": 189.0,
                "size": "344x194",
                "mapped": "eDP-1"
            }
        ]
    },
    "desktop": {
        "cli": "/bin/bash",
        "gui": "Xfce",
        "dm": "LightDM",
        "wm": "xfwm4",
        "display": "x11",
        "display_with": "Xwayland"
    }

This gives us an idea of which types of GPUs are used and how.

image
We see that X11 is still important and 1080p is the most used resolution so far.

image
AMD GPUs are the most dominant, followed by Intel and Nvidia. And those who participated so far are using Plasma Desktop.

Location and Language

    "locale": {
        "region": "en_US.UTF-8",
        "language": "en",
        "timezone": "Asia/Ho_Chi_Minh"
    },

With this we can find out roughly which area you are from.

image
People tend to use their region setting and timezone, language may differ and is a preference of the user.
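As a small sketch of how the `region` value could be split into the parts a graph would group by, here is a parser for the usual `language_COUNTRY.encoding` form. The function name `parse_locale` is an assumption for this example; it is not taken from mdd.

```python
def parse_locale(region):
    """Split a value like 'en_US.UTF-8' into (language, country, encoding).

    Missing parts are returned as None. A trailing modifier such as
    '@euro' is dropped before splitting.
    """
    base = region.partition("@")[0]
    base, _, encoding = base.partition(".")
    language, _, country = base.partition("_")
    return language, country or None, encoding or None


print(parse_locale("en_US.UTF-8"))  # prints ('en', 'US', 'UTF-8')
```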

Package usage

    "package": {
        "last_update": "2024-11-12T12:19:08+07:00",
        "branch": "unstable",
        "pkgs": 1871,
        "foreign_pkgs": 28,
        "pkgs_update_pending": 4,
        "flatpaks": 0,
        "pacman_mirrors": {
            "total": 7,
            "ok": 7,
            "country_config": ""
        }
    },

This information shows how often a system gets updated, which branch is used, how many packages are installed and whether additional package types like flatpaks or snaps are used.

image
So far most are using the stable branch, but also testing and unstable branches are used.


We now know how this data is presented in the graphs and why it is useful for the project and for us as a Linux community.

In a week we have roughly 2,526,932 Manjaro users. About 500.000 are online at the same time. Only 88,118 out of them 29.800 have a forum account and only 516 of them were active in the last 24 hours, 2.3k in the last 30 days. So with mdd properly implemented, we also give the others a voice, without the need to use other services such as the forum or the instant messaging chats we also offer.

All in all we want to improve this project and concentrate on things which matter.

When Canonical introduced their Ubuntu Report tool with 18.04 release some interesting data was shared with the community:

  • 66% of Ubuntu users chose to share data with Canonical
  • 98% of all Ubuntu 18.04 installs were 64-bit
  • 12% of users chose the ‘minimal’ install option
  • 91% chose to download and update software whilst installing Ubuntu

Sadly I can’t find any live statistics from Canonical anymore to know what the case would be today. However, Canonical shared the following in 2023 about their user base:

  • Ubuntu Desktop has more than 6 million monthly active users (based on devices checking for desktop-specific updates and not including those behind a corporate firewall or proxy).
  • Ubuntu Desktop is by far the most popular Linux distribution for developers (~27% in the 2023 Stack Overflow developer survey).
  • Ubuntu Desktop is the most used desktop Linux distribution for gaming (when you include older LTS and interim releases grouped inside the ‘Other category’ on the Steam hardware survey).
11 Likes

I think that might be one of those grey areas that nobody cares too much about for the sake of statistics, generally. But certainly, statistics are useless unless they are accurate (within a margin of error), and that requires as many people as possible participating.

Invalid statistics → less financial support → less innovation.

Everyone is affected by this in some way, in today’s world.

2 Likes

If I want to purchase a product, whether online or in a brick and mortar store, I weigh several things before I pull out my credit card.

  • Is this a product I want enough to give out my personal information?
  • Is this a merchant I trust to properly use my personal information (my credit card number)?

Sometimes the answers are not so easy, sometimes there is conflict, and often there are no hard and fast yes or no answers. Take for example my decision to use Gmail. It’s a product that I have found useful; it does exactly what I want to do and it does it for the most part really well. However, I’m not overly trusting of Google and what they may or may not be doing with all the information that passes back and forth in my emails. But I like using the product enough that I’m willing to turn a blind eye to that uneasiness.

Microsnot is a different story. Their products have become useless to me, insulting almost. There is a feeling I get of an arrogance for the user, one of “I’m taking your data whether you like it or not; I’m not telling you what I’m taking and I will use it however I feel like using it in order to make as much profit off you as I possibly can”. I don’t care for their product, I will not use it and I have no trust or respect whatsoever for Microsnot. I would do whatever I can to keep my data from this kind of company.

Manjaro is something else. I’ve come to really like this OS. The forum has been truly helpful as I stumble and bumble along trying to learn whatever I can. Updates roll along and if there are problems found, the actual Manjaro developers themselves get directly involved to fix the troubles. In the handful of years that I have had Manjaro running on my computers I have had very positive feelings for the product and I have come to trust the “company” that offers it. And none of this is contradicted or changed in any way by @philm 's explanation of what the data collection is for. To me, it’s quite clear that this data will be used to improve Manjaro, both financially and technically. That being said, I would also take the side of using opt-out over opt-in because, first and foremost, I trust the Manjaro developers to use the captured data to improve the product and more data makes for better data.

I know that this topic has been discussed at times quite heatedly and so I want to make it clear that my post is made not to start any more arguments but merely to offer my perspective and my opinion.

4 Likes

At least for me, the amount of data you are planning to collect seems alright and reasonable, no doubt about that. But it does not change the fact that I strictly oppose data collection without explicitly given consent. And opt-out does not reflect explicitly given consent for me.

I also understand the benefits of data collection for development and (let’s call it) marketing of a product.

2 Likes

Opt-out will definitely burn you, as a distro and probably legally as well. Opt-in is the only real option. The question then is, how.

Again. Make a little desktop app the user can open to see these nice statistics. Tie access to this data to the user enabling data sharing in the app (which then activates the MDD service). During installation, make a nice slide show with pictures inviting the user to enable MDD to be able to see these stats at any time.

EDIT: Later… extend the app to add personal system/hardware recommendations. It might sound silly, but marketing throughout history has shown that even a little trinket can drive people to participate.

5 Likes

So did you guys have in mind to run this telemetry daily, weekly, monthly or once a year? Obviously not one time only, if I’m reading your statement correctly.

And how will this report be triggered? Automatically or manually… there are still a lot of open questions.

This is ideal: having it be an opt-in option where you explain that this data is useful to Manjaro and give the end user the ability to view it. Very few people would take issue with this.
The key problem is forcing this to be Opt-out. I have willingly submitted to the Steam hardware surveys, because they asked nicely and, in my mind, they have earned my trust. I have been happily using Manjaro as my main OS since abandoning Windows, but forcefully taking data about my PC would make me switch to another distro.
A polite request whilst offering something extra in return for the data collection is far better optics. Even if it is just a pop-up that appears after updating that defaults to the Yes option.

2 Likes

While probably a lot of work, I think this is actually a great idea. :+1:


In a way, there is something to be said — at least, if the user gives their consent — in favor of running it after each major update, regardless of the branch one is on. After all, due to the nature of Manjaro as a curated rolling-release distribution, unmaintained systems are not supported and cannot be relied upon.

On the other hand, excluding unmaintained systems from the statistics may or may not be useful in effectively furthering Manjaro’s success, and the support from hardware vendors and third-party software vendors — the latter being of no interest to myself personally, as an advocate of Free & Open Source Software.

As a suggestion — and merely that — I would therefore recommend running mdd once a month or possibly every fortnight, but no sooner than that. We should also try to avoid too large a volume of submitted statistics (and the associated traffic) so as to not skew the results and make it harder to weed out the junk data, as has obviously been submitted already during the completely voluntary trial run this week.
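Just to make the suggestion concrete, the decision "has the user opted in, and has at least a month passed since the last submission" could be sketched as below. The names `should_submit` and `SUBMIT_INTERVAL` and the 30-day value are assumptions for illustration; nothing here is taken from mdd itself.

```python
from datetime import datetime, timedelta, timezone

# Assumed interval for "once a month"; a fortnight would be timedelta(days=14).
SUBMIT_INTERVAL = timedelta(days=30)


def should_submit(opted_in, last_sent, now, interval=SUBMIT_INTERVAL):
    """Decide whether a submission is due.

    opted_in:  True only if the user explicitly gave consent.
    last_sent: datetime of the previous submission, or None if never sent.
    now:       current datetime (same timezone awareness as last_sent).
    """
    if not opted_in:
        return False  # without consent nothing is ever sent
    if last_sent is None:
        return True  # first submission after opting in
    return now - last_sent >= interval


now = datetime(2024, 11, 13, tzinfo=timezone.utc)
print(should_submit(True, now - timedelta(days=45), now))  # prints True
```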

I also still firmly believe in the viability of making all this opt-in — as is legally required within the EU — as long as Manjaro’s honesty, transparency and integrity are sufficiently highlighted in the ultimately GUI-based version of mdd.

As to how it will ultimately be implemented at the GUI level — but now I’m wandering off-topic again — there is of course the difficulty presented by the fact that there are so many different desktop environments.

As a Plasma user, I would propose a kcm module, but given that the Manjaro Settings Manager module and the systemd module have yet to be ported to Plasma 6, I can already foresee the difficulties for the team.

So, a standalone application, added to the Autostart of each of the XDG-compliant desktop environments — and with a self-disabling function if the user does not opt in — may be the way to go for now.
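A rough sketch of the self-disabling part, assuming the XDG autostart mechanism is used: a desktop entry with `Hidden=true` is ignored by compliant desktop environments, so the entry could simply be written that way until the user opts in. The function name `autostart_entry` and the file location `~/.config/autostart/mdd.desktop` are hypothetical choices for this example.

```python
def autostart_entry(opted_in):
    """Return the text of an XDG autostart desktop entry for mdd.

    Hypothetical sketch: the entry stays Hidden (and is therefore
    skipped at login) unless the user has opted in.
    """
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        "Name=Manjaro Data Donor",
        "Exec=mdd",
        f"Hidden={'false' if opted_in else 'true'}",
    ]
    return "\n".join(lines) + "\n"


# The result would be written to e.g. ~/.config/autostart/mdd.desktop (assumed path).
print(autostart_entry(opted_in=False))
```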

But as I said, the technical aspects of how to implement it are off-topic for now. I do however like @SyMutex’ idea.

2 Likes

What happened to the public dashboard?

1 Like

Welcome to the forum! :vulcan_salute:

The dashboard has temporarily been paused, because a few jokers thought it was funny to corrupt the data uploaded by mdd with invalid information, such as non-existing filesystems, etc.

The server-side code will have to be adapted to weed out nonsensical information from the results.
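As an illustration only, and not the actual server code, a plausibility check on an incoming submission could look something like this. The allowlist `KNOWN_FSTYPES`, the function name `validate_submission` and the specific rules are all assumptions; the field names are the ones visible in the example output earlier in the thread.

```python
# Assumed allowlist; a real one would need to be maintained by the team.
KNOWN_FSTYPES = {"ext4", "btrfs", "xfs", "f2fs", "zfs"}


def validate_submission(payload):
    """Return a list of problems; an empty list means nothing looked off.

    Sections that are absent are skipped, so a minimal meta-only
    submission passes.
    """
    problems = []

    cpu = payload.get("cpu")
    if isinstance(cpu, dict):
        cores, threads = cpu.get("cores"), cpu.get("threads")
        if not (isinstance(cores, int) and isinstance(threads, int)
                and 1 <= cores <= threads):
            problems.append("cpu: implausible cores/threads")
    elif cpu is not None:
        problems.append("cpu: not an object")

    disk = payload.get("disk")
    if isinstance(disk, dict):
        for entry in disk.get("disks") or []:
            root = entry.get("root") if isinstance(entry, dict) else None
            fstype = root.get("fstype") if isinstance(root, dict) else None
            if not isinstance(fstype, str) or fstype not in KNOWN_FSTYPES:
                problems.append(f"disk: unknown filesystem {fstype!r}")
    elif disk is not None:
        problems.append("disk: not an object")

    return problems
```

Submissions with a non-empty problem list could then be dropped or set aside for review before they reach the graphs.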

4 Likes

a few jokers? That’s a very polite assumption…

I do get the feeling that they aren’t jokers - but ex-forum members, joined by a small minority of Arch users, who have nothing better to do than to pursue their vendettas, pushing the agenda heavily via reddit and Youtube (Brodie joined the club with a large headline shouting ‘Manjaro OPT OUT telemetry’).

This has to be the #2 bug in Linux: a large proportion of its community are utter @55H0Les

3 Likes

I wonder what the take on Fedora F42 will be when their metrics get implemented …

3 Likes

But it should also be a reminder that the first step in infosec is to rigorously scrub input for validity. Particularly in open source. The fact that this wasn’t happening is perhaps greater cause for concern than the opt-in/out arguments.

Who knew anyone would read the source and modify it towards their own ends? :crazy_face: Miscreants are gonna miscreate… Or something.

3 Likes

OT:

Have you ever thought of moving to Proton Mail, it gives you everything Gmail does, and privacy as well.

1 Like

Well, this takes it in a more technical direction, but the vulnerability in this case lies in the fact that mdd is a Python script. This makes it very transparent, but also very easy to modify/sabotage. And in order to avoid that, a compiled language such as C/C++, Rust or even Pascal would have been a better choice.

The problem with that, however, is that we’re then probably going to see even greater levels of paranoia and drama, because then they’d be looking at a binary tool, and then they would have to scrutinize the code via GitLab, GitHub, SourceForge, or some other platform.

Again, I’ve already briefly alluded to it higher up the thread, but a few days ago, we had a headless chicken running around the forum in a frenzy, refusing to hear the explanations given by multiple people, refusing to look at the data on the dashboard page, refusing to look at the code, flagging every single reply they got on their thread as “inappropriate” — I kid you not :man_facepalming: — and then starting a separate thread in which they were telling everyone on the forum to stop using Manjaro because “it is spyware.”

All I can say is this: whatever you do, somebody somewhere isn’t going to like it. And if anything, it is also quite evident that some people have quite a few more serious bugs going on between their keyboard and their chair than inside their computer housing.

:man_shrugging:

3 Likes

Yes, a while ago I did check into that possibility. For sure, there are features I like but one of the things about Gmail is the cost – nothing. (Well, I suppose the value of my privacy is worth something…) The thought of the cost and the time and energy of converting about a dozen Gmail boxes linked and inter-linked over to Proton made me put it off to the side. The relatively limited storage given to a free Proton account as compared to Gmail’s also would squeeze me a bit. I haven’t completely given up on Proton but until I get control of my packrat tendencies, I’ll stay status quo. Thanks a lot for the suggestion though!

1 Like

Is it still viewable? I missed it.

:crazy_face:

1 Like

For the exercise (not expecting it to be used) I actually did mock up a data-collection interface. It was not a lot of work: a few days ago I ripped a day out of my calendar, and I reworked some of the details yesterday.

3 Likes