I spent the last few days working on a new Manjaro project: Manjaro Data Donor, or MDD for short.
It is a way for us to gather a few usage statistics about Manjaro.
The initial motivation was to improve our user counting. Until now, we counted systems via ping.manjaro.org. These pings are sent from Manjaro systems by NetworkManager.
There were some problems with this approach though:
- Individual systems were only distinguished on the basis of their IP address. This doesn’t allow statistics over time, and systems behind the same NAT are counted as one. Also, one needs to store the IP address at least for a short period of time. The analysis software used for that was Matomo, and they promise that IP addresses are masked, but we still had to rely on this promise.
- Matomo is a rather bulky tool and wasn’t really made for system telemetry. It is meant for website analysis. The setup was therefore kind of hacky, the results were rather meager, and the data was only available to a few people.
- Using NetworkManager pings to check the online status for user counting is acceptable, but it is not what they were meant for, nor was it communicated as such. I think it’s better to be explicit and transparent about these kinds of things.
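To make the first point concrete, here is a small Python sketch with made-up ping records. It is only an illustration: the addresses are documentation-range examples, and the `system_id` column is a hypothetical per-installation identifier that I assume just for the comparison. It does not describe how MDD actually tells systems apart.

```python
from collections import defaultdict

# Hypothetical ping log: (day, public_ip, system_id).
# system_id is an assumed per-installation identifier, for illustration only.
PINGS = [
    ("2024-01-01", "203.0.113.7", "sys-a"),
    ("2024-01-01", "203.0.113.7", "sys-b"),   # behind the same NAT as sys-a
    ("2024-01-01", "198.51.100.4", "sys-c"),
    ("2024-01-02", "203.0.113.99", "sys-a"),  # sys-a came back with a new IP
    ("2024-01-02", "198.51.100.4", "sys-c"),
]


def distinct_per_day(pings, column):
    """Map each day to the set of distinct values seen in the given column."""
    seen = defaultdict(set)
    for row in pings:
        seen[row[0]].add(row[column])
    return dict(seen)


ips = distinct_per_day(PINGS, 1)
ids = distinct_per_day(PINGS, 2)

# Daily counts: the two systems behind one NAT collapse into a single IP.
count_by_ip = {day: len(values) for day, values in ips.items()}
count_by_id = {day: len(values) for day, values in ids.items()}

# "Returning" systems: keys that show up on both days.
returning_by_ip = len(ips["2024-01-01"] & ips["2024-01-02"])
returning_by_id = len(ids["2024-01-01"] & ids["2024-01-02"])

print(count_by_ip, count_by_id, returning_by_ip, returning_by_id)
```

On the first day the IP-based count sees two systems where there are three, and across both days it recognizes only one returning system instead of two, because one system changed its address.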
I have wanted to improve on that for quite some time. MDD is now the tool for that, and for a bit more: it will also provide interesting hardware and environment statistics about the systems that Manjaro is being used on.
How to Help
You can install the tool simply as a package: sudo pacman -S mdd
It’s a simple Python script. You can check out the source code on GitHub. For the actual data retrieval it mostly uses the excellent inxi internally. Shoutout to @h2-1!
If you can, please try it out and tell me about any bugs you encounter. In the preliminary tests we did, it works fine on x86 most of the time, but ARM is not yet handled correctly.
Your data will be sent to a ClickHouse database I deployed on one of our Hetzner servers (Nuremberg, Germany). After the testing is done I’ll delete all data again.
Before sending the data, you can dry-run the tool to see what would be sent: mdd --dry-run
If you’re fine with the data being transmitted, then just run MDD again, this time without any arguments, to submit the data: mdd
If you encounter bugs, then run mdd --log DEBUG to get additional logs in your terminal.
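If you prefer to script the preview-then-submit steps, a minimal Python sketch could look like the following. It only uses the invocations mentioned above (`mdd --dry-run`, `mdd`, `mdd --log DEBUG`). The helper names `mdd_command` and `preview_then_submit` are made up for this example, and I am assuming that `--log` can be combined with `--dry-run`, which you should verify yourself.

```python
import shutil
import subprocess


def mdd_command(dry_run=False, log_level=None):
    """Build the argument list for an mdd call from the flags named in this post."""
    cmd = ["mdd"]
    if dry_run:
        cmd.append("--dry-run")
    if log_level:
        # Assumption: --log may be combined with --dry-run.
        cmd += ["--log", log_level]
    return cmd


def preview_then_submit(ask=input):
    """Show the dry-run output first, then submit only after confirmation."""
    if shutil.which("mdd") is None:
        raise FileNotFoundError("mdd not found; install it with: sudo pacman -S mdd")
    # Step 1: print what would be sent, without sending anything.
    subprocess.run(mdd_command(dry_run=True), check=True)
    # Step 2: ask before transmitting.
    if ask("Submit this data? [y/N] ").strip().lower() != "y":
        return False
    # Step 3: run without arguments to actually submit.
    subprocess.run(mdd_command(), check=True)
    return True
```

Calling `preview_then_submit()` from a Python shell prints the dry-run output and submits only if you answer `y`.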
You can study visualizations of the data we have received at: Manjaro Metrics
Next Steps
In the next few days we’ll do some more testing, and if the results are positive, I plan on installing it on all Manjaro systems and adding a systemd service to submit the data automatically.
As a reminder: Right now you have to install MDD manually and there is no systemd service yet.
Once this systemd service is in place, sending the hardware data with MDD will be opt-out. I believe that with opt-in, the data you gather is so heavily skewed that you might as well not collect it at all.
Let me know what you think. I know telemetry is a contentious subject, but we need at least some data about how Manjaro is being used by so many people around the world in order to show that the project has a future and also to plan for that future.