We host Matomo (formerly called Piwik) ourselves. And we also host the fonts we use ourselves. Since we are a healthcare based startup we prefer not to share any data outside of our controlled servers.
We even disabled the cookie based tracking inside Matomo at the cost of not linking different visit sessions. Same session visits are fully tracked though. Saves us a cookie warning.
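For anyone wanting to try the same cookieless setup, here is a minimal sketch of the tracker-side part. It queues Matomo's `disableCookies` command ahead of `trackPageView`. The host `analytics.example.com`, the site ID `"1"` and the helper name `cookielessCommands` are placeholders for illustration, not anything from our deployment.

```typescript
// Matomo's tracker reads queued commands from a global array named _paq.
type MatomoCommand = [string, ...unknown[]];

// Hypothetical helper: builds the command list for a cookieless page view.
function cookielessCommands(trackerBase: string, siteId: string): MatomoCommand[] {
  return [
    ["disableCookies"], // queued first so no tracking cookie is set
    ["setTrackerUrl", trackerBase + "matomo.php"],
    ["setSiteId", siteId],
    ["trackPageView"],
  ];
}

// In the page, push the commands onto the global queue before matomo.js loads.
const g = globalThis as unknown as { _paq?: MatomoCommand[] };
g._paq = g._paq ?? [];
g._paq.push(...cookielessCommands("https://analytics.example.com/", "1"));
```

Without the cookie, visits from the same person on different days show up as separate visitors, which is the trade-off described above.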
Funny thing... I went on their site (fr.matomo.org here in France) using Safari.
None of the images are displayed (a ? in place of each image). Tried on Firefox, and it displays the images fine...
Checked what kind of images these are: all .webp! :D
They have improvements to do if they want to be "google free" themselves...
This is the way! Glad you went that way, still struggling to get everything set up like this for our company. But marketing will come around the corner soon... ;-)
It looks like self-hosting Posthog (https://posthog.com/) should work, and they look great.
They're a US company, so you can't use their cloud service, but it's designed to be self-hosted and they have a list of EU cloud providers so you can do 100% EU-based self-hosting if you want: https://posthog.com/docs/self-host/deploy/hosting-in-eu
What type of server specs (memory, CPU, disk size, etc.) do you use to self host it?
Based on an open issue[0], it's suggested to run a server with 32GB+ of memory to handle hosting Clickhouse but that would mean self hosting Plausible would end up being $160 / month on DigitalOcean which would make it 10x more expensive than hosting my custom app that I want to see analytics for.
I know you can use less memory but it sounds like using less can result in an unpredictable environment where everything can stop working at any given moment depending on what Clickhouse wants to do. This happened to someone who replied in that issue. Their production setup stopped working because it ran out of memory.
Someone else wrote about it using close to 8GB of disk space to track ~8k page views at https://cyberhost.uk/plausible-3-month-review/. That was only written back in March 2021 too. They said they are going to look for an alternative solution because the storage costs are too high.
Clickhouse has got a lot better in limited memory environments. They now recommend 4GB minimum.
The production environment that crashed due to Clickhouse OOM was our hosted product a while ago :) After that, we haven't had any downtime on our Clickhouse DB for over a year.
The issue with disk space stems from a bad default configuration. Clickhouse used to have EXTREMELY noisy debug level logging enabled by default with no rotation. This has been fixed in our hosting repo[1] so you get sensible defaults.
If you don't want to worry about downtime, planning disk space or compute capacity, then that's exactly what we offer at https://plausible.io. We process and keep the visitor data on our Hetzner servers in Germany.
The Clickhouse instance runs on a Render[0] "Standard" private service. So 1 CPU (no idea what that means), 2GB of RAM, and a 10 GB disk. So far I've been using 10% of the disk and it's not growing very much.
Plausible founder here. There's nothing automatic, but you can track your campaigns manually by tagging your links with UTM parameters such as utm_campaign.
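A minimal sketch of what tagging a link manually can look like, using the standard `URL` API. The function name `tagCampaignUrl`, the landing page and the campaign values are made up for illustration.

```typescript
// Hypothetical helper: appends UTM parameters to a landing-page URL.
function tagCampaignUrl(
  landing: string,
  source: string,
  medium: string,
  campaign: string,
): string {
  const url = new URL(landing);
  url.searchParams.set("utm_source", source);
  url.searchParams.set("utm_medium", medium);
  url.searchParams.set("utm_campaign", campaign);
  return url.toString();
}

// Example values are placeholders.
const tagged = tagCampaignUrl(
  "https://example.com/pricing",
  "newsletter",
  "email",
  "spring_launch",
);
// tagged is "https://example.com/pricing?utm_source=newsletter&utm_medium=email&utm_campaign=spring_launch"
```

You then use the tagged link in the ad or newsletter, and the analytics tool reads the parameters off the landing URL.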
Google has made sure that analytics for Google Ads works best within their own walled garden. Same with Facebook and Twitter with their Pixel products.
Instead of using the Referer header or utm parameters as intended, these large corps send obtuse random IDs (gclid, t.co/<id> links) which only they can correlate to an ad, search query or tweet using their internal database.
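To make the difference concrete, here is a rough sketch of what an analytics tool can and cannot learn from a landing URL. The type and function names are hypothetical, and the list of opaque parameters contains only `gclid` from the comment above.

```typescript
type Attribution =
  | { kind: "utm"; source: string | null; medium: string | null; campaign: string | null }
  | { kind: "opaque-click-id"; param: string }
  | { kind: "none" };

// Opaque IDs that only the ad provider can resolve (illustrative list).
const OPAQUE_PARAMS = ["gclid"];

function classifyLanding(landing: string): Attribution {
  const params = new URL(landing).searchParams;
  if (params.has("utm_source") || params.has("utm_medium") || params.has("utm_campaign")) {
    // Human-readable values: any analytics tool can report on these.
    return {
      kind: "utm",
      source: params.get("utm_source"),
      medium: params.get("utm_medium"),
      campaign: params.get("utm_campaign"),
    };
  }
  for (const param of OPAQUE_PARAMS) {
    if (params.has(param)) {
      // The value is a random ID; the campaign behind it is not recoverable here.
      return { kind: "opaque-click-id", param };
    }
  }
  return { kind: "none" };
}
```

With a `gclid` the best a third-party tool can say is "this came from some Google ad", which is the lock-in being described.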
So until there is anti-trust action in this space towards more openness and competition, you're stuck with the ad provider if you want tight integration between ads and analytics.
Self-hosted Matomo/Piwik is pretty good. You probably want to make sure it's on servers in the EU owned by an EU company (Hetzner, OVH, Gridscale, etc.). Alternatively you can configure it in a way that avoids collecting PII [1] (which also removes the need for a consent popup, privacy policy etc.). You won't get much info about repeat visitors that way, but I imagine it's quite usable for many use cases.
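One measure that usually comes up in this kind of configuration is truncating IP addresses before they are stored. The sketch below shows the idea for IPv4 only. It is not Matomo's code, the function name `maskIpv4` is made up, and whether masking alone is enough to avoid consent is a legal question, not a technical one.

```typescript
// Hypothetical helper: zeroes the last `bytesToMask` octets of an IPv4 address.
function maskIpv4(ip: string, bytesToMask: 0 | 1 | 2 | 3 | 4 = 2): string {
  const octets = ip.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) {
    throw new Error("not an IPv4 address: " + ip);
  }
  // Keep the leading octets, replace the trailing ones with 0.
  return octets.map((o, i) => (i >= 4 - bytesToMask ? "0" : o)).join(".");
}

const masked = maskIpv4("203.0.113.42");        // "203.0.0.0"
const lightlyMasked = maskIpv4("203.0.113.42", 1); // "203.0.113.0"
```

The more octets you mask, the coarser any geolocation gets, so it is the same precision-for-privacy trade-off as dropping repeat-visitor tracking.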
It's only ok if you self-host on a server in the EU, right? It'll be interesting when different regions of the world start having mutually exclusive laws about where data has to be stored.
Self-hosting something is always going to be less complex, but you'll still need to determine what you're tracking and why, write that down in a form people can understand easily, and let people opt in explicitly (with a just-as-easy way to opt out later).
People don't have to opt in for you to keep the data for technical reasons, for instance if you keep IP addresses for a while to find and block abuse, but you can't keep data longer than strictly necessary and can't use the data for purposes other than those you declared beforehand.
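In practice "not longer than strictly necessary" tends to mean a scheduled purge. A minimal sketch, assuming an in-memory abuse log and a retention period you have chosen and documented yourself (the 7 days in the example is a placeholder, not a legal threshold):

```typescript
interface AbuseLogEntry {
  ip: string;
  seenAt: number; // milliseconds since the Unix epoch
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns only the entries still inside the retention window.
function purgeExpired(
  entries: AbuseLogEntry[],
  now: number,
  retentionDays: number,
): AbuseLogEntry[] {
  const cutoff = now - retentionDays * MS_PER_DAY;
  return entries.filter((entry) => entry.seenAt >= cutoff);
}

// Example with a fixed "now" so the result is deterministic.
const now = Date.UTC(2022, 1, 15);
const log: AbuseLogEntry[] = [
  { ip: "203.0.113.7", seenAt: Date.UTC(2022, 1, 14) }, // 1 day old, kept
  { ip: "203.0.113.8", seenAt: Date.UTC(2022, 0, 1) },  // 45 days old, dropped
];
const kept = purgeExpired(log, now, 7);
```

Run something like this on a schedule and state the retention period in your privacy statement.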
Write down your policies and put them in an (again, easy to read, understand and find) privacy statement and you should be pretty much GDPR-proof.
I track page view counts as simple sums, and it's not feasible to drop an individual user's page counts because I don't have enough info to identify a unique user. In fact, I put no cookies on the user's machine (but that means I have no way to identify a specific user for opt-out purposes for these aggregated page counts).
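Roughly, the counting looks like the sketch below (class and method names are made up for this example). The point is that the stored state is one number per path, so there is nothing per-user to look up or delete.

```typescript
// Hypothetical aggregate counter: stores only a running total per path.
class PageViewCounter {
  // Object.create(null) avoids collisions with inherited keys like "constructor".
  private counts: Record<string, number> = Object.create(null);

  record(path: string): void {
    this.counts[path] = (this.counts[path] ?? 0) + 1;
  }

  total(path: string): number {
    return this.counts[path] ?? 0;
  }
}

const counter = new PageViewCounter();
counter.record("/blog/hello");
counter.record("/blog/hello");
counter.record("/about");
// counter.total("/blog/hello") is 2, counter.total("/about") is 1
```

Because no visitor identifier ever enters the structure, an individual's contribution to a sum cannot be singled out afterwards.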
I'm the creator of Fugu (https://github.com/shafy/fugu), if you're looking for an event-based analytics solution that is open-source, free and self-hostable. Fugu doesn't track unique users, just anonymous events.
I also offer hosted version if you don't want to deal with hosting (currently using Digital Ocean with their Frankfurt data center, but will switch to an EU company soon).
I just started using Goatcounter for a noncommercial site (music history research blog) and I'm happy with it. All I wanted was a glorified hit counter.
It doesn't have the goal conversion metrics and other advanced features of GA, so obviously not a drop-in replacement for all use cases.
Nobody is going to get in “legal hot water” on account of Google Fonts or Google Analytics unless they’re Google themselves or a top 10 ecommerce company some politician wants to make an example of. There’s millions of sites relying on those things.
Is the EU going to drag them all into court?
This is like saying you never jay walk because you want to avoid the legal hot water. The water isn’t even lukewarm!
I would venture most of the internet is not hosted in the EU. You expect US, Chinese, and Japanese citizens to respect an EU fine for a law they have no say in? Sure they are doing "business" in the EU, but many of them are not doing business at all.
> You expect US, Chinese, and Japanese citizens to respect an EU fine for a law they have no say in?
No. What is the EU going to do, besides nothing? If you do business in the EU they will take your business away, and if you don't there's nothing they can do. I'm sure we all break some foreign country's laws every day and there's nothing they can do about it.
I do expect fines to be handed to EU companies and I expect them to pay them though.
> I would venture most of the internet is not hosted in the EU
Most content isn't made in the US, and the US somehow still forced its copyright system on the world.
Not the EU itself... but your competitors, who can not only complain to your data protection agency but also send C&D letters and seek court injunctions or penalties.
> The ruling directs the website to stop providing IP addresses to Google and threatens the site operator with a fine of €250,000 for each violation, or up to six months in prison, for continued improper use of Google Fonts.
So, if you feel brave you can challenge some courts on this.
GDPR mechanisms are directed at pushing you towards compliance, not getting big payouts. So in many cases you can even avoid any fine if you cooperate on first notice.
> I've already offloaded Google Fonts due to the German ruling. I'm happy to self-host piwik if needed, but could that fall foul of regulators?
Well... if you self-host Piwik or Matomo, you're relatively safe and you can avoid a lot of the bureaucracy bullshit that you'd have with external services.
However, check with a lawyer before setting it up, and definitely get user consent for detailed tracking. There are basically two camps of thought on how much is allowed without explicit user consent: the stricter camp (which I belong to) believes that it is illegal to even use technically required data (like IP address, browser agent, date/time of visit, URL/query parameters) for analytics of any kind. The other camp is more relaxed and believes that it is OK to conduct basic analytics on that data (justified as "legitimate interest" of the site operator to provide a good experience to the user), as long as you don't set anything like cookies or localStorage that could allow detailed tracking.
No supreme court decision has yet settled which school of thought is going to win out - personally, I follow the requirement of data minimization per Art. 5(1)(c) GDPR. Data that you do not have cannot be stolen, seized, abused or used as justification for fines, after all.
If the web-page's JavaScript ONLY stores and processes data in the client's localStorage to generate the local page, and sends nothing back to the server, so the web-site operator never sees that data, then is the web-site operator processing that data, or is it only the user-agent's operator?
The web-site operator certainly wouldn't be a "data controller" since it isn't collecting or storing the data. And it's hard to see how the web-site operator would be a "data processor" in that circumstance.
Never thought about that scenario, I only mentioned localStorage or sessionStorage because it has been abused in the past to get around tracking blockers and to create "supercookies".
I've just asked the UK ICO for advice and got a confirmation it wouldn't be considered as a data controller or processor. I gave this example:
Me: "Effectively, in my case, the user is adding 'post-it' notes of their own devising that remain 'sticky' so the next time they visit the same page they'll see their own notes - but those notes are never sent to the server"
Me: "It's effectively the same circumstance as a classical computer program being downloaded by the user, and then used (locally) to create/save files on their local device. In that case you wouldn't consider the author of the computer program to be the data controller, surely?"
ICO (Flynn): "Okay that sounds reasonable."
ICO (Flynn): "So if your product/service is not dependent on personal data and you are not processing it then you appear to not be captured by data protection legislation."
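For reference, the 'post-it' pattern described to the ICO can be sketched like this. The names (`KeyValueStore`, `loadNotes`, `addNote`, the `notes:` key prefix) are invented for the example, and the in-memory store only exists so the code also runs outside a browser. In a page you would pass `window.localStorage`, which has the same two methods.

```typescript
// The subset of the Web Storage interface this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Stand-in for localStorage when running outside a browser.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = Object.create(null);
  getItem(key: string): string | null {
    return key in this.data ? this.data[key] : null;
  }
  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}

function loadNotes(store: KeyValueStore, page: string): string[] {
  const raw = store.getItem("notes:" + page);
  return raw === null ? [] : (JSON.parse(raw) as string[]);
}

// Saves a note on the user's own device; there is no network call anywhere.
function addNote(store: KeyValueStore, page: string, text: string): string[] {
  const notes = loadNotes(store, page);
  notes.push(text);
  store.setItem("notes:" + page, JSON.stringify(notes));
  return notes;
}

const store = new MemoryStore();
addNote(store, "/recipes/bread", "use less salt next time");
const notesOnReturn = loadNotes(store, "/recipes/bread");
```

The ICO's answer hinges on the absence of any code path that sends the notes to the server, so it would stop applying the moment the page starts syncing them.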
I've already offloaded Google Fonts due to the German ruling. I'm happy to self-host piwik if needed, but could that fall foul of regulators?