
Is anyone using an alternative that provides some basic analytics and isn't likely to get me in legal hot water in the future?

I've already offloaded Google Fonts due to the German ruling. I'm happy to self-host piwik if needed, but could that fall foul of regulators?



We host Matomo (formerly called Piwik) ourselves. And we also host the fonts we use ourselves. Since we are a healthcare based startup we prefer not to share any data outside of our controlled servers.

We even disabled the cookie based tracking inside Matomo at the cost of not linking different visit sessions. Same session visits are fully tracked though. Saves us a cookie warning.


Funny thing... I went on their site (fr.matomo.org here in France) using Safari. None of the images are displayed (a "?" on each image). Tried Firefox, and it displays the images fine... Checked what kind of images these are: all .webp! :D They have improvements to make if they want to be "Google free" themselves...


This is the way! Glad you went that way, still struggling to get everything set up like this for our company. But marketing will come around the corner soon... ;-)


It looks like self-hosting Posthog (https://posthog.com/) should work, and they look great.

They're a US company, so you can't use their cloud service, but it's designed to be self-hosted and they have a list of EU cloud providers so you can do 100% EU-based self-hosting if you want: https://posthog.com/docs/self-host/deploy/hosting-in-eu


I've been using [Plausible](https://plausible.io) in its self-hosted version for about a month, on a site with 7M+ page views per month. So far so good.


What type of server specs (memory, CPU, disk size, etc.) do you use to self host it?

Based on an open issue[0], it's suggested to run a server with 32GB+ of memory to handle hosting Clickhouse but that would mean self hosting Plausible would end up being $160 / month on DigitalOcean which would make it 10x more expensive than hosting my custom app that I want to see analytics for.

I know you can use less memory but it sounds like using less can result in an unpredictable environment where everything can stop working at any given moment depending on what Clickhouse wants to do. This happened to someone who replied in that issue. Their production set up stopped working because it ran out of memory.

Someone else wrote about it using close to 8GB of disk space to track ~8k page views at https://cyberhost.uk/plausible-3-month-review/. That was only written back in March 2021 too. They said they are going to look for an alternative solution because the storage costs are too high.

[0]: https://github.com/plausible/docs/issues/67


Clickhouse has got a lot better in limited memory environments. They now recommend 4GB minimum.

The production environment that crashed due to Clickhouse OOM was our hosted product a while ago :) After that, we haven't had any downtime on our Clickhouse DB for over a year.

The issue with disk space stems from a bad default configuration. Clickhouse used to have EXTREMELY noisy debug level logging enabled by default with no rotation. This has been fixed in our hosting repo[1] so you get sensible defaults.

If you don't want to worry about downtime, planning disk space or compute capacity, then that's exactly what we offer at https://plausible.io. We process and keep the visitor data on our Hetzner servers in Germany.

1. https://github.com/plausible/hosting


The Clickhouse instance runs on a Render[0] "Standard" private service. So 1 CPU (no idea what that means), 2GB of RAM, and a 10 GB disk. So far I've been using 10% of the disk and it's not growing very much.

[0]: https://render.com


I also just deployed Plausible on Fly.io. I wrote a [blog post](https://intever.co/blog/plausible-self-hosted-with-fly) and created a [github](https://github.com/intever/plausible-hosting) repo to document the process.


Works fine for me as well, though I use the hosted version (not a high volume site atm).


The powerful thing about GA is the link with Google Ads, does that work nice for Plausible as well?


Plausible founder here. There's nothing automatic but you can track your campaigns with utm_campaigns manually.

Google has made sure that analytics for Google Ads works best within their own walled garden. Same with Facebook and Twitter with their Pixel products.

Instead of using the Referer header or utm parameters as intended, these large corps send obtuse random IDs (gclid, t.co/<id> links) which only they can correlate to an ad, search query or tweet using their internal database.

So until there is anti-trust action in this space towards more openness and competition, you're stuck with the ad provider if you want tight integration between ads and analytics.
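
To make the difference concrete, here is a minimal Python sketch of reading utm parameters from a landing URL and noticing when only an opaque gclid is present. The helper name `campaign_info` and the example URLs are made up for illustration; this is not how Plausible implements it.

```python
from urllib.parse import urlparse, parse_qs

# Readable campaign tags that any analytics tool can interpret.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")

def campaign_info(url: str) -> dict:
    """Hypothetical helper: extract campaign tags from a landing URL."""
    params = parse_qs(urlparse(url).query)
    # Keep the first value of each utm_* parameter that is present.
    info = {key: params[key][0] for key in UTM_KEYS if key in params}
    # gclid is an opaque ID: it says a Google ad was clicked, but only
    # Google can map it back to a specific ad or search query.
    info["opaque_click_id"] = "gclid" in params
    return info
```

With a tagged link such as `https://example.com/?utm_source=newsletter&utm_campaign=spring` the campaign is readable on its own; with `https://example.com/?gclid=abc123` all you learn is that an opaque ID was attached.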


Self-hosted Matomo/Piwik is pretty good. You probably want to make sure it's on servers in the EU owned by an EU company (Hetzner, OVH, Gridscale, etc). Alternatively you can configure it in a way that avoids collecting PII [1] (which also removes the need for consent popup, privacy policy etc). You won't get much info about repeat visitors that way, but I imagine it's quite usable for many use cases.

1: https://matomo.org/faq/new-to-piwik/how-do-i-use-matomo-anal...


The cnil.fr page hosting this article seems to use self-hosted piwik, which is a good sign that the regulators think it's ok.

(I wonder why they need to collect analytics information for this page at all.)


It's only ok if you self-host on a server in the EU, right? It'll be interesting when different regions of the world start having mutually exclusive laws about where data has to be stored.


>It's only ok if you self-host on a server in the EU, right?

In the EU/EEA or in a jurisdiction that has adequate level of data protection.


Self-hosting something is always going to be less complex, but you'll still need to determine what you're tracking and why, write that down in a form people can understand easily, and let people opt in explicitly (with a just-as-easy way to opt out later).

People don't have to opt in for you to keep the data for technical reasons, for instance if you keep IP addresses for a while to find and block abuse, but you can't keep data longer than strictly necessary and can't use the data for purposes other than those you declared beforehand.

Write down your policies and put them in an (again, easy to read, understand and find) privacy statement and you should be pretty much GDPR-proof.


What's the rule for aggregated data?

I track page view counts as simple sums, and it's not feasible to drop an individual user's page counts because I don't have enough info to identify a unique user. In fact, I put no cookies on the user's machine (but that means I have no way to identify a specific user for opt-out purposes for these aggregated page counts).
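
As a rough illustration of that kind of counter, a Python sketch under the assumption that only the path and the day are ever stored. The names `page_views` and `record_view` are hypothetical.

```python
from collections import Counter
from datetime import date

# Key is (day, path); value is a plain sum. No IP address, cookie,
# or user agent is kept, so no entry can be tied back to a visitor.
page_views: Counter = Counter()

def record_view(path: str, day: date) -> None:
    """Hypothetical helper: add one view to the aggregate for this path and day."""
    page_views[(day.isoformat(), path)] += 1
```

Because each entry is only a sum, there is nothing per-user to delete or opt out of, which is exactly the situation described above.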


I am not a legal advisor, but I believe the matter is settled by what you said:

> I don't have enough info to identify a unique user

If it is not user identifying information, then it should not be an issue.


I'm the creator of Fugu (https://github.com/shafy/fugu), if you're looking for an event-based analytics solution that is open-source, free and self-hostable. Fugu doesn't track unique users, just anonymous events. I also offer a hosted version if you don't want to deal with hosting (currently using Digital Ocean with their Frankfurt data center, but will switch to an EU company soon).


I just started using Goatcounter for a noncommercial site (music history research blog) and I'm happy with it. All I wanted was a glorified hit counter.

It doesn't have the goal conversion metrics and other advanced features of GA, so obviously not a drop-in replacement for all use cases.

https://www.goatcounter.com/


Another very happy user here. Was super easy to add to my Jekyll site hosted on GH pages. I believe the creator is active here as well btw.


Happy Goatcounter user here too, for the same reasons as you say: way less complex than GA but it has more metrics I care about.


I think that self-hosting is the way to go, get a server in your own region/country and don't send the data to any 3rd party.


This roundup has a lot of great & lightweight options[0].

[0]: https://stackdiary.com/open-source-analytics/


We’re using our own logs with https://goaccess.io processing over 300M requests a month with no issues.

No privacy issues to worry about using trackers.


If your logs are storing IP addresses without consent from users, you are probably (IANAL, but heard this from lawyers) infringing GDPR.
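
One common mitigation is to truncate addresses before they are written to the log. A minimal Python sketch, assuming a /24 mask for IPv4 and /48 for IPv6 (both prefix lengths and the name `truncate_ip` are my own choices here). Whether a truncated address is enough to satisfy a regulator is a legal question this sketch does not settle.

```python
import ipaddress

def truncate_ip(raw: str) -> str:
    """Hypothetical helper: zero the host part of an address before logging."""
    addr = ipaddress.ip_address(raw)
    # Assumed prefix lengths: keep the first 24 bits of IPv4, 48 bits of IPv6.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

The log processor then only ever sees the truncated value, so coarse per-network statistics still work without storing full addresses.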


Yes! I'm currently using https://usefathom.com/, works pretty great


We decided to go for (selfhosted) Umami[0] but don't have it in production yet.

It is not really a replacement for GA though, it collects much less data. We've decided it is enough for us.

[0] - https://umami.is/


Take a look at Redistats that I built in 2013, privacy policy: https://redistats.com/privacy-policy


https://usefathom.com/ (what we use), plausible.io, umami.is


Check out Pirsch Analytics: https://pirsch.io


Nobody is going to get in “legal hot water” on account of Google Fonts or Google Analytics unless they’re Google themselves or a top 10 ecommerce company some politician wants to make an example of. There’s millions of sites relying on those things.

Is the EU going to drag them all into court?

This is like saying you never jay walk because you want to avoid the legal hot water. The water isn’t even lukewarm!


> Is the EU going to drag them all into court?

Why would they need to? Just hand out fines, like you do with traffic tickets, no courts required.


I would venture most of the internet is not hosted in the EU. You expect US, Chinese, and Japanese citizens to respect an EU fine for a law they have no say in? Sure they are doing "business" in the EU, but many of them are not doing business at all.


> You expect US, Chinese, and Japanese citizens to respect an EU fine for a law they have no say in?

No. What is the EU going to do, besides nothing? If you do business in the EU they will take your business away, and if you don't there's nothing they can do. I'm sure we all break some foreign countries laws every day and there's nothing they can do about it.

I do expect fines to be handed to EU companies and I expect them to pay them though.

> I would venture most of the internet is not hosted in the EU

Most content isn't made in the US, and the US somehow still forced its copyright system on the world.


You can sue; the Google Fonts case mentioned above awarded damages.

I'm now wondering if I can scale this for profit.


> Is the EU going to drag them all into court?

Not the EU itself... but your competitors, who can not only complain to your data protection agency but also send C&D letters and seek court injunctions or penalties.


Some courts beg to disagree with your position: https://www.theregister.com/2022/01/31/website_fine_google_f...


Oh, wow, didn’t realize 1 website had been fined $100. The legal water is boiling!


The fine is only $100 if your lawyers and legal team work for free.


You must have missed this part:

> The ruling directs the website to stop providing IP addresses to Google and threatens the site operator with a fine of €250,000 for each violation, or up to six months in prison, for continued improper use of Google Fonts.

So, if you feel brave you can challenge some courts on this.


No, I didn't miss that part. "Next time, I'll really punish you" rarely works until there's actual consequences.


There are actual consequences: https://www.dsgvo-portal.de/gdpr-fine-database.php (I think I have seen one of those databases somewhat more official before)


This is basically a 'we are watching you' warning, second time the fine will be different


Yeah, that's definitely a slap on the wrist. But now that website needs to stop doing that, or it would face actual consequences.


GDPR mechanisms are directed at pushing you towards compliance, not getting big payouts. So in many cases you can even avoid any fine if you cooperate on first notice.


It's per claimant. That would be a $15bn Equifax settlement.


Show me a single site that relies on google analytics.


Somehow I'm not surprised that my choice of words was jumped on. Let's say "making use of" to keep further pedantry at bay.


www.airbnb.com


> I've already offloaded Google Fonts due to the German ruling. I'm happy to self-host piwik if needed, but could that fall foul of regulators?

Well... if you self-host Piwik or Matomo, you're relatively safe and you can avoid a lot of the bureaucracy bullshit that you'd have with external services.

However, check with a lawyer before setting it up, and definitely get user consent for detailed tracking. There are basically two camps of thought on how much is allowed without explicit user consent: the stricter camp (which I belong to) believes that it is illegal to even use technically required data (like IP address, browser agent, date/time of visit, URL/query parameters) for analytics of any kind. The other camp is more relaxed and believes that it is OK to conduct basic analytics on that data (justified as "legitimate interest" of the site operator to provide a good experience to the user), as long as you don't set anything like cookies or localStorage that could allow detailed tracking.

It is not yet clear by a supreme court decision which school of thought is going to win out - personally, I follow the requirement of data minimization per Art. 5 Nr. 1 lit c) EU-GDPR. Data that you do not have cannot be stolen, seized, abused or used as justification for fines, after all.


Interesting that you mention localStorage.

If the web-page's javascript ONLY stores and processes data stored in the client's localStorage to generate the local page, and sends nothing back to the server, so the web-site operator never sees that data, then is the web-site operator processing that data, or is it only the user-agent's operator ?

The web-site operator certainly wouldn't be a "data controller" since it isn't collecting or storing the data. And it's hard to see how the web-site operator would be a "data processor" in that circumstance.


Never thought about that scenario, I only mentioned localStorage or sessionStorage because it has been abused in the past to get around tracking blockers and to create "supercookies".


I've just asked the UK ICO for advice and got a confirmation it wouldn't be considered as a data controller or processor. I gave this example:

Me: "Effectively, in my case, the user is adding 'post-it' notes of their own devising that remain 'sticky' so the next time they visit the same page they'll see their own notes - but those notes are never sent to the server"

Me: "It's effectively the same circumstance as a classical computer program being downloaded by the user, and then used (locally) to create/save files on their local device. In that case you wouldn't consider the author of the computer program to be the data controller, surely?"

ICO (Flynn): "Okay that sounds reasonable." ICO (Flynn): "So if your product/service is not dependant on personal data and you are not processing it then you appear to not be captured by data protection legislation."


i am working on splitbee.io :)



