> Email addresses published on webpages usually need to be protected from email-harvesting spambots.
Do they though?
I have had my email address published on my website in a <a href="mailto:… for like 20 years and I don't get spam that would get through the spam filter.
I use both Gmail and (for some other addresses) a webmail hosted by a local company which uses some other filter. Both work well, so it's not something only Google can do.
I definitely recall in the early 2000's it absolutely did lead to spam, and e-mail obfuscation techniques were a real thing that genuinely helped.
But by 2015 or so it didn't matter at all anymore, in my personal experience. It didn't even lead to spam that needed to be filtered. Spammers just stopped looking for e-mails that way.
Which makes perfect sense -- most people don't have their e-mail address listed anywhere online in the first place, but you can purchase gigantic lists of e-mail addresses that originate either from companies that sell their own user lists or from people who hacked the companies' servers.
These days if you want to send spam, trawling the web for e-mails makes zero sense. It's practically the least efficient thing you could do.
I’ve been having all my email addresses posted plain text since like 2005 and I’ve signed up on like every website imaginable (my password manager has over 2,000 entries) and I’ve never had a spam problem, at least on Gmail.
I have two people I designed web sites for in the last year and I put both their email addresses in the footer and neither one of their accounts has received a single spam message in all of that time (not even something dropped into the Spam folder). Both sites are popular and have thousands of visitors and get scraped by every search engine and AI bot you can think of.
Interesting. Maybe footer emails tend to be support contact addresses rather than personal inboxes. Otherwise I’d find that discrepancy very surprising.
My thoughts exactly. On the other hand, an email address I used with Usenet ca 1999–2001 has had a consistent flood of spam. I think most spammers are using the same 20+-year-old list of emails.
The email address on my website doesn’t even get stuff that goes to the spam filter. Nothing, nada, zilch.
I do think that there are some mailing lists that get generated by trying to guess emails, brute-forcing gmail addresses by trying dictionary attacks of the FIRSTNAME.LASTNAME variety or 1–10 letters. I get a tiny amount of spam sent to a domain@domain.com address I have, but that’s typically on the order of one message a year.
And all else aside, the overall volume of spam email has declined dramatically, even ignoring the effect of the gmail spam filter. I’m guessing that email as a spam vector just doesn’t make sense anymore and most of what goes out is a mix of 419 scammers trying to make their quotas and would-be scammers who’ve been scammed into buying that 20-year-old list of emails.
The practice of email address "obfuscation" feels like a relic of a bygone era, one that was never actually sound in its methodology, but spread anyway. A form of cargo-cultism has kept it alive.
Yeah just looking at this, it appears to add about 1K of overhead and at least one additional http request for something that ultimately boils down to a mailto: link, so it can still be scraped, and just adds bloat to your web page.
My preference is to not have my email harvested at all when possible, even if I don't personally see the spam emails. (I'm not saying it's a critical privacy/security issue, but a preference.)
My experience is that if you send email to someone whose op sec is not as good as your own, your email will be harvested at the point when that person's address book is harvested. I don't know all the details of how these harvests occur, but using a shady mobile app with Contacts permission would be enough.
Your email address cannot be (and isn't) secret, if you give it to other people (regular people, i.e. friends, colleagues, etc.) so they can send you emails. If you don't want your email harvested, you can never use it (at least to receive emails).
I've also had my email posted in mailto's in a half dozen places for... a long time. I remember in the early 00's when I'd cargo cult the old "type the whole email out as adrian at adrianpike dot com" thing on forums thinking it would work as some mystical talisman, and it turns out considering emails to be secret isn't worth the time.
this used to be a problem in the early 00s. I don’t think spam filtering was as good back then so protecting your public email from spam was necessary.
Also this was a time when mail boxes were often allocated 10-25 megabytes. So spam bots could easily flood your email.
Agreed. I have several web sites with publicly visible email addresses and they don't get much spam.
The spam I get is rather mis-targeted. For a while I was getting spam for equipment which would be useful were I a bulk producer of olive oil. "We have 15 years of experience in the research, development and production of automatic edible oil filling equipment...." There are the usual fake financing deals: "We’ve pre-approved your business for financing..." Whatever sends that crap doesn't look at the web site at all.
When I get spam from Gmail or Outlook accounts, I report it, so they will get a strike against their account. I don't hear from those people again.
All other spam is so obviously bogus that simple filters are dumping it into a junk folder. Most of it seems to be phishing emails. "You have won a (some tool)..." seems to be popular this week.
They do. My wife lost her 10-year-old Instagram account to a well crafted phishing attack against an email she had published…
Instagram/Meta’s customer support is absolutely atrocious and disgraceful on this front. They basically treat my wife like she’s also a spammer and there’s no way to recover the account or undo any of the changes the spammers made.
It’s hilarious how they ask you to “appeal” a ban by clicking a single button without giving any chance to rectify what the spammers did to her account. Of course their automated bots just reject your appeal almost instantly. Shameful.
Clicking the appeal button is like a trap to permanently ban your account.
You can get it back by paying off a Meta employee through a site like Swapd. It's either that or get your comment to the front page of HN. Those are the only two customer support channels for Meta or Google.
Does her email show up on any leaks on https://haveibeenpwned.com/ ? I'm wondering if not publishing it would have made any difference to receiving phishing messages.
Would such an attacker be stymied by this? It seems like automated email harvesting wouldn't be a big time saver for any attack that required a well-crafted anything. I don't know anything about that particular attack, though.
Same here, I've had my email plainly visible on my website in mailto links and on Github, and I don't get any spam that breaks through Fastmail's spam filters.
My email address: Listed at the top of the front page. In a H3 tag.
This email address's spam problem: Not a problem. 15ish per day get to me including Junk folder. Thanks Purelymail.
What is a problem: Transactional email unrelated to transactions, Promotional email which is newsletter junk spam, Social networks complaining of not being used.
This is my biggest one. I get more spam from Facebook begging me to log in than I do from almost anything else. I haven't used the account in about 7 years, you'd think they'd figure it out.
Cost of pissing off inactive user: Essentially zero.
Cost of convincing inactive user to come back: Positive.
Add in a bunch of other factors like some product manager twisting stats to make it look like they are getting users back even if they really aren't and you see why it happens.
I have a few old domains I registered in the late 90s, and some of them still have the mailto with my email. On some I rarely get any spam, and on others it's dozens a day. SpamAssassin does a great job of catching the spam.
If anything's the "new IE", it's Chrome! Dominant market position, >50% of people using it, (arguable?) abuse of a platform monopoly (search this time) to drive popularity. It also supports stuff other browsers don't, so we get sites that only work in Chrome - like we used to get sites that only worked in IE. Yes these are "web standards" now but the effect is similar.
(When did you see a site that only worked in Firefox, or only in Safari?)
It's not a Firefox issue, it's a NoScript issue. Chrome with its market share and propensity to implement its own standards, or Safari with its market share, quirks, and propensity to implement its own standards, make much better candidate IEs.
Firefox is not at all the new IE. Firefox isn't worse at implementing the spec than Chrome; they just make different decisions sometimes. People tend to see Chrome's behaviour as the default of how things should be, rather than the specs.
This isn't an issue with Firefox, it's a consequence of the NoScript extension blocking unsafe features. The behaviour is likely same/similar with NoScript and Chrome.
Chrome is the new MSIE. In both cases, a dominant position was used to dictate web standards in an unhealthy way. Microsoft did it through strategic neglect. Google is doing it by strategic smothering. Firefox and Safari are the web's last stand against an impending Chrome browser monoculture, against Google endlessly ramming new features down our throats and declaring them "standards".
> against Google endlessly ramming new features down our throats and declaring them "standards".
New features that will be used for their intended purpose maybe 1% of the time and fingerprinting by AdTech the rest of the time. What could possibly go wrong handing the Web over to an advertising company?
While there's nothing stopping this technique from being accessible in principle, the example given in the article is a really bad one.
The article uses "Email us!" as the label on the svg and a elements, which effectively hides the actual email address from screen readers. Using aria labels in this way is a really bad practice, a screen reader user should have the same experience as anybody else unless there's a very good reason to do otherwise, and if you think your reason is a good reason, you're probably wrong.
The proper way to do this would be to put the actual email address in the labels.
The NVDA screen reader reads this text as: "This is my email frame link email us." That is by no means equivalent to actually seeing the email address. I found that HTML entity encoding every single character of the link takes care of any spam problem already and is much more accessible.
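A minimal sketch of the entity-encoding approach mentioned above, in Python. The function names and the user@example.com address are made up for illustration; this is one way to generate such markup, not necessarily how the commenter does it. Any HTML parser turns it straight back into the plain address, so it only hides the address from tools that pattern-match the raw source.

```python
def entity_encode(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def obfuscated_mailto(address: str) -> str:
    """Build a mailto link whose href and visible label are entity-encoded.

    Hypothetical helper for illustration: the browser decodes the references,
    so the link displays, copies and clicks like a normal mailto link.
    """
    encoded = entity_encode(address)
    scheme = entity_encode("mailto:")
    return f'<a href="{scheme}{encoded}">{encoded}</a>'


if __name__ == "__main__":
    # The printed markup contains no literal "@" character.
    print(obfuscated_mailto("user@example.com"))
```

Because the visible link text is still the real address once decoded, a screen reader should announce it like any other link, which is the accessibility advantage claimed above.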
Being accessible and being machine-unreadable are literal opposites. A screen reader is not that different from an ad blocker or web scraper in how it accesses content.
There's a reason that many end-to-end testing experts recommend writing selectors based on accessibility labels instead of CSS classes or IDs, especially if you're using a library like Styled Components.
The email address wouldn't be in the document directly, only in the SVG. Whether the title of the SVG contains "Email us" or the email address wouldn't affect how it works.
If the scraper is searching the DOM rather than simply downloading the webpages, then the email will be found regardless.
This can also affect voice dictation software like Dragon - if a user says 'Click myemail@mydomain.tld' it won't activate the link as Dragon is expecting 'Click email us', as that's now what the browser exposes as the link text.
That point might be academic anyway as I'm not sure Dragon would activate a link inside an SVG
I still receive "spam" tho, but it seems they manually collected the email because what I receive are B2B proposals clearly targeted at the topic of my website.
If the scraper uses a headless browser, I think that it might defeat your method. That said, using a headless browser to crawl for emails is relatively expensive so perhaps the spam is not from your site.
Not only is "protecting your email" pointless, as others have already pointed out, it's actively harmful.
There are a fair few sites, where most all content is perfectly readable without JS, except things like "1920x1080@60Hz" are displayed as literal "[email protected]" text.
> There are a fair few sites, where most all content is perfectly readable without JS, except things like "1920x1080@60Hz" are displayed as literal "[email protected]" text.
Do you have one on hand? That sounds absurd and I've never seen it
Is there really a point to any of this? It's a fun exercise, but also a complete waste of time if you're actually trying to hide from spammers. You're making a piece of information public by sharing it with the entire world, yet somehow expecting it to only stay accessible to the "good guys".
Unless you change your email address at least monthly, all it takes is for one person or company to share your contact with someone else or enter it into a database/CRM, or one service to get breached, then your email address is on a list that eventually gets propagated to every spammer worldwide. If you use that email with any regularity, the chance of those things happening can be rounded up to 100%.
If hiding your email address from scrapers actually worked, spam wouldn't exist. I never published my personal contact anywhere, yet I get dozens of spam emails per week. They all get filtered as spam, it's not a big deal.
A friend of mine is an absolute wizard and has been building essentially “responsive images” as SVGs with JS inside. They adapt to their size programmatically. It’s… interesting.
The fact that SVGs can even have JS embedded feels both untapped and kind of dangerous.
I gave up on this sort of thing. Spam filters are good enough nowadays that I don't think I see an increase in spam by having my email address publicly available without obfuscation. (That is, an increase beyond other spam sources, like crappy companies who have my email address for a legitimate purpose, but sell it to third parties.) In general I see less than 1 spam email hit my inbox per day, and that's fine.
Granted, this may depend on email provider and spam filter, so YMMV, but it hasn't been an issue for me.
Try to query it though via document.querySelectorAll('a') for example. It's a good first line of defense as a lot of scraping techniques do this approach.
However, if you have a headless browser setup for scraping, and simply fetch the current URL while on the page[0], you can get the plain text, and do a regex search for email addresses which will get you the email address - albeit this is a strange approach to take I admit.
> It's a good first line of defense as a lot of scraping techniques do this approach.
Most basic scrapers, the ones that are not for your testing or devtools or automation or ..., actually use basic text, without any interpretation. They grep the source code; they don't run a DOM and JavaScript engine, because it's a major difference in computing needs and speed.
I am not saying there is no evil scraper doing DOM evaluation, there are tons. I am reacting to your "FIRST line of defense": the first line is scrambling the raw text, which is why we got there.
What parent is saying, is that this is trying to upgrade the defense that we have generated to stop the threat that evolved, but it forgot why we got there and thus makes itself vulnerable to the original threat.
Absolutely. The basic tools just fetch sites recursively and use regular expressions. The advanced tools are Chromium-based, so will render SVGs just fine (and then potentially run OCR / AI to extract text even from JPEGs).
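To make the "fetch and regex" point concrete, here is a rough Python sketch of what a basic harvester might do with raw page source. The pattern, the sample markup and the email.svg file name are assumptions for illustration, not taken from any real tool or from the article.

```python
import html
import re

# Deliberately simple pattern, in the spirit of a basic harvester (an assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(raw_source: str) -> list[str]:
    """Return address-like strings found in raw text, with no DOM or rendering."""
    return EMAIL_RE.findall(raw_source)


# Hypothetical page fragments.
plain = '<a href="mailto:user@example.com">Email us!</a>'
external_svg = '<object type="image/svg+xml" data="email.svg"></object>'
entity_encoded = "".join(f"&#{ord(c)};" for c in "user@example.com")

print(harvest(plain))                          # ['user@example.com']
print(harvest(external_svg))                   # [] (the address lives in another file)
print(harvest(entity_encoded))                 # [] (no literal "@" in the source)
print(harvest(html.unescape(entity_encoded)))  # ['user@example.com']
```

The last line is the cat-and-mouse part: one extra decoding step and the entity trick is gone, and a crawler that also fetches the .svg file would find the address there in the same way.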
This technique protects from a "neither here nor there" subset of programs, I wonder how large is that set in practice.
If they’re saying it, I think that they’re wrong. One of those naively written scrapers won’t pick up an email address ‘protected’ in this way. It’s simply continuing the game of cat and mouse.
The idea being that spam bots don't parse svg's looking for email addresses, just the page html. I'm not sure how effective this really is with modern spam protection, however.
I think that nowadays most spam lists come from data breaches and address-collecting malware. It's cheaper than running a bot to scan the web for addresses. We get spam on addresses that were never published online.
I think so too. And I think the majority of data breaches that have led to spam for me are from ages ago, from random services I signed up for as a teenager.
For a few years after that I did the "+" Gmail alias thing, to try to filter and catch companies. But I realised that's easy and obvious to strip, so it wasn't worth the effort (although I have caught PayPal leaking my email somehow).
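For what it's worth, this is how little it takes to strip a "+" tag. A hedged Python sketch: the function name and addresses are hypothetical, and it ignores edge cases such as quoted local parts.

```python
def strip_plus_tag(address: str) -> str:
    """Drop a '+tag' suffix from the local part of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" at all, so leave the input untouched
    base = local.split("+", 1)[0]
    return f"{base}@{domain}"


if __name__ == "__main__":
    print(strip_plus_tag("jane+paypal@gmail.com"))  # jane@gmail.com
    print(strip_plus_tag("jane@gmail.com"))         # jane@gmail.com
```

A per-service address on your own domain (paypal@example.com style) has no such marker to strip, which is why it works better for tracing leaks.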
Don't, there are many smaller email providers that will take that load off your shoulders for a small fee. I've been using purelymail and have had good experience with it, and heard good things about migadu and fastmail. The latter two are more well known and better staffed, but also expensive.
I've been using similar aliases for years (paypal@domain.tld, ebay@domain.tld, etc), but make sure you have a contingency plan for when you're no more. I've received lots of account info from previous owners of the domain by setting up a catchall mailbox. We will obviously not care, but when someone takes over your account, they might use it to do harm to others (spam or fraud or whatever else).
>Sounds good! I might go even further and just use a custom address for each service, i.e. paypal@example.com or something.
Which is exactly what I do. As soon as I see spam sent to any particular email address, I know who it is that leaked the address and I can block it without issue.
>But self-hosting email is an adventure I'm nervous to embark on.
Why are you nervous about it? I've been doing so for decades and haven't had many issues at all. There are a bunch of all-in-one solutions like mailinabox[0] (I roll my own, but as I said, I've been doing this for decades) and others which would likely make things simpler for you. Go for it! You won't be disappointed.
Anecdotally, sending mail to example.com from example@mydomain.com can cause a whole host of human-factors problems which can be eliminated with something like RaoulPtoExample@mydomain.com.
I think this is a valid question. I see lots of effort at obfuscation but don't know if there's still a need.
I barely get spam and have a bigger issue with false positives in my spam folder. On the other hand I don't think there are many pages on the web that display my email address, so I'm curious about others' experience.
I get a similar volume, and gmail likely detects almost all of them. Problem is, it also falsely detects the occasional non-spam message, so I do need to periodically scan through the spam box, which is a bit of a pain when it contains hundreds or thousands of emails.
"When displaying an e-mail address on a website you obviously want to obfuscate it to avoid it getting harvested by spammers. But which obfuscation method is the best one? I drove a test to find out."
While the specific claim made about copying is true (you can right-click and select "copy email address"), simply selecting the text and copying does not work. The same goes for select-all followed by copy. All in all, I wouldn't expect a regular user to be able to successfully copy this.
Until some bot dev sees this, accepts the challenge, and then solves it as a function within their package that never needs updating again because it is now done. So, live it up while it is not solved. After that, just shrug your shoulders at yet another idea no longer being useful
My secret is that I'm not simulating. Being blind forces you into it. :D
For testing purposes, the nvda screen reader is free and open source. I'm not sure if there is a driver for it to have an api access to what it would output, but it might be a fun project to try for a11y testing purposes.
Heavily guarded fortress would indicate something of value inside, and the big crooks may spend a little more effort. In the age of AI, this becomes even easier.
True - but that cost just halved with today's introduction of "GPT-4o". The other cost is time.
IMHO - I think there is more to worry about than email scraping..
In no world is anyone wasting resources to run an AI model to parse a page that may or may not include an email address. Even running a DOM parser is more than they’d typically do. This is silly.
it won't be widespread imho, not when you share your email address with other parties that then lose/sell your details. fastmail-like 'temporal' email addresses could help, however.
Funny I'm seeing this now, I've finally made the first tentative steps into making a website, and noticed that pandoc has an --email-obfuscation option and the whole topic was on my mind. I don't remember the last time I received an actual spam email (not counting desperate marketers trying to remind me of that one website I tried ages ago). Funnily enough, the new frontier seems to be WhatsApp and SMS of all things. A month or two back I got a job offer from an Indonesian phone number on WhatsApp, and then something similar directly to my SMS. I didn't publish my phone number anywhere online; the closest thing to making it public was joining my college's WhatsApp group and giving my phone number to a bank for a student credit card, and honestly I wouldn't put leaking them to some spam agency beyond either.
I'm using VoiceOver on macOS with Chromium and I have the same experience as the NVDA user, although if I interact with the "link" I'll eventually find the email. If I wasn't aware of the obfuscation, however, I probably would just think the webpage was weird, saying "this is an email" but actually giving a mailto: link. In general, if you're doing something special to improve accessibility then odds are you're doing it wrong, and if it's anything web related the odds are at least 90%. Most accessibility issues on the internet are developers trying to be smart by using ARIA labels or such, which usually just make it worse. The example I have to deal with most often is the manpages on man.openbsd.org. All of their cross references to other manpages say something like "openssl, section 1" instead of "openssl(1)", which is what's displayed on the screen and what the browser's find command sees while searching.
For completeness, I also tried the page with various terminal browsers, specifically lynx, felinks, w3m, and edbrowse. None, and I mean NONE of them could display the svg properly, they couldn't even recognise it as an image.
More ironically, some of them sell anti-spam SaaS. I recently received the following email:
> Le 2 mai 2024 à 12:13, Geoffrey Callaghan <irishgeoff@yahoo.com>
>
> Hello there,
>
> Please don't shoot the messenger ( that's me don't shoot :)
>
> But you should not post your email address like that in its raw email format on the hacker news
>
> You should use a tool like https://veilmail.io to hide your email address from spam bots :)
>
> You can always go back and change your hacker news post with a veilmail address.
>
> Have a nice day
>
> Geoff
I think the main thing people forget with stuff like this is that yes, all these setups are possible (or even trivial) to bypass, but you're not really dealing with a dedicated adversary that's targeting you in particular.
Spammers probably aren't going to update their tools to take into account every possible way every site obfuscates their email addresses, so the main trick to dealing with them would be to do something other sites/services don't. If you or your company become successful enough that people are actually targeting you in particular, then congrats, you're probably in a good place anyway.
> Spammers probably aren't going to update their tools to take into account every possible way every site obfuscates their email addresses
But this is also sort of a security through obscurity approach, if enough people adopt one of these methods of obfuscation then the spammers absolutely will change their tools.
Maybe I’m too stupid but I don’t get why you would want to do this at all.
Had my email in plaintext on the website for ages and never had an issue with spam…
I think I get more unsolicited email from related businesses trying to get a foot in the door with my company - I assume they're connecting dots either from LinkedIn or Github (probably both). This is an interesting solution to the problem, but I don't genuinely think that anyone is scraping websites for email addresses anymore. I don't think it's cost effective for the modern spammer.
Seems like a great solution but I'd like to embed the data directly rather than linking an external file. Then one issue I see is that dumb scrapers just look for the email address anywhere in the page source, including an embedded SVG, whereas they might not fetch external <object> or <img> files. So for direct embeds, if the string is not otherwise encoded, that could potentially leak the email address.
While this obviously (re)introduces JS into the mix, how would a simple compressed string fare against base64 svg embedding?
I would like to push back on the idea that you should obfuscate your email address at all.
My email address is danny@spesh.com. I get a lot of spam -- possibly, since I have been distributing that address deliberately on the web and inadvertently in hacked datadumps, a near maximum amount of spam.
But the benefits of having people easily find a way to contact me directly has for me far outweighed the (largely solved) challenge of discarding automated spam.
Publish your email address! It's okay! Very little bad will happen, and people will be able to contact you without going through some strange social media intermediary!
This is appropriate advice for the average HN reader. For everyone else, probably not. I've seen first hand otherwise intelligent people being unable to discern an obvious (to me) online scam from a legitimate business. These are the people spammers are targeting. These are the people that need to obfuscate their email address.
Even sophisticated users can slip up in the right circumstances.
Personal anecdote: one morning, whilst still quite sleepy, I received a very well crafted Namecheap phishing email. I half knew the product they were claiming had lapsed was actually fine, but I had just recently renewed so I thought perhaps there had been a problem I missed, and it was convincing enough that I clicked the link before doing the normal sanity checks. Thankfully the address it went to didn't resolve. Hopefully I would have noticed the obviously incorrect URL before I entered any details, and I have 2FA enabled, but still, I should and do know better, it was just perfect timing for a well crafted attack...
Don't modern spam filters filter out most mails received this way, and don't most spammers purchase lists for specific targeted domains (house owners, porn users, dentists, etc.) rather than blindly scraping the web?
Interesting idea...but could a crawler not just incorporate some AI like LLava2 or convert the SVG to a JPG and use OCR to get the email addresses out?
It just seems like this adds a couple of steps to existing crawler scripts.
Seems kind of easy to defeat, just read the SVG to extract the email address from the mailto: link contained therein. Bonus: the harvesting bots will now download all SVG files going forward.
At that point, isn't adding a good old contact form a simpler solution? You can link it with your email address or other channels. It can even work with static websites, I hooked up mine with Nextcloud Forms.
I appreciate the hacker creativity on display here, but as others said, obfuscating an email address raises accessibility issues. Hiding content from some programs and not others (spam bots vs assistive technologies) seems inherently a losing game, for you or for users.
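A rough sketch of how little work that would be, using only Python's standard library. The SVG below is a made-up stand-in (I haven't copied the article's actual file), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Older SVGs put links in xlink:href, newer ones may use a plain href.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical SVG of the kind described: a link wrapped around some text.
SVG_SOURCE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink" width="200" height="20">
  <a xlink:href="mailto:user@example.com">
    <text x="0" y="15">user@example.com</text>
  </a>
</svg>"""


def mailto_addresses(svg_source: str) -> list[str]:
    """Collect the targets of every mailto: link in an SVG document."""
    root = ET.fromstring(svg_source)
    found = []
    for element in root.iter():
        for attr in ("href", XLINK_HREF):
            value = element.get(attr, "")
            if value.startswith("mailto:"):
                found.append(value[len("mailto:"):])
    return found


if __name__ == "__main__":
    print(mailto_addresses(SVG_SOURCE))  # ['user@example.com']
```

In practice a harvester wouldn't even need the XML parser: a plain regex over the downloaded .svg text would find the same address, since it sits there in cleartext.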
Kind of neat but I would rather just have a “throwaway” email if I was sharing globally.
In my case, I set up an email alias with a sieve rule (if email is sent to the alias, move it to the “public inquiry” folder). Before that rule is processed, SpamAssassin takes care of the non-technical folks that couldn’t be bothered to run their spam campaign through SpamAssassin testers, or that wouldn’t even know how to set up their domain for sending email (SPF, DKIM, DMARC, …).
I tested the example using the TalkBack screenreader on Android. With Firefox I was able to select and click on the link, but it did not announce the email address. With Chromium it completely ignored the existence of the SVG email. I was unable to select it and it was like the email wasn't there at all.
i bought a premium .app domain a few months ago. not published on any websites yet. no history of previous owners. just the fact that it's listed as a premium domain on registrars.
first emails I received after the gmail welcome email were b2b sales from construction companies (i'm not in this field), shopify optimizations (i don't run one), agencies suggesting how i improve the ui/ux of my site (no website yet).
thankfully, they're all in the spam folder. i'm using google workspace.
i believe these spammers get their leads on newly-registered domains. so, how do we protect ourselves from that?
I assume that nowadays emails are pulled directly from hacked mailbox contacts list. Nobody has the time to go through each individual website and collect emails one by one.
This is a cool trick. The email is in cleartext in the source, meaning mailto works and copy-paste works. But most scrapers probably skip the .svg file.
All you're doing is making it slightly more difficult for the people that want to contact you to do so.
OCR has been a thing for years.
Just put your email out there. That's what spam filters are for.
charles@geuis.com. There. Scrape it. Spam it. I don't care.
Edit:
Yes, thank you for signing me up for the DNC (already a member), some random Trump org, something about Scientology, and another random christian-based website. Honestly, I'm kind of sad at the lack of originality given the otherwise extremely ingenious community we have here.
But you just proved the point. You might not care to be signed up for some random Trump org, Scientology, or whatever, but other people do care and if you want to author a website that responsibly uses people's emails without subjecting them to unnecessary spam, then it's worth taking these techniques (not necessarily this specific one) into consideration.
While OCR does exist it's incredibly expensive compared to text scraping. The main way to combat spam is to make the cost of spamming more expensive than the benefit.
It even uses the exact same syntax as in html, so as long as svg content isn't specifically excluded, normal web scraping would just work without modification.
a much easier way is to convert the email address into html entities. It then displays and can be copied, but the actual source code doesn't have the email address.