Protecting your email address via SVG instead of JavaScript

yreg · on May 13, 2024

> Email addresses published on webpages usually need to be protected from email-harvesting spambots.

Do they though?

I have had my email address published on my website in a <a href="mailto:… for like 20 years and I don't get spam that would get through the spam filter.

I use both Gmail and (for some other addresses) a webmail hosted by a local company which uses some other filter. Both work well, so it's not something only Google can do.

crazygringo · on May 13, 2024

Exactly.

I definitely recall in the early 2000's it absolutely did lead to spam, and e-mail obfuscation techniques were a real thing that genuinely helped.

But by 2015 or so it didn't matter at all anymore, in my personal experience. It didn't even lead to spam that needed to be filtered. Spammers just stopped looking for e-mails that way.

Which makes perfect sense -- most people don't have their e-mail address listed anywhere online in the first place, but you can purchase gigantic lists of e-mail addresses that either originate from companies that sell their own user lists, or from people who hacked the companies' servers.

These days if you want to send spam, trawling the web for e-mails makes zero sense. It's practically the least efficient thing you could do.

treflop · on May 13, 2024

I’ve been having all my email addresses posted plain text since like 2005 and I’ve signed up on like every website imaginable (my password manager has over 2,000 entries) and I’ve never had a spam problem, at least on Gmail.

r-w · on May 13, 2024

Unless you’re the one trying to sell them, in which case that’s part of doing business :)

qingcharles · on May 13, 2024

I have two people I designed web sites for in the last year and I put both their email addresses in the footer and neither one of their accounts has received a single spam message in all of that time (not even something dropped into the Spam folder). Both sites are popular and have thousands of visitors and get scraped by every search engine and AI bot you can think of.

r-w · on May 13, 2024

Interesting. Maybe footer emails tend to be support contact addresses rather than personal inboxes. Otherwise I’d find that discrepancy very surprising.

dhosek · on May 13, 2024

My thoughts exactly. On the other hand, an email address I used with Usenet ca 1999–2001 has had a consistent flood of spam. I think most spammers are using the same 20+-year-old list of emails.

The email address on my website doesn’t even get stuff that goes to the spam filter. Nothing, nada, zilch.

I do think that there are some mailing lists that get generated by trying to guess emails, brute-forcing gmail addresses by trying dictionary attacks of the FIRSTNAME.LASTNAME variety or 1–10 letters. I get a tiny amount of spam sent to a domain@domain.com address I have, but that’s typically on the order of one message a year.

And all else aside, the overall volume of spam email has declined dramatically, even ignoring the effect of the gmail spam filter. I’m guessing that email as a spam vector just doesn’t make sense anymore and most of what goes out is a mix of 419 scammers trying to make their quotas and would-be scammers who’ve been scammed into buying that 20-year-old list of emails.

paradox460 · on May 13, 2024

The practice of email address "obfuscation" feels like a relic of a bygone era, one that was never actually sound in its methodology, but spread anyway. A form of cargo-cultism has kept it alive.

SoftTalker · on May 13, 2024

Yeah just looking at this, it appears to add about 1K of overhead and at least one additional http request for something that ultimately boils down to a mailto: link, so it can still be scraped, and just adds bloat to your web page.

digging · on May 13, 2024

My preference is to not have my email harvested at all when possible, even if I don't personally see the spam emails. (I'm not saying it's a critical privacy/security issue, but a preference.)

doug-moen · on May 14, 2024

My experience is that if you send email to someone whose opsec is not as good as your own, your email address will be harvested at the point when that person's address book is harvested. I don't know all the details of how these harvests occur, but using a shady mobile app with Contacts permission would be enough.

jakubmazanec · on May 13, 2024

So then you never use your email, right?

r-w · on May 13, 2024

I think they’re obliquely referring to the scanning practices of major providers like Gmail, which most people use to filter their spam.

digging · on May 13, 2024

What?

jakubmazanec · on May 14, 2024

Your email address cannot be (and isn't) secret, if you give it to other people (regular people, i.e. friends, colleagues, etc.) so they can send you emails. If you don't want your email harvested, you can never use it (at least to receive emails).

adrianpike · on May 13, 2024

I've also had my email posted in mailto's in a half dozen places for... a long time. I remember in the early 00's when I'd cargo cult the old "type the whole email out as adrian at adrianpike dot com" thing on forums thinking it would work as some mystical talisman, and it turns out considering emails to be secret isn't worth the time.

xyst · on May 13, 2024

this used to be a problem in the early 00s. I don’t think spam filtering was as good back then so protecting your public email from spam was necessary.

Also this was a time when mail boxes were often allocated 10-25 megabytes. So spam bots could easily flood your email.

WirelessGigabit · on May 13, 2024

When I signed up for Hotmail it was 2MB.

Then on April 1st, 2004 Google launched what wasn't an April 1st joke... Gmail with 1GB! I remember getting a beta invite and inviting others.

ibcj · on May 15, 2024

I had to buy my invite then, like a sucker. Apparently, I didn't make the cut of my friends who got legit invites to pass around.

Animats · on May 13, 2024

Agreed. I have several web sites with publicly visible email addresses and they don't get much spam.

The spam I get is rather mis-targeted. For a while I was getting spam for equipment which would be useful were I a bulk producer of olive oil. "We have 15 years of experience in the research, development and production of automatic edible oil filling equipment...." There are the usual fake financing deals: "We’ve pre-approved your business for financing..." Whatever sends that crap doesn't look at the web site at all.

When I get spam from Gmail or Outlook accounts, I report it, so they will get a strike against their account. I don't hear from those people again.

All other spam is so obviously bogus that simple filters are dumping it into a junk folder. Most of it seems to be phishing emails. "You have won a (some tool)..." seems to be popular this week.

a_random_canuck · on May 13, 2024

They do. My wife lost her 10-year-old Instagram account to a well crafted phishing attack against an email she had published…

Instagram/Meta’s customer support is absolutely atrocious and disgraceful on this front. They basically treat my wife like she’s also a spammer and there’s no way to recover the account or undo any of the changes the spammers made.

It’s hilarious how they ask you to “appeal” a ban by clicking a single button without giving any chance to rectify what the spammers did to her account. Of course their automated bots just reject your appeal almost instantly. Shameful.

qingcharles · on May 13, 2024

Clicking the appeal button is like a trap to permanently ban your account.

You can get it back by paying off a Meta employee through a site like Swapd. It's either that or get your comment to the front page of HN. Those are the only two customer support channels for Meta or Google.

crtasm · on May 13, 2024

Does her email show up on any leaks on https://haveibeenpwned.com/ ? I'm wondering if not publishing it would have made any difference to receiving phishing messages.

chefandy · on May 13, 2024

Would such an attacker be stymied by this? It seems like automated email harvesting wouldn't be a big time saver for any attack that required a well-crafted anything. I don't know anything about that particular attack, though.

hoherd · on May 13, 2024

This gave me "Press F to appeal ban" images.

dgb23 · on May 13, 2024

This could happen to anyone. You’re tired or thinking of something else, the attack weirdly aligns and you don’t notice it until it’s too late.

4u00u · on May 13, 2024

very recently, within a day of publishing an email on a footer of a page i got a phishing email that was not filtered by spam and looked very genuine

nozzlegear · on May 13, 2024

Same here, I've had my email plainly visible on my website in mailto links and on Github, and I don't get any spam that breaks through Fastmail's spam filters.

michaelcampbell · on May 14, 2024

Indeed; this is preference more than a problem to be solved, and this is not the solution.

An almost tin-foil hat wearing colleague of mine went on loudly and proudly about how he'd never give HIS phone number to Google, oh no! Not him!

I just had to say, "John, they have it - you're in my contact list."

He hadn't even considered that.

zufallsheld · on May 13, 2024

I host my own Mailserver and all addresses that are publicly visible get spam, e.g. my blog or my mail that was visible on github.

throwaway598 · on May 13, 2024

My domain: 24 years registered to me. A .com.

My email address: Listed at the top of the front page. In a H3 tag.

This email address's spam problem: Not a problem. 15ish per day get to me including Junk folder. Thanks Purelymail.

What is a problem: Transactional email unrelated to transactions, Promotional email which is newsletter junk spam, Social networks complaining of not being used.

zufallsheld · on May 13, 2024

15 spam mails do seem like quite a lot to me. I blacklisted addresses for less.

anamexis · on May 14, 2024

If they're getting filtered, who cares?

SoftTalker · on May 13, 2024

> Social networks complaining of not being used

This is my biggest one. I get more spam from Facebook begging me to log in than I do from almost anything else. I haven't used the account in about 7 years, you'd think they'd figure it out.

kevincox · on May 13, 2024

> you'd think they'd figure it out.

Cost of sending spam: Effectively zero.

Cost of pissing off inactive user: Essentially zero.

Cost of convincing inactive user to come back: Positive.

Add in a bunch of other factors like some product manager twisting stats to make it look like they are getting users back even if they really aren't and you see why it happens.

speckx · on May 14, 2024

I have a few old domains I registered in the late 90s, and some of them still have the mailto with my email. On some I rarely get any spam, and on others it's dozens a day. SpamAssassin does a great job of catching the spam.

magnat · on May 13, 2024

> even when a human visitor has their JavaScript turned off, the email address displayed on the page remains usable

NoScript on Firefox with default settings doesn't render <object> tags (it replaces them with placeholders), so this technique doesn't work here.

https://imgur.com/2tCAgAf

Laaas · on May 13, 2024

uBlock Origin can block JS too FWIW. There’s a convenient button for it in the extended menu.

brettermeier · on May 13, 2024

Thank you, didn't know that!

yau8edq12i · on May 13, 2024

That's a different thing, though. Not sure why you'd make this point.

jaeh · on May 13, 2024

it's the same in chromium.

gwervc · on May 13, 2024

[flagged]

sanitycheck · on May 13, 2024

If anything's the "new IE", it's Chrome! Dominant market position, >50% of people using it, (arguable?) abuse of a platform monopoly (search this time) to drive popularity. It also supports stuff other browsers don't, so we get sites that only work in Chrome - like we used to get sites that only worked in IE. Yes these are "web standards" now but the effect is similar.

(When did you see a site that only worked in Firefox, or only in Safari?)

jdiff · on May 13, 2024

It's not a Firefox issue, it's a NoScript issue. Chrome with its market share and propensity to implement its own standards, or Safari with its market share, quirks, and propensity to implement its own standards, make much better candidate IEs.

herpdyderp · on May 13, 2024

> Safari with its market share

Safari's market share is completely dwarfed by Chrome. Safari is nowhere close to IE's monopoly.

Forced WebKit on iPhones and iPads is the only thing standing between Chrome and complete IE monopoly.

StrauXX · on May 13, 2024

Firefox is not at all the new IE. Firefox isn't worse at implementing the spec than Chrome; they just make different decisions sometimes. People tend to see Chrome behaviour as the default of how things should be, rather than the specs.

phito · on May 13, 2024

Nonsensical comment. Sounds like you just wanted to take a shot at Firefox.

simondotau · on May 13, 2024

This isn't an issue with Firefox, it's a consequence of the NoScript extension blocking unsafe features. The behaviour is likely same/similar with NoScript and Chrome.

Chrome is the new MSIE. In both cases, a dominant position was used to dictate web standards in an unhealthy way. Microsoft did it through strategic neglect. Google is doing it by strategic smothering. Firefox and Safari are the web's last stand against an impending Chrome browser monoculture, against Google endlessly ramming new features down our throats and declaring them "standards".

giantrobot · on May 13, 2024

> against Google endlessly ramming new features down our throats and declaring them "standards".

New features that will be used for their intended purpose maybe 1% of the time and fingerprinting by AdTech the rest of the time. What could possibly go wrong handing the Web over to an advertising company?

miki123211 · on May 13, 2024

While there's nothing stopping this technique from being accessible in principle, the example given in the article is a really bad one.

The article uses "Email us!" as the label on the svg and a elements, which effectively hides the actual email address from screen readers. Using aria labels in this way is a really bad practice, a screen reader user should have the same experience as anybody else unless there's a very good reason to do otherwise, and if you think your reason is a good reason, you're probably wrong.

The proper way to do this would be to put the actual email address in the labels.

47282847 · on May 13, 2024

Isn’t the whole point of the exercise to not have the document contain the email address in a (machine-)readable format?

janosd · on May 13, 2024

The NVDA screen reader reads this text as: "This is my email frame link email us." That is by no means equivalent to actually seeing the email address. I found that HTML entity encoding every single character of the link takes care of any spam problem already and is much more accessible.
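
For readers who have not seen the entity-encoding trick, here is a minimal sketch in Python. It assumes decimal numeric character references; the helper name `entity_encode` and the address are placeholders for illustration, not something taken from the article or from this comment.

```python
import html


def entity_encode(text: str) -> str:
    """Replace every character with its decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


address = "myemail@mydomain.tld"  # placeholder address
encoded = entity_encode(address)

# The markup a page would ship: no literal "@" or "mailto:myemail..." in the source.
link = f'<a href="mailto:{encoded}">{encoded}</a>'
print(link)

# Anything that decodes entities (browsers, screen readers, HTML parsers)
# gets the plain address back.
print(html.unescape(encoded))
```

Because the browser decodes the references before building the page, the link text stays readable and selectable. The flip side is that this only hides the address from tools that match the raw source without decoding entities.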

miki123211 · on May 15, 2024

Being accessible and being machine-unreadable are literal opposites. A screen reader is not that different from an ad blocker or web scraper in how it accesses content.

There's a reason that many end-to-end testing experts recommend writing selectors based on accessibility labels instead of CSS classes or IDs, especially if you're using a library like Styled Components.

Doe-_ · on May 13, 2024

The email address wouldn't be in the document directly, only in the SVG. Whether the title of the SVG contains "Email us" or the email address wouldn't affect how it works.

If the scraper is searching the DOM rather than simply downloading the webpages, then the email will be found regardless.

matteason · on May 13, 2024

This can also affect voice dictation software like Dragon - if a user says 'Click myemail@mydomain.tld' it won't activate the link as Dragon is expecting 'Click email us', as that's now what the browser exposes as the link text.

That point might be academic anyway as I'm not sure Dragon would activate a link inside an SVG

janmo · on May 13, 2024

Here is what I do:

reanospaml@maisjsl.com

I still receive "spam" tho, but it seems they manually collected the email because what I receive are B2B proposals clearly targeted at the topic of my website.

jszymborski · on May 13, 2024

If the scraper uses a headless browser, I think that it might defeat your method. That said, using a headless browser to crawl for emails is relatively expensive so perhaps the spam is not from your site.

kees99 · on May 13, 2024

Not only is "protecting your email" pointless, as others have already pointed out, it's actively harmful.

There are a fair few sites, where most all content is perfectly readable without JS, except things like "1920x1080@60Hz" are displayed as literal "[email protected]" text.

digging · on May 13, 2024

> There are a fair few sites, where most all content is perfectly readable without JS, except things like "1920x1080@60Hz" are displayed as literal "[email protected]" text.

Do you have one on hand? That sounds absurd and I've never seen it

tentacleuno · on May 13, 2024

Mastodon instances fronted by Cloudflare (with Email Protection on) are good examples.

dns_snek · on May 13, 2024

Is there really a point to any of this? It's a fun exercise, but also a complete waste of time if you're actually trying to hide from spammers. You're making a piece of information public by sharing it with the entire world, yet somehow expecting it to only stay accessible to the "good guys".

Unless you change your email address at least monthly, all it takes is for one person or company to share your contact with someone else or enter it into a database/CRM, or one service to get breached, then your email address is on a list that eventually gets propagated to every spammer worldwide. If you use that email with any regularity, the chance of those things happening can be rounded up to 100%.

If hiding your email address from scrapers actually worked, spam wouldn't exist. I never published my personal contact anywhere, yet I get dozens of spam emails per week. They all get filtered as spam, it's not a big deal.

donatj · on May 13, 2024

A friend of mine is an absolute wizard and has been building essentially “responsive images” as SVGs with JS inside. They adapt to their size programmatically. It’s… interesting.

The fact that SVGs can even have JS embedded feels both untapped and kind of dangerous.

soperj · on May 13, 2024

SVGs are responsive out of the box? I'm confused about what the Javascript would be doing to help that situation within the svg.

asynchronous · on May 13, 2024

I think they’re talking about dynamically actually changing the image itself, not just resizing

alemanek · on May 13, 2024

That sounds super interesting. Does your friend have a GitHub or site that shows what they’re doing on that front? If so, could you post a link?

This is super far out of my wheelhouse technically as a backend engineer but it sounds really cool.

johnny99k · on May 13, 2024

This has been known in the security community for quite some time.

kelnos · on May 13, 2024

I gave up on this sort of thing. Spam filters are good enough nowadays that I don't think I see an increase in spam by having my email address publicly available without obfuscation. (That is, an increase beyond other spam sources, like crappy companies who have my email address for a legitimate purpose, but sell it to third parties.) In general I see less than 1 spam email hit my inbox per day, and that's fine.

Granted, this may depend on email provider and spam filter, so YMMV, but it hasn't been an issue for me.

cwillu · on May 13, 2024

Email is still plain-text within an xml document referenced in the page source.

shanehoban · on May 13, 2024

Try to query it though via document.querySelectorAll('a') for example. It's a good first line of defense as a lot of scraping techniques use this approach.

However, if you have a headless browser setup for scraping, and simply fetch the current URL while on the page[0], you can get the plain text, and do a regex search for email addresses which will get you the email address - albeit this is a strange approach to take I admit.

[0]: fetch('./').then((res) => res.text()).then((text) => console.log(text))

nolok · on May 13, 2024

> It's a good first line of defense as a lot of scraping techniques use this approach.

Most basic scrapers, the ones that are not for your testing or devtools or automation or ..., actually use basic text, without any interpretation. They grep the source code; they don't run a DOM and JavaScript engine, because it's a major difference in computing needs and speed.

I am not saying there is no evil scraper doing DOM evaluation, there are tons. I am reacting to your "FIRST line of defense": the first line is scrambling the raw text, which is why we got there.

What parent is saying, is that this is trying to upgrade the defense that we have generated to stop the threat that evolved, but it forgot why we got there and thus makes itself vulnerable to the original threat.

animuchan · on May 13, 2024

Absolutely. The basic tools just fetch sites recursively and use regular expressions. The advanced tools are Chromium-based, so will render SVGs just fine (and then potentially run OCR / AI to extract text even from JPEGs).

This technique protects from a "neither here nor there" subset of programs. I wonder how large that set is in practice.
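
A rough sketch of the "basic tool" case in Python, to make the gap concrete. The pattern is deliberately naive (nowhere near RFC-complete), and both the page markup and the SVG contents are made-up stand-ins for the article's example.

```python
import re

# Deliberately naive pattern, typical of a quick scraper.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical page source: it only references the SVG file.
page_html = '<p>Contact: <object data="email.svg" type="image/svg+xml"></object></p>'

# Hypothetical contents of email.svg.
svg_text = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 24">'
    '<a href="mailto:myemail@mydomain.tld">'
    '<text x="0" y="16">myemail@mydomain.tld</text></a>'
    "</svg>"
)

# The regex finds nothing in the page source itself...
print(EMAIL_RE.findall(page_html))
# ...but finds the address as soon as the crawler also downloads the SVG.
print(EMAIL_RE.findall(svg_text))
```

So the protection holds only for scrapers that grep the HTML and never follow the `<object>` reference. A recursive fetch closes that gap with no extra parsing logic.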

cqqxo4zV46cp · on May 13, 2024

If they’re saying it, I think that they’re wrong. One of those naively written scrapers won’t pick up an email address ‘protected’ in this way. It’s simply continuing the game of cat and mouse.

nkozyra · on May 13, 2024

You can just query for all the image elements and then read any svg using the document model.

This is trivial to overcome for most basic scrapers and not much harder even if you try to obfuscate with paths for more sophisticated ones.

_joel · on May 13, 2024

The idea being that spam bots don't parse svg's looking for email addresses, just the page html. I'm not sure how effective this really is with modern spam protection, however.

turboturbo · on May 13, 2024

The idea also seems to be that spam bots don’t look for `href="mailto:something"` in the DOM

edave64 · on May 13, 2024

The mailto is inside the SVG, not the HTML document. So that's not "also" it's the same idea of bots not looking at the svg at all

rrr_oh_man · on May 13, 2024

That seems surprising, tbh

majestic5762 · on May 13, 2024

yeah, useless stuff portrayed as smart

okasaki · on May 13, 2024

Is it still necessary to obfuscate email addresses? Mine isn't and I get around 50 generic spam emails per month to gmail.

ale42 · on May 13, 2024

I think that nowadays most spam lists come from data breaches and address-collecting malware. It's cheaper than running a bot to scan the web for addresses. We get spam on addresses that were never published online.

RaoulP · on May 13, 2024

I think so too. And I think the majority of data breaches that have led to spam for me are from ages ago, from random services I signed up for as a teenager.

For a few years after that I did the "+" Gmail alias thing, to try to filter and catch companies. But I realised that's easy and obvious to strip, so it wasn't worth the effort (although I have caught PayPal leaking my email somehow).
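
To show how little work the stripping takes, here is a small Python sketch. The function name `strip_plus_tag` and the addresses are hypothetical, and it assumes the Gmail-style convention that everything after the first "+" in the local part is a tag.

```python
def strip_plus_tag(address: str) -> str:
    """Drop a "+tag" suffix from the local part of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        # No "@" at all: not an address we understand, leave it untouched.
        return address
    base = local.split("+", 1)[0]
    return f"{base}@{domain}"


print(strip_plus_tag("someone+paypal@example.com"))
print(strip_plus_tag("someone@example.com"))
```

Anyone cleaning a purchased or leaked list can run something like this over every entry, which is why the tag is only useful against senders who don't bother.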

ale42 · on May 13, 2024

If you self-host your email, you can use "." as a delimiter instead of the "+". People would already need to know they can strip that part...

RaoulP · on May 13, 2024

Sounds good! I might go even further and just use a custom address for each service, i.e. paypal@example.com or something.

But self-hosting email is an adventure I'm nervous to embark on.

steve_rambo · on May 15, 2024

Don't, there are many smaller email providers that will take that load off your shoulders for a small fee. I've been using purelymail and have had good experience with it, and heard good things about migadu and fastmail. The latter two are more well known and better staffed, but also expensive.

I've been using similar aliases for years (paypal@domain.tld, ebay@domain.tld, etc), but make sure you have a contingency plan for when you're no more. I've received lots of account info from previous owners of the domain by setting up a catchall mailbox. We will obviously not care, but when someone takes over your account, they might use it to do harm to others (spam or fraud or whatever else).

nobody9999 · on May 13, 2024

>Sounds good! I might go even further and just use a custom address for each service, i.e. paypal@example.com or something.

Which is exactly what I do. As soon as I see spam sent to any particular email address, I know who it is that leaked the address and I can block it without issue.

>But self-hosting email is an adventure I'm nervous to embark on.

Why are you nervous about it? I've been doing so for decades and haven't had many issues at all. There are a bunch of all-in-one solutions like mailinabox[0] (I roll my own, but as I said, I've been doing this for decades) and others which would likely make things simpler for you. Go for it! You won't be disappointed.

[0] https://en.wikipedia.org/wiki/Mail-in-a-Box

samatman · on May 13, 2024

Anecdotally, sending mail to example.com from example@mydomain.com can cause a whole host of human-factors problems which can be eliminated with something like RaoulPtoExample@mydomain.com.

RaoulP · on May 13, 2024

I think this is a valid question. I see lots of effort at obfuscation but don't know if there's still a need.

I barely get spam and have a bigger issue with false positives in my spam folder. On the other hand I don't think there are many pages on the web that display my email address, so I'm curious about others' experience.

martyvis · on May 13, 2024

Is that all? I get around 70 genuine spam emails to my Gmail account every day now (all detected correctly by Gmail)

tempestn · on May 13, 2024

I get a similar volume, and gmail likely detects almost all of them. Problem is, it also falsely detects the occasional non-spam message, so I do need to periodically scan through the spam box, which is a bit of a pain when it contains hundreds or thousands of emails.

sitzkrieg · on May 13, 2024

it isnt but people like to make a problem of it with elaborate whatifs

dxs · on May 13, 2024

This is fun [2008]: https://web.archive.org/web/20180908103745/http://techblog.t...

"Nine ways to obfuscate e-mail addresses compared

"When displaying an e-mail address on a website you obviously want to obfuscate it to avoid it getting harvested by spammers. But which obfuscation method is the best one? I drove a test to find out."

Etheryte · on May 13, 2024

While the specific claim made about copying is true (you can right click and select "copy email address"), simply selecting the text and copying it does not work. The same goes for select all followed by copy, so all in all, I wouldn't expect a regular user to be able to successfully copy this.

throwaway11460 · on May 13, 2024

Don't have time to test myself right now - what about accessibility, can a screen reader read it?

gostsamo · on May 13, 2024

I tested, and it seems accessible on the live demo. Not sure if it is as protected as the author claims though, but it might throw some bots for a spin.

dylan604 · on May 13, 2024

> but it might throw some bots for a spin.

Until some bot dev sees this, accepts the challenge, and then solves it as a function within their package that never needs updating again because it is now done. So, live it up while it is not solved. After that, just shrug your shoulders at yet another idea no longer being useful

gostsamo · on May 13, 2024

The key in this case is that this is not a problem for me even if someone implements such a protection.

The rest is mice and traps.

rrr_oh_man · on May 13, 2024

Man, I’ve always wondered how to test apps with a (simulated) screen reader, but never got too far

gostsamo · on May 13, 2024

My secret is that I'm not simulating. Being blind forces you into it. :D

For testing purposes, the nvda screen reader is free and open source. I'm not sure if there is a driver for it to have an api access to what it would output, but it might be a fun project to try for a11y testing purposes.

rrr_oh_man · on May 14, 2024

Thank you! And sorry for all the shitty code I produced over the years.

throwaway11460 · on May 13, 2024

I use this: https://chromewebstore.google.com/detail/aria-devtools/dneem...

Not sure about desktop apps.

rrr_oh_man · on May 14, 2024

Added this. Thank you a lot!

Operyl · on May 13, 2024

Given the entire bottom section, it seems like accessibility was taken into account here.

throwaway11460 · on May 13, 2024

Unfortunately then I think it won't help at all - going through the accessibility tree is a standard web crawling play.

muzster · on May 13, 2024

A heavily guarded fortress would indicate something of value inside, and the big crooks may spend a little more effort. In the age of AI, this becomes even easier.

    {
      "model": "gpt-4-turbo",
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "return a json array of all valid emails found in the image."
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "image_url",
              "image_url": {
                "url": "data:image/png;base64,{{ INSERT_BASE64_PNG_DATA }}"
              }
            }
          ]
        }
      ],
      "temperature": 0.5,
      "max_tokens": 2048,
      "top_p": 1.0,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0
    }

Edit: Converting web page to an image is trivial.

internetter · on May 13, 2024

We've had OCR for decades before GPT. I suspect GPT might perform worse than OCR. What a waste.

muzster · on May 13, 2024

Agreed - it's a waste. GPT is not too bad at reading text from image and with the added bonus that you can reason with it.

zipping1549 · on May 13, 2024

It won't make sense cost wise though

muzster · on May 13, 2024

True - but that cost just halved with today's introduction of "GPT-4o". The other cost is time. IMHO - I think there is more to worry about than email scraping..

omneity · on May 13, 2024

Except the cost is only going down over time

barkbyte · on May 15, 2024

In no world is anyone wasting resources to run an AI model to parse a page that may or may not include an email address. Even running a DOM parser is more than they’d typically do. This is silly.

fp64 · on May 13, 2024

I don't get it, I can just curl the svg and grep for mailto?

rany_ · on May 13, 2024

Yes, but these scraper bots aren't that sophisticated.

fp64 · on May 13, 2024

Crawl every link, now including SVG, and grep all 'mailto:' does not sound super sophisticated?

    wget --recursive --quiet "$BASE_URL" && grep -roh 'mailto:[^"]*' .

works on the example and just prints the email
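
The same idea with an actual XML parser instead of grep, as a hedged Python sketch. The SVG string is a stand-in for the article's file, and `mailto_links` is a made-up helper name. It checks both the plain `href` attribute and the older namespaced `xlink:href`.

```python
import xml.etree.ElementTree as ET

# Older SVGs use the namespaced xlink:href attribute instead of plain href.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def mailto_links(svg_source: str) -> list[str]:
    """Return the address part of every mailto: link in an SVG document."""
    root = ET.fromstring(svg_source)
    found = []
    for element in root.iter():
        for key in ("href", XLINK_HREF):
            value = element.get(key, "")
            if value.lower().startswith("mailto:"):
                found.append(value[len("mailto:"):])
    return found


# Stand-in for the downloaded SVG file.
svg_source = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 24">
  <a href="mailto:myemail@mydomain.tld">
    <text x="50%" y="50%">myemail@mydomain.tld</text>
  </a>
</svg>"""

print(mailto_links(svg_source))
```

Either way it is a few lines on top of a crawler that already downloads linked files, which supports the point that no real sophistication is needed.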

planede · on May 13, 2024

I think the idea is that email scraper bots typically don't bother downloading images referenced by <img> tags.

winternewt · on May 13, 2024

But they will be as soon as this sees widespread use.

_joel · on May 13, 2024

it won't be widespread imho, not when you share your email address with other parties that then lose/sell your details. Fastmail-like 'temporal' email addresses could help, however.

amsterdorn · on May 13, 2024

Querying DOM nodes is inherently more complicated than a regex on unparsed HTML.
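
For comparison with a one-line regex, here is roughly what the parsing route looks like using only Python's standard library. This is a sketch: `MailtoCollector` and the sample markup are invented for illustration, and a real crawler would still have to fetch and parse whatever ends up in `embedded`.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect mailto: targets and embedded resources from an HTML page."""

    def __init__(self):
        super().__init__()
        self.addresses = []  # addresses found in <a href="mailto:...">
        self.embedded = []   # <object data> / <img src> URLs still to fetch

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        href = attr.get("href") or ""
        if tag == "a" and href.lower().startswith("mailto:"):
            self.addresses.append(href[len("mailto:"):])
        if tag == "object" and attr.get("data"):
            self.embedded.append(attr["data"])
        if tag == "img" and attr.get("src"):
            self.embedded.append(attr["src"])


# Made-up page: one entity-obfuscated mailto link plus an embedded SVG.
page = (
    '<a href="mailto:&#112;lain@mydomain.tld">mail</a>'
    '<object data="email.svg" type="image/svg+xml"></object>'
)

collector = MailtoCollector()
collector.feed(page)
collector.close()
print(collector.addresses)
print(collector.embedded)
```

It is clearly more code than a regex, but the parser decodes character references in attribute values for free and tells the crawler which embedded files to look at next.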

readmemyrights · on May 13, 2024

Funny I'm seeing this now, I've finally made the first tentative steps into making a website, and noticed that pandoc has an --email-obfuscation option and the whole topic was on my mind. I don't remember the last time I received an actual spam email (not counting desperate marketers trying to remind me of that one website I tried ages ago). Funnily enough, the new frontier seems to be WhatsApp and SMS of all things. A month or two back I got a job offer from an Indonesian phone number on WhatsApp, and then something similar directly to my SMS. I didn't publish my phone number anywhere online; the closest thing to making it public was joining my college's WhatsApp group and giving my phone number to a bank for a student credit card, and honestly I wouldn't put leaking them to some spam agency beyond either.

I'm using VoiceOver on macOS with Chromium and I have the same experience as the NVDA user, although if I interact with the "link" I'll eventually find the email. If I wasn't aware of the obfuscation, however, I probably would just think the webpage was weird, saying "this is an email" but actually giving a mailto: link. In general, if you're doing something special to improve accessibility then odds are you're doing it wrong, and if it's anything web related the odds are at least 90%. Most accessibility issues on the internet are developers trying to be smart by using ARIA labels or such, which usually just makes it worse. The example I have to deal with most often is the manpages on man.openbsd.org. All of their cross references to other manpages say something like "openssl, section 1" instead of "openssl(1)", which is what's displayed on the screen and what the browser's find command sees while searching.

For completeness, I also tried the page with various terminal browsers, specifically lynx, felinks, w3m, and edbrowse. None, and I mean NONE of them could display the svg properly, they couldn't even recognise it as an image.

niutech · on May 13, 2024

This requires loading an external SVG file, better use an inline version:

    <object data="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%20200%2024%22%3E%3Ca%20href%3D%22mailto%3Amyemail%40mydomain.tld%22%3E%3Ctext%20x%3D%2250%25%22%20y%3D%2250%25%22%20dominant-baseline%3D%22middle%22%20text-anchor%3D%22middle%22%3Emyemail%40mydomain.tld%3C%2Ftext%3E%3C%2Fa%3E%3C%2Fsvg%3E" type="image/svg+xml"></object>
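
A rough Python sketch of how such a data URI could be built, and how little it takes to undo. The SVG markup mirrors the snippet above; using `urllib.parse.quote` for the encoding step is an assumption, not necessarily how the original was produced.

```python
import re
from urllib.parse import quote, unquote

# Placeholder address, as in the snippet above.
address = "myemail@mydomain.tld"

svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 24">'
    f'<a href="mailto:{address}">'
    '<text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle">'
    f"{address}</text></a></svg>"
)

# Percent-encode everything except unreserved characters.
data_uri = "data:image/svg+xml," + quote(svg, safe="")

# The literal address no longer appears in the page source ("@" becomes %40)...
print(address in data_uri)

# ...but one unquote() call and a regex bring it back.
recovered = re.findall(r'mailto:([^"]+)', unquote(data_uri))
print(recovered)
```

So the inline form saves a request, but it only hides the address from scrapers that never percent-decode attribute values.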

Also have a look at this: https://spencermortensen.com/articles/email-obfuscation/

franky47 · on May 13, 2024

Ironically, the only spam I receive these days comes from the address I used here for the "Who wants to be hired" threads.

lesquivemeau · on May 13, 2024

More ironically, some of them sell anti-spam SaaS. I recently received the following email:

  > On May 2, 2024 at 12:13, Geoffrey Callaghan <irishgeoff@yahoo.com>
  >
  > Hello there,
  >
  > Please don't shoot the messenger ( that's me don't shoot :)
  >
  > But you should not post your email address like that in its raw email format on the hacker news 
  >
  > You should use a tool like https://veilmail.io to hide your email address from spam bots :)
  > 
  > You can always go back and change your hacker news post with a veilmail address.
  > 
  > Have a nice day
  > 
  > Geoff

Spread the disease, sell the cure

CM30 · on May 13, 2024

I think the main thing people forget with stuff like this is that yes, all these setups are possible (or even trivial) to bypass, but you're not really dealing with a dedicated adversary that's targeting you in particular.

Spammers probably aren't going to update their tools to take into account every possible way every site obfuscates their email addresses, so the main trick to dealing with them would be to do something other sites/services don't. If you or your company become successful enough that people are actually targeting you in particular, then congrats, you're probably in a good place anyway.

cmiller1 · on May 13, 2024

> Spammers probably aren't going to update their tools to take into account every possible way every site obfuscates their email addresses

But this is also sort of a security through obscurity approach, if enough people adopt one of these methods of obfuscation then the spammers absolutely will change their tools.

portaouflop · on May 13, 2024

Maybe I’m too stupid but I don’t get why you would want to do this at all. Had my email in plaintext on the website for ages and never had an issue with spam…

sircastor · on May 13, 2024

I think I get more unsolicited email from related businesses trying to get a foot in the door with my company - I assume they're connecting dots either from LinkedIn or Github (probably both). This is an interesting solution to the problem, but I don't genuinely think that anyone is scraping websites for email addresses anymore. I don't think it's cost effective for the modern spammer.

cantSpellSober · on May 13, 2024

Can't be copied and pasted.

It's your domain, why not just have "contact@example.com" for incoming mail instead?

(Novel approach, thanks for sharing!)

SahAssar · on May 13, 2024

If concealing it in an object tag works then you could just have the object tag show it as plain text or HTML, right? Not sure why it's an SVG.

juped · on May 13, 2024

probably because the scraper has "that's an image, skip it" logic

_blk · on May 13, 2024

Seems like a great solution but I'd like to embed the data directly rather than linking an external file. Then one issue I see is that dumb scrapers just look for the email address (also in the embedded SVG, which they might not for external <object> or <img> files.) But for direct embeds, if the string is not otherwise encoded, that could potentially leak the email address.

While this obviously (re)introduces JS into the mix, how would a simple compressed string fare against base64 svg embedding?

```
const compressedBase64Svg = '...';

// decompress() is left undefined, as in the original sketch;
// any string decompressor would fill that role.
function decompressAndInsertSVG(encodedData) {
  const decodedData = atob(encodedData);
  const decompressedSvg = decompress(decodedData);
  const svgContainer = document.getElementById('svgContainer');
  svgContainer.innerHTML = decompressedSvg;
}

decompressAndInsertSVG(compressedBase64Svg);
```
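
One way to picture the build side of this, as a hedged Python sketch: zlib is an assumed stand-in for the unspecified compression step, and the SVG string is a placeholder. It shows that the plain address disappears from the embedded string, and that anyone who knows the scheme can reverse it just as cheaply.

```python
import base64
import zlib

# Placeholder SVG markup with a dummy address.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<a href="mailto:myemail@mydomain.tld">'
    '<text y="16">myemail@mydomain.tld</text></a></svg>'
)

# Build step: compress, then base64-encode for embedding in a script.
encoded = base64.b64encode(zlib.compress(svg.encode("utf-8"))).decode("ascii")

# The plain address is not present in the embedded string.
print("myemail@mydomain.tld" in encoded)

# A scraper that knows the scheme reverses it in one line.
decoded = zlib.decompress(base64.b64decode(encoded)).decode("utf-8")
print(decoded == svg)
```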

dannyobrien · on May 13, 2024

I would like to push back on the idea that you should obfuscate your email address at all.

My email address is danny@spesh.com. I get a lot of spam -- possibly, since I have been distributing that address deliberately on the web and inadvertently in hacked datadumps, a near maximum amount of spam.

But the benefits of having people easily find a way to contact me directly has for me far outweighed the (largely solved) challenge of discarding automated spam.

Publish your email address! It's okay! Very little bad will happen, and people will be able to contact you without going through some strange social media intermediary!

SushiHippie · on May 13, 2024

> in hacked datadumps

https://haveibeenpwned.com/

45 data breaches and 7 pastes

Wow, I don't know if I've ever seen a real address in so many breaches haha

parasti · on May 13, 2024

This is appropriate advice for the average HN reader. For everyone else, probably not. I've seen first hand otherwise intelligent people being unable to discern an obvious (to me) online scam from a legitimate business. These are the people spammers are targeting. These are the people that need to obfuscate their email address.

richrichardsson · on May 13, 2024

Even sophisticated users can slip up in the right circumstances.

Personal anecdote: one morning, whilst still quite sleepy, I received a very well crafted Namecheap phishing expedition. I half knew the product they were claiming had lapsed was actually fine, but I had just recently renewed so I thought perhaps there had been a problem I missed, and it was convincing enough that I clicked the link before doing the normal sanity checks. Thankfully the address it went to didn't resolve. Hopefully I would have noticed the obviously incorrect URL before I entered any details, and I have 2FA enabled, but still, I should and do know better; it was just perfect timing for a well crafted attack...

_joel · on May 13, 2024

So you're saying the same people unable to discern a spam email know how to embed a mailto: link in an XML document and write webpages. Ok.

parasti · on May 13, 2024

Never said that. I'm a web developer. People ask me to add their emails to web pages. Comment quality on here seems to have taken a dive.

mediumsmart · on May 13, 2024

this works if you write it into the html on fullmoon tuesdays :

<a href="mailto:some.dude@the.otherdudes.site">some.dude@the.otherdudes.site</a>

kevin_thibedeau · on May 13, 2024

That works for humans. There's no reason to believe bots aren't handling entity parsing.
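
To illustrate how little work entity decoding is for a bot, here is a small Python sketch. Encoding every character as a decimal character reference is an assumed variant of the technique, since the thread does not show the encoded form.

```python
import html

address = "some.dude@the.otherdudes.site"

# Encode every character as a decimal character reference, e.g. "s" -> "&#115;".
encoded = "".join(f"&#{ord(ch)};" for ch in address)

# The raw source no longer contains the literal address...
print(address in encoded)

# ...but the standard library decodes it in a single call.
print(html.unescape(encoded))
```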

robszumski · on May 13, 2024

In my experience they haven't been in the past, but LLMs change the game by doing it by default.

rishikeshs · on May 13, 2024

how does this work

helsinkiandrew · on May 13, 2024

Don't modern spam filters filter out most mails received this way, and don't most spammers purchase lists for specific targeted domains (house owners, porn users, dentists etc.) rather than blindly scraping the web?

hhsectech · on May 13, 2024

Interesting idea...but could a crawler not just incorporate some AI like LLava2 or convert the SVG to a JPG and use OCR to get the email addresses out?

It just seems like this adds a couple of steps to existing crawler scripts.

karol · on May 13, 2024

Spam filters work in 2024.

Does the fact that someone independently discovers Gauss's method to sum up all the numbers 1...100 today make it worth sharing?

My point is that this is a primitive and easy to break workaround and better methods exist.

CodeWriter23 · on May 13, 2024

Seems kind of easy to defeat: just read the SVG to extract the email address from the mailto: link contained therein. Bonus: the harvesting bots will now download all SVG files going forward.

saint-loup · on May 13, 2024

At that point, isn't adding a good old contact form a simpler solution? You can link it with your email address or other channels. It can even work with static websites; I hooked up mine with Nextcloud Forms.

I appreciate the hacker creativity on display here, but as others said, obfuscating an email address raises accessibility issues. Hiding content from some programs and not others (spam bots vs assistive technologies) seems inherently a losing game, for you or for users.

xyst · on May 13, 2024

Kind of neat but I would rather just have a “throwaway” email if I was sharing globally.

In my case, I set up an email alias with a Sieve rule (if email is sent to the alias, move it to the “public inquiry” folder). Before that rule runs, SpamAssassin takes care of the non-technical folks who couldn’t be bothered to run their spam campaign through SpamAssassin testers, or who wouldn’t know how to set up their domain for sending email (SPF, DKIM, DMARC, …)

zaxomi · on May 13, 2024

Cool.

1 hour later.

Spam-scraper updated to support this.

mrbluecoat · on May 13, 2024

Exactly

brap · on May 13, 2024

I’ve been using the same gmail address for like 20 years.

I don’t think I got a single spam email in the last 5-10 years.

SMS, on the other hand…

rvnx · on May 13, 2024

A couple of modern spammers send you spam from Gmail and say “I included my colleague in CC please hit ‘reply all’ if you are interested”

nloomans · on May 13, 2024

I tested the example using the TalkBack screenreader on Android. With Firefox I was able to select and click on the link, but it did not announce the email address. With Chromium it completely ignored the existence of the SVG email. I was unable to select it and it was like the email wasn't there at all.

So yeah, I wouldn't call this accessible.

cyptus · on May 13, 2024

there is a quite big stackoverflow discussion about ideas how to protect your email on your website: https://stackoverflow.com/q/163628/1216595

zigzag312 · on May 13, 2024

Sadly stackoverflow closed the discussion. Even though discussion is both interesting and valuable.

seanvelasco · on May 13, 2024

i bought a premium .app domain a few months ago. not published on any websites yet. no history of previous owners. the only public fact is that it's listed as a premium domain on registrars.

first emails I received after the gmail welcome email were b2b sales from construction companies (i'm not in this field), shopify optimizations (i don't run one), agencies suggesting how i improve the ui/ux of my site (no website yet).

thankfully, they're all in the spam folder. i'm using google workspace.

i believe these spammers get their leads on newly-registered domains. so, how do we protect ourselves from that?

hu3 · on May 13, 2024

I believe the only effective protection against these fresh domain spammers is what you did: use some pretty good anti-spam mechanism such as Gmail.

butz · on May 13, 2024

I assume that nowadays emails are pulled directly from hacked mailbox contacts list. Nobody has the time to go through each individual website and collect emails one by one.

Closi · on May 13, 2024

I assume that emails are pulled from every method available.

Tagbert · on May 13, 2024

Nobody. Web crawler bots.

nojs · on May 13, 2024

This is a cool trick. The email is in cleartext in the source, meaning mailto works and copy-paste works. But most scrapers probably skip the .svg file.

pdonis · on May 13, 2024

> most scrapers probably skip the .svg file

But they won't as soon as they realize it's just easy to parse text that contains data they're looking for.

perilunar · on May 14, 2024

If you have an email address in the HTML of a page served by Cloudflare, they will obfuscate it and add their own decoder script.
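
The thread does not describe the scheme Cloudflare actually uses, so the following Python sketch is only a hypothetical example of the general kind of reversible encoding a decoder script could undo: a key byte followed by the address bytes XORed with that key, written as hex. The function names are invented for illustration.

```python
def obfuscate(address, key):
    """Hex-encode the key byte followed by each address byte XORed with it."""
    data = address.encode("utf-8")
    return bytes([key] + [b ^ key for b in data]).hex()


def deobfuscate(encoded):
    """Reverse obfuscate(): the first byte is the key for the rest."""
    raw = bytes.fromhex(encoded)
    key, payload = raw[0], raw[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")


# Hypothetical address and key.
token = obfuscate("contact@example.com", key=0x5A)
print(token)
print(deobfuscate(token))
```

Any scheme of this shape keeps the literal address out of the HTML, but the decoder has to ship with the page, so a scraper can run the same steps.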

ceving · on May 13, 2024

It does not work if you change the font-size.

FrostKiwi · on May 14, 2024

You are not referring to page zoom or dpi I presume, which works.

I added `style="height: 2em"` to the `<object>` to fit it within my use-case. Should be able to adapt to current font size that way I think.

robbyiq999 · on May 13, 2024

How about posting 2 email addresses, a hidden one, and the actual one. Using the hidden one to filter the actual one
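
A minimal sketch of that filtering rule in Python, assuming messages are simple dicts with `from` and `to` fields. Both addresses and the `filter_inbox` helper are hypothetical.

```python
# Hypothetical addresses: one hidden in the page markup as a trap,
# one shown to human visitors.
TRAP_ADDRESS = "trap@example.com"
REAL_ADDRESS = "hello@example.com"


def filter_inbox(messages):
    """Drop mail to the real address from any sender that also hit the trap."""
    flagged = {m["from"] for m in messages if m["to"] == TRAP_ADDRESS}
    return [
        m for m in messages
        if m["to"] == REAL_ADDRESS and m["from"] not in flagged
    ]


messages = [
    {"from": "bot@spam.example", "to": TRAP_ADDRESS, "subject": "Offer"},
    {"from": "bot@spam.example", "to": REAL_ADDRESS, "subject": "Offer"},
    {"from": "alice@example.org", "to": REAL_ADDRESS, "subject": "Hi"},
]

for m in filter_inbox(messages):
    print(m["from"], m["subject"])
```

Only a scraper would ever see the hidden address, so any sender that writes to it can be treated as automated.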

JohnFen · on May 13, 2024

This has been my approach since the mid '90s. It works very well.

geuis · on May 13, 2024

Why? What's the point?

All you're doing is making it slightly more difficult for the people that want to contact you to do so.

OCR has been a thing for years.

Just put your email out there. That's what spam filters are for.

charles@geuis.com. There. Scrape it. Spam it. I don't care.

Edit:

Yes, thank you for signing me up for the DNC (already a member), some random Trump org, something about Scientology, and another random christian-based website. Honestly, I'm kind of sad at the lack of originality given the otherwise extremely ingenious community we have here.

Maxatar · on May 13, 2024

But you just proved the point. You might not care to be signed up for some random Trump org, Scientology, or whatever, but other people do care and if you want to author a website that responsibly uses people's emails without subjecting them to unnecessary spam, then it's worth taking these techniques (not necessarily this specific one) into consideration.

While OCR does exist it's incredibly expensive compared to text scraping. The main way to combat spam is to make the cost of spamming more expensive than the benefit.

tamiral · on May 14, 2024

This is a cool way of doing things, I'll try it on my blog ! thanks!

dartos · on May 13, 2024

Idk LLM powered scraping can pull the email out of this without any issue

stkdump · on May 13, 2024

It even uses the exact same syntax as in html, so as long as svg content isn't specifically excluded, normal web scraping would just work without modification.

judge2020 · on May 13, 2024

Perhaps, but I think OCR is more likely.

ChrisMarshallNY · on May 13, 2024

That's a pretty cool trick.

I was not aware that we could embed CSS in SVG.

emayljames · on May 13, 2024

a much easier way is to convert the email address into HTML entities. It then displays and can be copied, but the actual source code doesn't have the email address.

replete · on May 13, 2024

<a href="{rewritten by js}">domain.com</a> a::before { content: "username@" }

iforgotmysocks · on May 13, 2024

I just have a simple contact page that sends message to discord webhook

kindawinda · on May 13, 2024

google might start indexing your email