
I've seen the same here in Germany, but the results only appear if you restrict the search to the last 24 hours. The German content looks like it was generated by GPT-2 or GPT-3; it makes no real sense when you read it. If you visit the page, you are immediately redirected to a scam, just as the article describes. Interestingly, they use ".it" domains here. It also looks like the domains were either hacked or are expired domains that were bought up.

For example, if you check havfruen4220.dk on archive.org, you can see that it appears to have been a legitimate business website before. https://web.archive.org/web/20181126203158/https://havfruen4...
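
For anyone who wants to repeat the archive.org check on other domains, here is a rough sketch. It assumes the usual Wayback Machine snapshot layout of web.archive.org/web/<14-digit timestamp>/<original URL>, and it treats the timestamp as UTC. The helper names and the example.com target are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed snapshot URL layout: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def snapshot_time(timestamp: str) -> datetime:
    """Turn a 14-digit Wayback timestamp into a datetime (assumed UTC)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def snapshot_url(timestamp: str, target: str) -> str:
    """Build a snapshot URL for a target page at a given timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{target}"


def split_snapshot_url(url: str) -> tuple[datetime, str]:
    """Split a snapshot URL into (capture time, archived URL)."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback snapshot URL: {url!r}")
    return snapshot_time(match.group(1)), match.group(2)


if __name__ == "__main__":
    # The timestamp from the link above; the target is a placeholder.
    print(snapshot_time("20181126203158").isoformat())
    print(snapshot_url("20181126203158", "https://example.com/"))
```

Comparing the capture time of an old snapshot with the date the spam pages first showed up gives a quick hint about whether a domain changed hands in between.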

How do they rank so well?

I've checked the domain on Ahrefs and it has almost no backlinks. But if you look closely, you will see that all the results that rank very well were added very recently. In the screenshots in the article you can see things like "for 2 timer siden", which means "2 hours ago". It looks like Google is ranking pages with a very recent publishing date higher.

Edit: Here is what the content of such a site looks like: https://webcache.googleusercontent.com/search?q=cache:Bk0VsM...



Typically Google has a warming/trial period for new large content sites, after their search bot is introduced to the content and has spidered its way through the site.

For example, there used to be a very common content farm system that was structured like this:

https://domainsites.com/site/nytimes.com

So when people searched for sites by domain name, the zillions of low traffic long-tail results of this farm system would be all over Google's results.

What it would present on the page is a mess of data about nytimes.com, such as traffic, or keywords pulled from the site header, maybe a manufactured description (or pulled right from the site head), sometimes images / screenshots of the site. Anything that could be stuffed in there to fill up enough content to get Google to not do an automatic shallow content kill penalty on the content farm. This worked for several years very successfully until Google's big algorithm updates, 9-10 years ago or whatever now (Penguin et al.). You could just build a large index of the top million domains (eg Alexa and Quantcast used to provide that index in a zip file), spider & scrape info from the domains, and build a content farm index out of it and have a million pages of content to then hand off to Googlebot.

So initially such a farm would boom in the search rankings: Google would give it a trial period and open the floodgates of traffic to the site. Then Google would promptly kill off the content farm once the free run had expired and it had figured out that it was a garbage site.

I still occasionally see this model of content farm burst up into traffic rankings, and it's usually very short lived. It makes me wonder if that's not more or less what's going on with the Mermaid farm.


This definitely looks like an expired domain that was bought. Havfruen seems to be a restaurant in the city of Korsør, which conveniently has the postal code 4220.


.it pages are used in Norway too, but I'm not sure it's something GPT-ish that's being used. Whole sentences are copied word for word from other articles. (Might it be trained on a small dataset?)
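
One crude way to check the "copied word for word" impression is to measure how many sentences of a suspect page also occur verbatim in known articles. This is only a sketch under simple assumptions: sentences are split on ., ! and ?, matching ignores case and extra whitespace, and very short sentences are skipped. The function names and sample texts are invented for illustration.

```python
import re


def sentences(text: str, min_words: int = 4) -> set[str]:
    """Split text into normalized sentences, skipping very short ones."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return {" ".join(p.lower().split()) for p in parts if len(p.split()) >= min_words}


def copied_fraction(candidate: str, sources: list[str]) -> float:
    """Fraction of the candidate's sentences found verbatim in any source."""
    cand = sentences(candidate)
    if not cand:
        return 0.0
    known: set[str] = set()
    for source in sources:
        known |= sentences(source)
    return len(cand & known) / len(cand)


if __name__ == "__main__":
    suspect = "Heat pumps save energy in cold climates. Buy cheap pills now today friends."
    articles = ["Heat pumps save energy in cold climates. They are popular in Norway."]
    print(copied_fraction(suspect, articles))
```

A high fraction would point towards scraping and stitching rather than a language model writing fresh text, though a model trained on a tiny dataset could also reproduce whole sentences.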

It could of course be that it's something similar to GPT that is trained on all the content it could find and then writes articles, because it's clearly messing up sometimes, judging from the small piece of content available on the search results page.

I'm not sure if this is an ML race, and the reason we're not seeing the same thing in English is that Google might understand English better than the spammers do, while in Norwegian and German it's the other way around?

Clearly freshness is a large part of it. Google seems to have indexed millions upon millions of pages tied to this in the last 24 hours.


This doesn't seem to be a new thing. Here is a warning tweet from the beginning of July from Danish cybersecurity guy @peterkruse, who saw his name coming up for a different domain owned by the same registrant as havfruen4220.dk

https://twitter.com/peterkruse/status/1410895961803665410


I presume "GPL" was an autocorrect from the intended "GPT", right?


Correct, it was a typo


I don't know. I've tried reading the GPL2 & 3, and a lot of it just sounds like lawyer gibberish to me that could easily be attributed to GPT


Interesting, I've been seeing the same spam for Norwegian searches, but with the domain nem-multiservice dot dk, or nem-varmepumper dot dk - presumably another legitimate business's domain that expired and was grabbed by the scammers. Visiting those domains shows the same graphic as shown in the article.

Almost any search in Norwegian will have obvious scam sites like these in the top 10 results.

Other domains that are part of the same scam and show up in my results today: mariesofie dot dk, bvosvejsogmontage dot dk

I wonder if it is related to this: https://www.dk-hostmaster.dk/en/news/dk-hostmaster-takes-102...


Yup. Those domains are the same thing, and redirect to the same thing. There are even more domains.

Never seen anything on this scale before. I can search for basically anything (tax rules, baking, stocks, property, hygiene...) and Google will most likely show those domains somewhere.


I had similar experiences with: https://www.xspdf.com/resolution/51859292.html

The content seems taken from other websites and mixed in a nonsensical way. It comes up frequently in my search results. www.xspdf.com has completely unrelated content and seems a separate business.



