The article mentions one rationale for stripping diacritics and I won't deny there are others. That being said:
Stripping diacritics from text will annoy people whose language uses those diacritics. For us the difference between an e, an è and an é is significant. A ü is something entirely different from a u. Calling me Muller when my name is Müller is like calling someone Jan when his name is Jon.
Just as the article says: "But then again, removing diacritics is already linguistically nonsensical. Nonsensical operation is nonsensical."
It's kind of strange how a huge number of monolingual anglophones don't seem to grasp this at all. Witness nearly every English newspaper talking about German football players who are apparently called Muller, Ozil, etc. The ü and ö in those names are actually different vowels from u and o, not just decoration.
Mispronunciations are understandable, but getting the spelling right is as simple as copy/paste.
The goal of the exercise is to use the stripped text to check for spam. That doesn't mean that the stripped text is what ends up in the user's inbox. The idea is to see if the text contains things like VᎥÄgԻa, not whether a given word means "country" or "father". This would only be a problem if some set of characters that looked like "viagra" was actually a valid non-spammy word in some language.
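For what it's worth, here is a rough sketch of that idea in Python. It is not the article's code: the `LOOKALIKES` table and the `skeleton` function are made up for illustration, and the table only covers the two non-Latin characters in the example above. A real filter would need a much larger table, for instance one built from the Unicode confusables data.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table (an assumption for this
# sketch): only the two non-Latin characters from the example above.
LOOKALIKES = {
    "Ꭵ": "i",  # resembles a Latin i
    "Ի": "r",  # resembles a Latin r
}

def skeleton(text):
    """Return a stripped, lowercased form of text, used only for matching."""
    # 1. Replace known lookalikes first, before any case conversion can
    #    change them into characters the table does not list.
    mapped = "".join(LOOKALIKES.get(ch, ch) for ch in text)
    # 2. Decompose, so that e.g. "Ä" becomes "A" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", mapped)
    # 3. Drop the combining marks and lowercase what is left.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

def looks_like_spam(text, blocklist=("viagra",)):
    # The original text is left untouched; only the skeleton is inspected.
    return any(word in skeleton(text) for word in blocklist)
```

With this, `skeleton("VᎥÄgԻa")` comes out as `"viagra"`, and `skeleton("Müller")` as `"muller"`. The second result is exactly why the skeleton should stay inside the filter and never be shown to anyone.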
There was also an article here a few weeks back about Russian government officials securing fat contracts for their friends in private industry by intentionally replacing one or two letters in the common search terms of their bid requests with Latin lookalike characters. This was impossible to detect while reading the bid request, but it also made it impossible for other contractors to find the bid request through the website. As a result only their buddy would submit a bid, and at a much higher than market rate. Something similar to this, but converting down to Cyrillic characters instead of ASCII, could be used to check for hinky bid requests on upload.
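A minimal sketch of what such an upload check might look like, assuming a small hand-written table of Latin letters that have Cyrillic lookalikes. The names `LATIN_TO_CYRILLIC`, `fold_to_cyrillic` and `suspicious_words` are invented for this example, and the table is far from complete (lowercase only, seven letters).

```python
# Hypothetical table (an assumption for this sketch): lowercase Latin
# letters mapped to the Cyrillic letters they resemble.
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
}

def is_cyrillic(ch):
    # Basic Cyrillic block only; good enough for a sketch.
    return "\u0400" <= ch <= "\u04ff"

def fold_to_cyrillic(word):
    """Replace Latin lookalikes, but only in words that contain Cyrillic."""
    if not any(is_cyrillic(ch) for ch in word):
        return word  # purely non-Cyrillic words are left alone
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in word)

def suspicious_words(text):
    # A word that changes when folded mixed Latin lookalikes into Cyrillic.
    return [w for w in text.split() if fold_to_cyrillic(w) != w]
```

Any word returned by `suspicious_words` could be flagged for review on upload, and the folded form could be what the site's search indexes, so that a doctored term still turns up for other contractors.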