The second example in each language sample, where the generated sqid ends up being “B4aajs”, essentially reads as “P0ooop” to a Swedish speaker.
Which is fine; they don’t propose to filter “bad” words in other languages. But it’s kind of funny when that’s one of the highlighted examples, right next to the stated goal of filtering such words. It goes to show how hard it is to filter profanity for international audiences in general.
Couldn’t you just remove vowels and catch 99.9% of profanity in all languages? Ditto for their 0-9 lookalikes, if you’re really worried about it. There’s quick out-of-the-box support for that, since you can define a custom alphabet.
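For what it’s worth, here’s a minimal sketch of that idea using the Python sqids package, which documents a custom `alphabet` constructor argument. The specific alphabet below (and the sample output in the comment) is just my illustration, not anything from the project itself:

```python
# Sketch: a Sqids alphabet with vowels and their digit lookalikes
# (0 -> O, 1 -> I/l, 3 -> E, 4 -> A) stripped out, so generated IDs
# can't spell most words explicitly.
from sqids import Sqids

NO_VOWELS = "bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ256789"

sqids = Sqids(alphabet=NO_VOWELS, min_length=6)

token = sqids.encode([42])      # hypothetical output, e.g. "qTkKbP"
numbers = sqids.decode(token)   # -> [42]
print(token, numbers)
```

Whether this actually catches 99.9% of profanity across languages is the open question, of course.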
Well, until we figure out a way to remove pattern matching from humans... use GUIDs if that’s an issue. Removing vowels fixes “spelling almost all bad words explicitly”, though I’m open to being proven wrong with fun new swears in exotic (to me) languages :)
The problem of “pick any N symbols that don’t make any profanity in any language across all time” isn’t what this is solving, nor should it have to. Take the same concept but use whitelisted words to build the token if you’re that averse to computer-generated, fill-in-the-blank naughty words. Keep “pen” and “island”, among other things, off that list ;)
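Something like this toy sketch: encode the number in base len(WORDS) and spell each “digit” as a curated word. The word list and sample output here are made up for illustration; a real list would be longer, vetted, and joined with a separator so adjacent words can’t concatenate into anything unfortunate (the pen + island problem).

```python
# Sketch: build tokens from a whitelist of words instead of raw characters.
import secrets

WORDS = ["maple", "river", "stone", "cloud", "amber", "cedar", "ridge", "frost"]

def encode(n: int) -> str:
    # Represent n in base len(WORDS), one word per digit.
    if n == 0:
        return WORDS[0]
    parts = []
    while n > 0:
        n, digit = divmod(n, len(WORDS))
        parts.append(WORDS[digit])
    return "-".join(reversed(parts))

def decode(token: str) -> int:
    n = 0
    for word in token.split("-"):
        n = n * len(WORDS) + WORDS.index(word)
    return n

token = encode(secrets.randbelow(len(WORDS) ** 4))  # e.g. "cedar-maple-frost-river"
assert decode(token) == decode(encode(decode(token)))  # round-trips
print(token)
```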