
I like the idea, though I use nanoid with the safe letter dictionary (it excludes letters used for profanity[0])

They should use a similar dictionary approach IMO because I looked at the implementation and it’s hardcoded to look for “bad” words

Otherwise looks real straightforward! I’d love to see some performance test suites for it

[0]: https://github.com/sqids/sqids-javascript/blob/ebca95e114932...

[1]: Though with UUID v4 being so common and so well optimized in most languages, I wonder if these userland solutions are really better. You can always generate a UUID and re-encode it with base32 or base64, which is also well optimized in most languages
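For illustration, the UUID-then-base32 idea could look something like the sketch below. It uses only Python's standard `uuid` and `base64` modules; the function names `encode_uuid`, `decode_uuid`, and `short_uuid` are made up for this example and don't come from any of the libraries discussed here.

```python
import base64
import uuid


def encode_uuid(value: uuid.UUID) -> str:
    """Encode a UUID's 16 bytes as a 26-character lowercase base32 string."""
    # b32encode pads 16 bytes out to 32 characters with '='; drop the padding.
    return base64.b32encode(value.bytes).decode("ascii").rstrip("=").lower()


def decode_uuid(text: str) -> uuid.UUID:
    """Reverse encode_uuid: restore the padding and rebuild the UUID."""
    padded = text.upper() + "=" * (-len(text) % 8)
    return uuid.UUID(bytes=base64.b32decode(padded))


def short_uuid() -> str:
    """Generate a fresh UUID v4 and return its base32 form."""
    return encode_uuid(uuid.uuid4())
```

Note that the standard base32 alphabet (A-Z, 2-7) still contains vowels, so this gives shorter IDs but does nothing by itself about accidental words.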



> it excludes letters used for profanity

That doesn't seem possible. How would that work?

> I looked at the implementation and it’s hardcoded to look for “bad” words.

If you mean https://github.com/y-gagar1n/nanoid-good, that seems to be doing the same thing.

In general, I'm a bit wary of solutions that "guarantee no bad words" – this is usually highly language-specific: One language's perfectly acceptable name is another language's swear word.


This is the implementation: https://github.com/CyberAP/nanoid-dictionary

We use it in a highly internationalized product spanning multiple languages and haven't yet run into a complaint, or a value on audit, that would constitute something offensive in any language, per our intl content teams anyway.

That isn’t to say it’s 100% (and simply enough we don’t audit every single URL) but I suspect we would have gotten at least a user heads up by now

Nevertheless, we are moving to UUIDs that get base32 encoded for some of our use cases. They're easier for us to work with in many scenarios


It's particularly funny because their example docs for .NET outputs "B4aajs", which to any Swedish l33t speaking individual, would read "Bajs", which means "shit"
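To show why this kind of output slips past a plain substring check, here is a rough sketch of a blocklist test that first undoes common l33t substitutions and collapses repeated letters. The digit mapping and the function names are assumptions for this example only, not how Sqids or any nanoid package actually does it.

```python
import re

# Assumed digit-to-letter mapping; real l33t speak has many more variants.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def normalize(text: str) -> str:
    """Lowercase, map l33t digits to letters, and collapse repeated characters."""
    lowered = text.lower().translate(LEET)
    return re.sub(r"(.)\1+", r"\1", lowered)


def looks_offensive(candidate: str, blocklist: set[str]) -> bool:
    """Return True if any normalized blocklist word occurs in the normalized ID."""
    norm = normalize(candidate)
    return any(normalize(word) in norm for word in blocklist)
```

With a blocklist containing "bajs", `looks_offensive("B4aajs", {"bajs"})` returns True, while a naive `"bajs" in "B4aajs".lower()` check would miss it. It still only catches words someone thought to put on the list.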


Somewhere there's a database for every bad word and every bad typo in every language and that one just got added.


Omit vowels and you're 90% of the way there; omit the vowel-looking digits 0,1,3,4 and you're probably >99% of the way there.
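As a minimal sketch of that idea, assuming plain ASCII letters and digits as the starting set (this is not nanoid's actual alphabet, just the rule as stated above):

```python
import secrets
import string

VOWELS = set("aeiouAEIOU")
VOWEL_LIKE_DIGITS = set("0134")

# ASCII letters and digits minus vowels and the vowel-looking digits.
SAFE_ALPHABET = "".join(
    ch
    for ch in string.ascii_letters + string.digits
    if ch not in VOWELS and ch not in VOWEL_LIKE_DIGITS
)


def random_id(length: int = 12) -> str:
    """Build an ID from cryptographically random picks out of SAFE_ALPHABET."""
    return "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))
```

That leaves 48 of the original 62 characters, so each character carries a bit less entropy and IDs need to be slightly longer for the same collision resistance.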


fxck


Which is, evidently, why nanoid also excludes x and X, as well as v and V (fvck).


fjck


> That doesn't seem possible. How would that work?

agree; b00b, DlCK, cntfcker

But I suppose, if the user doesn't get to craft the input, the chance of a converted numerical ID colliding with words like the above is small enough to ignore.


Besides vowels, nanoid excludes 0, 1, 3, 4, 5, I, l, x, X, v, V, and other lookalikes, so the chances of generating something naughty in any language are close to zero.


Humans have a high capacity for spotting rudeness. Nanoid’s nolookalikesSafe alphabet would allow blwjb69FKmyD7CK.

(Sorry)


Buy me drink first, jeez


Looks like the dictionaries used are from this file?

https://registry.npmjs.org/naughty-words/-/naughty-words-1.2...

From a quick look, the lists are pretty short, except for the English one, which at least has some 404 words, but I can imagine there are far more bad words that you'd want to avoid than just those?



Grepping out naughty words in randomly generated text definitely strictly weakens the information content if you're using it for a secure application but is often necessary.

In the early dotcom era the company I worked for was about to go live, and the final step was demoing the end-to-end flow to the CEO. I had done the back-end stuff and hadn't paid much attention to the front end. The person who did the account creation process wanted to nudge people toward memorable yet strongish passwords, so when it created your account it would generate a random password, which he did by choosing 2 four-letter words at random from the unix dictionary and putting a two-digit number between them. He ran that past me as an idea and I thought "yeah, good idea" and didn't think more of it.

However he forgot to first grep out all the naughty words so when we demoed it to the CEO/non-technical founder both of the words in his randomly generated password were swearwords.
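A rough reconstruction of that scheme with the missing filter step, plus the entropy cost of filtering, might look like this. The word list and blocklist are passed in by the caller and are purely hypothetical here; the original used the unix dictionary.

```python
import math
import secrets


def make_password(words: list[str], blocklist: set[str]) -> str:
    """Two random four-letter words with a two-digit number between them."""
    # The step that was forgotten: drop blocklisted words before choosing.
    allowed = [w for w in words if len(w) == 4 and w.lower() not in blocklist]
    if len(allowed) < 2:
        raise ValueError("not enough usable words")
    first = secrets.choice(allowed)
    second = secrets.choice(allowed)
    number = secrets.randbelow(100)  # 0..99, zero-padded to two digits below
    return f"{first}{number:02d}{second}"


def password_entropy_bits(word_count: int) -> float:
    """Entropy of the scheme for a given number of usable words."""
    return math.log2(word_count**2 * 100)
```

Every word removed shrinks `word_count`, so `password_entropy_bits` drops a little, which is the weakening mentioned above. For a dictionary with thousands of four-letter words, removing a few dozen costs only a fraction of a bit.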


I tried something similar with a fixed alphabet that guarantees no profanity and a checksum (Luhn)

https://github.com/tttp/dxid
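For readers unfamiliar with the checksum part, below is a sketch of the generic "Luhn mod N" algorithm over a custom alphabet. The alphabet is a made-up example and the function names are hypothetical; this is not taken from dxid and may differ from what it does.

```python
# Hypothetical 22-character alphabet: no vowels, no lookalike digits.
ALPHABET = "26789BCDFGHJKMNPQRSTWZ"
N = len(ALPHABET)
assert N % 2 == 0  # Luhn mod N only catches every single-character typo when N is even


def _luhn_sum(text: str, factor: int) -> int:
    """Weighted sum over text, right to left, alternating factors 1 and 2."""
    total = 0
    for ch in reversed(text):
        addend = factor * ALPHABET.index(ch)
        factor = 1 if factor == 2 else 2
        # Add the two base-N "digits" of the addend.
        total += addend // N + addend % N
    return total


def check_char(payload: str) -> str:
    """Compute the check character for a payload without one."""
    remainder = _luhn_sum(payload, factor=2) % N
    return ALPHABET[(N - remainder) % N]


def with_check(payload: str) -> str:
    """Append the check character to the payload."""
    return payload + check_char(payload)


def is_valid(code: str) -> bool:
    """Validate a code whose last character is the check character."""
    return len(code) >= 2 and _luhn_sum(code, factor=1) % N == 0
```

A single mistyped character changes the weighted sum, so `is_valid` rejects it; characters outside the alphabet raise `ValueError` from `ALPHABET.index`.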



