
> I believe this is also the Unicode recommendation when context doesn't determine a different algorithm to read.

Except that emoji universally count as two "characters", even those encoded as several code points. There's also the question of decomposed Korean jamo versus precomposed syllables.



Like this: “:)” ?

Japanese kana also count as two characters, which is roughly what they come to when romanized, on average. Korean isn't identical, but the information density is approximately the same. That's good enough to approximate it the same way and keep one consistent rule.
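
For anyone who wants to poke at this, here is a rough sketch of a "wide characters count as two" rule. To be clear, this is my assumption of what such a rule could look like, not any platform's actual algorithm: `weighted_length` is a made-up helper that weights each code point by its East Asian Width property from Python's `unicodedata` module. It does not group multi-code-point emoji sequences into one unit, so it illustrates the emoji and jamo objections rather than solving them.

```python
import unicodedata


def weighted_length(text: str) -> int:
    """Hypothetical rule: Wide/Fullwidth code points count 2, all others 1."""
    total = 0
    for ch in text:
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        total += 2 if wide else 1
    return total


# Two kana versus their four-letter romanization.
kana = "\u304B\u306A"  # hiragana "ka" + "na"
print(weighted_length(kana), weighted_length("kana"))

# One precomposed Hangul syllable versus the same syllable as conjoining jamo.
composed = "\uD55C"
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed))
print(weighted_length(composed), weighted_length(decomposed))

# An emoji ZWJ sequence: one visible glyph, several code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family), weighted_length(family))
```

The kana pair and its romanization come out the same under this rule, which is the "consistent rule" point. The composed and decomposed Hangul strings are canonically equivalent but have different code point counts, and the emoji sequence is counted piece by piece, so a real implementation would need to normalize first and segment into grapheme clusters, which the standard library does not do.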



