I had an IRC markov-like bot in Python. It used to be so simple in Python 2:
- the bytes come in from the IRC server
- they go in a string in the log, I can print this log to the console even if it has some IRC control codes (there will be a garbage character here or there but that's OK)
- if I want bold/color/etc. I can just throw \xwhatever in the string. If a word from the log had bold in it and gets repeated back, it still has the bold in it.
Then I reluctantly ported it to Python 3 and it was awful. Python now had to micromanage every single string to ensure I don't have any "naughty" bytes in it. Conversions to/from byte/string everywhere. Massive headache every time I wanted to read or write IRC encoded strings to a txt file or print them to the console or anything.
In Python 2 the encoding only lived outside the program, in my terminal, text editor, and IRC clients. Bytes went in, bytes went out, and everyone was happy. Python 3 decided it needed to know exactly what was going on every step of the way and didn't trust me anymore, forcing me to do an elaborate song and dance to get anything done.
I prefer my tools to have as little an opinion as possible. Let me open the door while I'm driving, let me run my web browser as root, and let me print \xDE\xAD\xBE\xEF to the console.
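For what it's worth, the bytes-in, bytes-out workflow described above can be approximated in Python 3. This is only a sketch, not the bot's actual code: it assumes the log is kept as `str` decoded with `latin-1`, which maps each of the 256 byte values to exactly one code point, so control codes and "naughty" bytes survive the round trip. The names `from_wire`, `to_wire`, `emit_raw` and the sample IRC line are made up for illustration.

```python
import io

BOLD = "\x02"  # IRC bold control code


def from_wire(raw: bytes) -> str:
    """Decode without ever failing: latin-1 maps every byte to one code point."""
    return raw.decode("latin-1")


def to_wire(text: str) -> bytes:
    """Inverse of from_wire; only valid for code points up to U+00FF."""
    return text.encode("latin-1")


def emit_raw(raw: bytes, out) -> None:
    """Write bytes untouched to a binary stream, e.g. sys.stdout.buffer."""
    out.write(raw)
    out.flush()


# Hypothetical line as it might arrive from the server.
line = b":nick!user@host PRIVMSG #chan :\x02hello\x02 \xde\xad\xbe\xef\r\n"

text = from_wire(line)        # ordinary str, safe to split, log and reuse
restored = to_wire(text)      # byte-for-byte identical to what came in
reply = to_wire(BOLD + "hi" + BOLD)

sink = io.BytesIO()           # stand-in for sys.stdout.buffer
emit_raw(restored, sink)
```

The catch is that `to_wire` raises `UnicodeEncodeError` as soon as the string contains anything above U+00FF, so this only works if all text enters the program through `from_wire`. Passing `sys.stdout.buffer` to `emit_raw` prints the raw bytes to the console, garbage characters included.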
> Python now had to micromanage every single string to ensure I don't have any "naughty" bytes in it.
Oh, that is annoying. Our use case just fit better into how Python 3 works. We went the opposite way: in Python 2 we micromanaged strings all over the place to ensure that encodings would always be correct. We needed Unicode but also connected to Windows systems, which use their own weird codepage system. Encoding and decoding were everywhere, along with encoding detection. Python 3 made all that go away.
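Roughly what that looks like in Python 3, as a sketch rather than our actual code: decode once at the boundary, keep `str` everywhere inside, and encode once on the way out. The codepage `cp1252` and the function names here are assumptions for illustration; the real systems may use a different codepage.

```python
def read_windows_text(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode bytes from a Windows system once, at the edge of the program."""
    return raw.decode(codepage)


def write_utf8(text: str) -> bytes:
    """Encode once on the way out; everything in between stays str."""
    return text.encode("utf-8")


# Sample bytes as a cp1252 system would send them: 0xE9 is e-acute, 0x80 is the euro sign.
raw = b"caf\xe9 \x80"

text = read_windows_text(raw)   # now plain Unicode text
out = write_utf8(text)          # same text, re-encoded as UTF-8
```

Since `str` and `bytes` no longer mix implicitly, forgetting one of these two steps fails immediately with a `TypeError` instead of producing mojibake somewhere downstream.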