The lack of arrays, dicts and local variables when trying to be POSIX compliant quickly becomes annoying when writing big programs, though. There are of course workarounds for those, but I wish we didn't have to use them.
What's interesting with this example of command1 | command2 is that some shells such as zsh will optimize the last member of the pipeline to be executed in the current process (nothing mandated by POSIX here), so effectively this works on zsh.
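A minimal sketch of the portable way around it, assuming the example in question is the classic `... | read var` case. In dash and in bash with default settings, `read` at the end of a pipeline runs in a subshell, so the variable is gone afterwards; feeding `read` from a here-document keeps it in the current process. The helper name `read_first_word` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first word of "$1".
# `read` is fed by a here-document instead of a pipeline, so it runs in the
# current shell process and its variables survive in every POSIX shell.
read_first_word() {
    # POSIX has no local variables: `word` and `rest` are globals.
    IFS=' ' read -r word rest <<EOF
$1
EOF
    printf '%s\n' "$word"
}

read_first_word "hello world"
```

With a pipeline instead (`printf 'hello world\n' | read -r word rest`), whether `$word` is still set afterwards depends on the shell, which is exactly the portability trap.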
> [[ ]] is a bash builtin, and is more powerful than [ ] or test.
Agreed on the powerful bit. However, [[ ]] is not a "builtin" (whereas [ and test are builtins in bash); it's a reserved word, more similar to if and while.
That's why [[ ]] can break some rules that builtins cannot, such as `[[ 1 = 1 && 2 = 2 ]]` (vs `[ 1 = 1 ] && [ 2 = 2 ]` or `[ 1 = 1 -a 2 = 2 ]`, -a being deprecated).
Builtins should be thought of as ordinary commands (like ls or xargs), since they cannot bypass the shell's fundamental parsing rules (assignment builtins being an exception). The main advantages of being a builtin are speed (no fork needed) and access to the current shell process environment (e.g. read being able to assign a variable in the current process).
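To illustrate the "ordinary command" point, here is a small sketch in plain POSIX sh. It only shows that the arguments of `[` go through word splitting like those of any other command; the variable names are arbitrary.

```shell
#!/bin/sh
# `[` is parsed like any other command: its arguments are word-split first.
value="a b"

# Unquoted: `[` receives the words a, b, =, "a b" and ], which is not a valid
# expression, so it fails with an error (silenced here).
if [ $value = "a b" ] 2>/dev/null; then
    unquoted="match"
else
    unquoted="no match"
fi

# Quoted: `[` receives "a b", =, "a b" and ], a well-formed comparison.
if [ "$value" = "a b" ]; then
    quoted="match"
else
    quoted="no match"
fi

printf 'unquoted: %s\nquoted: %s\n' "$unquoted" "$quoted"
```

In bash, `[[ $value = "a b" ]]` works without the quotes on the left side precisely because [[ is handled by the parser rather than run as a command.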
> Yeah, I know there's D3.js but that involved way more knowledge and learning than I was interested in.
Same, d3 looked very powerful but had a steep learning curve. I was looking for something simple to generate process trees in real time and ended up using Cytoscape.js [0]. It helped me get a working POC in an hour, highly recommended.
https://www.grymoire.com/Unix/Sed.html is a good resource to get started, though it's definitely not something you should just read, since sed code is pretty hard to follow. Have a terminal on the side and try it out live.
Once you understand the pattern and hold spaces, sed becomes a pretty fun esoteric language :)
> I would be curious if mksh is also faster. ksh93 also claims great performance increases, but it is no longer (well) maintained.
In mksh, `printf` is not a builtin, so a script that makes heavy use of it tends to be slower than in bash.
ksh is very interesting because it has the ability to avoid spawning new processes on most command substitutions, making it usually much faster than all other POSIX shells.
But again, this depends heavily on what the script being executed is doing. Here's an illustration of what I described:
$ time mksh -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
real 0m2.121s
user 0m1.321s
sys 0m0.875s
$ time ksh -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
real 0m0.013s
user 0m0.008s
sys 0m0.005s
$ time dash -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
real 0m0.311s
user 0m0.279s
sys 0m0.073s
$ time bash -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
real 0m0.664s
user 0m0.538s
sys 0m0.186s
Great post, also love what you are trying to do with C playground, this is awesome!
I've recently been trying to build something similar, visualizing forks/execve/read/write, but using the strace output of a binary, which is much less challenging.
Yes, most of the time respecting POSIX to the letter is not needed, but it is of course satisfying knowing that your script can run fine on BSDs and other less common distributions :)
Though sometimes you don't need to go that far to break stuff, for instance switching from Fedora to Ubuntu.
I've seen many scripts fail on Debian derivatives because people think using #!/bin/sh as a shebang is fine since it works on their computer, where sh was in fact a symlink to bash.
But on Debian-based distributions /bin/sh is often dash, not bash, and dash is basically the strict POSIX subset + local, so all the fancy stuff like [[ ]], &>, arrays, ... will fail.
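For what it's worth, a rough sketch of how two of those bashisms can be written so that they also run under dash. The function names `is_txt` and `quiet` are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: POSIX replacement for bash's `[[ $1 == *.txt ]]`.
# `case` does glob-style pattern matching in every POSIX shell.
is_txt() {
    case $1 in
        *.txt) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: POSIX replacement for bash's `cmd &> /dev/null`.
# Redirect stdout first, then point stderr at the same place.
quiet() {
    "$@" >/dev/null 2>&1
}

if is_txt "notes.txt"; then
    printf '%s\n' "notes.txt looks like a text file"
fi
```

Arrays have no such one-line equivalent, which is where the uglier workarounds start.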
Though this is less about long options here and more about general shell scripting.
The thing is, when you restrict yourself to only what POSIX specifies, you don't even have basic data structures: no arrays, no hash maps, not even local variables, and that often leads to code that relies on multiple hacks.
For instance: you can't even slice "$@", which is the only array-like construct specified by POSIX (excluding unsafe splitting), so you end up shifting and re-enqueuing with set -- "$@" "$1", which is unreadable.
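As a concrete sketch of that shift and re-enqueue dance, here is one way to drop the last positional parameter, roughly what `"${@:1:$#-1}"` does in bash. The function name `drop_last` and the counter `n` are my own choices, not anything standard.

```shell
#!/bin/sh
# Hypothetical helper: print every argument except the last, one per line.
drop_last() {
    n=$#   # no `local` in POSIX, so n is a global variable
    [ "$n" -gt 0 ] || return 0
    # Rotate the list n-1 times: append the first argument, then remove it
    # from the front. Afterwards the original last argument sits in front.
    while [ "$n" -gt 1 ]; do
        set -- "$@" "$1"
        shift
        n=$((n - 1))
    done
    shift   # discard the original last argument
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    fi
}

drop_last a "b c" d
```

The rotation keeps arguments containing spaces intact, which is the whole reason to go through "$@" instead of splitting a string.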
Want to read a single character? Impossible with the POSIX read, but you can work around it with dd.
etc.
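A minimal sketch of the dd trick, assuming input comes from a pipe or file. The helper name `read_char` is made up, and it reads one byte, so a multi-byte character would need more work.

```shell
#!/bin/sh
# Hypothetical helper: read exactly one byte from stdin and print it.
read_char() {
    # Copy 1 block of 1 byte; dd writes its transfer statistics to stderr,
    # so those are discarded.
    dd bs=1 count=1 2>/dev/null
}

char=$(printf 'yes' | read_char)
printf '%s\n' "$char"
```

On an interactive terminal you would additionally have to switch off line buffering with stty first, and command substitution strips a trailing newline, so reading a bare newline gives an empty string.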
All those workarounds to POSIX limitations are fascinating, but they force a lot of arcane constructs, that's for sure.
And of course this is usually the point where you ask yourself if you've chosen the correct language for the task, but that's another debate.
> so you end up shifting and re-enqueuing with set -- "$@" "$1", which is unreadable.
How is that unreadable? Set all the positional arguments as is, then set the first one again, into new positional arguments. I'm not sure why you want the first arg twice, but that's your call.
Either way, it's hardly unreadable.
> And of course this is usually the point where you ask yourself if you've chosen the correct language for the task, but that's another debate.
It isn't another debate though. It's all the same debate: do you know the language you're reading and/or writing? Do you know what it can do, what it can be pushed to do, and what it really shouldn't do?
Those are all related to the same basic point: if you think Shell is unreadable, I'd suggest it's because you don't know shell.
Remember readable means that you can read and understand it. Not that it's written out in words that someone on page 1 of "how to program for dummies" can understand.
> Remember readable means that you can read and understand it
Yes you are right, my usage of "readable" was probably wrong there.
Maybe the point I was trying to make was less about the individual constructs but more about how the lack of "common" features makes the whole program less "understandable", or maybe less easy to get familiar with, simply due to the amount of code needed to achieve a specific task.
The same way assembly is considered less "readable" than C. Not because assembly is less readable on a line by line basis, it's even simpler, but because of the number of lines and operations needed to achieve a simple task.
Basically it's easier to understand 10 lines than 1000.