lhoursquentin · on Oct 27, 2022

The lack of arrays, dicts and local variables when trying to be POSIX compliant quickly becomes annoying when writing big programs though. There are of course workarounds to deal with those, but I wish we didn't have to use them.
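One such workaround for local variables, sketched here as an illustration rather than something quoted from the thread: writing the function body with parentheses instead of braces runs it in a subshell, so assignments stay scoped to the call. The names `counter` and `scoped` are made up for the example.

```shell
counter=outer

# Parentheses instead of braces: the whole body runs in a subshell,
# so the assignment below cannot leak into the caller.
scoped() (
  counter=inner
  echo "inside:  $counter"
)

scoped
echo "outside: $counter"
```

The price in most shells is one extra process per call, which adds up in hot loops.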

masklinn · on Oct 27, 2022

Needing arrays, dicts, and local variables is a strong hint that you need something more capable than a shell. So is calling the artefact a program.

lhoursquentin · on Oct 27, 2022

What's interesting with this example of command1 | command2 is that some shells such as zsh will optimize the last member of the pipeline to be executed in the current process (nothing mandated by POSIX here), so effectively this works on zsh.
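A small sketch of the difference, using a made-up variable name `greeting`. The pipeline form only keeps the value in shells that run the last pipeline member in the current process; the here-document form avoids the pipeline entirely, so it behaves the same in any POSIX shell.

```shell
greeting=
echo hello | read -r greeting
# zsh keeps the value here; bash (with default settings) and dash do not.
echo "after pipeline: ${greeting:-<empty>}"

# Portable variant: no pipeline, so read runs in the current shell.
read -r greeting <<EOF
$(echo hello)
EOF
echo "after here-document: $greeting"
```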

lhoursquentin · on Oct 27, 2022

> [[ ]] is a bash builtin, and is more powerful than [ ] or test.

Agreed on the powerful bit, however [[ ]] is not a "builtin" (whereas [ and test are builtins in bash), it's a reserved word which is more similar to if and while.

That's why [[ ]] can break some rules that builtins cannot, such as `[[ 1 = 1 && 2 = 2 ]]` (vs `[ 1 = 1 ] && [ 2 = 2 ]` or `[ 1 = 1 -a 2 = 2 ]`, -a being deprecated).
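To see the parsing rule in action, here is a minimal sketch restricted to `[` so it runs in any POSIX shell (the variable names are made up). Because `[` is an ordinary command, the shell splits the line at `&&` before `[` ever runs, so the first command becomes `[ 1 = 1` with no closing bracket and fails.

```shell
# The braces only exist so the error message from [ can be silenced.
if { [ 1 = 1 && 2 = 2 ]; } 2>/dev/null; then
  single_bracket=passed
else
  single_bracket=failed
fi

# Two separate [ commands joined by the shell's own && operator.
if [ 1 = 1 ] && [ 2 = 2 ]; then
  two_brackets=passed
else
  two_brackets=failed
fi

echo "one [ with && inside: $single_bracket"
echo "two [ joined by &&:   $two_brackets"
```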

Builtins should be thought of as ordinary commands (like ls or xargs), since they cannot bypass fundamental shell parsing rules (assignment builtins being an exception). The main advantages of being a builtin are speed (no fork needed) and access to the current shell process environment (e.g. read being able to assign a variable in the current process).
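A minimal sketch of the "access to the current shell process" point, with made-up variable names. The read builtin assigns in the current shell, while an assignment made by a separate process (here a child `sh`) never reaches the parent.

```shell
# read is a builtin, so the assignment happens in this very process.
read -r from_builtin <<EOF
set-by-read
EOF

# A child process gets its own copy of the environment; whatever it
# assigns is gone once it exits.
from_child=original
sh -c 'from_child=changed'

echo "from_builtin=$from_builtin"
echo "from_child=$from_child"
```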

sharat87 · on Oct 28, 2022

Thanks. Didn't know the word builtin had a specific meaning in bash, which seems obvious now in hindsight. Should be fixed soon.

lhoursquentin · on June 2, 2022

> Yeah, I know there's D3.js but that involved way more knowledge and learning than I was interested in.

Same, d3 looked very powerful but had a steep learning curve. I was looking for something simple to generate process trees in real time and ended up using Cytoscape.js [0], which helped me get a working POC in an hour. Highly recommended.

[0] https://js.cytoscape.org/

lhoursquentin · on Jan 11, 2022

https://www.grymoire.com/Unix/Sed.html is a good resource to get started, though it's definitely something you should not just read, since sed code is pretty hard to read. Have a terminal on the side and try it out live. Once you understand the pattern and hold spaces, sed becomes a pretty fun esoteric language :)
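As a taste of the pattern and hold spaces, here is the classic line-reversal one-liner (not taken from the linked tutorial's text; the input is made up). It accumulates the lines in reverse order in the hold space and prints the result at the end.

```shell
# 1!G  on every line except the first, append the hold space to the pattern space
# h    copy the pattern space into the hold space
# $p   on the last line, print the pattern space
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```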

lhoursquentin · on Oct 30, 2021

> I would be curious if mksh is also faster. ksh93 also claims great performance increases, but it is no longer (well) maintained.

In mksh, `printf` is not a builtin, so if the script makes heavy use of it then it tends to be slower than bash.

ksh is very interesting because it has the ability to avoid spawning new processes on most command substitutions, making it usually much faster than all other POSIX shells.

But again, this depends heavily on what the script being executed is doing. Here's a highlight of what I described:

  $ time mksh -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
  
  real    0m2.121s
  user    0m1.321s
  sys     0m0.875s

  $ time ksh -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
  
  real    0m0.013s
  user    0m0.008s
  sys     0m0.005s

  $ time dash -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
  
  real    0m0.311s
  user    0m0.279s
  sys     0m0.073s
  
  $ time bash -c 'for i in $(seq 1000); do printf $(printf $(printf $(printf hello))); done' > /dev/null
  
  real    0m0.664s
  user    0m0.538s
  sys     0m0.186s

lhoursquentin · on Nov 19, 2020

Great post, also love what you are trying to do with C playground, this is awesome!

I've recently been trying to build something similar, visualizing forks/execve/read/write, but using the strace output of a binary, which is much less challenging.

ksml · on Nov 19, 2020

Thank you! It's open source, and I'd love to hear if you have any suggestions for it. Would also love to see what you're building!

lhoursquentin · on Nov 19, 2020

Cool, I'll definitely try to set it up in the coming days!

Here's my humble strace visualizer: https://lhoursquentin.github.io/visual-strace/

lhoursquentin · on Sept 21, 2020

You should probably note that this hack spawns an additional shell process per comment :)

lhoursquentin · on Sept 18, 2020

Yes, most of the time respecting POSIX to the letter is not needed, but it is of course satisfying to know that your script can run fine on BSDs and other less common distributions :)

Though sometimes you don't need to go that far to break stuff, for instance switching from Fedora to Ubuntu. I've seen many scripts fail on Debian derivatives because people think using #!/bin/sh as a shebang is fine since it works on their computer, where sh is in fact a symlink to bash.

But on Debian-based distributions /bin/sh is often dash, not bash, and dash is basically the strict POSIX subset + local. All the fancy stuff like [[ ]], &>, arrays, ... will fail.
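For two of those, a rough sketch of equivalents that also work under dash. The names `name`, `kind` and `noisy` are made up for illustration, and the bash forms are shown only in comments.

```shell
name=backup.tar.gz

# bash: [[ $name == *.tar.gz ]]
# POSIX: pattern matching is available through case.
case $name in
  *.tar.gz) kind=tarball ;;
  *)        kind=other ;;
esac

# Stand-in for a command that writes to both stdout and stderr.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

# bash: noisy &> /dev/null
# POSIX: redirect stdout first, then duplicate stderr onto it.
noisy > /dev/null 2>&1

# The same duplication lets a command substitution capture both streams.
combined=$(noisy 2>&1)

echo "kind=$kind"
printf '%s\n' "$combined"
```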

Though this is less about long options here and more about general shell scripting.

kazinator · on Sept 18, 2020

You don't know that your script will run fine anywhere, if you've not actually run it there.

But, still, that's no reason to adopt a mindset of actively wrecking the chances of such success.

(Which is what passive ignorance amounts to, effectively).

lhoursquentin · on Aug 25, 2020

The thing is, when you restrict yourself to only what POSIX specifies you don't even have basic data structures: no arrays, no hash maps, not even local variables, and that often leads to code that relies on multiple hacks.

For instance: you can't even slice "$@" which is the only array like construct specified by POSIX (excluding unsafe splitting), so you end up shifting and re-enqueuing with set -- "$@" "$1", which is unreadable.
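For readers who haven't met the idiom, a minimal sketch of one use: dropping the last argument by rotating the list. The sample arguments and the `remaining` counter are made up for illustration.

```shell
set -- a b c d

# Move each of the first $# - 1 arguments to the back of the list,
# which rotates the original last argument to the front.
remaining=$#
while [ "$remaining" -gt 1 ]; do
  set -- "$@" "$1"
  shift
  remaining=$((remaining - 1))
done

# The list is now: d a b c. Dropping the first entry removes "d".
shift

sliced=$*
sliced_count=$#
echo "$sliced_count arguments left: $sliced"
```

In bash the same result is a single expansion, `"${@:1:$#-1}"`, which is the gap being described.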

Want to read a single character? Impossible with the POSIX read, but you can work around it with dd
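A minimal sketch of the dd approach, reading from a pipe so it runs non-interactively. The variable name is made up, and reading a keypress from a terminal would additionally need the terminal taken out of line-buffered mode, which is not shown here.

```shell
# bs=1 count=1 copies exactly one byte from stdin; dd's transfer
# statistics go to stderr, so they are discarded.
first_char=$(printf 'hello' | dd bs=1 count=1 2>/dev/null)
echo "$first_char"
```

One caveat: command substitution strips trailing newlines, so a newline read this way comes back as an empty string.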

etc.

All those workarounds to POSIX limitations are fascinating, but it forces a lot of arcane constructs, that's for sure.

And of course this is usually the point where you ask yourself if you've chosen the correct language for the task, but that's another debate.

stephenr · on Aug 25, 2020

> so you end up shifting and re-enqueuing with set -- "$@" "$1", which is unreadable.

How is that unreadable? Set all the positional arguments as is, then set the first one again, into new positional arguments. I'm not sure why you want the first arg twice, but that's your call.

Either way, it's hardly unreadable.

> And of course this is usually the point where you ask yourself if you've chosen the correct language for the task, but that's another debate.

It isn't another debate though. It's all the same debate: do you know the language you're reading and/or writing? Do you know what it can do, what it can be pushed to do, and what it really shouldn't do?

Those are all related to the same basic point: if you think Shell is unreadable, I'd suggest it's because you don't know shell.

Remember, readable means that you can read and understand it. Not that it's written out in words that someone on page 1 of "how to program for dummies" can understand.

lhoursquentin · on Aug 25, 2020

> Remember readable means that you can read and understand it

Yes you are right, my usage of "readable" was probably wrong there.

Maybe the point I was trying to make was less about the individual constructs but more about how the lack of "common" features makes the whole program less "understandable", or maybe less easy to get familiar with, simply due to the amount of code needed to achieve a specific task.

The same way assembly is considered less "readable" than C. Not because assembly is less readable on a line by line basis, it's even simpler, but because of the number of lines and operations needed to achieve a simple task.

Basically it's easier to understand 10 lines than 1000.