> Automation was glued together in one of these with a series of grep, awk, sed, ls, test, commands glued together. Anything more complicated was written in C and called from one of these things.
This doesn't sound that horrific to me. It's the classic Unix approach of building small tools that do one thing well, and composing them in novel ways to solve problems. For any problem that can't be solved this way you write another small tool using your programming language of choice. Rinse and repeat.
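To make that concrete, the kind of composition I mean looks something like this (a made-up log-crunching example; the filename and format are just for illustration):

    # top 5 client IPs among successful requests: filter, project, count, rank
    grep ' 200 ' access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -5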
But occasionally Unix attracts users and programmers who reject this approach, and who prefer building a monolithic tool, or in the case of Larry Wall, new programming languages. To be clear, I'm a fan of Perl and think it has its place, especially in the era it came out. It inspired many modern languages, and its impact is undeniable.
Personally, I find solutions you refer to as "unmaintainable nightmare" to be simple and elegant, if used correctly. No, you probably shouldn't abuse shell scripts to build a complex system, and beyond a certain level of complexity, a programming language is the better tool. But for most simple processing pipelines, Unix tools are perfectly capable and can be used to build maintainable solutions.
The classic Knuth-McIlroy bout[1] comes to mind. Would you rather maintain Knuth's solution or McIlroy's?
I don’t think you’ve seen the kind of scripts the person you are responding to is talking about.
I have, and mentioned one lower down in the comments. The Unix philosophy was great but does not scale well in terms of maintainability or efficiency. Invoking processes over and over again in loops is godawful slow. And the horror of complicated shell scripts is legendary.
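The pattern that kills you is spawning processes per line instead of per file. A trivial sketch of the difference (input.txt is just a stand-in):

    # slow: forks a subshell and a tr process for every single line
    while read -r line; do
        echo "$line" | tr 'a-z' 'A-Z'
    done < input.txt

    # the same work in one process, streaming the whole file
    tr 'a-z' 'A-Z' < input.txt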
As a self-taught coder, I've experienced many times how highly skilled software engineers groan and sweat when they encounter shell scripts. I don't understand why, but it seems like people with a CS background are never really taught shell scripts and have come to irrationally fear them. It's sort of taboo.
This results in weird behavior, such as writing a groovy (Java?) script for Jenkins to execute bazel in order to build a go binary that runs the very same commands in an exec.Command() construct. Or people who download and import pandas to grab the third field in a csv file.
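For the CSV case, assuming no quoted or embedded commas, the whole job is one line (data.csv is hypothetical):

    # third field of a comma-separated file (naive: ignores CSV quoting rules)
    cut -d, -f3 data.csv
    # or, if you need to wrap any logic around it
    awk -F, '{print $3}' data.csv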
During the course of learning, I've naturally written code in bash that should have been written in another language. I replaced if statements with case because they turned out to be more performant. It's a great learning experience and why I got into python and go.
IMO we should use the right tool for the job. Sometimes that tool is a combination of unix utilities that you can put in a shell script for easier maintenance. It's just procedural execution of (usually very efficient) binaries, akin to a jenkins script or gitlab pipeline. Just mind the exceptions and use exit codes.
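And minding the exit codes doesn't have to be elaborate. Roughly this shape, using sort -c as a stand-in for whatever step might fail:

    # capture output, check the exit status, and fail loudly
    if ! out=$(sort -c data.txt 2>&1); then
        echo "precheck failed: $out" >&2
        exit 1
    fi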
I'm the type of person that would grab pandas to parse a CSV. Here's my reasoning
* often times, it's not just the third column I want. Sometimes it becomes "third column unless the first column is 'b' then instead grab the fourth column". Having a good data representation makes sure that I'm not mixing logic code with representation parsing code
* I don't have to care about CSV parsing edge cases. Escaped comma? Quotes? I don't care, the library will either handle it or throw an explicit error. With custom parsing code, instead of an error, I'll get some mangled result in the middle of the file that I won't even catch / notice until later down the line
* when working with CSVs, in my area (ML / scientific compute), Python is often the right context to be in.
> I've experienced many times how highly skilled software engineers groan and sweat when they encounter shell scripts. I don't understand why, but it seems like people with a CS background are never really taught shell scripts and have come to irrationally fear them. It's sort of taboo.
It's not a lack of being "taught" shell scripts. It's the fact that shell programming constructs aren't well documented, your "standard library" is basically dependent on whatever binaries happen to be available on the filesystem, error handling is almost non-existent, etc.
It's very easy to write a bad shell script that "solves" a problem as long as a bunch of assumptions aren't violated. In my experience, senior software engineers are extremely averse to hidden assumptions and very concerned with reliability of the systems they build.
Yea... "Well the script works fine in MY SYSTEM" was the most common issue with said scripts. Running across different versions of Linux was fraught with issues, much less any other operating system that could execute a shell script.
Of course this can happen with any language, especially as it ages and adds complexity.
I certainly have, and might have written a few of those myself.
But this doesn't make this approach inherently wrong or obsolete. The programmer is wrong for trying to use the tools beyond their capabilities. Where that line is drawn is subjective, as is the concept of maintainability, but if you feel that you're struggling to accomplish something, and that it's becoming a chore to maintain, the path forward is choosing a more capable tool, like a programming language.
I think we are actually mostly in agreement there then.
Perl was invented because the gap from shell to more capable languages was (and is) really big. Languages like Python and Ruby didn’t exist yet, and Perl had a really, really strong sweet spot in text processing.
Perl is usually installed by default on Linux and Unix systems. Ruby might be there, it depends.
Perl is faster than Ruby. Ruby has been one of the slower scripting languages.
But Ruby has been working on performance improvements in the past few releases. I have not seen any benchmarks of the current Perl versus the current Ruby, so this may have changed.
Perl is more concise than Ruby, allowing more functionality for less code.
>> As for speed it's useful to distinguish between startup and runtime.
The JVM would like a word. (It has slow startup, but can be very fast at runtime due to JIT optimization and caching.)
Scripts should start and run quickly.
Ruby has historically been fairly slow which is why Ruby 3 focused heavily on performance. It has been improving a lot, but I have not seen any benchmarks against other programming languages.
The thing to remember here is that speed is relative. I haven't checked, but Ruby 3 is probably faster than Perl 5.0. For most scripting purposes on modern hardware, Ruby is plenty fast enough. Whilst there may be marginal speed differences between Perl, Ruby and Python, the difference is insignificant.
I used Perl then Ruby as my main language for almost a decade each. These days, I don't really write Ruby anymore; I moved on to Elixir and never looked back. But I still find myself using Perl on the command line, in contexts where Awk or Sed would also make sense. Ruby never optimized for the one-liner case IMO.
Yeah, the actual experience of leaky abstractions and non-portable code is forgotten. Perl solved a very real problem in the 90s. Grief, I shudder to think back to the sheer complexity of my bashrc file back then.
Yes, exactly. I have seen entire backend systems written in bash. Everything was shell script, sed and awk. The owner didn't want python or perl because he only knew bash and the related tools.
Everything was needlessly hard because these tools were not built for that. Easy to talk about philosophy and the "classic Unix approach" if you don't have to build modern applications this way.
You hit it on the head with the slowness of loops when the body comprises a series of program invocations. The horror really seeps in when you realize the original author wasn't stopped by the lack of data structures: they could get around that with some creative variable names.
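For anyone lucky enough never to have seen it, "creative variable names" means faking arrays by generating variable names with eval, something like this sketch:

    # a pre-array "array": one dynamically named variable per element
    i=0
    while read -r line; do
        eval "field_$i=\$line"
        i=$((i + 1))
    done < data.txt
    eval "echo \$field_2"   # even "indexing" needs another eval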
Programming environments, including shells and operating systems, are just tools. And every tool can be misused. I opened a can of beans with a screwdriver once. Reality is messy. That doesn't make the tool bad.
At the time, before everything became Linux, all these tools and the shells used to glue them together were an incoherent mess. Was your glue sh, ksh, csh, tcsh, bash or something uncommon like zsh? Did your grep, awk and sed use the same regexp syntax as your text editor? Single letter command line options, all meaning something different to each tool. Dozens of domain specific languages (shells, awk, sed etc.) meant dozens to learn and keep in your head. And you needed it in your head, because finding the information you needed in the massive single document man file reference was a pain because hypertext links had not been invented yet (well, probably in Emacs, which was another tool like Perl that people used to avoid the command line nightmare).
By the time our lord and savior zsh appeared on the scene, Perl was already at Perl 3. And, to be fair, I do not think many used zsh before 2.1, which was sometime in the fall of 1991, and by then the Camel Book had been out for half a year or something like that. So the pre-Perl and the we-use-zsh days do not really overlap.
I remember capturing every password at my university via "methods". Because we had a printer quota. In the summer when everyone was gone I printed out all the man pages (all the mans, the system libraries, etc) so I'd have a nice reference book. I made sure to make it so no one was charged any money.
The one thing people can't possibly fathom if they started coding after the mid-late 90s was how much we relied on the printed medium.
I still remember when we measured the documentation IBM shipped with the mainframes not in pages but in yards it occupied on the shelves. It was a lot.
While there are many possible answers, it eventually boils down to Unix being a C runtime and thus having a C culture. Lisp is from outside of this section of the world, so it simply had less adoption and support inside Unix land. Other languages, like sed, awk, and shell, are not C but share its heritage (essentially, they were made by people close to the making of C).
The first big iron I had the luck to work with was an IBM 3090, essentially a gift from IBM. It handled the university entrance exams of the entire country of some ten million people, and it had 64 MB of RAM. (It was also the first computer in Hungary permanently connected to the Internet via a leased line to Austria, so it had an Austrian IP address. Hungary didn't have its own IP region for two more years.)
I think the first machine with 128MB was a VAX 6510 a year or two later at another university. A little bit later, in 1994, CERN had gifted a VAX 9000 with an astounding 256MB of RAM.
To compare, the first server I installed Linux on had a grand total of 4MB RAM -- and that was one of the largest computers a small department at the university had.
It would be a long, long time before "128MB" and "mine" entered the same sentence.
Shades of the Monty Python sketch here, but the following is true...
4MB?!?
My first encounter with IBM kit was a, er, darn I'm not sure cuz I'm getting old, but I think it was a 4300? Not big iron in some senses, but still with a box that was something like 6-8 feet long iirc and definitely several feet wide and high. (And a bank of about 6-8 tape decks, each as tall as me, and two disk units, each the size of a washing machine, and so on.)
Its RAM? A massive 1 MB.
That IBM kit was the heart of the super new expensive upgrade in 1980 that cost something like 5-10 million pounds iirc to build, including a brand new building to house it and a team of programmers.
The older setup, which is where I was until its last days, was an ICL system that was expanded at the end of its life to a whopping 48KB -- yes, KB -- of RAM.
And that kit ran all the systems, internal (payroll, accounting, etc., etc.) and external (sales etc.) for the largest car dealership in the UK.
128MB? 4MB? Even 1MB? That was an unimaginably insanely large amount of RAM!
(Yes, it was very weird to be working with this physically enormous setup, and dealing with keeping it all cool enough not to halt, through superhuman efforts for a half hour or so when the A/C broke down, when the likes of PETs, Sinclair ZX80s, and Acorn Atoms were a thing...)
Ha yes, the aforementioned IBM 3090 was so big that for installation they removed the roof of the building it was living in, craned it into place and put the roof back. Bringing it up the elevator or stairs was impossible.
Much later, in the second half of the 90s, I remember the four of us carrying an IBM HDD -- I think it was your normal 5.25" drive but it needed four people because it was mounted on a vibration dampening base ...
> Much later, in the second half of the 90s, I remember the four of us carrying an IBM HDD -- I think it was your normal 5.25" drive but it needed four people because it was mounted on a vibration dampening base ...
Continuing the shades of Monty Python theme[1]:
I remember one of my first few nights being in charge of the new IBM kit (I was a "computer operator" back then, in 1980), leaning back in the fancy new chair at the desk with its fancy "virtual" teletypes (a couple "terminals" displaying the status of the OS with a CICS system), and showing off to an "underling" by swinging a long plastic slide rule or something stupid like that (I no longer recall), and me accidentally banging it on the desk. Right "near" a recessed big red button. Or perhaps "on" the button? As I snapped my head around to look at the button and begin to understand what I may have just done I heard an ominous series of whirring and clicking sounds coming from the cpu box, right near where there was an 8" diskette drive that wasn't supposed to be doing anything while the OS was running (it was just for starting the OS). Then I looked at the console... Uhoh. They didn't fire me but it took months before they decided to let me be "in charge" again with someone else actually hovering over me...
Fast forward to when I was a coder (BCPL) in a small software startup, during the second half of the 80s, presumably 10 years before you were carrying your 5.25" drive monster, I vividly recall someone bringing a 700MB hard drive back from a local computer store. It cost an astonishingly paltry 700 quid or thereabouts. A pound a MB!
> I vividly recall someone bringing a 700MB hard drive back from a local computer store. It cost an astonishingly paltry 700 quid or thereabouts.
I ... do not know. That sounds very low. Look at https://jcmit.net/diskprice.htm and note the pound was 1.8-ish around this time, so the price should have been well above 1000 pounds even in the early 90s. We are talking of a 5.25" full height drive; here's a 1987 model http://www.bitsavers.org/pdf/maxtor/MXT8760E.pdf (rare in personal computers, it was definitely for workstations / servers).
Elisp is very much a niche language. For whatever reasons, the use of Elisp outside of Emacs is basically non-existent. Elisp is quite clunky, and AFAIK there hasn’t really been any big efforts to make it usable outside of Emacs. People who wanted Lisp outside of Emacs already had Common Lisp. (And Chez Scheme, and Scheme 48, etc etc.)
> you probably shouldn't abuse shell scripts to build a complex system, and beyond a certain level of complexity, a programming language is the better tool
but the only free programming languages available at the time were C/C++, various shells, and awk. everything else was expensive or not generally usable for other reasons. all the really useful languages to build complex systems didn't really appear or become freely available until the 90s. and perl was first among those.
I'm not saying that Perl didn't have its time and place. It certainly fulfilled a need at the time for a language more capable than shell scripts, but less cumbersome than C/C++.
But the thing is that today the shell landscape is much more mature for solving simple problems, and we have C/C++ alternatives that are saner and more capable than Perl (e.g. Go). So it arguably has lost its place, as shell tools are still in widespread use, while Perl is mostly underused. Raku is interesting, but it goes in a different direction, and its adoption is practically zero.
Unfortunately. Ruby should have been Perl's natural successor. Python is the VHS of scripting languages. For a start it doesn't have a decent answer to Perl or Ruby's one-liners. Then there's the crippled lambda implementation. Python is a sad case of worse is better.
> For a start it doesn't have a decent answer to Perl or Ruby's one-liners.
This is by design. Readability is core to the design and philosophy of python. One liners are cool and fun to write, but trying to decipher someone else's incredibly dense bash or perl one-liner is absolutely awful.
>> One liners are cool and fun to write, but trying to decipher someone else's incredibly dense bash or perl one-liner is absolutely awful.
You can write hard-to-read code in any programming language.
Python lets you with mandatory whitespace so that the awfulness spans multiple lines instead.
Really talented Python programmers can do downright demonic stuff with list comprehensions.
Python appears to be simple, but is actually quite complex. I recommend reading "Effective Python" (https://effectivepython.com/) to see beneath the surface.
The readability complaint usually comes from people who never took the time to grok the language and its idioms. At least give the user the option. Advocating a language based on what it denies you doesn't make sense. Why use a scripting language at all if belt and braces is what you're looking for?
This topic is akin to "holy wars" and since I used to think the same way (still do to some extent), I would like you to at least consider another aspect: the effort it takes to "grok the language and its idioms" is vastly different depending on the design of the language. Letting people do whatever they want, whatever way they want it isn't only about protecting them from themselves or not.
Just think of C: I'd argue its design is actually more akin to Python than Perl (and it definitely inspired languages like Go and Zig, NOT languages like C++). It's a small language and this is a very important characteristic of it: you can count on being able to actually master it or at least very well comprehend it. Other effects of a simple and literally straightforward language can be: easier implementation and evolvement, less mental load on the developer, easier portability among developers (both for general knowledge and actual code), etc. I'm not saying that C is the way it is for all these reasons but I wouldn't overlook this factor and I do think that languages like Python are deliberately building on these advantages.
Now, I don't dislike C++ at all but back when I studied it at university, I noticed that it was the first language for me that needed to be actually studied, unlike Pascal, C, Python and "oldschool" JS. Ever since, the only languages where I felt the same were Prolog (mostly because it requires a different mindset; other than that, it didn't seem bloated) and Raku. Not C#, not Java, not Erlang. I didn't really have to touch Perl but from all I know, Raku started off as a fresh take on the Perl approach. It seems somewhat more organized but huge nevertheless, to the extent that there literally isn't one person who really "groks the language" all around. In the case of Raku, I wouldn't even say it encourages you to write unreadable code (especially if you have a thing for APL look-alikes, lol) - it's just so rich that there is a good chance you will come across something in someone else's code you have never used before and don't quite remember how it will act in your specific use case.
There are different types of freedom than "do whatever you want". Like, the freedom to feel safe and confident about code. These days, humanity has aggregated an immense amount of knowledge and technology and "I will do it all by myself" is not that much of an option. And even people with such puritanistic tendencies will choose simple and straightforward tools, even if not for the "limitations".
There have always been feature-rich and bare-bones languages. No point pitting one against the other. I don't consider myself particularly clever but with the help of "Programming Perl" and "Learning Perl" I managed to get a pretty good grasp of the language with no prior programming experience. I just feel a lot of the knee-jerk response to the mention of Perl comes from people who have never put any effort into learning it.
Well I'm saying that "you just didn't bother to learn it, duh" is not an equally valid argument for different languages; it's much less valid for Perl than it would be for Python. It rewards your efforts much less. Perl is notoriously a language that had desperate criticism among its users even by the mid 90's ($[ and $] stuff comes to mind), quickly led to the creation of Ruby and famously "forked itself" with what is now known as Raku.
Now I have barely spent any time with the oldschool Perl but trust me, I have put a lot of effort into learning Raku, the language that was meant to fix Perl. Whenever something that "seemed like a good idea at first but it's actually harmful" shows up, it's usually Perl's legacy. I'm thinking of things like the conceptual mishmash between a single-element list and a scalar value (or in general, trying hard to break down variables arbitrarily into list-alikes, hash-alikes and the rest of the world), the concept of values that try to implicitly pretend they are strings and numbers at will, or the transparency of all subroutines to loop control statements which is some next level spaghetti design. If you ever actually use something like this, you introduce a brand new level of complexity, somewhere inbetween a "goto" and a "comefrom", so I would really think about if this was worth learning at all.
Oh right... from what I remember, it was also Perl that fostered this idiotic idea that a name of a concrete thing could be overloaded to be a namespace as well, and a concrete Foo::Bar could very well be something that has no logical relation to a concrete Foo. Moreover, I'm quite sure Perl invented this nonsensical distribution-module dichotomy where you are supposed to depend on modules, despite the smallest publishable and installable unit being a distribution. There are three outcomes with that:
- if the distribution contains only one module: what was the point of drawing the distinction?
- if the distribution contains tightly coupled modules: you can pretend to only depend on one of the modules but in fact you are depending on the whole distribution together
- if the distribution is a collection of unrelated modules: why are you trying to couple the metadata when this will make the versioning meaningless?
I can only hope that it's somehow better than Raku but the whole principle is just an anomaly.
And you know, then these people move around in the world, pretending that all of this is just normal and you just have to learn it. Well guess what, there is a reason people might want to put that effort into something else.
> This doesn't sound that horrific to me. It's the classic Unix approach of building small tools that do one thing well, and composing them in novel ways to solve problems.
This works really well if your problem can be solved in one or two liners.
It goes bad very quickly when, say, you have two CSV files and want to join them the SQL way.
In sed, you have to use positional variables and think about shell escaping. In perl, you can at least name those variables and use \Q
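Even the friendliest shell answer, join(1), only works once you stack up caveats: both files sorted on the key, no quoted fields, no embedded commas. Roughly (a.csv and b.csv are placeholders):

    # inner join of two CSVs on column 1, under naive CSV assumptions
    join -t, -1 1 -2 1 <(sort -t, -k1,1 a.csv) <(sort -t, -k1,1 b.csv)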
> This works really well if your problem can be solved in one or two liners.
My personal comfort threshold is around the 100-line mark. It's even possible to write maintainable shell scripts up to 500 lines, but it mostly depends on the problem you're trying to solve, and the discipline of the programmer to follow best practices (use sane defaults, ShellCheck, etc.).
> It go bad very quickly when, say, you have two CSV files and want to join them the sql-way.
In that case we're talking about structured data, and, yeah, Perl or Python would be easier to work with. That said, depending on the complexity of the CSV, you can still go a long way with plain Bash with IFS/read(1) or tr(1) to split CSV columns. This wouldn't be very robust, but there are tools that handle CSV specifically[1], which can be composed in a shell script just fine.
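The quick-and-dirty version of that split looks roughly like this (users.csv and its columns are made up, and it breaks on quoted or embedded commas):

    # naive CSV: read named columns per line
    while IFS=, read -r name email age; do
        printf '%s <%s>\n' "$name" "$email"
    done < users.csv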
So it's always a balancing act of being productive quickly with a shell script, or reaching out for a programming language once the tools aren't a good fit, or maintenance becomes an issue.
You’re discussing modern tooling in a conversation about early UNIX tooling. Back in the period being discussed, even ‘read’ was less functional. Ksh introduced a lot of the stuff we now take for granted, some of which wasn’t even available until Ksh93 (long after Perl was released). Bash itself is a younger project than Perl, albeit not by much.
Fair point. I'm not arguing that Perl wasn't an improvement back then, but that the approach of composing Unix tools is not inherently bad. And as the shell ecosystem evolved since then, and more capable programming languages appeared, Perl has been left by the wayside as a historical relic, rather than the replacement of Unix tools that Wall envisioned.
So I don't disagree that it was needed back then, but it's important to mention the modern context it struggles to exist in.
Perl is still a commonly used tool chain. It is far from being a “historic relic”.
I agree that there’s nothing wrong with composing UNIX tools. I mean, that was one of its key selling points. If you watch any early promo videos for UNIX you’ll see them talk heavily about the composability of the command line and shell scripting. It wasn’t an accident; it was designed that way.
The point of the conversation wasn’t to say that one shouldn’t write shell scripts, it was just to say that there was a massive and unfilled gulf between what was easy to do in Ksh, awk and sed, and what could be done in C.
I think it's possible that things that seem normal and inoffensive can become horrific simply from scale. You'll climb the stepladder without complaint, but then there's that radio tower in Canada...
Others like the idea of the family cow, then they see the 10,000 head feedlot from the highway.
Scale is sometimes sufficient by itself to induce horror.
csh was a decent interactive tool, but not great for scripting. Bourne shell had the right idea, but there were so many bugs in various corners of it (I still sometimes end up writing `test "x$foo" = "xbar"` even though shells that need that are long gone).
If you can depend on a recent bash and use shellcheck, then it's actually quite a pleasant programming environment, with fewer footguns than one might think. (I want a @#$@# "set -e" equivalent that returns non-zero from a function if any statement in the function results in non-zero).
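The closest workaround I know of is giving the function a subshell body so "set -e" applies inside it in isolation, though bash still suppresses errexit when the call itself sits in an if/&&/|| context. A sketch with hypothetical commands:

    # any failing statement makes the call return non-zero
    do_deploy() (
        set -e
        build_step    # hypothetical steps standing in for real work
        upload_step
    )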
There are some things that are more awkward than they should be though (e.g. given a glob, does it match 0, 1, or many files, or the way array expansions work).
Also, there's no builtin way to manage libraries (I don't know about Perl, but Python suffers from this as well). This results in me pasting a few dozen lines of shell at the top of any of my significant shell scripts, for quality-of-life functions. Then I have to use "command -v" to check if the various external programs I'm going to use are present. Say what you will about C, but a statically-linked C program can be dropped in anywhere.
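The dependency check at least stays short; something like the following, with jq and curl standing in for whatever the script actually needs:

    # bail out early if a required external program is missing
    for cmd in jq curl awk; do
        command -v "$cmd" >/dev/null 2>&1 || { echo "missing dependency: $cmd" >&2; exit 1; }
    done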
Mostly agree. The modern shell scripting environment is much more robust than 30 years ago, with ShellCheck and some sane defaults, as you say. I also find it pleasant, once you get over some of its quirks.
As for managing libraries, that's true, but you can certainly import and reuse some common util functions.
For example, this is at the top of most of my scripts:
    set -eEuxo pipefail
    _scriptdir="$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")"
    source "${_scriptdir}/lib.sh"
This loads `lib.sh` from a common directory where my shell scripts live, which has some logging and error handling functions, so it cuts down on repetition just like a programming language would.
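The helpers in lib.sh are nothing fancy; the flavor is roughly this (function names here are just examples):

    # lib.sh (excerpt): minimal logging and error helpers
    log() { printf '%s %s\n' "$(date +%H:%M:%S)" "$*" >&2; }
    die() { log "ERROR: $*"; exit 1; }
    trap 'log "failed at line $LINENO"' ERR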
The effortless composition of complex commands out of simple standalone programs is one of the best features of Unix. And yes, I admire and love it as well.
That said, imagine a metrics system for a huge networking company that used these methods to cover all automated testing or defect analysis. Those inner loops were made of greps and seds and so forth, and each one is the invocation of a new program. It wasn't uncommon for these runs to take almost a day.
Besides performance, the other nightmare was what someone described below: each script was a one-off that didn't leverage the work from others. If the author only knew C shell, then you know you're going to be doing gymnastics to catch the stderr of some of those programs (you can't capture it in the same manner that Bourne variants do).
Anyway, yes, we all adore the Unix philosophy, but there are limits.
The "philosophy" that I never saw in commercial UNIXes, starting with my first experience with Xenix in 1993, beyond it being endlessly repeated in FOSS circles, which is ironic given how GNU and BSD applications work, with their endless number of command line parameters.
[1]: https://matt-rickard.com/instinct-and-culture