Shell is not my favourite language
Powershell wasted 10 hours of my life, so I was compelled to write this self-therapy session masquerading as a blog post.
Before you discount my experience as the inane ravings of a UNIX fanboy, please understand that you couldn't be further from the truth. I sincerely tried to be fair and gave Powershell as much benefit of the doubt as I could muster.
Disclaimer aside, let's begin the descent into madness.
Why use shell scripts?
Generally, shell scripts are used when a task needs to be automated, but it's not worth pulling out a "real" (see: time-consuming) programming language. Given this, it stands to reason that shell scripting languages should be quick and easy to both learn and write. If it's easier to write a program than it is to write a shell script, it's failed to serve its only purpose.
Foreshadowing is a narrative device in which suggestions or warnings about events to come are dropped or planted.
The veritable mess that is POSIX
POSIX shell is, like most things descending from UNIX, poorly designed and incredibly unpleasant to use. Granted, once you've developed a mental model of how it works (via abstract causality and pain), you eventually learn how to kludge together scripts relatively quickly. That is to say, after you've received enough third-degree burns from touching the proverbial hot-plate, you learn what is and is not a good idea.
Over time, you learn to work around (or simply avoid) its many warts, namely: error-prone string handling and its compulsion to violate the principle of least astonishment. POSIX shell is proof that "worse is better" is real and we can't have nice things. It's a glue used to create precarious and unholy software obelisks, the lingering shadow of undead relics 3 decades past their expiration date.
Did you know "bash" is short for "bourne again shell"? Personally, I would prefer if these shells stayed dead. Unfortunately, I'm lacking a weapon with sacred affinity, so I'll have to settle for asking nicely.
I want to get off Mister POSIX's Wild Ride
Let's set the tone for the rest of this article with a test: how does POSIX shell implement features such as constants (`true` and `false`) and comparing values?
Was your answer: running a magic executable?!
❯ which [
/bin/[
❯ which true
/usr/bin/true
You see, when you compare a value like this:
if [ -z "$foo" ]; then
# ┬
# ╰─── this accursed symbol haunts my nightmares
...the shell runs the executable literally called `[`.
As I alluded to before, `[` is actually an alias for `test`. On macOS, they are separate executables, which checks out according to POSIX:

> The `test` and `[` utilities may be implemented as a single linked utility which examines the basename of the zeroth command line argument to determine whether to behave as the `test` or `[` variant.
In any other context, this would be reviled as the affront to computing that it is, but since it was standardised [checks watch] 38 years ago it flies under the radar. I have a feeling the reception would have been different if `shell.js` tried to pull this.
Also, I lied: they're only sometimes executables! As per POSIX:

> An implementation may choose to make any utility a built-in.
...meaning in that case the utility is included in the shell, removing the need for said magic executables. Why? Why have this many failure modes?! Who thought this was a good idea? After wracking my brain, the only justification I can muster is accommodating systems with so little storage that shipping fewer separate executables would be convenient.
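You can poke at this yourself; a minimal sketch (the exact `type` output varies by shell and system):

```shell
# Most shells shadow the on-disk executables with built-ins,
# so `type` usually reports them as such:
type test
type '['

# Either way, `test`/`[` evaluate an expression and report the
# result through their exit code:
test -z "" && echo "test says the string is empty"
[ -z "" ] && echo "[ says the string is empty"
```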
Byte streams are the `Any` of UNIX
POSIX shell lets you combine multiple commands together to form a "pipeline". This is accomplished by using the pipe operator (`|`), which connects the stdout stream of one program into the stdin stream of another.
❯ echo 'filename' | xargs cat
howdy
- `echo` sends `filename\n` to stdout
- `xargs` receives `filename\n` from stdin due to the pipe
- `xargs` runs `cat filename`, sending `howdy\n` (the contents of `filename`) to stdout
- the shell forwards stdout to the terminal, so `howdy` appears on my screen
╭─ stdin ╭─ stdin ╭─ stdin
│ │ │
echo ─┬───▶─┴─ xargs ─┬───▶─┴─ shell ─┬───▶─┴─ terminal
│ │ │
stdout ─╯ stdout ─╯ stdout ─╯
In contrast, if I ran `cat` in a shell:
❯ cat
boop\n
boop
╭─ stdin ╭─ stdin ╭─ stdin ╭─ stdin
│ │ │ │
keyboard ─┬───▶─┴─ shell ─┬───▶─┴─ cat ─┬───▶─┴─ shell ─┬───▶─┴─ terminal
│ │ │ │
stdout ─╯ stdout ─╯ stdout ─╯ stdout ─╯
- the keyboard sends `boop\n` to `cat` via stdin
- `cat` forwards stdin to stdout
- the shell forwards stdout to the terminal, so `boop` appears on my screen twice
This is incredibly powerful... in theory. In practice it's a wilderness, because streams are unstructured. You're not sending text through streams, you're sending bytes, since text would imply encoding.
Do I mean encoding:
- in a textual sense: are these bytes encoded in Windows-1250 or Shift JIS?
- in a structural sense: are these bytes JSON or MessagePack?
Both! In practice, lacking a standard for structuring data means relying on guessing character encodings and ad-hoc conventions. Please enjoy this colourful account from an MPV maintainer regarding how POSIX handles locales. I have a feeling POSIX is not their favourite standard.
If you hear "plain-text" used unironically, please scold the perpetrator on my behalf. Text is (bytes, encoding); omitting the encoding means you just have bytes. Thank you.
Here's another example:
❯ ls
╭───┬───────────────┬──────┬───────┬─────────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────┼──────┼───────┼─────────────┤
│ 0 │ can't wake up │ file │ 666 B │ 2 hours ago │
╰───┴───────────────┴──────┴───────┴─────────────╯
❯ find . -type f | xargs ls -al
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Oh no, it's busted.
Our pipeline breaks because the filename contains a quote. This is fine, since filenames obviously never contain quotes.
Here's one solution: tell both `find` and `xargs` to use an ASCII NUL byte as the delimiter:
❯ find . -type f -print0 | xargs -0 ls -al
-rw-r--r-- 1 opeik staff 666 Jul 17 21:48 "./can't wake up"
This is one of the nicer solutions. I hope whatever you're piping `find` into supports it, or you'll have to add yet more slop to handle it!
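Alternatively, and this is only a sketch, you can skip the pipe entirely and let `find` spawn the command itself with `-exec`; the `{} +` form batches filenames like `xargs` does, and since they're passed as real argv entries, quotes and spaces never travel through a byte stream at all:

```shell
# Set up a scratch directory with an awkward filename (illustrative).
mkdir -p nul-demo && cd nul-demo
touch "can't wake up"

# -exec hands filenames straight to ls as argv entries; no
# delimiter, no quoting, nothing to get busted.
find . -type f -exec ls -al {} +
```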
Sisyphean-oriented programming
Don't get me started about error handling. Praying nothing goes wrong is a better use of your time than trying to wrangle any semblance of reliability from this cacophony of undercooked and mismatched ideas. Unhandled errors don't stop the script. Pipelines mask errors by always returning the exit code of the last command. Subshells are utterly broken. Evaluating undefined variables results in an empty string.
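A quick sketch of the first two failure modes; note that `pipefail` is a bash/zsh extension, not POSIX:

```shell
# An unhandled error doesn't stop the script:
sh -c 'false
echo "still running after a failed command"'

# A pipeline only reports the exit code of its last command,
# so the failure of `false` is silently masked:
sh -c 'false | true; echo "pipeline exit code: $?"'
# prints "pipeline exit code: 0"

# bash's pipefail makes the whole pipeline fail instead:
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'
# prints "with pipefail: 1"
```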
In 2015, Steam for Linux had a bug that nuked your root drive under certain conditions. How did this happen, you ask? It was a shell script:
# figure out the absolute path to the script being run a bit
# non-obvious, the ${0%/*} pulls the path out of $0, cd's into the
# specified directory, then uses $PWD to figure out where that
# directory lives - and all this in a subshell, so we don't affect
# $PWD
STEAMROOT="$(cd "${0%/*}" && echo $PWD)"
# Scary!
rm -rf "$STEAMROOT/"*
As previously mentioned, undefined variables expand to the empty string, and in certain situations `$STEAMROOT` happens to be undefined, so this happens:
rm -rf /*
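The usual defence, sketched below: `set -u` makes expanding an unset variable a fatal error, and the `${VAR:?message}` form aborts with a diagnostic before anything destructive runs (exact exit codes vary by shell):

```shell
# Make sure the variable really is unset for the demo.
unset STEAMROOT

# Under `set -u`, expanding an unset variable is fatal, so the
# dangerous command never runs:
if sh -c 'set -u; echo "would delete: $STEAMROOT/"' 2>/dev/null; then
    echo "expansion succeeded (bad)"
else
    echo "set -u stopped the script cold"
fi

# ${VAR:?} aborts with a diagnostic if VAR is unset or empty:
if sh -c ': "${STEAMROOT:?STEAMROOT must be set}"' 2>/dev/null; then
    echo "check passed (bad)"
else
    echo "parameter check refused to continue"
fi
```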
Oh no, it's busted.
Infernal fortress of suffering
Do you know how to spot a ~~traumatised~~ experienced shell programmer? It's easy! A word of power exists which, upon being uttered, will cause immediate and severe psychic damage to all shell programmers in the vicinity.
To set the mood, start playing this background music, then come back to this article.
Okay, deep breaths... the word of power is "IFS". The IFS (internal field separator) is a value controlling how the shell handles word splitting. The semantics are insane; an example is required to truly envision the depravity contained within those three miserable letters.
Let's write a shell script that prints the size of all files in the current directory:
for FILE in *; do
du -h $FILE
done
...and here's what our directory looks like:
❯ touch wake me up
❯ ls
╭───┬──────┬──────┬──────┬──────────────╮
│ # │ name │ type │ size │ modified │
├───┼──────┼──────┼──────┼──────────────┤
│ 0 │ wake │ file │ 3 B │ a minute ago │
│ 1 │ me │ file │ 6 B │ a minute ago │
│ 2 │ up │ file │ 9 B │ a minute ago │
╰───┴──────┴──────┴──────┴──────────────╯
Readers versed in shell have likely already begun involuntarily clenching. It's too late now. All you can do is watch helplessly while all that you love burns.
Let's try running it!
❯ ../ifs.sh
4.0K wake
4.0K me
4.0K up
Hmm... it seems to work? For no particular reason, let's add a filename containing spaces:
❯ touch "can't wake up"
❯ ls
╭───┬───────────────┬──────┬───────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────┼──────┼───────┼──────────┤
│ 0 │ can't wake up │ file │ 666 B │ now │
╰───┴───────────────┴──────┴───────┴──────────╯
❯ ../ifs.sh
du: cannot access "can't": No such file or directory
du: cannot access 'wake': No such file or directory
du: cannot access 'up': No such file or directory
Oh no, it's busted.
The workaround is to always quote variables (and string literals):
for FILE in *; do
du -h "$FILE"
done
❯ ../ifs.sh
4.0K can't wake up
As for why, recall that IFS controls how word splitting is performed. Given the default IFS of `' \t\n'` (that's space, tab, newline), `can't wake up` is being split into three words: `can't`, `wake`, `up`. This happens transparently (often without the programmer realising), and causes one filename to be treated as if it was three.

Have you ever wondered why shell scripts often collapse like a bridge made of popsicle sticks when presented with a file containing spaces? Now you know!
If you're truly unhinged, you can leverage the IFS to perform rudimentary parsing. However, I would strongly advise against it. Take a minute to read through this meticulous Stack Overflow answer regarding string tokenization. Of the nine solutions presented, eight were incorrect (that's 88.8%!), all in incredibly subtle ways. Do you feel it now—the torment of being unable to accomplish basic programming tasks in shell scripts? Are you beginning to understand why this godforsaken language makes me so irrationally upset?!
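For the morbidly curious, here's the one semi-legitimate use, sketched with made-up data: scoping IFS to a single `read` to split a record on a custom delimiter:

```shell
# The IFS=: prefix applies only to this one `read` invocation,
# so the global IFS (and everything that depends on it) is untouched.
line="opeik:x:1000:1000::/home/opeik:/bin/sh"
IFS=: read -r user pass uid gid gecos home login_shell <<EOF
$line
EOF
echo "user=$user uid=$uid home=$home shell=$login_shell"
# prints: user=opeik uid=1000 home=/home/opeik shell=/bin/sh
```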
The only winning move is not to play
I have spent an embarrassing amount of my life writing and debugging POSIX shell scripts, yet it still regularly surprises me, as if it's mocking me for trying to comprehend it. At the time of writing, the Bash Pitfalls page listing "common mistakes made by bash users" contains 64 (sixty-four) entries!
POSIX shell is not my favourite language.[1]
Sometimes, foreshadowing can be relatively obvious.
Introducing Powershell
This brings us to Powershell. In concept, I quite like Powershell; it seeks to be a shell scripting language that doesn't make you reconsider your life choices. Powershell was released in 2006, meaning it's equipped with almost 20(!) years of hindsight from POSIX shell. Similar to POSIX shell, it provides little functionality by itself, but unlike POSIX shell, it conveniently exposes existing .NET APIs, removing much of the wheel re-invention that goes on in POSIX shell scripts.
One massive advantage Powershell has over POSIX shell is the shift away from unstructured byte streams. When piping one command to another, structured .NET objects are passed instead. I cannot stress enough how much of an improvement this is; being forced to use byte streams makes me feel like a neanderthal.
It is with regret, my dear reader, that I inform you Powershell is just as bad in a variety of new and exciting ways. The rest of the post details my first-time user experience with Powershell.
Running a program
Do you know what shell scripts frequently do? Run programs. They run other programs.
To run programs in Powershell, you can either:
- run it directly
- use the call operator (`&`)
- or, use `Start-Process`

Let's do it the "Powershell way" and use `Start-Process`:
Start-Process -Wait -FilePath 'my\amazing\custom\app.exe' -ArgumentList 'hello' 'C:\this has spaces\foo.txt' 'another parameter' 'wow so many parameters!' "-yak-type=$yak_type"
Wait, how do I break this line?
Hmm, this line is getting rather long. Now that I think about it, all the examples I see of Powershell code run off the page... I wonder why?
Anyway, let's make this more legible by splitting the command over multiple lines.
To split lines in Powershell, you can either:
- use a backtick:

      # The space before the backtick is required. Why?
      Get-ChildItem -Recurse `
          -Filter *.jpg

- use a pipe:

      Get-ChildItem |
          Select Name,Length

Oh, by the way, both methods are finicky and prone to breaking code.
Wait, why's my array being coerced?
I wasn't able to get either line-break method working, so let's sidestep this nonsense by defining the arguments beforehand.
# You can define arrays like this:
[string[]] $args = 'hello', 'C:\this has spaces\foo.txt', 'another parameter', 'wow so many parameters!', "-yak-type=$yak_type"
# Or, like this:
[string[]] $args = @(
# Using a trailing comma is an error. Why?
'hello'
'C:\this has spaces\foo.txt'
'another parameter'
'wow so many parameters!'
"-yak-type=$yak_type"
)
It's a little awkward, but I can deal with it.
After inspecting the `Start-Process` docs, we find that `-ArgumentList` expects a `string[]`. Luckily for us, `$args` is also a `string[]`! Given our variable and the parameter have the same type, any sane person would expect this to work:
Start-Process -ArgumentList $args # ...
...but it doesn't work.
Worse still, the command succeeds but behaves like you don't expect: it substitutes `$args` with the first element of the array. The conniption induced by this behaviour made me vividly recall my past POSIX shell trauma, which is generally something to be avoided when designing software.
How do we solve this? Well, if you'd read the docs properly you absolute buffoon, you'd have noticed the dedicated section explaining this footgun. Here's what you're supposed to do:
Start-Process -ArgumentList (,$args) # ...
> In this example, `$args` is wrapped in an array so that the entire array is passed to the script block as a single object.
Very intuitive, thank you Powershell!
Wait, where's my output?
If you run a program via `Start-Process`, you won't get any output in your terminal. Here's what you're supposed to do:[2]
$pinfo = New-Object System.Diagnostics.ProcessStartInfo
$pinfo.FileName = "ping.exe"
$pinfo.RedirectStandardError = $true
$pinfo.RedirectStandardOutput = $true
$pinfo.UseShellExecute = $false
$pinfo.Arguments = "localhost"
$p = New-Object System.Diagnostics.Process
$p.StartInfo = $pinfo
$p.Start() | Out-Null
$p.WaitForExit()
$stdout = $p.StandardOutput.ReadToEnd()
$stderr = $p.StandardError.ReadToEnd()
Write-Host "stdout: $stdout"
Write-Host "stderr: $stderr"
Write-Host "exit code: $($p.ExitCode)"
I have nothing to add, this should speak for itself. Next section!
Wait, why doesn't anything work?
The examples in the previous section involving `$args` don't work; I lied. You see, I was simultaneously suffering through the two aforementioned issues and a third, secret issue!
Here's the error:
10 | [string[]]$args = @(
| ~~~~~~~~~~~~~~~
| Cannot assign automatic variable 'args' with type 'System.Object[]'
If you think this is a good example of an error message, you need to stop settling for less in life before it's too late. You matter, and you deserve better than this.[3]
My sanity is rapidly deteriorating; what's the problem this time?!
Well, you see, `$args` is an "automatic variable" that Powershell defines, and it contains the arguments passed to the script or function, obviously!
Why is `$args` in the global namespace? Why am I allowed to mutate it? Why does it use type erasure? Why is this error message so bad?
Closing thoughts
Throughout this ordeal, I kept asking the same question: why?
Why was this made? Why was this made this way? What was happening that created the circumstances that led to this being made the way that it was made?[4]
In the future, I'd rather deal with the hassle of installing nushell on every CI runner than continue to subject myself to the depraved machinations of POSIX shell and Powershell. I encourage you to do the same; we all deserve better.
Powershell is not my favourite language.[1]