Shell is not my favourite language
Powershell wasted 10 hours of my life, so I was compelled to write this self-therapy session masquerading as a blog post.
Before you discount my experience as the inane ravings of a UNIX fanboy, please understand that you couldn't be further from the truth. I sincerely tried to be fair and gave Powershell as much benefit of the doubt as I could muster.
Disclaimer aside, let's begin the descent into madness.
Why use shell scripts?
Generally, shell scripts are used when a task needs to be automated, but it's not worth pulling out a "real" (see: time-consuming) programming language. Given this, it stands to reason that shell scripting languages should be quick and easy to both learn and write. If it's easier to write a program than it is to write a shell script, it's failed to serve its only purpose.
Foreshadowing is a narrative device in which suggestions or warnings about events to come are dropped or planted.
The veritable mess that is POSIX
POSIX shell is, like most things descending from UNIX, poorly designed and incredibly unpleasant to use. Granted, once you've developed a mental model of how it works (via abstract causality and pain), you eventually learn how to kludge together scripts relatively quickly. That is to say, after you've received enough third-degree burns from touching the proverbial hot-plate, you learn what is and is not a good idea.
Over time, you learn to work around (or simply avoid) its many warts, namely: error-prone string handling and its compulsion to violate the principle of least astonishment. POSIX shell is proof that "worse is better" is real and we can't have nice things. It's a glue used to create precarious and unholy software obelisks, the lingering shadow of undead relics 3 decades past their expiration date.
Did you know "bash" is short for "bourne again shell"? Personally, I would prefer if these shells stayed dead. Unfortunately, I'm lacking a weapon with sacred affinity, so I'll have to settle for asking nicely.
I want to get off Mister POSIX's Wild Ride
Let's set the tone for the rest of this article with a test: how does POSIX shell implement features such as constants (`true` and `false`) and comparing values?
Was your answer: running a magic executable?!
❯ which [
/bin/[
❯ which true
/usr/bin/true
You see, when you compare a value like this:
if [ -z "$foo" ]; then
# ┬
# ╰─── this accursed symbol haunts my nightmares
...the shell runs the executable literally called `[`.
As I alluded to before, `[` is actually an alias for `test`. On macOS, they are separate executables, which checks out according to POSIX:

> The `test` and `[` utilities may be implemented as a single linked utility which examines the basename of the zeroth command line argument to determine whether to behave as the `test` or `[` variant.
In any other context, this would be reviled as the affront to computing that it is, but since it was standardised [checks watch] 38 years ago it flies under the radar. I have a feeling the reception would have been different if `shell.js` tried to pull this.
Also, I lied: they're only sometimes executables! As per POSIX:

> An implementation may choose to make any utility a built-in.
...meaning in that case the utility is included in the shell, removing the need for said magic executables. Why? Why have this many failure modes?! Who thought this was a good idea? After wracking my brain, the only justification I can muster is accommodating systems with so little storage that shipping fewer separate executables would be convenient.
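You can poke at this yourself; a minimal sketch (the exact `type` output varies by shell and system):

```shell
# Most shells shadow the on-disk executables with built-ins,
# so `type` usually reports them as such:
type test
type '['

# Either way, `test`/`[` evaluate an expression and report the
# result through their exit code:
test -z "" && echo "test says the string is empty"
[ -z "" ] && echo "[ says the string is empty"
```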
Byte streams are the `Any` of UNIX
POSIX shell lets you combine multiple commands together to form a "pipeline". This is accomplished by using the pipe operator (`|`), which connects the stdout stream of one program into the stdin stream of another.
❯ echo 'filename' | xargs cat
howdy
- `echo` sends `filename\n` to stdout
- `xargs` receives `filename\n` from stdin due to the pipe
- `xargs` runs `cat filename`, sending `howdy\n` (the contents of `filename`) to stdout
- the shell forwards stdout to the terminal, so `howdy` appears on my screen
╭─ stdin ╭─ stdin ╭─ stdin
│ │ │
echo ─┬───▶─┴─ xargs ─┬───▶─┴─ shell ─┬───▶─┴─ terminal
│ │ │
stdout ─╯ stdout ─╯ stdout ─╯
In contrast, if I ran `cat` in a shell:
❯ cat
boop\n
boop
╭─ stdin ╭─ stdin ╭─ stdin ╭─ stdin
│ │ │ │
keyboard ─┬───▶─┴─ shell ─┬───▶─┴─ cat ─┬───▶─┴─ shell ─┬───▶─┴─ terminal
│ │ │ │
stdout ─╯ stdout ─╯ stdout ─╯ stdout ─╯
- the keyboard sends `boop\n` to `cat` via stdin
- `cat` forwards stdin to stdout
- the shell forwards stdout to the terminal, so `boop` appears on my screen twice
This is incredibly powerful... in theory. In practice it's a wilderness, because streams are unstructured. You're not sending text through streams, you're sending bytes, since text would imply encoding.
Do I mean encoding:
- in a textual sense: are these bytes encoded in Windows-1250 or Shift JIS?
- in a structural sense: are these bytes JSON or MessagePack?
Both! In practice, lacking a standard for structuring data means relying on guessing character encodings and ad-hoc conventions. Please enjoy this colourful account from an MPV maintainer regarding how POSIX handles locales. I have a feeling POSIX is not their favourite standard.
If you hear "plain-text" used unironically, please scold the perpetrator on my behalf. Text is (bytes, encoding); omitting the encoding means you just have bytes. Thank you.
Here's another example:
❯ ls
╭───┬───────────────┬──────┬───────┬─────────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────┼──────┼───────┼─────────────┤
│ 0 │ can't wake up │ file │ 666 B │ 2 hours ago │
╰───┴───────────────┴──────┴───────┴─────────────╯
❯ find . -type f | xargs ls -al
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Oh no, it's busted.
Our pipeline breaks because the filename contains a quote. This is fine, since filenames obviously never contain quotes.
Here's one solution: tell both `find` and `xargs` to use an ASCII NUL byte as the delimiter:
❯ find . -type f -print0 | xargs -0 ls -al
-rw-r--r-- 1 opeik staff 666 Jul 17 21:48 "./can't wake up"
This is one of the nicer solutions. I hope whatever you're piping `find` into supports it, or you'll have to add yet more slop to handle it!
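Alternatively, and this is only a sketch, you can skip the pipe entirely and let `find` spawn the command itself with `-exec`; the `{} +` form batches filenames like `xargs` does, and since they're passed as real argv entries, quotes and spaces never travel through a byte stream at all:

```shell
# Set up a scratch directory with an awkward filename (illustrative).
mkdir -p nul-demo && cd nul-demo
touch "can't wake up"

# -exec hands filenames straight to ls as argv entries; no
# delimiter, no quoting, nothing to get busted.
find . -type f -exec ls -al {} +
```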
Sisyphean-oriented programming
Don't get me started about error handling. Praying nothing goes wrong is a better use of your time than trying to wrangle any semblance of reliability from this cacophony of undercooked and mismatched ideas. Unhandled errors don't stop the script. Pipelines mask errors by always returning the exit code of the last command. Subshells are utterly broken. Evaluating undefined variables results in an empty string.
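A quick sketch of the first two failure modes; note that `pipefail` is a bash/zsh extension, not POSIX:

```shell
# An unhandled error doesn't stop the script:
sh -c 'false
echo "still running after a failed command"'

# A pipeline only reports the exit code of its last command,
# so the failure of `false` is silently masked:
sh -c 'false | true; echo "pipeline exit code: $?"'
# prints "pipeline exit code: 0"

# bash's pipefail makes the whole pipeline fail instead:
bash -c 'set -o pipefail; false | true; echo "with pipefail: $?"'
# prints "with pipefail: 1"
```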
In 2015, Steam for Linux had a bug that nuked your root drive under certain conditions. How did this happen, you ask? It was a shell script:
# figure out the absolute path to the script being run a bit
# non-obvious, the ${0%/*} pulls the path out of $0, cd's into the
# specified directory, then uses $PWD to figure out where that
# directory lives - and all this in a subshell, so we don't affect
# $PWD
STEAMROOT="$(cd "${0%/*}" && echo $PWD)"
# Scary!
rm -rf "$STEAMROOT/"*
As previously mentioned, undefined variables expand to the empty string, and in certain situations `$STEAMROOT` happens to be undefined, so this happens:
rm -rf /*
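The usual defence, sketched below: `set -u` makes expanding an unset variable a fatal error, and the `${VAR:?message}` form aborts with a diagnostic before anything destructive runs (exact exit codes vary by shell):

```shell
# Make sure the variable really is unset for the demo.
unset STEAMROOT

# Under `set -u`, expanding an unset variable is fatal, so the
# dangerous command never runs:
if sh -c 'set -u; echo "would delete: $STEAMROOT/"' 2>/dev/null; then
    echo "expansion succeeded (bad)"
else
    echo "set -u stopped the script cold"
fi

# ${VAR:?} aborts with a diagnostic if VAR is unset or empty:
if sh -c ': "${STEAMROOT:?STEAMROOT must be set}"' 2>/dev/null; then
    echo "check passed (bad)"
else
    echo "parameter check refused to continue"
fi
```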
Oh no, it's busted.
Infernal fortress of suffering
Do you know how to spot a ~~traumatised~~ experienced shell programmer? It's easy! A word of power exists which, upon being uttered, will cause immediate and severe psychic damage to all shell programmers in the vicinity.
To set the mood, start playing this background music, then come back to this article.
Okay, deep breaths... the word of power is "IFS". The IFS (internal field separator) is a value controlling how the shell handles word splitting. The semantics are insane; an example is required to truly envision the depravity contained within those three miserable letters.
Let's write a shell script that prints the size of all files in the current directory:
for FILE in *; do
du -h $FILE
done
...and here's what our directory looks like:
❯ touch wake me up
❯ ls
╭───┬──────┬──────┬──────┬──────────────╮
│ # │ name │ type │ size │ modified │
├───┼──────┼──────┼──────┼──────────────┤
│ 0 │ wake │ file │ 3 B │ a minute ago │
│ 1 │ me │ file │ 6 B │ a minute ago │
│ 2 │ up │ file │ 9 B │ a minute ago │
╰───┴──────┴──────┴──────┴──────────────╯
Readers versed in shell have likely already begun involuntarily clenching. It's too late now. All you can do is watch helplessly while all that you love burns.
Let's try running it!
❯ ../ifs.sh
4.0K wake
4.0K me
4.0K up
Hmm... it seems to work? For no particular reason, let's add a filename containing spaces:
❯ touch "can't wake up"
❯ ls
╭───┬───────────────┬──────┬───────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────┼──────┼───────┼──────────┤
│ 0 │ can't wake up │ file │ 666 B │ now │
╰───┴───────────────┴──────┴───────┴──────────╯
❯ ../ifs.sh
du: cannot access "can't": No such file or directory
du: cannot access 'wake': No such file or directory
du: cannot access 'up': No such file or directory
Oh no, it's busted.
The workaround is to always quote variables (and string literals):
for FILE in *; do
du -h "$FILE"
done
❯ ../ifs.sh
4.0K can't wake up
As for why, recall that IFS controls how word splitting is performed. Given the default IFS of `' \t\n'` (that's space, tab, newline), `can't wake up` is being split into three words: `can't`, `wake`, `up`. This happens transparently (often without the programmer realising), and causes one filename to be treated as if it was three.

Have you ever wondered why shell scripts often collapse like a bridge made of popsicle sticks when presented with a file containing spaces? Now you know!
If you're truly unhinged, you can leverage the IFS to perform rudimentary parsing. However, I would strongly advise against it. Take a minute to read through this meticulous Stack Overflow answer regarding string tokenization. Of the nine solutions presented, eight were incorrect (that's 88.8%!), all in incredibly subtle ways. Do you feel it now—the torment of being unable to accomplish basic programming tasks in shell scripts? Are you beginning to understand why this godforsaken language makes me so irrationally upset?!
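For the morbidly curious, here's the one semi-legitimate use, sketched with made-up data: scoping IFS to a single `read` to split a record on a custom delimiter:

```shell
# The IFS=: prefix applies only to this one `read` invocation,
# so the global IFS (and everything that depends on it) is untouched.
line="opeik:x:1000:1000::/home/opeik:/bin/sh"
IFS=: read -r user pass uid gid gecos home login_shell <<EOF
$line
EOF
echo "user=$user uid=$uid home=$home shell=$login_shell"
# prints: user=opeik uid=1000 home=/home/opeik shell=/bin/sh
```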
The only winning move is not to play
I have spent an embarrassing amount of my life writing and debugging POSIX shell scripts, yet it still regularly surprises me, as if it's mocking me for trying to comprehend it. At the time of writing, the Bash Pitfalls page listing "common mistakes made by bash users" contains 64 (sixty-four) entries!
POSIX shell is not my favourite language.[1]
Sometimes, foreshadowing can be relatively obvious.
Introducing Powershell
This brings us to Powershell. In concept, I quite like Powershell; it seeks to be a shell scripting language that doesn't make you reconsider your life choices. Powershell was released in 2006, meaning it's equipped with almost 20(!) years of hindsight from POSIX shell. Similar to POSIX shell, it provides little functionality by itself, but unlike POSIX shell, it conveniently exposes existing .NET APIs, removing much of the wheel re-invention that goes on in POSIX shell scripts.
One massive advantage Powershell has over POSIX shell is the shift away from unstructured byte streams. When piping one command to another, structured .NET objects are passed instead. I cannot stress enough how much of an improvement this is; being forced to use byte streams makes me feel like a neanderthal.
It is with regret, my dear reader, that I inform you Powershell is just as bad in a variety of new and exciting ways. The rest of the post details my first-time user experience with Powershell.
Running a program
Do you know what shell scripts frequently do? Run programs. They run other programs.
To run programs in Powershell, you can either:
- run it directly
- use the call operator (`&`)
- or, use `Start-Process`

Let's do it the "Powershell way" and use `Start-Process`:
Start-Process -Wait -FilePath 'my\amazing\custom\app.exe' -ArgumentList 'hello' 'C:\this has spaces\foo.txt' 'another parameter' 'wow so many parameters!' "-yak-type=$yak_type"
Wait, how do I break this line?
Hmm, this line is getting rather long. Now that I think about it, all the examples I see of Powershell code run off the page... I wonder why?
Anyway, let's make this more legible by splitting the command over multiple lines.
To split lines in Powershell, you can either:
- use a backtick:

      # The space before the backtick is required. Why?
      Get-ChildItem -Recurse `
          -Filter *.jpg

- use a pipe:

      Get-ChildItem |
          Select Name,Length

Oh, by the way, both methods are finicky and prone to breaking code.
Wait, why's my array being coerced?
I wasn't able to get either line-break method working, so let's sidestep this nonsense by defining the arguments beforehand.
# You can define arrays like this:
[string[]] $args = 'hello', 'C:\this has spaces\foo.txt', 'another parameter', 'wow so many parameters!', "-yak-type=$yak_type"
# Or, like this:
[string[]] $args = @(
# Using a trailing comma is an error. Why?
'hello'
'C:\this has spaces\foo.txt'
'another parameter'
'wow so many parameters!'
"-yak-type=$yak_type"
)
It's a little awkward, but I can deal with it.
After inspecting the `Start-Process` docs, we find that `-ArgumentList` expects a `string[]`. Luckily for us, `$args` is also a `string[]`! Given our variable and the parameter have the same type, any sane person would expect this to work:
Start-Process -ArgumentList $args # ...
...but it doesn't work.
Worse still, the command succeeds but behaves like you don't expect: it substitutes `$args` with the first element of the array. The conniption induced by this behaviour made me vividly recall my past POSIX shell trauma, which is generally something to be avoided when designing software.
How do we solve this? Well, if you'd read the docs properly you absolute buffoon, you'd have noticed the dedicated section explaining this footgun. Here's what you're supposed to do:
Start-Process -ArgumentList (,$args) # ...
> In this example, `$args` is wrapped in an array so that the entire array is passed to the script block as a single object.
Very intuitive, thank you Powershell!
Wait, where's my output?
If you run a program via `Start-Process`, you won't get any output in your terminal. Here's what you're supposed to do:[2]
$pinfo = New-Object System.Diagnostics.ProcessStartInfo
$pinfo.FileName = "ping.exe"
$pinfo.RedirectStandardError = $true
$pinfo.RedirectStandardOutput = $true
$pinfo.UseShellExecute = $false
$pinfo.Arguments = "localhost"
$p = New-Object System.Diagnostics.Process
$p.StartInfo = $pinfo
$p.Start() | Out-Null
$p.WaitForExit()
$stdout = $p.StandardOutput.ReadToEnd()
$stderr = $p.StandardError.ReadToEnd()
Write-Host "stdout: $stdout"
Write-Host "stderr: $stderr"
Write-Host "exit code: $($p.ExitCode)"
I have nothing to add, this should speak for itself. Next section!
Wait, why doesn't anything work?
The examples in the previous section involving `$args` don't work; I lied. You see, I was simultaneously suffering through the two aforementioned issues and a third, secret issue!
Here's the error:
10 | [string[]]$args = @(
| ~~~~~~~~~~~~~~~
| Cannot assign automatic variable 'args' with type 'System.Object[]'
If you think this is a good example of an error message, you need to stop settling for less in life before it's too late. You matter, and you deserve better than this.[3]
My sanity is rapidly deteriorating; what's the problem this time?!
Well, you see, `$args` is an "automatic variable" that Powershell defines, and it contains the arguments passed to the script or function, obviously!
Why is `$args` in the global namespace? Why am I allowed to mutate it? Why does it use type erasure? Why is this error message so bad?
Closing thoughts
Throughout this ordeal, I kept asking the same question: why?
Why was this made? Why was this made this way? What was happening that created the circumstances that led to this being made the way that it was made?[4]
In the future, I'd rather deal with the hassle of installing nushell on every CI runner than continue to subject myself to the depraved machinations of POSIX shell and Powershell. I encourage you to do the same; we all deserve better.
Powershell is not my favourite language.[1]