Hints for Writing Unix Tools (2014)

matheusmoreira · on April 2, 2022

Standard error stream is such an unfortunate name. It's not just for errors and diagnostics. Any non-output data should be sent there, especially messages the user is supposed to see. It should have been called standard user stream.

jolmg · on April 2, 2022

Both outputs are for the user, even if the user redirects them. Standard user stream doesn't seem like a differentiating name.

Standard error stream makes more sense when you consider the Unix philosophy[1]: "Don't clutter output with extraneous information." It's not just about stdout. The more you output, the more you obscure the important data, and the more likely the user is just going to ignore it. Ideally, there should only be regular output and errors. The names are a good guideline for that.

When you're talking about stuff like progress indicators, you're already breaking from the mold, just like TUIs also do. You have little choice but to break Unix conventions and guidelines then. TUIs like vim don't output errors to stderr, you know. They include them in stdout.

This doesn't mean that stderr isn't a good name. It fits very well for regular Unix utilities that stick to the guidelines of the Unix philosophy.

[1] https://en.wikipedia.org/wiki/Unix_philosophy#Origin

speed_spread · on April 2, 2022

How about stdlog?

gjm11 · on April 2, 2022

The author gives an example where running a benchmark tool and identifying the output for the "fizzbuzz" benchmark is

    ./runbenchmarks | grep '^fizzbuzz'

with his preferred style of output and

    ./runbenchmarks | awk '/^Benchmark:/ { bench = $2}  bench=="fizzbuzz"'

with another. That's fair enough, but what he doesn't mention is that running the tool and interpreting its output is

    ./runbenchmarks

with the second style of output and

    ./runbenchmarks
    ./runbenchmarks --help
    # damn, it just lists the command-line options
    vim benchmarks.c
    # look through source code to find what the random numbers after the program name mean

with the first. The culprit here is the fact that his preferred style not only puts each benchmark's output on one line but also omits the "Time:" and "Alloc:" and "ns/op" and "bytes/op" which make the numbers generated actually mean something to a human being.

I think the correct answer here may be to have a command-line flag selecting between two kinds of output, one intended for humans to read and one intended for programs to parse. Or maybe for the output to look like

    fizzbuzz: 10 ns/op, 40 bytes/op

or

    fizzbuzz 10 ns/op 40 bytes/op

either of which is pretty easy to parse for both humans and computers. Or even

    fizzbuzz time 10 ns/op
    fizzbuzz alloc 40 bytes/op

which lets you see all the results for the fizzbuzz benchmark with the same grep as above, and all benchmarks' time results with another almost-as-simple grep, at the cost of a little redundancy in the output.

Higher-level message: when you have two competing requirements (make things readable for humans, and make things parseable for programs), before just picking one as The One That Matters consider whether maybe there's a way to get both.
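
As a rough sketch of how the last format above plays with standard tools (the sample data and the `sample_output` function are made up for illustration and are not taken from the article's benchmark tool):

```shell
# Hypothetical benchmark output in the "name metric value unit" style.
sample_output() {
    printf '%s\n' \
        'fizzbuzz time 10 ns/op' \
        'fizzbuzz alloc 40 bytes/op' \
        'primes time 25 ns/op' \
        'primes alloc 0 bytes/op'
}

# All results for one benchmark: the same grep as before.
sample_output | grep '^fizzbuzz '

# Time results for every benchmark, reduced to "name value".
sample_output | awk '$2 == "time" { print $1, $3 }'
```

Each line still reads on its own, which is what the redundancy buys.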

mplanchard · on April 2, 2022

Yeah this was also my complaint. Personally I’d say just output the header. That’s what `tail -n +2` or whatever is for.

Another option is to detect when the output is being piped to another program and skip the header in those cases, similar to how many programs handle color. Or print the header to stderr.

badsectoracula · on April 2, 2022

TBH IMO this sounds like a documentation problem, so, e.g.

    # damn, it just lists the command-line options
    vim benchmarks.c
    # look through source code to find what the random numbers after the program name mean

You could also look at the manpage or whatever documentation it has :-P.

kaapipo · on April 2, 2022

Probably best would be to print the column labels to stderr
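
A minimal sketch of that idea, assuming a made-up `runbenchmarks_demo` function and made-up numbers:

```shell
# Hypothetical benchmark runner: labels on stderr, data rows on stdout.
runbenchmarks_demo() {
    printf 'NAME TIME(ns/op) ALLOC(bytes/op)\n' >&2
    printf '%s\n' 'primes 25 0' 'fizzbuzz 10 40'
}

# On a terminal both streams are visible, so a human sees the labels.
# In a pipeline only stdout is passed on, so sort never sees the header.
runbenchmarks_demo | sort
```

The header still shows up on screen during the pipeline run, because stderr is not redirected there.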

devnull255 · on April 2, 2022

These are still good guidelines. As a tool developer myself, I would propose some additional guidelines that make a tool more accessible and useful.

Provide custom format options, such as --format-json or --format-<x>, to produce output in JSON or other popular formats.

Implement both short and long options (e.g., -f/--filename) consistent with other command line tools.

Implement a --verbose and/or --debug option to enable more detailed output when needed for troubleshooting.

Provide a --version option to display the tool's version and then exit.

Provide a --help option to display program usage and options and then exit.

Provide useful error messages that at minimum inform the user what went wrong when the program aborted.

As a corollary to the previous guideline, emit noisy error detail such as stack traces only when --verbose is used and an error is encountered.
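
For what it's worth, here is one way the guidelines above could look in a plain shell script. This is only a sketch: `mytool`, its version string and its usage text are placeholders, and only a few of the listed options are handled.

```shell
# Hypothetical tool "mytool"; the option names follow the list above,
# the version string and usage text are placeholders.
mytool() {
    verbose=0
    format=text
    while [ $# -gt 0 ]; do
        case $1 in
            -h|--help)
                echo 'usage: mytool [-v|--verbose] [--format-json] [FILE...]'
                return 0 ;;
            -V|--version)
                echo 'mytool 0.1'
                return 0 ;;
            -v|--verbose) verbose=1 ;;
            --format-*)   format=${1#--format-} ;;
            --)           shift; break ;;
            -*)
                # Say what went wrong, on stderr, and exit non-zero.
                echo "mytool: unknown option: $1" >&2
                return 2 ;;
            *)  break ;;
        esac
        shift
    done
    echo "format=$format verbose=$verbose files=$#"
}

mytool --verbose --format-json input.txt
```

A hand-written loop is used here because the POSIX `getopts` builtin only understands short options.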

grumbel · on April 2, 2022

> Output should be free from headers or other decoration.

One way to solve this nicely, that seems to be getting more common, is to use `isatty()` to check if the output is a terminal and if so print with decorations, otherwise leave them away.

`ls` for example will output unprintable characters, even just space, in quoted form on a terminal:

    $ touch 'foo bar'
    $ ls
    'foo bar'

But when redirected, it will output the raw value:

    $ ls | cat -
    foo bar
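
Shell scripts can make the same check with `test -t`, which asks whether a file descriptor is a terminal. A small sketch, with a made-up `list_files_demo` function and made-up rows:

```shell
# Hypothetical tool: decorate only when stdout (file descriptor 1) is a terminal.
list_files_demo() {
    if [ -t 1 ]; then
        printf 'NAME   SIZE\n'
    fi
    printf '%s\n' 'b.txt  20' 'a.txt  10'
}

list_files_demo          # header appears only if this runs on a terminal
list_files_demo | sort   # stdout is a pipe here, so no header reaches sort
```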

badsectoracula · on April 2, 2022

> One way to solve this nicely, that seems to be getting more common, is to use `isatty()` to check if the output is a terminal and if so print with decorations, otherwise leave them away.

This is not a good idea because it can lead to surprising the user (ls's behavior is actually bad from that perspective). For example you run a program

    $ foo
    ID Thing   What
    4  Cat     Mews
    2  Dog     Woofs
    5  Canary  Tweets

...then you run the result through sort and trying to avoid the header...

    $ foo | tail -n +2 | sort

...except instead of the expected result you get...

    2  Dog     Woofs
    5  Canary  Tweets

...because the program tried to be smart instead of consistent. This is also against the GNU guidelines as mentioned elsewhere.

grumbel · on April 2, 2022

The little surprise is worth the general improvement in usability (e.g. colors, progress bars, filenames you can copy&paste, terminal not getting corrupted by escape sequences, etc). It also makes it clear that the terminal output is for user interaction, so programs no longer have to be both UI and API at the same time, they can focus on one or the other, making both much better and cleaner as a result.

> ...then you run the result through sort and trying to avoid the header...

The much more common scenario would be doing `foo | sort` and then ending up with random header text in the sorted data. Few people will add a `tail` the first time they type that command or remember to do it every time they use it interactively. With `isatty()` it behaves as the user expects right from the start.

g0xA52A2A · on April 2, 2022

> `ls` for example will output unprintable characters, even just space, in quoted form on a terminal:

Pedantic quibble: GNU ls does this for special characters; in your example the space is very much printable. Specifically, this became the default with coreutils 8.25: https://www.gnu.org/software/coreutils/quotes.html

benibela · on April 2, 2022

I noticed that last week and was quite surprised by it.

I was writing a script to get the permissions of all files. I wrote something like ls -l | grep -oE '^[^ ]+' and ran it in my home directory for testing. And then I was surprised that the output was wrong. Turned out I had files with \n in their name there and ls was printing them on two lines which confused the grep. (I still used that script, since I did not have any \n files on the real system)

I was actually building an exam for a course involving shell scripting. A common question was: do something with all the files in the current directory, like grep them or delete them. The lecture notes said to use * for all those files, but then I realized rm * would not work in all possible cases. I spent about an hour finding a hopefully correct solution. However, the professor said the students would never figure it out in an exam, and I should just put * as the model solution. The shell is extremely brittle.

grumbel · on April 3, 2022

When dealing with filenames one has to get used to always using '\0' separated output instead of newline separated output, as filenames in Linux can contain everything except '\0' and '/'. Luckily most tools are prepared for this and have options to either output '\0' separators or accept them as input.

Dealing with all the files in the current directory would look something like this (for demonstration, can be made shorter by using '-name' or '-regex' option from 'find'):

   find . -maxdepth 1  -type f -print0 | grep -z needle | xargs -n 1 -0 echo

Looping over '\0' separated output is possible as well with this:

   find . -print0 | while IFS= read -r -d '' filename; do echo "$filename"; done

It stops being brittle when used correctly, but can take a bit of getting used to and can get a little verbose.

benibela · on April 3, 2022

The one I found to delete all files was rm -- .[!.]* ..?* * 2>/dev/null
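
The same three patterns can be tried out without deleting anything. The sketch below only counts entries; `count_entries` and the file names are invented for the demo, and it assumes the shell's default globbing behavior, where an unmatched pattern is left as literal text.

```shell
# Count every entry in a directory, hidden ones included, using only globs.
# The body runs in a subshell (note the parentheses) so the cd does not leak out.
count_entries() (
    cd "$1" || exit 1
    count=0
    for entry in .[!.]* ..?* *; do
        # An unmatched pattern stays literal; skip anything that does not exist.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        count=$((count + 1))
    done
    echo "$count"
)

# Demo in a throwaway directory with deliberately awkward names.
scratch=$(mktemp -d)
touch "$scratch/plain" "$scratch/with space" "$scratch/.hidden" \
      "$scratch/$(printf 'new\nline')"
count_entries "$scratch"   # prints 4
rm -rf "$scratch"
```

Because the names come from glob expansion and are always quoted, spaces and newlines in them never get split.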

ElectricalUnion · on April 2, 2022

Well, if you're parsing ls then you're in a world of hurt no matter what you do anyways: https://mywiki.wooledge.org/ParsingLs

We should stop using ls as part of an example pipe; it's usually a very poor example.

mkdirp · on April 2, 2022

On a related note, there was an article/website that talked about how to design the ux of a cli tool properly. E.g. how to design the arguments among other things.

I've been struggling to find it again. Does anyone remember what the article/site was called?

asicsp · on April 2, 2022

Probably https://clig.dev/ (discussed here: https://news.ycombinator.com/item?id=25304257)

mkdirp · on April 2, 2022

Aah, thank you so much! Very helpful!

teddyh · on April 2, 2022

Off the top of my bookmarks, I would suggest these:

• https://www.gnu.org/prep/standards/standards.html#Command_00...

• http://www.catb.org/~esr/writings/taoup/html/

• https://www.cons.org/cracauer/sigint.html

jmclnx · on April 2, 2022

One thing I like to do and have been doing for a very long time, is have an optional flag '-e file'.

This will redirect stderr to a file. I use it because:

1. I used tcsh and, on older systems, csh

2. works great on DOS (like FreeDOS) and Microsoft Windows

I wish more people (i.e., large companies) followed this guide, but in this day and age of mega-builds, I am afraid those days are over.

Cockbrand · on April 2, 2022

Out of curiosity: how's this different from the following?

  command 2>outfile.txt

Or is the intention just to be clearer/more obvious?

jmclnx · on April 2, 2022

tcsh/csh cannot redirect stderr on its own; there is no '2>'. There is a workaround, but it is hard for me to remember.

legalcorrection · on April 2, 2022

Between every program being modified to work around your old shell or you switching to a modern shell like everyone else, which one do you think sounds more reasonable?

jmclnx · on April 3, 2022

UN*X has many shells for a reason; if you do not understand that, there is not much I can say to you.

Also I never said people should change their programs, I do that on the ones I write for my purposes.

jmclnx · on April 5, 2022

I cannot edit, but I can see how you could have thought I was advocating '-e', I was not :)

I was advocating the "Hints" article

gumby · on April 2, 2022

You can do this straight from the shell without needing every program to be modified.

I understand there are less powerful shells, as you list, but if for some reason you can’t use a more modern shell, aren’t you even less likely to be able to install updated apps?

NateEag · on April 2, 2022

As one who has worked on ancient servers:

Yes, installing updated third-party tools is very unlikely to happen.

Installing updates to your own tools is likely a regular occurrence.

So, your development style evolves to fit those constraints.

3836293648 · on April 2, 2022

Looks over at the Nix tools that just added a bunch of success messages because users found silence confusing

forty · on April 2, 2022

Maybe a good middle ground would be to use isatty to check if you should display something or not?

teddyh · on April 2, 2022

The GNU Coding Standards recommends not doing that:

“Likewise, please don’t make the behavior of a command-line program depend on the type of output device it gets as standard output or standard input. Device independence is an important principle of the system’s design; do not compromise it merely to save someone from typing an option now and then. (Variation in error message syntax when using a terminal is ok, because that is a side issue that people do not depend on.)

If you think one behavior is most useful when the output is to a terminal, and another is most useful when the output is a file or a pipe, then it is usually best to make the default behavior the one that is useful with output to a terminal, and have an option for the other behavior. You can also build two different versions of the program with different names.

There is an exception for programs whose output in certain cases is binary data. Sending such output to a terminal is useless and can cause trouble. If such a program normally sends its output to stdout, it should detect, in these cases, when the output is a terminal and give an error message instead. The -f option should override this exception, thus permitting the output to go to the terminal.

Compatibility requires certain programs to depend on the type of output device. It would be disastrous if ls or sh did not do so in the way all users expect. In some of these cases, we supplement the program with a preferred alternate version that does not depend on the output device type. For example, we provide a dir program much like ls except that its default output format is always multi-column format.”

— https://www.gnu.org/prep/standards/standards.html#User-Inter...

ElevenLathe · on April 2, 2022

Interesting that a major GNU util (ls) does exactly the opposite and prints differently (multiple entries on a line vs one line per entry) in terminal vs a pipe.

teddyh · on April 2, 2022

The last paragraph I quoted explicitly mentions ls as doing that for compatibility reasons; i.e. Unix did it that way, and GNU should be compatible.

ghostpepper · on April 2, 2022

> You can also build two different versions of the program with different names.

Do any programs actually do this? Sounds like the biggest headache of all

ls_terminal vs ls_pipeable | grep ...

enriquto · on April 2, 2022

No. That would be confusing.

Silent success is a basic tenet of unix and must not be relinquished.

mplanchard · on April 2, 2022

At least they go to stderr I think

switch007 · on April 2, 2022

mail admins must be delighted about that (cron…)

PeterWhittaker · on April 2, 2022

I wouldn’t say that this is terrible advice, just naive and limited. The only thing I almost completely agree with is to allow your program to be a filter; my disagreement comes from the fact that all pipelines need a starting point. ls is a good example. The one thing I agree with completely is the return code, which is especially useful when combined with a -q option (cf grep, below).

Headers are useful for humans. Don’t want them? Have -H/+H options, with the default based on whether you will be outputting most often to a human or a filter.

Space-separated output makes sense IFF fields will NOT contain spaces. Not sure? Have a -d option, like cut does, to allow the user to specify the separator.

Verbosity can be wonderful and wonderfully bad. Consider having -v, possibly multiple -v’s, like ssh, and -q, like grep, to control the exact level.

In other words, don’t take simplistic advice, certainly not this advice. Examine the behaviour of flexible commands like grep and cut and tr and determine for yourself which options are best suited to your program.
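
To make the switches above concrete, here is a small sketch using the POSIX `getopts` builtin. The `report` function, its single data row and its usage text are invented; only the option letters come from the discussion above, and +H is left out.

```shell
# Hypothetical report tool with the kinds of switches described above:
#   -H       suppress the header line
#   -d SEP   field separator (default: one space)
#   -v       more detail, may be repeated
#   -q       no output at all, exit status only
report() {
    header=1 sep=' ' verbosity=0 quiet=0
    OPTIND=1   # reset so the function can be called more than once
    while getopts 'Hd:vq' opt; do
        case $opt in
            H) header=0 ;;
            d) sep=$OPTARG ;;
            v) verbosity=$((verbosity + 1)) ;;
            q) quiet=1 ;;
            *) echo 'usage: report [-H] [-d SEP] [-v]... [-q]' >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    if [ "$quiet" -eq 1 ]; then return 0; fi
    if [ "$header" -eq 1 ]; then printf 'name%stime\n' "$sep"; fi
    # The first field contains a space, which is why -d matters.
    printf 'fizz buzz%s10\n' "$sep"
    if [ "$verbosity" -ge 1 ]; then
        printf '# verbosity level %s\n' "$verbosity" >&2
    fi
    return 0
}

report -d ','      # header plus one comma-separated row
report -H -d '|'   # row only, pipe-separated
```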

Re interactivity: if a program is used infrequently, interactivity can be good. No argv[1]? Prompt the user.

My build scripts are completely automatic, but they are run frequently (sometimes multiple times a day) by many people. Over time, we’ve gotten a pretty good handle on what we need them to do.

My addlabel and makeiso scripts, OTOH, prompt with reasonable defaults because they are run far less often and use commands that are less familiar.

Consider first the needs of the users, and do not assume they know as much as you. Or as little.

shcheklein · on April 2, 2022

For those who are interested in this topic, there are a few other good summaries / guidelines that we found useful:

- https://clig.dev/

- https://primer.style/cli/

efrecon · on April 2, 2022

This is really spot on! Thanks for summarising it all. Especially, I know that jq is almost ubiquitous, until it's not (for example: not in the busybox, nor alpine default docker images). So please: avoid JSON, or at least provide an option to choose the output format and support an alternative to JSON, YAML, whatever.

ElectricalUnion · on April 3, 2022

I like JSON in that it's structurally reasonably stable; even under hostile content, it will stay the same. That can't be said of other formats, which might get confused when newlines, spaces, nulls and other "hostile" separators are used.

That's specifically one of the reasons why you can never reliably parse ls output in an environment that is not guaranteed to be pristine and safe.

ducktective · on April 2, 2022

Can someone mention an actual system that supports only busybox and for example `jq` can't be installed on it?

Embedded systems? What exact models?