Hints for writing Unix tools

hoggle · on Oct 21, 2014

“One thing well” misses the point: it should be “One thing well AND COMPOSES WELL”

If the implementation isn't respecting The Rule of Composition it's actually not adhering to the Unix philosophy in the first place. The tweet is referring to one of Doug McIlroy's (one of the Unix founders, inventor of the Unix pipe) famous quotes:

"This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface."

Pure beauty, but it's almost too concise a definition if you haven't experienced the culture of Unix (many years of usage / reading code / writing code / communication with other followers). ESR's exhaustive list of Unix rules in plain English might be a better start for the uninitiated (among which one will find the aforementioned Rule of Composition).

For all those seeking enlightenment, go forth and read the The Art of Unix Programming:

https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming

17 Unix Rules:

https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E...

jzwinck · on Oct 21, 2014

Here's one more tip: did you ever notice that "ls" displays multiple columns, but "ls | cat" prints only one filename per line? Or how "ps -f" truncates long lines instead of wrapping, while "ps -f | cat" lets the long lines live?

You can do it too, and if you're serious about writing Unix-style filter programs, you will someday need to. How do you know which format to write? Call "isatty(STDOUT_FILENO)" in C or C++, "sys.stdout.isatty()" in Python, etc. This returns true if stdout is a terminal, in which case you can provide pretty output for humans and machine-readable output for programs, automatically.

dap · on Oct 21, 2014

IMO, this is an anti-pattern. It's violates the principle of least surprise. (How come I see X when I run the command, but I can't grep for X in its output? How come it works when I run it from my interactive shell, but it's broken when I run it from a script? And things like that.)

teddyh · on Oct 21, 2014

Indeed, the GNU Coding Standards explicitly argues against doing that:

“please don’t make the behavior of a command-line program depend on the type of output device it gets as standard output or standard input.”¹

① https://www.gnu.org/prep/standards/standards.html#User-Inter...

tieTYT · on Oct 21, 2014

Interesting. Doesn't git (a pretty new tool) violate this? I think running log from the terminal pauses per page of output but if you pipe it to something it pipes all the content.

dkuntz2 · on Oct 21, 2014

I think it's just piping it to another program. It's still giving you the same output, but sending you to a program meant for humans to be able to read text in a console, instead of just printing it all out.

That said, I'm not entirely sure which git pipes to.

burke · on Oct 21, 2014

I think it depends what sort of things you use it for. I often use it to switch on or off ANSI colourization, which doesn't really violate the principle of least surprise.

When used sparingly and thoughtfully, I've never personally had an issue with it.

chriswarbo · on Oct 21, 2014

You may not have issue with the sort of things you use it for, but others might.

For example, I run shells in Emacs and have had to tweak loads of shell scripts written by colleagues to fix their poorly-implemented colourisation. It's useful to know when a test has failed; it's not so useful to have the whole terminal set to white text on a pale pink background.

One day I couldn't SSH into our servers from Emacs. It turned out somebody had edited .bashrc for the admin user to make the bash prompt blue. Emacs' TRAMP process was looking for a prompt ending in "$" or "#", not "$\[\033[0m\]", so it didn't realise the connections were successful.

There are two ways of handling this: we can blame the source of the bug (the person adding the colours incorrectly, or the assumption-loaded TRAMP regex), but there will always be more bugs in situations we'd never think of. Alternatively, we can avoid being 'too clever', and instead aim for consistency and least surprise.

gknoy · on Oct 21, 2014

As much as I love Emacs, that sounds like it's a bug to assume that one's prompt will follow a convention (ends-in-$). The convention is useful and good, but it seems strange to blame someone for breaking your tool's expectations when they had made a valid prompt.

philh · on Oct 21, 2014

Are you suggesting that colored prompts violate the rules of consistency and least surprise?

(Actually, if you are suggesting that, I'm not going to disagree. But I am going to say that if so, those rules don't apply in the case of colored prompts, because colored prompts are useful.)

chriswarbo · on Oct 21, 2014

I suppose I'm suggesting that, aside from personal scripts, we shouldn't assume too much about who our users are and what they're trying to do. The principle of least power tells us to use the dumbest format that will work, eg. plain text.

Anything we add on top of that, eg. ANSI colour codes, will be useful to some but harmful to others. The tricky part is working out which of those categories the current user is in.

philh · on Oct 21, 2014

So is your proposed solution not to have colored prompts? (I vehemently disagree.) Or not to put them in a .bashrc? (I still disagree, but only strongly.) Or something else?

prakashk · on Oct 21, 2014

The main problem here is that a change that should have been done as part of an individual's customization was made to a shared configuration file.

wtbob · on Oct 21, 2014

Or, y'know, have coloured prompts but place the escape characters such that they don't mess with prompt-detection regexps.

philh · on Oct 21, 2014

To be precise, what you're suggesting is that we have prompts which are allowed to be colored except for the $/# at the end, because you can't color those without following them by escape characters. And that prompts must have a $/# at the end.

I don't consider that an acceptable solution.

dap · on Oct 21, 2014

Good point, and it makes me even more convinced that all rules of thumb have important exceptions. In fact, I've also used tools that use ANSI colorization, and disable that when not talking to a tty, and wished that they wouldn't because I was piping them to "less -R" :) (At least that's a graceful failure mode, and it's pretty clear what probably happened even if one doesn't exactly understand the details.)

burke · on Oct 21, 2014

I've always thought it would be nice to have a utility like cat that I could pipe commands to, which would trick them into thinking all their streams were attached to a tty, so you could do "uses_colours | ttycat | less -R".

I'm sure it's possible, but you'd have to acquire a new pty and decide what termios settings you want. It's a nontrivial hack, I think.

I'm actually kind of surprised it's not in moreutils[1].

[1]: https://joeyh.name/code/moreutils/

EDIT: Hmm, maybe it's not possible. I can't figure out exactly how to do it, anyway.

EDIT again: Apparently `script` does this on Linux: http://monosnap.com/image/Qlig4CHmQgV9pxvSmndVUgMTU88Adz

EDIT again: Expect's `unbuffer` also does it: http://expect.sourceforge.net/example/unbuffer.man.html#toc

JdeBP · on Oct 27, 2014

> I've always thought it would be nice to have a utility like cat that I could pipe commands to, which would trick them into thinking all their streams were attached to a tty,

Daniel J. Bernstein wrote a "pty" package back around 1991 that did this. Version 4 of the package was published in 1992 to comp.sources.unix (volume 25 issues 127 to 135). It's still locatable on the World Wide Web.

Bernstein later updated this, around 1999, with a "ptyget" package that was more modular and that had the session management commands moved out of the toolset to elsewhere. The command from that package to do exactly what you describe is "ptybandage". There is also "ptyrun". Paul Jarc still publishes a fixed version of ptyget (that attempts to deal with the operating-system-specific pseudo-terminal device ioctls in the original) at http://code.dogmap.org./ptyget/ .

As a bonus feature for people who use source code, there are similar "ptybandage" and "ptyrun" scripts, for which you will need Laurent Bercot's execline tool (http://skarnet.org./software/execline/), in the source archive for the nosh package at http://homepage.ntlworld.com./jonathan.deboynepollard/Softwa... . These make use of the terminal-management tools in the nosh toolset.

With both of these, you should be able to run "ptybandage uses_colours_if_tty | less -R"

dnlrn · on Oct 21, 2014

It’s not possible in the way you think, because the two programs don’t know about each other. All the pipe character does is put the stdout into the stdin of the other program.

So this feature must actually be present in the shell and maybe it is. I’m no expert but maybe zsh already offers something like this?

userbinator · on Oct 21, 2014

I agree, especially for the behaviour from the parent:

"ps -f" truncates long lines instead of wrapping, while "ps -f | cat" lets the long lines live

How people usually discover what these commands do is by running them interactively, and if that results in some output being hidden vs being run noninteractively, then they have little reason to believe that it could yield more output than what they're used to seeing. I think a certain number of "ps" users don't know it can display full paths and commands, if they've only ever used it interactively.

edwintorok · on Oct 21, 2014

On a quick look through ps's manpage I couldn't find anything about this. Am I missing something?

mzs · on Oct 21, 2014

edwintorok · on Oct 21, 2014

It doesn't say that it'll do that automatically when run through a pipe.

mzs · on Oct 21, 2014

Yep, and the result is I have needless sprinklings of "www" in my shell scripts cause of habit. Technically I don't need it the moment I pipe into grep, but oh well ;) Anyway I personally dislike the SysV-like 'I need to add stuff like -aef' to get all that on-the-screen/into-grep vs the BSD-like it knows about TTY but I can convince it otherwise into stuf like 'less -S' - personal taste I guess.

edwintorok · on Oct 22, 2014

Actually I just found this in the 'ps' manual, it looks like the output width is actually undefined! "If ps can not determine display width, as when output is redirected (piped) into a file or another command, the output width is undefined (it may be 80, unlimited, determined by the TERM variable, and so on)."

vog · on Oct 21, 2014

Ideed. I've seen people over and over stumbling over this weird behaviour.

It may have some merits, but as a general advice this is definitely an anti-pattern.

Another example is "curl", where "curl URL >outfile" is chatty on stderr, while "curl URL" is quiet on stderr. That's very annoying for scripting, you easily forget to set "-s" in your scripts due to that behaviour.

emmelaich · on Oct 21, 2014

And yet .. programs are not just for composition. They have to behave sensibly for people.

I love that 'git log' outputs in a pager. 'svn log' by comparison is nuts.

stsp · on Oct 21, 2014

I like the default pager in 'git log', too.

Some work has been done on the svn side: http://svn.apache.org/repos/asf/subversion/branches/automati...

However, it's not on trunk yet because it's hard to find good defaults. An automatic pager makes sense for some commands, but not all -- and in a meritocratic development model this kind of thing can cause an endless discussion... I suspect we'll eventually merge the feature in a disabled by default state and allow users to enable it on a per-command basis.

Git's hard-coding of options passed to the pager has problems, too: https://mail-archives.apache.org/mod_mbox/subversion-dev/201...

aidos · on Oct 21, 2014

That's an interesting case. Should it not just output to stdout and you have to pipe it to less? If you wanted it to always pipe to less you could just set up an alias for it.

pjc50 · on Oct 21, 2014

I've just realised that the best solution to this (which I've never seen) is for the shell or terminal emulator to capture long output into a pager for you once it exceeds a certain length.

dkuntz2 · on Oct 21, 2014

I think there may be some flags in pagers to do this for you, but I can't remember for sure.

kec · on Oct 21, 2014

The git diff and log commands are primarily intended for human consumption. I'm not the one you were relying to, but I think it makes sense that git defaults to make that consumption easier, even if it isn't strictly "Unix". (You can also disable this behavior and get a more Unix-y interface by default via gitconfig or on a case by case basis with --no-pager).

aidos · on Oct 21, 2014

For sure, and in fact, I very happy with the current behaviour. I guess the pager is a case where it doesn't matter much if it's included or not since I guess it doesn't get in the way if you pipe to something else. Does it consume extra resources?

    git log | wc

vs

    git log --no-pager | wc

I'm sure it's neither here nor there in practice. More of a hypothetical question.

emmelaich · on Oct 23, 2014

I believe software should be written to to satisfy the common case and the common case is the non-expert.

gohrt · on Oct 21, 2014

This behavior of Unix programs is basically the same concept as Perl's "context" (list vs scalar), but even moreso.

easytiger · on Oct 21, 2014

Yea most commands don't do this though. Most commonly used for colouring output.

ls is a bit more than just a command though. It's part of the furniture and prehistoric.

oakwhiz · on Oct 21, 2014

I have run into this problem when trying to automate certain tasks on UNIX boxes.

Dealing with programs that act differently depending on their output device is very annoying.

pessimizer · on Oct 21, 2014

I despise it when commands do this - mysql -e results are formatted differently depending on whether the output is directed to the terminal or to a file.

burke · on Oct 21, 2014

Or, execute "/bin/[ -t 1" (or "test -t 1", or "[[ -t 1 ]]", or ...). This is handy in shellscripts (obviously), but also in languages like Go, which lack a builtin way to test whether stdout is a TTY. e.g.:

    cmd := exec.Command("/bin/[", "-t", "1")
    cmd.Stdout = os.Stdout
    isatty := nil == cmd.Run()

ksherlock · on Oct 21, 2014

Please just call Fstat and check Stat_t.Mode & S_IFCHR

burke · on Oct 21, 2014

Wow, that's a lot easier than the OS-dependent Syscall6 crap with termios from the "real" solutions I've seen.

grymoire1 · on Oct 21, 2014

As I recall, the original ls didn't have that feature.

Examining the characteristics of the output stream and changing behavior is another "rule" that is not mentioned often. Another example is buffering the output to a large block if sending to a pipe, but making it line-buffered if going to a terminal.

voltagex_ · on Oct 21, 2014

I'm not sure I agree with the "no JSON, please" remark. If I'm parsing normal *nix output I'm going to have to use sed, grep, awk, cut or whatever and the invocation is probably going to be different for each tool.

If it's JSON and I know what object I want, I just have to pipe to something like jq [1].

PowerShell takes this further and uses the concept of passing objects around - so I can do things like ls | $_.Name and extract a list of file names (or paths, or extensions etc)

[1]: http://stedolan.github.io/jq/

seanp2k2 · on Oct 21, 2014

+1 for jq. A lot of my work these days involves using web APIs in addition to "local" ones from CLI tools. xpath was good for dealing with XML stuff in a similar fashion, and HTML-XML-utils is an awesome suite of CLI things for slicing and dicing, if you're into that sort of thing: http://www.maketecheasier.com/manipulate-html-and-xml-files-...

grosskur · on Oct 21, 2014

xmlstarlet is also handy for command-line XML parsing:

http://xmlstar.sourceforge.net/

reirob · on Oct 21, 2014

Agree, had to use it extensively several years ago, it saved me so much time. But even then the development seemed to have stopped on this tool.

ygra · on Oct 21, 2014

I was also constantly thinking of PowerShell while reading that. A PowerShell-specific list of such advice would actually be rather short, given that most of the pitfalls are already avoided. I still firmly believe that PowerShell is actually a much more consistent Unix shell in that several concepts that ought to be separate are actually orthogonal. Let's see:

Input from stdin, output to stdout: Nicely side-stepped in that most cmdlets allow binding pipeline input to a parameter (either byval or byname, if needed). Filters are trivial to write, though.

Output should be free from headers: Side-stepped as well, in that decoration comes from the Format-* cmdlets that should only ever be at the end of a pipeline that's shown to the user.

Simple to parse and to compose: Well, objects. Can't beat parsing that you don't need to do.

Output as API: Well, since output is either a collection of objects or nothing (e.g. if an exception happened) there isn't the problem that you're getting back something unexpected.

Diagnostics on stderr: Automatic with exceptions and Write-Error. As an added bonus, warnings are on stream 2, debug output on stream 3 and verbose output on stream 4. All nicely separable if needed.

Signal failures with an exit status. Automatic if needed ($?), but usually exception handling is easier.

Portable output: That's about the only advice that would still hold and be valuable. E.g. Select-String returns objects with a Filename property which is not a FileInfo, but only a string; subject to the same restrictions that are mentioned in the article.

Omit needless dagnostics: Since those would be either on the debug or verbose stream they can be silenced easily, don't interfere with other things you care about and cmdlets have a switch for either of that, which means you only get that stuff if you actually care about it.

Avoid interactivity: Can happen when using the shell interactively, e.g.

    Home:> Remove-Item

    cmdlet Remove-Item at command pipeline position 1
    Supply values for the following parameters:
    Path[0]: _

However, this only ever happens if you do not bind anything to a parameter, which shouldn't happen in scripts. If you bind $null to a parameter, e.g. because pipeline input is empty or a subexpression returned no result, then an error is thrown instead, avoiding this problem.

Nitpick: You'd need ls | % Name or ls | % { $_.Name } there. Otherwise you'd have an expression as a pipeline element, which isn't allowed.

dec0dedab0de · on Oct 21, 2014

I have never used a computer that had access to Powershell, but in my new job I may have to do some small stuff to tie some systems together. I'm terrified of learning it because I don't want to be lured into some kind of lock-in scenario.

emodendroket · on Oct 21, 2014

Well, you can only use it on Windows. But I mean come on, "terrified?" Bash scripts are not too useful in Windows either.

pessimizer · on Oct 21, 2014

Spending a lot of time learning non-portable technologies isn't a decision to be taken lightly.

emodendroket · on Oct 21, 2014

Well, if it's part of your job I'd say it's not a decision at all...

pessimizer · on Oct 21, 2014

If you can't accomplish the same goals in a portable way. Of course, if you can put the knowledge to work as soon as you learn it, you're already starting to recoup your investment.

GhotiFish · on Oct 21, 2014

I can't believe you got downvoted for that.

GhotiFish · on Oct 21, 2014

cygwin does it's job well.

emodendroket · on Oct 21, 2014

cygwin works fine but if your task is, for instance, to configure Windows machines, it's not very useful.

InfiniteRand · on Oct 21, 2014

My issue with Powershell is it creates a distance from the scripting language and an ordinary executable, which makes it difficult to use just a little bit (and I suppose violates the rule about composability).

lstamour · on Oct 21, 2014

On Mac (OS NeXT, perhaps?), the convention seems to be that most commands produce human readable output by default, but you can pass a parameter like -x or -xml to get (usually) XML, machine-readable output, and with some tools, -j or -json will give you that format.

But then you've oddities like plutil behaving like gzip by modifying the file you specify rather than printing to stdout. You have to pass -o and a dash to get it to leave the file alone and instead reformat it to stdout. That one gets me every time. And I'm not alone: https://twitter.com/mavcunha/status/417823730505895936

But other parts are nice. For instance, "system_profiler -xml > MyReport.spx" generates XML that will open in the System Profiler GUI app. The XML generated is usually a Plist, since that's as native to the platform as the Registry might be to Windows...

Let me know when PowerShell gets tabs though. Maybe there's a Terminal.app port running in Mono somewhere? Seriously, I wish somebody would build a better terminal, maybe get creative with scrollback and chaining commands, and ship it in an OS... with tabs. ;-)

tfigment · on Oct 21, 2014

Not sure if its what you had in mind for Windows and tabs but I've found ConsoleZ [1] quite nice and allows powershell, cmd and others to have tabs.

[1] https://github.com/cbucher/console

lstamour · on Oct 21, 2014

Yeah, I know it's possible with adding, I've tried Console2 and ConEmu before. But I'd like it to just work out of the box with no extra software, as it does on Mac or Linux. Until Windows 10, that terminal hadn't changed since NT days...

ygra · on Oct 21, 2014

You can use the PowerShell ISE. Which has tabs and you can just hide the script pane to get only the console itself. Startup time is a bit hefty, but if you have tabs you probably create new tabs way more often than the tab container.

Especially for PowerShell the whole problem that Console2, etc. have is trivial, as you have an API to create a host application instead of relying on polling a hidden console window. The console host is just one of those hosts.

emodendroket · on Oct 21, 2014

I'm drawing a blank on the specifics but some things will work in the PowerShell prompt that won't work in the ISE.

ygra · on Oct 21, 2014

Programs that want access to the actual console. E.g. for interactive input, moving the cursor around, etc.

tracker1 · on Oct 21, 2014

I've been digging conemu[1]

[1]: https://code.google.com/p/conemu-maximus5/

michaelmior · on Oct 21, 2014

Yeah, I love jq. With a tool like that, I'd actually like to have an option for standard *nix tools to output JSON. Dealing with structured output would be far easier than counting which columns need to be extracted, using sed to split things, etc.

jfroma · on Oct 21, 2014

jq looks nice, I use another similar tool quite a lot [1].

[1]: https://github.com/trentm/json

osandov · on Oct 21, 2014

A nitpicky tip: --help is normal execution, not an error, so the usage information should be printed to stdout, not stderr (and it should exit with a successful status). Nothing is more annoying than trying to use a convoluted program with a million flags (which should have a man page in the first place) and piping --help into less with no success.

grymoire1 · on Oct 21, 2014

I hate it when a program has a huge --help output, and the man page is nearly empty, and says "see the --help option for more details." Things like examples, see also, etc. are very valuable to someone trying to figure out how to use a program....

foobarbaz1234 · on Oct 21, 2014

I am not so sure with that. Say, your program is used in a shell script and is invoked badly - you might want to print its usage then. If you exit normally your shell script might break weirdly but if you exit with error it's easier to spot the reason of failure.

On the other hand you made me thinking and probably you should have three code passes per default:

  [0] normal behaviour (exit 0)
  [1] bad arguments (exit EINVAL)
  [2] --usage (print to stdout but but exit != 0)?

Anyway I am not sure if it makes sense to declare "usage" as normal behaviour.

Someone · on Oct 21, 2014

In my book, there is a difference between explicitly asking for help/usage and passing arguments that do not make sense, which triggers the output of help/usage.

The former, I think, should write to stdout and return 0, the latter should write to stderr and return something non-zero.

Giving help if the user asks for it is normal behaviour.

pimlottc · on Oct 21, 2014

This annoys me to no end. Of course, you can work around it:

    annoying_program 2>&1 | less

but it is very unfriendly to stymie a user's attempt to get help when they're already probably confused.

Animats · on Oct 21, 2014

1978 called. It wants its pipes back.

That approach dates from the days when you got multi-column directory listings with

  ls | mc

Putting multi-column output code in "ls" wasn't consistent with the UNIX philosophy.

There's a property of UNIX program interconnection that almost nobody thinks about. You can feed named environment variables into a program, but you can't get them back out when the program exits. This is a lack. "exit()" should have taken an optional list of name/value pairs as an argument, and the calling program (probably a shell) should have been able to use them. With that, calling programs would be more like calling subroutines.

PowerShell does something like that.

grosskur · on Oct 21, 2014

You can simulate this with so-called "Bernstein chaining". Basically, each program takes another program as an argument, and finishes by calling exec() on it rather than exit(), which preserves the environment. See:

http://www.catb.org/~esr/writings/taoup/html/ch06s06.html

Or write environment variables to stdout in Bourne shell syntax so the caller call run "eval" on it. Like ssh-agent, for example.

gohrt · on Oct 21, 2014

Continuation Passing Style! http://en.wikipedia.org/wiki/Continuation-passing_style

agumonkey · on Oct 21, 2014

Oh wow, unix continuation passing style. Never heard of that o_o;

oneeyedpigeon · on Oct 21, 2014

I agree that the column formatting code shouldn't be in ls. However, if it were removed (which it won't ever be, of course: theoretical) I would want every system I ever access via a terminal to somehow alias ls to "ls | mc". To support full working of ls, though, that can't just be a straight alias, so I need a shell script to handle things like parameters to ls, which itself is then aliased to ls ... is that really better?

4ad · on Oct 21, 2014

In Plan 9 programs return strings instead of numeric codes.

to3m · on Oct 21, 2014

Additional tip: if writing a tool that prints a list of file names, provide a -0 option that prints them separated by '\x0' rather than white space. Then the output can be piped through xargs -0 and it won't go wrong if there are files with spaces in their paths.

I suggest -0 for symmetry with xargs. find calls it -print0, I think.

(In my view, this is poor design on xargs's part; it should be reading a newline-separated list of unescaped file names, as produced by many versions of ls (when stdout isn't a tty) and find -print, and doing the escaping itself (or making up its own argv for the child process, or whatever it does). But it's too late to fix now I suppose.)

fragmede · on Oct 21, 2014

> newline-separated list of unescaped file names

That breaks when you have newlines in filenames, no?

to3m · on Oct 21, 2014

File names often have spaces in them, but very rarely newlines. Based on xargs's current behaviour, it's clearly no problem to just not support certain characters in file names by default. I just think it would have been more useful for it to not support a smaller set of names.

mjevans · on Oct 21, 2014

You would be amazed what various broken tools can produce for filenames.

to3m · on Oct 21, 2014

I can't decide if this is a rebuttal, or not ;) - assuming it is, note that the number of possible paths containing newlines OR spaces is smaller than the number of possible paths containing only newlines, so an xargs that didn't handle newlines by default would still be supporting more possible paths than it does in its current state!

simpleigh · on Oct 21, 2014

Only the other week I managed to mouse-twitch into existence a file that rm refused to remove.

pstuart · on Oct 21, 2014

> That breaks when you have newlines in filenames, no?

That seems like an extremely pathological case.

userbinator · on Oct 21, 2014

Pathological or not, ensuring that pathnames can essentially contain any byte value except the 0 terminator, and it will still work, is important to prevent surprising behaviour which often has security implications.

typedweb · on Oct 21, 2014

The only character not allowed in Unix file names is the forward slash directory separator, so even that would be a pathological mistake waiting to bite someone.

Edit: my mistake, they can't contain nulls either: https://news.ycombinator.com/item?id=8485861

myhf · on Oct 21, 2014

too pathological; didn't implement

deathanatos · on Oct 21, 2014

> That seems like an extremely pathological case.

When a human is creating files by hand, I almost certainly agree. When a program is creating files, however, it's only a matter of time before weird characters wind their way in there.

I really wish newlines had been disallowed. (There's UI implications, in addition to the parsing ones — how do you do a list view with newlines in the filename?; I also wish filenames had a reliable character set and weren't just bytes.)

bmn_ · on Oct 21, 2014

I think dwheeler is trying to get this fixed/standardised in POSIX via the Open group.

oblio · on Oct 21, 2014

That it's going to be an uphill battle is an understatement.

Someone replied on LWN, when he posted his proposal, that he had implemented a sort of home-grown database using non-UTF8 characters for the file names.

Rube Goldberg, indeed!

userbinator · on Oct 21, 2014

how do you do a list view with newlines in the filename?

Show them with the standard escape sequence for a newline:

    This\ filename\ncontains\ a\ newline

Same for any other characters that could be considered 'special' in output; I really wish the backslash convention for escaping was more common. Character sets and such are a UI/display issue, so I don't think there should be any special handling for them at the lower levels of the system.

mjevans · on Oct 21, 2014

UI issues; on display format all /printing display elements/ (including spaces as spaces and things that look like whitespace but aren't spaces) with readable glyphs, or those numeric standins for non-rendering glyphs.

on Oct 21, 2014

[deleted]

oneeyedpigeon · on Oct 21, 2014

Whilst OSX's file-naming is disgusting (hello, /Library/WebServer/CGI-Executables), I don't think I've ever encountered newlines in filenames, and I've used it a fair amount. What are you referring to?

mappu · on Oct 21, 2014

And \x0 separator breaks when you have \x0 in filenames. Pragmatically it's a question of rarity, but ultimately the shell should support something like prepared queries in SQL.

hedgehog · on Oct 21, 2014

Filenames do not contain nulls: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_...

ole_tange · on Oct 22, 2014

You view was heard in the design of GNU Parallel: It defaults to newline separation, escapes the argument, and is for most cases a drop-in replacement of xargs.

This does what you would expect:

  echo My brother\'s 12\" records.txt | parallel touch

wahern · on Oct 21, 2014

xargs assumes the input is composed of quoted and escaped atoms. Compare

$ printf '"foo bar"' | xargs -n1

and

$ printf '"foo" "bar"' | xargs -n1

and

$ printf "%s" '\\"foo bar\\"' | xargs -n1

acabal · on Oct 21, 2014

Great article. The other thing I've always wished for command-line tools is some kind of consistency for flags and arguments. Kind of like a HIG for the command line. I know some distros have something like this, and that it's not practical to do as many common commands evolved decades ago and changing the interface would break pretty much everything. But things like `grep -E,--extended-regexp` vs `sed -r,--regexp-extended` and `dd if=/a/b/c` (no dashes) drive me nuts.

In a magical dream world I'd start a distro where every command has its interface rewritten to conform to a command line HIG. Single-letter flags would always mean only one thing, common long flags would be consistent, and no new tools would be added to the distro until they conformed. But at this point everyone's used to (and more importantly, the entire system relies on) the weird mismatches and historical leftovers from older commands. Too bad!

daxelrod · on Oct 21, 2014

One sort-of attempt at this is GNU's Coding Standards.

Long and Short Options: https://www.gnu.org/prep/standards/html_node/Option-Table.ht...

General Interfaces: https://www.gnu.org/prep/standards/html_node/User-Interfaces...

Command Line Interfaces: https://www.gnu.org/prep/standards/html_node/Command_002dLin...

Program Argument Syntax: http://www.gnu.org/software/libc/manual/html_node/Argument-S...

ramses0 · on Oct 21, 2014

How to be Unix-y in Eleventy-Billion Steps.

http://www.robertames.com/blog.cgi/entries/the-unix-way-comm...

""" The two surprising finds in the above documents are the standard list of long options and short options from -a to -z.

Forver and a day I am trying to figure out what to name my program options and these two guides definitely help. It allows me to definitively say you should use -c … for “command” instead of -r … for “run” because -r means recurse or reverse. """

--Robert

voltagex_ · on Oct 21, 2014

http://www.catb.org/~esr/writings/taoup/html/ch10s05.html lists alternatives for each short option, so which do you choose?

userbinator · on Oct 21, 2014

Actually most of them are quite consistent since POSIX published guidelines for it - and the only inconsistencies are historical exceptions:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_...

(I'm not so convinced that long options are a good thing, as evidenced by the --extended-regexp/--regexp-extended and other little "was it spelt this way or that?" type of confusions. It's not hard to remember single letters, especially if they're mnemonic.)

pimlottc · on Oct 21, 2014

Long options are very nice to use in scripts as they are somewhat self-documenting. Compare:

    curl -kLIiso example.org www.example.org

versus:

    curl --insecure --location --head --include --silent --output example.org www.example.rog

And of course as a practical matter, with short opts you'll run out of characters eventually, and meaningful mnemonics before that.

jzwinck · on Oct 21, 2014

You're right, myriad popular tools are not totally consistent (ls -h and du -h are similar but grep -h is very different). There is a bit of hope however--the GNU folks have documented lots of the options currently in use so you can try to find one that fits when you build new tools: https://www.gnu.org/prep/standards/html_node/Option-Table.ht...

dTal · on Oct 21, 2014

I've often thought this - that xkcd.com/1168/ is funny is a terrible embarrassment. I would also like to add that manpage syntax help should be standardized and machine-parseable. I had an idea recently to auto-generate GUIs for command line tools from the manpage syntax line, but it turned out that while such lines look precise but cryptic, they are often in fact highly ambiguous, nonstandard, and still cryptic. This seems broken to me.

james2vegas · on Oct 21, 2014

Blame man(7), have a look at mdoc(7): Semantic markup for command line utilities: Nm : start a SYNOPSIS block with the name of a utility; Fl : command line options (flags) (>=0 arguments); Cm : command modifier (>0 arguments); Ar : command arguments (>=0 arguments); Op, Oo, Oc : optional syntax elements (enclosure); Ic : internal or interactive command (>0 arguments); Ev : environmental variable (>0 arguments); Pa : file system path (>=0 arguments)

JetSpiegel · on Oct 21, 2014

Linux kernel tools solves this, but the help looks cryptic and is difficult to parse by a human(such as myself).

I can't find documentation on what I mean, but try ip --help

stass · on Oct 21, 2014

Are you referring to GNU sed? Other sed implementations I know of actually use -E for extended regular expressions support. I could never understand why GNU picked up -r for sed...

As for dd, it came from a non-UNIX OS and kept the original syntax.

JetSpiegel · on Oct 21, 2014

It would be easier to write a wrapper that called the real commands. You would have to writer a wrapper for each command, so it's still a lot of work.

dap · on Oct 21, 2014

Lots of great points here, but as always, these can be taken too far. Header lines are really useful for human-readable output, and can be easily skipped with an optional flag. (-H is common for this).

The "portable output" thing is especially subjective. I buy that it probably makes sense for compilers to print full paths. But it's nice that tools like ls(1) and find(1) use paths in the same form you gave them on the command-line (i.e., absolute pathnames in output if given absolute paths, but relative pathnames if given relative paths). For one, it means that when you provide instructions to someone (e.g., a command to run on a cloned git repo), and you want to include sample output, the output matches exactly what they'd see. Similarly, it makes it easier to write test suites that check for expected stdout contents. And if you want absolute paths in the output, you can specify the input that way.

zaptheimpaler · on Oct 21, 2014

I also think headers should be included. Its really annoying to go pore through a man page just to see what the columns mean. You could use flags, or maybe send headers to STDERR.

0xbadcafebee · on Oct 21, 2014

Not every program will be able to take input in stdin and output to stdout. If you have a --file (or -f) option, you'd do well to support a "-" file argument, which means either stdin or stdout, depending if you're reading or writing to -f. But you won't support "-" if the -f option requires seeking backwards in a file. Neither will you be using stdin or stdout if binary is involved (because tty drivers).

'One thing well' is often intended to make people's lives easier on the console. Sometimes this means assuming sane defaults, and sometimes just a simpler program that does/assumes less. Take these two examples and tell me which you'd prefer to type:

  user@host~$ ls *.wav | xargs processAudio -e mu-law --endian swap -c 2 -r 16000
  user@host~$ find . -type f -maxdepth 1 -name '*.wav' -exec processAudio -e mu-law --endian swap -c 2 -r 16000 {} \;

Write concise technical documentation. Imagine it's your first day on a new job and you need to learn how all your new team's tools work; do you want to read every line of code they've written just to find out how it works, or do you want to read a couple pages of technical docs to understand in general how it works? (That's a rhetorical question)

Definitely provide a verbose mode. When your program doesn't work as expected, the user should be able to figure it out without spending hours debugging it.

_pmf_ · on Oct 21, 2014

I have a strong bias against people who quote their own tweets in their own blog posts. I find this to be highly narcissistic.

1amzave · on Oct 21, 2014

I sympathize, but I have to say I find it far less annoying than the constant implorings to "follow me on Twitter!" that have become obnoxiously ubiquitous in the last few years.

RexRollman · on Oct 21, 2014

Wow, its been a while since I've seen a monkey.org link. I thought the site was dead. Nice to see I was wrong.

mseepgood · on Oct 21, 2014

Another tip: don't do colored output. I don't want to deal with ANSI codes in your output.

arh68 · on Oct 21, 2014

I think it's insane to restrict programs to just STDOUT & STDERR. Why 2? Why not use another file descriptor, maybe STDFMT, to capture all the formatting markup? This would avoid -0 options (newlines are markup sent to stdfmt, all strings on stdout are 0-terminated), it would avoid -H options (headers go straight to STDFMT), it would allow for less -R to still work, etc.

It's possible other descriptors would be useful, like stdlog for insecure local logs, stddebug for sending gobs of information to a debugger. It's certainly not in POSIX, so too bad, but honestly stdout is hard to keep readable and pipe-able. Adding just one more file descriptor separates the model from the view.

0xbadcafebee · on Oct 21, 2014

I honestly have no idea what you are talking about. The whole point of standard i/o streams is for them to be portable and composable by other programs without those programs having to be designed to work with yours. POSIX is here for a very good reason.

Obviously not every program will use just two file descriptors. Binary isn't handled by stdin and stdout because they're typically used for tty input/output. If you need to handle multiple files you'll take a list of file arguments. Often a program takes no input at all that isn't a command-line option.

And what 'formatting markup'? There is no 'markup' on a terminal, unless you're dealing with colors or something, which you would disable if your fd wasn't a tty. And why would you send 'headers' to a completely different file descriptor anyway?

Oh, I think I get it now. You confused the MVC architecture with Unix programs. Unix programs don't provide a user interface.

arh68 · on Oct 21, 2014

> In your program's design, the 'cat' program would handle all kinds of file i/o, provide some kind of ncurses text GUI to select a file, a progress bar for the progress of text flowing through it, sending errors to a logging subsystem

Not at all. cat wouldn't have a ncurses GUI, that doesn't make sense. My point is that 'cat --verbose' should be an option, where the stdout doesn't change but extra crap is sent elsewhere, and probably just dumped on the terminal like stderr. I sometimes want to see extra context and line numbers in my grep searches (grep -nC 3 ..) but I might want the stdout to remain clean. This makes programs more composable. Right now it's like we've got stdfmt permanently redirected towards stdout.

In practical terms, vi does its own paging. It's not a wrapper over echo | ed | less. One giant monolithic subsystem. Perhaps vi is the exception. But dd offers a progress bar, but only if you send it a SIG of some sort. wget offers a progress bar by default (silence is golden? not so much). ls yields differently columned outputs to ttys or files. I suppose this is the simplicity of Unix that I shouldn't touch.

Some unix tools work really well already, and I'm not suggesting destroying tar or xargs. I'm not sure how systemd works into this, but I'm not really a fan of that.

I guess Plan9 wasn't Unix, either.

masklinn · on Oct 21, 2014

> I honestly have no idea what you are talking about. The whole point of standard i/o streams is for them to be portable and composable by other programs without those programs having to be designed to work with yours.

His point is that two streams are not enough, you don't want to present the same output stream or a human, a logfile or an other utility.

> And what 'formatting markup'? There is no 'markup' on a terminal, unless you're dealing with colors or something

Right, so there is markup on a terminal.

> which you would disable if your fd wasn't a tty.

Which would be much simpler to handle if there was a stream for human consumption and one for piping

> And why would you send 'headers' to a completely different file descriptor anyway?

Because headers are useful to human users or when capturing output in a file to read later rather than in an other utility?

0xbadcafebee · on Oct 21, 2014

In practical terms, everything you mention should be done by different programs, not one giant monolithic subsystem that manages 10 completely different tasks. Each component should be reusable, independent, and interoperable. Not tied into one program.

In your program's design, the 'cat' program would handle all kinds of file i/o, provide some kind of ncurses text GUI to select a file, a progress bar for the progress of text flowing through it, sending errors to a logging subsystem, storing header metadata in some object passed along its output streams, etc. The Unix designers had dealt with this kind of crap before, and were sick of it, and so they wrote a program which did only one thing.

What you describe is the systemd school of design. If I just make my program more complex and technically superior, i'll have a better program. Who cares that nobody wants to use it, or that it's burdensome, hard to extend, difficult to understand, and incompatible with everything that exists today? Who cares if we can already do all these things without all the downsides? Technical superiority trumps practicality. Well, that's not Unix.

The Unix environment flourished not only because it was widely available, but mainly because it was incredibly efficient. By removing all the things they didn't need, they made the system better. There are four words that accurately express all of this, and that should guide the development of any Unix tool:

Keep it simple, stupid. https://people.apache.org/~fhanik/kiss.html

chilicuil · on Oct 21, 2014

I agree with what is exposed on the article and I've actually added more details in how to apply this "principles" to shell scripting:

http://javier.io/blog/en/2014/10/21/hints-in-writing-unix-to...

jwr · on Oct 21, 2014

I would add to this list:

If you are intercepting UNIX signals (starting with SIGINT), go back to the drawing board and think again. Don't do it. There is almost never a good reason for doing it, and you will likely get it wrong and frustrate users.

pjc50 · on Oct 21, 2014

I wrote one of these ages ago that was very useful (regain interactive control of an otherwise batch program) but broke all sorts of 'rules', including doing blocking IO in the signal handler.

edwintorok · on Oct 21, 2014

How about cleaning up tempfiles on ^C?

renox · on Oct 21, 2014

YMMV but I prefer cleaning the old tempfiles at start-up. It allows you to get the content of the tempfiles after the program stopped, very handy for debugging..