Additional tip: if writing a tool that prints a list of file names, provide a -0 option that prints them separated by NUL bytes ('\0') rather than whitespace. Then the output can be piped through xargs -0 and it won't go wrong if there are files with spaces in their paths.
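A minimal sketch of what such an option could look like in a shell tool. The function name list_names is made up for illustration, and it simply prints the names it is given rather than discovering files itself:

```shell
# list_names: hypothetical helper that prints each argument as one list entry.
# If the first argument is -0, entries are terminated by NUL instead of newline.
list_names() {
    if [ "${1-}" = "-0" ]; then
        shift
        # With no names there is nothing to print.
        [ "$#" -gt 0 ] || return 0
        printf '%s\0' "$@"
    else
        [ "$#" -gt 0 ] || return 0
        printf '%s\n' "$@"
    fi
}
```

Something like `list_names -0 *.txt | xargs -0 ls -ld --` should then see each name as exactly one argument, spaces included.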
I suggest -0 for symmetry with xargs. find calls it -print0, I think.
(In my view, this is poor design on xargs's part; it should be reading a newline-separated list of unescaped file names, as produced by many versions of ls (when stdout isn't a tty) and find -print, and doing the escaping itself (or making up its own argv for the child process, or whatever it does). But it's too late to fix now I suppose.)
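For illustration only, here is a rough sketch of the behaviour described above: read one unescaped name per line and build the child's argument list directly, with no quoting step. The name nl_xargs is hypothetical, and unlike real xargs it makes no attempt to split long lists across several invocations:

```shell
# nl_xargs: hypothetical helper. Usage: some-producer | nl_xargs command [args...]
# Reads newline-separated, unescaped names from stdin and appends each one
# to the command line as a single argument.
nl_xargs() {
    # "$@" starts out as the command and its fixed arguments.
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r name || [ -n "$name" ]; do
        set -- "$@" "$name"
    done
    "$@"
}
```

Spaces and quotes in names pass through untouched; a name containing a newline would be split in two, which is exactly the trade-off being argued for.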
File names often have spaces in them, but very rarely newlines. Based on xargs's current behaviour, it's clearly no problem to just not support certain characters in file names by default. I just think it would have been more useful for it to not support a smaller set of names.
I can't decide if this is a rebuttal or not ;) Assuming it is, note that the set of possible paths containing newlines OR spaces is larger than the set of possible paths containing newlines alone, so an xargs that only failed on newlines by default would still be supporting more possible paths than it does in its current state!
Pathological or not, ensuring that pathnames can essentially contain any byte value except the 0 terminator, and it will still work, is important to prevent surprising behaviour which often has security implications.
Apart from the NUL byte, the only character not allowed in Unix file names is the forward slash directory separator, so even that would be a pathological mistake waiting to bite someone.
When a human is creating files by hand, I almost certainly agree. When a program is creating files, however, it's only a matter of time before weird characters wind their way in there.
I really wish newlines had been disallowed. (There are UI implications in addition to the parsing ones: how do you do a list view with newlines in the filename? I also wish filenames had a reliable character set and weren't just bytes.)
That it's going to be an uphill battle is an understatement.
Someone replied on LWN, when he posted his proposal, that he had implemented a sort of home-grown database using non-UTF8 characters for the file names.
how do you do a list view with newlines in the filename?
Show them with the standard escape sequence for a newline:
This\ filename\ncontains\ a\ newline
Same for any other characters that could be considered 'special' in output; I really wish the backslash convention for escaping was more common. Character sets and such are a UI/display issue, so I don't think there should be any special handling for them at the lower levels of the system.
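As a sketch of that display convention, a helper along these lines could be applied to each name before printing it. The name escape_name is hypothetical, and it only handles backslash, space and newline; other control characters are left as an exercise:

```shell
# escape_name: hypothetical display helper. Prints its argument on one line,
# with backslash shown as \\, space as "\ ", and newline as \n.
escape_name() {
    printf '%s\n' "$1" |
        # Escape backslashes first so the ones added next are not doubled.
        sed -e 's/\\/\\\\/g' -e 's/ /\\ /g' |
        # Join the lines back together with a literal backslash-n between them.
        awk 'NR > 1 { printf "\\n" } { printf "%s", $0 } END { print "" }'
}
```

This only changes how a name is shown; the bytes stored in the file system are untouched.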
These are UI issues: on display, format all printing elements (including spaces as spaces, and things that look like whitespace but aren't spaces) with readable glyphs, and use numeric stand-ins for non-rendering glyphs.
Whilst OSX's file-naming is disgusting (hello, /Library/WebServer/CGI-Executables), I don't think I've ever encountered newlines in filenames, and I've used it a fair amount. What are you referring to?
And \x0 separator breaks when you have \x0 in filenames. Pragmatically it's a question of rarity, but ultimately the shell should support something like prepared queries in SQL.
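Not quite prepared statements, but loosely in that spirit: find's -exec ... {} + form hands the names to the child as separate arguments, so they are never serialized into text and re-parsed. A small sketch, with the function name count_args_via_find made up for the example:

```shell
# count_args_via_find: hypothetical demo. Prints how many arguments the child
# shell received for the regular files under directory $1. Each file arrives
# as exactly one argument, whatever bytes its name contains.
# (For very large trees find may run the child more than once.)
count_args_via_find() {
    find "$1" -type f -exec sh -c 'printf "%s\n" "$#"' sh {} +
}
```

The trailing `sh` fills $0 of the child shell, so that the file names start at $1.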
Your view was heard in the design of GNU Parallel: it defaults to newline separation, escapes the argument, and is for most cases a drop-in replacement for xargs.
This does what you would expect:
echo My brother\'s 12\" records.txt | parallel touch