
> Write programs to handle text streams, because that is a universal interface.

I always hated that part of UNIX. It would be so much better if programs could handle data-structure streams instead. Having text streams causes every single program to implement ad-hoc parsers and serializers.

I now use the command-line only for trivial one-time actions. For everything else I'll use a scripting language and forget about reading and writing text streams altogether.




The answer is probably the same as the reason PowerShell isn't as usable as a Unix shell. Which in turn has a lot to do with why we're still programming in text files and not clicking fancy objects, despite that seemingly being a more powerful model, and despite the many projects that have tried to take advantage of it.

Text is a useful common denominator. Text is possible to version control, tie to bug trackers, and handle with configuration management systems.

The same is true for the command line. If you handle structured data, or objects, you communicate using APIs. While it's not theoretically impossible to still use version control and configuration management, it turns out that it's much more difficult in practice. Plain text is a useful lowest common denominator.


We're already creating ad-hoc APIs using cut, sed, awk and grep (to name but a few) all the time in order to massage the data into a format the next program in the chain will understand. This sometimes involves non-trivial invocation chains, and I always feel like I'm working on a representation of the data rather than the data itself.

I would much rather have functional primitives (map, filter, reduce, zip, take, drop, etc) doing this work.
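
You can get surprisingly close to this today by routing data through JSON and letting jq supply the primitives; a minimal sketch, with invented record fields:

  # map/filter over structured records instead of cut/sed/awk over text
  echo '[{"pid":101,"cpu":0.5},{"pid":102,"cpu":87.2}]' \
    | jq 'map(select(.cpu > 50)) | map(.pid)'

The filter operates on the data itself rather than on a textual projection of it.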


It would seem to be better in theory, but I don't think it's much better in practice. I could never get on with PowerShell, though that's a further step beyond what you suggest - not just structured streams, but object streams.

It's like the difference between static and dynamic typing. Solving the type system's constraints adds complexity over and above the irreducible complexity of the problem. Static typing pays for its added complexity by proving certain things about the code, but for ad-hoc, short-lived code it usually isn't worth it. And most code (by frequency, if not by importance) using streams is ad-hoc, on the command line.

With a structured stream, there are only a handful of generic utilities that make sense: map, filter, reduce, etc. (and they better have a good lambda syntax). Whereas the advantage of unstructured streams is that utilities that were never designed to work together can be made to do so, usually with relatively little effort.

For example, suppose you have a bunch of pid files in a directory, and you want to send all the associated processes a signal. What kind of data structure stream does your signal sending program accept? What needs to be done to a bare number to convert it into the correct format? How do you re-structure plain text from individual files? Structure in streams seems to have suddenly added a whole lot of complexity and work, and for what?

Whereas:

    cat "$pid_directory"/*.pid | xargs kill -USR1
(I don't really see how a scripting language solves your issue. You still need to parse the output and format the input of all the tools you exec from your scripting language. Or maybe you're not actually using tools written in lots of different languages? Because this is one of my main use cases for the shell using streams: gluing focused programs together without constraint on implementation language.)
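
To make that concrete, a pipeline along these lines (the script names are hypothetical) mixes implementation languages freely, and no stage needs to know what the others are written in:

  # three stages, potentially three languages, one common interface
  ./generate_report.py | awk '{print $2}' | sort -u | ./notify.rb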


>For example, suppose you have a bunch of pid files in a directory, and you want to send all the associated processes a signal. What kind of data structure stream does your signal sending program accept?

What program? A single line of shell code would work fine. Kill itself need only take a pid, or an actual handle if Unix had such a thing.

>What needs to be done to a bare number to convert it into the correct format?

If a "bare number" isn't the correct format, why would you have them at all?

>How do you re-structure plain text from individual files?

The whole idea is not to use plain text at all.

>Structure in streams seems to have suddenly added a whole lot of complexity and work, and for what?

Structuring your data doesn't add complexity; when you consider the hoops one jumps through to strip data of its structure at one end of a stream and reconstitute it at the other, it's really reducing it. It's only if you insist on also using unstructured representations that complexity is increased.

Of course, as long as Unixes and their shells only speak bytestreams and leave all structuring, even of program arguments, to individual programs, it's a moot point. He's still right about it being a shitty design, though.


> When you consider the hoops one jumps through to strip data of its structure at one end of a stream and reconstitute it at the other, structured data is really reducing complexity.

Exactly this. I think HN doesn't have much experience with PowerShell, which is why you're currently being downvoted. So let's have a practical example: consider killing processes started in the last 5 minutes using posh:

ps someprocess | where { $_.StartTime -and $_.StartTime -gt (Get-Date).AddMinutes(-5) } | kill

Now try the same on bash, and spend time creating a regex to extract time from the unstructured output of Unix ps.


kill $(ps -eo pid,etime | grep -P ' (00|01|02|03|04):..$' | awk '{print $1}')

Not really a complex regexp though. I almost exclusively use Linux and thus bash/zsh etc. And yes, my piece above looks uglier and like a bit of a hack, but that's not the point. It's easy because it's discoverable. These are one-liners you write ad hoc and use once. But PowerShell in my experience lacks the discoverability that bash has: you can't look at what some tool outputs and then know how to parse it, you need to look at the underlying object first. Granted, I have maybe one day of experience with PowerShell, but I don't know anyone who uses it for their primary interaction with the computer. For bash, though...

(And yes I'm aware that you can also create huge complicated bash scripts, but you could also just use python)

Find the name of the CPU using PowerShell and have fun looking up the correct WMI class and what property you need.

Here's bash: grep name /proc/cpuinfo


> my piece above looks uglier and like a bit of a hack, but that's not the point.

Well it was precisely my point.

Get-WmiObject seems pretty discoverable to me. You can browse the output and pick what you want.


For the sake of completeness: your regex doesn't perform the task either.


> Structure in streams seems to have suddenly added a whole lot of complexity and work, and for what?

Being able to stream collections of bytes (and collections of collections of bytes, recursively) is one case that I find myself wanting when sending data between programs at the command line.

Consider:

  (cd "$pid_directory" && ls | xargs rm)
This, of course, has problems for some inputs because ls is sending a single stream and xargs is trying to re-parse it into a collection on which to use rm.

If there were some way to encode items in a collection out-of-band, you could pipe data through programs while getting some guarantees about it being represented correctly in the recipient program. (Sometimes you see scripts that do this by separating data in a stream with NUL delimiters, but this doesn't work recursively, or if your main data stream might have NUL in it.)
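
For the flat, non-recursive case, the usual NUL-delimited idiom looks like this; since NUL can never appear in a pathname, the collection survives the pipe intact:

  # one level of out-of-band structure via NUL delimiters
  find "$pid_directory" -maxdepth 1 -type f -print0 | xargs -0 rm

It's exactly the recursive case, or data that may itself contain NUL, where this idiom runs out of road.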


>Being able to stream collections of bytes (and collections of collections of bytes, recursively) is one case that I find myself wanting when sending data between programs at the command line.

If you don't mind using JSON as an intermediary format, you might like to have a look at jq: http://stedolan.github.io/jq/

jq can also convert raw input to JSON strings (with -R) and back (with -r) for further processing. Naturally, jq filters can be pipelined in many useful ways.
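
For example, something like this turns a line-based listing into a real array, filters it structurally, and unwraps it again (it still inherits ls's newline ambiguity on the way in):

  # lines -> JSON strings -> filtered array -> raw lines again
  ls "$pid_directory" | jq -R . | jq -s 'map(select(endswith(".pid")))' | jq -r '.[]'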


There's work ongoing on FreeBSD to add libxo support to tools in the base system, which will allow you to get (amongst others) JSON out of various commands: https://github.com/Juniper/libxo
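
The canonical example from the libxo documentation is along these lines (which base-system tools honour the flag varies by FreeBSD release):

  # same command, machine-readable output selected at run time
  wc --libxo json /etc/motd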


The .NET shell does this, but that's why it's not Unix, and it's not universal.

If programs passed data structures then either you're forcing a certain data structure model (i.e. it's not universal, because it's not compatible with anything else), or your data structures are so general (i.e. a block stream) that your applications are going to be parsing anyway... and that's going to be even nastier than if everything was stupid text streams in the first place.


Programs already need to have their input in the proper format in order to parse it. I'd like the mismatch error to come from the invocation of the program rather than halfway down the parsing.

For this to work data structures would have to be nothing more than scalars, sequences and mappings without specific concrete types. Just like JSON, YAML, and the rest do it now.


The main problem is that there are many kinds of incompatible "data-structure streams". Using a scripting language doesn't solve the problem, it just standardises on one particular ad-hoc parser/serialiser combination (Python's pickle, for example).

That's fine for personal use, or in a single project, but doesn't scale like "dumb" text does.


If there were a non-ideal but reasonable standard serialization format, like maybe IFF [1] or BSON [2], it would still simplify things, and common tools like `cat` might support decoding it.

It would be easy to wrap an arbitrary other packed format inside a binary string, or an arbitrary text format inside a text string.

IFF is quite successful in certain areas, BTW.

[1]: http://en.wikipedia.org/wiki/Interchange_File_Format [2]: http://en.wikipedia.org/wiki/BSON


You could always pass a message per line (so to speak) and use msgpack or json... if you want compression you could use json+gz
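
A minimal sketch of that, assuming jq is available (the field names are made up):

  # one JSON document per line; any consumer can pick the stream apart
  printf '%s\n' '{"event":"login","user":"alice"}' '{"event":"logout","user":"bob"}' \
    | jq -r 'select(.event == "login") | .user'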

Of course, you could also do something similar with a 0mq adapter for a request/response server, which is available pretty much everywhere you might need it (language-wise)... It's an interesting generic interface... however, as I mostly use node for newer development, it's so trivial to use sockets or websockets directly for this that I'd just as soon go that route... or, for that matter, something like sockjs/shoe.


There's a very good reason that UNIX is data-structure agnostic: supporting the lower level (text streams) offers far more flexibility. If you want to use a specific data structure type among your own suite of scripts you are free to do so; UNIX does not get in your way.

At the same time, if you want to write a utility like grep that is agnostic to the structure of your text, it can exist and work. If UNIX cared about data structures, this wouldn't work.

Here's a good rule for you: if you find yourself thinking the way UNIX (or any other long-existing and widely supported system) does something is dumb and you know better, assume you are wrong and look for the reasons that the people who were smarter than you chose to do it that way.


This is a tough lesson, and for many the only teacher is experience. For the longest time I avoided what I considered abstract math, despite the general wisdom of its importance to computer science. I spent weeks writing software that could predict customer complaints based on supplier action. Given the first three days of complaints following an action, I could reliably predict the total number of complaints a year into the future. When I proudly showed one of my coworkers, who didn't have my aversion to math, he informed me that I had just reinvented Poisson regression. I learned my lesson that day. This guy will likely figure it out after a couple of tries as well :)


Have you read the Unix Haters Handbook?

http://web.mit.edu/~simsong/www/ugh.pdf


I hadn't; looks like an hour or two of amusement.

While there are some occasional good points that could be converted into solutions, I can't help but think that this book is more amusement than well-packaged constructive criticism.


It's a collection of some of the best rants from the Unix Haters mailing list that existed in the late eighties/early nineties. The people that were on the list usually had experience with operating systems that were more advanced than Unix. Constructive criticism was never really the point of the mailing list.

My point was that the parent's suggestion that anyone who criticizes Unix obviously isn't smart enough to understand it is just flat out wrong and offensive.


> the parent's suggestion that anyone who criticizes Unix obviously isn't smart enough to understand it

That wasn't what I was saying at all. But thanks for playing.


How exactly would that work? How would you pass one kind of data structure from one program to another so that they both could understand it without involving parsers or deserializers? To be concrete, let's start with the simplest possible data structure: the linked list.


That's no different from text. You can't pass one kind of text from one program to another without both understanding it. That's why ls has a million different and confusing quoting options.

The advantage of using a proper data format is:

a) You don't have to do in-band signalling, so it will be far more reliable (you still can't have spaces in filenames for a lot of unixy things).

b) The encoding is standard. Using text for pipes still requires some kind of encoding in general, but there are many different ways (is it one entry per line? space separated? are strings quoted? etc.)


A linked list is already too specific; a sequence is all you need to express any linear arrangement of data, be it an array, linked list, vector, or any of the myriad other concrete sequence types.


S-expression


And how do you convert any data structure to an s-expression? You serialize the data. How do you get the s-expression into a form your program can understand? You parse it.

In other words you still haven't solved the fundamental problem of passing data back and forth between different programs. In fact if you are going to mandate a specific serialization/deserialization format then JSON, XML, or even ASN.1 are better options than s-expressions.


My point was more, let your language do the parsing and deserialization for you. S-expressions are merely a textual representation of linked lists. The parsing and evaluation of text is already written as part of the language.

The other point was that we're ultimately stuck with serial forms of communication, be it wires, pipes, sockets, etc. If we want to easily transfer structured data through these serial channels, we should probably build up our structures from serial primitives, and S-expressions are much more handy than plain strings (which we may not even be able to parse without ambiguity), or XML, JSON or whatnot. First, because the parser is already implemented as part of the language; and second, because you can transfer code in addition to data, and evaluate it on the remote end to bring into scope more "structured" data like records.

I did try to include a bit more in the previous post, but I'd accidentally hit save, and I was unable to edit the post afterwards.


Somehow it always felt natural in Lisp Machines.


How does PowerShell do it?


PowerShell cannot pass anything structured unless the other end of the pipe is a cmdlet, and even then there are times when the other end of the pipe is forced to interpret whatever is passed to it as a string instead of a more structured object like a dictionary.

So even within a controlled environment like PowerShell where everything is a cmdlet it is still impossible to pass only structured objects between commands.


Since there is so much pointless discussion about the use of text streams under this comment:

When someone has an example of a situation where binary, JSON or other communication between websocketd and a program is needed, please just file a ticket. It would be great to see a practical situation instead of just arguing with each other about text streams/Unix principles/JSON and other stuff.


At the risk of perpetuating the meme that hacker news is overly enamored of Golang, I'd like to say that providing "data structure streams" is exactly Go's sweet spot, and it's about as awesome as you would hope it would be.


No risk of that here. websocketd is written in golang.


You're on the right track. This does, however, present us with a challenging problem: developing a single, consistent, efficient, cross-platform, language-independent form of storing and transmitting data (between programs).

It's something to ruminate upon.


This is already a partly solved challenge. Text streams don't magically come into existence; programs need STDIN, STDOUT and STDERR at their disposal to use them.

All you really need are scalars, sequences and mappings to fully express any data structure. Tagged types would be sugar on top but it could quickly add unwanted complexity.
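
A quick illustration of how far those three shapes go (the record is invented):

  # scalar, sequence and mapping in a single value
  echo '{"name":"websocketd","ports":[80,443],"env":{"DEBUG":"1"}}' | jq .

Anything richer (dates, process handles, custom types) would just be tags layered over the same three shapes.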

All you need is a shell capable of marshalling data between the different programs piped together. Powershell is a nice idea but it only runs .NET code and that's a limitation I can't live with.


Most JSON stringifiers will emit JSON with no literal line terminators (newlines inside strings get escaped)... so a JSON object terminated by a newline would seem to be a pretty logical data format for open text...

There's also msgpack and protocol buffers... I think plain-text JSON at one message per line is far simpler and easier to handle, though.


you never got to learn and understand the unix philosophy. your loss.


The "Unix philosophy" is outdated and wrong in many cases. You're an idiot if you follow it blindly.


A philosophy is neither right nor wrong. It's a philosophy, one that you choose to subscribe to, or not.



