What CLI userlands (POSIX, PowerShell, Plan 9) are missing, IMHO, is a self-describing serialization "stream container" format, like Avro (https://avro.apache.org), the interchange format used in Hadoop, Kafka, and several other systems of the "gluing IO-streaming components of different languages together into an ETL pipeline" variety. (Which is, of course, exactly what you're doing when you write a Unix pipeline, just in-the-small.)

In self-describing data formats like JSON or XML (or even the more efficient encodings like ASN.1), every term is "described in place", which costs a lot of encoding overhead and bandwidth. In a self-describing stream container format, by contrast, each stream first encodes a schema (or the ID of one) for what it's about to transmit, and then transmits the terms encoded using that schema.
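
To make "schema first, then terms" concrete, here's a toy sketch in Python. This is not Avro's actual wire format; the layout (a length-prefixed JSON schema header followed by packed records), the type names, and the function names write_stream / read_stream are all made up for illustration.

```python
import io
import json
import struct

# Hypothetical mapping from toy schema type names to fixed-width struct codes.
FORMATS = {"int": "<q", "double": "<d"}


def write_stream(out, schema, records):
    """Write the schema once as a header, then each record with no field names."""
    header = json.dumps(schema, sort_keys=True, separators=(",", ":")).encode("utf-8")
    out.write(struct.pack("<I", len(header)))
    out.write(header)
    for record in records:
        for field in schema["fields"]:
            value = record[field["name"]]
            if field["type"] == "string":
                data = value.encode("utf-8")
                out.write(struct.pack("<I", len(data)) + data)
            else:
                out.write(struct.pack(FORMATS[field["type"]], value))


def read_stream(inp):
    """Read the schema header, then decode records until the input runs out."""
    (header_len,) = struct.unpack("<I", inp.read(4))
    schema = json.loads(inp.read(header_len).decode("utf-8"))
    records = []
    while True:
        record = {}
        for field in schema["fields"]:
            if field["type"] == "string":
                prefix = inp.read(4)
                if not prefix:  # empty read: treat as end of stream
                    return schema, records
                (size,) = struct.unpack("<I", prefix)
                record[field["name"]] = inp.read(size).decode("utf-8")
            else:
                fmt = FORMATS[field["type"]]
                raw = inp.read(struct.calcsize(fmt))
                if not raw:  # empty read: treat as end of stream
                    return schema, records
                record[field["name"]] = struct.unpack(fmt, raw)[0]
        records.append(record)


# Usage: the field name "pid" appears once (in the header), not once per row.
schema = {
    "name": "ps_row",
    "fields": [{"name": "pid", "type": "int"}, {"name": "cmd", "type": "string"}],
}
rows = [{"pid": 1, "cmd": "init"}, {"pid": 4242, "cmd": "sshd"}]
buf = io.BytesIO()
write_stream(buf, schema, rows)
encoded = buf.getvalue()
buf.seek(0)
decoded_schema, decoded_rows = read_stream(buf)
```

The point of the layout is that the per-record cost is just the values; everything a reader needs in order to interpret them travels once, at the front of the stream.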

Because a schema is embedded in the stream in a normalized form, a client can handle each stream it receives with a hybrid approach: recognize known schemas (e.g. by hashing the schema's representation, which works since it's in a normal form) and use a baked-in decoder for that specific schema if one is available; otherwise, build a just-in-time dynamic parser from the schema.
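
A rough Python sketch of that dispatch, under loud assumptions: "normal form" here is just sorted-key compact JSON hashed with SHA-256 (Avro's spec defines its own canonical form and fingerprints, which this does not reproduce), the record layout is a toy one, and every name (fingerprint, decoder_for, PID_CMD_SCHEMA, and so on) is hypothetical.

```python
import hashlib
import json
import struct

# Hypothetical mapping from toy schema type names to fixed-width struct codes.
FORMATS = {"int": "<q", "double": "<d"}


def fingerprint(schema):
    """Hash a normalized rendering of the schema so equal schemas hash equally."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# A schema this (imaginary) tool "expects" and has hand-written code for.
PID_CMD_SCHEMA = {
    "name": "ps_row",
    "fields": [{"name": "pid", "type": "int"}, {"name": "cmd", "type": "string"}],
}


def decode_pid_cmd(payload):
    """Baked-in decoder: the layout is hard-coded, no schema walk at runtime."""
    pid, size = struct.unpack_from("<qI", payload, 0)
    return {"pid": pid, "cmd": payload[12:12 + size].decode("utf-8")}


def build_decoder(schema):
    """Generic fallback: build a decoder by walking the schema received at runtime."""
    fields = [(f["name"], f["type"]) for f in schema["fields"]]

    def decode(payload):
        offset = 0
        record = {}
        for name, ftype in fields:
            if ftype == "string":
                (size,) = struct.unpack_from("<I", payload, offset)
                offset += 4
                record[name] = payload[offset:offset + size].decode("utf-8")
                offset += size
            else:
                fmt = FORMATS[ftype]
                record[name] = struct.unpack_from(fmt, payload, offset)[0]
                offset += struct.calcsize(fmt)
        return record

    return decode


KNOWN_DECODERS = {fingerprint(PID_CMD_SCHEMA): decode_pid_cmd}


def decoder_for(schema):
    """Use the baked-in decoder when the fingerprint is known, else build one."""
    return KNOWN_DECODERS.get(fingerprint(schema)) or build_decoder(schema)


# Usage: one known schema and one the tool has never seen before.
payload = struct.pack("<qI", 42, 4) + b"sshd"
fast = decoder_for(PID_CMD_SCHEMA)
unknown_schema = {"name": "load", "fields": [{"name": "avg", "type": "double"}]}
slow = decoder_for(unknown_schema)
```

Either path yields the same typed record; the only difference is whether the decoding logic was written ahead of time or assembled from the schema on arrival.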

This would work pretty well as an enhancement to standard POSIX CLI IO-streaming tools; they'd be able to use specific logic to optimize the parsing of a few schema-encoded types they "expect", while also faithfully (but less efficiently) handling data of any type by falling back to a generic parser routine. (By which I don't mean "falling back to treating the stream as text", but rather "falling back to treating the stream as the custom type that the sender specified." Sort of like how, in most languages, you can compile regexes that are available at compile time, but also have a regex interpreter for regexes received at runtime.)


