
Yep.

We were recently storing tokens in a database, and I chose to use SOH (Start of Heading) for the metadata and STX (Start of Text) for the text.

One byte wide, no collisions with printable text, and that's what they're there for.
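A minimal Python sketch of the SOH/STX layout described above. The exact record shape is an assumption, since the comment doesn't spell it out: here one stored value is SOH, then the metadata, then STX, then the token text. The helper names `pack_token` and `unpack_token` and the `pos=NOUN` metadata are hypothetical.

```python
# Sketch only: assumed layout is SOH + metadata + STX + text (hypothetical helpers).
SOH = "\x01"  # Start of Heading
STX = "\x02"  # Start of Text

def pack_token(metadata: str, text: str) -> str:
    """Build SOH + metadata + STX + text; neither field may contain SOH or STX."""
    for field in (metadata, text):
        if SOH in field or STX in field:
            raise ValueError(f"reserved control character in {field!r}")
    return SOH + metadata + STX + text

def unpack_token(packed: str) -> tuple[str, str]:
    """Split a packed value back into (metadata, text)."""
    if not packed.startswith(SOH):
        raise ValueError("value does not start with SOH")
    metadata, sep, text = packed[1:].partition(STX)
    if not sep:
        raise ValueError("value has no STX")
    return metadata, text

packed = pack_token("pos=NOUN", "hello, world\n")  # hypothetical metadata
print(repr(packed))          # '\x01pos=NOUN\x02hello, world\n'
print(unpack_token(packed))  # ('pos=NOUN', 'hello, world\n')
```

Commas and newlines in the token text need no escaping, because the only bytes with structural meaning are SOH and STX.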

I'd love to see a CSV replacement that used SOH ... STX for headers, RS as "commas", and GS as "newlines". You'd be able to cleanly concatenate multiple files, since the first line is no longer special, and you'd be able to have commas, newlines, and in fact any printable text whatsoever inside the data.

And the semantics are perfectly clean. Again, that's what they're there for. There are some small challenges for hand-editing, but a competent text editor can easily rise to them.
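The proposal above could look roughly like this in Python. This is a sketch under assumptions: the header sits between SOH and STX, fields are joined with RS, every record is terminated by GS, and the writer simply refuses values containing any of those four characters. `write_table` and `read_tables` are hypothetical names. Because each file begins with its own SOH marker, concatenating two outputs still yields two parseable tables.

```python
# Control characters from the proposal (ASCII 0x01, 0x02, 0x1D, 0x1E).
SOH, STX, GS, RS = "\x01", "\x02", "\x1d", "\x1e"
RESERVED = (SOH, STX, GS, RS)

def write_table(header: list[str], rows: list[list[str]]) -> str:
    """Hypothetical writer: header between SOH and STX, RS between fields, GS after each record."""
    for value in header + [v for row in rows for v in row]:
        if any(c in value for c in RESERVED):
            raise ValueError(f"reserved control character in {value!r}")
    out = [SOH, RS.join(header), STX]
    for row in rows:
        out.append(RS.join(row) + GS)
    return "".join(out)

def read_tables(data: str):
    """Hypothetical reader: yield (header, rows) for each SOH...STX section."""
    for section in data.split(SOH)[1:]:  # anything before the first SOH is ignored
        head, _, body = section.partition(STX)
        records = body.split(GS)
        if records and records[-1] == "":
            records.pop()  # drop the empty piece after the final GS terminator
        yield head.split(RS), [rec.split(RS) for rec in records]

a = write_table(["name", "note"], [["Ann", "commas, and\nnewlines"]])
b = write_table(["name", "note"], [["Bob", "$1,000"]])
for header, rows in read_tables(a + b):  # two files simply concatenated
    print(header, rows)
# ['name', 'note'] [['Ann', 'commas, and\nnewlines']]
# ['name', 'note'] [['Bob', '$1,000']]
```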



Me too. I think this could be achieved. But there would need to be good editor support to make it successful. And good keyboard support too.


There's a lot of value to the ecosystem here if it could be standardized; possibly it should be an RFC.

Combined with what rswail wrote about encoding hierarchies, and with careful design, those CSV sections could be embedded as tables.

If that were used as a base format for other formats, then the objection that the encoding of booleans and numerics isn't standardized might go away.


Looking closer at it... I think US would be the commas, and RS the newlines? That leaves to the imagination what GS could be used for...
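One way to read the US/RS suggestion is as the ASCII separator hierarchy, where US sits inside RS, RS inside GS, and GS inside FS. The Python sketch below encodes nested lists of strings by depth. Treating GS as the separator between whole tables is only one guess at the open question, and `encode`/`decode` are hypothetical names. An empty list and a list holding one empty string encode identically, so a real format would need a rule for that case.

```python
# ASCII information separators, innermost (US) to outermost (FS).
SEPARATORS = ["\x1f", "\x1e", "\x1d", "\x1c"]  # US, RS, GS, FS

def encode(value, depth: int) -> str:
    """Encode nested lists of strings `depth` levels deep (depth 0 is a plain string)."""
    if depth == 0:
        return value
    sep = SEPARATORS[depth - 1]
    return sep.join(encode(item, depth - 1) for item in value)

def decode(text: str, depth: int):
    """Inverse of encode(); assumes no leaf string contains a separator character."""
    if depth == 0:
        return text
    sep = SEPARATORS[depth - 1]
    return [decode(part, depth - 1) for part in text.split(sep)]

# Depth 2 is one table (US between fields, RS between records);
# depth 3 puts GS between tables, which is just one possible role for GS.
tables = [
    [["id", "price"], ["1", "$1,000"]],
    [["id", "phone"], ["2", "555-0100"]],
]
assert decode(encode(tables, 3), 3) == tables
```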


Hmm, I think it would be nice to reserve US for when you have multiple entries in a single column.

Like if the column was phone numbers and occasionally there's more than one, that sort of thing. Thinking of each cell as a "record" and allowing it to have more than one "unit" makes sense to me.

But anything would be better than having to whip up a script to fix a CSV with comma-separated dollar values in it ever again.
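A sketch of that variant in Python, assuming RS between cells, GS between rows, and US between the values inside one cell (the phone-number case). The function names and sample data are made up for illustration. The dollar amounts keep their commas without any quoting.

```python
US, RS, GS = "\x1f", "\x1e", "\x1d"

def encode_rows(rows: list[list[list[str]]]) -> str:
    """Hypothetical encoder: each cell is a list of units joined by US,
    cells in a row are joined by RS, and rows are joined by GS."""
    return GS.join(RS.join(US.join(cell) for cell in row) for row in rows)

def decode_rows(text: str) -> list[list[list[str]]]:
    """Inverse of encode_rows(); assumes no value contains US, RS, or GS."""
    return [[cell.split(US) for cell in row.split(RS)] for row in text.split(GS)]

rows = [
    [["Ann"], ["555-0100", "555-0199"], ["$1,000.00"]],  # two phone numbers in one cell
    [["Bob"], ["555-0123"], ["$2,500.00"]],
]
assert decode_rows(encode_rows(rows)) == rows
```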


That is an interesting thought. Perhaps some kind of nested structure could be arranged when this is needed. Like a CSV inside a value. C for "control code separated", of course :-)

Very thought-provoking... I think the main impediment is that these characters are neither visible nor easy to type. If they were, we might not have ended up with so many CSV variants.


Yes, the challenge is editor support.

What I'd want is an Emacs special mode that displays RS as a red* comma, GS as a plain newline, US as a red semicolon, and a regular newline as a red "\n".

Typing comma, newline, and semicolon would insert the control characters, while M-, etc. would insert the literal characters. I'm not sure exactly how to handle header lines, but this is the general premise.

*red as in "whatever method of visually distinguishing them as special works for you"
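A rough terminal approximation of the display half of that idea, written in Python rather than Emacs Lisp. It only renders text (no key bindings), uses ANSI red as the "visually distinguishing" method or brackets when color is turned off, and the `render` function is hypothetical.

```python
RED, RESET = "\x1b[31m", "\x1b[0m"  # ANSI escape codes for red text and reset
US, RS, GS = "\x1f", "\x1e", "\x1d"

def render(text: str, color: bool = True) -> str:
    """Hypothetical viewer: RS shows as a marked comma, US as a marked semicolon,
    GS as a plain line break, and a literal newline as a marked backslash-n."""
    def mark(s: str) -> str:
        return f"{RED}{s}{RESET}" if color else f"[{s}]"
    out = []
    for ch in text:
        if ch == RS:
            out.append(mark(","))
        elif ch == US:
            out.append(mark(";"))
        elif ch == GS:
            out.append("\n")
        elif ch == "\n":
            out.append(mark("\\n"))
        else:
            out.append(ch)  # ordinary characters, including literal commas, pass through
    return "".join(out)

print(render("a,b" + RS + "c\nd" + US + "e" + GS, color=False))
```

With `color=False` the call above prints `a,b[,]c[\n]d[;]e` followed by a line break, so a literal comma and a separator comma stay distinguishable.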



