> I think avoiding numeric types is a good decision. Only if this format is inte...

dan-robertson · on Oct 4, 2020

You’ve given lots of examples of things that make parsing numbers difficult but I don’t see why they are relevant to a config file written by humans. I think it makes sense to have the number parsing owned by the thing which cares about the number format.

One example you provide is decimals for currency values, but I claim you would want such values to look like $1234 in config files, so that when they are reviewed or written, the person reading the file knows they are looking at a dollar value and can be concerned if it is too large.

I’m not suggesting that applications write their own number parsing. Just do uint64::parse or parseInt or Double.of_string, or whatever else you need to access your language’s number parsing routines.
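Here is a minimal sketch of that approach. It assumes the config stores a dollar amount as a string like "$1234" and that the application owns how that string is interpreted. The `parse_dollars` helper and the `MAX_DOLLARS` limit are hypothetical. The standard-library `decimal.Decimal` does the actual number parsing.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical sanity limit chosen by the application, not by the config format.
MAX_DOLLARS = Decimal("10000")

def parse_dollars(raw: str) -> Decimal:
    """Parse a config value such as "$1234" or "$19.99" into an exact Decimal."""
    text = raw.strip()
    if not text.startswith("$"):
        raise ValueError(f"expected a dollar amount like $1234, got {raw!r}")
    try:
        # The language's own number parser does the real work.
        amount = Decimal(text[1:])
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}") from None
    # Reject NaN/Infinity first, so the comparisons below never see them.
    if not amount.is_finite() or amount < 0:
        raise ValueError(f"amount must be finite and non-negative: {raw!r}")
    # The reviewer's "is this too large?" concern, enforced in code.
    if amount > MAX_DOLLARS:
        raise ValueError(f"suspiciously large amount: {raw!r}")
    return amount

print(parse_dollars("$1234"))
print(parse_dollars("$19.99"))
```

The format itself never needs a currency type. The "$" prefix is a human-facing convention, and the range check is an application-level policy.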

jiggawatts · on Oct 4, 2020

Is the format written and read by a human?

> Just do uint64::parse or parseInt or Double.of_string, or whatever else you need to access your language’s number parsing routines.

Okay, so the computer is doing the parsing.

Those functions are notoriously inconsistent in their behaviour, particularly across different programming languages. If you're not careful, you'll end up accidentally using the internationalised versions of those functions. Even if you're careful, other people won't be.
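As a concrete illustration, using Python only as an example language: Python's `int()` and `float()` accept every input below without raising an error. A stricter routine in another language may reject several of them. For instance, Go's `strconv.Atoi` rejects the underscore, whitespace and Arabic-Indic digit cases.

```python
# Inputs that Python's built-in number parsers accept, even though a
# parser in another language (or a stricter spec) might reject them.
surprising_inputs = {
    "1_000": int,    # underscores as digit separators (Python 3.6+)
    "  42\n": int,   # surrounding whitespace is stripped
    "+7": int,       # explicit plus sign
    "\u0663": int,   # ARABIC-INDIC DIGIT THREE: any Unicode decimal digit counts
    "1e400": float,  # out of range: silently becomes inf, no error raised
    "nan": float,    # special values are accepted by name
}

for text, parse in surprising_inputs.items():
    print(f"{parse.__name__}({text!r}) -> {parse(text)!r}")
```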

Remember, data formats are for interchange. They have to be language agnostic. They have to be well-defined, and it should be possible to write a parser for them without having to guess at the precise details.
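Here is a sketch of what "well-defined" can mean in practice. It assumes the format adopts JSON's number grammar (RFC 8259) as its spec. That choice of grammar and the `parse_number_strict` name are illustrative, not taken from the format under discussion. Because the grammar is written out explicitly, an implementer in any language can reproduce exactly the same accept/reject decisions.

```python
import re
from decimal import Decimal

# The number grammar from RFC 8259 (JSON), written out explicitly.
# [0-9] matches only ASCII digits, unlike \d, which also matches other
# Unicode decimal digits.
NUMBER_GRAMMAR = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")

def parse_number_strict(text: str) -> Decimal:
    """Accept exactly the strings the grammar allows; reject everything else."""
    # fullmatch: no leading/trailing whitespace or newline is tolerated.
    if NUMBER_GRAMMAR.fullmatch(text) is None:
        raise ValueError(f"not a valid number under the grammar: {text!r}")
    # Decimal keeps the exact value; converting to float is a separate,
    # explicit decision for the caller.
    return Decimal(text)

print(parse_number_strict("-12.5e3"))
```

Inputs such as "1_000", " 42", "+7", "nan" or "01" are all rejected here, regardless of what the host language's built-in parser would do with them.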

The harmful consequences of the Robustness Principle are now well-recognised in computer science: https://tools.ietf.org/id/draft-thomson-postel-was-wrong-03....

Some things need to be done properly, or not at all.

megameter · on Oct 5, 2020

If you go fully against the Robustness Principle, you also lose the reason to use textual formats, since they are designed to be forgiving of human errors in input and to catch them in syntax.

And it is certainly OK in many instances to use a fixed-width, fixed-byte-order binary encoding as the format's basis. That comes with twin downsides: wholly different categories of errors crop up, and there is no universally agreed-upon tool for human entry.
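This small illustration shows both downsides. It uses Python's standard `struct` module to encode an integer as a fixed-width, fixed-byte-order value. The specific layout (signed 64-bit, little-endian) is an arbitrary choice for the example.

```python
import struct

# Encode a 64-bit signed integer in a fixed width (8 bytes) with an
# explicit byte order: "<" is little-endian, ">" is big-endian.
value = 1234
encoded = struct.pack("<q", value)
print(encoded.hex())  # least significant byte first

# Decoding with the agreed layout recovers the value exactly.
assert struct.unpack("<q", encoded)[0] == value

# A different category of error: a reader that assumes the wrong byte order
# gets no parse error at all, just a silently wrong number.
wrong = struct.unpack(">q", encoded)[0]
print(wrong)

# The hex dump above is also not something a person would want to type
# or review by hand.
```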

Perhaps text was a fashion, though. I have definitely had thoughts in that vein lately. If so, we shouldn't always rush to use it as the source of truth when we have many good, machine-level agreements about numeric formats.

Ntrails · on Oct 4, 2020

I wrote some config for my application, which knows how to read it. Why would I want some other application in some other programming language to read it too?

I am far more worried about localisation issues than language issues. If you are storing something central to multiple applications, I'd argue a text file is the wrong tool.
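The localisation worry becomes concrete once you see that the same text can denote different values depending on which separators the reader assumes. The `parse_with_separators` helper below is hypothetical and deliberately simplistic. It is not how any particular locale library works.

```python
from decimal import Decimal

# Hypothetical helper: interpret a number written with a given pair of
# thousands/decimal separators. Real locale handling is far more involved;
# this only shows why the same text can mean different values.
def parse_with_separators(text: str, thousands: str, decimal_point: str) -> Decimal:
    # Drop grouping separators first, then normalise the decimal point.
    normalized = text.replace(thousands, "").replace(decimal_point, ".")
    return Decimal(normalized)

text = "1.234"
en = parse_with_separators(text, thousands=",", decimal_point=".")  # English-style
de = parse_with_separators(text, thousands=".", decimal_point=",")  # German-style
print(en, de)
```

Neither reading is an error, which is exactly what makes this class of bug hard to catch.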

cycomanic · on Oct 4, 2020

But which of these are problems in configuration files written by a human? That is the aim of the format. Moreover, in applications where there could be issues, they would almost certainly be tied to very specific fields, and you would want application-specific logic to handle those fields. Now, if people misuse it as a data-exchange format or similar, then yes, I agree with you, but at that point just use a binary format instead.
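One way to read this suggestion is for the application to keep a table of field-specific parsers. The Python sketch below assumes the format itself only stores strings. The field names, limits and helpers (`parse_port`, `parse_ratio`, `load_config`) are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical per-field parsers: each application decides how its own
# fields are interpreted, while the config format itself only stores strings.
def parse_port(raw: str) -> int:
    port = int(raw, 10)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {raw!r}")
    return port

def parse_ratio(raw: str) -> float:
    ratio = float(raw)
    # NaN fails both comparisons, so it is rejected here too.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"ratio must be between 0 and 1: {raw!r}")
    return ratio

FIELD_PARSERS: Dict[str, Callable[[str], object]] = {
    "port": parse_port,
    "sample_ratio": parse_ratio,
}

def load_config(raw_config: Dict[str, str]) -> Dict[str, object]:
    """Apply the field-specific parser where one exists; keep other values as text."""
    return {
        key: FIELD_PARSERS[key](value) if key in FIELD_PARSERS else value
        for key, value in raw_config.items()
    }

config = load_config({"port": "8080", "sample_ratio": "0.25", "name": "web"})
print(config)
```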

jiggawatts · on Oct 4, 2020

> That is the aim of the format.

That doesn't matter at all. The author's aims will be ignored if this format is used for anything even vaguely important. Eventually it'll need tooling to both read and write it.

DevOps pipelines, applications with GUIs, or something else will need to both parse and generate this format in a consistent way.
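Once tools both generate and parse the format, "consistent" means that whatever a writer emits must read back as the same value. The short Python illustration below checks this. The fixed `%.6g` writer is a hypothetical example of an inconsistent tool.

```python
# A writer and a reader agree only if formatting and parsing round-trip.
# Python's repr() for floats produces the shortest string that parses back
# to exactly the same value.
values = [0.1, 1 / 3, 1e-7, 123456789.123456789]

for x in values:
    assert float(repr(x)) == x

# A tool that writes numbers with a fixed, shorter precision silently
# changes some of them when they are read back.
lossy = [float(f"{x:.6g}") for x in values]
changed = [x for x, y in zip(values, lossy) if x != y]
print(f"{len(changed)} of {len(values)} values changed after a %.6g round trip")
```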

There is no such thing as a human-write-only format in widespread use.

Even code in programming languages is regularly generated by tools such as RPC API code generators, LINQ-to-SQL and the like.

lsorber · on Oct 4, 2020

In my opinion, the best solution to these issues is to:

1. Declare numbers as numbers in the configuration language. For example, "decimal(1e1000)".

2. Parse declared numbers into a lossless representation like Python's decimal.Decimal.

3. Let users decide at their own risk if they want to convert to a lossy format like float.
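Here is a minimal sketch of the three steps in Python. It assumes the `decimal(...)` syntax from step 1. The regex and the `parse_declared` helper are illustrative, not part of any existing format.

```python
import re
from decimal import Decimal

# Hypothetical syntax from step 1: numbers are declared as decimal(<literal>).
DECLARED_DECIMAL = re.compile(r"decimal\(([^()]*)\)")

def parse_declared(raw: str) -> Decimal:
    """Step 2: parse a declared number losslessly into decimal.Decimal."""
    match = DECLARED_DECIMAL.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"not a declared decimal: {raw!r}")
    # The Decimal constructor is exact; context precision only affects arithmetic.
    return Decimal(match.group(1))

value = parse_declared("decimal(1e1000)")
print(value)         # exact value, printed in scientific notation

# Step 3: converting to float is the user's explicit, lossy choice.
print(float(value))  # far beyond float range, so this becomes inf
```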

GoblinSlayer · on Oct 5, 2020

Then you just roll your preferred serializer on top of the format in a properly composable form.