NestedText, a nice alternative to JSON, YAML, TOML (nestedtext.org)
302 points by nestedtext on Oct 3, 2020 | 284 comments


> data type does not change based on seemingly insignificant details (2, 2.0, 2.0.0, “2”)

…only because NestedText does not support numeric types at all. That seems like throwing out the baby with the bathwater.


I think avoiding numeric types is a good decision. It tends to eventually cause problems when one implementation converts numbers to doubles, another to either doubles or longs depending on whether they have a . or e, and another which converts them to bignums (or passes them as strings to the caller).

One should remember that any sane application will be parsing the config file into internal data structures and validating it anyway so it gets little benefit from the numbers being already “parsed”.

There are also issues when something looks numeric but doesn’t parse (eg 1.2.3, 3/2, 12in, 4h30m2s, 2:30, 2020-02-29, etc). One way to deal with these is a tokenisation rule like in Common Lisp: if it is a valid number syntax then treat it as a number, otherwise it’s a symbol, but this can lead to issues (eg you would need to know that when your number needs more than float precision or otherwise doesn’t follow the rules, it should be in quotes. It seems crazy to pass that detail on to the poor sod who has to write the config file).
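A rough Python sketch of that kind of rule (the helper name is made up) shows how it bites:

    def loose_scalar(token):
        # Lisp-style rule: if it parses as a number, it's a number;
        # otherwise it stays a string.
        for cast in (int, float):
            try:
                return cast(token)
            except ValueError:
                pass
        return token

    loose_scalar("2020-02-29")  # stays a string
    loose_scalar("1.2.3")       # stays a string
    loose_scalar("1e3")         # silently becomes the float 1000.0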


> One should remember that any sane application will be parsing the config file into internal data structures and validating it anyway so it gets little benefit from the numbers being already “parsed”.

The benefit of standard numeric and boolean types is that different tools can exchange data in a well-understood way.

Getting rid of yaml's 30 ways to write "true" and "false" by making everything a string just means that you now have 30 tool-specific ways to write "true" and "false".

The "everything is a string" approach already exists in shell scripts and TCL and it's not really that great.


Except YAML is straight-up wrong at times, unless you know all the edge cases. I just learned unquoted NO is coerced to False. A classic case of leaky abstraction/"bad magic".
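For the record, the classic demonstration with PyYAML, which resolves scalars per YAML 1.1:

    >>> import yaml
    >>> yaml.safe_load("countries: [GB, IE, FR, DE, NO]")
    {'countries': ['GB', 'IE', 'FR', 'DE', False]}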

Unless you have actual type annotations/tags (eg xml, jsonld, graphql), everything IS a string. There are no assumptions otherwise.


YAML 1.2 removed most of the nonsense booleans and only recognizes true | True | TRUE | false | False | FALSE.

https://yaml.org/spec/1.2/spec.html#id2805071

Of course, even though YAML 1.2 is a decade old, there are still many parsers that accept YAML 1.1.


Huh, I had no idea, thanks for the heads up!


That seems like a pretty sensible approach - eliminates the "Norway problem" altogether. I'd guess you have to use the "%YAML 1.2" directive in all documents in order to get it though...


You don’t need this directive to get YAML 1.2 parsers to parse your document as YAML 1.2; that’s the default. You only need it to instruct YAML 1.1 parsers to raise a warning (and I guess for parsers of future versions of YAML to fall back to 1.2 behavior, if that ever happens).

https://yaml.org/spec/1.2/spec.html#id2781553

https://yaml.org/spec/1.1/#id895631


> Unless you have actual type annotations/tags (eg xml, jsonld, graphql),

YAML has actual type annotations (tags).


Wait, really? *scrambles to check* woah...

I've never seen typed yaml, this is wild.

    negative: !!int -12
    zero: !!int 0
    positive: !!int 34
Can't say I love the notation, but indeed those are type annotations. I guess neither "yaml type hints" nor "yaml type annotations" is the right query. Had to search explicitly for "yaml tags".


The neat thing is that you can use tags to represent custom data structures or functions. For example, AWS Cloudformation uses YAML tags as a shorthand for template functions [0].

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...


Depending on the parser you can also use them to call arbitrary code. This used to be the default behavior of `pyyaml.load`.
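A sketch of why that was dangerous (the payload is from memory of PyYAML's !!python tags; current PyYAML only does this via the explicitly named unsafe_load):

    import yaml

    # Resolving Python-specific tags means parsing untrusted input
    # can execute arbitrary code:
    yaml.unsafe_load("!!python/object/apply:os.system ['echo pwned']")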


Types are !!screaming at the reader. Don't want to start blameshifting, but is yaml supposed to be a simple human-readable format?


It is screaming with double negations!


It's not screaming without double negations!!


I would hazard to say that any magic is bad in engineering. (Sorry Perl fans!)


There isn’t really any magic in Perl, just lots of unfamiliar lexicon if you come from more traditional programming languages. Perl’s problem is that its history is rooted in command-line usage, so there’s a tonne of inherited reserved single-character variable names and suchlike that are optimised for keystrokes rather than readability. However, you can certainly write Perl programs that don’t follow those older conventions and look more like a modern language.


> There isn’t really any magic in Perl

http://p3rl.org/guts#Magic-Variables

    perl -MDevel::Peek=Dump -mTie::Scalar -e'
        //g; Dump $_; tie $c => "Tie::StdScalar"; Dump $c; Dump \%ENV
    ' 2>&1 | grep MAGIC


That kind of magic in Perl has a different kind of meaning than "magic number/bool/string parsing" in YAML.

Perl tied-variable magic just means there are (effectively) getter and setter properties attached to the variable. "Magic" is just the name that was chosen in the implementation, and it stuck.

It's used to implement variables with special, automatic meanings, like $$ for "current pid" and $! for "last error".

It's also used to implement variables with user-defined behaviours on access, which is quite handy for a lot of abstractions.

A lot of modern languages support both of these things, because they are useful, but it's not called magic in those languages, it's called something like "watchers", "proxies", "getters and setters" or "hooks".

No, the criticism of YAML-style "magic" is that it leads to entirely surprising behaviour from innocuous input. Perl magic is not that kind. If you're using a special variable, you already know why.


That’s just playing around with the same reserved variables I spoke of before, and there is nothing magical about them aside from their silly name. In fact the opposite is true: they’re actually predictable and well documented. They just happen to have terse names as a throwback to command-line usage (eg you wouldn’t call $1 a magic variable in Bash because it happens to work the same as ARGV[1]).

In fact in Perl, you can opt for a longer, readable lexicon over the terse single-character variables; and that’s literally how modern Perl should be written.

Whereas the problem described with YAML is that it can automatically alter your data based on what the parser “thinks” the data should represent. Which is generally what people mean when they talk about “magic” in IT: systems that don’t honour your input and instead automatically convert it into something else. Perl doesn’t do this, even in spite of it looking like executable line noise to many.


I think "magic" is often just abstraction. And while abstraction is certainly necessary to speed up work and declutter the brains of the end users, too much abstraction takes control away and bad abstraction takes the wrong decisions in your name.

If you take a look at how string handling works in most programming languages there is a lot of "magic" going on there. Which isn't necessarily a bad thing, because most programmers don't want to deal with the intricacies of strings unless they really have to. The key is that this magic doesn't get in your way and doesn't do too-magical things nobody ever asked of it.


technology without magic? nonsense


Insert Arthur C. Clarke quote here.


That was the GP's point: YAML contains a lot of insanity, but let’s not throw out what it got right along with everything it got wrong.


Encoding meaning in whitespace is an abomination. That’s all I have to say about YAML.


Encodingmeaninginwhitespaceisanabomination.That’sallIhavetosay.


Encoding meaning in whitespace is an abomination. That’s all I have to say about Python.


> The benefit of standard numeric and boolean types is that different tools can exchange data in a well-understood way.

In Python:

    >>> import json
    >>> json.loads('{"x": 9007199254740993}')
    {'x': 9007199254740993}
In my browser's JavaScript console:

    > JSON.parse('{"x": 9007199254740993}')
    {x: 9007199254740992}
(Consider what happens if you try to send a tweet's ID, a perfectly normal number like 205052027259195393, through JSON. Or if you try to serialize a stack trace on a 64-bit system, where addresses are also perfectly normal numbers.)
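Python's json module at least lets the caller intercept number parsing, e.g. to keep big ids as strings end to end:

    >>> import json
    >>> json.loads('{"id": 205052027259195393}', parse_int=str)
    {'id': '205052027259195393'}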


I see what you are getting at and it is worrisome indeed; however, why would one encode an id as a simple number?


Why wouldn't you? It's a number. It's also sorted (up to about one second of inconsistency between Twitter worker processes on different machines), and you want to sort them by numeric sort, not by lexicographic sort as you would with strings.

I mean, there's a pretty clear argument here: Twitter themselves used to return these numbers as numbers in the API until they realized they were about to hit this problem. https://developer.twitter.com/en/docs/twitter-ids


Using numeric types as ids probably leaks into other areas, where being numeric is then assumed. When you want to change it at some point, you suddenly have to be very careful. A string could simply contain a new kind of id, so less refactoring effort would seem to be required. On the other hand, switching from numeric to strings might give you compiler errors where types do not match, so perhaps it might even make refactoring simpler.


You can always add a second or third key. Stop messing with the primary key. Many systems have a numeric primary key and then only expose a hash or UUID to the public.


>why would one encode an id as a simple number?

That's an incredibly weird complaint when the real problem is that javascript's JSON.parse doesn't use BigInt for large numbers.


I have very little understanding of JSON, so why would this happen?


It's more a JS thing: The lone JS number type is double-precision floating point, meaning beyond a certain range there are integers that cannot be accurately represented as a JS number.

The JSON standard doesn't place restrictions on size or precision of numbers, instead just noting that implementations can vary their treatment of and limits on numbers. While JS uses doubles for all numbers, many other languages emit an integer type for a JSON integer. So, once you go beyond the range where a double can accurately represent all integers, you run the risk of a mismatch in how the number is interpreted by different languages parsing the same JSON.

Of course the spec also allows you to create way too big or too precise numbers that would be problematic in most languages as well; it's just that this is a somewhat common bugbear.

I wouldn't necessarily call it a flaw in JSON though, more an issue with JSON.parse or really just a fact of life when dealing with numbers in JS. Alternatives to the built in JSON.parse exist to read large integers as strings or bigints.


JavaScript uses double as its number type. There's no such thing as integer or floating-point numbers per se, only the so-called Number.

The described issue is not a problem with JSON but with the engine that parsed it and the language behind the parser. Any config format will eventually hit the same result and the same issue.

So forcing the programmer to parse every single piece of data for the sake of "it's his responsibility" doesn't apply here.

I also disagree that it is in any way the programmer's responsibility to create a standardized way of parsing everything. This format gives you nothing but indentation, so you are forced to write documentation for every field: what type it is and what kind of values it takes. Lots of extra work for nothing when you could have any other format.


It's a Javascript problem. Integer precision caps out at 9007199254740991 (2^53 - 1), and beyond that you need BigInt. As long as you're working with BigInt literals, it'll be accurate, but when you convert back and forth you can lose precision.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Python promotes to bignum automatically.

A JavaScript number is floating point:

    Number.MAX_SAFE_INTEGER
    // 9007199254740991
    Number.MAX_SAFE_INTEGER + 1
    // 9007199254740992
    Number.MAX_SAFE_INTEGER + 2
    // 9007199254740992
no automatic promotion to BigInt

    9007199254740991n + 2n
    // 9007199254740993n


The example uses numbers outside the range of JS numbers (too many significant digits), so when the JSON is evaluated as a JS number some of the least significant bits are rounded. Personally, I think it is a mistake to use values in JSON that can’t be represented in JS, but the standard doesn’t explicitly forbid it.


JavaScript numbers are double-precision floating point numbers.


The solution to such problems is a better standard for the handling of numbers, not dropping support for numbers altogether.


> The benefit of standard numeric and boolean types is that different tools can exchange data in a well-understood way.

I don't think that's what this tool is for. This tool is for humans to read and edit data. That's a different use case from automated programs exchanging data, for which I agree you should be using standardized numeric and boolean data types and not making everything a string. But how that standardized data gets determined from data that humans enter should be up to the individual application.


Agreed. Serialization is not really possible with this tool, due to the following:

> A key that requires quoting must not contain both single and double quote characters.

You can't really serialize user data with that restriction.


It is still useful for humans to have a well-understood way to input typed data.

If you make everything a string then the interpretation of "no" as boolean true or false is left to each tool, and there are even tools which have different interpretations of "yes/no" for each field.


What is the use of this then? Humans don’t need structured data to exchange information and if this is insufficient to communicate it to the machine then it won’t be a great human to machine format either.


> Humans don’t need structured data to exchange information

Maybe we don't need it, but it often helps.

Also, this data format isn't necessarily for humans to exchange data with other humans, but for humans to give data to applications in a format that's much easier for humans to use.

> if this is insufficient to communicate it to the machine

Not at all. Each specific application can easily parse this data format according to its own needs. What this data format doesn't specify is a single translation into application data that is the same for all applications. But applications don't need or want that, because they have different use cases.


I see. That makes sense. Now that I think about it, if it's in front of some database or storage that only handles strings, it would work well too.


I think this is an “input format” vs “serialization format” issue. NestedText is meant for user input, so it’s focusing more on the UX of the syntax. In that case, having to parse strings is a reasonable task for developers to take on.


If NestedText is meant to be written by non-programmer humans, the details of null, undefined, 0, false, "NO", nil, or whatever else, would likely also be lost on them.

Most likely if their input is meant to be machine interpreted they would need to be trained to provide specific inputs anyway. I like that NestedText doesn't hide that problem. It lets the user organization decide how it wants to manage that problem, and what symbols or words are understood by the people authoring the files.


>you now have 30 tool-specific ways to write "true" and "false"

ISO8601 joins the chat.


> I think avoiding numeric types is a good decision.

Only if this format is intended for use-cases that never need to deal with numbers.

> One should remember that any sane application will be parsing the config file into internal data structures and validating it anyway so it gets little benefit from the numbers being already “parsed”.

That statement couldn't possibly be more wrong.

Number parsing (and encoding!) is a decidedly non-trivial problem. You need to concern yourself with -- at a minimum -- all of the following:

- Unsigned 64-bit numbers.

- A series of digits that would be bigger than a 64 bit whole number. Convert to float? Truncate in some way? Error?

- NaN

- Infinity

- Negative zero

- Denormal numbers.

- Differentiating between decimal/currency types and floating point numbers. Not all decimal values can be exactly represented as floats!

- Efficiently encoding floating point to use the minimum digits without losing precision.

- Parsing those minimal numbers with perfect "round-tripping".

- Doing the above efficiently.

- Securely too! Efficient parsers cut corners on sanity checks. I hope you fuzzed your parser...

The above can easily amount to many kilobytes of extremely complex code. Look up "ryu" as an example of what Google came up with to make JSON number parsing reasonably efficient.
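To make a couple of those bullets concrete in Python, whose float repr uses the same shortest-round-trip idea:

    >>> import json
    >>> json.dumps(0.1 + 0.2)     # minimal digits that still round-trip
    '0.30000000000000004'
    >>> json.loads(json.dumps(0.1 + 0.2)) == 0.1 + 0.2
    True
    >>> json.dumps(float("nan"))  # accepted by default, yet not legal JSON
    'NaN'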

Meanwhile, reading a fixed-length number from a binary format can be done in a single machine instruction. One. It might not even take an entire CPU clock cycle! Okay, two, if you need to bounds-check your buffer, but there are ways to avoid that.

Afterwards, the bounds check is again literally just two machine instructions in complexity. That's not the difficult bit!

The difficult bit is the parsing.


You’ve given lots of examples of things that make parsing numbers difficult but I don’t see why they are relevant to a config file written by humans. I think it makes sense to have the number parsing owned by the thing which cares about the number format.

One example you provide is decimals for currency values but I claim you would want such values to look like $1234 in config files so that when they are reviewed or written, the person reading the file knows they are looking at a dollar value and can be concerned if it is too large.

I’m not suggesting that applications write their own number parsing. Just do uint64::parse or parseInt or Double.of_string, or whatever else you need to access your language’s number parsing routines.


Is the format written and read by a human?

> Just do uint64::parse or parseInt or Double.of_string, or whatever else you need to access your language’s number parsing routines.

Okay, so the computer is doing the parsing.

Those functions are notoriously inconsistent in their behaviour, particularly across different programming languages. If you're not careful, you'll end up accidentally using the internationalised versions of those functions. Even if you're careful, other people won't be.
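Python's own built-ins make the point; whether any of these should count as a valid config value is exactly what a format spec has to pin down:

    >>> int("١٢٣")        # Arabic-Indic digits parse happily
    123
    >>> float("1_000.5")  # underscores allowed since Python 3.6
    1000.5
    >>> float("nan")      # also accepted
    nan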

Remember, data formats are for interchange. They have to be language agnostic. They have to be well-defined, and it should be possible to write a parser for them without having to guess at the precise details.

The harmful consequences of the Robustness Principle are now well-recognised in computer science: https://tools.ietf.org/id/draft-thomson-postel-was-wrong-03....

Some things need to be done properly, or not at all.


If you go fully against the Robustness Principle, you lose the reason to use textual formats as well, since they are designed to be forgiving of human errors in input and to catch them in the syntax.

And - it is certainly OK in many instances to have fixed-width, fixed byte-order binary encoding as the format's basis. It comes with the twin downsides of wholly different categories of errors cropping up, and with the lack of a universally agreed upon tool for human entry.

Perhaps text was a fashion, though. I definitely have had thoughts in that vein lately. And in that case we shouldn't always be rushing to use it as the source of truth when we have many good, machine-level agreements about numeric formats.


I wrote some config for my application, which knows how to read it. Why do I want some other application in some other programming language to read it too?

I am far more worried about localisation issues than language issues. If you are storing something central to multiple applications I'd argue a text file is the wrong tool


But which of these are problems in configuration files written by a human? That is the aim of the format. Moreover, in applications where there could be issues, it would most certainly be tied to very specific fields, and you would want specific application logic to handle that field. Now if people misuse it as a data exchange format, then yes, I agree with you, but at that point just use a binary format instead.


> That is the aim of the format.

That doesn't matter at all. The author's aims will be ignored if this format is used for anything even vaguely important. Eventually it'll need tooling to both read and write it.

DevOps pipelines, applications with GUIs, or something will need to both parse and generate this format in a consistent way.

There is no such thing as a human-write-only format in widespread use.

Even programming languages are regularly generated by tools such as RPC API codegen tools, LINQ-to-SQL and the like.


In my opinion, the best solution to these issues is to:

1. Declare numbers as numbers in the configuration language. E.g. "decimal(1e1000)".

2. Parse declared numbers with a lossless format like Python's decimal.Decimal.

3. Let users decide at their own risk if they want to convert to a lossy format like float.
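A minimal sketch of steps 2 and 3 with Python's stdlib decimal module:

    from decimal import Decimal

    raw = "2.30"               # the literal text from the config file
    exact = Decimal(raw)       # step 2: lossless, keeps value and precision
    assert str(exact) == raw   # the text round-trips exactly
    lossy = float(exact)       # step 3: conversion is the user's choice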


Then you just roll your preferred serializer on top of the format in a properly composable form.


I am of the opposite view: as long as there are no edge cases, supporting more types in data languages is good; it leads to failing faster.

Too many types could get overwhelming, but I like where Amazon's Ion is [1]. It actually supports multiple number types, with decimal being the default for values with a dot.

> you would need to know that when your number needs more than float precision or otherwise doesn’t follow the rules, it should be in quotes

Not really. The configuration value should either be a number or not, which is determined by the application reading the config. As a config writer you only care to make the type match (so, in json, if the application uses number you make sure you use number, and if it expects a string you use that)

(disclaimer: I work for amazon, but have nothing to do with Ion other than having used it. Opinion is my own, not my employer's, yadda yadda)

[1] http://amzn.github.io/ion-docs/


I wonder if you could just include explicit type information?

  name(str): Dave
  age(int4): 22
  dob(date): 2020-02-01
  photo(base64): TWFuIGl....
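On the application side that could be consumed with a small dispatch table; a hypothetical Python sketch (parse_line and the type names are made up):

    import base64, datetime

    CASTS = {
        "str": str,
        "int4": int,
        "date": datetime.date.fromisoformat,
        "base64": base64.b64decode,
    }

    def parse_line(line):
        head, _, value = line.partition(":")
        key, _, kind = head.rstrip(")").partition("(")
        return key, CASTS[kind](value.strip())

    parse_line("age(int4): 22")  # -> ('age', 22)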


That's not the only way to deal with them; you can specify types, like BSON does.

The nice thing about that is it solves the problem, rather than hoping it doesn't matter or assuming each program's validator will think to note all of the possible data types not natively compatible with the program. If a u32 is defined in the file and you've only got doubles to work with, it's a given you'll have to deal with it in your tool-specific validation. For everyone else it's well defined.

The downside is it's a bit more verbose and if you have all of that info already it's pretty easy to jump to just using a binary format which will be more efficient anyways.


My ideal config language would have unambiguous syntax for at least 64-bit signed integers and doubles (so, for example, the spec for the language requires 56 to be parsed as an integer and 56.0 as a double). Additional types would be ok too, as long as the syntax is unambiguous and obvious.


Both of those are true for TOML.

The spec requires that a full 64 bits of signed integer be parsed and understood, that hexadecimal, octal, and binary values be integers, and that floating point values be parsed as doubles.

It doesn't support hexadecimal float, however, which is a pity: having a guaranteed bit-identical format is a nice affordance.


> I think avoiding numeric types is a good decision. It tends to eventually cause problems when one implementation (...)

It seems you're mixing up the language definition with implementations that try to follow the language definition.

If different implementations have different results then either they are buggy or the language has some important holes in the specification.

Either way, the solution to this problem is not less validation.

> One should remember that any sane application will be parsing the config file into internal data structures and validating it anyway so it gets little benefit from the numbers being already “parsed”.

That's the whole point, isn't it?

I mean, if you already acknowledge the fact that this parsing and validation is a basic requirement, why handle it as an afterthought and force developers to add their own hand-rolled, absurd and unnecessary type checks and type conversions?

Wouldn't it simply be easier to let the language and the parser do that already?

I mean, no one ever complained that JSON had string types. In fact, one of the main complaints about JSON is that it doesn't support enough types, such as timestamps.


In the design of YAML, Ingy made the case that we shouldn't have types for scalar values, that they should all just be strings. His argument was that each application knows what it needs and should have the ability to direct how those scalars are processed. If the format needs to be standardized across applications, one can use a schema system layered on top. In retrospect, I think he was right and I was wrong. For example, Fast Healthcare Interoperability Resources (FHIR) serialized as JSON treat numeric values as decimals, e.g. `2.3` is not a floating point value. Moreover, since parsing numbers is slow, it should be deferred till you actually need to interpret the numeric value.


Indeed.

It's less of a problem if you're using it for configuration files, where the program knows which keys' values need to be cast to an integer or float.

But it seems disastrous if you wanted to use it for storing or transmitting data, above all between applications. You're immediately throwing out the possibility of being able to serialize and then deserialize data in basically any programming language.

I shudder at the idea of an API that accepted NestedText, where I'd need to worry about whether my floating-point output was compatible with its floating-point string parser. Yikes. I want the serialization format to handle that. Isn't a major criticism of JSON that it doesn't have a built-in datetime representation?


To be fair, this seems to be designed specifically for config files. Just as JSON should be used for data transmission but not for config, I could imagine, this should be used for config but not for data transmission.


What if someone inputs 2.2.5 when the code is expecting an int? Or “abc”? It seems like it’s pushing all the config validation into user code which sucks.


The code already has that problem. Certainly, if you're using YAML, a user could type 'version: 2.2.5' where you're expecting a major version number (an int), and all of a sudden your code is passed a string instead (or a string-flavored variant). You can imagine the same sort of problem in JSON too, usually from someone leaving off a quote where you're expecting a string. NestedText's philosophy seems to be, you're going to need your code to handle this anyway, we'll just pass you a string in all cases and it's up to you to convert it to an int on your own and validate it.

Frankly, in most languages, this is better because you don't have the types of objects randomly change based on user input. (In a few languages, with a few libraries, you can specify the type of the document to the parser and have it fail to parse the entire document if it can't deserialize to the right type, in which case this is a little weaker. But you can still do that with NestedText, just one step after the parser - have your own function that takes a ComplexStructure<..., String, ...> and returns a ComplexStructure<..., int, ...> or throws an error.)


> The code already has that problem.

Only in untyped or dynamically-typed languages. But even in JavaScript one may write +obj.version instead of obj.version to make it numeric. Evading this is a straight way to hell.

In a statically typed one, the conversion is typically generated from a description, and type checking applies right at read time.

The problem with a vaguely specified format shows up in simpler cases. Shall we accept 45x as a number (and what will its value be, 45 or 0)? 045? 045x? What date is 1/2/3, or 1-2-3? And so on.


In a statically-typed language, this is an entirely reasonable thing to want, but also, existing languages like JSON and TOML don't give you that either:

- There's no way to say that you want an integer; you get a floating-point value.

- See https://news.ycombinator.com/item?id=24676484 , you can't reliably accept integers over 2^53 without taking them as strings.

- Someone can always specify something of the actual wrong type. (Imagine changing YAML "version: 1.9.1" to "version: 1.10". You can't just stringify 1.10, you'll get "1.1"!)

So, in a practical data format, the schema for your document needs to say something like "This is a number, which must be an integer between 0 and 2^16" or "This is a string, make sure to quote it" or whatever, and a generic statically-typed JSON- or YAML-parsing library isn't going to handle that for you. And telling your users "the input format is JSON" doesn't answer that question: you must make it explicit to users.

Fortunately, you can handle it just fine in a statically-typed language in one of two ways. One is to accept an object from your parser that consists of variant types and pass it through your own function that validates it against a schema, and then either returns a more-restrictively-typed object or throws an error. Such a function could easily do string conversion too if given NestedText input, as I mentioned. The other is to pass some information into your parser saying, don't act like a generic JSON/YAML parser, instead interpret these particular fields in this particular way and accept only things with this structure. If you're doing that, you can easily tell the parser to use this particular string-to-integer function on the strings in NestedText and then return an appropriately-typed object containing an integer to you.
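For a single field, such a validating function might look like this in Python (a hypothetical helper, using the range from the example above):

    def require_int(value, lo=0, hi=2**16):
        # the parser hands us a string; validate and convert explicitly
        try:
            n = int(value, 10)
        except (TypeError, ValueError):
            raise ValueError(f"expected an integer, got {value!r}")
        if not lo <= n <= hi:
            raise ValueError(f"{n} is outside [{lo}, {hi}]")
        return n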


> Shall we accept 45x as number (and what it value will be, 45 or 0)? 045? 045x? What date is 1/2/3, 1-2-3? And so on.

I think the point is that the answers to those questions may well be application-specific. In which case it is better to not bake them into the file format.


You wouldn't use this for arbitrary object serialization. Lists can only contain strings, not other lists or dictionaries.

As for stringification, JSON's data types are mismatched with pretty much everything that isn't JavaScript to some degree. If you need to serialize a 64-bit integer to JSON, you serialize it as a string because the parser on the other end is probably going to try to parse it as a double-precision floating point number. Once you've started serializing numbers as strings anyway, it's not too far to "serialize every scalar as a string".


Well, there's nothing about the JSON spec that says you can't serialize 64-bit integers to JSON. The grammar allows unlimited precision. Implementations may vary.


> Lists can only contain strings, not other lists or dictionaries.

I don’t think this is true. None of the examples have lists containing non-string objects, but the documentation doesn’t seem to draw a distinction between lists and dictionaries wrt what can be placed in them.

(Both lists and dictionaries are initially described as only containing strings, and later this description is expanded to include nesting; this counterintuitive arrangement may explain the confusion.)


Ah ha, got it. The page never really makes it clear that this is for configuration files only, not serialization.

Indeed, not being able to have lists of dictionaries, or lists of lists, is very restrictive. Seems to be for very simple configurations only. E.g. a set of preferences, but not a set of monitor calibrations. (Inventing arbitrary dictionary keys seems pretty hacky.)


So, this is basically YAML, "but better". I can repeat once more that "easily understood and used by both programmers and non-programmers" is an unapologetically stupid concept that can never succeed. So I see how all of this will sound all too familiar to anybody with a little experience, which makes them automatically dismiss this YAYAML.

But YAML is really quite complicated, and JSON (which shouldn't be used for config files at all) and TOML (which I love and wish it would gain more popularity) aren't exactly alternatives to YAML. So, I would be actually totally ok with "YAML, but better", as a way to deprecate YAML.

Now, it is clear from the start that this cannot deprecate YAML, because it doesn't even have booleans and numbers. But, surprisingly, I can accept this as well: ok, let's just assume that being good at dealing with strings may be enough.

The problem is, it isn't clear at all from the docs, if this is better than YAML at anything. It raises dozens of questions. I'll start with the most basic ones (using [] as a wrapper/delimiter): how do I represent values [ a], [a ], ["a"] and [""] in this file format?


>basically YAML, "but better"

That was my intention behind this, too:

https://github.com/crdoconnor/strictyaml/

The general structure of YAML is fine I think but its feature set grew a little bit out of control.

The "cleanliness" of the format leads to one of its inherent weaknesses - syntax can't be used to encode type information so you either need a schema to encode type information (strictyaml approach) or have magic conversions (yaml approach) or to assume strings (strictyaml w/o schema/nestedtext).

The interesting thing I discovered about schemas building this is that it kind of pays to make them extensible and build them in a turing complete language. Schema validation done using a non-turing complete language (e.g. jsonschema) allows for cross language usage but it ends up being a kind of blunt object.
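For a flavour of the schema approach, as far as I recall strictyaml's documented API:

    from strictyaml import load, Map, Str, Int

    schema = Map({"name": Str(), "age": Int()})
    load("name: Dave\nage: 22", schema).data
    # -> {'name': 'Dave', 'age': 22}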


What kind of values are [<whitespace>a] and [a<whitespace>] supposed to be? They look like typical YAML syntax traps to me.

Why should JSON never be used for configuration? It is sufficient for declaratively expressing anything I have encountered. Do we really need references or other stuff from YAML? For configuration this seems unnecessary, provided that the program which interprets the result of parsing the JSON is well written.


> What kind of values are [<whitespace>a] and [a<whitespace>] supposed to be?

I don't understand the question. They are supposed to be exactly that, [<whitespace>a] and [a<whitespace>]. I assure you, I've encountered many situations where whitespace at the beginning or the end of the value is actually meaningful, for reasons you (the creator of the app) have no control over.

> Why should JSON never be used for configuration

Many reasons, actually, but the most important (IMO) being that the original JSON specification doesn't support comments, nor do most actual parser implementations. A configuration file that doesn't support comments is trash and causes very real inconvenience for users. Using additional key values for comments (even if such an atrocity doesn't bother you conceptually) isn't a solution in many cases (for example, when your intention is to comment out a list item).


> original JSON specification doesn't support comments

Just for the record, JSON initially had comments and they were later removed according to Douglas Crockford.


It is only now that I realize that [<whitespace>a] and [a<whitespace>] are really supposed to be one string inside a list. However, I already see problems with this kind of syntax:

[<whitespace>a,<whitespace>b]

How will this be interpreted/parsed? Will there be a whitespace before "b" after parsing? That would mean that I am not able to visually separate list elements more clearly by adding a whitespace between them, which is widely considered good practice in programming languages, for readability.

The next thing is that whitespace on the same line is added to the string, but what about a list defined across multiple lines? There we do not add the line breaks and indentation to the string. It's not consistent in this way.

So I personally would never write a string like that. I would always make use of quotes in such situations, and probably in YAML in general, simply to make it clear that I do wish to have the leading whitespace in the string, and that it is not simply a typo resulting from removing a former first element from the list.

JSON is limiting, but for configuration I think its kind of limitations are often good. With comments in JSON I am still not sure, because sometimes I'd like to write them there, but would not like to include another dependency only to be able to parse away the comments from JSON. Then I'd better write good docs elsewhere.


Or you can do it in this style, but yeah, good point. I can see now why a lot of Apple config files are still XML. https://www.freecodecamp.org/news/json-comment-example-how-t...


JSON lacks comments and will fail for a missing or extra comma, so it's not great for configuration written by humans.

You can use HJSON which is the json with comments. It's fully compatible with json so easy to introduce into anything that does json. https://hjson.github.io/


Failing with an extra comma also makes it harder than necessary to write JSON by machines.


JSON5 also supports comments and multiline strings with `\`-escaped newlines: https://json5.org/

Triple-quoted multiline strings like HJSON would be great, too.


I'd rather take "extra comma" failure than "extra space" space failure. First one can be caught by any IDE, second one will take you a couple of minutes to find out (when building CI for example).


[<whitespace>a] could be a markdown string that begins with code.


From "The description of YAML in the README is inaccurate" https://github.com/KenKundert/nestedtext/issues/10 :

> I will mention something else. The section about the "Norway problem" is not quite accurate. Some YAML loaders do in fact load no as false. These are usually YAML 1.1 loaders. YAML 1.2's default schema is the same as JSON's (only true, false, null and numbers are non-strings).

> Any YAML loader is free to use any schema it wants. That is, no loader is required to load no as false. Good loaders should support multiple schemas and custom schemas. The Norway problem isn't technically a YAML problem but a schema problem.

> imho, YAML's biggest failing to date is not making things like this clear enough to the community.

> Note: PyYAML has a BaseLoader schema that loads all scalar values as strings.
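In other words, with PyYAML the two behaviours are one loader argument apart:

    >>> import yaml
    >>> yaml.load("country: NO", Loader=yaml.BaseLoader)
    {'country': 'NO'}
    >>> yaml.safe_load("country: NO")  # SafeLoader keeps the 1.1 resolution
    {'country': False}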


I have an even stupider question: How do I have a key with a colon at the end?

    this:: right
    this: left
Will my program be able to tell its right from its left?


> I'll start with the most basic ones (using [] as a wrapper/delimiter)

So far as I can tell, this doesn’t use [ or ] in its syntax at all, so all the values you give would be represented exactly as strings without any problem.


> So far as I can tell, this doesn’t use [ or ] in its syntax at all

Yes, that's why the GP chose those characters to delimit their example strings. I'll try again using ` characters to delimit the example strings.

> so all the values you give would be represented exactly as strings without any problem.

But what would those string representations hold? If I parse the below strings in a javascript context, what would I get?

    ` a`, `a `, `"a"` and `""`
Do I get this?

    " a", "a ", "\"a\"" and "\"\""
Or do I get this?

    "a", "a", "a" and ""
Or do I get something inbetween?


OK, I got it. I still don't see the problem. The examples include strings starting with whitespace (One of the multi-line strings), so that's not a problem. Single line strings start with the first character after the colon and a space. If that happens to be a space character, so be it. Strings are terminated by a newline, so trailing spaces aren't a problem. Quote characters are just characters with no special meaning in this format, so those aren't a problem either. Unless the library implementing this format has a bug, there shouldn't be any issues.


You know what I want? Schemas. And clear error messages.

I want to know beforehand what I can put in a config file and I want a fast and hard failure if what I put in there is not good.

And this should be implemented at the file format parser level, with hooks for apps to add on top of the default behavior, so that every app that implements this format gets these things almost for free.


Haven’t you described cap’n’proto, protobuf, thrift, flatbuffers etc?

I know cap’n’proto also has fantastic support for using the schema for config files. You can just compile any constant as a stand-alone serialized message that you mmap into your code in a safe way. It can’t do complex math and things (at least yet) but you can express lists, dictionaries, and reference other constants, so as a config file replacement I love it. I’ve also found the format to be far more regular and consistent than you get with things like text protobuf (you’re still using the schema language instead of another format)


> You can just compile any constant as a stand-alone serialized message that you mmap into your code in a safe way.

Are you suggesting using a binary format for your config files? I think most people would find that more trouble than a decent text format.

> ... than you get with things like text protobuf

You can just use protobuf's canonical JSON representation (though the lack of ability to use comments is annoying).


You store your configuration as plain text in your repository and whatnot. When it comes to deployment you just compile it to a binary file.

Cap’n’proto also has plain text and JSON serialization formats if you really want to have your deployed config file be directly human-editable and deserialize from that. I was just noting a very cool feature of having your config written in cap’n’proto and it’s what Cloudflare uses to maintain a bunch of config internally if I read Kenton’s allusions to it correctly.


Just to be clear, I'm saying you use Cap'n'Proto constants to store your schema: https://capnproto.org/language.html#constants

You can then compile it into whatever format (JSON, plain text, binary) that you want for actually reading it from disk.


I think the parent is trying to say that the data is stored in a map which is read into a proto, etc. Kinda like what gRPC does over HTTP. Which kinda makes sense. The schema gives you a great idea of what "should be", and the typing/errors/etc are understood by the host language.


We had that a decade ago. It was called XML and XML Schema. All IDEs support it.

JSON was a huge step backwards in the name of simplicity. And now when we are going to add similar functionality to JSON, something else is going to come out in the name of simplicity (like NestedText).


I think XML sort of failed simplicity.

In a minute I can read and write json from most languages I use.

In the same amount of time, I'm still wondering if I should use a tag or attribute in xml. cdata? expat?

It's not that xml isn't a good technology. It's that it's not appropriate for general use, especially in comparison to simpler alternatives.


If your child node has unique name among its siblings and does not contain nested nodes, then it's an attribute. Otherwise it's an element. Seems pretty obvious to me.

The fundamental issue with XML is its impedance mismatch with common data structures which forces using Object to XML mappers (whether explicitly or implicitly). It's more or less solved with XML Schemas or DTDs, but if you're looking at just XML, you can't tell whether some element is an array or a single node. Thus JSON is better suited for serialization.


> If your child node has unique name among its siblings and does not contain nested nodes, then it's an attribute. Otherwise it's an element. Seems pretty obvious to me.

That is really not what attributes are for. I feel a bit of a fraud posting that because I'm not an XML expert and so not really clear what they actually are for. (This reinforces the parent's point: you need to be an expert to know what such a fundamental feature is for.) I remember it's something like "something used to help interpret the actual value", e.g. units of measurement. But most of the time, even if it's non-repeating with no children, you're supposed to use elements rather than attributes.

One problem here is that attributes are so much more compact (and so often easier to read) than elements that it's tempting to use them in places where you ought to use an element (and many people over time have given in to that temptation). Another problem is that the distinction between attributes and elements is almost never useful. That was the parent comment's point by the looks of things.

> The fundamental issue with XML is its impedance mismatch with common data structures

That's probably part of it, but I think at least as problematic is that it has many features that most of the time you don't need and don't want to have to care about. Things like CDATA (also mentioned by the parent comment), custom entities, external entities, DTDs (which can be inline in XML files so you need to know all about DTDs to understand XML properly). That's why there are all sorts of weird XML vulnerabilities that JSON doesn't have. Did you know you can make an XML file that reads your /etc/passwd file when it's parsed? That is not an issue with JSON.


HTML tags and attributes are markup. Strip them and the document would still be legible to a human being. Markup is the non-human part: presentation, the semantic web.

Confusion arises once the human observer is lost.


Thanks, I found this explanation really helpful, and almost obvious in retrospect (as the best explanations often are!).

I had been thinking that all of these extra features that XML have are just a case of massive overengineering that no one would ever need. In fact it's a case of taking something fundamentally meant for text documents with extra markup, as the name implies, and misapplying it to config files and IPC messages which are just not the original domain at all.


Thank you.

I think we should draw on XML's strong points. People read articles in a browser, not plain text. "Add to cart" is just a POST request with an id

    curl -d id=foo
yet we have forms and interactivity. Like in literate programming, text and data live together, an interactive application like a Smalltalk image.

In XML we can separate data from presentation.

    <?xml-stylesheet type="text/css" href="foo.css"?>
    <?xml-stylesheet type="text/xsl" href="bar.xsl"?>
    <root>...
Machine receives data, human receives an application with documentation, a builder. That's exactly what we have today, except the UI can be plugged into any stored document. Too good to be true.

I think XML was killed by poor usability. Plain text XML, XHTML and XSLT authoring is not fun.

I am trying to uncover it from a DOM perspective [1]; so far I like it more than Markdown. XHTML and HTML are just serialization formats. HTML is not a good one [2], [3], [4]. XSLT may have a nice GUI or a compact syntax like RELAX NG.

[1] http://sergeykish.com/live-pages

[2] http://sergeykish.com/script-style-is-cdata-in-html

[3] http://sergeykish.com/pre-newline-ignored-in-html-test

[4] http://sergeykish.com/content-after-html-appended-to-body-in...


> Did you know you can make an XML file that reads your /etc/passwd file when it's parsed?

Not only can SGML (but not XML on its own) read /etc/passwd, it can format it into fully-tagged markup and then render it into eg an HTML table, demonstrating what SGML/XML is actually designed for: encoding and authoring semistructured text. This can't be overstated in discussions like these, where use cases for config formats, service payload formats, and actual text authoring are all thrown into the same basket when they shouldn't be.

Btw: you can parse and canonicalize this new config file format into markup using the same SGML mechanism you'd be using for CSVs like /etc/passwd, namely short references

Btw2: you can skip/ignore markup declarations in XML, including whole declaration sets (DTDs), since these can be recognized using plain greedy regexps, though you can't ignore entity declarations when actually used in your XML body text


> you need to be an expert to know what such a fundamental feature is for.

No you don't... the parent commenter explained to you what it's for in a simple and concise manner... you chose not to accept that even though you're not an expert in this, and then complain that you need to be an expert to do it?!?


The parent commenter gave an explanation that, yes, was simple and concise, and also good enough for you to believe it (or you already thought that way). But it's also wrong. That just reinforces my point.

(The true difference is explained in sibling comments to yours, by sergeykish and tannhaeuser, if you're interested.)


The parent commenter explained it in a somewhat obtuse way.

I don’t doubt they meant to be clear, but reading it they were not, and it raised more questions than it answered.

As an example:

Wouldn't attributes be better served as details about the current element?

Wouldn’t elements be better served as “I am a child of the parent”?

Why would I use an attribute as a "non-repeating child" when semantically that doesn't make sense when looking at the document? The attribute is inside the element's definition, and it seems to me attributes should be used to further describe the element being presented itself, and not be structural or describe a child in any way.


JSON Schema [1] is actually a mature standard now, with decent tooling support, mostly through OpenAPI (formerly Swagger), which extends it with support for endpoints.

It's much simpler to use than XML Schemas, and arguably results in cleaner data models, since it doesn't have anything analogous to XML namespaces that allow for arbitrary mixing of schemas.

[1] https://json-schema.org/


> We had that a decade ago. It was called XML and XML Schema.

It would be true if XML were not full of all this SGML debris like "entities" (really, uncontrolled macros), if real schema formats were flexible enough (I needed <c> inside <a> and <c> inside <b> when they are totally different), etc.

But when a config reader tool has to deal with the 40+-year legacy of enterprise guys wanting to embrace the universe, yet all this doesn't allow controlling contents without external measures like regexp checking... it simply falls apart facing the real world.


Magento is a popular codebase that made XML-based configuration a fundamental part of its architecture. The results were terrible and caused numerous headaches and countless hours lost to trying to troubleshoot inscrutable configuration issues. The Magento 2 codebase began a shift away from XML for configuration, although it still uses some.

There may be room for an argument that Magento did XML badly (it did many things badly), but I don't believe I've ever seen XML done well.


I love XML configuration in Spring.


I don't get it. The @Configuration and @Bean annotations are at least 100 times more readable and powerful than whatever garbage people used to write into their xml files to define beans. 20 lines of xml are often equivalent to like 8 lines of Java and each of those Java lines is shorter than the xml equivalent. Repeating closing tags is not very interesting.


We have it today for JSON, it's called JSON Schema and many IDEs support it.


Exactly. JSON Schema allows one to describe exactly how the JSON should look, including inter-field validation. And with tools like react-jsonschema-form you can generate a UI on top of it for free.
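For example, with the Python jsonschema package (a sketch; the schema itself is made up):

    import jsonschema  # pip install jsonschema

    schema = {
        "type": "object",
        "properties": {
            "port": {"type": "integer", "minimum": 1, "maximum": 65535},
        },
        "required": ["port"],
    }

    jsonschema.validate({"port": 8080}, schema)    # passes
    jsonschema.validate({"port": "8080"}, schema)  # raises ValidationError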


I spent years working with xml, xslt, xml schema. Frankly when I first saw json I thought it was terrific. Nothing has changed my mind since. Why do you feel like it is a huge step backwards?


XML is fatally flawed because you can't safely put one XML doc inside another one. Because of this rather fundamental problem, it never was any good for anything, and it never will be.


Sure you can. At work we talk to a system that requires that we do exactly this. The solution they chose is entirely trivial and safe: include the embedded doc as a base64 encoded string...

And yes, I'm being sarcastic.


I don't understand why you can't safely put one XML doc inside another one. Many XML formats are literally built using this feature, like SOAP.


SOAP was and is an epic disaster, so that hardly seems like a refutation. The known way to embed an entire XML doc into a SOAP message was to use CDATA, which isn't a general solution because it means the embedded doc can't have ]]> in it anywhere. You could also base64-encode the included doc.

Both of these solutions and all other known solutions to this problem are, as I'm sure you can see, just awful.

You can't just paste XML in XML because of the <?xml?> thing, because of entities, and because of half a dozen other misfeatures of XML.


You miss the point entirely.

You put XML fragments inside a parent XML document using namespaces.

This is very well supported, and used extensively.

Trying to "escape" XML to nest it in a parent XML document is Wrong with a capital W.


> You put XML fragments inside a parent XML document using namespaces.

could you post or link to an example? i'm not very familiar with advanced XML features

or for a simple example: what would it look like to put `child` into `parent` using namespaces?

  # parent-doc.xml
  <parent>
    <!-- embed here -->
  </parent>

  # child-doc.xml
  <child x="3" y="5"/>


Roughly speaking, you can do things like the following:

    <!-- The special XMLNS attribute binds a short alias to a long name -->
    <p:parent xmlns:p="urn:some:unique:string">
        <c:child xmlns:c="urn:some:other:child:name" x="3" y="5">
            <c:subchild> <!-- No need to repeat the fully qualified unique name -->
                <p:tada>You can even interleave!</p:tada>
            </c:subchild>
        </c:child>
     </p:parent>
Note that while this is possible to write by hand, typically namespaces are for documents generated and processed by tools. The XML Schema Definition (XSD) format has full support for namespaces, so you can define documents based on modular chunks. E.g.: you can "import" the SVG namespace into a diagramming XML document format namespace, but restrict its usage to only the child nodes of an "img" tag. Or MathML as the children of "graph" nodes. Both SVG and MathML can potentially import a shared "font" namespace. Or whatever.

In the XML Reader API, each element has a "fully qualified" name that includes the long namespace prefix. If you use the API correctly, your tool can handle nested documents, or gracefully ignore them if it's appropriate.

The fiddly part is making this efficient, i.e.: avoiding a full string comparison against a long URI or URN. You typically have to "register" the namespaces you're interested in, and the API gives you some sort of efficient token instead of a string to use from then on.

I'm not saying it's perfect. Nothing is in XML. It was designed by committee, it brought too much of the legacy SGML baggage with it, but its namespace capabilities are a lot better than nothing at all, in much the same way that C# or Java don't have perfect type systems, but they're superior to loosely typed languages.


You don't embed plain text XML in CDATA, right? You escape it

    function escapeXml(unsafe) {
        return unsafe.replace(/[<>&'"]/g, function (c) {
            switch (c) {
                case '<': return '&lt;';
                case '>': return '&gt;';
                case '&': return '&amp;';
                case '\'': return '&apos;';
                case '"': return '&quot;';
            }
        });
    }
Or you convert to the same encoding, strip the XML declaration, expand entities. In short, work with adequate tools.


Good for nothing?

Well, except for handling complex content documents like in all ebooks and, in sgml form, all webpages like this one.


XInclude works pretty well.



Came here to say the same: Cuelang is by far the best config system and paradigm I have tried. All else seems so last century, though Cuelang has its foundation in NLP systems from last century :]


Never seen this, it's awesome! Might be an improvement over jsonnet, which was my favorite approach


Slightly off-topic, but yes, having fail-fast deserialisation is great.

I wrote a JSON/Kotlin serialisation library once and purposely restricted some JSON features to achieve that:

1. Fields can arrive in any order - this is standard

2. Field names are matched case-insensitively - so keyA and keya are the same, because who would use two fields differing only by case? Serialization keeps the original casing of the name.

3. Missing fields throw an error. If they are nullable, they have to be explicitly set to null - so that you can be sure the serialization side upgraded to the latest version of a protocol if a field was added, and things don't just work by chance.

4. Nullable strings are not coerced to empty strings or anything like it. Kotlin is null-safe, so if a field is a non-nullable string, an actual string value (even just "") has to be supplied. If it's, for whatever reason, a nullable string, you can set it to null.

5. Enums are also serialized case-insensitively - so you can write "keyA": "eNumVaLuE" if you want - typos should not break the code here; no one would use two enum values differing only by case. IIRC booleans could also be TRUE, tRuE, truE etc. (but NOT t or f, or yes or no, or 0 or 1 or empty).

6. Superfluous properties are silently ignored.

These rules were a great tradeoff for quick development, mixing languages and having fail-fast behavior with a stable protocol.

(https://medium.com/@fabianzeindl/generated-json-serialisatio...)


JSON schemas are available for a number of JSON/YAML config formats from JSON Schema Store[0]

[0] https://www.schemastore.org/json/


> You know what I want? Schemas.

I can see this working perfectly fine in typed languages like C#: `NestedText.Deserialize<T>("nestedtext")`, where the deserialize method handles the actual mapping of NestedText objects to `T`, by providing the deserializer a class (or classes) that handles the string -> scalar(s) mapping for the given `T`. That would, sort of, function as a schema.
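
A rough sketch of the same idea in Python (all names here are hypothetical; NestedText itself ships nothing like this): the "schema" is just a table of per-field converters applied to the parser's all-strings output.

    # Hypothetical schema: a mapping from field name to converter.
    SCHEMA = {
        "port": int,
        "timeout": float,
        "enrolled": lambda s: s.strip().lower() in ("yes", "true", "y"),
    }

    def apply_schema(raw, schema):
        # Fields without a converter stay as strings.
        return {key: schema.get(key, str)(value) for key, value in raw.items()}

    apply_schema({"port": "8080", "timeout": "1.5", "enrolled": "NO"}, SCHEMA)
    # -> {'port': 8080, 'timeout': 1.5, 'enrolled': False}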

I think the only thing, from glancing over the project, that would need to be supported to make this really useful is nested lists/dictionaries. I don't see how this can be done but maybe I'm missing it.


You can always do that, defining the schema in the client to produce sensible checks, even with JSON. The problem is that wherever the spec is underspecified is another place where two different clients can deserialize differently, and both be correct.

And the problem with stringly typed systems is that everything is underspecified


Protobufs have a text representation.


Yes indeed - it's actually pretty nice. You just define a message for your configuration schema:

  message Config {
    repeated Server server = 1;
  }

  message Server {
    string address = 1;
    int32 port = 2;
    bool standby = 3;
  }
And then you use the text representation in a config file:

  # main instance
  server { address: "127.0.0.1" port: 4567 }
  # backup instance
  server { address: "127.0.0.1" port: 9876 standby: true }
And load it into a message instance:

  Config config;
  google::protobuf::TextFormat::ParseFromString(input, &config);


Unfortunately, it is undocumented and has no formal spec, and this appears to be intentional, with no plans for improvement: https://github.com/protocolbuffers/protobuf/issues/3755.


Wow, I use pb's a ton and didn't know this. I'd upvote this twice if I could!

It looks oddly like HCL. I wonder...


As of protobuf 3 they also have a canonical JSON representation, which you can access from all the supported languages.


You want XML from 15 years ago? Yes, me too. Schemas and includes.


I've used XML. I don't want namespaces, I don't want the verbosity, I don't want entities, I don't want the security vulnerabilities.

I should have mentioned that I want something simple and readable.


Like in Windows, where you configure by clicking check boxes that can get disabled if invalid, with tooltips explaining what they do, additional help if you press F1, etc.?

It would be nice if we had such tools.


There's JSONSchema, and there are GUIs for handling/inspecting them.


This seems on its face to be a significant improvement on the goals of YAML, but I think the tradeoffs it makes will likely move YAML’s problems into a different place, creating a whole different set of difficulties in understanding what a given piece of data is, means, or does.

The problem with human friendly formats is that the thing that typically makes them human friendly is removing things that make reading and editing difficult, but make disambiguation possible. If the format ever needs to be read by a machine, something has to do that disambiguation.

If it’s not provided by the format, you’ve turned every usage into a potential source of bugs that would otherwise be restricted to interchange/stack implementation incompatibilities. In other words, now your format can have a different set of expectations even on the same system.

The natural response to that problem will be to bolt on validation, types, and documentation that is provided arbitrarily (and with varying quality).

IMO, efforts in human friendly formats should focus less on stripping out funny characters, and more on which minimal set of funny characters provide:

- Good readability

- Good editability

- Clarity of structure

- Clarity of data types

- Reasonable tolerance and flexibility for variance in arbitrary formatting/style preference (particularly in delimiting long form/multiline text and annotations), because no one can agree what good readability or editability means

- A flexible type system that allows machines and humans to know what a given datum is without variation or surprises

- Maybe humans should just use a GUI?


I generally find that the biggest problem with human friendly formats like YAML, which I think this also has, is that they tend to decouple readability from writability, and this encourages all sorts of complexity and polymorphism that seem superficially expressive, but end up just being difficult to work with. I've seen so many cases where YAML schemas turn into a quasi-DSL, because the developer thought that it was more important to have a clean looking configuration than one that is easy to edit. The result is that things like indentation get really weird, because the developer didn't optimize for having a sane underlying model.

A great comparison for this is CircleCI's config syntax and that used by GitHub Actions. The Circle format is extremely error prone; about half the time when I'm modifying a Circle config, I'll end up pushing a broken config, even though the YAML syntax itself is valid. With the GitHub Actions format, I almost never screw it up. I don't think it's a coincidence that if you convert a Circle configuration to JSON, it looks twisted and bizarre, whereas if you do the same with a GHA config, it looks perfectly ordinary and sensible.

If you think of YAML as "a prettier version of JSON", and design as if your users will work primarily with JSON, you can do fine with it. If you think of it as a medium for building your own configuration language, you'll make something awful. The problem is that any human friendly format is going to inherently encourage the latter.


See also the travesty that is Ansible's YAML-based DSL, which includes fun stuff like an in-line replacement language with tokens enclosed in braces, which of course you have to quote in some cases so that pyyaml doesn't think they are dicts.


This is basically my goal with https://concise-encoding.org

Also one more goal is twin binary and text formats that are 1:1 compatible, so that you can write it in text and transmit in binary.

I'm still finishing up the reference implementation, and then will start on the schema.


I’m getting fed up with this constant reinvention of the serialization game. Back in the day I used to be skeptical of IT and informatics precisely for this reason: always arguing about slight variations of the same mundanities: XML, XSD, IDL, ASN.1, Avro, JSON... Emacs vs. Vim, Weakly vs. Strongly typed, and so on...

What can an ICT professional claim at the end of their career? “Hey, I’ve argued about shit all my life!”


I feel the same way. I've settled on JSON everywhere. The only thing I don't have in my toolkit is an easy binary ser/deser format.


Also comments, schemas, hashmaps with anything other than string keys, and sparse reads.


Your statement implies that arguing is something bad. Arguing is presenting arguments that attempt to show (prove) why something is true or false. It's the most important tool (convincing each other) in the progress of civilization. If you don't try to convince each other, the only alternative, in the end, is just shoot whoever you disagree with.

And it also diminishes your value as a team member. If you can't convince others, it means the reasons you present are weak, and nobody will be interested in listening to you; therefore there's not much reason to have you around.


> And it also diminishes your value as a team member. If you can't convince others, it means the reasons you present are weak, and nobody will be interested in listening to you; therefore there's not much reason to have you around

Jeez, you extrapolated a personal observation all the way to a character assassination and firing letter paragraph.

Standups and 1on1 with you must be a blessing... a joy


If you think what I said is wrong, you're welcome to explain why. Personal attacks are neither productive nor interesting.

My reasoning is the following:

People would only listen to you if you can prove what you say is right, because nobody is interested in hearing wrong things or unexplained things, they just aren't helpful.

"We should use nodejs!" "Why?" "I don't want to argue, we just should." Is that helpful?

If you don't have the reasoning skills to convince others, you can't present constructive ideas and back them up with an explanation.

If you can't do that, literally, what is your value to the team? Blindly and quietly execute the will of other team members? That would take too much energy from those people, to direct you on every step of the way.

When you hire engineers, you expect them to give more than they take, otherwise they're a drain on the team resources.

Collective problem solving is impossible without arguing. Arguing is trying to improve something, identify mistakes, logical contradictions, basically you're doing the work of a compiler that checks your program for correctness. Would you want a compiler that always agreed with you, whatever you fed into it? Don't think so. Same thing with engineers working together to reach a common goal. You're checking and improving each other's ideas.

Edit: moreover, if you lack the reasoning skills to convince others, it means you lack reasoning skills altogether. How are you going to solve problems in the first place?


Can you please stop attacking me.

Please.

Thankfully we’ll never work together so can we just continue our existences as we did before, blissfully unaware of each other?


Presenting arguments isn’t bad if it’s about something important, but I think by “Hey, I’ve argued about shit all my life!” the GP takes issue more with the “shit” than the arguing.


It's explicitly stated in their comment:

> arguing about slight variations of the same mundanities: XML, XSD, IDL, ASN.1, Avro, JSON... Emacs vs. Vim, Weakly vs. Strongly typed, and so on...

There could be a lot of valid and important arguments around these things. Except maybe vim and emacs, who gives a shit about that.


Yes, when coding always ask yourself how you would tell your grandchildren about what you accomplished.


I don't like any machine-readable format that doesn't have some indicator that it is a complete document (as JSON or XML does). I've had a production issue where a format like this one was used: the file was read before it had finished being written, and we ended up with a corrupt configuration, as half of it was silently dropped without any way to know.


This is the best criticism of this entire thread.

It'd be easy to employ ... as a document separator / end indicator that could be checked for.
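
A minimal sketch of such a check in Python (the '...' sentinel is just the convention proposed here, not part of NestedText):

    def read_complete(path, sentinel="...\n"):
        # Treat the file as complete only if it ends with the agreed marker.
        with open(path, encoding="utf-8") as f:
            text = f.read()
        if not text.endswith(sentinel):
            raise ValueError(f"{path} appears truncated: no end-of-document marker")
        return text[:-len(sentinel)]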


You could add that yourself - so an application specific marker, using this format.


> For example, in JSON 32 is an integer, 32.0 is the real version of 32, and “32” is the string version. These distinctions are not meaningful and can be confusing to non-programmers.

I'm really struggling with this assertion; IMHO one of the problems with JSON is the lack of more sophisticated scalar types.

That being said, this appears extremely readable, so my concerns could definitely be alleviated by a decent schema.


This looks fantastic. I was recently reading a config file from Rust and ended up going with JSON5 but this is simpler to read. In a language like Rust you don't need the types because you specify that in the struct anyways. Sure, this means that there are effectively more types than strings but the user doesn't need to differentiate.

In Rust you do something like this:

  #[derive(Clone, serde::Deserialize)]
  struct Config {
    an_int: u64,
    a_float: f64,
    ordered_map: linked_hash_map::LinkedHashMap<String, chrono::DateTime<chrono::Utc>>,
    unordered_map: std::collections::HashMap<i32, String>,
  }
So there is no need to worry about whether maps are ordered, or whether a value is an integer, real, or string, in the format itself. Ironically, for Python (which the reference implementation is in) it does seem much more annoying to have to manually call `int()` on each element.

I'm just a little sad that tabs are disallowed. I really think the best rule for indentation-sensitive languages is that each line must either have the same indentation as the previous line (same level), that indentation plus any extra amount (the next level), or the exact indentation of some previous level (a dedent). These "solutions" which just forbid tabs are half-assed, and ones that try to convert tabs to a set amount of spaces just lead to confusion.
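
That rule is easy to implement by tracking indentation as literal prefix strings, which lets tabs and spaces coexist without ever equating them. A rough Python sketch (my own, not NestedText's actual behavior):

    def indent_levels(lines):
        stack = [""]  # stack of indent strings; one entry per open level
        for line in lines:
            indent = line[:len(line) - len(line.lstrip(" \t"))]
            if indent == stack[-1]:
                pass                                 # same level
            elif indent.startswith(stack[-1]):
                stack.append(indent)                 # next level
            elif indent in stack:
                del stack[stack.index(indent) + 1:]  # dedent to a known level
            else:
                raise ValueError(f"inconsistent indentation: {indent!r}")
            yield len(stack) - 1, line.lstrip(" \t")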

Additionally it would be nice if there was an example of a dict inside a list. I think it would work like the following but can't confirm from reading the site.

  -
    key: value
  -
    key: value
    other-key: other-value


Correct. You'll get:

    [{'key': 'value'}, {'key': 'value', 'other-key': 'other-value'}]


Maybe what we need is a widely used UI for trees, and editors for them. The editor reads a schema and tells you what blanks to fill in. Export XML, JSON, S-expressions, whatever - any tree structure.

The trouble is, open source cannot do good GUIs. If a problem is best expressed with a GUI, open source consistently blows it. See Gimp, Blender, Inkscape, FreeCAD, all of which are notably worse than their commercial competitors.


Proposal for someone with more free time than I:

Improve the format of org-mode. Make it primarily easy for humans to interact with but also easy for cheap scripts to parse and manipulate. Create a super-fast CLI for it which ships with the ability to read keybindings from a file. Ship with emacs keybindings as a default but also a file with the spacemacs keybindings. Add the ability to run the CLI as a daemon that can be started from neovim.

Keep the format open so someone else can write some npm package for including an editor in VSCode or a webapp.


> The trouble is, open source cannot do good GUIs. If a problem is best expressed with a GUI, open source consistently blows it.

What about browsers? Firefox and Chromium are open source (and fit under any reasonable definition of "GUI")


> Indentation is used to indicate the hierarchy of the data

After dealing with 500+ line kubernetes configurations, this is a bad idea.


Indentation by tab is good for shallow data or code, e.g. Python. Space is bad. For deeper levels, I'd go for a visible, countable symbol: Level 1, .Level 2, ..Level 3


This format seems to support one-space indents, so you could use an editor which highlights spaces easily and set indent level to 1 for visibility. The alternative is unfriendly config or another method of doing the same thing, and both are antithetical to having a simple data format.


I’m not sure the world needs another sexpr/sgml isomorphism, though it does look pleasing to the eye for the given examples. What this doesn’t solve though: yaml+jinja use cases, code as text, schemas outside of the document, everything else that makes a language out of a syntax tree.


> With NestedText any decisions about how to interpret the leaf values are passed to the end application, which is the only place where they can be made knowledgeably. The assumption is that the end application knows that Enrolled should be a Boolean and knows how to convert ‘NO’ to False.

That assumption is... not applicable to most scenarios I come across and will likely lead to issues being pushed downstream and introduction of subtle bugs and defects.


The special interpretation of NO in YAML has led to bugs and defects:

> I once disabled our product for the entire country of Norway for a day because `NO` in YAML evaluates to `false`

https://twitter.com/aarondjents/status/1307692593493553160


It'll lead to incompatible NestedText codecs/serdes :(

There should be at least some support for standardized representations (basically JSON + ISO 8601 datetimes + some encoding for embedding arbitrary stringified serializations, e.g. the way HTTP uses chunks and unique boundary tokens).


I have had great experience using JSONNET (https://jsonnet.org/) as a configuration language. It supports variables, inheritance, operators, functions, substitutions, types, with just the right amount of power, expressiveness, and simplicity.

In my opinion, JSON is best used as a wire-protocol. It is awkward as a configuration language.

YAML works for short configs, but becomes unmaintainable for longer configs. I think the primary problem is that the indentation is significant. I also think the language spec is far too complex.

INI format works for short configs, but also becomes unmaintainable for longer configs. Ironically I think this is because INI is too primitive, the opposite problem of YAML, but has the same effect.

I am not familiar with TOML or DHALL, mostly because I stopped looking after I implemented the JSONNET system and liked it so much.

Addendum: I have used text-formatted protobufs in limited situations with good results. But I don't think that protobufs is a good general purpose configuration language.

Addendum2: The amazing thing about the simplicity of the INI file format is that I was able to write a "single line" sed program to parse it in a bash script. The following finds the value of the $key in the $section of the $config_file INI file (definitely works with GNU sed; I think it works with macOS sed too, not 100% sure though):

    sed -n -E -e \
        ":label_s;
        /^\[$section\]/ {
            n;
            :label_k;
            /^ *$key *=/ {
                s/[^=]*= *//; p; q;
            };
            /^\[.*\]/ b label_s;
            n;
            b label_k;
        }" \
        "$config_file"


> I think the primary problem is that the indentation is significant.

An editor problem, perhaps? We don't maintain office documents using vim; why edit structured configuration files using a plain text editor, if doing so is arduous?

My text editor[0] abstracts the underlying hierarchical data format behind a tree-based widget[1]. Whether YAML, JSONNET, NestedText, CSON, XML, or TOML backs the widget becomes an implementation detail.

[0]: https://github.com/DaveJarvis/keenwrite

[1]: https://dave.autonoma.ca/blog/2019/07/06/typesetting-markdow...


This actually looks great. Many people are complaining about the lack of data types, but most of the time you

a) Do not have values of different types occupy the same fields

b) Have a schema defined (explicitly or implicitly as part of the parsing), especially because you're likely working with a type system

This isn't good if you want an `x = parseJSON(blob)` kind of API, but that's definitely not what you want for any kind of human-editable config.

It seems simpler than TOML, I'd give it a try.


JSON is a serialization protocol, not a configuration syntax. It's designed to be written/read by machines. Its convenience for humans is that it can relatively easily be read or written by humans as well.

Protobufs is similarly a serialization protocol combined with an RPC layer for server and client stubs. Its canonical serialization format is binary.

Neither are particularly well suited for human written configuration files. A "JSON without the quotes around keys and allowing Javascript comments and commas at the end of objects or lists and a mechanism to escape multi-line strings" would probably cover most of the required cases.

YAML is an attempt at that, but it also attempts to solve a bunch of other problems in a complicated and fault-inducing way (e.g., relying on indentation for hierarchy).


This is an interesting idea. I don't know whether it will take off.

One thought that occurred to me several times in the last year or so is that roughly the level of abstraction offered by NestedText might make sense as part of a hierarchy of abstractions that could be built on.

We already have that with text files. Because end-of-line character combinations are special, text files in a given encoding are already more structured than streams of characters.

So, assuming UTF-8 character encoding:

  CharacterStream
    Text (EOL character combinations)
      NestedText
        MyNestedTextFormat (domain specific semantics)
With non-text files, this has already happened more than once. For example, both Zip files and Sqlite files are used as base formats for specifying other formats.


ASCII has always had separator and message delineation characters [1][2] that can be used instead of the CR, LF, SP, TAB, "|", "," etc. They are not "human readable", but are very easy to parse.

They are even UTF-8 transparent and can be easily converted to/from CSV, TSV, PSV, and, with a definition of the equivalent of a "close brace" could allow for a multi-level hierarchy.

[1] https://en.wikipedia.org/wiki/Control_character#Data_structu...

[2] https://en.wikipedia.org/wiki/C0_and_C1_control_codes


Yep.

We were recently storing tokens in a database, and I chose to use SOH for the metadata and STX for the text.

One byte width, no collisions with printable text, and that's what they're there for.

I'd love to see a CSV replacement that used SOH ... STX for headers, RS as "commas", and GS as "newlines". You'd be able to cleanly concatenate multiple files, since the first line is no longer special, and you'd be able to have commas, newlines, and in fact any printable text whatsoever inside the data.

And the semantics are perfectly clean. Again, that's what they're there for. Some small challenges for hand-editing that a competent text editor can easily rise to.
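
A sketch of that convention in Python (just the RS/GS framing; the SOH ... STX header handling is left out for brevity):

    RS, GS = "\x1e", "\x1d"  # ASCII record separator and group separator

    def encode(rows):
        # Fields may freely contain commas and newlines,
        # just not the separator characters themselves.
        return GS.join(RS.join(fields) for fields in rows)

    def decode(blob):
        return [record.split(RS) for record in blob.split(GS)]

    table = [["name", "notes"], ["Ada", "likes, commas\nand newlines"]]
    assert decode(encode(table)) == table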


Me too. I think this could be achieved. But there would need to be good editor support to make it successful. And good keyboard support too.


There's a lot of value to the ecosystem here if it could be standardized - possibly it should be an RFC.

Combined with what rswail wrote about encoding hierarchies, with careful design, those CSV sections could be embedded as tables.

If that was used as a base format for other formats, then the objection that the encoding for booleans and numerics isn't standardized might go away.


Looking closer at it... I think US would be the commas, and RS the newlines? Leaves it to the imagination what GS could be used for...


Hmm I think it would be nice to reserve US for when you have multiple entries in a single column.

Like if the column was phone numbers and occasionally there's more than one, that sort of thing. Thinking of each cell as a "record" and allowing it to have more than one "unit" makes sense to me.

But anything would be better than ever having to whip up a script to fix a CSV with comma-separated dollar values in it, ever again.


That is an interesting thought. Perhaps it is possible to arrange a type of nested structuring when this is needed. Like a CSV inside a value. C for "control code separated" of course :-)

Very thought-provoking... I think the main impediment is that these characters are not visible and not so easy to type. If they were, we might not have gotten the number of CSV variants that have evolved.


Yes, the challenge is editor support.

What I'd want is an emacs special mode, that displays RS as a red* comma, GS as a simple newline, US as a red semicolon, and regular newline as a red "\n".

Comma, newline, and semicolon insert the control characters, while M-, etc insert the literal characters. Not sure exactly how to handle header lines but this is the general premise.

*red as in "whatever method of visually distinguishing them as special works for you"


I don't know whether that's an "aside" comment, but it's interesting and informative anyway!

Another variant of the idea that I was mulling over was to base a roughly NestedText level of abstraction on UTF-8 transparent characters, and then combine that with what Animats was talking about as a standardized GUI for trees, dictionaries etc.

A recent trend is that programming languages have more than one bijectively equivalent syntax. For example, ReasonML [0] and OCaml are two bijectively equivalent text-based syntaxes. That idea could be extended to a NestedText-like syntax being bijectively equivalent to a text-based syntax. Editors like Visual Studio Code infer that sort of information continually on the fly, but it sort of gets lost in the toolchain. Compilers could operate at a higher level of abstraction than lexing/scanning. Git merge might also work better if it could operate at a NestedText-like level of abstraction.

[0] https://en.wikipedia.org/wiki/Reason_(syntax_extension_for_O...


Every time we have a conversation that touches on any serialization or configuration file format, it doesn’t take long for someone to pull out the flail and start beating on XML again.

XML might not be the best format for everything, and I for one am glad to use other formats for simple structured data. But when it comes to representing complex content, there is no other format that even comes close to being as useful and usable.

* All digital publishing of ebooks uses XML inside ZIP files.

* All contemporary mainstream word processors (Word, LibreOffice) use XML inside of ZIP as the basic file format.

* Automatic customized conversion processes from Word to InDesign or from InDesign to EPUB use XML at the heart.

* Let’s not forget the web itself, which is still mostly SGML in the form of HTML. Not XML per se, but only different in the details.

Not only is XML the only practical serialization format for working with publication content, but the presence of mature schema tooling is intrinsic to making publication automation robust in a given context.

I’m very glad for JSON and JSON schema in the domain of APIs. But in the domain of content data, it’s all XML.

Every serialization format has a domain for which it is most appropriate (whether or not it is the best choice in that domain.)

I’m really liking the shape of nested text for the domains in which I would have used YAML.


Specifically, lines that begin with a word or words followed by a colon are dictionary items; a dash introduces list items, and a leading greater-than symbol signifies a line in a multiline string. Dictionaries and lists are used for nesting; the leaf values are always strings.

No doubt there are use cases for this, but calling something that casts everything to strings an alternative to the above formats seems like a bit of a stretch.


It's not a bad idea, as text file formats are all strings anyway, to let the application do the conversion; it's the one with the domain knowledge. Perhaps it could do with a companion library to specify the data format and emit parse errors as appropriate. But keeping that out of the syntax makes a lot more sense than the insanity that is YAML.


I've been using TOML for my latest project, and the one thing that has really bitten me is that [1,2,3] is a valid array and [1.1, 1.2, 1.3] is a valid array, but [1, 1.5, 2] isn't valid and throws an exception due to it having heterogeneous types.


This is something that was fixed in the forthcoming TOML 1.0 spec. Parsers that have been updated to support it will allow heterogeneous types in arrays, which your application can convert to a vector of floats or whatever collection type it uses internally.
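
For example, Python's built-in tomllib (3.11+) implements TOML 1.0 and accepts the mixed array from the parent comment:

    import tomllib  # stdlib TOML 1.0 parser, Python 3.11+

    data = tomllib.loads("mixed = [1, 1.5, 2]")
    assert data["mixed"] == [1, 1.5, 2]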


Fantastic. So far that has been literally my only complaint about TOML.


Why choose this over the other options? Just some thoughts:

1. I like that comments are part of the standard. I wrote my own C++ JSON parser that allows for comments too.

2. Is it strict about indentation? One thing you can never get programmers to do on significantly sized teams is consistent indentation. Is that tab+space? Or spacex5? Is it going to break if a tab sneaks into Git? (Setting up Git push rules just annoys and confuses people.)

3. "without the syntactic clutter of JSON" - I happen to like it. I can compact it quite far if I need to. I also like the fact that I can spit it out over a debug server and JS will just magically start reading it.

4. Something really cool would have been the introduction of typed data. One way we achieve this via JSON is to create a template file which would declare something like (in a file named 'template.json' or something):

    {
      "data" : { "type": "float", "default": 0.0, "min": -1.0, "max": 1.0 }
    }
Obviously this requires checking in the code, but it does build up some kind of format checking and sanity. It can also warn you that it's using a default rather than a config defined value.

It would be nice if there was then the ability to define type syntax... But I fear this might be going too far.

5. Another thing I do with JSON is inheritance. So you define a 'parent' property at the top of a file, the values are loaded from the parent and then the child loads theirs over the top. Why have this? We usually need some per-application configuration but mostly it stays the same. It saves having to write it multiple times. You can even break it down into sections to keep each configuration file smaller.

(NOTE: For inheritance, a top tip is to implement a "maximum depth", in case you get into a loop.)
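
A minimal sketch of that inheritance scheme in Python (the 'parent' key and file layout are whatever your application decides; nothing here is standard JSON):

    import json
    import os

    def load_config(path, max_depth=10):
        # A 'parent' property names another config whose values the child overrides.
        if max_depth == 0:
            raise RecursionError("config inheritance too deep (loop?)")
        with open(path) as f:
            config = json.load(f)
        parent = config.pop("parent", None)
        if parent is not None:
            base = load_config(os.path.join(os.path.dirname(path), parent),
                               max_depth - 1)
            base.update(config)  # child values win over parent values
            config = base
        return config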


For the second point: most languages have linters that can enforce unified indentation, and there is also the https://editorconfig.org/ standard


One cool advantage this format has that is not mentioned anywhere is its potential for localization.

If everything is a string, and you need to parse values yourself, you can treat "prawda" and "fałsz" as booleans, instead of "true" and "false".
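
For instance, the application-side mapping could be as small as this (hypothetical, of course):

    # The application owns the string -> bool mapping, so a Polish-language
    # config file can use Polish truth values directly.
    BOOLS = {"prawda": True, "fałsz": False}

    def parse_bool(raw):
        return BOOLS[raw.strip().lower()]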


How is that useful?


It is, when you're accepting data from users, instead of data from other programs.

An end user with no programming experience could reasonably understand a file written in this format. This can't be said about json or yaml, not when they don't know what "true" and "false" mean.


Hm, I don't know. I know "we technical people" sometimes have the feeling that users don't understand anything, but if you teach someone how to write YAML they'll pick it up quickly (I just did that with my designer colleagues). I don't like the idea of having so many very similar data formats; it just gets more confusing.


Just give me JSON with support for comments (and ideally trailing commas) and I'm happy. I don't think there would be a need to simplify that further, and there's barely a way to do so without making sacrifices in some other important aspect.


Check out JSON5


Interesting idea to remove number and boolean types. The downside being that you spend more time writing parsers for the data. But if you had some sort of separate schema that generated all of that for you, it would no longer be such a big deal.

They mention that their key/value pairs are ordered. The downside here is that not all languages (e.g. JavaScript) support them.

I also prefer them to be unordered. The downside with ordered dictionaries is that you need to always be asking "does sequence matter here?". So it adds an additional thing to think about, more tests need to be written, and certain optimisations can't always be made.


Both Objects and Maps in JS are in stable insertion order.


Apologies, it looks like pre-ES6 they were unordered objects. But now there are rules that guarantee insertion order.


I actually think this is a great idea. There are a few immediate questions I have though.

1) how does one define a key with a ": "

2) as others said, significant whitespace, specifically trailing, seems to lead to problems with keys that have nested key/val, array, or multiline string values.

3) the GitHub repo associated with the parser has an issue questioning how to verify whether the file was truncated.

I think Deco: https://github.com/Enhex/Deco better solves the problem the OP has. It is provably delimiter-collision free, unlike this (see my point 1), unless I'm mistaken.


> For example, in JSON 32 is an integer, 32.0 is the real version of 32, and “32” is the string version. These distinctions are not meaningful and can be confusing to non-programmers.

Au contraire, mon frere. What's the point of this data format if it's only intended for humans and not computers? For computers the data type is critically important. For example, I've more than once seen the extremely ill-advised idea of treating zip code as a numeric, which completely screws your data model once you want to support zip+4 or international postal codes that contain letters.


Isn’t that a perfect example of a situation where stringly-typed-data-by-default would help?


No, because you’re pushing the choice onto the applications, giving them more opportunities to screw up. If your data is already typed and something is a string, odds are the application will either just follow that lead or look up why the data format authors picked that type in case it’s important or relevant.


Is it missing "list of dicts"? All the examples show only lists of simple strings. If so, that seems like a major problem.

Taking the main example, what do you do if you have more than one vice president?


This is the biggest problem with this format! You probably won’t need it for configs, but those are written by engineers, who are fine with other formats. For business data, a missing list of dictionaries is a deal-breaker. This is yet another shallow project, similar to the Dribbble-design portfolios of some people: looks cool at first glance, but has problems so big that it won’t fly.


This supposed "biggest problem" doesn't actually exist. Lists of dictionaries work fine, and I don't see any reason they wouldn't be useful for configs either. Maybe you should actually try it before dismissing it so shallowly?

This:

  list of dicts: 
      - 
          key 1: Hi 
          key 2: there! 
      - 
          key 1: I'm a list 
          key 2: of dicts!
parses as

  {'list of dicts': [{'key 1': 'Hi', 'key 2': 'there!'}, {'key 1': "I'm a list", 'key 2': 'of dicts!'}]}


Did you just guess that syntax? It's not in the documentation.

Hidden features effectively do not exist. The parent may have gone overboard with judgement, but this really is the fault of the project maintainers. I'll admit I mentally filed it in the "useless toy" category without this feature.

I went looking for it specifically because it's one of the ugliest and most confusing parts of YAML. To be pathological, what if you have a list of lists?


Sure, the documentation could be improved, but the syntax is extremely simple, consistent, and predictable. A list of lists is also exactly what you would expect:

  list of lists: 
      - 
          - first sublist 
          - goes here 
      - 
          - second sublist 
          - is here 
produces

  {'list of lists': [['first sublist', 'goes here'], ['second sublist', 'is here']]}
I don't think either of these are hidden. Yes, explicit examples would be nice, but assuming things not shown are impossible seems strange to me. Would you assume Python can't do lists of dicts, or lists of dicts of dicts? A quick search fails to find any examples on python.org (I did find lists of lists though).


That syntax is not obvious or predictable, and several people in this comment thread made the same assumption. Fix your docs.


I have no connection to this project. I'm speaking purely from my own experience of spending a few minutes looking at the docs and trying a couple of things in Python. The sublists and subdicts were obvious to me as soon as I learned the three forms lines can take. The only thing I wasn't sure about was whether the sublists or subdicts had to start on the next line, so I tried it out: yes they do.


Is it just me, or is NestedText kinda a subset of YAML? I personally see very little advantage to using this over JSON or YAML, but having more flavors and variations is always a good thing.

Edit: forgot to mention. I personally don’t like most general data serialization formats for configuration; the one I can probably tolerate is XML, but even that I only use when it’s part of a requirement. The way I usually implement program behavior configuration is through run commands (runcom, .rc)


Very YAML. I checked the example, and the difference is that YAML places the > following the attribute name, and this format places it before the text.

BTW, I happen to like YAML as a configuration format. It's much more readable than JSON. It's not that suitable for serialization, and probably shouldn't be used to create huge config files, but for the rest, it's as good as it gets.


I see two problems with this for my use.

"The format holds dictionaries (ordered collections of name/value pairs), lists (ordered collections of values) and strings (text) organized hierarchically to any depth."

1. Ordered dictionaries are not conveniently supported in all languages I tend to use.

2. The only element type is string, which means that parsing of common types has to be done separately, and possibly differently, by each implementation that uses such a file, with potential for unspecified differences.


> Ordered dictionaries are not conveniently supported in all languages I tend to use.

Ordered dictionaries are a fundamental, extremely basic data type present in every language I've used in three decades.

a) What modern, widely used language doesn't have them?

b) Why?

c) Okay, so you've picked a bad language that has only hashtables. You can still implement ordered keys using an array of the keys & values in sorted order, and a hashtable of keys to the array indices.


In the C++ and Rust standard libraries you have a choice between unordered maps and maps sorted by key, but not maps ordered by insertion.


The value of these representations and libraries is that they work in many places, not "It works for me."


I think the idea of leaving interpretation of the values to the end application is a good idea. I forget the details, but there was something about this idea related to the TCP/IP stack, with error correction being done at the application level rather than at one or more intermediate levels.


You are thinking of the end-to-end principle http://web.mit.edu/Saltzer/www/publications/endtoend/endtoen...


Lack of data types and schema support aside, there is one significant flaw with this, and it’s the one thing I always hated about YAML: the lack of delimiters. In a code editor that matches start and end blocks, having delimiters is a huge advantage. Heck, even in minified JSON, I can easily identify start/end blocks simply by placing the cursor on the opening or closing brace. Also, how would you “minify” a format like this where whitespace is significant? I think this is nifty purely as an alternative for simple (shallow data) use cases like configuration files; however, using this for data exchange, serialization, portability, and deeply nested, complex data structures could be problematic, just as it always has been for YAML.


The Python community would like a word.


There's a pretty good usecase which I highly doubt will ever be implemented: copy/paste between applications.

I don't think this will ever happen, but having a human-readable plain text output whenever you select any data from a UI is a very powerful idea. EVE Online did this back in the early 2000s and let you copy virtually anything in the UI. This led to awesome third-party tools that could work with the game with a minimal learning curve for the user (everything was "paste what you copied from the game"). It did use a custom serialization format... but a plain text one nonetheless.

IO via copy paste between a wide variety of apps in a known, and very simple, structured format would be awesome in my books.


This cites as an advantage over JSON:

> Unicode characters without encoding them

But JSON doesn’t generally require special encoding for unicode characters.

Technically, JSON is UTF-8 encoded, but that’s true of NestedText too so that can’t be what they mean.

Also, of JSON it says:

> in JSON 32 is an integer, 32.0 is the real version of 32...

But JSON doesn’t distinguish integer from real. It only has number.

On the format itself, I wonder if it needs some testing and hardening. The definition seems ambiguous, but maybe it just needs a formal grammar. Just from reading it, tab handling seems like an issue. I think you could have documents that look right but have invisible issues due to tabs. E.g., it sounds like a tab can be a character in a dictionary key name, which looks like indentation.


What is the representation for objects in the “name and arguments form” (e.g. enums in Rust, variants in Haskell, other tagged/discriminated unions)? It seems like they are becoming more popular in various programming languages.

What is the way to represent a map/dictionary where the keys are not strings? In JSON I also don’t have a good way, except I guess a list of objects with a key and a value field, but it feels heavy. In Lisp I would probably use an alist (whereas a plist might be used for the “string” (or symbol, rather) keys). In fact, how does one even write a list of lists with this?


Standardization is much more important than perfection. This is a horrible idea.


Oh great, another alternative that seems to not really fix anything significant, while creating a slew of other things that will be criticized.

It just shifts problems to other areas, which some may find even worse.


I'm not a fan of rigid formats for human-entered metadata structures like config. I think it violates a kind of Postel's law, where there ought to be considerable flexibility for the human to specify the structure (and therefore in what the format reader accepts). If you want rigidity, just use an appropriate programming language or, say, s-expressions. The rigid formats ask for their own linters to be created and maintained.

In my Indian music notation publication tool Patantara, I use a text preamble format that's limited and far simpler for people to use than any of these "data structure capable" formats - including NestedText/YAML/JSON. Here is a quick description -

1. You specify key value pairs with lines of the form -

    key with allowed spaces = any textual value
2. The first " = " on a line separates the key from the value. The key is normalized to lower case and spaces in the key are normalized to single space, and the key is LR-trimmed. The textual value is mostly kept as is (except for LR-trimming), but can accumulate more content.

3. Lines which don't have a " = " are just strings that get appended to the value of the immediately preceding key. So

    key = value1
          value2
          value3
will result in the key "key" having the single string value "value1\nvalue2\nvalue3".

4. If the same key is given multiple times, the values are concatenated with line breaks. So

    key = value1
    key = value2
    key = value3
gets you, for the key "key", the value "value1\nvalue2\nvalue3".

5. The reader doesn't care about data types. However, the application can decide how to parse the string values. One convenience I use for lists of items is "comma or whitespace separated values", where a list of values is given as "value1, value2, value3" or "value1\nvalue2\nvalue3" or "value1 value2 value3" or any combination including "value1, value2,\nvalue3 value4".
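
Rules 1-4 are simple enough that a rough Python rendering (unofficial, not the actual Patantara code) fits in a dozen lines:

    import re

    def parse_preamble(text):
        data, last_key = {}, None
        for line in text.splitlines():
            if " = " in line:
                key, value = line.split(" = ", 1)  # the first " = " wins
                key = re.sub(r"\s+", " ", key).strip().lower()
                value = value.strip()
                # Repeated keys accumulate, joined by line breaks.
                data[key] = data[key] + "\n" + value if key in data else value
                last_key = key
            elif line.strip() and last_key is not None:
                data[last_key] += "\n" + line.strip()  # continuation line
        return data

    parse_preamble("key = value1\n      value2\nkey = value3")
    # -> {'key': 'value1\nvalue2\nvalue3'}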

Nested structure is not possible with this format (unless you impose more structure on the value in the application), but it serves very well as a metadata format for text files in my application.

Ref: https://blog.patantara.com/blog/2017/04/16/controlling-who-c...


I like it. For use cases that are for non technical users to manipulate and maintain it makes sense to me.

In Emacs, org files are often used for configuration similarly.

While I'm a fan of EDN most of the time, which doesn't suffer from the issues they mention JSON having, not having to wrap text in quotes, and using indentation for nesting, definitely would make it nicer for non-technical people. So I can see a use case for this as a simple text interface for forms that is targeted at non-technical users.


Interesting insights into the thought process of the author. There are very few positive statements about why you'd want to use this format; they do, however, spend a great deal of time discussing neutral features and edge cases of currently widely adopted alternatives.

One might call this 'argument by gotcha'. It's interesting to imagine the kind of lifeworld that makes one think this is a worthwhile form of persuasion.


This has two of my favorite features!

1. Comments aren't part of the parse tree, so I can quickly strip all comments with a quick parse-print cycle. This is really helpful if you want to automate changes to the files.

2. I can freely truncate the file at (almost) any point and still have a valid config file. It's not obvious up front, but if your file gets snipped at some point, it won't (usually) cause parse errors for consumers of the file.


The parent comment is ironic. I've been wrestling with some declarative configuration, and it's getting to that frustrating scale where things are almost the same but a little different.

Point 2 isn't too bad in YAML: since JSON is valid YAML, a broken file can be reliably detected by checking for the leading '{' and trailing '}'.

With regard to point 1: both comments and nonstandard formatting like the above {} hack are currently hard.

Anyway, got to spend some time outside, and away from screens. Apologies for the snark.


2. As someone says in another comment, isn't that a mistake? What if my file is accidentally cut? In which cases would I want to snip my file without editing it?


Could be missing the start of the file too, in rarer cases.

Wonder what the best start of file and EOF-equivalent sentinel would be to use for YAML-esque files.

Something like a double colon?

  ::
  item1:
    - etc..
  ::


Regarding #2, would you really be ok with loading half a configuration file? Or am I misunderstanding your point?


> NestedText is a file format for holding data

Then why not call it NestedData? NestedText is misleading because text and data are different. That was a distinction many users of XML failed to make when they used it to store data, and they ended up dissing XML for being obese when it was designed for large text documents (wherein XML tags are an insignificant portion of the document), not data.


I like this quite a bit. Keep stringly-typed text files as strings; let code sort it out.

Pydantic would work great in conjunction with this to coerce types.
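
For example (the model and field names here are made up; the point is the coercion):

    from pydantic import BaseModel

    class Server(BaseModel):
        port: int
        standby: bool

    # Everything arrives from NestedText as strings; pydantic coerces
    # them according to the model's annotations.
    server = Server(port="8080", standby="no")
    assert server.port == 8080 and server.standby is False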


My Easy Data Transform software already supports yaml, json, xml, csv, tsv and Excel. The last thing I need is yet another format!


Looks good. Another very worthy competitor for the #2 position of best config language.

These felicitations were brought to you by the .ini gang


But this is an interoperability nightmare. Number, date and boolean formats are going to be a pain in the ass to deal with.


I'm mostly using HOCON these days. Strange that it's not included in the alternatives section.


It's kinda suckless, and I love it.

I think they made some good decisions. All types are strings and left to the application to parse and document. Excellent.

And I think the decision for the angle-bracket multiline syntax was great, because I think one could create a parser with awk/grep. Simple.


How would you include "\n >" in some multi-line text in this format?


    >
    > >


There is such a thing as JSON Schemas if you really want to nail down types.

I don't see this thing getting any traction. JSON is pretty much the standard at this point and I don't see anything changing that anytime soon.


There was a time you could, with as much justification, say that about XML. Not that long ago, actually.


I like this idea very much; two comments though:

1. Doesn't org-mode already solve this problem nicely, in a more widely-standardized way? (Maybe not; I'm just learning org-mode ATM.)

2. There is no "i" in Topeka.


1. Does anything use org-mode for data files (other than Emacs and things trying to be compatible)?

2. The addresses are of course fake; the zip codes don't match either. Same with the phone numbers. "KateMcD@aol.com" may be real, but "margaret.hodge@uk.edu" isn't.


Interesting. Maybe the time is ripe for a next generation of config formats. It looks decent. But if you just open a file with this, you might think it is YAML if there is nothing else to tip you off?


I read a bit but I'd like to see a comparison of the same data structure in JSON, YAML and NT.

I would also like to see YAML converted to NT and back to YAML just to see if there is any information loss.


Looks similar to CSON, at least superficially. https://github.com/bevry/cson


Which in turn has more than a passing resemblance to Rebol...

  ; Comments!!!
 
  ; An Array with no commas!
  greatDocumentaries: [
      'earthlings.com
      'forksoverknives.com
      'cowspiracy.com
  ] 
 
  importantFacts: [

      ; Multi-Line Strings! Without Quote Escaping!

      emissions: {Livestock and their byproducts account for at least 32,000 million tons of carbon
  Goodland, R Anhang, J. “Livestock and Climate Change: What if the key actors in climate change we
  WorldWatch, November/December 2009. Worldwatch Institute, Washington, DC, USA. Pp. 10–19.
  http://www.worldwatch.org/node/6294}
 
      landuse: {Livestock covers 45% of the earth’s total land.
  Thornton, Phillip, Mario Herrero, and Polly Ericksen. “Livestock and Climate Change.” Livestock E
  https://cgspace.cgiar.org/bitstream/handle/10568/10601/IssueBrief3.pdf}
 
      burger: {One hamburger requires 660 gallons of water to produce – the equivalent of 2 months’
  Catanese, Christina. “Virtual Water, Real Impacts.” Greenversations: Official Blog of the U.S. EP
  http://blog.epa.gov/healthywaters/2012/03/virtual-water-real-impacts-world-water-day-2012/
  “50 Ways to Save Your River.” Friends of the River.
  http://www.friendsoftheriver.org/site/PageServer?pagename=50ways}
 
      milk: {1,000 gallons of water are required to produce 1 gallon of milk.
  “Water trivia facts.” United States Environmental Protection Agency.
  http://water.epa.gov/learn/kids/drinkingwater/water_trivia_facts.cfm#_edn11}
 
      more: http://cowspiracy.com/facts
  ]
Some references: https://en.wikipedia.org/wiki/Rebol | http://www.rebol.com/article/0108.html


Hjson is the best format in my opinion https://hjson.github.io/


This seems... worse?

I can't figure out what's happening here. Is everything just a string? They complain about how YAML handles booleans (fair criticism); however, I can't see how NestedText fixes this.

This feels far inferior to TOML (everything NestedText does, and much more) or JSON (very unambiguous, works natively in most languages).

I see this a lot in startups and open source projects. They exist solely as a criticism of their competition (which is fair!), but don't work to understand the real problems and fix them in any meaningful way. You can spot them when they talk a lot about what's wrong with the competition ("AWS is too expensive", "YAML is too ambiguous") but never explain their solution.


"It's a complex problem so we just cast everything to a sting and require applications to handle the typing" is what I interpret it as.

Honestly after a few years using YAML you remember to always quote anything you know you need to be a string, and the different multi line types.

An argument against this is "why not just design it to be simpler" — well, then it's less useful in as wide a variety of applications, and stands less chance of actually becoming a standard like JSON or YAML has. And you end up with the XKCD problem.

YAML came in and solved a few of JSON's most glaring problems (multi line and comments) with a usable approach.

This new format seems like it's "YAML, but a little different."

If I wanted to usurp YAML, I'd focus on the greatest pain points for most, whitespace and schema support.


> YAML came in and solved a few of JSON's most glaring problems

AFAIK they were both independently created in 2001 and YAML was not created in response to JSON.


YAML just introduced other problems.


Sadly my comment will not add value to the conversation, but I came to the comments looking to see if someone made the XKCD reference. To my delight, here you are. #927 always delivers - https://xkcd.com/927/


Don't quite get the use of > for multiline strings. If you don't use > does it mean the string folds lines? Or is it an error?


So you can't use ' and " in the same key. Does this mean NestedText can't represent all possible key values?


One might say that this ain’t a markup language


Lemme piggyback a bit '__') I love this format, but I don't know the name: https://stackoverflow.com/questions/57470117/name-for-junipe...


I'm always disappointed when one of these new languages comes up and there's no railroad diagram.


Needs to ship with an editor (or a VS Code extension).

One of the biggest issues is just validating the actual data.


The only thing that I can think of when I see it: https://xkcd.com/927/


WTH, this is the exact syntax I use for my notes.


Misspelling "Topeka" does not inspire confidence . . .


Lol, “Topika”


https://cuelang.org

- clean syntax

- logical & unified

- roots in LINGO

- forked from Golang

https://cuelang.org/play


Do people really think significant indentation is a good idea?

Edit: A lot of people seem to be accidentally clicking the downvote icon.


It works fine in Python. It's nice because the visual indentation lines up with how the interpreter actually reads the code. You don't end up in a situation like this contrived example:

  if (something == somethingElse)
    doSomething();
    --counter; // added afterwards
when whitespace has actual meaning.

Your editor adds the whitespace to match the current scope, so you don't have to type it manually.

What's your issue with significant whitespace?


I don't mind it that much for programming languages like Python. The one annoyance I have is related to editor tooling: In languages that use braces to denote blocks, I can paste a block of code wherever and have my editor autoformat it to the correct indentation level, while in Python I have to be more deliberate with increasing or decreasing the indentation of a pasted snippet, and sometimes I'll make a mistake in the process that isn't caught until later.

I dislike significant indentation for configuration because the nesting tends to get deeper than for programming languages and it can be harder to see what's going on.


> You don't end up in a situation like this contrived example:

Modern languages repair this with mandatory block braces: {C style} (Rust, Go, Swift...) or Modula/Ada style (if x then foo else bar end). C style is easier for editors.

> What's your issue with significant whitespace?

1. There are many cases where leading whitespace still gets spoiled, including editors (whose use can't be avoided due to corporate tooling restrictions), web formatters, etc. The last decade's spread of mobile-aware tools made this worse.

2. With grouping by indentation, it's hard to determine a block's start if multiple blocks end at the same line, like:

    a:
        b
        c
        d:
            e
            f
    p
and let's imagine the most nested block occupies 2-3 screens (not rare for configs or code, even if it is against good coding rules). You have no way to decide which block start should be matched when you ask the editor to find the match.

I have coded in Python continuously since 2004, and I deem grouping by indentation a bad idea. Not fatally bad, of course, but with explicit grouping it would be a bit better.


I love CoffeeScript, Sass, and Pug for exactly this reason. Just my opinion, though.


Why not? I'm not a Python developer, but bad indentation is considered a faux pas in most production codebases and is already enforced by linters, so I don't really see the issue with enforcing it at the language level.


Apparently. It’s my single biggest pain point with Python, but apparently a lot of people think it’s a good idea.


I'd rather cringe a bit at it than fight over editor style configuration in reviews.


I don’t have strong opinions about it


Yes.



