This critique made me learn some new things about both formats, but I actually come out more in support of TOML because of it
Each argument the author gives is a strong preference for everything being implicit, ambiguous, and free of heavy syntax like quotes for strings and brackets for arrays.
That would be a reasonable subjective choice, but when I see the INI examples used to illustrate it, or the slightly out-there assertion that a one-element list is a completely meaningless, inhuman concept, I'm not really swayed. Sometimes it almost seems like irony. The examples are really hard to take at face value.
If anything, this pushes me further away from that. I've done YAML. I've done languages where everything is helpfully implicitly cast, and all the "WTF talks" that result from the weird rules and edge cases.
We know what it's like when DE is a string but NO is a boolean, and only some version numbers are strings while some are floats.
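To make that concrete, here's a minimal sketch using PyYAML (which implements YAML 1.1 scalar resolution); the document is invented for illustration:

    import yaml  # pip install pyyaml

    data = yaml.safe_load("""
    countries:
      - DE
      - FR
      - NO
    version: 9.3
    """)
    print(data["countries"])      # ['DE', 'FR', False] -- Norway became a boolean
    print(type(data["version"]))  # <class 'float'> -- but "9.3.1" would be a str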
Quoting strings is really not what costs me time and effort. It's weird edge cases and surprises that come back to bite you.
I can accept Postel's Law as being a great idea for fairly low-impact things like markup languages. XHTML is a good example here: it turns out it wasn't an awesome idea, because if the author of an HTML file forgets to close a tag, I'd rather the browser make a best effort at displaying a document that might be a little janky, than show me nothing at all.
But if we're talking configuration files for applications? No. Absolutely not. If I get anything even slightly off, do not under any circumstances respond by launching the application into an unpredictable state. Fail immediately and tell me why. Same principle applies for RPC messages.
The reductio ad absurdum here is weak typing. If Postel's Law were actually a generally applicable law, then PHP4 would be widely considered to be the pinnacle of language design. I think most people would agree that it's closer to the nadir.
But still... context matters, XHTML was a mistake. Which implies that Postel's Law is true in at least some contexts.
There's still a few nice things about XHTML that I miss.
It's really helpful for debugging and catching mistakes. I'll actually force it on for dev and test systems just to quickly identify errors; I've caught hundreds of issues with templates that way. Sure, there are markup validators, but the always-on strictness was nice. And it's still usable with XHTML5... So long as the web page is being generated by your code, the strictness is a win, I think. And you can turn strictness off in browsers by serving XHTML5 as HTML5 with a text/html content type.
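A rough sketch of that toggle (the STRICT flag and page.xhtml file are hypothetical): the same XHTML5 markup gets strict XML parsing or lenient HTML parsing purely based on the Content-Type header the server sends.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    STRICT = True  # dev/test: parse errors surface immediately; False for prod

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            with open("page.xhtml", "rb") as f:
                body = f.read()
            self.send_response(200)
            ctype = "application/xhtml+xml" if STRICT else "text/html"
            self.send_header("Content-Type", ctype + "; charset=utf-8")
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), Handler).serve_forever()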
The responseXML in XHR for XHTML is really nice, and still available though mostly useless. I wish that when XHTML was abandoned, a responseParsedDOM had been offered to avoid some of the exploitable hacks people came up with instead.
XML transforms using XSL could do some pretty nifty tricks with static docs and no other processors but your browser.
So, yeah, don't feel it was wholly a mistake. Sure for random content or user generated isn't a good idea, and it'd be nice if there were clean ways to handle that (not iframes), but saying that your app shouldn't have a strict rendering is like saying JSON should be forgiving of misplaced braces... If you're feeding bad JSON to your modern JS driven app, well, that's your fault and there should be errors and it should be fixed. Similar for XHTML for your server side app IMO.
Good news: XHTML was never abandoned. It still exists today as an optional serialization format for HTML5. I am using it in practice on my website and described it in great detail: https://www.nayuki.io/page/practical-guide-to-xhtml
The main thing you lose (no idea why XHTML5 doesn't add support for this) is that <noscript> is ignored. Obviously, if you did any other form of JS detection in a session, you can just use that to offer alternate content.
Eh, even without <noscript>, you could achieve those things by declaring them in the HTML code and then writing a script on the page that hides them immediately.
Interesting. It isn't officially supported. Browsers do seem to honour it though without parsing errors, even as a tag in the header (say for a redirect fallback instead of JS in a handler). Tested Firefox and Chromium - good to know, had had issues there in the past. Thanks!
Postel's law is a way to aid in adoption, not a way to increase correctness.
If Product X accepts malformed input I, but product Y does not, then product X appears to "work better" than product Y and people will adopt X more. (The other half of the law also helps in adoption; if you emit very conservative output, then your output works with everybody else as well, also making your product look better).
If authors of webpages only had access to browsers that implemented strict XHTML, then there would be a lot fewer missing close-tags out there. Things have largely been sorted out now, but for a while it was a case of "I have to be just as good at rendering complete garbage as IE is, or nobody will use my browser" which I hesitate to label as "positive" in any meaningful sense.
Because it's a user agent and as the user I want it to degrade as gracefully as possible. It doesn't serve my interests to refuse to render anything just because the author of the website forgot a </b> tag somewhere. I'd rather read the text just with formatting other than what the author intended, than not read the text at all. Don't punish me for someone else's typo.
By that logic, broken SVGs and the like should also be rendered leniently. That doesn’t make any sense.
If HTML had been strictly schema-validated from the start, nobody would be arguing for this.
It’s certainly true that HTML being parsed leniently helped in it being picked up by amateur website authors in the early days, because they weren’t confronted with error messages (though they were confronted with “why doesn’t this render as I expect” instead). But that has little to do with user expectations by browser users.
Well it seems to work for TCP at least, which is where it comes from. Of course it's not the correct approach for everything, but calling it "one of the crappiest ideas in CS" might be a tad harsh.
EDIT: Of course there are better ways to be robust than to try to just accept whatever garbage is thrown your way because "be liberal in what you accept." So for example since this is about config files, you could easily just tell the user that their stuff is wrong and tell them how to fix it.
Postel's law is for implementations of underspecified standards and for backwards compatibility, the problem is the misguided attempts to somehow use it in new standards.
I understand your points as precisely in the opposite direction to your conclusion.
According to this post, the INI approach says that the code that reads the value determines what type it should be. That approach means that you never get a problem where NO is a boolean when it should have been an enum, or a version number is a float when it should have been a string. You only get those problems when the type of a value is determined from the file without reference to the expected type, like in TOML.
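A sketch of that difference with Python's stdlib configparser (section and key names invented): every value comes back as a string, and the reading code applies the type it expects.

    import configparser

    cp = configparser.ConfigParser()
    cp.read_string("""
    [general]
    country = NO
    version = 1.20
    """)

    country = cp.get("general", "country")  # 'NO' -- a string, never a boolean
    version = cp.get("general", "version")  # '1.20' -- trailing zero preserved
    # The application knows the schema, so it decides the types:
    assert country in {"DE", "FR", "NO"}    # an enum check, not a bool coercion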
>According to this post, the INI approach says that the code that reads the value determines what type it should be. That approach means that you never get a problem where NO is a boolean when it should have been an enum, or a version number is a float when it should have been a string
That's a simplistic view.
In practice, there soon won't be just one piece of code reading your files, or they will be shared elsewhere, and it will all depend on implicit semantics and documentation of the assumptions (if you're lucky enough to have it). Hilarity, chaos, and head-scratching ensue.
Whereas with a format that enforces the types, every consumer with a parser for the format gets the same values (to the extent that it matters: a list doesn't suddenly become a string, though in one language it might be a vector and in another an array).
Configuration files usually are only read by one piece of code. Besides, the article is correct in that the type system from TOML is completely inadequate for fully parsing the files anyway, so it will necessarily depend on implicit agreement on the semantics from every reader.
IMO, configuration file formats should only ever have text as primitive type. Anything else should be defined in another layer. I completely agree with that part of the argument from the article.
Then the article goes on to argue that the quotes are harmful... And no, if you have a whitespace-sensitive language, you need a damn good representation for strings that won't allow ambiguity to creep in. And INI is just horrible on this.
>Configuration files usually are only read by one piece of code.
Having a single source of truth and multiple services and scripts needing the same info means the same configuration file will get to be read by many pieces of code, even from different languages.
And that's without considering piecemeal migration of the same "one piece of code" running on different services to another language or a version two design, still needing to read the same file.
>IMO, configuration file formats should only ever have text as primitive type. Anything else should be defined in another layer. I completely agree with that part of the argument from the article.
I mean, that's not even wrong.
Except if you mean "they should not be binary". Then, sure.
Structured formats cannot natively represent every possible type. For example, JSON might support very basic types, like integers and booleans, but not more complex types like date/time types. Formats like JSON only have an advantage when discussing built-in types: with JSON, if you say "this field is a boolean" then everyone knows its possible values; however, if you say "this field is a date" then who knows? The point the article makes is that the format itself shouldn't dictate types, and should let the application decide them - which is what happens anyway for types the format doesn't natively support. You need documentation either way.
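For instance, a minimal sketch of the date case (field names invented): JSON carries the value as a plain string, and the application, which knows the field is a date, parses it with an app-level rule.

    import json
    from datetime import date

    cfg = json.loads('{"release": "2024-05-01", "stable": true}')
    release = date.fromisoformat(cfg["release"])  # app decides: ISO 8601
    stable = cfg["stable"]                        # bool: the format defines this one
    print(release.year)                           # 2024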
> or the slightly out there assertion that a 1 element list is a completely meaningless inhuman concept
Much further down in the "Immediacy" section there's a slightly better argument on that part, that a 1-element array looks like a reference to a section instead of an array.
This is an extremely terrible defence/criticism, for many of the reasons pointed out by others already, but I’ll add some more: INI came from Windows, and if you’re going to call it INI I think it’s reasonable to expect it to work with Windows’ INI functions—or else you can call it conf, embracing the still-very-ad-hoc format used across Linux and such. (And yeah, I recognise that the library name invokes both labels, but beyond that they’re focusing on INI.)
But not one of the INI examples shown will actually work as you’d expect using Windows’ GetPrivateProfileString function. Windows’ INI-reading functions are extremely simplistic; about the most magic thing is case-insensitivity of keys. You can’t put a space around the equals sign: that gives you a key name that ends with a space and a value that starts with a space. There are no line continuations (section 5). You can’t do what they call a composite configuration file (section 8). Empty keys (section 10) are fine. Implicit keys (section 12) don’t work.
It's so incredible how two engineers can look at something like this and draw literally the opposite conclusion. Taste is certainly subjective...
To me, this is largely an argument for TOML. I mean,
> wishes = "I am fine"
This is an array with one element????
> Even TOML's featured example proposes “the INI way”
Obviously it is the TOML way too? Hence it being the featured example. That another way to express it doesn't change that...
It's almost comical how much their arguments are convincing me that INI is a disaster. And yet they are, seemingly in all seriousness, truly trying to convince me that this is the way.
> Obviously it is the TOML way too? Hence it being the featured example. That another way to express it doesn't change that...
It's not even another way to express it. The example is a completely different data structure. The initial proposed "TOML way" is an array while the "INI way" is a table. The then corrected "TOML way" IS the equivalent table in TOML.
It's a ridiculous strawman along with several other oddities where the author loves to give INI the benefit of its weirdness but then rails against TOML's. The author loves INI and refuses to understand why people wanted and needed TOML to fix its deficiencies.
I'd love to tear this thing apart point-by-point, but I think it'd be more cathartic for me than useful to anyone else.
> It's so incredible how two engineers can look at something like this and draw literally the opposite conclusion. Taste is certainly subjective...
Well, in such cases it's likely because two engineers are actually expressing their tastes/beliefs, merely using the thing they look at as a running example. That also means that the running example, ironically, is actually mostly irrelevant to the topic: otherwise it wouldn't have been able to support two opposite conclusions.
And in the field of philosophy this phenomenon is even more egregious: you can find e.g. one philosopher arguing that life is meaningless because it's finite, and another philosopher arguing that life has meaning because it's finite.
> It's so incredible how two engineers can look at something like this and draw literally the opposite conclusion. [...] It's almost comical how much their arguments are convincing me that INI is a disaster. And yet they are, seemingly in all seriousness, truly trying to convince me that this is the way.
That pretty much sums up how I view arguments about metric vs. imperial measurement systems. Every feature of imperial that someone points out as an advantage is something I see as a disadvantage.
>That pretty much sums up how I view arguments about metric vs. imperial measurement systems.
One argument I will make in favor of imperial: feet (when divided into inches) are easy to divide into thirds because it's base 12 instead of base 10. This makes a lot of things (especially in construction, woodworking, and others) a lot easier. Of course it has a whole boatload of downsides to go along with it, but that's one real and tangible benefit to imperial over metric that I've experienced.
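The arithmetic behind that claim, as a trivial check:

    from fractions import Fraction

    print(Fraction(12, 2), Fraction(12, 3), Fraction(12, 4))  # 6, 4, 3 -- whole inches
    print(Fraction(10, 3))  # 10/3 -- a third of ten units never comes out even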
That's exactly the kind of thinking that I'm alluding to. Okay, if base 12 is so great, why are gallons not divisible by 12? Why is our currency not divisible by 12? Why is a mile 1760 yards - what if you wanted to parcel a square mile into thirds on each side? Why is an inch partitioned into binary fractions? Why are pounds not divisible by 12? The imperial customs (it's hard to even call it a system) have no internal consistency.
To add to the observation about lack of internal consistency: Metalworkers use decimal inches. Woodworkers use feet, inches, and binary fractions. Surveyors use decimal feet. Some people use decimal miles. You could argue that each of the aforementioned systems make sense on its own, but none of them interoperate. Seriously - you'll be baffled if you pick up a decimal foot measuring tape in real life (they exist). Metric doesn't have this problem because if someone is doing detailed work in millimetres, someone is planning a house's rooms in metres, and someone is organizing a town's land in kilometres, they can all work with each other by simply moving the decimal place and changing the prefix.
Another problem is that you're presupposing that the products you interact with are designed in a whole number of feet, and then you subdivide from there. I don't see this as true at all; things come in all sizes like 2'5", how are you going to divide that into thirds?
> The imperial customs (it's hard to even call it a system) have no internal consistency.
"Internal consistency" doesn't matter. Virtually nobody ever has to convert miles to yards; the use cases for measuring a distance in miles and the use cases for measuring a distance in yards almost never overlap. For the rare cases where they do, the units do have an integer conversion.
For the use cases that do commonly overlap, e.g. inches and feet, you get a useful-for-that-specific-domain conversion factor of 12. Some domains use decimal miles or decimal inches and that's totally fine; sounds to me like they didn't need the metric system after all.
But if you're going to be so insistent on "internal consistency", riddle me this: why do we measure time in hours, days, weeks, months, and years rather than in decaseconds, hectoseconds, kiloseconds, megaseconds, etc.?
> if someone is doing detailed work in millimetres, someone is planning a house's rooms in metres, and someone is organizing a town's land in kilometres, they can all work with each other
These people never work with each other. This is a fantasy.
The lack of an opinionated stance on internal consistency is exactly how we arrive at traditional measures and also the US Customary set of units. It's far easier politically to be accommodating and allow more units than to put your foot down and say no, this is redundant, this cannot be used.
> Virtually nobody ever has to convert miles to yards
That's mostly true because it seems yards are only used to measure football fields and fabric. Everything else is measured in feet, from personal heights to furniture to rooms to house yards to building structures (the Empire State Building is 1454 feet tall).
But my point still stands. You think you don't have to convert between miles and feet? Okay: https://www.researchgate.net/figure/a-Typical-multi-lane-hig... . There's a highway exit coming up in 800 feet and another in 1/2 mile. How many times longer does it take to reach the second exit compared to the first exit? You have no clue. In metric it's 250 m and 800 m, and it obviously takes about 3 times longer to reach the second exit.
> These people never work with each other. This is a fantasy.
Tell me you haven't worked in engineering without telling me you haven't worked in engineering. If you eyeball everything and use intuition, I can see why you don't care about units, conversions, and calculations. If you actually need to plan and analyze things carefully before you order materials and cut things, you'll quickly see that having a plethora of units adds complexity and chances for error without adding any functionality that a pure system has (whether you're using millimetres or only decimal inches).
> But my point still stands. You think you don't have to convert between miles and feet? Okay: https://www.researchgate.net/figure/a-Typical-multi-lane-hig... . There's a highway exit coming up in 800 feet and another in 1/2 mile. How many times longer does it take to reach the second exit compared to the first exit? You have no clue. In metric it's 250 m and 800 m, and it obviously takes about 3 times longer to reach the second exit.
I’m having a hard time seeing this as much of a problem in daily life. That sounds more like a word problem in math class than something someone would want to typically calculate on the fly. And my sense of how far something at a given distance is in the car is informed more by experience and intuitive sense than measurement.
> That's mostly true because it seems yards are only used to measure football fields and fabric.
Also shooting ranges, but yes.
> There's a highway exit coming up in 800 feet and another in 1/2 mile. How many times longer does it take to reach the second exit compared to the first exit?
Prior to GPS navigation nobody ever said "there's a highway exit coming up in 800 feet"; road signs in the US consistently use fractions of a mile. When my GPS app switches units from fractions of a mile to feet, that means I can see where I need to exit/turn. Calculating a precise ratio doesn't matter. It's between 6 and 7, since there's 5280 feet in a mile and 8 × 6 is 48 but 8 × 7 is 56, but who cares?
Also, on a highway, I'm usually traveling at least 60 mph, and since there's 60 minutes in an hour, that comes out to one mile per minute. Try that with your fancy metric system!
> Tell me you haven't worked in engineering without telling me you haven't worked in engineering. If you eyeball everything and use intuition, I can see why you don't care about units, conversions, and calculations.
You're imagining a scenario where a guy who's worrying about fractions of an inch building a cabinet has to talk to the city planner who's worried about miles and they have to convert units to do that. That doesn't happen, and yet you want to optimize the entire unit of measurement for that specific use case at the expense of more common use cases like "dividing by three".
When it comes to domains where consistency does matter, we just don't do unit conversions. For instance, flight altitude is measured in feet, even when it's thousands of feet, instead of miles. If you're flying an airplane at 30,000 feet, who cares how many miles that is? That's not what miles are for. Likewise with domains that use feet per second rather than miles per hour. Again, even "metric" countries don't commit to the bit here; name me a country where the highway speed limit is in meters per second.
> When it comes to domains where consistency does matter, we just don't do unit conversions. For instance, flight altitude is measured in feet, even when it's thousands of feet, instead of miles. If you're flying an airplane at 30,000 feet, who cares how many miles that is?
Your plane is up at 30000 feet and your engines are out. The nearest airport is 47 nautical miles out. The plane has a glide ratio of 15. Will you make it?
It's easy in metric: 9.1 km altitude, 87 km distance.
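Running the numbers both ways (treating the glide ratio as constant, as the word problem does):

    FT_PER_NM = 6076.12                # feet per nautical mile

    # Imperial: a unit conversion sits between altitude and distance.
    glide_nm = 30000 * 15 / FT_PER_NM  # about 74 nm of glide range
    print(glide_nm > 47)               # True -- you make it

    # Metric: both figures are already comparable.
    print(9.1 * 15 > 87)               # True -- 136.5 km of glide vs 87 km out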
> Again, even "metric" countries don't commit to the bit here; name me a country where the highway speed limit is in meters per second.
I actually much prefer metres per second. It makes things like kinetic energy calculations easier. If I wanted to know the KE of a 1500 kg, 100 km/h car, I would first need to convert to m/s. Ditto the kilowatt-hour; it needs to die in favor of megajoules.
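The conversion in question, worked through:

    m = 1500               # kg
    v = 100 * 1000 / 3600  # 100 km/h -> 27.78 m/s: the step that m/s would skip
    ke = 0.5 * m * v ** 2  # ~578,700 J, i.e. roughly 0.58 MJ
    print(round(ke))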
> Your plane is up at 30000 feet and your engines are out. The nearest airport is 47 nautical miles out. The plane has a glide ratio of 15. Will you make it?
Nautical miles and statute miles are different units anyway, so my initial assertion--that you would never convert flight altitude to statute miles--is still correct.
Glide ratio varies according to speed; the plane might have a glide ratio of 15 at one speed but a different glide ratio at a different speed. So in practice, the real world version of this word problem is more complex than you make it out to be, and pilots' handbooks will commonly have tables of glide distances to consult for this reason.
> If I wanted to know the KE of a 1500 kg, 100 km/h car, I would first need to convert to m/s.
You've adequately demonstrated that metric is more convenient for the arbitrary word problems you've provided. Real-world applications are what I'm less convinced about.
> Also, on a highway, I'm usually traveling at least 60 mph, and since there's 60 minutes in an hour, that comes out to one mile per minute. Try that with your fancy metric system!
All right. I'm travelling along the highway at 120 kilometers per hour, which is 2 km per minute. Where is the problem?
>why do we measure time in hours, days, weeks, months, and years rather than in decaseconds, hectoseconds, kiloseconds, megaseconds, etc.?
We don't generally use them because at the time of the French Revolution, standardizing the state units of measurement for physical goods was a much more pressing concern than time, due to taxes. (If I were to guess, there were no work-hour limits, and thus it just hadn't crossed the popular mind.) Weeks were changed to have ten days, though.
That's not to say that they hadn't tried: decimal time was mandatory for a few years before they realized that there were too many clocks around, and pushing decimal time can actually turn people hostile against the then-new metric system.
We still kinda sorta use them in the form of fractions of Julian days.
There's no theoretical problem if we define 86400 seconds = 1000000 newseconds.
One big practical problem is that because SI is a coherent system, the second is embedded in many units. For example, 1 N = 1 kg m/s^2. 1 J = 1 N m. 1 V = 1 J/C. 1 Pa = 1 N/m^2. And so on and so forth. So all of those units will have to be replaced with units derived from the newsecond. This is kind of like how SI has the unit tesla but the CGS electromagnetic unit is gauss.
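A quick sketch of the knock-on effect (purely illustrative arithmetic):

    new_second = 86400 / 1_000_000  # one "newsecond" = 0.0864 old seconds
    factor = 1 / new_second ** 2    # ~133.9: old newtons per kg*m/newsecond^2
    print(factor)                   # the newton, joule (N*m), and pascal (N/m^2)
                                    # would all shift by this same factor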
A long-term problem is that no matter what, the length of the day on Earth will drift from scientifically accurate atomic time. Sometime in the near future, a day will be 86401 seconds. Then 86402, and so on. So the old second or the new second will not solve this problem.
> A long-term problem is that no matter what, the length of the day on Earth will drift from scientifically accurate atomic time. Sometime in the near future, a day will be 86401 seconds. Then 86402, and so on. So the old second or the new second will not solve this problem.
How long term is this problem? For instance, we habitually insert "leap seconds" in order to keep things lined up, but if we didn't insert leap seconds, we might accumulate an error of maybe half an hour between the solar meridian and "noon" in the next 500 years, which is less error than we introduce by putting Madrid and Belgrade in the same time zone. And in 500 years, most of humanity will not even be living on the Earth anyway.
Unlike metric, the different imperial measurements are just combinations of initially unrelated measurement systems. So a gallon has absolutely nothing to do with a foot; miles are a completely different measurement system than feet, pounds, and so on.
Needing to know the weight of a cubic foot of water, and then how many gallons that is, is not a common issue. Metric, by comparison, does give you a system that can quickly calculate these things.
Metric should have been base 12. You get more typical factors than base 10, and it's not an absurd deviation from base 10 (base 60 would allow for the same factors as both 12 and 10, but I doubt anyone would be happy to adopt that)
What a bunch of ridiculous nonsense. "I'm 6'1", which is easy to visualize because it's literally about the length of six adult feet end to end". I don't know how he spends his time, but I doubt I've ever seen six adult feet end to end, let alone with such frequency that it's a familiar reference for measurement. (Not to mention that adult feet are very rarely as much as one foot in length.)
Edit: Oh no, I watched another one. "It's more naturalistic. Think about it: there are a dozen inches in a foot just like there are a dozen eggs in a carton." I'm not even sure where to start on that one.
> Postel's law is a good indicator of how robust a language is: the more a language is able to make sense of different types of input, the more robust the language.
Into the trash it goes.
Seriously, the avoid-crashing-at-all-costs anti-pattern is what made HTML, JavaScript and PHP the messes that they still are, from which the latter one is only now recovering at a glacial pace. For once, we could learn the lesson.
Everything can be a success if it's the only game in town or a free option, even Javascript.
And we shouldn't conflate adoption success with design quality either.
That said, it's not like TCP is a great example of Postel's law, and surely not in the crude way it's understood and practiced by its advocates. The RFC says:
"As a simple example, consider a protocol specification that contains an enumeration of values for a particular header field -- e.g., a type field, a port number, or an error code; this enumeration must be assumed to be incomplete. Thus, if a protocol specification defines four possible error codes, the software must not break when a fifth code shows up. An undefined code might be logged (see below),
but it must not cause a failure."
Which is hardly the "anything goes" ticket people imagine it to be. E.g. TCP would still break on a badly formed header and consider it an error.
Besides, the advice is good for a transmission control protocol, especially one involving in-between nodes that will not care for the enumeration values like ports and such like the start/end nodes do, and just need to pass them through.
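A sketch of that narrow reading (the field layout here is invented): unknown values in an enumerated field are logged and passed through, but a malformed header is still rejected.

    import logging

    KNOWN_ERROR_CODES = {0, 1, 2, 3}

    def handle_error_code(code: int) -> int:
        if code not in KNOWN_ERROR_CODES:
            logging.warning("unknown error code %d, passing through", code)
        return code                 # tolerated: the enumeration is open-ended

    def parse_header(data: bytes) -> int:
        if len(data) != 4:          # rejected: the structure is not negotiable
            raise ValueError("malformed header")
        return handle_error_code(int.from_bytes(data, "big"))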
It's horrible for other types of software. Language parsing would be a great example where it should not be followed. And of course the most famous related shit show is HTML handling in browsers. HTML might be successful, but it's hardly because of following the Postel principle.
> HTML might be successful, but it's hardly because of following the Postel principle.
I think if web browsers were not lenient with HTML parsing, the web would have been adopted much more slowly in the initial years. Also, HTML5 can be seen as confirmation that XHTML2 with strict parsing would not be successful. I followed the WHATWG mailing lists quite closely when that effort began. This is evidence for the applicability of Postel's law, at least in the case of HTML.
Nearly all of the internet's core protocols reject bad/malformed packets. They don't do a best effort to figure out what the sender "intended" they just reject the packet.
Some of the most popular instances of doing a best effort to figure out the intent of the sender are also poster children for protocol level security flaws. If we've learned anything from deploying, managing, and developing on top of the core internet protocols it's this:
1. Be very conservative in what you send.
2. Reject anything that isn't what you expected to get.
If you squint you can sort of see that being a valid interpretation of Postel's law but it's not the standard interpretation in practice.
The leniency of HTML burned me a lot in my beginner days. At some point, I decided to switch to XHTML (served with the media type "application/xhtml+xml") and never looked back. Ditto JavaScript: the laxness harmed a lot, and only some 15 years later did I start prepending "use strict" to every script.
Because for a long time if you wanted to build a website with no money and no experience those were your only options!
JavaScript and HTML for obvious reasons. PHP because if you didn't have your own server, you couldn't run anything else (while there were free hosting providers that would host your PHP scripts).
Lisp was a thing before PHP, and software engineers did use it but it never reached the popularity of PHP. It is in the end a question of being pragmatic, which PHP and Perl are.
Every time I hear Postel's Law mentioned all I can think is "God forbid we actually expect people to correctly implement a spec." I mean I could kind of get it if the specification is poorly written/ambiguous, but that's a problem with the spec itself in that case. Otherwise it's just adding unneeded complexity (that can majorly harm performance) for no real gain, except that you accommodate the people too incompetent to correctly implement a spec.
> and because it was easy people were quick to learn and use them
and then went on to write a shitload of insecure code.
Just because it's simple to use doesn't mean that just anyone should be using it. The problem with PHP is that it can be used by someone with far below average programming skills to make something functional. But the flip side of how forgiving it is, is that it takes someone of above average skill to use it to make something safe and performant.
Making it easy to use wrongly also makes it harder to use it right. Simply because you lack any feedback when you do something dumb.
Clear and sane error messages (i.e. something with the level of quality of Rust's compiler) could have accomplished the same thing without creating the insanity we have now.
I almost stopped reading at the difference between "89" and 89 being something bad that risks making your program crash.
What a moronic diatribe.
TOML being typed makes it excellent compared to INI.
Nobody with anything resembling a CS degree on their wall should be defending nonsense like "castable strings", and the proliferation of string conversions into the application layer. Let alone in C or C++.
Postel's law is only half right.
You should be conservative in what you generate (don't exercise every obscure corner of a representation's spec if you can avoid it), and reject all inputs that do not conform to the specification.
Programs shouldn't react in unexpected ways to bad inputs, like crashing or allowing an attacker to take control. But they shouldn't try to reinterpret bad inputs as good, either. That's folly.
The only reason to follow Postel's law is economic gain at the expense of the technological ecosystem.
If your web browser accepts broken HTML and renders it, whereas the competing browser rejects it, that competing browser is better for the web, but looks buggy to the naive user base, which will prefer your web browser.
Postel's law was used as one of the weapons in the browser wars, whose legacy negatively affects the web even today.
> I almost stopped reading at the difference between "89" and 89 being something bad that risks making your program crash.
I can only commend you for not taking a few minutes to consider whether it was worth continuing when the essay starts by praising Postel's law, possibly the worst idea in the field since "let me just run that code I received over a socket".
I'm not a big fan of TOML, but I find the typing criticism here weak. I would far rather my configs have a strict interpretation of 89 vs "89" vs 89.0
None of the INI parsers I have ever used have just returned everything as a raw castable string of exactly what the user entered. There's always a horrible layer of interpretation. Many, including the one built into PHP, have confusing rules around bools. For instance:
> String values "true", "on" and "yes" are converted to true. "false", "off", "no" and "none" are considered false. "null" is converted to null
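Python's stdlib INI parser has its own, different coercion table (configparser.ConfigParser.BOOLEAN_STATES), which rather proves the point: every parser invents its own rules for the same file.

    import configparser

    cp = configparser.ConfigParser()
    cp.read_string("[flags]\ndebug = yes\ncountry = NO\n")
    print(cp.getboolean("flags", "debug"))    # True  -- 'yes' is in the table
    print(cp.getboolean("flags", "country"))  # False -- 'NO' coerces here too
    print(cp.get("flags", "country"))         # 'NO'  -- unless you ask for the string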
> All INI dialects however are well-defined (every INI file is parsed by some application, and by studying a parser's source code it is possible to deduce its rules)
That's the definition of a lack of definition. Well-defined means you don't need to study a parser's source code and deduce the rules; you get the rules!
> , and, if one looks closely, the number of INI dialects actually used in the wild is not infinite.
and that doesn't help you, since you'll still have to study each parser's source code to deduce its rules.
There is no undefined behavior in C. You just have to read your compiler's source code, or the generated assembly, or simply run the program to learn the exact definition.
The author does not seem to appreciate separating tokenization from interpretation. Can you write a document in your data format of choice that is valid in that format and not a valid config for your application? Absolutely! Can you write an arbitrarily specced INI file that is both structurally invalid and not a valid config? Even more so!
Choosing a standardized data format gives you a constellation of tools for managing, generating, querying, and validating data that you don’t need to write in an application.
I’m not sure if TOML libraries support it, but the Ion libraries, for example, allow you to see the next data type in code and adjust accordingly. If you want to accept a symbol, string, or array of values, the application can choose to do that. Oh, and you can write a formal schema with things like enumeration values that users and code can use to validate a document. So if you’re in custom INI land, that’s yet another tool you need to write on top of everything else.
If I didn’t want to pull in a dependency, I might write a quick and dirty INI parser, knowing there are likely bugs, corner cases, and all kinds of potential future issues that the extra code entails. If I’m taking on a dependency, I’d probably choose a well defined, human and machine readable/writable format that has schema support. Then I’d write a schema and point people (operators and programmers) to it for reference, not the code implementing the parser.
I've made my own config format precisely because TOML was not good enough. But INI would have been worse for my case.
Strict typing is good. If you expect a number, and it doesn't parse, you want to know that.
Strict typing communicates intent. And that is worth gobs of effort.
(I do admit that the author has a point about times when apps just want a string value, so I think I'll add a function to get the string form of a value no matter what type it is.)
Also, as much as the author slams TOML's avoidance of ambiguity, the industry has learned that Postel's Law is bad. We need to be conservative in what we emit and accept, for several reasons, the most important of which is that parsers are some of the most dangerous code; any possible ambiguity could be a security bug or many waiting to be discovered. My parser is opinionated because that prevents bad things from happening.
The author is also wrong about C not handling mixed arrays well; my config format is JSON with a few niceties, and I easily implemented it in C, even though it can have mixed arrays.
And braces/brackets in JSON are not human-editable? Come on...
That said, some changes to JSON were necessary to make it easier to edit; besides comments, I also added keys that don't need quotes, as long as they are one word.
The author's premise is that the strict typing and validation of a configuration file belong with the application, not some library doing initial validation of a document which may or may not conform to the application's logic.
There are some pretty bad takes here. An integer and a version string are obviously different.
> no apparent motivation behind this rule, except that of conforming TOML to JSON...TOML's reason remains somewhat mysterious.
No, it's not mysterious, you figured it out! TOML is designed in part to work fairly well with other common formats, like JSON.
> ...except that a value can also be a date. There is something intriguing in all this. Even forgetting that an application might not need dates at all, why constraining something so particular and that can be formatted in so many different ways into a rigid primitive?
Once again, you answered your own question. Dates can be formatted in many different ways, so TOML offers a standard date type. It's really helpful!
> And even if you do survive the process of writing a parser that is fully compliant with TOML (some people don't), you still have done only half of the job, that of writing a parser, without really thinking of any real case usage.
In my experience TOML parsers are more consistent than INI parsers in terms of behavior. They have been immediately helpful to me, as they support a configuration format I vastly prefer.
What a funny write-up. TOML isn't perfect, but I like it much more than INI.
The fact that YAML is so bad is why being "understandable by a human" shouldn't be the ultimate goal of any configuration language. I would gladly make concessions like quoted strings or bracketed lists to avoid the hell of trying to figure out if my string/number/list actually parsed as a string/number/list.
I used to be in favor of schemas, but my problem with them these days is that they just can't encode all the validation necessary to ensure that the config is correct. At the end of the day, the only way to check if the config is actually valid is to parse it, so I'm sympathetic to the "string them all and let the application sort them out" approach.
I'm sympathetic to that, but not to INI! INI is just standardized enough that your INI parser may or may not give you "just a string." As another poster mentioned, it may well interpret the string "NO", quotes included, as the boolean false before passing it along to the rest of the application. It's this ambiguity of type that makes INI problematic. If it simply handed along strings, without fail, and left the application to parse whether "NO" should be a country or a boolean or a string, that wouldn't be a problem.
But inevitably, in order to DRY, somebody will make a consistent parser that is used in your application, whether that's in-house or a dependency. And at that point, it is very tempting to run everything through the parser, and the parser is going to make some unexpected decisions.
So, sure, use INI. But don't really. Use an env file that is parseable as INI or as shell environment variables. As soon as you start needing anything more complex, use something where you have at least a few basic guarantees that what you're getting is at least in the general vicinity of what you want.
Right, I'm not trying to say that INI is the solution, only that 1) TOML's anemic selection of types is probably pointless, and 2) any attempt to provide a useful selection of types would require a Turing-complete language, which is not what I want in my config files, so you might as well just give me strings and let me parse them in the typed, Turing-complete language that I'm already using for my application logic.
Yes. I like the NestedText approach, but I do feel like it needs an official optional "blessed" schema description language for type validation, instead of "here's a dozen ways to do NestedText validation in python".
I kinda get where the author is coming from. Everyone is tripping over themselves trying to "fix" configuration file formats by adding more and more complexity so we humans can express more complex data types in configuration files.
Nowadays TOML is the new hotness; 6 years ago it was YML, 9 years ago it was JSON, and before that XML was our alleged savior.
Ultimately there's no one perfect solution, but INI does have one thing going for it: it is dead simple. It has no type system and it leaves it up to the application to convert a set of key/value string pairs (grouped by a header) into something meaningful rather than attempting to guess and spitting out unexpected surprises.
Contrast to XML which has 50 ways to say the same thing depending on what tickles your fancy, YML which is probably the configuration language with the fastest footguns in the West or TOML which attempts to express fairly complicated data types into a configuration file without questioning if that's even useful for 99% of the configuration files.
The only one that is close to equal in simplicity is JSON, which only allows booleans, numbers, strings, and arrays/dictionaries. Really, I think the only reason JSON failed the "human-written configuration file" contest is that spec-compliant implementations forbid comments, mostly because the author of the spec wanted to keep JSON (de)serializers from putting custom logic in them.
(Although I'll note that JSON specifically tends to have the most non-spec-compliant libraries that take a looser stance and allow in-line comments using JavaScript's rules, at least in my experience, and it even has a bunch of dedicated offshoot specs that are basically "JSON but with comments".)
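For the record, a strict parser really does refuse them; the stdlib behaves this way:

    import json

    try:
        json.loads('{\n  // enable verbose logging\n  "debug": true\n}')
    except json.JSONDecodeError as e:
        print("rejected:", e)  # Expecting property name enclosed in double quotes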
Yeah, I noticed - that's mostly just because of the general flow of trends.
XML was a hard swing in terms of complexity - it allowed you to express everything, but did it in such a way that very few people would actually enjoy writing it. So it fell out of favor and folks moved to JSON for its simplicity.
Then JSON proved to be somewhat too onerous because of several tiny annoyances (lack of comments, but also things like not allowing trailing commas in lists/dictionaries), so devs started trying out YML.
YML quickly proved to have too many footguns because of the way its parser behaves, so it is now being abandoned in favor of TOML.
Trends are a pendulum. They go back and forth, slightly changing each time they do. Complexity falls out of favor so we move to simplicity. Simplicity proves too simple so we go back to complexity, but slightly different than the old one. That too will fall out of favor and we go back to a slightly different kind of simplicity. Just how it goes.
As for TOML - it's fine, I'd probably use it for a new project, but it's also undeniable that TOML in general feels a lot like supercharged INI. And that should in turn raise the question: why do we need that supercharged INI? Why should a config parser decide what primitives to return or what objects to construct from them? We can bolt schemas and other stuff on top of config parsers all we want, but why should that be the task of the parser?
INI only deals in strings. In many ways, that makes things a lot easier. Most statically typed languages I've worked with tend to bolt getPrimitive() functions onto their config parsers for every type, which throw an exception if the value can't be understood as that type (and which also make the API extremely messy to read).
Meanwhile most soft/duck-typed languages just throw you in the depths of it and give you whatever the fuck they parsed out of it. Usually you just end up manually type checking the data after.
With INI, all you have is strings. It's simple, it's clean and it doesn't need to infer anything else (which also fits nicely with paradigms like "do one thing, and do it well").
We have showcased here the classic impedance mismatch between serialization in a typed language and serialization in a dynamic language. The author is clearly in the typed camp, speaking of making enumeration labels and a dedicated type for the names of continents.
The typed language camp loves the idea of structured untyped strings in a serialization format, such as NestedText or INI. However, this is uncomfortable in dynamic language territory, for many use cases. Because of this dichotomy, any universal serialization format is going to feel like a compromise. JSON is the poster child of this.
In the YAML spec, the authors explicitly stated a goal of supporting native data types of dynamic languages. This design decision seems to be a good compromise between the typed and dynamic camps, but the old saying applies: a good compromise is where everyone goes home angry.
TOML forces arrays to be encapsulated within square brackets (exactly like section paths do), although humans do not need square brackets for recognizing when something is a list.
Isn't TOML meant to be read by programs?
Don't square brackets make things less ambiguous for both machines and humans?
There is no way to convince a human that something composed of only one member is a list (if you think differently, chances are that you are partly non-human).
The author should speak for himself.
TOML forces arrays to be always comma-separated, although a human can recognize a list even when the separator is a mushroom.
I agree with the typing arguments. A config parser can only parse the values to a degree - to primitive types, basically. The application can parse them fully. So the question is to what extent the parser helps the application versus to what extent it introduces subtle problems. As a config file user I would prefer not to have to care so much, but also not to have to work around weird corner cases. YAML's "no" problem is a perfect example. This would not exist if the parser did not try to be helpful.
If we assume this is serious, then it falls into a division I see in a lot of areas: catch errors sooner, or catch errors later. The catch-errors-later group tends to take the position that because some errors have to be caught later anyway, they might as well catch all the errors later. The catch-errors-earlier group tends to take the position that the earlier an error is caught, the easier it is to handle and the safer the code will be.
Neither seems to be able to see valid points in the other's position so it ends up being polarizing.
I don't think this is to do with catching errors. You can throw errors at load time with both mechanisms if that's what you want. Instead, the question is, should the file format include a bunch of notation so that the computer can deserialize the value to some type even without knowing what it'll be used for?
Since ini is a format designed for humans to read and write, the argument is that no, the reading code should decide how to interpret the value, and this is fine because it knows whether it wants this particular value to be a boolean or a string or a number or a continent.
The ini file reader has an implicit schema in its reader code. The TOML file makes (some of) the types explicit in the file itself at the expense of making it less convenient for a human to read and write.
Following this argument to its logical conclusion, why bother having any kind of standardized format at all, ini or otherwise? The program's config reader knows what it wants to read, why bother standardizing names, notation, section delimiters, or anything else?
And a standardized format takes that even further, since now text editors can be aware of more of the file structure and assist with highlighting, completion and more.
NestedText was mentioned earlier in this thread, and it takes the philosophy to its logical conclusion. That conclusion includes schemas: https://nestedtext.org/en/latest/schemas.html. Text editors could absolutely be written to understand schemas and provide the help you suggest.
> The Catch errors later group tends to take the position that Because I have to catch some errors later anyway I might as well catch all the errors later.
I'm not sure why that follows. It doesn't seem to apply to any other domain, eg: since I have to deal with diseases associated with aging later anyway, I might as well not take care of myself now and just go wild and do anything I feel like.
For nearly every argument made in the article, I found myself siding with the TOML approach. The "no non-ascii keys except wrapped in quotes" thing is a bit wonky I admit.
There is definitely an argument for understanding elementary data types later, at the application level instead of in the config format, so we do not create an arbitrary distinction between types at the config-format level and at the application level, forcing IP addresses (and other domain-specific types) to be encoded as strings.
But I do not think this is a good argument for structural data types like lists, sets, and so on, because M applications would use N different ways to encode them. While a human can recognize them, it is hard for a human to remember that this specific application uses that specific encoding with its own quirks.
> TOML forces arrays to be encapsulated within square brackets (exactly like section paths do), although humans do not need square brackets for recognizing when something is a list.
> # not an array in TOML
> wishes = apples, cars, elephants, chairs
But earlier he didn't want strings to be quoted. If these two criticisms were both applied, how do you distinguish the string "apples, cars, elephants, chairs" and the list ["apples", "cars", "elephants", "chairs"]?
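The ambiguity in concrete terms (a hypothetical reader, not any real INI library): the raw text admits both readings, and only the application's intent picks one.

    raw = "apples, cars, elephants, chairs"

    as_list = [item.strip() for item in raw.split(",")]  # one reading
    as_string = raw                                      # the other reading
    # Nothing in the file itself records which one the author meant.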
Having used INIs already, it seemed TOML didn't really bring enough to the table to be worth switching to it. But I imagine switching from YAML to TOML might be an improvement.
Date support, and general type-awareness and nested sections also seemed like anti-features to me.
We use INI files to configure our on-premises product. Four INI files to be precise. Database connection url, server hostnames, ports, various Boolean and debug flags, and even CSV string arrays.
To this day not a single customer complaint or even support questions with regards to the INI files. Most of our customers are on Windows but we also support Linux with the same config scheme.
In replying to something this deranged, one runs the risk of responding with sincerity to a troll, and thereby being Owned. Nevertheless, my ethics require me to implore anyone who sympathizes with the section on Square Brackets to seek help.
It's surprising how many commenters point out that the post reads like a joke, but none of them seem to consider the possibility that it is, in fact, a joke. I'm pretty confident that it is. Consider this quote: "if one looks closely, the number of INI dialects actually used in the wild is not infinite". That isn't damning with faint praise, it's just damning.
The advantage of TOML over INI is that you can use a generic TOML linter or TOML validator to check files. This is very helpful when dealing with deployment and CI pipelines where it would be nice to just fail if the config file(s) are not valid. This can save a lot of time, and eliminate the whole "spin up a container and load the application with broken config file" step...
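A minimal version of that fail-fast step (Python 3.11+ stdlib; the file name is made up):

    import sys
    import tomllib

    try:
        with open("app-config.toml", "rb") as f:
            tomllib.load(f)
    except tomllib.TOMLDecodeError as e:
        sys.exit(f"invalid TOML: {e}")  # fail the pipeline before anything deploys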
I used TOML for complex configuration and I am quite happy with it.
However, I often saw people who complained about it, and most of the complaints afaik are of the nature: "This extremely complex use case, which I could just as well write in an interpreted language, isn't supported by TOML".
If your configuration is so crazy it needs more than any configuration language offers, just use a scripting language instead. I have seen lua or python files act as configuration and there is nothing wrong with that.
Postel's Law is bad. It directly contradicts the Unix philosophy of building small and composable interfaces. Look how small and generic the interface of Reader is in Go: it has one method with a simple signature and can be used widely.
Otoh, when your software is not opinionated about what it accepts, that's when it tends to produce surprising behaviors. This whole argument made me lean toward TOML even more. It's not perfect, but certainly better than INI in many ways.
tomlc99 parser took 13 minutes to parse a 50MB file!? There must be something horribly broken with it. Rust's TOML parser is literally 1000 times faster than that.
Don't think I've benchmarked that case, since that's not generally what you'll run into with a human data format. Would love to see numbers if you have them, since I have no idea what they'd be.
Personally, performance isn't my biggest concern, and I only focus on it for the cargo case, but I want to switch cargo to storing packaged `.crate` files in a format meant for machines.
(maintainer of `toml` and `toml_edit` packages for Rust)
The file it generates is not valid TOML though; even if you fix the obvious syntax issues you run into issues like:
% tomlv ./big_file.ini
Error in './big_file.ini': toml: line 35: Key 'section' has already been defined.
Perhaps tomlc99 doesn't check for this; I didn't try it.
Maybe I should add something to toml-test for this; while I agree that 50M files are somewhat rare[1] and that performance isn't a huge consideration, it's not entirely unimportant either, and this can give people a bit of a "is the performance of my parser horribly atrocious or roughly fine?" kind of baseline.
[1]: One use-case I have is that I use TOML for translation files (like gettext .po, except in, well, TOML) and in a large application with a whole bunch of translations I can see that adding up to 50M.
> value in € = 345 # valid with libconfini but invalid in TOML
This is degenerate.
You want a key-value format easy to pick up, and easy to edit by hand - otherwise you'd rather just use something like SQlite - in particular if you "need" the other insanity that is sections in INI files.
Why the heck allow spaces and UTF8 in key names? To satisfy someone's libido?
Don't tell me - my native language uses a bunch of those weird çharacters. I also know that they always eventually cause problems, because there's always legacy software that doesn't handle them properly.
But what really is the purpose of supporting them in key names? Current keyboards - at least those sold in Europe - do have a key for €, but imagine some lunatic "app developer" insists that another INI key name should be spelled "value in ¥". Customers, coworkers, support desk staff - everyone will curse them.
Comments prescribing example configurations resolve most of these issues. Even if the ideal config format is found and Internet debates settle the issue decisively, README.txt will still be a good practice.
There's very little which isn't permissible if we presuppose users will read the source to understand a config file or software generally.
What's the problem with using JSON and providing a schema? It is not so "pretty" to write with braces, quotes and such, but not too hard either. I have been doing that at work for some time now. VSCode will even recognize the schema and check the document in real time.