It just seems like YAML is making things more implicit. I like explicitness. Not against YAML, but it's no holy grail in my mind.
I'm more experienced with JSON than YAML and could be swayed on this, but so far I'm not. They seem like two roughly equal ways to do the same thing.
This seems oddly similar to the nit-picky problems people have when learning Python from a C-like background and being annoyed that semicolons and flexible indentation aren't the norm. When I spend months working on JS code, looking at Python slightly annoys me, and vice-versa, so I think I get it. I just don't think it's an important, game-changing thing.
I'll be learning about Ansible soon, so I'll have to also learn YAML syntax. I guess I'll have a more educated view once I've gotten deep into that.
YAML has a few big advantages (in the proper situations) over JSON: self-referencing, complex datatypes, and (my personal favorite) embedded block literals. I've, perhaps unwisely, embedded entire webpages a single indentation level after a variable and retained all of their formatting.
I'm no YAML zealot though, I do use JSON most of the time, but in the past couple years I've started writing most of my config files in YAML where I have the opportunity.
We recently moved much of our application's schema definitions and configurations out of code/JSON and into YAML files (we generate JSON during the build process in places where it is necessary). A pleasant side effect was that the YAML file set began to double as basic documentation.
I'm actually having a hard time, even with some google-fu, finding information about how this might be used. YAML->json parsers have failed to handle docs with a '...' separating parts of a document so far. Would this triple-dot structure be used to create multiple JS objects? How would I find more information about how this might be used in practice?
I have a feeling this would be easier to figure out if I just went and played around with it in practice, but it also seems like something that could be documented in more beginner-friendly terms.
I'll probably be kicking myself once I figure it out, but I also only have a cursory understanding of how streams work. Maybe this is an opportunity to fix that. Once I figure it out, I'll respond to this if nobody else has.
I think the GP was using '...' as a standard "stuff removed for brevity" ellipsis.
However, since you asked:
In a YAML document, there is an implicit top-level object, either a hash/dict/associative array or a list/array depending on your formatting.
The '...' marks the end of a document (a top-level object) within a single file or IO stream, and '---' marks the start of the next one.
YAML parsers generally stop when they hit the separator, even in languages that can do multiple assignment. If your IO stream behaves like a file handle, you can read it repeatedly into different variables until EOF.
Multiple documents in a single YAML stream are fairly uncommon. I think most people don't know they exist, but I appreciate the flexibility and use it whenever it makes sense.
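For anyone who wants to see the idea without installing anything, here is a rough sketch in plain Python. It is only an illustration: split_documents is a made-up helper that cuts the raw text on bare '---' and '...' marker lines, it does not parse YAML at all, and it silently drops empty documents. The sample stream is invented too.

```python
def split_documents(stream_text):
    """Naively split a YAML stream into raw document strings.

    Only bare '---' and '...' marker lines are recognised. This does not
    parse YAML, and it ignores directives and markers with trailing content.
    """
    documents, current = [], []
    for line in stream_text.splitlines():
        if line.rstrip() in ("---", "..."):
            # A marker closes whatever document was being collected.
            if current:
                documents.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        documents.append("\n".join(current))
    return documents


# Hypothetical stream: a mapping document followed by a list document.
stream = """\
---
name: first
...
---
- a
- b
"""

docs = split_documents(stream)
for doc in docs:
    print(repr(doc))
```

With a real parser you would not do this by hand. PyYAML, for example, has yaml.safe_load_all, which yields one parsed object per document in the stream.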
The stack overflow page is disappointing because in my opinion all of the answers miss the point: Yaml is designed to be readable and editable by humans. Json is only designed to be human readable - and intentionally does not have features to support editing by people.
Json intentionally does not support comments. Think about that for a bit and you'll realise what Json is for and what it isn't for.
As SeoxyS said: use yaml for config files, Json for APIs.
Actually, after finding out about TOML[1], I would suggest looking at it. It is a lot simpler than YAML and explicitly does not support dangerous features like deserializing arbitrary data structures, while still being very readable. The spec is not entirely stable yet, though.
Oh, yes, we do. We need a language targeted at humans that machines can
process and we need a language targeted at machine processing that is
inspectable by humans. Those are two contradicting targets: the latter calls
for simplicity, but the former calls for shortcuts[#] to make humans' work
easier, which adds, not reduces, complexity.
[#] Shortcuts like not quoting keys and omitting braces and commas in
hash definition.
I like both JSON & YAML for different reasons. YAML is a great configuration format; it's fantastic for being read and written by humans. JSON is much better as a serialization format.
I hadn't thought about comments - that is a clear advantage to YAML. Great point to bring up, and supports the popular view I'm seeing that YAML is maybe better for config files. I can think of numerous scenarios where a configuration decision could use some context.
For some use cases - and configuration is a prime example - it can make sense to add the comment directly to the data, as another property of the relevant object. That then allows you to use the comment e.g. in a front-end config editor.
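A minimal sketch of that approach in Python, using only the standard json module. The setting name, the "value"/"comment" key layout, and the comment text are all invented for illustration. The second half just shows that the standard parser rejects a //-style comment.

```python
import json

# Hypothetical config: the comment travels with the data as a normal property.
config_text = """
{
  "timeout_seconds": {
    "value": 30,
    "comment": "Raised from 10 after slow upstream responses"
  }
}
"""

config = json.loads(config_text)
setting = config["timeout_seconds"]
# A config editor could show the comment next to the value.
print(setting["value"], "-", setting["comment"])

# A //-style comment is not valid JSON, so the standard parser refuses it.
try:
    json.loads('{"timeout_seconds": 30 // too low?\n}')
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False
print(comment_accepted)
```

The trade-off is that every consumer of the file has to know to skip or unwrap the comment property.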
HHVM (FB's PHP/Hack) moved from YAML (hdf flavor) to INI for its configuration file format. Sad but true, it was just awful to work with - good that they changed it. https://github.com/facebook/hhvm/wiki/Runtime-options
One problem I've hit a few times is parsing of strings with escapes in YAML. In JSON it's absolutely clear how escaping works; in YAML, in practice, different parsers do subtly different things.
I have had a similar experience with whitespace handling in YAML: the application configuration was producing an error because tabs were being used instead of spaces ...
No, the whole implementation is actually very simple. It's based on the 'standard' parsers for JS/C#/Python and only differs in its handling of quoteless strings and the optional syntax.
No problems with dwimmy typing ;) - not saying that it couldn't happen but I think that would be the exception. Not having to use quotes/escape characters helps a lot more.
I hadn't really looked into YAML before, so I fed it some data for a web app I'm working on.
Man, it sure is condensed, but because of that, not very readable to me. If this is the standard for YAML, it's definitely interesting, but not my cup of tea. It's sometimes hard to tell where a list ends or an object begins. If you want to argue that it's more space efficient, I'd say just use gzipped JSON. If you want to say you're using it for the spacing and/or line breaks, I'd say find a viewer that prettifies your JSON well.
Having a delimiter, at least when you're representing chunks of data, is really useful, easier to code for, and easier to read than tabs or other systems. This, not so much. It's kind of a hot mess. Maybe that thinking works well in python, but I'm not so sure the principle translates away from code. If I have a list of objects, I really need to know where one thing ends and the next begins without having to keep track of more than one thing at a time.
Your argument is inconsistent. You're saying that YAML is hard to read, but you suggest people who find JSON hard to read ought to use a viewer that prettifies it. Why aren't you applying your own logic to YAML, and finding a viewer that makes it easier for you to read?
YAML requires a specific indentation and spacing to work. There's no changing the layout.
JSON is free to do whatever. You could have everything in one line if you wanted. You could use indentations of 8 spaces or 21 tabs. White space does not matter! Point is, it is not hard to figure out a sensible indentation and line breaking scheme if you need it. This is the beauty of braces when dealing with data.
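To make that concrete, here is a small Python sketch using the standard json module. The sample object is made up. It parses the same data written on one line and written with a deliberately odd 8-space layout, then re-indents it.

```python
import json

# The same hypothetical object, once on a single line...
one_line = '{"name":"app","ports":[80,443],"debug":false}'

# ...and once spread out with an arbitrary 8-space indentation.
spread_out = """
{
        "name": "app",
        "ports": [
                80,
                443
        ],
        "debug": false
}
"""

# Whitespace between tokens does not change the parsed value.
same = json.loads(one_line) == json.loads(spread_out)
print(same)

# Any layout can be regenerated on demand, here with 2-space indentation.
pretty = json.dumps(json.loads(one_line), indent=2)
print(pretty)
```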
YAML is in fact a superset of JSON. You can parse a JSON document with a YAML parser. You can also mix JSON and YAML in the same document and use [..., ..] for annotating arrays and { "bar" : "foo"} for objects. Some YAML libraries allow you to generate YAML that has one style for the top level and JSON style for nested elements / long arrays to improve readability.
However I have also learned the following:
- in Python, parsing a YAML file is hundreds of times slower than parsing JSON (in fact, my benchmark was showing around a 400x slowdown). Therefore you can't really use it in cases where performance matters at least a little bit. A 1k-line YAML file can take more than half a second to load (yrmv)
- if your yaml doc is longer than two screens, it loses its readability benefits.
Therefore it is best to use a whole directory of yaml files, each describing a specific feature. E.g. Ansible is a good example of how to use yaml files.
And it might just be confirmation bias, but I've seen far more exploits in YAML parsers than in JSON parsers. YAML might be okay for configuration files, but I would never use it for data exchange…
Hmm, I don't like the fact that strings don't have quotes; 1 vs '1', false vs 'false', with javascript's === operator, it's nice to be explicit about the type of data, since javascript is where most json (and later YAML maybe?) gets consumed...
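A quick Python illustration of the JSON side of this, using the standard json module: quoted and unquoted values come back as different types, so nothing has to be guessed.

```python
import json

# Quoted and unquoted forms of the "same" value.
values = json.loads('[1, "1", false, "false"]')

# Record the type each value was decoded to.
kinds = [type(v).__name__ for v in values]
print(kinds)  # ['int', 'str', 'bool', 'str']
```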
YAML seems inherently unsafe as it's indentation based, which makes copy-pasting from different levels very difficult. The best (and one of the oldest) serialization format is S-expressions. You have the best of both worlds, compactness and non-significant whitespace.
I really like yaml. Especially for writing config files. But I still wish there was a cut down version without the bells and whistles that aren't really useful/relevant to human read/written documents. Just a superset of json syntax, not semantics.
As several comments here point out, the intended usage of JSON is as a serialization format. In which case, I would expect dates to be in epoch time (if you need to include timezone data, that should be a separate field).
What annoys me is that JSON lacks a binary data type. The best you can do is base64, which really sucks if 99% of your binary data falls into the ascii range, but you have the occasional high bit character and you explicitly aren't trying to treat it as unicode.
"I would expect ..." that's the problem. Expectations fail when both sides don't have the same expectations. As there is no way in JSON to specify what type a field is, and as dates aren't even specified in JSON, you can't be sure whether dates are in epoch time or in some common or uncommon variant of ISO 8601 (or worse, neither of these two options).
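Here is a small Python sketch of why the expectation matters. The field names created_epoch and created_iso are made up for the example. The same instant is written both ways, and after decoding, one is just an integer and the other just a string, so the receiver has to know out of band how to interpret each.

```python
import json
from datetime import datetime, timezone

# One instant, serialized two different ways (field names are hypothetical).
moment = datetime(2014, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
payload = json.dumps({
    "created_epoch": int(moment.timestamp()),
    "created_iso": moment.isoformat(),
})

decoded = json.loads(payload)
# Nothing in the JSON marks either field as a date: one is int, one is str.
print(type(decoded["created_epoch"]).__name__, type(decoded["created_iso"]).__name__)

# The receiver must already know which convention each field follows.
from_epoch = datetime.fromtimestamp(decoded["created_epoch"], tz=timezone.utc)
from_iso = datetime.fromisoformat(decoded["created_iso"])
print(from_epoch == from_iso)
```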
JSON hasn't been designed for binary data, so it's not surprising that it lacks a binary data type. There are several options besides base64, you could e.g. use yEnc or BSON.
But unless you have really large binary data (in that case I would not embed it in the JSON at all, but embed only a URL and let the client download the data separately), I wouldn't bother with any encoding other than base64. It is easily compressible and the compression is handled transparently, so you reach such a low overhead that it is hard to justify using a non-standard option like yEnc.
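A rough way to check the overhead claim in Python, using only the standard library. Random bytes are used as a worst case, the 30000-byte size is arbitrary, and zlib stands in for whatever transport compression is actually in play. The exact compressed size will vary from run to run.

```python
import base64
import os
import zlib

# Worst case for compression: random binary data of an arbitrary size.
raw = os.urandom(30000)

# base64 turns every 3 input bytes into 4 output characters.
encoded = base64.b64encode(raw)

# zlib here is a stand-in for transparent compression on the wire.
compressed = zlib.compress(encoded)

print("raw:", len(raw))
print("base64:", len(encoded))
print("base64 + zlib:", len(compressed))

# The round trip recovers the original bytes.
restored = base64.b64decode(zlib.decompress(compressed))
```

Since base64 only uses 64 distinct characters, the compressor can win back most of the extra third even when the underlying data is incompressible.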
wow, there are some really amazing points here. the take away i've gathered is that yaml is preferred for configuration, with json being the clear winner for data interchange. any suggestions for improvements to the site or content?
Think again if you really want that for every service.
I've encountered many poorly written services where I would have been happy if there were a schema to be at least sure what kind of messages are exchanged.
Yes, BUT. The other day someone proposed a JSON 'standard' that mimicked SOAP - don't do that.
JSON is beloved because it's easy and simple. XML was originally simple too, with just DTD as schema. Then they came up with XML-RPC, XMLSchema, XSLT, SOAP and many other complex concepts that in the end more or less failed.
> XML was originally simple too with just DTD as Schema.
The complexity of XML has not changed since its inception.
> Then they came up with XML-RPC, XMLSchema, XSLT, SOAP and many other complex concepts that in the end more or less failed.
This doesn't really have any bearing on the specification of XML itself. XML is simple (with some caveats like entity expansion), and extremely flexible. The problem is that it is designed to be read and written by machines, while still being debuggable by people. It is extremely verbose. It shines in some use cases, e.g. for documents with a complex structure and a lot of semantic metadata. But it is even less suitable than JSON for configuration files, for instance.
the problems with xml (as in "outside the XML-RPC, XMLSchema, XSLT, SOAP stuff") IMHO arise from these two / three points
* it's a little too verbose (<xml></xml>.. while even S-expressions use just ')' as terminator)
* there is no obvious way to transform any xml to an object (pojo is the best approximation, but how do you differentiate between a sub-tag, a text node, and an attribute?)
* it's essentially typeless (how do you serialize a number? how to differentiate it from a string? from a boolean?)
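The last two points can be seen in a few lines of Python with the standard xml.etree.ElementTree module. The user document is invented for the example. The same "id" can live in an attribute or in a sub-tag, loose text sits between the tags, and every value comes back as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing an attribute, sub-tags and a loose text node.
doc = ET.fromstring('<user id="42"><id>42</id>active<admin>true</admin></user>')

from_attribute = doc.get("id")     # value stored as an attribute
from_child = doc.find("id").text   # same name, stored as a sub-tag
stray_text = doc.find("id").tail   # text node sitting between two sub-tags
admin = doc.find("admin").text     # looks like a boolean, but is a string

for value in (from_attribute, from_child, stray_text, admin):
    print(repr(value), type(value).__name__)
```

Any mapping to a plain object has to pick its own rules for these cases, which is why two libraries rarely agree on the result.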
I agree that it is verbose, but I am not convinced that S-expressions are the solution. In a complex, deeply nested XML document, you should be able to tell where a given tag is inserted without counting parentheses.
> there is no obvious way to transform any xml to an object (pojo is the best approximation, but how do you differentiate between a sub-tag, a text node, and an attribute?)
That's OK. Just don't use XML to serialize data structures. IMHO, the use cases at which it is good are a lot closer to use cases for which HTML works than when you hesitate between XML and JSON.
> it's essentially typeless (how do you serialize a number? how to differentiate it from a string? from a boolean?)
That's not entirely wrong, but you can enforce a lot of things via XML schemas (eg, something like XSD or RelaxNG).
Nothing too specific, but I usually have Flash disabled on the browser I use the most and had to view the website in a basic Chrome install instead (which I do have installed for these kinds of cases). ;)
I thought, honestly, that html5 api was able to provide the same capability as flash, as far as file/clipboard was concerned.