Thanks for the comment. It helps me think how to clarify what I was trying to sa...

Thanks for the comment. It helps me think how to clarify what I was trying to say.

What I wanted to express is that using JSON Schema (or any such) for validation encounters a many-to-one mapping from multiple possible types across any/all given programming languages to a single JSON Schema form. That is, instances of multiple programming language types may be serialized to JSON such that their data may be validated according to a single, common JSON Schema form. This is fine, no problem.

OTOH, using JSON Schema (or any such) for codegen reverses that mapping to be one-to-many. It is this that leads to ambiguity and problems.

Restricting to a subset of JSON Schema is only goes so far. For example, we can not discard JSON Schema `object` as it is too fundamental. But, given a simple `object` schema that happens to specify all properties have a common type `T` it is ambiguous to generate C++'s `class` or `struct` or a `std::map<string,T>`. Likewise, a JSON Schema `array` can be mapped to a large set of possible collection types.

To fight the ambiguity, one possibility is to augment the schema with language-specific information. At least, if we have a JSON Schema `object` we may add a (non `required`) property to provide a hint. Eg, we may add `cpp_type` propety. Then, typically, the overhead of using a codegen schema is only beneficial if we will generate code in multiple languages. So, this type hinting approach means growing our hints to include a `java_type`, `python_type`, etc. This is minor overhead compared to writing language types in "long hand" but still somewhat unsatisfying. With enough type-theory expertise (which I lack) perhaps it is possible to abstractly and precisely name the desired type which then codegen for each language can implement without ambiguity. But, given the wealth of types, even sticking with just a programming language's standard library, this abstraction may be fraught with complication. I think of the remaining ambiguity between specifying use of C++'s `std::map` vs `std::unordered_map` given an abstract type hint of, say, `associative_array`. Or `std::array`, `std::list`, `std::vector`, `std::tuple` if given a JSON Schema `array`).

I don't think this is a failing of JSON Schema per se but is an inherent problem for any codegen schema to confront. Something new must enter the picture to remove the ambiguity. In (my) practice, this ambiguity is killed simply by making limiting choices in the implementation of the codegen. This is fine until it isn't and the user says, "what do you mean I can't generate a `std::map`!". Ask me how I know. :)