
But what this does under the hood is still sending instructions to a workflow engine that does the actual work, as opposed to this code doing the actual work directly, right?

If that's the case, then what you can do is still restricted by the protocol of the workflow engine - building the instructions in code gives you some dynamism, but not a whole world of difference, and it complicates things that are easier done statically. It is definitely a valid approach, but it doesn't invalidate the approach of writing out the workflow definition statically, especially if the paradigm is "put all the smartness inside the HTTP service and only use the workflow as glue".




It is a workflow engine. So the actual work happens inside the activities, and from this point of view it is the same. The difference is that the orchestration code has the full power of a programming language, including threads and OO techniques. And all the tools like IDEs, debuggers, and unit-testing frameworks work out of the box and don't need to be reinvented for a YAML-based reimplementation of Java.

There are a lot of advantages to using a general-purpose programming language for implementing workflows. An incomplete list, in no particular order:

   * Strongly typed, for languages that support it
   * No need to learn a new programming language
   * Practically unlimited complexity of the code
   * IDE support, which includes code analysis and refactoring
   * Debuggers
   * Reuse of existing libraries and data structures. For example, can a YAML-based definition support ordered maps or priority lists without any modification?
   * Standard error handling. In Java, for example, exceptions are used.
   * Easy to implement handling of asynchronous events
   * Standard toolchains just work. For example, Gradle for Java and modules for Go.
   * Standard logging and context propagation can be supported
And so on. Any new language has to have a ton of tools, libraries, and frameworks to be useful, and using an existing language lets you benefit from the existing ecosystem out of the box.
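To sketch what "the full power of a programming language" buys you, here is an illustrative snippet (the activity functions and the order-processing scenario are invented, and these are plain local calls, not any particular engine's API): retries become an ordinary loop with exceptions, and fan-out over a runtime-sized collection becomes ordinary threads.

```python
import concurrent.futures
import time

# Hypothetical "activities" -- in a real engine these would be remote calls.
def fetch_order(order_id):
    return {"id": order_id, "items": ["a", "b"]}

def charge(order, attempt):
    if attempt < 2:            # simulate a transient failure on the first try
        raise IOError("payment gateway timeout")
    return "charged"

def ship_item(item):
    return f"shipped {item}"

def order_workflow(order_id):
    order = fetch_order(order_id)

    # Standard error handling: a plain retry loop using exceptions,
    # no special DSL construct needed.
    for attempt in range(1, 4):
        try:
            charge(order, attempt)
            break
        except IOError:
            if attempt == 3:
                raise
            time.sleep(0)      # back off (zero here to keep the demo fast)

    # Fan out over a collection whose size is only known at runtime.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(ship_item, order["items"]))

print(order_workflow(42))  # ['shipped a', 'shipped b']
```

None of this needs dedicated retry/loop/parallelism primitives in a configuration schema; it's just code.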


Yes, the orchestration code can be developed and debugged as normal code. I agree that it can be a pain to write and debug YAML "code".

But what's usually more interesting when it comes to workflows is inspecting and debugging the workflows themselves, and you'd still need custom tooling for that, regardless of how the workflow is built.

The actual appeal of using YAML here is that the "code" is amenable to static analysis. The YAML itself is irrelevant; what matters is the DSL embedded in it. For example, you can easily count how many steps there are, how many edges there are between steps, etc. If you build the workflow in code, in general you can only know these things after the orchestration code has executed.
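As a sketch of that kind of static analysis (the step/next schema is made up for illustration; the definition is shown as the Python structure a YAML parser would hand back):

```python
# A declarative workflow definition, as parsed from YAML into plain data.
# The schema (steps with "name" and "next" edges) is invented.
workflow = {
    "steps": [
        {"name": "fetch",    "next": ["validate"]},
        {"name": "validate", "next": ["charge", "notify"]},
        {"name": "charge",   "next": ["ship"]},
        {"name": "notify",   "next": []},
        {"name": "ship",     "next": []},
    ]
}

def analyze(defn):
    # Walk the definition without executing anything.
    steps = defn["steps"]
    edges = sum(len(s["next"]) for s in steps)
    return {"steps": len(steps), "edges": edges}

print(analyze(workflow))  # {'steps': 5, 'edges': 4}
```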


I'm not sure how useful all these counts are. If the static analysis you described were that important, we would write most of our software in YAML instead of C/C++/Java/Go/Python, etc. A Linux kernel in YAML would be really cool :).

In my experience, no developer has ever asked for this information, especially if the price is writing code in a Turing-complete YAML/XML/JSON-based language.


Well, that was a contrived example. Something more useful would be e.g. enumerating all the HTTP endpoints being depended on, or calculating the maximum resource consumption.
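Endpoint enumeration is easy to picture: if each step declares the HTTP call it makes, a linter can list every dependency without running the workflow. (The schema below is invented for illustration.)

```python
# Hypothetical declarative definition where each step names the HTTP
# endpoint it calls.
workflow = {
    "steps": [
        {"name": "fetch",  "call": "GET https://inventory.internal/items"},
        {"name": "charge", "call": "POST https://payments.internal/charge"},
        {"name": "ship",   "call": "POST https://shipping.internal/labels"},
    ]
}

def endpoints(defn):
    # Purely static: read the definition, execute nothing.
    return sorted(step["call"] for step in defn["steps"])

for e in endpoints(workflow):
    print(e)
```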

I think our disagreement really boils down to different approaches towards workflow configuration. Static configuration has its place in a system where all the "smartness" can easily fit somewhere else. This is often the case when you are building a workflow that glues many in-house components together, which tend to have uniform behavior and are easy to extend. On the other hand, if you are working with many heterogeneous components that are clumsy to extend, a smarter, more dynamic workflow configuration API definitely makes more sense.


In my opinion, there are two classes of workflow definition languages: domain-specific (DSLs) and general-purpose.

Configuration-based languages are awesome for domain-specific use cases. AWS CloudFormation and HashiCorp's Terraform configuration language are good examples of domain-specific workflow definition languages. They each solve just one specific problem, which allows them to be mostly declarative and omit most procedural complexity. Even in this case, I'm pretty sure the Pulumi folks would not be 100% in agreement.

General-purpose workflow definition languages are procedural, and I believe that procedural code in YAML/XML/JSON is a bad idea. It looks ugly, doesn't add much value, and never matches a general-purpose language in expressiveness and tooling. Such configuration languages work in limited situations, but in most real use cases developers quickly hit their boundaries and have to look for real solutions.

BTW Temporal and its predecessor Cadence are perfect platforms for supporting custom DSL workflow definitions. Many production use cases run custom DSLs on top of them.
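The "custom DSL on top of a workflow engine" pattern boils down to one ordinary workflow that interprets the parsed DSL. A minimal sketch (the step schema and activity registry are invented; in Temporal each dispatch would be an activity invocation rather than a local call):

```python
# Registry mapping DSL activity names to implementations. Each activity
# reads/writes a shared context dict. All names here are hypothetical.
ACTIVITIES = {
    "fetch":  lambda ctx: ctx.setdefault("order", {"id": 1}),
    "charge": lambda ctx: ctx.setdefault("charged", True),
    "ship":   lambda ctx: ctx.setdefault("shipped", True),
}

def run_dsl(definition, ctx=None):
    # The interpreter itself is ordinary procedural code: walk the parsed
    # DSL and dispatch each step. In a real engine, this loop would live
    # inside a workflow and each dispatch would be a durable activity call.
    ctx = ctx if ctx is not None else {}
    for step in definition["steps"]:
        ACTIVITIES[step["activity"]](ctx)
    return ctx

dsl = {"steps": [{"activity": "fetch"}, {"activity": "charge"}, {"activity": "ship"}]}
print(run_dsl(dsl))  # {'order': {'id': 1}, 'charged': True, 'shipped': True}
```

This way the DSL stays small and declarative while the interpreter gets the engine's durability for free.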



