One thing I'm missing when modeling an FSM like this is different states having different sets of constraints, even if it's only nullability. It's a shame to have to make a field nullable just because the entity's initial state doesn't have the appropriate value yet.
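You can claw some of this back with a conditional CHECK constraint; a minimal sketch, with a hypothetical orders table and made-up status values:

CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status text NOT NULL,
    -- no tracking number exists before the order ships, so the column must be nullable...
    tracking_number text,
    -- ...but we can still require it in the states where it has to exist
    CHECK (status NOT IN ('shipped', 'delivered') OR tracking_number IS NOT NULL)
);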
For any use case I can think of (all "small data", under 1B rows), I'd rather use DB migrations to take care of this problem than force myself to live with my previous design decisions indefinitely.
I am attracted to clever, but in practice simple always beats clever.
Wouldn't that risk some FSM getting modified while it's 'running'? That may or may not be desirable, I guess. E.g. a customer not knowing that the process changed since they placed an order, and no longer understanding it.
This is a nice blog and I like the versioning feature. If you're looking to explore this idea more, I made a very simple unversioned FSM extension for Postgres years ago, for fun and no profit:
I've always wondered about this. A while ago I saw a library similar to this that modeled FSMs in Postgres, and it had a comment saying "todo: cycle detection", which made me wonder whether it's even possible to do this in the database.
Hmm… consider the following: your FSM is acyclic iff you can assign each state an integer depth such that a state at depth d only ever transitions to states at depths strictly greater than d. (A topological sort gives you such depths, and conversely, any cycle would have to contain a transition that fails to increase depth.) That suggests these tables:
-- assumes the enum types from the post; for this toy example, something like:
--   CREATE TYPE order_state AS ENUM ('start', 'a', 'b', 'done', 'error');
--   CREATE TYPE order_event AS ENUM ('A', 'B');

CREATE TABLE states (
    state order_state PRIMARY KEY,
    depth int NOT NULL,
    -- redundant given the PK, but needed as the target of the composite FKs below
    UNIQUE(state, depth)
);

CREATE TABLE transitions (
    start_state order_state NOT NULL,
    event order_event NOT NULL,
    end_state order_state NOT NULL,
    -- denormalized copies of the endpoints' depths, kept in sync via ON UPDATE CASCADE
    start_depth int NOT NULL,
    end_depth int NOT NULL,
    CONSTRAINT transitions_start_depth_correct
        FOREIGN KEY(start_state, start_depth) REFERENCES states(state, depth)
        ON UPDATE CASCADE,
    CONSTRAINT transitions_end_depth_correct
        FOREIGN KEY(end_state, end_depth) REFERENCES states(state, depth)
        ON UPDATE CASCADE,
    -- every transition must go strictly deeper, which is what rules out cycles
    CONSTRAINT transitions_depth_increases CHECK(end_depth > start_depth),
    PRIMARY KEY(start_state, event)
);
Let's bang on it for a quick test. You can define a state machine; here's one that roughly matches the regex `^(AB|BA)$` (I know I'm being a bit sloppy):
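INSERT INTO states (state, depth) VALUES
    ('start', 1),
    ('a', 2),
    ('b', 2),
    ('done', 3),
    ('error', 3);

INSERT INTO transitions (start_state, event, end_state, start_depth, end_depth) VALUES
    ('start', 'A', 'a', 1, 2),
    ('start', 'B', 'b', 1, 2),
    ('a', 'B', 'done', 2, 3),
    ('b', 'A', 'done', 2, 3),
    ('a', 'A', 'error', 2, 3),
    ('b', 'B', 'error', 2, 3);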
And, as you need to modify it, you can increase a node's depth to make room for intervening nodes:
fsm=> UPDATE states SET depth = 4 WHERE state = 'error';
UPDATE 1
fsm=> TABLE transitions;
 start_state | event | end_state | start_depth | end_depth
-------------+-------+-----------+-------------+-----------
 start       | A     | a         |           1 |         2
 start       | B     | b         |           1 |         2
 a           | B     | done      |           2 |         3
 b           | A     | done      |           2 |         3
 a           | A     | error     |           2 |         4
 b           | B     | error     |           2 |         4
(6 rows)
But you can't decrease a node's depth too far:
fsm=> UPDATE states SET depth = 2 WHERE state = 'error';
ERROR: new row for relation "transitions" violates check constraint "transitions_depth_increases"
DETAIL: Failing row contains (a, A, error, 2, 2).
CONTEXT: SQL statement "UPDATE ONLY "public"."transitions" SET "end_state" = $1, "end_depth" = $2 WHERE $3::pg_catalog.text OPERATOR(pg_catalog.=) "end_state"::pg_catalog.text AND $4 OPERATOR(pg_catalog.=) "end_depth""
And you can't introduce transitions that don't increase depth:
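fsm=> INSERT INTO transitions VALUES ('done', 'A', 'start', 3, 1);
ERROR: new row for relation "transitions" violates check constraint "transitions_depth_increases"
DETAIL: Failing row contains (done, A, start, 3, 1).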
Now, I don't know that I would immediately recommend this for high-throughput production use. You're storing "unnecessary" state not once but many times (each state's depth appears `1 + \deg(v)` times), plus additional indices and lookups. But I do think it meets the desired consistency goals!
Flexibility and constraint are both useful. (Constraint is one of the reasons to use a finite state machine in the first place.) An overly constrained system can't function; an overly flexible system can't be relied upon. You'll answer your question the day you encounter a problem that NoSQL is a bad fit for, and after you learn SQL it won't feel so alien or complicated. Or maybe that day will never come and you'll never feel the need to learn SQL.
For a motivating example, consider a billing or accounting system. This domain is well understood, so we probably don't need the ability to rapidly evolve our schema. And if we violate the constraints of this system, there may be serious consequences, like spending money we don't have or billing for the service twice.
We could build this system with either a NoSQL or a SQL database, but SQL seems to me to be the natural choice.
I understand what you're saying, but in the accounting example we could also solve this problem with NoSQL, because the most important feature there is transaction support. Similarly, schema-on-write can be provided by a library.
To me it seems like NoSQL works better when there is less to normalize, which is the case with microservices. Those services struggle when they have to make a distributed transaction. That problem is easily solved in SQL (assuming it's not sharded to completely denormalize everything for performance).
Note that this normalization problem also shows up with schema-on-write: when people from multiple teams contribute to a schema, it becomes hard to maintain.
NoSQL will bring its own set of issues, right? A system that's too flexible, with no built-in validations, means they have to be handled somewhere else. Take the example in the post: a refund should not precede awaiting payment. If a new status gets added, in SQL it's easy to know where the migrations have to be run; in NoSQL we either write something custom or handle it each time the document is read.
I mean, you shouldn't be throwing shit at the DB and seeing what sticks, but why not have the absolute source of truth also be the validator? CHECK constraints are fairly cheap.
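For the refund example upthread, that's a one-liner (a sketch, assuming hypothetical status and paid_at columns):

ALTER TABLE orders ADD CONSTRAINT refund_requires_payment
    CHECK (status <> 'refunded' OR paid_at IS NOT NULL);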