"For the OR clause query, the DBMS sequentially compares each condition's value one by one. In the given example, it performs three comparisons for each row to determine if (price = 1), (price = 2), and then (price = 3). This evaluation approach means that for N predicates, the complexity of the filtering operation per row is O(N).
On the other hand, with the IN clause, PostgreSQL builds a temporary hash table populated with the elements in the query’s IN clause (ExecEvalHashedScalarArrayOp). Then as the DBMS scans each tuple, it probes this hash table to see whether the tuple's attribute matches with any entry. This hash-based evaluation has a more efficient complexity of O(1) since the DBMS only needs to perform one lookup in the hash table per tuple.
Hence, as the number of predicates increases in a query, the OR clause incurs significant overhead because the DBMS evaluates predicates for each row individually, in contrast to the more efficient IN clause."
Why is there no optimization in place for this? Converting a=x or a=y or a=z to a in(x,y,z) should be trivial and the db should have heuristics to calculate the expected query cost to decide when to apply this transformation.
Yes, and Postgres remains staunchly opposed to planner hints because the planner knows better! It always computes the optimal query plan!
The "optimal query plan" changes at the drop of a hat, as you can see in this case. Absolutely trivial syntax changes result in a completely different query, sometimes turning a sequential scan into an index-only scan or vice versa. So, 100x difference in query time, it doesn't just do that for small tables.
At the scale of three comparisons, performance is equivalent enough that there's no gain.
At the scale of 5,000 comparisons, the language's structure itself heavily disincentivizes writing it as an OR query, unless you're using a code generator that doesn't care about language structure.
My wild guess is that it's a rare enough corner case that it wasn't worth burning the resources on yet.
My gut says it's a resource issue. Oracle has the time and money to optimize a gajillion scenarios that increase performance without making developers think things through. Which makes the product easier to use and seem faster. It's very simple to code this one, but then there are 10,000 other cases you need to code too to make it cover a large percentage of potential optimizations.
Indeed, Postgres may be wildly popular these days thanks to being $free, but it's far from being the most intelligent relational database engine on offer.
It's not that easy to analyze a binary logic predicate to extract a set of possible values (especially given the three-value logic that SQL uses, where `a=x or a≠x` is not true). It's easy in the easy cases of course, but doing the analysis for a more complex expression quickly gets ugly. And I'm not sure it's easy to tell ahead of time if an expression will be easy or hard to analyze. They could hard-code a few simple patterns, but at that point it may get very unpredictable whether the optimization will be applied or not, defeating the purpose.
"For the OR clause query, the DBMS sequentially compares each condition's value one by one. In the given example, it performs three comparisons for each row to determine if (price = 1), (price = 2), and then (price = 3). This evaluation approach means that for N predicates, the complexity of the filtering operation per row is O(N).
On the other hand, with the IN clause, PostgreSQL builds a temporary hash table populated with the elements in the query’s IN clause (ExecEvalHashedScalarArrayOp). Then as the DBMS scans each tuple, it probes this hash table to see whether the tuple's attribute matches with any entry. This hash-based evaluation has a more efficient complexity of O(1) since the DBMS only needs to perform one lookup in the hash table per tuple.
Hence, as the number of predicates increases in a query, the OR clause incurs significant overhead because the DBMS evaluates predicates for each row individually, in contrast to the more efficient IN clause."