
> In the example the author gives the query cost is very likely dominated by finding matching rows in A. Where there is no index, then we can expect a full scan of A (or the index of A.id) for every batch of B.

Why would you expect that there's no index? I have never seen a single database system where the most basic primary key A.id wasn't indexed. Instead, I would expect that you're correct below that the cost of the query is dominated by fetching the rows from disk and serializing them. This is a linear cost that increases with the number of rows returned, so fetching 50,000 rows should be about 50x as slow as fetching 1,000 rows (especially if you fetch them in some block-cache-amenable order, such as increasing ID order, so that most reads hit sequential places on disk rather than random blocks).
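
To make the "fetch matched rows in increasing ID order" point concrete, here's a minimal sketch using sqlite3 (table and column names A, B, payload, a_id are assumed for illustration, not taken from the thread):

```python
import sqlite3

# Hypothetical schema: A(id PRIMARY KEY, payload), B(a_id).
# The claim above: fetch cost scales roughly linearly with row count,
# and ordering by id keeps page reads mostly sequential.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE B (a_id INTEGER);
""")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO B VALUES (?)",
                 [(i,) for i in range(1, 101, 2)])  # 50 odd ids

# ORDER BY A.id means the scan touches pages in increasing order
# rather than seeking to random blocks.
rows = conn.execute("""
    SELECT A.id, A.payload
    FROM A JOIN B ON A.id = B.a_id
    ORDER BY A.id
""").fetchall()
print(len(rows))  # 50
```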

> In addition, and probably more to the point, the round trip database costs (serialisation, parsing, planning, scheduling, network comms) are going to dominate the actual query costs for something like this (unless A is exceptionally large).

Aside from a small fixed overhead, serialization, parsing, and network comms all increase linearly with the amount of data returned: 50,000 rows will cost about 50x the serialization and network time of 1,000 rows.

> Furthermore, the memory cost to the DB of the serialisation and parsing is likely to be much larger than just storing all those ids in their native format - and there would be no client memory footprint in a join. For the final result set the client can reduce their memory footprint by using a streaming result which every BigData DB supports, and most others too

Sure, I can absolutely agree that using a streaming result set would be the best of all possible worlds here. However, it does require you to keep a client connection open for 50x longer than batching would, which on many databases (e.g. Postgres) would lead to more memory usage and CPU contention than batching the work through a background job queueing system.

This comes down to what fraction of your total pipeline is spent in the database in question versus data processing or other databases. If only 20% of your job's runtime is fetching the rows from this database, then it's a bad idea to monopolize that DB's memory for the much larger amount of time it takes you to process the entire result set, when instead you could be yielding that memory back to the system for other transactions to use. But if 80%+ of your time is spent in the database, then the small amount of resource other transactions could reclaim wouldn't be worth the fixed overhead of re-planning, re-executing, re-fetching the index from cache, etc. As you may have guessed, my experience here is rooted in OLTP workloads on Postgres, and I'm sure there are plenty of differences with BigQuery's architecture.
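
For what it's worth, here's a minimal sketch of the streaming pattern, using sqlite3 as a stand-in (on Postgres you'd use a server-side cursor, e.g. psycopg2's named cursors, so the server ships rows incrementally instead of materializing them all). The trade-off described above: the one cursor and connection stay open for the whole processing run, pinning server resources that batching would release between chunks.

```python
import sqlite3

# Toy table standing in for the real result set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, str(i)) for i in range(10_000)])

# One long-lived cursor; client memory stays bounded per chunk, but the
# connection is held open until the last chunk is processed.
cur = conn.execute("SELECT id, v FROM a ORDER BY id")
seen = 0
while True:
    chunk = cur.fetchmany(1000)
    if not chunk:
        break
    seen += len(chunk)  # real processing of the chunk would go here
print(seen)  # 10000
```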



On the off chance you see this:

> Why would you expect that there's no index?

I don't. Whilst I didn't write it particularly eloquently, I did say it would be an index scan if there was one. And like the full table scan, this is a 50 vs 1 cost (unless the queried ids are well sorted, at which point you'd maybe get a 5 vs 1 cost at best).

> I have never seen a single database system where the most basic primary key A.id wasn't indexed.

Nobody has said it was a primary key. In fact it very likely isn't. All we really know is that approx 50k rows were selected from B; we do not know how many are matched in A.

As they are using BigQuery and the queries are taking such a long time, it would be reasonable to assume A is some large dataset clustered around some other value (e.g. timestamp). But that itself would be an assumption.

> [streaming] does require you to keep a client connection open for 50x longer than batching would,

It does not. It will be less time.

---

Looking at pg.

I'm trying really hard to see your point. As far as I can tell, you're bothered by the working-set memory of the join exceeding a limit and causing contention - this is the only case where the streamed join is worse than batching. With an index join, that would require a very large table.

As for CPU contention - it's a non-issue.

There may be a point related to time-to-execute with respect to lock contention.

Regardless, if either lock or index memory contention are problems for you then you will still want to `JOIN` - just against a subquery/cte with limit and offset.
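
A sketch of that pattern, again with sqlite3 and assumed table names A/B: each batch is still a single server-side `JOIN`, but against a limited subquery of B, so the client never round-trips the ids.

```python
import sqlite3

# Hypothetical schema: A(id PRIMARY KEY, payload), B(a_id).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE B (a_id INTEGER);
""")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(i, f"p{i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO B VALUES (?)",
                 [(i,) for i in range(1, 1001)])

# JOIN against a limited/offset subquery: the batching happens inside
# the database, not by shipping ids back and forth.
batch_sql = """
    SELECT A.id, A.payload
    FROM (SELECT a_id FROM B ORDER BY a_id LIMIT ? OFFSET ?) AS b
    JOIN A ON A.id = b.a_id
    ORDER BY A.id
"""
out = []
offset, limit = 0, 250
while True:
    rows = conn.execute(batch_sql, (limit, offset)).fetchall()
    if not rows:
        break
    out.extend(rows)
    offset += limit
print(len(out))  # 1000
```

(On large tables, keyset pagination - `WHERE a_id > ?` - avoids the re-scan cost that grows with OFFSET.)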

Roundtripping is not the answer!



