
It is still under the limit today with 362,107,148 repositories and 818,516,506 unique issues and pull requests:

https://play.clickhouse.com/play?user=play#U0VMRUNUIHVuaXEoc...
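For reference, a rough reconstruction of the query behind that link, assuming the public `github_events` table and its `repo_name`, `number`, and `event_type` columns (the exact query may differ):

    SELECT
        uniq(repo_name) AS repositories,
        uniq(repo_name, number) AS issues_and_prs
    FROM github_events
    WHERE event_type IN ('IssuesEvent', 'PullRequestEvent')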




I'm guessing this won't include issues & PRs from private repos, which could be substantial.


Elapsed: 12.618 sec, read 7.13 billion rows, 42.77 GB

This is too long; it seems the ORDER BY is not set up correctly for the table.
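One way to check is to look at the table definition and at how much of the primary index the query can actually use, e.g. on the public playground (assuming the table is called `github_events`):

    -- inspect the sorting key
    SHOW CREATE TABLE github_events;

    -- see which parts of the primary index the query can use
    EXPLAIN indexes = 1
    SELECT uniq(repo_name) FROM github_events;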


Also,

> `repo_name` LowCardinality(String),

This is not a low-cardinality column:

7,133,122,498 ≈ 7.1B

Don't use LowCardinality for such columns!


The LowCardinality data type does not require the whole set of values to have low cardinality. It benefits when the values have locally low cardinality. For example, if the number of unique values in `repo_name` is a hundred million, but every million consecutive values contain only ten thousand unique ones, it will give a great speed-up.
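For illustration, a minimal sketch (table and column names are made up): ClickHouse builds local dictionaries for a LowCardinality column rather than one global dictionary, so what matters is the number of distinct values within each chunk of data being written, not the global count.

    -- Hypothetical table: if rows arrive roughly grouped by repository,
    -- each written part sees only a modest set of distinct repo names,
    -- even though the global cardinality is in the hundreds of millions.
    CREATE TABLE events_local
    (
        created_at DateTime,
        repo_name LowCardinality(String),
        event_type String
    )
    ENGINE = MergeTree
    ORDER BY (repo_name, created_at);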


> LowCardinality data type does not require the whole set of values to have a low cardinality.

Don't mislead others. That's not true unless low_cardinality_max_dictionary_size is set to something other than its default of 8192.

It does not work well for a hundred million values.
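For what it's worth, that setting is applied when data is written; a sketch of raising it for an insert (table names are hypothetical, the value purely illustrative):

    -- Default is 8192; a larger dictionary keeps more distinct values
    -- encoded per part, at the cost of more memory during writes.
    SET low_cardinality_max_dictionary_size = 1048576;

    INSERT INTO events_local SELECT * FROM events_staging;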


This is an ad-hoc query. It does a full scan, processing slightly less than a billion rows per second on a single machine, and finishes in a reasonable time on the more than 7 billion GitHub events collected since 2015. While it does not make sense to optimize this table for my particular query, the fact that it works well for arbitrary queries is worth noting.


That query took a long time.



