
It is still under the limit today with 362,107,148 repositories and 818,516,506 unique issues and pull requests:

https://play.clickhouse.com/play?user=play#U0VMRUNUIHVuaXEoc...
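For reference, a rough reconstruction of the query behind that link, assuming the public `github_events` table and its `repo_name`, `number`, and `event_type` columns (the exact query may differ):

    SELECT
        uniq(repo_name) AS repositories,
        uniq(repo_name, number) AS issues_and_prs
    FROM github_events
    WHERE event_type IN ('IssuesEvent', 'PullRequestEvent')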




I'm guessing this won't include issues & PRs from private repos, which could be substantial.


Elapsed: 12.618 sec, read 7.13 billion rows, 42.77 GB

This is too long; it seems the ORDER BY is not set up correctly for the table.
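One way to check is to look at the table definition and at how much of the primary index the query can actually use, e.g. on the public playground (assuming the table is called `github_events`):

    -- inspect the sorting key
    SHOW CREATE TABLE github_events;

    -- see which parts of the primary index the query can use
    EXPLAIN indexes = 1
    SELECT uniq(repo_name) FROM github_events;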


Also,

> `repo_name` LowCardinality(String),

This is not a low-cardinality column:

7,133,122,498 ≈ 7.1B

Don't use LowCardinality for such columns!


The LowCardinality data type does not require the whole set of values to have low cardinality. It benefits when the values have locally low cardinality. For example, if the number of unique values in `repo_name` is a hundred million, but every million consecutive values contain only ten thousand unique ones, it will give a great speed-up.
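For illustration, a minimal sketch (table and column names are made up): ClickHouse builds local dictionaries for a LowCardinality column rather than one global dictionary, so what matters is the number of distinct values within each chunk of data being written, not the global count.

    -- Hypothetical table: if rows arrive roughly grouped by repository,
    -- each written part sees only a modest set of distinct repo names,
    -- even though the global cardinality is in the hundreds of millions.
    CREATE TABLE events_local
    (
        created_at DateTime,
        repo_name LowCardinality(String),
        event_type String
    )
    ENGINE = MergeTree
    ORDER BY (repo_name, created_at);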


> LowCardinality data type does not require the whole set of values to have a low cardinality.

Don't mislead others. That's not true unless low_cardinality_max_dictionary_size is set to something other than its default of 8192.

It does not work well for a hundred million values.
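For what it's worth, that setting is applied when data is written; a sketch of raising it for an insert (table names are hypothetical, the value purely illustrative):

    -- Default is 8192; a larger dictionary keeps more distinct values
    -- encoded per part, at the cost of more memory during writes.
    SET low_cardinality_max_dictionary_size = 1048576;

    INSERT INTO events_local SELECT * FROM events_staging;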


This is an ad-hoc query. It does a full scan, processing slightly less than a billion rows per second on a single machine, and finishes in a reasonable time on the more than 7 billion GitHub events collected since 2015. While it does not make sense to optimize this table for my particular query, the fact that it works well for arbitrary queries is worth noting.


That query took a long time.



