
20 GB of JSON is correct; here’s the entire dump straight from the API up to last Monday:

  $ du -c ~/feepsearch-prod/datasource/hacker-news/data/dump/*.jsonl | tail -n1
  19428360        total
Not sure how your sqlite file is structured, but the two sizes being roughly the same sounds plausible to me: JSON carries a lot of overhead from redundant structure (keys repeated in every record) and ASCII-formatted values, while sqlite has its own overhead from indexes, btrees, ptrmaps, overflow pages, freelists, and so on.
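
For anyone checking the arithmetic, here is a rough sketch of the conversion. It assumes the `du` above is GNU du counting 1 KiB (1024-byte) blocks, which is its default when no block-size option or environment variable is set; a BSD/macOS `du` defaults to 512-byte blocks and would give half this figure. The function name is made up for illustration.

```python
# Assumption: `du -c` reported its total in 1 KiB (1024-byte) blocks,
# the GNU du default. With 512-byte blocks the result would be halved.
DU_TOTAL_BLOCKS = 19428360
BLOCK_SIZE_BYTES = 1024


def du_blocks_to_bytes(blocks: int, block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Convert a du block count into a byte count."""
    return blocks * block_size


total_bytes = du_blocks_to_bytes(DU_TOTAL_BLOCKS)
print(f"{total_bytes / 1e9:.1f} GB")      # decimal gigabytes
print(f"{total_bytes / 2**30:.1f} GiB")   # binary gibibytes
```

Under that assumption the total is about 19.9 GB (18.5 GiB), which matches the "20 GB" figure.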


Sqlite also doesn't have fixed column types; it stores every value with its own type tag. At least, that's according to what I've read on the topic.
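
A small sketch that seems to bear this out, using Python's built-in `sqlite3` module and SQLite's `typeof()` function. The table, column, and function names are made up for illustration, and this only applies to ordinary (non-STRICT) tables.

```python
import sqlite3


def stored_types(values):
    """Insert values into one INTEGER-declared column and report how each was stored."""
    conn = sqlite3.connect(":memory:")
    # The column is declared INTEGER, but in a non-STRICT table that is only
    # a type affinity (a preference), not a constraint.
    conn.execute("CREATE TABLE demo (v INTEGER)")
    conn.executemany("INSERT INTO demo (v) VALUES (?)", [(v,) for v in values])
    rows = conn.execute("SELECT typeof(v) FROM demo ORDER BY rowid").fetchall()
    conn.close()
    return [row[0] for row in rows]


types = stored_types([42, 3.5, "hello", b"\x00\x01", None])
print(types)
```

Each row keeps its own storage class (integer, real, text, blob, null) even though they all live in the same column, so the type travels with the value rather than with the column.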



