I can't figure out exactly how it knows which chunk to download. Does it always download the whole index first? Or does it include it in the built JS file itself?
Both the index and table data are btrees. These are trees - the root node sits in some known location (offset) in the file, referenced by the file header and metadata. As SQLite traverses the tree, it encounters new descendents it would like to visit, presumably identified by their byte offset in the file, which is all needed for this VFS magic to issue a suitable range request.
- SQlite opens the file and reads 4kb worth of header -> range request for byte 0-4096
- headers/metadata refers to index table with root node at 8192kb
- user issues SELECT * from index WHERE name = 'foo'
- SQLite reads root node from the file (range request for 8192kb..)
- Root node indicates left branch covers 'foo'. Left branch node at address 12345kb
- Fetch left branch (range request for 12345kb)
- New node contains an index entry for 'foo', row 55 of data page at 919191kb
- SQLite reads data page (range request for 91919191kb..)
SQLite has runtime-pluggable VFS support, i.e. you give it a struct with functions for opening a file, reading some bytes, writing some bytes, synchronizing file contents, closing a file. This project provides such a VFS module, that, because it actually runs in the browser, performs HTTP requests to read data. Emscripten provides a way to run a mix of C/C++ code in the same environment as some JavaScript code inside the browser. The reason SQLite has this pluggable VFS support is to properly support embedded systems, different locking APIs, and things like database encryption.
That's part of SQLite. It has been optimised to reduce disk reads, because those can be slow on spinning hard drives. Coincidentally, this translates well into an optimised algorithm that minimises the amount of HTTP range requests to make.
The B-Tree is a tree that in this case is perfectly balanced. So if you do a query with an index in a database it will fetch an logarithmic amount of data from the index and then a constant amount of data from the table.
For the example the wdi_data table is 300MB and an index on it is 100MB in size. This index has a tree depth of 4 - which means SQLite has to read exactly 4 pages (4KiB) to get to the bottom of it and find the exact position of the actual row data.
you can check the depth of the b-trees with `sqlite3_analyzer`.
Everything in SQLite is stored in B-Trees. Data or indexes. So you don't need to download the whole index first; you only need to download the necessary pages of the trees to access data, whether it's part of an index or actual data
At least the information describing what tables and indices there are and where to find them - and then it gets what it needs once a query is run. Just like sqlite would if running from a local file on disk.
It has to download some parts ("pages") of the index as the query execution proceeds, and some header/schema description pieces of it first before execution starts.
If queries make use of indices, only a fraction of the database needs to be downloaded.
Also, you can use SQLite to query the DOM.