TL;DR: Compile SQLite to JS with emscripten, implement a virtual fs in JS to str...

pudmaidai · on May 2, 2021

I can't figure out exactly how it knows which chunk to download. Does it always download the whole index first? Or does it include it in the built JS file itself?

cookguyruffles · on May 2, 2021

Both the index and table data are btrees. These are trees - the root node sits in some known location (offset) in the file, referenced by the file header and metadata. As SQLite traverses the tree, it encounters new descendents it would like to visit, presumably identified by their byte offset in the file, which is all needed for this VFS magic to issue a suitable range request.

- SQlite opens the file and reads 4kb worth of header -> range request for byte 0-4096

- headers/metadata refers to index table with root node at 8192kb

- user issues SELECT * from index WHERE name = 'foo'

- SQLite reads root node from the file (range request for 8192kb..)

- Root node indicates left branch covers 'foo'. Left branch node at address 12345kb

- Fetch left branch (range request for 12345kb)

- New node contains an index entry for 'foo', row 55 of data page at 919191kb

- SQLite reads data page (range request for 91919191kb..)

etc etc etc

jaza · on May 3, 2021

Thanks, I too was struggling to understand how it's able to do such efficient targeted range requests, you explained it nicely.

formerly_proven · on May 2, 2021

SQLite has runtime-pluggable VFS support, i.e. you give it a struct with functions for opening a file, reading some bytes, writing some bytes, synchronizing file contents, closing a file. This project provides such a VFS module, that, because it actually runs in the browser, performs HTTP requests to read data. Emscripten provides a way to run a mix of C/C++ code in the same environment as some JavaScript code inside the browser. The reason SQLite has this pluggable VFS support is to properly support embedded systems, different locking APIs, and things like database encryption.

https://www.sqlite.org/vfs.html

mvanaltvorst · on May 2, 2021

That's part of SQLite. It has been optimised to reduce disk reads, because those can be slow on spinning hard drives. Coincidentally, this translates well into an optimised algorithm that minimises the amount of HTTP range requests to make.

fulafel · on May 2, 2021

B-Tree indexes are designed to work like this, to require a low number of IO operations. The index contains pointers to other places in the index.

pudmaidai · on May 2, 2021

So the answer is “yes, it has to download the index first”? None of these comments answer my question.

phiresky · on May 2, 2021

The B-Tree is a tree that in this case is perfectly balanced. So if you do a query with an index in a database it will fetch an logarithmic amount of data from the index and then a constant amount of data from the table.

For the example the wdi_data table is 300MB and an index on it is 100MB in size. This index has a tree depth of 4 - which means SQLite has to read exactly 4 pages (4KiB) to get to the bottom of it and find the exact position of the actual row data.

you can check the depth of the b-trees with `sqlite3_analyzer`.

rakoo · on May 2, 2021

Everything in SQLite is stored in B-Trees. Data or indexes. So you don't need to download the whole index first; you only need to download the necessary pages of the trees to access data, whether it's part of an index or actual data

detaro · on May 2, 2021

At least the information describing what tables and indices there are and where to find them - and then it gets what it needs once a query is run. Just like sqlite would if running from a local file on disk.

tehbeard · on May 2, 2021

It just needs to download the pages (x KB chunks) to traverse from the header to the particular index and table schema, a fraction of the whole data.

fulafel · on May 3, 2021

It has to download some parts ("pages") of the index as the query execution proceeds, and some header/schema description pieces of it first before execution starts.