
The page is the result of this exchange on Twitter:

https://twitter.com/gorilla0513/status/1784756577465200740

I was surprised to receive a reply from you, the author. Thank you :) Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?

https://twitter.com/DRichardHipp/status/1784783482788413491

It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.




I am amazed that the author (D. Richard Hipp) made the effort to find and respond to a tweet that was neither (1) directed/"@tted" at him nor (2) written in his native language of English (the original tweet is in Japanese [1]).

[1] https://twitter.com/gorilla0513/status/1784623660193677762


He might have a google alert for sqlite.org


Side note, but I'm amazed that anyone who is not a journalist or a politician still actively uses X/Twitter. Everyone I used to follow has stopped.


Simple, I'm on Mastodon, Substack, Threads and Bluesky.

And they are all barely breathing.

The community density is low (because of the split), the discovery sucks (because of decentralization and no algorithm to suggest content), the culture is homogeneous (I encounter people mostly from one political side), and the published content is of limited quality.

You can't start new communities from old users. Communities are built by the young, they are the ones with the creativity, motivation and free time to do it.

That's why Twitter, built mostly by young people 20 years ago, still has that inertia and is more interesting and active.

And that's why TikTok and Roblox are full of life. I'm sure the kids are also elsewhere, where we are not looking, creating the communities of tomorrow.


I’d argue with that bit about creativity and youth.

In sports, for instance, if you are talented and 15 there are pipelines into established and pro sports that make sense to follow. If you are 30 and missed your chance for that you can still be a pioneer in an emerging “extreme” sport.

A lot of young people seem to think creativity is 99% asking for permission and 1% "just do it"; it takes some experience before people realize it is really the other way around.


I've never seen more people use it, as in being actually active on it, and it pops up everywhere due to the community-note memes. But yeah, I really need to get around to creating a Mastodon account, since some very good posters have moved there too.


I refuse to use it, and I can't even view tweet "threads" properly, nor replies or any kind of coherent timeline... it's unusable.


I'm amazed people like yourself care so much about it to write a post like this. Doesn't your brain have something better to think about beyond artificial outrage?


"I'm surprised people use $WHATEVER because nobody uses it" usually means "I don't like $WHATEVER and wish you would stop using it."

I see this frequently directed towards reddit and twitter.




I stopped using Twitter because "Space Man" banned most of one political side. Maybe if you didn't notice that, you're in a bubble?


I've not been following this stuff. Whom did Musk ban?


Like most large social media sites, pre-Musk Twitter had some combination of algorithms and bureaucracy that acted as a chaos monkey, shadowbanning and suspending accounts apparently at random, often with an inability to comprehend satire, or under incoherent or undisclosed pretexts. The site's users and staff were predominantly left-leaning, so right-leaning accounts were disproportionately the victims of these false positives, because of human bias in which posts got flagged and how they were subsequently moderated.

Musk bought the site under a claim of doing something about the bias after Twitter suspended the account of a right-leaning satire site. Many of the existing users were displeased by this, because they liked having a site whose moderation system was biased in their favor, so some of them left. Meanwhile new right-leaning accounts were created as you might expect. This affected the ideological balance of the site, so even though it's still somewhat left-leaning it's less than before. And now when the drunk chaos monkey suspends an account on the left, people are finding new satisfaction in their ability to blame Musk in particular.


Musk has personally bragged about banning a lot of left-leaning accounts and unbanning right-leaning. What is your evidence that the site still leans left?


Bans in either direction are only a small percentage of accounts. People with large followings have a large disincentive to leave (have to start over), and new users haven't had long to build followings, so most large accounts are the same ones they were before he bought the company.

Musk is not actually a right-wing figure -- he smokes pot, makes electric cars, isn't religious, etc. But he isn't a left-wing figure either, which confuses people who can't contemplate that the same person could simultaneously e.g. support gay marriage and think peculiar pronouns are silly.

The result is that he's more inclined to ban accounts he doesn't like, but what he doesn't like isn't inherently associated with any particular party. And if you look at the "left-leaning" accounts he's suspended, it's the likes of Aaron Rupar and Taylor Lorenz, who... well, here it is:

https://www.urbandictionary.com/define.php?term=rupar

They're the sort of accounts that get themselves suspended once there is no longer anything protecting them from getting themselves suspended.


There are three approaches:

1. interpreted code

2. compiled then interpreted bytecode

3. compiled machine code

The further up, the simpler.

The further down, the faster.
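
For illustration, here is a minimal sketch of approach 2 in C - a switch-dispatched bytecode interpreter for arithmetic (the opcodes and encoding are invented for this example, not SQLite's):

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    /* Each instruction is an opcode optionally followed by an operand. */
    static void run(const int *code) {
        int stack[64], sp = 0;
        for (int pc = 0;;) {
            switch (code[pc++]) {      /* dispatch: one branch per instruction */
                case OP_PUSH:  stack[sp++] = code[pc++];       break;
                case OP_ADD:   sp--; stack[sp-1] += stack[sp]; break;
                case OP_MUL:   sp--; stack[sp-1] *= stack[sp]; break;
                case OP_PRINT: printf("%d\n", stack[sp-1]);    break;
                case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* (2 + 3) * 4, compiled to bytecode ahead of time */
        const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                             OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
        run(prog);
        return 0;
    }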


Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.

So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost of 3% or less. That's not very much, considering the size, complexity, and portability costs involved.

A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
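
A rough way to picture that ratio (hypothetical opcode bodies, not SQLite's actual code): one dispatch wrapped around a single add is mostly overhead, while one dispatch wrapped around a B-Tree-style search is almost entirely useful work.

    #include <stdio.h>
    #include <stddef.h>

    #define NKEYS 1000000
    static long keys[NKEYS];   /* stand-in for an index B-Tree */

    /* Low-level opcode body: the work is one add, so the few cycles of
       dispatch around it are a large relative overhead. */
    static long op_add(long a, long b) { return a + b; }

    /* High-level opcode body: one dispatch buys an O(log n) descent in
       compiled C, so the same dispatch cost becomes noise. */
    static size_t op_seek(long key) {
        size_t lo = 0, hi = NKEYS;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (keys[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    int main(void) {
        for (size_t i = 0; i < NKEYS; i++) keys[i] = (long)(2 * i);
        printf("%ld %zu\n", op_add(2, 3), op_seek(123456));
        return 0;
    }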


Part of the benefit of compiling bytecode (or anything) is specializing code to the context (types, values, etc) in which it appears. While I don't doubt your analysis, it could be the case that compiled C code in question is full of branches that can be folded away when specialized to the context of the query, such as the structure of the rows, the type and values of columns, etc.

Basically all of what you are saying about high-level bytecodes applies to dynamic languages, too. But they benefit highly from specializing each bytecode given static and dynamic context, and shortcutting dataflow through local variables.
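
For a concrete (hypothetical) sketch of what such specialization buys: a generic comparison opcode must re-check the operand types on every execution, while a version specialized for a query where both operands are known to be integers folds those branches away entirely.

    #include <stdio.h>
    #include <string.h>

    typedef enum { T_INT, T_TEXT, T_NULL } Type;
    typedef struct { Type type; long i; const char *s; } Value;

    /* placeholder for NULLs, affinity conversions, collations, ... */
    static int cmp_slow_path(const Value *a, const Value *b) {
        (void)a; (void)b; return 0;
    }

    /* Generic opcode body: type checks on every single execution. */
    static int cmp_generic(const Value *a, const Value *b) {
        if (a->type == T_INT && b->type == T_INT)
            return (a->i > b->i) - (a->i < b->i);
        if (a->type == T_TEXT && b->type == T_TEXT)
            return strcmp(a->s, b->s);
        return cmp_slow_path(a, b);
    }

    /* Specialized for a context where both sides are known integers
       (say, an INTEGER column compared against a literal): the type
       branches fold away and the body can be inlined at the call site. */
    static int cmp_int_int(const Value *a, const Value *b) {
        return (a->i > b->i) - (a->i < b->i);
    }

    int main(void) {
        Value a = { T_INT, 7, 0 }, b = { T_INT, 9, 0 };
        printf("%d %d\n", cmp_generic(&a, &b), cmp_int_int(&a, &b));
        return 0;
    }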


There's usually a cost to shuttling data between bytecodes too. When two are fused together the second can lift the data from wherever the first wanted to leave it, as opposed to routing through a fixed location. Might be what you mean by shortcutting dataflow?

Also doing control flow in bytecode is usually slower than doing it in the native code.

I wonder if the context in which the instructions occur is sufficiently finite in sqlite for ahead-of-time specialisation of the bytecode to be better. That is, the program you're operating on isn't known until JIT time, but the bytecode implementations are. SQL should correspond to an unusually specific set of operations relative to a general purpose language implementation.

The compiler will notice some values are constant when working with the bytecode. It can know ahead of time which arguments correspond to folding branches within the bytecode instructions and specialise correspondingly. If that works, you've emitted a sequence of calls into opcodes which are less branchy than they would otherwise be, at which point the opcode implementations start to look like basic blocks and a template JIT to machine code beckons.
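
A sketch of the fusion idea with invented opcodes: the unfused pair routes the column value through a fixed register slot, while the fused opcode keeps it in a C local that the compiler can hold in a machine register.

    #include <stdio.h>

    typedef struct { const long *row; } Cursor;  /* stand-in for a B-Tree cursor */
    static long read_column(const Cursor *c, int col) { return c->row[col]; }

    /* Unfused pair: OP_Column parks the value in a fixed register slot,
       and OP_Ge reloads it from there on the next dispatch. */
    static void op_column(long *reg, int dst, const Cursor *c, int col) {
        reg[dst] = read_column(c, col);
    }
    static int op_ge(const long *reg, int a, int b) {
        return reg[a] >= reg[b];
    }

    /* Fused OP_ColumnGe: the intermediate stays in a C local - one
       dispatch, no round trip through the register file. */
    static int op_column_ge(const long *reg, int a, const Cursor *c, int col) {
        long v = read_column(c, col);
        return v >= reg[a];
    }

    int main(void) {
        long row[] = { 10, 20, 42 }, reg[4] = { 0, 41, 0, 0 };
        Cursor cur = { row };
        op_column(reg, 2, &cur, 2);
        printf("%d %d\n", op_ge(reg, 2, 1), op_column_ge(reg, 1, &cur, 2));
        return 0;
    }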


> Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and *decoding records*

Emphasis added. This is because of SQLite3's varint encoding method for numbers. Performance-wise it was probably a mistake, though it's a form of compression, which might have paid off in terms of space.

(I seem to remember seeing something, possibly by you, about this before.)

I wonder if it would be possible to replace the varint encoding... Yes, it'd be a compatibility break in that older SQLite3s couldn't open newer DBs.
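
For reference, a sketch of a decoder for the varint format described in the SQLite file-format documentation (1 to 9 bytes, big-endian base-128, with the 9th byte contributing all 8 bits); the byte-at-a-time, data-dependent branching is exactly the cost being discussed here. The real sqlite3GetVarint is heavily optimized for the common short cases.

    #include <stdint.h>

    /* Decode one SQLite-style varint starting at p; stores the value in
       *out and returns the number of bytes consumed (1..9). */
    static int get_varint(const uint8_t *p, uint64_t *out) {
        uint64_t v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 7) | (p[i] & 0x7f);
            if ((p[i] & 0x80) == 0) { *out = v; return i + 1; }
        }
        v = (v << 8) | p[8];    /* 9th byte: all 8 bits are data */
        *out = v;
        return 9;
    }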


Looks like SQLite bytecode is similar to the IBM RPG language from the "IBM i" platform, which is used directly, without translation from SQL :)

Edit: PRG->RPG.


I'm curious. Would you have a pointer to the documentation of PRG language?


Oooops, it is IBM RPG, not PRG! My bad!

There are some links in Wikipedia.

I never used it; I only read a big thread about SQL vs RPG on a Russian-speaking forum, with a ton of examples from a person who works with the IBM i platform at some big bank. Basic operations look like SQLite bytecodes: open a table, move the cursor after the record with key value "X", get some fields, update some field, plus loops, ifs, basic arithmetic, etc.


For those that aren’t familiar, IBM i is ye olde AS/400.

To be fair, RPG has been continuously updated and the version I’ve seen has Java integration.

It’s somewhat amusing that a single developer using Python can run circles around entire teams of RPG programmers. Technology marches on. :)


Well, having programmed RPG professionally, there is no question that a python programmer would clock any RPG programmer. RPG is a terrible way to program.


That kind of manipulation isn't limited to RPG. I believe dBase and descendants also work similarly with manual cursor manipulation.


Ah. I did RPG on a system 33, a precursor to the AS 400.


32, 34 or 36? Or a 38, which is very different and the real system that led to the AS/400 and IBM i? … an S/33?


Sorry, good point. It was a 34.


It is possible to construct a worst case scenario for bytecode execution, for example very complex expressions in the WHERE and/or SELECT clauses that compute values, and a query plan that performs a full table scan over a table with say 100 million rows that is cached in RAM (or maybe use generate_series, whatever works best).

Computing the expressions should dominate execution time, right?

Then, to compare against the best possible case, we can write a custom C program that uses sqlite internals to perform the same task (full scan of the table, extract values from row) and does not use the bytecode VM and computes the complex expressions in regular C code (e.g. a function that accepts floats and returns a float or whatever).

Then comparing the two implementations will tell us how much faster sqlite can be if it had a "perfect" JIT.
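
A sketch of the bytecode half of that experiment against the public C API (it assumes the generate_series table-valued function is available, as it is in the SQLite shell build; the row count and expression are placeholders):

    #include <stdio.h>
    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db;
        sqlite3_stmt *stmt;
        if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;

        /* Expression-heavy scan: time should be dominated by evaluating
           the arithmetic once per row, 100 million times. */
        const char *sql =
            "SELECT sum(value*value + value/3.0 - 7) "
            "FROM generate_series(1, 100000000)";
        if (sqlite3_prepare_v2(db, sql, -1, &stmt, 0) != SQLITE_OK) {
            fprintf(stderr, "%s\n", sqlite3_errmsg(db));
            return 1;
        }
        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("%f\n", sqlite3_column_double(stmt, 0));
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }

The "perfect JIT" baseline would then hand-code the same scan and expression in C against the internals and compare wall-clock times.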


> A key point to keep in mind is that SQLite bytecodes tend to be very high-level

That's right. "Bytecode" is a spectrum. SQLite's bytecode is higher-level than e.g. WASM, JVM, Python, etc. (Notably, because the original source code is higher-level.)


A while back was musing if it was possible to come up with something resembling the instruction set for a CPU for an abstract relational-engine. Is this basically what SQLite is doing with bytecodes?


I think yes, essentially. Bytecode running on a software interpreter is the same thing as machine code running on a silicon interpreter. Probably slower, probably a different ISA, but very much the same idea.

The OP suggests the SQLite bytecode also has arithmetic & control flow in it, which you might want to exclude in other settings. There's a parallel with CISC vs RISC instruction sets here, and with calls into a compiler runtime vs expanding instructions into larger numbers of more primitive ones in place.

@rkrzr there's a circular definition in your "no" - if "CPU" means "an interpreter that executes instructions simpler than SQL" then this is indeed not an instruction set. If "CPU" means "interpreter of a given ISA" then it could be. The virtual machine in the sqlite codebase is the definition of the CPU for this instruction set. Unless you want to define CPU as exclusive of software implementations of interpreters.


No. The SQLite bytecode is much higher-level than the instruction set of a CPU.


> Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements

> compiling all the way down to machine code might provide a performance boost of 3% or less

This logic doesn't seem sound? That the application now spends 3% on bytecode dispatch doesn't tell us anything about how long it would instead spend on e.g. interpreting SQL.


He’s saying in the best case scenario that 3% would go to 0. Therefore the reality would probably be even less.
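
(A quick Amdahl's-law check, assuming dispatch really is 3% of runtime: eliminating it entirely gives a speedup factor of at most 1/(1 - 0.03) ≈ 1.031, i.e. about 3.1% - and a JIT would not reduce that slice fully to zero, so the realized gain would be smaller.)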


Unfortunately you can’t do performance analysis this way, but I think the overall point that it’s a small fraction probably still stands, as you’d expect the I/O work to dominate.

The reason you can’t do the analysis the way that you explicitly stated (which, to be fair, is what was implied) is that when you lower the code to machine code you typically get rid of branches in the execution path. Since branches slow down execution by a disproportionate amount relative to how long they themselves take, it’s easy to get a >3% boost by getting rid of code paths when it seems like there’s only 3% of room.

What the author also didn’t mention is that they’ve gone on many optimization hunts, eking out less than a percent here or there to get a cumulative boost, so 3% isn’t anything to sneeze at.

That being said, the broader point is valid: the maintenance burden isn’t worth it for the performance profile that SQLite targets, not to mention that JITs open up an attack vector for security exploits. So the cost vs benefit is squarely on the interpreted side for now.

Edit: Forgot to mention that performance tuning a JIT to work well is also hard - for small queries you’ll spend more time compiling the bytecode than you would just executing it. That’s why all the big JS engines take a tiered approach, where each tier optimizes a bit more.


4. lots of small building blocks of static machine code precompiled/shipped with DB software binary, later iterated & looped through based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD processing In-Memory Database option (it's not like LLVM as it doesn't compile/link/rearrange the existing binary building blocks, just jumps & loops through the existing ones in the required order)

Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.


Isn't what you're describing just an interpreter, bytecode or otherwise? An interpreter walks through a data structure in order to determine which bits of precompiled code to execute in which order and with which arguments. In a bytecode interpreter the data structure is a sequential array, in a plain interpreter it's something more complicated like an AST.

Is this just the same as "interpret the query plan" or am I missing something?


It interprets where it needs to jump and then jumps to the precompiled binary machine-code locations, which are executed natively on the CPU with no interpretation going on (I guess it's conceptually something like instruction traces inside the CPU trace cache, but done at a higher level). These precompiled building blocks are shipped in separate libraries, one library for each CPU architecture on a platform (SSE4.2, AVX, AVX2, AVX-512 on AMD64, and other SIMD architectures on the other platforms that Oracle supports). Oracle dynamically loads the correct library matching the CPU's capability during startup and starts jumping there as needed.

So this is not bytecode interpretation, it's native machine code executed directly on CPU whenever the SQL exec engine decides to jump there. This is different from, say Apache Impala that compiles new LLVM machine code for each query execution for the query's scanning tight loops.

Edit: I guess the reason I see this as not just regular bytecode interpretation (I don't actually know that much about it in general) is that these building blocks can include various loops (and can do their thing on entire vectors of data), so it looks like they can push quite a bit of complexity into the machine-code sections before returning back to the normal interpreted AST/opcode land.
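
A toy model of that execution style (entirely invented, not Oracle's actual design): the optimizer lines up pointers to precompiled vectorized kernels, and execution just calls through the list - interpretation decides only the order, while all the data touching happens inside native loops.

    #include <stdio.h>

    #define N 1024

    typedef struct {
        double col[N];      /* one cached column vector */
        double acc;         /* running aggregate */
    } Ctx;

    typedef void (*Kernel)(Ctx *);

    /* Precompiled building blocks; a real engine would ship SIMD
       variants per CPU and pick a set at startup. */
    static void k_scale(Ctx *c) { for (int i = 0; i < N; i++) c->col[i] *= 0.5; }
    static void k_sum(Ctx *c)   { for (int i = 0; i < N; i++) c->acc += c->col[i]; }

    int main(void) {
        Ctx c = { .acc = 0 };
        for (int i = 0; i < N; i++) c.col[i] = i;

        /* "Plan": an ordered list of kernels chosen by the optimizer. */
        Kernel plan[] = { k_scale, k_sum };
        for (unsigned i = 0; i < sizeof plan / sizeof plan[0]; i++)
            plan[i](&c);

        printf("%f\n", c.acc);
        return 0;
    }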


> lots of small building blocks of static machine code precompiled/shipped with DB software binary

You might call those interpreter operations.


That’s a really cool idea! Are there any writeups or articles about this in more detail?


It's called a template JIT. You get to remove the interpreter control flow overhead but tend to end up with a lot of register shuffling at the boundaries. Simpler than doing things properly, usually faster than a bytecode interpreter.


Sounds like 'copy & patch compilation' is a close cousin.


> 2. compiled then interpreted bytecode

You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.

This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.


There is also the option of native code generation from bytecode at install time, as opposed to at runtime.


Aka missing out on all the delicious profiling information that approaches with later translation enjoy. There's no simple best answer.


I'm probably wrong, but I always thought that for platforms like .NET and the JVM, AOT delivers the same performance as JIT in most cases, since a lot of information is available before runtime - unlike JS, where the VM always needs to be ready to ditch optimized bytecode and go back to the whiteboard.


As a rule of thumb: AOT has shorter cold start time, but similar overall performance to JIT.

In fact, JIT can perform slightly better once the program reaches a "steady state", because it can target the exact processor it is running on and also perform dynamic profile-guided optimization, which may or may not yield meaningful improvements in real life.

Nick Chapsas did some benchmarking of .NET AOT vs JIT, if you are interested:

https://youtu.be/gJcPqdbKF90?si=PnSPnFvVhr0pjLL-


There are a few concerns here that make it confusing for the general public to reason about AOT vs JIT performance.

Different JIT compilers have different levels of sophistication, being able to dynamically profile code at runtime or otherwise, and the range of applied optimizations of this type may also vary significantly.

Then, AOT imposes different restrictions: what types of modularity the language/runtime wants to support (Swift ABI vs no AOT ABI in .NET NativeAOT or GraalVM Native Image), what kinds of operations modifying the type system are allowed, if any, and how the binary is produced.

In more trivial implementations, where the JIT does not perform any dynamic recompilation or profile-based optimizations, the difference might come down to simply pre-JITing the code, with the same quality of codegen. Or the JIT may be sophisticated but the AOT mode might still be a plain pre-JIT, which makes dynamic profiling optimizations off-limits (which was historically the case for .NET's ReadyToRun and partially NGEN; IIRC the JVM situation was similar up until GraalVM Native Image).

In more advanced implementations, there are many concerns which might be at odds with each other: JIT compilers have to maintain high throughput in order to be productive but may be able to instrument intermediate compilations to gather profile data, AOT compilers, in the absence of static profile data (the general case), have to make a lot more assumptions about the code, but can reason about compiled code statically, assuming such compilation comes with "frozen world" type of packaging (rather than delegating dynamic parts to emulation with interpreter).

And then there are smaller details - a JIT may never be able to emit pure direct calls for user code, where a jump is performed to an immediate encoded in the machine code, because the JIT may have to retain the ability to patch and backpatch the callsites should it need to de/reoptimize. Instead, the function addresses may be stored in memory, and calls are emitted as a dereference of a function pointer from a static address followed by a jump to it.

A JIT may also be able to embed the values of static readonly fields as JIT-time constants in codegen, which .NET does aggressively, but it is unable to pre-initialize such values by interpreting static initialization at compile time in the aggressive, constexpr-style way that AOT can.

So in general, a lot of it comes down to offering a different performance profile. A lot of the belief in AOT performance stems from the fact that lower-level languages rely on it, and the compilers offering it are very advanced (GCC and Clang mainly) and can expend very long compilation times on hundreds of optimization passes that JIT compilers cannot. But otherwise JIT compilers can and do compete; it's just that many modern advanced implementations are held back by the code they are used for - in particular in Java land, where OpenJDK is really, really good but happens to be hampered by targeting Java and the JVM bytecode abstraction. That is less of a limitation for C#, which can trade blows with C++ and Rust quite confidently the moment the abstraction types match (when all use templates/struct generics, for example).

More on AOT and JIT optimizations (in the case of .NET):

- https://devblogs.microsoft.com/dotnet/performance-improvemen...

- https://migeel.sk/blog/2023/11/22/top-3-whole-program-optimi...

If someone has similar authoritative content on what GraalVM Native Image does - please post it.


You can also parse the code into an augmented syntax tree or code DOM and then directly interpret that. This approach eliminates the intermediate code and bytecode machine at the cost of slower interpretation. It's slower due to memory cache and branch prediction issues rather than algorithmic ones.
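
A minimal sketch of such a tree-walking evaluator (hypothetical node types): each step chases a pointer and branches on the node kind, which is where the cache and branch-prediction costs come from.

    #include <stdio.h>

    typedef enum { N_CONST, N_ADD, N_MUL } Kind;

    typedef struct Node {
        Kind kind;
        long value;                 /* N_CONST */
        struct Node *lhs, *rhs;     /* N_ADD / N_MUL */
    } Node;

    /* Tree-walking evaluation: pointer chasing plus an unpredictable
       branch per node, instead of a flat array of instructions. */
    static long eval(const Node *n) {
        switch (n->kind) {
            case N_CONST: return n->value;
            case N_ADD:   return eval(n->lhs) + eval(n->rhs);
            case N_MUL:   return eval(n->lhs) * eval(n->rhs);
        }
        return 0;
    }

    int main(void) {
        /* (2 + 3) * 4, as a tree instead of bytecode */
        Node two = { N_CONST, 2, 0, 0 }, three = { N_CONST, 3, 0, 0 };
        Node four = { N_CONST, 4, 0, 0 };
        Node add = { N_ADD, 0, &two, &three };
        Node mul = { N_MUL, 0, &add, &four };
        printf("%ld\n", eval(&mul));
        return 0;
    }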


For simple functions which are not called repeatedly you have to invert this list - interpreted code is fastest and compiling the code is slowest.

The compiled code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.


I wonder in actual business scenarios isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?

There are situations where the analyst enters SQL into the computer interactively. But in those cases the overhead of compiling does not really matter since this is infrequently done, and there is only a single user asking for the operation of running the SQL.


> I wonder in actual business scenarios isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?

You would hope, but no that's not reality. In reality most of the people "writing SQL" in a business scenario don't even know SQL, they're using an ORM like Hibernate, SQLAlchemy, or similar which is dynamically generating queries that likely subtly change every time the application on top changes.


I didn't mean to imply that compilation never makes sense; I'm only pointing out that it isn't always the fastest way of executing a query.

In situations where you have a limited number of queries that are used repeatedly, some kind of cache for the compiled code will be able to amortise the cost of compilation.

I certainly agree that a situation where an analyst enters SQL manually is a weird niche and not common. However most applications can dynamically construct queries on behalf of users and so getting a good hit rate on your cache of precompiled queries isn't a certainty.


As far as I know, databases often do query caching, so they don't have to recompile the query every time.

On the other hand, the query plan may depend on the specific parameters - the engine may produce a different plan for users with deleted=true (index scan) than for deleted=false (99% of users are not deleted, so the index isn't worth using).
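
With SQLite, that caching can also live in the application: prepare a statement once, then bind and reset it, skipping the parse-and-compile step on every subsequent execution. A sketch against the public API, assuming an open handle and a users(id, name) table:

    #include <sqlite3.h>

    static void lookup_many(sqlite3 *db) {
        sqlite3_stmt *stmt;
        /* Compile (parse + plan + bytecode) once... */
        if (sqlite3_prepare_v2(db, "SELECT name FROM users WHERE id = ?1",
                               -1, &stmt, 0) != SQLITE_OK) return;
        /* ...then rebind and rerun without recompiling the SQL. */
        for (int id = 1; id <= 100; id++) {
            sqlite3_bind_int(stmt, 1, id);
            while (sqlite3_step(stmt) == SQLITE_ROW) { /* read columns */ }
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);
    }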


APL interpreters are tree-walking, but they're backed by C procedures. Then GCC optimizes quite well and you get excellent performance! With stuff like vector instructions.

Getting on par with GCC/Clang with your own JIT is pretty hefty.


There is also JIT bytecode.


Yes, the timing of that compilation of bytecode and machine code can be either AOT or JIT.

For example, Java/JVM compiles bytecode AOT and compiles machine code JIT.


Optimizing runtimes (usually bytecode-based) can beat statically compiled code, because they can profile the code over time, like the JVM does.

... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.

I believe Intel CPUs actually are, sort of, #2: all your x86 is converted to micro-ops, sometimes optimized on the fly (I remember Intel discussing "micro-op fusion" a decade or so ago).


Yeah, for example, movs between registers are generally effectively no-ops and handled by the register renaming hardware.



