
The page is the result of this exchange on Twitter:

https://twitter.com/gorilla0513/status/1784756577465200740

I was surprised to receive a reply from you, the author. Thank you :) Since I'm a novice with both compilers and databases, could you tell me what the advantages and disadvantages are of using a VM with SQLite?

https://twitter.com/DRichardHipp/status/1784783482788413491

It is difficult to summarize the advantages and disadvantages of byte-code versus AST for SQL in a tweet. I need to write a new page on this topic for the SQLite documentation. Please remind me if something does not appear in about a week.




I am amazed that the author (D. Richard Hipp) made the effort to find and respond to a tweet that was neither (1) directed/"@tted" at him nor (2) written in his native language of English (the original tweet is in Japanese [1]).

[1] https://twitter.com/gorilla0513/status/1784623660193677762


He might have a google alert for sqlite.org


Side note, but I'm amazed that anyone who is not a journalist or a politician still actively uses X/Twitter. Everyone I used to follow has stopped.


Simple, I'm on Mastodon, Substack, Threads and Bluesky.

And they are all barely breathing.

The community density is low (because of the split), the discovery sucks (because of decentralization and no algorithm to suggest content), the culture is homogeneous (I encounter people mostly from one political side), and the published content is of limited quality.

You can't start new communities from old users. Communities are built by the young, they are the ones with the creativity, motivation and free time to do it.

That's why Twitter, built mostly by young people 20 years ago, still has that inertia and is more interesting and active.

And that's why TikTok and Roblox are full of life. I'm sure the kids are also elsewhere, where we are not looking, creating the communities of tomorrow.


I’d argue with that bit about creativity and youth.

In sports, for instance, if you are talented and 15 there are pipelines into established and pro sports that make sense to follow. If you are 30 and missed your chance for that you can still be a pioneer in an emerging “extreme” sport.

A lot of young people seem to think creativity is 99% asking for permission and 1% "just do it"; it takes some experience before people realize it is really the other way around.


I've never seen more people use it, as in being actually active on it, and it pops up everywhere due to the community-note memes. But yeah, I really need to get around to creating a Mastodon account, since some very good posters have moved there too.


I refuse to use it, and I can't even view tweet "threads" properly, nor replies or any kind of coherent timeline... it's unusable.


I'm amazed people like yourself care so much about it to write a post like this. Doesn't your brain have something better to think about beyond artificial outrage?


"I'm surprised people use $WHATEVER because nobody uses it" usually means "I don't like $WHATEVER and wish you would stop using it."

I see this frequently directed towards reddit and twitter.




I stopped using Twitter because "Space Man" banned most of one political side. Maybe if you didn't notice that, you're in a bubble?


I've not been following this stuff. Whom did Musk ban?


Like most large social media sites, pre-Musk Twitter had some combination of algorithms and bureaucracy that acted as a chaos monkey, shadowbanning and suspending accounts apparently at random, often with an inability to comprehend satire, or under incoherent or undisclosed pretexts. The site's users and staff were predominantly left-leaning, so right-leaning accounts were disproportionately the victims of these false positives, because of human bias in which posts got flagged and how they were subsequently moderated.

Musk bought the site under a claim of doing something about the bias after Twitter suspended the account of a right-leaning satire site. Many of the existing users were displeased by this, because they liked having a site whose moderation system was biased in their favor, so some of them left. Meanwhile new right-leaning accounts were created as you might expect. This affected the ideological balance of the site, so even though it's still somewhat left-leaning it's less than before. And now when the drunk chaos monkey suspends an account on the left, people are finding new satisfaction in their ability to blame Musk in particular.


Musk has personally bragged about banning a lot of left-leaning accounts and unbanning right-leaning. What is your evidence that the site still leans left?


Bans in either direction are only a small percentage of accounts. People with large followings have a large disincentive to leave (have to start over), and new users haven't had long to build followings, so most large accounts are the same ones they were before he bought the company.

Musk is not actually a right-wing figure -- he smokes pot, makes electric cars, isn't religious, etc. But he isn't a left-wing figure either, which confuses people who can't contemplate that the same person could simultaneously e.g. support gay marriage and think peculiar pronouns are silly.

The result is that he's more inclined to ban accounts he doesn't like, but what he doesn't like isn't inherently associated with any particular party. And if you look at the "left-leaning" accounts he's suspended, it's the likes of Aaron Rupar and Taylor Lorenz, who... well, here it is:

https://www.urbandictionary.com/define.php?term=rupar

They're the sort of accounts that get themselves suspended once there is no longer anything protecting them from getting themselves suspended.


There are three approaches:

1. interpreted code

2. compiled then interpreted bytecode

3. compiled machine code

The further up, the simpler.

The further down, the faster.
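
For illustration, here is a minimal sketch of approach 2 in C - a switch-dispatched bytecode interpreter for arithmetic (the opcodes and encoding are invented for this example, not SQLite's):

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    /* Each instruction is an opcode optionally followed by an operand. */
    static void run(const int *code) {
        int stack[64], sp = 0;
        for (int pc = 0;;) {
            switch (code[pc++]) {      /* dispatch: one branch per instruction */
                case OP_PUSH:  stack[sp++] = code[pc++];       break;
                case OP_ADD:   sp--; stack[sp-1] += stack[sp]; break;
                case OP_MUL:   sp--; stack[sp-1] *= stack[sp]; break;
                case OP_PRINT: printf("%d\n", stack[sp-1]);    break;
                case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* (2 + 3) * 4, compiled to bytecode ahead of time */
        const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                             OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
        run(prog);
        return 0;
    }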


Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.

So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost of 3% or less. That's not very much, considering the size, complexity, and portability costs involved.

A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
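
A rough way to picture that ratio (hypothetical opcode bodies, not SQLite's actual code): one dispatch wrapped around a single add is mostly overhead, while one dispatch wrapped around a B-Tree-style search is almost entirely useful work.

    #include <stdio.h>
    #include <stddef.h>

    #define NKEYS 1000000
    static long keys[NKEYS];   /* stand-in for an index B-Tree */

    /* Low-level opcode body: the work is one add, so the few cycles of
       dispatch around it are a large relative overhead. */
    static long op_add(long a, long b) { return a + b; }

    /* High-level opcode body: one dispatch buys an O(log n) descent in
       compiled C, so the same dispatch cost becomes noise. */
    static size_t op_seek(long key) {
        size_t lo = 0, hi = NKEYS;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (keys[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    int main(void) {
        for (size_t i = 0; i < NKEYS; i++) keys[i] = (long)(2 * i);
        printf("%ld %zu\n", op_add(2, 3), op_seek(123456));
        return 0;
    }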


Part of the benefit of compiling bytecode (or anything) is specializing code to the context (types, values, etc) in which it appears. While I don't doubt your analysis, it could be the case that compiled C code in question is full of branches that can be folded away when specialized to the context of the query, such as the structure of the rows, the type and values of columns, etc.

Basically all of what you are saying about high-level bytecodes applies to dynamic languages, too. But they benefit highly from specializing each bytecode given static and dynamic context, and shortcutting dataflow through local variables.
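
For a concrete (hypothetical) sketch of what such specialization buys: a generic comparison opcode must re-check the operand types on every execution, while a version specialized for a query where both operands are known to be integers folds those branches away entirely.

    #include <stdio.h>
    #include <string.h>

    typedef enum { T_INT, T_TEXT, T_NULL } Type;
    typedef struct { Type type; long i; const char *s; } Value;

    /* placeholder for NULLs, affinity conversions, collations, ... */
    static int cmp_slow_path(const Value *a, const Value *b) {
        (void)a; (void)b; return 0;
    }

    /* Generic opcode body: type checks on every single execution. */
    static int cmp_generic(const Value *a, const Value *b) {
        if (a->type == T_INT && b->type == T_INT)
            return (a->i > b->i) - (a->i < b->i);
        if (a->type == T_TEXT && b->type == T_TEXT)
            return strcmp(a->s, b->s);
        return cmp_slow_path(a, b);
    }

    /* Specialized for a context where both sides are known integers
       (say, an INTEGER column compared against a literal): the type
       branches fold away and the body can be inlined at the call site. */
    static int cmp_int_int(const Value *a, const Value *b) {
        return (a->i > b->i) - (a->i < b->i);
    }

    int main(void) {
        Value a = { T_INT, 7, 0 }, b = { T_INT, 9, 0 };
        printf("%d %d\n", cmp_generic(&a, &b), cmp_int_int(&a, &b));
        return 0;
    }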


There's usually a cost to shuttling data between bytecodes too. When two are fused together the second can lift the data from wherever the first wanted to leave it, as opposed to routing through a fixed location. Might be what you mean by shortcutting dataflow?

Also doing control flow in bytecode is usually slower than doing it in the native code.

I wonder if the context in which the instructions occur is sufficiently finite in sqlite for ahead-of-time specialisation of the bytecode to be better. That is, the program you're operating on isn't known until JIT time, but the bytecode implementations are. SQL should correspond to an unusually specific set of operations relative to a general purpose language implementation.

The compiler will notice some values are constant when working with the bytecode. It can know ahead of time which arguments correspond to folding branches within the bytecode instructions and specialise correspondingly. If that works, you've emitted a sequence of calls into opcodes which are less branchy than they would otherwise be, at which point the opcode implementations start to look like basic blocks and a template JIT to machine code beckons.
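
A sketch of the fusion idea with invented opcodes: the unfused pair routes the column value through a fixed register slot, while the fused opcode keeps it in a C local that the compiler can hold in a machine register.

    #include <stdio.h>

    typedef struct { const long *row; } Cursor;  /* stand-in for a B-Tree cursor */
    static long read_column(const Cursor *c, int col) { return c->row[col]; }

    /* Unfused pair: OP_Column parks the value in a fixed register slot,
       and OP_Ge reloads it from there on the next dispatch. */
    static void op_column(long *reg, int dst, const Cursor *c, int col) {
        reg[dst] = read_column(c, col);
    }
    static int op_ge(const long *reg, int a, int b) {
        return reg[a] >= reg[b];
    }

    /* Fused OP_ColumnGe: the intermediate stays in a C local - one
       dispatch, no round trip through the register file. */
    static int op_column_ge(const long *reg, int a, const Cursor *c, int col) {
        long v = read_column(c, col);
        return v >= reg[a];
    }

    int main(void) {
        long row[] = { 10, 20, 42 }, reg[4] = { 0, 41, 0, 0 };
        Cursor cur = { row };
        op_column(reg, 2, &cur, 2);
        printf("%d %d\n", op_ge(reg, 2, 1), op_column_ge(reg, 1, &cur, 2));
        return 0;
    }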


> Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and *decoding records*

Emphasis added. This is because of SQLite3's varint encoding method for numbers. Performance-wise it was probably a mistake, though it's a form of compression, which might have paid off in terms of space.

(I seem to remember seeing something, possibly by you, about this before.)

I wonder if it would be possible to replace the varint encoding... Yes, it'd be a compatibility break in that older SQLite3s couldn't open newer DBs.
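
For reference, a sketch of a decoder for the varint format described in the SQLite file-format documentation (1 to 9 bytes, big-endian base-128, with the 9th byte contributing all 8 bits); the byte-at-a-time, data-dependent branching is exactly the cost being discussed here. The real sqlite3GetVarint is heavily optimized for the common short cases.

    #include <stdint.h>

    /* Decode one SQLite-style varint starting at p; stores the value in
       *out and returns the number of bytes consumed (1..9). */
    static int get_varint(const uint8_t *p, uint64_t *out) {
        uint64_t v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 7) | (p[i] & 0x7f);
            if ((p[i] & 0x80) == 0) { *out = v; return i + 1; }
        }
        v = (v << 8) | p[8];    /* 9th byte: all 8 bits are data */
        *out = v;
        return 9;
    }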


Looks like SQLite bytecode is similar to the IBM RPG language from the "IBM i" platform, which is used directly, without translation from SQL :)

Edit: PRG->RPG.


I'm curious. Would you have a pointer to the documentation of PRG language?


Oooops, it is IBM RPG, not PRG! My bad!

There are some links in Wikipedia.

I never used it; I only read a big thread about SQL vs RPG on a Russian-speaking forum, with a ton of examples from a person who works with the IBM i platform at some big bank. Basic operations look like SQLite bytecodes: open a table, move the cursor after the record with key value "X", get some fields, update some field, plus loops, ifs, basic arithmetic, etc.


For those that aren’t familiar, IBM i is ye olde AS/400.

To be fair, RPG has been continuously updated and the version I’ve seen has Java integration.

It’s somewhat amusing that a single developer using Python can run circles around entire teams of RPG programmers. Technology marches on. :)


Well, having programmed RPG professionally, there is no question that a python programmer would clock any RPG programmer. RPG is a terrible way to program.


That kind of manipulation isn't limited to RPG. I believe dBase and descendants also work similarly with manual cursor manipulation.


Ah. I did RPG on a system 33, a precursor to the AS 400.


32, 34 or 36? Or a 38, which is very different and the real system that led to the AS/400 and IBM i? … an S/33?


Sorry, good point. It was a 34.


It is possible to construct a worst case scenario for bytecode execution, for example very complex expressions in the WHERE and/or SELECT clauses that compute values, and a query plan that performs a full table scan over a table with say 100 million rows that is cached in RAM (or maybe use generate_series, whatever works best).

Computing the expressions should dominate execution time, right?

Then, to compare against the best possible case, we can write a custom C program that uses sqlite internals to perform the same task (full scan of the table, extract values from row) and does not use the bytecode VM and computes the complex expressions in regular C code (e.g. a function that accepts floats and returns a float or whatever).

Then comparing the two implementations will tell us how much faster sqlite can be if it had a "perfect" JIT.
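
A sketch of the bytecode half of that experiment against the public C API (it assumes the generate_series table-valued function is available, as it is in the SQLite shell build; the row count and expression are placeholders):

    #include <stdio.h>
    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db;
        sqlite3_stmt *stmt;
        if (sqlite3_open(":memory:", &db) != SQLITE_OK) return 1;

        /* Expression-heavy scan: time should be dominated by evaluating
           the arithmetic once per row, 100 million times. */
        const char *sql =
            "SELECT sum(value*value + value/3.0 - 7) "
            "FROM generate_series(1, 100000000)";
        if (sqlite3_prepare_v2(db, sql, -1, &stmt, 0) != SQLITE_OK) {
            fprintf(stderr, "%s\n", sqlite3_errmsg(db));
            return 1;
        }
        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("%f\n", sqlite3_column_double(stmt, 0));
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }

The "perfect JIT" baseline would then hand-code the same scan and expression in C against the internals and compare wall-clock times.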


> A key point to keep in mind is that SQLite bytecodes tend to be very high-level

That's right. "Bytecode" is a spectrum. SQLite's bytecode is higher-level than e.g. WASM, JVM, Python, etc. (Notably, because the original source code is higher-level.)


A while back was musing if it was possible to come up with something resembling the instruction set for a CPU for an abstract relational-engine. Is this basically what SQLite is doing with bytecodes?


I think yes, essentially. Bytecode running on a software interpreter is the same thing as machine code running on a silicon interpreter. Probably slower, probably a different ISA, but very much the same idea.

The OP suggests the SQLite bytecode also has arithmetic & control flow in it, which you might want to exclude in other settings. There's a parallel with CISC vs RISC instruction sets here, and with calls into a compiler runtime vs expanding instructions into larger numbers of more primitive ones in place.

@rkrzr there's a circular definition in your "no" - if "CPU" means "an interpreter that executes instructions simpler than SQL" then this is indeed not an instruction set. If "CPU" means "interpreter of a given ISA" then it could be. The virtual machine in the sqlite codebase is the definition of the CPU for this instruction set. Unless you want to define CPU as exclusive of software implementations of interpreters.


No. The SQLite bytecode is much higher-level than the instruction set of a CPU.


> Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements

> compiling all the way down to machine code might provide a performance boost of 3% or less

This logic doesn't seem sound? That the application now spends 3% on bytecode dispatch doesn't tell us anything about how long it would instead spend on e.g. interpreting SQL.


He’s saying in the best case scenario that 3% would go to 0. Therefore the reality would probably be even less.
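
(A quick Amdahl's-law check, assuming dispatch really is 3% of runtime: eliminating it entirely gives a speedup factor of at most 1/(1 - 0.03) ≈ 1.031, i.e. about 3.1% - and a JIT would not reduce that slice fully to zero, so the realized gain would be smaller.)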


Unfortunately you can’t do performance analysis this way, but I think the overall point that it’s a small fraction probably still stands, as you’d expect the I/O work to dominate.

The reason you can’t do the analysis the way that you explicitly stated (which, to be fair, is what was implied) is that when you lower the code to machine code you typically get rid of branches in the execution path. Since branches slow down execution by a disproportionate amount relative to how long they themselves take, it’s easy to get a >3% boost by getting rid of code paths when it seems like there’s only 3% of room.

What the author also didn’t mention is that they’ve gone on many optimization hunts, eking out less than a percent here or there to get a cumulative boost, so 3% isn’t anything to sneeze at.

That being said, the broader point is valid: the maintenance burden isn’t worth it for the performance profile that SQLite targets, not to mention that JITs open up an attack vector for security exploits. So the cost vs benefit is squarely on the interpreted side for now.

Edit: Forgot to mention that performance tuning a JIT to work well is also hard - for small queries you’ll spend more time compiling the bytecode than you would just executing it. That’s why all the big JS engines take a tiered approach, where each tier optimizes a bit more.


4. lots of small building blocks of static machine code precompiled/shipped with DB software binary, later iterated & looped through based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD processing In-Memory Database option (it's not like LLVM as it doesn't compile/link/rearrange the existing binary building blocks, just jumps & loops through the existing ones in the required order)

Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.


Isn't what you're describing just an interpreter, bytecode or otherwise? An interpreter walks through a data structure in order to determine which bits of precompiled code to execute in which order and with which arguments. In a bytecode interpreter the data structure is a sequential array, in a plain interpreter it's something more complicated like an AST.

Is this just the same as "interpret the query plan" or am I missing something?


It interprets where it needs to jump and then jumps to the precompiled binary machine-code locations, which are executed natively on the CPU with no interpretation going on (I guess it's conceptually something like instruction traces inside the CPU trace cache, but done at a higher level). These precompiled building blocks are shipped in separate libraries, one library for each CPU architecture on a platform (SSE4.2, AVX, AVX2, AVX-512 on AMD64, and other SIMD architectures on the other platforms that Oracle supports). Oracle dynamically loads the correct library matching the CPU's capability during startup and starts jumping there as needed.

So this is not bytecode interpretation, it's native machine code executed directly on CPU whenever the SQL exec engine decides to jump there. This is different from, say Apache Impala that compiles new LLVM machine code for each query execution for the query's scanning tight loops.

Edit: I guess the reason I see this as not just regular bytecode interpretation (I don't actually know that much about it in general) is that these building blocks can include various loops (and can do their thing on entire vectors of data), so it looks like they can push quite a bit of complexity into the machine-code sections before returning back to the normal interpreted AST/opcode land.
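
A toy model of that execution style (entirely invented, not Oracle's actual design): the optimizer lines up pointers to precompiled vectorized kernels, and execution just calls through the list - interpretation decides only the order, while all the data touching happens inside native loops.

    #include <stdio.h>

    #define N 1024

    typedef struct {
        double col[N];      /* one cached column vector */
        double acc;         /* running aggregate */
    } Ctx;

    typedef void (*Kernel)(Ctx *);

    /* Precompiled building blocks; a real engine would ship SIMD
       variants per CPU and pick a set at startup. */
    static void k_scale(Ctx *c) { for (int i = 0; i < N; i++) c->col[i] *= 0.5; }
    static void k_sum(Ctx *c)   { for (int i = 0; i < N; i++) c->acc += c->col[i]; }

    int main(void) {
        Ctx c = { .acc = 0 };
        for (int i = 0; i < N; i++) c.col[i] = i;

        /* "Plan": an ordered list of kernels chosen by the optimizer. */
        Kernel plan[] = { k_scale, k_sum };
        for (unsigned i = 0; i < sizeof plan / sizeof plan[0]; i++)
            plan[i](&c);

        printf("%f\n", c.acc);
        return 0;
    }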


> lots of small building blocks of static machine code precompiled/shipped with DB software binary

You might call those interpreter operations.


That’s a really cool idea! Are there any writeups or articles about this in more detail?


It's called a template JIT. You get to remove the interpreter control flow overhead but tend to end up with a lot of register shuffling at the boundaries. Simpler than doing things properly, usually faster than a bytecode interpreter.


Sounds like 'copy & patch compilation' is a close cousin.


> 2. compiled then interpreted bytecode

You can also compile to bytecode (when building), and then compile that bytecode to the machine code (when running). That way, you can take advantage of the exact processor that is running your program.

This is the approach taken by .NET and Java, and I presume most other runtime environments that use bytecode.


There is also the option of native code generation from bytecode at install time, as opposed to at runtime.


Aka missing out on all the delicious profiling information that approaches with later translation enjoy. There's no simple best answer.


I'm probably wrong, but I always thought that for platforms like .NET and the JVM, AOT delivers the same performance as JIT in most cases, since a lot of information is available before runtime - unlike JS, where the VM always needs to be ready to ditch optimized bytecode and go back to the whiteboard.


As a rule of thumb: AOT has shorter cold start time, but similar overall performance to JIT.

In fact, JIT can perform slightly better once the program reaches a "steady state", because it can target the exact processor it is running on and also perform dynamic profile-guided optimization, which may or may not yield meaningful improvements in real life.

Nick Chapsas did some benchmarking of .NET AOT vs JIT, if you are interested:

https://youtu.be/gJcPqdbKF90?si=PnSPnFvVhr0pjLL-


There are a few concerns here that make it confusing for the general public to reason about AOT vs JIT performance.

Different JIT compilers have different levels of sophistication, being able to dynamically profile code at runtime or otherwise, and the range of applied optimizations of this type may also vary significantly.

Then, AOT imposes different restrictions: what types of modularity the language/runtime wants to support (Swift ABI vs no AOT ABI in .NET NativeAOT or GraalVM Native Image), what kinds of operations modifying the type system are allowed, if any, and how the binary is produced.

In more trivial implementations, where the JIT does not perform any dynamic recompilation or profile-based optimizations, the difference might come down to simply pre-JITing the code, with the same quality of codegen. Or the JIT may be sophisticated but the AOT mode might still be a plain pre-JIT, which makes dynamic profiling optimizations off-limits (which was historically the case for .NET's ReadyToRun and partially NGEN; IIRC the JVM situation was similar up until GraalVM Native Image).

In more advanced implementations, there are many concerns which might be at odds with each other: JIT compilers have to maintain high throughput in order to be productive but may be able to instrument intermediate compilations to gather profile data, AOT compilers, in the absence of static profile data (the general case), have to make a lot more assumptions about the code, but can reason about compiled code statically, assuming such compilation comes with "frozen world" type of packaging (rather than delegating dynamic parts to emulation with interpreter).

And then there are smaller details - a JIT may never be able to emit pure direct calls for user code, where a jump is performed to an immediate encoded in the machine code, because the JIT may have to retain the ability to patch and backpatch the callsites should it need to de/reoptimize. Instead, the function addresses may be stored in memory, and calls are emitted as a dereference of a function pointer from a static address followed by a jump to it.

A JIT may also be able to embed the values of static readonly fields as JIT-time constants in codegen, which .NET does aggressively, but it is unable to pre-initialize such values by interpreting static initialization at compile time in the aggressive, constexpr-style way that AOT can.

So in general, a lot of it comes down to offering a different performance profile. A lot of the belief in AOT performance stems from the fact that lower-level languages rely on it, and the compilers offering it are very advanced (GCC and Clang mainly) and can expend very long compilation times on hundreds of optimization passes that JIT compilers cannot. But otherwise JIT compilers can and do compete; it's just that many modern advanced implementations are held back by the code they are used for - in particular in Java land, where OpenJDK is really, really good but happens to be hampered by targeting Java and the JVM bytecode abstraction. That is less of a limitation for C#, which can trade blows with C++ and Rust quite confidently the moment the abstraction types match (when all use templates/struct generics, for example).

More on AOT and JIT optimizations (in the case of .NET):

- https://devblogs.microsoft.com/dotnet/performance-improvemen...

- https://migeel.sk/blog/2023/11/22/top-3-whole-program-optimi...

If someone has similar authoritative content on what GraalVM Native Image does - please post it.


You can also parse the code into an augmented syntax tree or code DOM and then directly interpret that. This approach eliminates the intermediate code and bytecode machine at the cost of slower interpretation. It's slower due to memory cache and branch prediction issues rather than algorithmic ones.
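
A minimal sketch of such a tree-walking evaluator (hypothetical node types): each step chases a pointer and branches on the node kind, which is where the cache and branch-prediction costs come from.

    #include <stdio.h>

    typedef enum { N_CONST, N_ADD, N_MUL } Kind;

    typedef struct Node {
        Kind kind;
        long value;                 /* N_CONST */
        struct Node *lhs, *rhs;     /* N_ADD / N_MUL */
    } Node;

    /* Tree-walking evaluation: pointer chasing plus an unpredictable
       branch per node, instead of a flat array of instructions. */
    static long eval(const Node *n) {
        switch (n->kind) {
            case N_CONST: return n->value;
            case N_ADD:   return eval(n->lhs) + eval(n->rhs);
            case N_MUL:   return eval(n->lhs) * eval(n->rhs);
        }
        return 0;
    }

    int main(void) {
        /* (2 + 3) * 4, as a tree instead of bytecode */
        Node two = { N_CONST, 2, 0, 0 }, three = { N_CONST, 3, 0, 0 };
        Node four = { N_CONST, 4, 0, 0 };
        Node add = { N_ADD, 0, &two, &three };
        Node mul = { N_MUL, 0, &add, &four };
        printf("%ld\n", eval(&mul));
        return 0;
    }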


For simple functions which are not called repeatedly you have to invert this list - interpreted code is fastest and compiling the code is slowest.

The compiled code would still be faster if you excluded the compilation time. It's just that the overhead of compiling it is sometimes higher than the benefit.


I wonder in actual business scenarios isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?

There are situations where the analyst enters SQL into the computer interactively. But in those cases the overhead of compiling does not really matter since this is infrequently done, and there is only a single user asking for the operation of running the SQL.


> I wonder in actual business scenarios isn't the SQL fully known before an app goes into production? So couldn't it make sense to compile it all the way down?

You would hope, but no that's not reality. In reality most of the people "writing SQL" in a business scenario don't even know SQL, they're using an ORM like Hibernate, SQLAlchemy, or similar which is dynamically generating queries that likely subtly change every time the application on top changes.


I didn't mean to imply that compilation never makes sense; I'm only pointing out that it isn't always the fastest way of executing a query.

In situations where you have a limited number of queries that are used repeatedly, some kind of cache for the compiled code will be able to amortise the cost of compilation.

I certainly agree that a situation where an analyst enters SQL manually is a weird niche and not common. However most applications can dynamically construct queries on behalf of users and so getting a good hit rate on your cache of precompiled queries isn't a certainty.


As far as I know, databases often do query caching, so they don't have to recompile the query every time.

On the other hand, the query plan may depend on the specific parameters - the engine may produce a different plan for users with deleted=true (index scan) than for deleted=false (99% of users are not deleted, so the index isn't worth using).
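
With SQLite, that caching can also live in the application: prepare a statement once, then bind and reset it, skipping the parse-and-compile step on every subsequent execution. A sketch against the public API, assuming an open handle and a users(id, name) table:

    #include <sqlite3.h>

    static void lookup_many(sqlite3 *db) {
        sqlite3_stmt *stmt;
        /* Compile (parse + plan + bytecode) once... */
        if (sqlite3_prepare_v2(db, "SELECT name FROM users WHERE id = ?1",
                               -1, &stmt, 0) != SQLITE_OK) return;
        /* ...then rebind and rerun without recompiling the SQL. */
        for (int id = 1; id <= 100; id++) {
            sqlite3_bind_int(stmt, 1, id);
            while (sqlite3_step(stmt) == SQLITE_ROW) { /* read columns */ }
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);
    }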


APL interpreters are tree-walking, but they're backed by C procedures. Then GCC optimizes quite well and you get excellent performance! With stuff like vector instructions.

Getting on par with GCC/Clang with your own JIT is pretty hefty.


There is also JIT bytecode.


Yes, the timing of that compilation of bytecode and machine code can be either AOT or JIT.

For example, Java/JVM compiles bytecode AOT and compiles machine code JIT.


Optimizing runtimes (usually bytecode-based) can beat statically compiled code, because they can profile the code over time, like the JVM does.

... which isn't going to close the cold startup execution gap, which all the benchmarks/lies/benchmarks will measure, but it is a legitimate thing.

I believe Intel CPUs actually are, sort of, #2: all your x86 is converted to micro-ops, sometimes optimized on the fly (I remember Intel discussing "micro-op fusion" a decade or so ago).


Yeah, for example, movs between registers are generally effectively no-ops and handled by the register renaming hardware.



