It's possible to make a compiler backdoor that is "updatable" and therefore a lot less brittle. And yes, this does make the backdoor easier to detect, since it's now communicating over the network. But that flexibility could really future-proof the backdoor and let it evolve over time as the target language changes.
For example, you could also make a compiler compile certain other software incorrectly in order to introduce exploitable vulnerabilities into the binaries. When I was working on convincing people of the importance of reproducible builds, I used to use an example where changing a single bit in the binary could introduce a fencepost error by turning one conditional branch instruction into another. If that branch controlled a loop that overwrites memory and increments pointers (for example), the resulting binary could be exploitable even though there was no fencepost error in the original source code.
(My x86 examples involved changing JGE to JG, or JL to JLE, i.e. changing >= to > and < to <= in loop conditions.)
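To make that concrete, here's a rough sketch (a hypothetical function, not the code from my original examples). In the source, the loop exits when i >= n, which on x86 typically becomes a cmp plus JGE to the exit label. If a backdoored compiler quietly emitted JG instead, the loop would also run for i == n, as if the source had said i <= n, and the last write would land one element past the end of the buffer:

    // Hypothetical illustration only; the real attack edits the emitted machine
    // code, not the source. Raw pointers are used here to mirror what the
    // generated code does: no bounds check is left to catch the extra write.
    fn copy_prefix(src: &[u8], dst: &mut [u8], n: usize) {
        let s = src.as_ptr();
        let d = dst.as_mut_ptr();
        let mut i = 0;
        while i < n {            // compiles to roughly: cmp i, n; jge exit
            unsafe { *d.add(i) = *s.add(i) }; // flipping JGE to JG = one extra write
            i += 1;
        }
    }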
Combining this with the trusting trust attack, you could have a self-perpetuating bug in the compiler plus a bugdoor in other software. The pattern match for the other software does not necessarily have to be super-specific in that case.
I would definitely agree that this wouldn't survive many generations of software evolution without active intervention. It certainly wouldn't survive a change of programming language or target machine architecture, for example.
Who said it has to "generalize"? No virus generalizes to hack every program. That doesn't mean viruses aren't dangerous.
Also, OSS makes up most of the modern stack, so access to source code is a given. And hand-crafting a backdoor when you have the source code is trivial, because you can literally change anything you want with confidence.
I actually tried comparing 128-bit SIMD to the scalar 64-bit performance and the difference was 2x. I only published the results for the 4x comparison, but it should be pretty easy to reproduce if you change the types in the non-SIMD code[1] from i32 -> i64.
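Purely for illustration (the real kernel is in the linked code[1]; here I'm just assuming a plain reduction loop), the change is only to the element type:

    // Illustrative sketch only, assuming the non-SIMD kernel is a simple sum.
    // The 64-bit comparison just widens the element type from i32 to i64.
    fn scalar_sum(values: &[i64]) -> i64 { // previously &[i32] -> i32
        let mut acc: i64 = 0;
        for &v in values {
            acc = acc.wrapping_add(v); // wrapping to avoid overflow panics in debug builds
        }
        acc
    }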
Great questions! I'm not a database expert either, but I can try answering these:
1) I think databases like to manage pages directly because the db has more context than the OS and can therefore make more optimizations. For example, when aborting a transaction, the db knows its dirty pages should be evicted (I'm not sure if mmap offers custom eviction). Also, I believe that if the db uses mmap, it loses control over when pages are flushed to disk, and flush control is necessary for guaranteeing transaction durability (there's a small sketch of this after the list).
2) What you're describing here sounds similar to an LSM-tree database (e.g. RocksDB). They're often used for write-heavy workloads because writes are just appends, but they might not be great for read-heavy things (there's a rough sketch of the write path after the list too).
3) This reminds me of PRQL[1] (which was trending on Hacker News last week) and Spark SQL. I'm not too familiar with this area though, so I can't really say why SQL was designed this way.
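To illustrate the flush-control point from 1), here's a minimal sketch (made-up names, not any particular database's API): for a commit to be durable, the log record has to be forced to disk before the commit is acknowledged, and that ordering needs an explicit fsync. With mmap'd pages, the kernel decides when dirty pages get written back, so the db can't enforce it.

    use std::fs::File;
    use std::io::Write;

    // Hypothetical write-ahead-log commit. The point is the explicit sync:
    // the record must hit stable storage *before* the commit is acknowledged.
    fn commit_record(wal: &mut File, record: &[u8]) -> std::io::Result<()> {
        wal.write_all(record)?; // append the log record
        wal.sync_all()?;        // fsync: force it to disk
        Ok(())                  // only now is it safe to ack the commit
    }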
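And for 2), a very rough sketch of why LSM-style writes are cheap (illustrative names and format only): writes land in a sorted in-memory memtable, and flushing is one sequential append pass to a new on-disk run rather than an in-place update. Reads can suffer because they may have to check several runs.

    use std::collections::BTreeMap;
    use std::fs::File;
    use std::io::Write;

    // Illustrative-only memtable: writes are buffered in memory, sorted by key.
    struct MemTable {
        entries: BTreeMap<Vec<u8>, Vec<u8>>,
    }

    impl MemTable {
        fn new() -> Self {
            MemTable { entries: BTreeMap::new() }
        }

        fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
            self.entries.insert(key, value); // no disk I/O on the write path
        }

        // Flushing writes one sorted run sequentially (a toy "SSTable" format).
        fn flush(&self, path: &str) -> std::io::Result<()> {
            let mut out = File::create(path)?;
            for (k, v) in &self.entries {
                out.write_all(k)?;
                out.write_all(b"=")?;
                out.write_all(v)?;
                out.write_all(b"\n")?;
            }
            out.sync_all()
        }
    }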
Another thing to consider is pluggable storage (a key/value interface) and a pluggable query language (a relational algebra interface?), and how to fit the two together.
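For instance (hypothetical trait names, just to make the shape of that split concrete), the storage side could expose nothing more than an ordered key/value store, and the query side would be operators written purely against it:

    // Hypothetical interfaces to illustrate the pluggable split; not from any real engine.
    trait KvStore {
        fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
        fn put(&mut self, key: &[u8], value: &[u8]);
        // Ordered range scan, which is what index-backed query plans need.
        fn scan(&self, start: &[u8], end: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    }

    // The relational side is just operators over rows, compiled down to calls
    // on whichever KvStore implementation happens to be plugged in.
    trait RowSource {
        fn next_row(&mut self) -> Option<Vec<Vec<u8>>>; // one row as a vector of column values
    }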
Yeah, I think it's a shame that most teachers don't give assignments like this that tie the big picture together with the low-level details. After students complete a big assignment like SimpleDB, they'll have a working artifact that they can reference for the rest of their careers.
I think the main issue that universities face is time. There is only so much time in each semester and, as we all know, building and improving a database is a lifelong task.
When I was implementing SimpleDB in 2019, I believe CMU's course didn't have publicly available resources or lab assignments. Now CMU has published a full video lecture series (which MIT doesn't have) along with their labs. So if I were starting again today, I would probably go with CMU's course.