When I tried to learn enough to put together a little app, every search result for my questions was a quick blog post seemingly aimed at iOS devs who didn't want to learn and just wanted to copy-paste the answer, usually in the form of an extension method.
A failure mode of ULIDs and similar is that they're too random to be easily compared or recognized by eye.
This is especially useful when you're using them for customer or user IDs - being able to easily spot your important or troublesome customers in logs is very helpful
Personally I'd go with a ULID-like scheme similar to the one in the OP - but I'd aim to use the smallest number of bits I could get away with, and pick a compact encoding scheme
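A minimal sketch of what such a scheme could look like, assuming Crockford base32 (chosen because it omits easily confused characters) and an illustrative 40-bit second-resolution timestamp plus 40 random bits; all the parameters here are made up for the example:

```python
import secrets
import time

# Crockford base32 alphabet: no I, L, O, or U, so IDs are easier to read aloud
# and compare by eye than standard base64 or hex.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def compact_id(ts_bits=40, rand_bits=40):
    """Time-prefixed ID: lexicographically sortable, smaller than a full ULID."""
    ts = int(time.time()) & ((1 << ts_bits) - 1)  # seconds, not ms, to save bits
    rand = secrets.randbits(rand_bits)
    n = (ts << rand_bits) | rand
    chars = []
    for _ in range((ts_bits + rand_bits + 4) // 5):  # 5 bits per base32 char
        chars.append(ALPHABET[n & 31])
        n >>= 5
    return "".join(reversed(chars))

print(compact_id())  # 16 characters for 80 bits, vs. 26 for a full ULID
```

Because the timestamp occupies the high bits and the encoding is fixed-length, string sort order matches creation order, which is what makes these IDs recognizable at a glance in logs.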
It is entirely viable to never have more than one or two open pull requests on any particular code repository, and to use continuous delivery practices to keep deploying small changes to production one at a time.
That's exactly how I've worked for the past decade or so.
Look into libeatmydata with LD_PRELOAD. It disables fsync and other durability syscalls, which is fabulous for CI. Materialize.com uses it for their CI; that's where I learned about it.
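In a CI job this is typically either the `eatmydata` wrapper script shipped with the package or an explicit `LD_PRELOAD`; a sketch (the test script name is hypothetical and the library path varies by distro):

```shell
# Option 1: the wrapper script from the libeatmydata package
eatmydata ./run_integration_tests.sh

# Option 2: set LD_PRELOAD directly (path shown is Debian/Ubuntu's)
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libeatmydata.so ./run_integration_tests.sh
```

Only safe where losing the data on a crash is acceptable, which is exactly the situation in throwaway CI environments.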
There are a couple of passing mentions of Download Monitor, but the timeline also strongly implies that a specific source was simply guessing the URL of the PDF long before it was uploaded
I'm not clear from the doc which of these scenarios is what they're calling the "leak"
> but also the timeline strongly implies that a specific source was simply guessing the URL of the PDF long before it was uploaded
A bunch of people were scraping commonly used URLs based on previous OBR reports, in order to report on it as soon as it was live, as is common with all things of this kind.
The mistake was that the URL should have been obfuscated, and only changed to the "clear" URL at publish time, but a plugin was bypassing that and aliasing the "clear" URL to the obfuscated one
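The fix being described could look something like this sketch (all names here are hypothetical; the point is that the pre-publication URL contains an unguessable token, and the predictable "clear" URL only exists once the document is actually published):

```python
import secrets

def draft_url(slug: str) -> str:
    # Pre-publication: embed an unguessable random token so the URL
    # can't be predicted from previous reports' naming conventions.
    token = secrets.token_urlsafe(16)
    return f"/downloads/{token}/{slug}.pdf"

def published_url(slug: str) -> str:
    # At publish time the document moves to the predictable "clear" URL.
    return f"/downloads/{slug}.pdf"
```

The failure mode in the thread is a plugin aliasing the clear URL straight to the obfuscated one, which makes the token pointless: scrapers guessing the clear URL get the file anyway.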
It sounds like a combination of the Download Monitor plugin plus a misconfiguration at the web server level resulted in the file being publicly accessible at that URL when the developers thought it would remain private until deliberately published.
For pure duckdb, you can put an Arrow Flight server in front of duckdb[0] or use the httpserver extension[1].
Where you store the .duckdb file will make a big difference in performance (e.g. S3 vs. Elastic File System).
But I'd take a good look at ducklake as a better multiplayer option. If you store `.parquet` files in blob storage, it will be slower than `.duckdb` on EFS, but if you have largish data, EFS gets expensive.
We[2] use DuckLake in our product and we've found a few ways to mitigate the performance hit. For example, we write all data into DuckLake in blob storage, then create analytics tables and store them on faster storage (e.g. GCP Filestore). You can have multiple storage methods in the same DuckLake catalog, so this works nicely.
GizmoSQL is definitely a good option. I work at GizmoData and maintain GizmoSQL. It is an Arrow Flight SQL server with DuckDB as a back-end SQL execution engine. It can support independent thread-safe concurrent sessions, has robust security, logging, token-based authentication, and more.
It also has a growing list of adapters - including: ODBC, JDBC, ADBC, dbt, SQLAlchemy, Metabase, Apache Superset and more.
We also just introduced a PySpark drop-in adapter - letting you run your Python Spark Dataframe workloads with GizmoSQL - for dramatic savings compared to Databricks for sub-5TB workloads.
This doesn't seem accurate to me - gambling sites legally operating in the UK already have strict KYC requirements applied to them via the gambling regulator.
Visiting a gambling site isn't restricted, but signing up and gambling is.
If age restriction technology is now being introduced to prevent kids *viewing* "inappropriate" websites, then why are gambling websites being given a free pass?
They’ve already found a loophole for that: If you gamble with fake money (acquired through real money and a confusing set of currency conversions) and the prizes are jpegs of boat-girls (or horse-girls, as I hear are popular lately) or football players, you can sell to all the children you want.
The only mention I can see in this document of compression is
> Significantly smaller than JSON without complex compression
Although compression of JSON could be considered complex, in practice it's extremely simple: it's widely used and usually performed in a distinct step, often transparently to the user. Gzip and, increasingly, zstd are the common choices.
I'd be interested to see a comparison between compressed JSON and CBOR; I'm quite surprised that this hasn't been included.
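A rough stdlib-only sketch of the JSON half of such a comparison (the CBOR side would need a third-party library such as cbor2, and the sample record here is made up):

```python
import gzip
import json

# A made-up record with the repetitive structure telemetry payloads often have.
record = {
    "device": "sensor-7",
    "readings": [{"t": i, "v": 20.5 + i % 3} for i in range(50)],
}

raw = json.dumps(record, separators=(",", ":")).encode()
packed = gzip.compress(raw)

print(f"plain JSON: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

Whether CBOR beats gzipped JSON depends heavily on the payload shape (key repetition, numeric density), which is exactly why the missing comparison matters.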
> I'm quite surprised that this hasn't been included.
Why? That goes against the narrative of promoting one over the other. Nissan doesn't advertise that a Toyota has something they don't. They just pretend it doesn't exist.
But how does it compare to an actual modern observability stack built on a columnar datastore like Honeycomb?