Awesome! Only a little over a billion more to go before GitHub’s very own OpenAPI Spec can start overflowing int32 on repositories too, just like it already does for workflow run IDs!
At the company where I did my stint as CTO, I turned up and noticed they were using 32-bit integers as primary keys on one of their key tables. That table already had 1.3 billion rows and, at the rate they were adding them, would have overflowed its primary key values within months, so we ran a fairly urgent project to upgrade the IDs to 64-bit and avoid the total meltdown that would have ensued otherwise.
heh, that's happened at at least 5 companies I have worked at - go check the database and find: currency as floats, hilarious indexes, integers about to overflow, gigantic types with nothing in them.
Just curious, as someone with limited experience on this: what's wrong with it? Decimal is consistent & predictable (compared to float), so it shouldn't be that big of a deal, right? CMIIW
Yeah, not a big deal, but completely useless nonetheless: you would never really query your table for just the one decimal column (eg the price) but for a couple more (eg the category and the price), so you'd have a multi-column index on those columns. The index on just the price column never gets used.
What if you wanted to select the "top 100 most expensive products", or the number of products between $0.01 and $10, $10.01 and $100, and $100.01 and $1000? Sure, you could do a full table scan on your products table for both queries, but an index on price would speed both up a lot if you have a lot of products. Of course, you have to determine whether the index would be used enough to make up for the extra time spent updating it when prices change or products are added or deleted.
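For example, a rough sketch of both queries leaning on a single-column price index, assuming Postgres via node-postgres and a hypothetical products(id, name, category, price) table (all names here are made up for illustration):

    import { Pool } from "pg";

    const pool = new Pool();

    async function priceQueries() {
      // A plain btree index on price is what both queries below can use.
      await pool.query(
        "CREATE INDEX IF NOT EXISTS idx_products_price ON products (price)"
      );

      // "Top 100 most expensive products": the planner can walk the index
      // backwards instead of scanning and sorting the whole table.
      const top = await pool.query(
        "SELECT id, name, price FROM products ORDER BY price DESC LIMIT 100"
      );

      // One price band; the $0.01-$10, $10.01-$100, $100.01-$1000 counts are
      // the same query with different bounds, each an index range scan.
      const cheap = await pool.query(
        "SELECT count(*) AS n FROM products WHERE price BETWEEN $1 AND $2",
        [0.01, 10]
      );

      console.log(top.rows.length, cheap.rows[0].n);
      await pool.end();
    }

    priceQueries().catch(console.error);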
Cheap solution, sure, add an index. But you're asking an OLAP question of an OLTP system. Questions like that are best asked at least of an out-of-production read replica, or better, of an analytics db.
In general, it's about avoiding mixed types of load. Predictable, audited application queries in a user request shouldn’t be mixed with potentially extremely expensive long-running analytics queries. Different replica sets isolate customers from potential performance impacts caused by data analytics.
You stream CDC events to maintain a 1:1 read replica in something like Snowflake/Databricks, where you can run all kinds of OLAP workloads against that analytics copy.
Oh, sure, but wouldn't the whole website be served out of a read-friendly database? Why would you have a separate "analytics" database from the main database(s) driving the site?
They'd certainly need decimals in the first place, but yeah, I have seen indexes on every column, multiple times, and I have seen indexes whose combined size was 26 times the size of the original data... on a table that's actively being written to.
What are the challenges of such projects? How many people are usually involved? Does it incur downtime or significant technical challenges for either the infrastructure or the codebase?
Changing the type of the column is no big deal per se, except that on a massive table it’s a non-trivial operation, BUT you also have to change the type in everything that touches it: everywhere it’s assigned or copied, everywhere it’s sent over the wire and deserialized and assumptions might be made, any tests, and on, and on. And god help you if you’ve got stuff like int.MaxValue having a special meaning (we didn’t in this context, fortunately).
Our hosting environment at that time was a data centre, so we were limited on storage, which complicated matters a bit. Ideally you’d create a copy of the table but with a wider PK column, write to both tables, then migrate your reads, etc., but we couldn’t do that because the table was massive and we didn’t have enough space. Procuring more drives was possible but sometimes took weeks - not just dragging a slider in your cloud portal. And then of course you’d have to schedule a maintenance window for somebody to plug them in. It was absolutely archaic, especially when you consider this was late 2017/early 2018.
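For reference, the copy-plus-dual-writes approach we couldn't afford the disk for looks roughly like this (just a sketch; it assumes Postgres and made-up table names, not our actual schema):

    import { Pool } from "pg";

    const pool = new Pool();

    // One-time setup (DBA side): same shape as the old table, but a BIGINT PK.
    //   CREATE TABLE orders_v2 (LIKE orders INCLUDING ALL);
    //   ALTER TABLE orders_v2 ALTER COLUMN id TYPE BIGINT;

    // 1. Until cutover, application writes go to both tables.
    async function insertOrder(customerId: number, total: string) {
      const client = await pool.connect();
      try {
        await client.query("BEGIN");
        const res = await client.query(
          "INSERT INTO orders (customer_id, total) VALUES ($1, $2) RETURNING id",
          [customerId, total]
        );
        await client.query(
          "INSERT INTO orders_v2 (id, customer_id, total) VALUES ($1, $2, $3)",
          [res.rows[0].id, customerId, total]
        );
        await client.query("COMMIT");
      } catch (err) {
        await client.query("ROLLBACK");
        throw err;
      } finally {
        client.release();
      }
    }

    // 2. Backfill historical rows in small batches (repeat, advancing afterId),
    //    then move reads over to orders_v2 and drop the old table.
    async function backfillBatch(afterId: number, batchSize = 10_000) {
      await pool.query(
        `INSERT INTO orders_v2
         SELECT * FROM (
           SELECT * FROM orders WHERE id > $1 ORDER BY id LIMIT $2
         ) batch
         ON CONFLICT (id) DO NOTHING`,
        [afterId, batchSize]
      );
    }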
You need multiple environments so you can do thorough testing, which we barely had at that point, and because every major system component was impacted, we had to redeploy our entire platform. Also, because it was the PK column affected, we couldn’t do any kind of staged migration or rollback without the project becoming much more complex and taking a lot longer - time we didn’t have due to the rate at which we were consuming 32-bit integer values.
In the end it went off without a hitch, but pushing it live was still a bit of a white knuckle moment.
If you've written your services in JavaScript, going from i32 to i64 means your driver is probably going to return it as a string (or a BigInt, or some custom Decimal type) rather than the IEEE754 number you were getting before. That means you now need to change your interfaces (both internal and public-facing) to a string or some other safely serializable representation. And if you are going to go through all that trouble, you may as well take the opportunity to just switch to some uuid strategy anyway.
The alternative is that you can monkey-patch the database driver to parse the i64 id as an IEEE754 number anyway and deal with this problem later, when you overflow the JavaScript max safe integer (2^53 - 1), except when that happens it will manifest in some really wacky ways, rather than the db just refusing to insert a new row.
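Concretely, with node-postgres (one common driver) the monkey-patch is a one-liner, and the failure mode past 2^53 is silent; the ids below are just for illustration:

    import { types } from "pg";

    // pg returns BIGINT (int8) columns as strings by default, precisely because
    // they may not fit in an IEEE754 double. The risky shortcut forces them back
    // into plain numbers:
    types.setTypeParser(types.builtins.INT8, (value) => parseInt(value, 10));

    // ...which works right up until ids pass Number.MAX_SAFE_INTEGER:
    console.log(Number.MAX_SAFE_INTEGER);          // 9007199254740991 (2^53 - 1)
    console.log(parseInt("9007199254740993", 10)); // 9007199254740992 - silently wrong

    // The safer, more invasive option: keep ids as BigInt (or strings) and update
    // every interface that carries one.
    types.setTypeParser(types.builtins.INT8, (value) => BigInt(value));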
Maybe you are better off moving to UUIDs then? It seems there are packages to make handling them easier, but you'll still need a tiny hack to map old i32 IDs to some UUID.
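One way to do that tiny hack is a deterministic UUIDv5 derived from the old integer, e.g. with the uuid package (the namespace below is made up; any fixed UUID you generate once will do):

    import { v5 as uuidv5 } from "uuid";

    // Fixed, made-up namespace: generate one once and never change it.
    const LEGACY_ID_NAMESPACE = "6f1c2a4e-8b3d-4c5a-9e7f-0123456789ab";

    // Same i32 in, same UUID out, forever - so rows migrated at different times
    // (and their foreign keys) still line up.
    function legacyIdToUuid(id: number): string {
      return uuidv5(String(id), LEGACY_ID_NAMESPACE);
    }

    legacyIdToUuid(1234567890); // always yields the same v5 UUID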
I remember such a project; with our large and aging TypeScript frontend projects it would have added a couple of weeks to adjust all the affected types. IDs used in many places deep in the code caused thousands of errors from the mismatch, which was a nightmare. I can't remember exactly why it was so tough to go through them all, but we were under intense time pressure.
To speed things up we decided to correct the ID types for the server response, which was key since those types were generated from protobuf. But we kept number-typed IDs everywhere else, even though the values would actually be strings, which would not cause many issues because there ain't much reason to be doing numeric operations on an ID, except the odd sort function.
I remember the smirk on my face when I suggested it to my colleague and at the time we knew it was what made sense. It must have been one of the dumbest solutions I've ever thought of, but it allowed us to switch the type eventually to string as we changed code, instead of converting the entire repos at once. Such a Javascript memory that one :)
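In TypeScript terms the hack was basically this (illustrative only, not the real code):

    // The declared type stays `number` while the runtime value is actually the
    // string that came off the wire, so the old code keeps compiling.
    interface CommentDto {
      id: number; // lie: at runtime this is something like "9007199254740993"
      body: string;
    }

    function parseResponse(json: string): CommentDto {
      const raw = JSON.parse(json);
      // Deliberately NOT converting - we just pretend.
      return { id: raw.id as unknown as number, body: raw.body };
    }

    // The one place it bites you: the odd sort function.
    const ids = ["9", "10", "100"] as unknown as number[];
    ids.sort((a, b) => a - b); // subtraction coerces the strings, so this happens to work
    [...ids].sort();           // default lexicographic sort: "10", "100", "9"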
Not the original commenter, but I've read through half a dozen post-mortems about this kind of thing. The answer is: yes. There are challenges, and sometimes downtime and/or breaking changes are inevitable.
For one, if your IDs are approaching the 2^31 signed integer limit, then by definition, you have nearly two billion rows, which is a very big DB table! There are only a handful of systems that can handle any kind of change to that volume of data quickly. Everything you do to it will either need hours of downtime or careful orchestration of incremental/rolling changes. This issue tends to manifest first on the "biggest" and hence most important table in the business such as "sales entries" or "user comments". It's never some peripheral thing that nobody cares about.
Second, if you're using small integer IDs, that decision was probably motivated in part by using those integers as foreign keys and by keeping your secondary indexes efficient. GUIDs are "simpler" in some ways but need 4x the data storage (assuming you're using a clustered database like MySQL or SQL Server). Even just the change from 32 bits to 64 bits doubles the size of that storage in a lot of places. For 2 billion rows, this is 8 GB more data minimum, but it is almost certainly north of 100 GB across all tables and indexes (rough math sketched after the fourth point below).
Third, many database engines will refuse to establish foreign key constraints if the types don't match. This can force big-bang changes or very complex duplication of data during the migration phase.
Fourth, this is a breaking change to all of your APIs, both internal and external. Every ORM, REST endpoint, etc... will have to be updated with a new major version. There's a chance that all of your analytics, ETL jobs, etc... will also need to be touched.
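To put rough numbers on the second point (everything beyond the two-billion row count is a made-up assumption):

    const rows = 2_000_000_000;        // ~2^31 ids consumed
    const extraBytesPerValue = 8 - 4;  // int32 -> int64

    // The PK column alone:
    const pkOnlyGiB = (rows * extraBytesPerValue) / 1024 ** 3; // ~7.5 GiB

    // Every secondary index and every referencing FK column stores the key again;
    // assume, hypothetically, 5 secondary indexes plus 3 similarly sized child tables:
    const copiesOfTheKey = 1 + 5 + 3;
    const totalGiB = pkOnlyGiB * copiesOfTheKey; // ~67 GiB before any page/row overhead

    console.log(pkOnlyGiB.toFixed(1), totalGiB.toFixed(1));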
> For one, if your IDs are approaching the 2^31 signed integer limit, then by definition, you have nearly two billion rows
Just wanted to nitpick this; it is not actually necessarily true. A failed insert in some systems will increment the counter, and deleting rows usually does not allow the deleted ID to be re-used (new inserts use the current counter). Of course, that is beside the point: the typical case of a table approaching this limit is a very large table.
It's actually fairly common to see this problem crop up in systems that are using a database table as a queue (which is a bad idea for many reasons, but people still do it) in which case the number of live rows in the table can be fairly small.
Lived that with a MySQL table. The best thing is that the table was eventually retired (long after the migration) because the whole data model around it was basically wrong.
A couple of weeks ago there was an issue in the Lua community because LuaRocks surpassed 65,535 packages.
There was a conflict between this and the LuaRocks implementation under LuaJIT [1] [2], inflicting pain on a narrow set of users as their CI/CD pipelines and personal workflows failed.
I moved to Max after projecting a $2,000 annual API bill. I haven't yet hit the five-hour limits, but /login toggles easily between plans. I believe the interface tells you when you've hit a limit, but as I said, I don't know first hand.
According to CCUsage, I hit limits on Opus usage around the equivalent of $150. If we naively extrapolate, that suggests about $600 of Opus usage per session on Max 20x.
I would prefer the BSL, with some sort of trial-period grant and source available, over closed source.
The other nice thing about the BSL is that it converts to an Open Source license after 3-4 years, which addresses the concern of "what if the software vendor goes out of business?" After that time period you can support it yourself, or another vendor can pick it up and support it.
LGPL can make it a tough decision to use, since that essentially rules out single-file publish, R2R, and Native AOT in the .NET world. Your project seems like it could be very useful at the edge, where those packaging models can be very attractive.
Ah nice! Unfortunately, many people (like me) won’t notice that and probably won’t be able to consider using it if they are subject to corporate license scanners or blanket enterprise policies that don’t allow LGPL.
But that is a nice special permission for those who are still able to use it!
And that could also be a good model if the plan is to dual license it and sell to enterprise.
Maybe funny now but once (if?) it can eventually contribute meaningfully to dotnet/runtime, AI will probably be laughing at us because that is the pinnacle of a massive enterprise project.
I love Michael Lewis’ writing style! I have probably read half of the books in this blog article and can remember hardly any of them, but I could still summarize Flash Boys because it was such a cool story.
Anthropic is releasing blog articles where they are discovering how Claude works through experiments and observations. It seems more like science than engineering when even the creators have to run scientific experiments to figure out how what they engineered works.
The description of how Claude does "arithmetic" was enlightening. Shows how it uses clever pattern matching to fit the data, without ever learning the algorithm, even though the algorithm description was certainly in its training data.
https://github.com/github/rest-api-description/issues/4511