The legacy back-end system being migrated was Clipper + dBase III on DOS, which is reminiscent of COBOL.
The part I added was built with ASP.NET 2.0 on top of Microsoft SQL Server 2005, and was eventually upgraded to 4.0 and 2008 respectively.
The only magic sauce was using SQLCLR to embed a few small snippets of C# code into the SQL Server database engine. This let the full-text indexing be specialised for the high-level data partitioning. Without it, searches would have taken up to ten seconds; with the custom search, the p90 response time was about 15 milliseconds! I believe PostgreSQL is the only other popular database engine out there that allows this level of fine-tuned custom indexing.
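For anyone who hasn't touched SQLCLR: the database loads a compiled .NET assembly and exposes its static methods as ordinary T-SQL functions. A minimal sketch of the wiring, with the assembly and function names invented for illustration:

    -- Enable CLR integration (off by default in SQL Server 2005).
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;

    -- Load the compiled C# assembly into the database.
    CREATE ASSEMBLY SearchHelpers
    FROM 'C:\deploy\SearchHelpers.dll'
    WITH PERMISSION_SET = SAFE;

    -- Expose a static C# method as a T-SQL scalar function that
    -- the search queries can call directly.
    CREATE FUNCTION dbo.NormalizeSearchTerms (@terms NVARCHAR(4000))
    RETURNS NVARCHAR(4000)
    AS EXTERNAL NAME SearchHelpers.[SearchHelpers.Indexing].NormalizeSearchTerms;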
p90 for a full-text search on 330 million documents was 15ms?
I know you can tune the hell out of search performance, but that seems a bit too insane for what looks like a relatively unspecialized setup (Standard DB).
Not likely the full book text, just title, author, and a few other low-cardinality fields, I'm sure. Also not likely 330 million unique volumes, but the total count of copies across all the libraries. This is within reach of a single database with proper indexing.
I simply added the "library id" as a prefix to almost every table's primary key. Every lookup specified it with an equality filter, so essentially it was thousands of standalone libraries in a single schema.
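In table terms it looked something like this (table and column names invented here):

    -- The clustered primary key leads with LibraryId, so each
    -- library's rows are physically grouped together.
    CREATE TABLE dbo.Books (
        LibraryId INT           NOT NULL,
        BookId    INT           NOT NULL,
        Title     NVARCHAR(400) NOT NULL,
        Author    NVARCHAR(200) NOT NULL,
        PRIMARY KEY CLUSTERED (LibraryId, BookId)
    );

    -- Every lookup pins the prefix with an equality filter, so a
    -- query only ever touches one library's slice of the index.
    SELECT BookId, Title, Author
    FROM dbo.Books
    WHERE LibraryId = @LibraryId
      AND Title LIKE @TitlePrefix + N'%';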
One hiccup was that when the cardinality estimator got confused, it would occasionally ignore the partition prefix and do a full scan somewhere, inflating the rows examined by a factor of about 2000x! This caused dramatic slowdowns at random, and the engine would then often cache the inefficient query plan, keeping things slow until the server was restarted.
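For anyone hitting the same thing today, the usual countermeasures are query hints along these lines (a sketch reusing the invented table from above; OPTIMIZE FOR UNKNOWN only arrived in SQL Server 2008):

    -- Compile a fresh plan on every execution, so a plan built for
    -- one unrepresentative library can never get cached.
    SELECT BookId, Title
    FROM dbo.Books
    WHERE LibraryId = @LibraryId
    OPTION (RECOMPILE);

    -- Or plan for an 'average' value rather than sniffing whatever
    -- parameter happened to arrive first.
    SELECT BookId, Title
    FROM dbo.Books
    WHERE LibraryId = @LibraryId
    OPTION (OPTIMIZE FOR (@LibraryId UNKNOWN));

    -- The blunt 2005-era fix once a bad plan was already cached:
    DBCC FREEPROCCACHE;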
This is a very deep rabbit hole to go down. For example, many large cloud vendors have an Iron Rule that relational databases must never be used, precisely because they're concerned about this issue occurring, only at a vastly greater scale.
I could have used SQL Server's native table partitioning, but I discovered it had undesirable side-effects for some cross-library queries. However, for typical queries it would have "enforced" use of the partitioning key, side-stepping the problem the cloud vendors worry about.
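For reference, native table partitioning looks roughly like this (boundary values invented):

    -- Map ranges of library ids onto partitions...
    CREATE PARTITION FUNCTION pfLibrary (INT)
    AS RANGE RIGHT FOR VALUES (1000, 2000, 3000);

    -- ...and map those partitions onto filegroups.
    CREATE PARTITION SCHEME psLibrary
    AS PARTITION pfLibrary ALL TO ([PRIMARY]);

    -- A table created on the scheme is physically split by
    -- LibraryId: queries filtering on it touch one partition,
    -- while cross-library queries must visit all of them.
    CREATE TABLE dbo.BooksPartitioned (
        LibraryId INT           NOT NULL,
        BookId    INT           NOT NULL,
        Title     NVARCHAR(400) NOT NULL,
        PRIMARY KEY CLUSTERED (LibraryId, BookId)
    ) ON psLibrary (LibraryId);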
Modern versions of SQL Server have all sorts of features to correct or avoid inefficient query plans. E.g. Query Store can track the "good" and "bad" plans for each query and, after sufficient samples, start forcing the good one. That would have been useful back in 2007, but it wasn't available, so I spent about a month doing the same thing by hand.
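A sketch of the modern equivalent (Query Store shipped in SQL Server 2016; the query and plan ids below are placeholders you'd read out of the Query Store views):

    -- Start recording every plan the optimizer produces, along
    -- with its runtime statistics.
    ALTER DATABASE CURRENT SET QUERY_STORE = ON;

    -- Pin the known-good plan once you've spotted the regression
    -- in the captured history.
    EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;

    -- SQL Server 2017+ can even do the forcing automatically when
    -- it detects that a new plan performs worse.
    ALTER DATABASE CURRENT
    SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);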
This makes the performance a lot more understandable if you're only searching in a single library. I assume that cuts out >99.9% of those 330 million documents.