Constant factors: constantly foiling you (ksplice.com)
29 points by leonidg on May 13, 2010 | hide | past | favorite | 11 comments



Hmmm...let's think outside the schema for a moment.

OP is benchmarking two COUNT queries against a relational database, each looking for the total number of rows in a 3-day period, right?

I'm going to make a guess that this is historical data and that "time" is stamped and never changes again...

Why not just pre-answer the question by incrementing a per-date counter field in another table every time you add a record? Then instead of COUNTing millions of records in seconds, you can read 3 records in microseconds.

I know what you're thinking: this will "slow down" your transaction processing time. That's true, but only by microseconds. I'd rather wait a few microseconds several times per day than 61 seconds when I really needed to know something.

Just because a DBMS has lots of capabilities (SQL, indexing, triggers, stored procedures) doesn't necessarily mean that's the best way to deploy something. Sometimes a little thinking and some extra functionality in your app can leapfrog mountains.

[Aside: I have implemented this exact solution many times, often reducing historical reporting run times by 95% or more. No one ever noticed any latency at transaction update time.]
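A minimal sketch of that counter-table idea, using SQLite for brevity (the thread is about MySQL, but the technique is engine-independent; the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time TEXT, payload TEXT)")
conn.execute("CREATE TABLE daily_counts (day TEXT PRIMARY KEY, n INTEGER)")

def insert_event(ts, payload):
    """Insert a row and bump the per-day counter in the same transaction."""
    day = ts[:10]  # 'YYYY-MM-DD'
    with conn:  # commits both statements atomically
        conn.execute("INSERT INTO events VALUES (?, ?)", (ts, payload))
        conn.execute("INSERT OR IGNORE INTO daily_counts VALUES (?, 0)", (day,))
        conn.execute("UPDATE daily_counts SET n = n + 1 WHERE day = ?", (day,))

for i in range(5):
    insert_event("2010-05-13 12:00:0%d" % i, "x")
insert_event("2010-05-14 09:00:00", "y")

# Instead of COUNTing millions of rows, read a handful of counter rows:
total = conn.execute(
    "SELECT SUM(n) FROM daily_counts WHERE day BETWEEN ? AND ?",
    ("2010-05-13", "2010-05-15"),
).fetchone()[0]
print(total)  # 6
```

Each insert pays for two extra tiny statements, and the report-time query touches one row per day instead of one row per event.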


You are totally right: if all I wanted to do was figure out how many total rows there were, and needed to do this often, it would make much more sense to store that number in a separate table. Unfortunately, those particular queries were just simple examples illustrating the problem --- the real program actually needed to read and use the data itself. :-)


The OP mentions that the COUNT query is a simplification that has the same performance behavior as the original -- in the actual application, the data inside the rows is needed.

Your idea is a good one for when the count is what's needed, though. Even in more complex cases, maintaining aggregate data ready to go can help a lot.


I have to admit that I'm still a little puzzled here. Why does MySQL need to retrieve the actual rows from disk at all? Since the query is just returning count(*), can't it be answered entirely from the index? Does the index not reflect deletions or something like that?


That's a good question, and one I'm not entirely sure about. My guess is that the index itself is also scattered across the disk, since it gets added to at the same time as the rows get added. So in the real query, where we are reading actual data, and not just counting the number of rows, there are actually two seek costs --- seeking to find the correct part of the index to read, and then seeking to find the row it points to.
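One concrete way to see the "answered entirely from the index" case is SQLite's EXPLAIN QUERY PLAN (SQLite, not MySQL, so this only illustrates the concept --- whether InnoDB did the same in the OP's case is exactly the open question):

```python
# A COUNT whose WHERE clause only touches indexed columns can be
# satisfied by a "covering index" scan: the table rows are never read.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (time TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_time ON events (time)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM events WHERE time >= ? AND time < ?",
    ("2010-05-10", "2010-05-13"),
).fetchall()
print(plan)  # the plan's detail string mentions a COVERING INDEX scan
```

If the engine reports a covering-index scan, the remaining seek cost is in the index pages themselves, which ties into the fragmentation guess above.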


When you insert in sorted order into a b-tree (as you do with timestamps), later inserts will be more likely to be all in one block, and also more likely to be near the top of the tree.

B-trees guarantee that every root-to-leaf path is the same length, but sorted inserts keep splitting pages along the rightmost edge of the tree, scattering those pages across the disk --- and since disk seeks are the designated hitter of the performance team, you're getting hit pretty hard.

So, effectively, even your index is growing fragmented the more you use it.

To verify whether you're actually hitting the data file, though, maybe you could strace mysqld and watch for reads on the index file versus the data file.


Typically, one expects a b-tree lookup to require at most 1 disk seek (for the leaf-nodes), and ideally your indices are fully in memory. So, color me skeptical that fragmentation of the index itself is to blame here.


Inserting in sorted order is not a typical case, and I strongly doubt your "at most 1 disk seek" claim, but my memory is fuzzy so you may be correct.

You're definitely right that the cache should be hot though, so I am now also a skeptical hue.


One of the primary virtues of b-trees is that they make efficient use of disk. Even if the full data structure is too large to fit in memory, because of the large branching factor you can usually fit all the interior nodes in memory. So, in a lookup, only the final access of the leaf node requires a disk access. Of course, traversal may require additional seeks and reads.
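The back-of-the-envelope arithmetic behind that claim (the branching factor and page size here are illustrative assumptions, not measurements):

```python
# Why interior b-tree nodes usually fit in memory: with a branching
# factor of ~1000, a 3-level tree addresses about a billion keys,
# yet has only ~1001 interior nodes.
branching = 1000          # entries per node, e.g. ~16-byte entries in a 16KB page
keys = branching ** 3     # keys addressable by a 3-level tree
interior_nodes = 1 + branching   # root plus its children; everything else is a leaf
node_bytes = 16 * 1024

print(keys)                          # 1000000000
print(interior_nodes * node_bytes)   # roughly 16 MB of interior nodes
```

So caching all interior nodes costs megabytes while the leaves hold gigabytes, which is why a cold lookup should cost about one seek.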


Depending on (1) the storage engine you're using and (2) what other queries you run against this table, you could cluster the table on the 'time' column. That would store the rows in the same order as they appear in the index, and make time-based range queries a lot faster.


Shouldn't matter; this query shouldn't hit the data file. Clustering would just make fragmentation worse. I'd still like to see the OP verify that it's only hitting the index file, though (see my other post).



