
> , which provides the 64-bit number of clocks your core has ticked.

Not quite




I mean... computers are complex these days. I could type up like 3 paragraphs that describe more specifically what is going on, but is that really helpful?

Yeah, pipelines and out-of-order execution make the definition a bit difficult. If you want to ensure that all previous instructions are done executing, you need lfence, and if you want to prevent future instructions from filling in the pipelines you'll need an mfence.

There are many clocks (even within a core). The turbo clock is different from the standard clock. I forget exactly which clock rdtsc uses, but I do know that on some processors, under certain conditions, you'll get weird results.

Different processors may have different interpretations of "clock" (mostly due to turbo and/or sleeping behavior). Etc. etc. I don't recall the details, but these different clock states could vary as much as 2.2GHz to 4GHz on my processor (P1? Turbo? I forget the exact name...)

---------------

But all in all, you get a 64-bit number that describes the number of clock-ticks --- for some "definition" of clock tick that differs between processors... and for some definition of "now" (in the case of out-of-order execution and/or pipelined execution, the "now" is a bit ambiguous, as previous instructions may not have finished executing yet and future instructions may already be executing).
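
For reference, the instruction itself hands back that 64-bit number as two 32-bit halves, high in EDX and low in EAX. A minimal sketch of gluing them together (the helper name `tsc_from_halves` is made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: combine the two 32-bit halves that RDTSC
 * leaves in EDX (high half) and EAX (low half) into one 64-bit count. */
static inline uint64_t tsc_from_halves(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

Compiler intrinsics such as `__rdtsc()` do this step for you, so this only matters if you write the inline assembly by hand.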

If you really want to know, read the processor manual specific to the microarchitecture (since different microarchitectures could change these definitions)


> If you want to ensure that all previous instructions are done executing, you need lfence, and if you want to prevent future instructions from filling in the pipelines you'll need an mfence.

Neither LFENCE nor MFENCE is a serializing instruction. CPUID, however, is documented as a serializing instruction and is the recommended way to serialize, particularly with RDTSC.
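
A rough sketch of what that looks like, assuming GCC or Clang on x86/x86-64. The helper names `cpuid_barrier` and `serialized_rdtsc` are made up for illustration, and the CPUID is issued through hand-written inline asm so the compiler cannot drop it when the outputs go unused:

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc (GCC/Clang, x86 only) */

/* Hypothetical helper: run CPUID purely for its serializing side effect.
 * Leaf 0 is used; the register outputs are discarded. */
static inline void cpuid_barrier(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Hypothetical helper: serialize with CPUID, then read the TSC. */
static inline uint64_t serialized_rdtsc(void)
{
    cpuid_barrier();
    return __rdtsc();
}
```

Keep in mind that CPUID is slow (and can trap to the hypervisor in a VM), so the barrier adds its own overhead to whatever you are measuring.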

> I don't recall the details, but these different clock states could vary as much as 2.2GHz to 4GHz on my processor (P1? Turbo? I forget the exact name...)

Oh heck, it's way more than that. I've measured ~5x difference in clock cycle count for short loops using RDTSC. Supposedly RDTSC returns "nominal" cycles that advance at the same rate relative to the wall clock, but TBH that doesn't smell right. OSes also try to synchronize the absolute values of the various processors, so jumping between CPUs isn't that bad.


"Invariant RDTSC" has been the norm for a long time now (identifiable by a CPUID feature bit) and it doesn't vary with power states or dynamic frequency. Which means it's just a lightweight, high precision timer at this point. In the Pentium 4 era you had a weaker guarantee called "Constant RDTSC" which could stop ticking in certain low power states.
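
If you want to check for it yourself, the feature bit is CPUID.80000007H:EDX[8]. A minimal sketch, assuming GCC or Clang and their `<cpuid.h>` header (the function name `has_invariant_tsc` is made up):

```c
#include <stdbool.h>
#include <cpuid.h>  /* __get_cpuid, provided by GCC and Clang on x86 */

/* Hypothetical helper: true if CPUID reports an invariant TSC. */
static bool has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;

    return ((edx >> 8) & 1u) != 0;  /* CPUID.80000007H:EDX[8] */
}
```

On Linux the same information shows up as the `constant_tsc` and `nonstop_tsc` flags in /proc/cpuinfo.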

Anyway, invariant RDTSC's tick rate is completely separate from the core clock. So the main issue you have to worry about with invariant RDTSC is having your process unscheduled or having ticks "stolen" by interrupts (which includes firmware invisible to the kernel or hypervisor).
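
One common way to cope with that (not the only one, and not something the TSC gives you for free) is to time the same code many times and keep the smallest delta, on the assumption that interrupts and descheduling only ever make a sample larger. A minimal sketch; `min_ticks` is a made-up helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest tick delta among n samples.
 * The caller must pass n > 0. */
static uint64_t min_ticks(const uint64_t *samples, size_t n)
{
    uint64_t best = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}
```

Each sample would be the difference between two TSC reads taken around the code under test. The minimum tells you the best case, so look at the full distribution too if you care about typical latency.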



