
Personally, I prefer to just use the RDTSC assembly instruction (Read Time-Stamp Counter), which provides the 64-bit number of clocks your core has ticked.
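For illustration, here is a minimal sketch of timing a piece of code with RDTSC. It assumes GCC or Clang on x86, where `__rdtsc()` comes from `<x86intrin.h>`; the helper `tsc_ticks_for` and the workload `spin_1000` are hypothetical names made up for this example.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC/Clang targeting x86 */

/* Hypothetical helper: run `work` once and return the TSC delta. */
static uint64_t tsc_ticks_for(void (*work)(void))
{
    uint64_t start = __rdtsc();
    work();
    uint64_t end = __rdtsc();
    return end - start;
}

/* Hypothetical workload: a short loop the compiler cannot remove,
 * because `sink` is volatile. */
static void spin_1000(void)
{
    volatile uint32_t sink = 0;
    for (uint32_t i = 0; i < 1000; i++)
        sink += i;
}
```

Calling `tsc_ticks_for(spin_1000)` returns the number of counter ticks that elapsed around the call. Nothing here stops the CPU from reordering instructions around the two reads, so very short workloads will be noisy.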

Both Windows and Linux provide more robust timers (in particular: if your thread runs for longer than 10ms, there's a chance the scheduler will preempt it to give other threads a shot at the CPU). So if you're timing something longer than 10ms, you probably want to use OS timers instead.
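As a rough sketch of the OS-timer route on Linux, the POSIX call `clock_gettime` with `CLOCK_MONOTONIC` can be wrapped like this (on Windows the usual counterpart is `QueryPerformanceCounter`). The wrapper name `monotonic_ns` is a hypothetical helper, not part of any API.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std= modes */
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic OS time in nanoseconds. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

Take one reading before the code under test and one after, then subtract; the difference is elapsed wall-clock time, including any time the thread spent preempted.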

-------------

There were a few programs where I couldn't add rdtsc easily to the code (in particular: I was trying to test something so fast that rdtsc took up the bulk of the time). In these cases, I went into the BIOS, disabled "turbo" on my CPU, locking my computer to 3.4GHz.

From there, I timed 1 billion events with the Windows timer, then used the fixed clock rate (3.4GHz == 3.4 billion clocks per second) to convert the elapsed time into clock cycles per event.
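One way to read this arithmetic, as a small sketch: multiply the elapsed seconds by the locked clock rate to get total cycles, then divide by the number of events. The function name `cycles_per_event` and the 2.0-second figure in the note below are made up for illustration, not measurements from this thread.

```c
#include <stdint.h>

/* Hypothetical helper: average clock cycles spent per event, assuming
 * the core ran at a fixed `clock_hz` for the whole measurement. */
static double cycles_per_event(double elapsed_seconds, double clock_hz,
                               uint64_t events)
{
    return elapsed_seconds * clock_hz / (double)events;
}
```

For example, if 1 billion events took 2.0 seconds at 3.4GHz, `cycles_per_event(2.0, 3.4e9, 1000000000)` works out to 6.8 cycles per event.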

---------

I don't know the specific methodology that the blog post used, but there are many easy ways to do this task.




There's a bit more to it than turning a flag off in your BIOS.

Intel have a document about it [0]

[0] https://www.intel.com/content/dam/www/public/us/en/documents...


> , which provides the 64-bit number of clocks your core has ticked.

Not quite


I mean... computers are complex these days. I could type up like 3 paragraphs that more specifically describe what is going on, but is that really helpful?

Yeah, pipelines and out-of-order execution make the definition a bit difficult. If you want to ensure that all previous instructions are done executing, you need lfence, and if you want to prevent future instructions from filling in the pipelines you'll need an mfence.

There are many clocks (even within a core). The turbo clock is different from the standard clock. I forget exactly which clock rdtsc uses, but I do know that on some processors, under certain conditions, you'll get weird results.

Different processors may have different interpretations of "clock" (mostly due to turbo and/or sleeping behavior). Etc. etc. I don't recall the details, but these different clock states could vary as much as 2.2GHz to 4GHz on my processor (P1? Turbo? I forget the exact name...)

---------------

But all in all, you get a 64-bit number that describes the number of clock-ticks --- for some "definition" of clock tick that differs between processors... and for some definition of "now" (in the case of out-of-order execution and/or pipelined execution, the "now" is a bit ambiguous, as previous instructions may not have finished executing yet and future instructions may already be executing).

If you really want to know, read the processor manual specific to the microarchitecture (since different microarchitectures could change these definitions).


> If you want to ensure that all previous instructions are done executing, you need lfence, and if you want to prevent future instructions from filling in the pipelines you'll need an mfence.

Neither LFENCE nor MFENCE is a serializing instruction. CPUID, however, is documented as a serializing instruction and is the recommended way to serialize, particularly with RDTSC.
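A minimal sketch of the CPUID-then-RDTSC pattern described here, assuming GCC or Clang inline assembly on x86-64. The function name `rdtsc_after_cpuid` is hypothetical, and this is only one way to write it, not a definitive recipe.

```c
#include <stdint.h>

/* Hypothetical helper: execute CPUID (leaf 0) purely for its
 * serializing effect, then read the time-stamp counter. */
static inline uint64_t rdtsc_after_cpuid(void)
{
    uint32_t lo = 0;  /* EAX = 0 selects CPUID leaf 0 on entry */
    uint32_t hi;
    __asm__ __volatile__("cpuid\n\t"
                         "rdtsc"
                         : "+a"(lo), "=d"(hi)   /* RDTSC returns EDX:EAX */
                         :
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

CPUID is slow, and slower still under a hypervisor that traps it, so its cost ends up inside whatever interval is being measured and has to be accounted for.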

> I don't recall the details, but these different clock states could vary as much as 2.2GHz to 4GHz on my processor (P1? Turbo? I forget the exact name...)

Oh heck, it's way more than that. I've measured ~5x difference in clock cycle count for short loops using RDTSC. Supposedly RDTSC returns "nominal" cycles that advance at the same rate relative to the wall clock, but TBH that doesn't smell right. OSes also try to synchronize the absolute values of the various processors, so jumping between CPUs isn't that bad.


"Invariant TSC" has been the norm for a long time now (identifiable by a CPUID feature bit), and it doesn't vary with power states or dynamic frequency, which means RDTSC is just a lightweight, high-precision timer at this point. In the Pentium 4 era you had a weaker guarantee called "Constant TSC", which could stop ticking in certain low-power states.

Anyway, the invariant TSC's tick rate is completely separate from the core clock. So the main issue you have to worry about with an invariant TSC is having your process descheduled or having ticks "stolen" by interrupts (which includes firmware interrupts invisible to the kernel or hypervisor).
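For reference, the feature bit in question is bit 8 of EDX in CPUID leaf 0x80000007. A small sketch of checking it, assuming GCC or Clang's `<cpuid.h>` and its `__get_cpuid` helper; the function name `has_invariant_tsc` is hypothetical.

```c
#include <cpuid.h>    /* __get_cpuid(); assumes GCC/Clang targeting x86 */
#include <stdbool.h>

/* Hypothetical helper: true if the CPU reports an invariant TSC
 * (CPUID leaf 0x80000007, EDX bit 8). */
static bool has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;
    return ((edx >> 8) & 1u) != 0;
}
```

Virtual machines may mask this bit, so a `false` result inside a guest says more about the hypervisor configuration than about the hardware.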



