Making an Accurate Sleep() Function (blat-blatnik.github.io)
61 points by joemanaco on June 24, 2022 | 51 comments


This seems like an anti-pattern.

I usually tell people that sleeps belong in only one place in their code, and they should make linters complain if they pop up anywhere else. That place is the method they call to poll. And even then it's better to wait for something than to poll to see if we're ready.

This, though, seems to be trying to sleep to slow down frame rate. What the? Maybe it's intended as some way to make all users' experiences suck equally even if they have fast boxes. Maybe. And depending on the actual scenario maybe it makes sense to handle this with a "better" sleep.

The problem as stated is not a solved problem because it's not generally anything anyone would want to do.

In fact, people might have left this intentionally hard to do to discourage its use.


Depending on the graphics api, something needs to sleep somewhere to get decent latency. E.g. in vulkan, if the present mode is fifo, then rendering frames as fast as possible will fill the swapchain with stale frames that won't be presented for a while. It can be a lot better to sleep and render the frame right before it can be presented. But if you sleep a little too long, you will miss the frame, so you need a better sleep function like this.


The problem with all the modern graphics APIs isn't sleep()--the graphics card generally can be filled to 100%, and you kick off another render when the swapchain releases a frame.

The problem is knowing how long the actual render is going to take. Scenes can vary vastly in complexity and predicting how long they will take to render a priori is extremely difficult.


Not an expert in graphics rendering, but could you estimate the render time by calculating (a priori) the number of draw calls needed to render the frame?


Unfortunately no - the cost will depend on the number of visible pixels, and worse, there are secondary costs depending on the arrangement of those visible pixels on screen, what part of the texture happens to be there, whether any sparse textures need to get paged in from system memory, whether GPU occlusion culling is going to cull certain things it wouldn't normally cull, etc. And other things are competing for GPU time, like your OS compositor.


That's what vsync is for.

And despite what some people and games seem to think, vsync isn't an optional feature that you can turn on or off, it's something that must always be enabled since it's essential to display correct graphics.


Sadly VSync is not all upside and it increases input latency significantly, so it might be a deal breaker depending on what you play or just the person playing. Despite their age GSync and FreeSync are not that ubiquitous yet.

I agree that for most single player games it is simply correct to enable it, but people often don't do it, because they paid lots of money for their gaming computers and want to see the FPS numbers go way up to justify their expense.


Assuming a fixed-refresh-rate monitor, and a separate faster-than-framerate game update loop, the ideal way to render is to sleep until you just barely have enough time, then start rendering, finishing just in time for the card to read out the data. But obviously that is hard, since render time is not the same for every frame. [1] If we could pull this off, it would obviously have latency as low as possible given the hardware in question, with the frame actually output always containing the latest possible data without screen tearing. (If you allow for screen tearing by swapping buffers mid scan-out, then you can get even less input latency for the lower parts of the screen with the render-as-fast-as-possible techniques.)

The continuously render new frames as fast as possible approach (with triple buffering) is somewhat of a crude approximation of this. It works, and is potentially better than vsync, but is not ideal. It can give impressive framerate numbers though.

Variable refresh rate (gsync, freesync) can avoid this and is best, at least as long as you can't render frames faster than the maximum refresh rate of the monitor. If you can render frames faster, then all these considerations apply again.

Footnotes: [1] I seem to recall there was some technology that tried to simulate this (without the sleep) by turning render calls for "frames" that come too early into no-ops, and only once we are close to when the frame actually needs to start being drawn do the draw calls do anything. This obviously yielded absurd framerate numbers.


A solution like this isn't portable and will fall over as soon as you change any piece of hardware in the system.

Separating responsibilities creates systems that are more robust.


I was never suggesting that starting a render run at the last possible second to ensure the frame renders in time to be scanned out was actually a viable approach. Only that it was theoretically optimal.

But hardware changeout would not meaningfully affect people attempting such an approach, because it would obviously be based on actually timing how long a render pass took, adding a small buffer margin, and using that as the timing.

Which does mostly work, except when there is a rapid change in render time: switching on a giant particle effect, changing the camera from pointing at the ground to pointing out at the open world, getting close enough to a complex object to swap from a lower LOD mesh to a higher one, etc. These would likely cause meaningful lag spikes as the system tries to adapt.


The situation I'm describing is with vsync enabled. This problem is not solved by vsync and mostly goes away if you turn vsync off, since then it acts more like mailbox present mode at the expense of 100% utilization. Vsync solves tearing.


You could argue that graphics code should just activate VSYNC and run as fast as possible otherwise. The article hints that that doesn't always work in practice, but even if we pretend that it does that still leaves physics engines and other simulation code.

That kind of code usually works best with a fixed "framerate"/tickrate, but for performance reasons that rate is set at something like 20 ticks per second. If your framerate is a multiple of that and graphics and physics run on the same thread, that's trivial (e.g. do physics every third frame for 60fps), but what if you move it to a different thread and some of your users have 75hz monitors? Now you have to sleep, or waste resources.
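A minimal sketch of the usual decoupling, assuming hypothetical update_physics() and render() functions: accumulate real elapsed time and run however many fixed ticks have accrued, independent of the display's refresh rate.

  #include <chrono>

  void update_physics(double dt);  // hypothetical fixed-step simulation update
  void render();                   // hypothetical draw, paced by vsync or a frame cap

  void game_loop()
  {
      using clock = std::chrono::steady_clock;
      const std::chrono::duration<double> tick(1.0 / 20.0);  // fixed 20 Hz physics
      auto previous = clock::now();
      std::chrono::duration<double> accumulator(0.0);

      for (;;) {
          auto now = clock::now();
          accumulator += now - previous;
          previous = now;

          // Run however many fixed ticks have accrued; this works unchanged
          // whether the render loop runs at 60, 75, or 144 fps.
          while (accumulator >= tick) {
              update_physics(tick.count());
              accumulator -= tick;
          }
          render();
      }
  }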


It's for physics. Writing a [stable] fixed time step simulation is significantly simpler than variable time step.

Tons of games do it (console ports are guilty of fixed 30FPS), and it really sucks.


In most cases there's no reason to couple physics to graphics.

The game "is" the physics engine (esp. in cases of networked games); while the graphics engine is simply an "observer" of the output of the physics engine, trying to interpolate/extrapolate graphics frames given a stream of physics frames.

Do physics at 30FPS in one thread; draw graphics as fast as you can in a separate thread; and at the beginning of each render frame, get the latest physics integration-state, and how much time it's been since it was calculated, and extrapolate object displacements+rotations (incl. the camera) linearly from existing velocities. Those changes will be so small, and corrected by the next physics-step so soon, that it'll almost always look correct.
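A minimal sketch of that extrapolation step, assuming a hypothetical Snapshot struct published by the physics thread:

  #include <chrono>

  // Hypothetical snapshot published by the 30FPS physics thread.
  struct Snapshot {
      float position[3];
      float velocity[3];
      std::chrono::steady_clock::time_point computed_at;  // when the step finished
  };

  // Render-thread side: advance the last known state linearly by the time
  // elapsed since it was computed. The error is tiny and gets corrected by
  // the very next physics step.
  void extrapolate_position(const Snapshot& s, float out[3])
  {
      const float dt = std::chrono::duration<float>(
          std::chrono::steady_clock::now() - s.computed_at).count();
      for (int i = 0; i < 3; ++i)
          out[i] = s.position[i] + s.velocity[i] * dt;
  }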

The only reason to not do this, is if you can't afford concurrent execution (i.e. you only have a single CPU core and interrupts are expensive), and you also don't have a practical Lowest Common Multiple frequency to run an event-loop at. Which is something that was maybe true in the gameboy era, but hasn't been true for at least 30 years.


> Do physics at 30FPS in one thread; draw graphics as fast as you can in a separate thread

This sounds like a race condition nightmare. How would one guarantee synchronization? One big mutex for the entire physics state? Or don't bother and read whatever values you get in the render thread?

I suppose in a single-threaded context you would automatically get something like a global mutex, just without the mutex, thanks to the leap-frogging nature (frame, frame, frame, tick, frame, frame, frame, tick, …). At least with a fixed timestep.


Just have 2 snapshots of all of the positions of all elements in the game (which is trivial in terms of storage requirements), just the same way you have it with double-buffering for the pixels on the screen.
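One simple way to realize this, sketched below with a hypothetical WorldState: the copy the renderer takes out plays the role of the second snapshot, and the lock guards only that brief copy (a lock-free double/triple buffer removes even the lock).

  #include <mutex>

  // Hypothetical bag of positions/orientations for every object in the scene.
  struct WorldState { /* positions, orientations, velocities, ... */ };

  class SnapshotExchange {
  public:
      void publish(const WorldState& fresh) {   // physics thread, once per tick
          std::lock_guard<std::mutex> lock(mutex_);
          current_ = fresh;
      }
      WorldState latest() const {               // render thread, once per frame
          std::lock_guard<std::mutex> lock(mutex_);
          return current_;                      // the returned copy is the "second buffer"
      }
  private:
      mutable std::mutex mutex_;
      WorldState current_{};
  };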


There are good reasons to do things at certain times (like polling for input or rendering) or to slow down the render loop (an adjustable frame rate cap, not necessarily the same as the refresh rate), and that's really the end of that.

Thinking this might be "intentionally hard" rather than "non-realtime operating systems are just sloppy" is silly.


> This, though, seems to be trying to sleep to slow down frame rate. What the? Maybe it's intended as some way to make all users' experiences suck equally even if they have fast boxes. Maybe.

Or to conserve power, why burn CPU and render at hundreds of fps when the output is limited to 60 or even 30?

I don’t need my machine loaded at 100% when playing Slay the Spire or Into the Breach.


Which is why you use vsync, ensuring the game renders once for each frame your monitor shows, as opposed to having the game just guess when it should present frames - which can produce tearing or be at the wrong frame rate entirely.


What about when you want to render at half the vsync rate to conserve battery?


That seems easy, just keep the frame around in a buffer and send it twice?


I've used this exact pattern before when designing a homegrown kiosk display renderer with an immediate-mode GUI, and I think it is pretty common in games as well (this is how Raylib's renderer works). You could push frames as fast as possible, but you will chew up the CPU doing inefficient work and overload the hardware needlessly. If you have something like a physics engine you may also want to design a physics loop that runs every X milliseconds (again, if you let it run as fast as possible, the game world will be in super speed; there have been reports of bad developers who couple their render loop with the physics loop, and once the framerate becomes uncapped, the game becomes unplayable).


> again, if you let it run as fast as possible, the game world will be in super speed

Not really, you can calculate how much time has passed since the last physics recalculation and use that as a delta between frames.
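A minimal sketch of that delta-time scaling, with placeholder position/velocity state:

  #include <chrono>

  void update_loop()
  {
      auto last = std::chrono::steady_clock::now();
      float position = 0.0f;
      const float velocity = 3.0f;  // placeholder: units per second
      for (;;) {
          auto now = std::chrono::steady_clock::now();
          const float dt = std::chrono::duration<float>(now - last).count();
          last = now;
          position += velocity * dt;  // advances by real time, not by frame count,
                                      // so the world runs at the same speed at any fps
          // ... render(position), check for exit, etc.
      }
  }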


Better than that is a vsync event to wait on.


What if you're developing an authoritative game server? There will be no vsync event.


Unrelated, but a "sleep story" nonetheless. Several years ago I worked on a desktop Qt app, and I spent several weeks juicing platform flags, caching stuff on save to start faster, slimming down the mainwindow classes, everything I could to get the app to boot in under 500ms, down from 5+s, only to be told by my boss that it was too fast and he wanted to see the splash screen. So after all that, to this day, there is a `sleep(rand(x, y))` call somewhere in that codebase.


When I was an intern at ON Semi, I worked on a testbed that was designed to stress power mosfets with extremely time-accurate pulses of current. The prototype was using an Arduino microcontroller, and I noticed that when I passed small values to delayMicroseconds(), it was just not accurate at all. After a lot of digging, it turned out that they used a small inline assembly loop to count clock cycles, but they had miscounted the number of clock cycles required by the setup. When you're talking about 2-3us with an 8MHz clock, even half a dozen clock cycles will make a big difference!
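For scale: at 8MHz one cycle is 125ns, so half a dozen miscounted setup cycles is already ~0.75us of error on a 2-3us pulse. A hedged sketch of the kind of cycle-counted loop involved (AVR inline assembly; not the actual Arduino implementation):

  #include <stdint.h>

  // Cycle-counted busy-wait for an 8 MHz AVR. Each sbiw+brne iteration takes
  // 4 cycles (0.5 us at 8 MHz), so 2 iterations per microsecond. The call
  // overhead, the multiply below, and the final not-taken branch are the
  // "setup" cycles that are easy to miscount.
  static inline void delay_us_8mhz(uint16_t us)
  {
      uint16_t loops = us * 2;
      __asm__ __volatile__(
          "1: sbiw %0, 1 \n\t"   // 2 cycles
          "   brne 1b    \n\t"   // 2 cycles while looping
          : "=w"(loops)
          : "0"(loops));
  }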


Arduino is notorious for promoting this sort of synchronous, main-loop only code. It is a horrible model for embedded devices unless they are severely resource constrained.


It's not uncommon to be resource constrained on embedded devices. I actually worked on something quite similar to what kayson described, helping create a device to synchronize a camera with an LED that was pulsed with a constant current in order to generate photon transfer curves. The pulses had to be as accurate as possible, and indeed doing the inline assembly loop with NOP instructions was the way to go: simple, deterministic, and the possibility to have pulses down to a single clock cycle. I believe PWM wasn't possible because it couldn't produce pulses that short while still supporting the required minimum interval between pulses.

But I do agree that it would be much more helpful if tutorials for Arduino would promote using timers and interrupts.


If the alternative you propose is RTOS/scheduler then the synchronous main-loop approach for embedded is, by far, more common than the alternative.


It's rarely a good idea to implement spin locks as an empty while() loop. Processors have a special instruction for the spinning: the x86 PAUSE instruction, exposed as the _mm_pause() intrinsic.
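A minimal sketch of a spin-wait using the intrinsic, assuming an std::atomic<bool> flag as the thing being waited on:

  #include <immintrin.h>   // _mm_pause(): emits the x86 PAUSE instruction
  #include <atomic>

  // PAUSE tells the core this is a busy-wait, which saves power, plays nicer
  // with the sibling hyperthread, and avoids a pipeline flush when the wait ends.
  inline void spin_until(const std::atomic<bool>& ready)
  {
      while (!ready.load(std::memory_order_acquire))
          _mm_pause();
  }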


On Windows you can get 1ms accuracy out of the built-in Sleep() function by calling timeBeginPeriod [1]. It sets the resolution of timers in that process to the specified ms value. I've used this + spinning to make a high-res sleep before, and it works well.

1: https://docs.microsoft.com/en-us/windows/win32/api/timeapi/n...
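A hedged sketch of that combination: request 1ms scheduler resolution, Sleep() until roughly 2ms before the target, then spin out the remainder against QueryPerformanceCounter.

  #include <windows.h>
  #include <timeapi.h>                 // timeBeginPeriod / timeEndPeriod
  #pragma comment(lib, "winmm.lib")

  void hybrid_sleep(double seconds)
  {
      LARGE_INTEGER freq, start, now;
      QueryPerformanceFrequency(&freq);
      QueryPerformanceCounter(&start);
      const LONGLONG target = start.QuadPart + (LONGLONG)(seconds * freq.QuadPart);

      timeBeginPeriod(1);  // 1 ms timer resolution (refcounted, system-wide)
      for (;;) {
          QueryPerformanceCounter(&now);
          const double remaining =
              double(target - now.QuadPart) / double(freq.QuadPart);
          if (remaining <= 0.0)
              break;
          if (remaining > 0.002)
              Sleep((DWORD)((remaining - 0.002) * 1000.0));  // coarse part
          else
              YieldProcessor();                              // spin the last ~2 ms
      }
      timeEndPeriod(1);
  }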


About 20 years ago, I discovered that using WinSock's "select()" on Win2K can give you microsecond sleep resolution with reasonable (100us or so) precision. It was still that way on XP; I have no idea if it stayed that way for Vista/7/8/8.1/10/11.
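If memory serves, the trick looked roughly like this (hedged reconstruction): WinSock's select() takes a timeval with microsecond fields, so waiting on a dummy UDP socket that never becomes readable turns it into a sub-millisecond sleep. Assumes WSAStartup() has already been called.

  #include <winsock2.h>
  #pragma comment(lib, "ws2_32.lib")

  void winsock_usleep(long microseconds)
  {
      SOCKET dummy = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
      if (dummy == INVALID_SOCKET)
          return;

      fd_set readfds;
      FD_ZERO(&readfds);
      FD_SET(dummy, &readfds);               // never becomes readable; the timeout fires

      timeval tv;
      tv.tv_sec  = microseconds / 1000000;
      tv.tv_usec = microseconds % 1000000;

      select(0, &readfds, NULL, NULL, &tv);  // first argument is ignored by WinSock
      closesocket(dummy);
  }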


Not 100% sure the article doesn't cover this, but if you want a constant rate of wakeups over time (rather than a precise duration for each sleep call), it's important to schedule your next sleep one period after your most recent intended wakeup time (in absolute time), not one period after the call to the sleep function (in relative time). Otherwise the time spent between sleep() returning and the next sleep() call accumulates with every iteration, and you fall behind in real time.
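A minimal sketch of the absolute-deadline version, with a hypothetical do_tick() callback:

  #include <chrono>
  #include <thread>

  void run_at_fixed_rate(std::chrono::nanoseconds period, void (*do_tick)())
  {
      auto next = std::chrono::steady_clock::now() + period;  // first deadline
      for (;;) {
          std::this_thread::sleep_until(next);  // absolute deadline, not "now + period"
          do_tick();
          next += period;  // schedule from the previous deadline, so the time
                           // spent in do_tick() and any sleep overshoot
                           // never accumulate into the schedule
      }
  }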


It's probably for the sake of an example, but writing a game/render loop with Sleep is a bad idea, perhaps even on a realtime OS. What you should be doing is to render the frame that's meant for the current time. This way slowdowns, hiccups, and boosts don't alter the speed of animations, and you can regulate how frequently you render regardless of the timer precision.


Sleep and functions like it should be [almost] banned...

They choose a time at which computation happens. But computation should instead be done as soon as possible, and the output delayed until the necessary time.

For example, in an animation, you don't need to wake up the CPU every 16ms to compute the next frame. You should instead just compute all the frames and send it off to the GPU to display in order and with the right timings.


And then every time the user interacts with the system you would have to trash the GPU queue and recompute everything. If you could compute everything beforehand, it would be far simpler to just encode the entire scene as a video.

For example, consider a game server that must handle inputs from multiple clients. It can't just compute everything and queue it off to the network, because a user input may change the entire game world. A "computation" loop provides a consistent model in which all the game interaction happens.


> They choose a time that computation happens. But computation should instead be done as soon as possible, and the output delayed to the necessary time.

This is fundamentally not a solution for any problem where you're reading input from outside. If that animation is reacting to user input, rendering the frames 100ms ahead will create 100ms or more of "input lag" that the user can see and feel and they hate it.


Well you only do the computation when you have the necessary inputs available, and in the case of something interactive that means waiting till you have the necessary keypresses (or lack of keypresses).

You can still compute ahead if you have a good guess what inputs you'll receive (eg. You predict that no keys will be pressed), but you have to be happy to throw away the results if your prediction was wrong.


Sleeping a thread doesn't "choose" when something happens, the programmer does. Sleep functions are just abstractions over hardware timer interrupts. There's some inherent complexity there, and abstracting it away is naturally going to impose some limits or compromises. Windows' "sleep" function seems to be especially limited, but being able to precisely schedule a CPU thread to a hardware timer is important for solving a lot of interesting problems. As you push a program to its limits, of course you'll need to carefully choose the abstractions that most optimally achieve your requirements given the available hardware. I agree that bare sleep style functions rarely make sense. Blocking select-style calls with a timeout often do, though, which are largely equivalent to sleep.

The whole point of a CPU is to be general purpose. A CPU isn't optimal for anything. The magic of a computer is that it makes "anything" possible. Why would you want to ban sleep? If somebody makes a crappy arcade game with terrible latency and jitter, good for them. Someone not yet familiar with the nuances of "sleep" will learn a lot faster building some crappy but functional programs than they will trying to learn a modern graphics API. A seasoned expert will readily barf out a crappy program with some sleep calls in it when they know it's good enough to solve the problem at hand.


Hmm, what’s wrong with:

  sleep until ready to blit next frame

  blit frame

  render next frame
  repeat


Rendering can take a variable amount of time, so you instead want to wait on a hardware event like vsync instead of a timer.


Right, that was what the sleep was supposed to cover. Sleep until the frame is needed.


Depending on your goals, using a deadline sleep (sleep_until in C++ parlance: https://en.cppreference.com/w/cpp/thread/sleep_until) rather than a duration sleep (sleep_for) is beneficial.


well, sleep_until(…) will also drift based on the vagaries of os scheduling etc. it would be (pleasantly) surprising if it did not exhibit approximately the same behaviour as the author’s first implementation.


This site looks horrible for me due to some font issues?

See https://i.imgur.com/dAHmjSc.png


Linux and MinGW can just use nanosleep().

As for Windows, use CreateWaitableTimerEx, with a CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag.
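A hedged sketch of the Windows side; CREATE_WAITABLE_TIMER_HIGH_RESOLUTION needs Windows 10 1803+ and a recent SDK, so a real implementation would fall back to a plain waitable timer (or Sleep) when creation fails.

  #include <windows.h>
  #include <cstdint>

  bool high_res_sleep(int64_t microseconds)
  {
      HANDLE timer = CreateWaitableTimerExW(
          nullptr, nullptr,
          CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
          TIMER_ALL_ACCESS);
      if (!timer)
          return false;  // flag unsupported: caller should fall back

      LARGE_INTEGER due;
      due.QuadPart = -(microseconds * 10);  // negative = relative time, 100 ns units
      SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE);
      WaitForSingleObject(timer, INFINITE);
      CloseHandle(timer);
      return true;
  }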


Windows can also use timeBeginPeriod() to set the timer resolution (sadly system-wide), so if one needs a 1ms timer period that is achievable. But you do have to set it; otherwise you have to depend on the variable tick timer to give you what you want. Which, don't get me wrong, can be good, but if you need more guaranteed timer ticks... it may not be the way.

https://docs.microsoft.com/en-us/windows/win32/api/timeapi/n...


You could also drive this using an audio queue which uses hardware to generate the timer events.


Unfortunately history has shown that you can't necessarily lean on the audio hardware to give you a stable, reliable clock. Part of the reason Windows moved to doing all its audio mixing in software (a long time ago at this point) was that hardware audio devices and/or drivers tended to be really bad at providing reliable clocks. See this random example from google results:

https://duc.avid.com/showthread.php?t=334589

Winamp had a ton of special purpose code in it just to compensate for bad clocking and timing in consumer sound cards so that you could play back audio at proper rates without gaps or glitches.

Mobile seems to be no better - I remember getting fed up playing rhythm games on a Motorola phone I owned because, I kid you not, the hardware audio clock would occasionally run backwards.


Looks like a good way to drift from the clock and eventually skip frames.




