Doubling Mono’s Float Speed (tirania.org)
108 points by ot on April 11, 2018 | 51 comments


Wow, I hadn’t realized that because the coordinates are sent to the GPU as floats, triangles start voxelizing when you get further away from the origin.

Seems like it could make an awesome shader-free animation if you translated the entire world’s position by a ridiculously large (and increasing) float.


> triangles start voxelizing when you get further away from the origin

They can in a few edge cases, but in most cases they don’t.

Simplifying many things: to render a model, a game engine uploads two things to the GPU:

1. The model’s vertex buffer + index buffer. The vertices are in the mesh’s own coordinate system, and most 3D designers don’t design their meshes placed 100 km away from the origin.

2. A single 4x4 matrix containing the ( world * view * projection ) transform. World transforms the model from the local to the world coordinate system, view from world space to camera space, and projection from camera space to 2D screen coordinates + depth.

If you’re 200 km from the origin and are looking at a model near the camera, the world transform will contain large values because the model is very far from the origin, and the view transform will also contain large values because the camera is also very far from the origin, but multiplied together they won’t have very large values, because the model is near the camera.

And if your model is very far from the camera, so that the ( world * view * projection ) transform contains huge values, you won’t notice the precision degradation, because the whole model will occupy a single pixel at most.
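
To make that concrete, here's a rough sketch using System.Numerics (my own illustration, not code from the article or any particular engine). Both the world and the view matrices carry ~200,000-unit translations, but their product only carries the small model-to-camera offset:

    using System;
    using System.Numerics;

    // Model sits 200 000 units from the origin; the camera is 10 units in front of it.
    var world = Matrix4x4.CreateTranslation(200_000f, 0f, 0f);
    var view = Matrix4x4.CreateLookAt(
        cameraPosition: new Vector3(200_010f, 0f, 0f),
        cameraTarget: new Vector3(200_000f, 0f, 0f),
        cameraUpVector: Vector3.UnitY);

    // Row-vector convention: local -> world -> camera space.
    var worldView = world * view;

    Console.WriteLine(world.Translation);     // <200000, 0, 0>  -- large
    Console.WriteLine(view.Translation);      // <0, 0, -200010> -- large
    Console.WriteLine(worldView.Translation); // <0, 0, -10>     -- small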


> They can in a few edge cases, but in most cases they don’t. [...] the world transform will contain large values because the model is very far from the origin, and the view transform will also contain large values because the camera is also very far from the origin, but multiplied together they won’t have very large values, because the model is near the camera.

This is a good point; this problem is more prone to show up in a ray tracer than a rasterizer, since rasterizers have to apply the camera transform to the geometry, and ray tracers don't.

It's pretty easy to see this problem while using Maya, though. Z-buffer resolution in the editor drops off as you move away from the origin.

We might see this issue crop up with increasing frequency as more and more people use GPUs for ray tracing...


And the problem is old and well known e.g. by game developers. For example "A Real-Time Procedural Universe, Part Three: Matters of Scale" by Sean O'Neil from 2002 [1] discusses rendering problems at planetary, star-system and even galaxy scale, including Z-buffer precision and various options regarding 32-bit float vs. 64-bit double vs. 128-bit fixed-point integer + float offsets (for vertex coordinates of a mesh) etc.

[1] https://www.gamasutra.com/view/feature/131393/a_realtime_pro...


I thought that subtracting two large numbers to get a small one is one of the things that gives you precision issues in floats?


You can use 64-bit floats for the intermediate matrices. A matrix is just 16 numbers, so it's quite fast to compute even with doubles.

Also, even if you do nothing and ignore the precision issues, numerical errors made while calculating the WVP matrix won’t cause this kind of voxelization of models. A model can be slightly misplaced, maybe jittery between frames, but its shape will stay fine.
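
A sketch of that approach (the names here are just illustrative, not from any particular engine): do the matrix products in doubles and only downcast once, to the float[16] that actually gets uploaded.

    static class MatrixMath
    {
        // Multiply two 4x4 matrices in double precision.
        public static double[,] Multiply(double[,] a, double[,] b)
        {
            var r = new double[4, 4];
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    for (int k = 0; k < 4; k++)
                        r[i, j] += a[i, k] * b[k, j];
            return r;
        }

        // Downcast once, right before uploading the WVP constant to the GPU.
        public static float[] ToFloats(double[,] m)
        {
            var f = new float[16];
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    f[i * 4 + j] = (float)m[i, j];
            return f;
        }
    }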


Thanks for the explanation!


> And if your model is very far from the camera, so that the ( world * view * projection ) transform contains huge values, you won’t notice the precision degradation, because the whole model will occupy a single pixel at most.

Unless the field of view is sufficiently narrow?


> Unless the field of view is sufficiently narrow?

Good catch.

If the projection matrix is extremely non-uniform (e.g. an orthographic one with a very large Z size and very small XY size, or a perspective one with an FOV angle very close to zero), the described problem can still be encountered.

But I did mention “a few edge cases”; this is one of them.
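
For what it's worth, you can see how extreme such a projection gets with System.Numerics (just my own illustration of the numbers involved): an FOV of 0.001 radians blows the XY scale factors up to roughly a thousand times those of a typical 60-degree FOV matrix.

    using System;
    using System.Numerics;

    // FOV of 0.001 radians (~0.06 degrees); a typical game FOV is around 1 radian.
    var proj = Matrix4x4.CreatePerspectiveFieldOfView(
        fieldOfView: 0.001f, aspectRatio: 16f / 9f,
        nearPlaneDistance: 0.1f, farPlaneDistance: 10_000f);

    Console.WriteLine(proj.M11); // ~1125 (vs ~0.97 for a 60-degree FOV)
    Console.WriteLine(proj.M22); // ~2000 (vs ~1.73 for a 60-degree FOV)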


Note it's only voxely if you translate in all 3 dimensions. If you were far away in just x, but not y & z, you'd get a weird looking image that's slabby in x but detailed in y & z.

It's cool that the pbrt renders hold up in voxel form, like the model's still solid and the shadows don't freak out or anything.

The problem is fairly well known in film & games production. Artists, especially world designers, all know to model things near the origin and not far away because precision drops as you move away. They will also sometimes avoid modeling small things in small units like millimeters even though they might prefer it, because the units dictate how big your floats get, which in turn determines how fast you lose precision.

Here's the voxel prediction chart: https://en.m.wikipedia.org/wiki/IEEE_754#/media/File%3AIEEE7...


> They will also sometimes avoid modeling small things in small units like millimeters even though they might prefer it

Scaling a model from meters to mm only changes the precision in the model’s units; the precision in real millimeters stays the same, so the units don’t matter.

I think they avoid millimeters because down the art pipeline people who’ll use the models expect meters, e.g. to place the stuff in a metric world without extra scale involved.


You're right that normally a more important consideration for production is consistent units.

The units do matter though, since the absolute float value is what determines your precision.

The author of this article translated by 200k units, so the precision loss is relative in this case. But artists might need to translate something 100 meters, so their precision loss depends completely on their choice of units.


> their precision loss depends completely on their choice of units.

If an artist needs to translate a model 100 meters off center and the model is in meters, the graph you’ve linked says the precision will be 10^(-5) of the model’s units, which is 0.01 mm. If the same model is in mm, the graph says the precision at 100*1000 = 100,000 will be 10^(-2) of the model’s units, which is the same 0.01 mm.

As you can see, it's independent of the choice of units.
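
A quick way to sanity-check that in C# (my own check, not from the linked chart; MathF.BitIncrement needs .NET Core 3.0+):

    using System;

    float metres = 100f;           // 100 m off center, model in meters
    float millimetres = 100_000f;  // the same distance, model in millimeters

    float ulpMetres = MathF.BitIncrement(metres) - metres;            // ~7.6e-6 m
    float ulpMillis = MathF.BitIncrement(millimetres) - millimetres;  // ~7.8e-3 mm

    Console.WriteLine(ulpMetres * 1000f); // ~0.0076 mm
    Console.WriteLine(ulpMillis);         // ~0.0078 mm -- effectively the same physical precision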


How big can my world be in meters if I require at least 0.01mm precision and I model things in millimeters? Does that number change if I model in meters?


Because of how floating point works, it doesn't matter. The difference between modelling in meters or millimeters disappears in the exponents. What matters for precision is the mantissa, which is 23 bits in single precision floats. So you get ~16 million of your smallest unit before you're losing precision, counting the implied leading 1. You can double that to ~32 million by using the sign bit.

Edit: this translates to 320m if you want 0.01mm precision.
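
A one-line check of that ~16 million (2^24) figure in C#, though the same holds for any IEEE 754 single:

    using System;

    float f = 16_777_216f;           // 2^24: the 24 mantissa bits (23 stored + implied 1) are used up
    Console.WriteLine(f + 1f == f);  // True  -- above 2^24 the spacing between floats is 2, so the +1 is lost
    Console.WriteLine(f - 1f == f);  // False -- just below 2^24 the spacing is still 1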


About 0.008 km^3, each coordinate between -100 and +100 meters.

The number's roughly the same even if you model in km or imperial units.


Yeah, you're right. Brain fart. I'm mis-remembering production stories instead of thinking clearly about floats, despite having just linked the precision chart. :P


There was a 3D engine a while back, focused on collision and pathing, which used 32-bit fixed-point values for everything, citing this problem. If you divide your game world size by the resolution a 32-bit number offers, it ends up being very practical. However, I don't know how overflow was handled for intermediates: whether there was "enough" buffer around the outside of the world or the math just worked out.

Certainly 64-bit fixed point would be "sufficient".
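
A minimal sketch of what such a coordinate type could look like (hypothetical, not the engine described above), assuming 10 fractional bits: roughly 1 mm resolution over a ±2,000 km world if one unit is a metre, and the resolution is the same everywhere.

    using System;

    var a = Fixed32.FromDouble(1_999_999.999); // near the edge of the ±2,097,152-unit range
    var b = Fixed32.FromDouble(0.001);         // ~1 mm
    Console.WriteLine((a + b).ToDouble());     // 2000000 -- the ~1 mm step is still resolved this far out

    // Hypothetical 32-bit fixed-point coordinate: 1 sign bit, 21 integer bits, 10 fractional bits.
    readonly struct Fixed32
    {
        private const int FracBits = 10;
        private readonly int raw;              // stored as value * 2^10

        private Fixed32(int raw) => this.raw = raw;

        public static Fixed32 FromDouble(double v) =>
            new Fixed32(checked((int)Math.Round(v * (1 << FracBits))));

        public double ToDouble() => raw / (double)(1 << FracBits);

        // Unlike float addition, this never loses resolution far from the origin,
        // but it can overflow; intermediates may need widening to 64-bit (long).
        public static Fixed32 operator +(Fixed32 a, Fixed32 b) =>
            new Fixed32(checked(a.raw + b.raw));
    }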


Let me say that this topic made me realize how crazy some of the problems in programming are.

I can't even imagine programming at FB and Amazon scale.


Numerical accuracy of floats and splitting up data into chunks that don't depend on each other are almost as different as any two problems can get.


I imagine someone used a float as an index, once.


I've dealt with systems using floats as IDs in their DB. Better than when the monetary values were floats.


Twitter used to use JavaScript numbers to represent various ids in their JSON API. These are, per the JavaScript spec, double-precision (64-bit) floating point numbers. You can only really use these to represent 53-bit integers, so they had to add string versions.

(see https://developer.twitter.com/en/docs/basics/twitter-ids)
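
The 53-bit cliff is easy to demonstrate; C# here, but JavaScript numbers behave the same way since both are IEEE 754 doubles:

    using System;

    double maxSafe = 9_007_199_254_740_992d;    // 2^53
    Console.WriteLine(maxSafe + 1d == maxSafe); // True -- ids above 2^53 silently collide as doubles
    Console.WriteLine((long)maxSafe + 1);       // 9007199254740993 -- fine in a real 64-bit integer type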


Actually, in the Cg shader language you can index arrays with floats to get automatic linear interpolation, but I digress...


"Keanu wonders: is Minecraft chunky purely because everything’s rendered really far from the origin?"

Actually, Minecraft used to have some interesting float-precision-induced artifacts once you got far from the origin:

https://minecraft.gamepedia.com/Far_Lands


Minecraft's Far Lands aren't actually related to floating point precision, although movement near the Far Lands is affected.

My understanding of what causes the Far Lands isn't that great, but I think it's caused by one of six or eight shorts overflowing in the generation algorithm.

Edit: It seems I needed to read more of the Minecraft Wiki link for the Farlands. https://minecraft.gamepedia.com/Far_Lands#Cause covers it much better than I could.


I've run into a similar issue writing a ray tracer in Swift! We are even both using Peter Shirley's "Ray Tracing in One Weekend" as a reference! I'm still struggling to this day to get the performance near the level of the C++ reference implementation. I've made some improvements, but overall the Swift version is around 5x slower.

If anyone is interested, the source code is available here: https://gitlab.com/youngwerth/ray-tracer/settings/repository

Compiled using "swift build -c release".


You should look at the assembly to know for sure (after you have profiled and narrowed it down).

With this sort of speed difference my guess is that there is either excess memory allocation going on, pointer hopping when you think you are using something by value on the stack, or both.


I'll second looking at the assembly code. I had a similar problem a few years ago, and an assembly expert told us that we were moving data in and out of the cache all the time instead of leaving it there. We had to change the way we looped through arrays and got a huge improvement. This could be wrong, but I would also guess that the Swift optimizer is not as good as the ones C++ has, mainly because it's newer.
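
The loop change was along these lines (illustrative C#, not our actual code from back then): iterate in the order the data is laid out in memory so the hardware prefetcher can keep up.

    using System;

    var data = new float[4096, 4096]; // C# rectangular arrays are stored row-major

    // Cache-unfriendly: consecutive reads are 4096 floats (16 KB) apart.
    float slowSum = 0f;
    for (int col = 0; col < 4096; col++)
        for (int row = 0; row < 4096; row++)
            slowSum += data[row, col];

    // Cache-friendly: consecutive reads are adjacent in memory.
    float fastSum = 0f;
    for (int row = 0; row < 4096; row++)
        for (int col = 0; col < 4096; col++)
            fastSum += data[row, col];

    Console.WriteLine(slowSum == fastSum); // True: same result, very different run time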


You don't need to look at assembly to avoid cache misses; you just need to understand what the language is doing under the hood in a general sense, then make sure you access memory in a way that can be prefetched.

My guess is that it is either the program being written slightly differently or that Swift is doing some sort of indirection under the hood.


"you just need to understand what the language is doing under the hood in a general sense"

Isn't the best way to find out to look at the assembly? How else can you know what the compiler and optimizer are doing?


No, understanding what the language is doing is not the same as looking at assembly or even understanding what the optimizer is doing.

If Swift is creating some variables on the heap and/or creating virtual tables for inheritance (I don't know much about it), then you don't need to look at the assembly to know that you are creating indirection or doing too many heap allocations.
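
In C# terms (Swift's class/struct split is analogous), the kind of indirection being talked about looks like this; the types are made up for illustration:

    using System;

    // Reference type: every element is a separate heap object reached through a pointer.
    VecClass[] boxed = new VecClass[1_000_000];
    for (int i = 0; i < boxed.Length; i++)
        boxed[i] = new VecClass();            // one million small allocations for the GC to track

    // Value type: one contiguous ~12 MB block, no per-element allocation, prefetcher-friendly.
    VecStruct[] inline = new VecStruct[1_000_000];
    Console.WriteLine(boxed.Length == inline.Length); // True; the memory layout is what differs

    class VecClass { public float X, Y, Z; }
    struct VecStruct { public float X, Y, Z; }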


Thanks for the advice! I'll look into that. I have not spent much time in assembly land so it should be an interesting lesson.


I really enjoyed that book! I've 'followed along' with a Java implementation myself: https://github.com/DylanMeeus/Raytracer

However I did not really care about performance here (just a toy project), so I haven't really delved into that side of it.

EDIT: I get a 404 on your git link


Oops! Pasted the wrong link. It's not letting me update either.

Here's the correct one: https://gitlab.com/youngwerth/ray-tracer/tree/performance


It pains me to hear about developers doing the wrong thing just to beat a dumb benchmark. If you really care about CPU raytracing performance, you need to write handcrafted SIMD code, and C#'s default float handling is of no consequence to you. Correctness > speed.


Did you even read the article?

> In Mono, decades ago, we made the mistake of performing all 32-bit float computations as 64-bit floats while still storing the data in 32-bit locations. (...) Applications did pay a heavier price for the extra computation time, but [in the 2003 era] Mono was mostly used for Linux desktop application, serving HTTP pages and some server processes, so floating point performance was never an issue we faced day to day. (...) Nowadays, Games, 3D applications image processing, VR, AR and machine learning have made floating point operations a more common data type in modern applications. When it rains, it pours, and this is no exception. Floats are no longer your friendly data type that you sprinkle in a few places in your code, here and there. They come in an avalanche and there is no place to hide. There are so many of them, and they won’t stop coming at you.

The raytracer is just a good performance test.


> The raytracer is just a good performance test.

The article does say that "it was a real application", which is a bit of a stretch.


Early C compilers made that same mistake.


It's the x86 hardware which made the original mistake. Using 80-bit floats in the x87 FPU turned out to be a bad idea. Thankfully, standardizing SSE and SSE2 in x86-64 gave us a way out of that mess.


Yes I did, and FWIW, comments asking people whether they read the article or not are off topic on HN. A performance test is exactly the same thing as a benchmark. In any real-world code, slower floats don't matter at all. None of you who have commented have been able to, or even tried to, prove me wrong on that point.


> In any real-world code, slower floats don't matter at all. None of you who have commented have been able to, or even tried to, prove me wrong on that point.

First of all, this is a burden-of-proof fallacy: the onus is on you to prove this statement right, not on us to prove you wrong.

Second of all, nobody has been trying to prove you wrong because you did not actually say that floating point performance does not matter in real world code. You may have had it in mind, but you cannot blame others for not picking up on something you did not communicate in the first place.

What you did say was "correctness > speed", which is not the same thing. Furthermore, while this statement is true, it needs a context to be applied to, which you have to give. Without further justification from you as to why using float32 operations for float32 data types would reduce correctness, it is a hollow truism.


> In any real-world code, slower floats don't matter at all

This is the same reasoning that tanked the Cyrix 6x86.

If it were true, why don't we just ditch hardware floating point altogether and just emulate it with integer arithmetic instead? I'm sure chip manufacturers would appreciate having the die-space back.


That's a straw man. I explained that I meant that a slower built-in floating-point TYPE doesn't matter: "If you really care about CPU raytracing performance, you need to write handcrafted SIMD code, and C#'s default float handling is of no consequence to you."


This isn't really about raytracer perf. It's about how Miguel de Icaza (the creator of Mono) improved Mono's float handling. The difference in raytracer perf on different .NET runtimes is just what brought the poor float handling to his attention. Better Mono float handling is a good outcome for lots of non-raytracer scenarios.

If you read the linked blog about the ray tracer, the C# ports are just part 3 in a series of 8 that does actually conclude with two posts on SIMD: https://aras-p.info/blog/2018/03/28/Daily-Pathtracer-Part-0-...


Doing 32-bit math on 32-bit floats doesn’t seem like the wrong thing to me. It’s exactly what I’d expect when I use that data type.


Maybe GP misspoke with "wrong", but a valid point was raised there, maybe not in the clearest way:

If you are trying to write a super-fast raytracer or game or some such, you will get the best results by writing SIMD manually. Otherwise, you are benchmarking how well autovectorization works in the various compilers and JITs you are testing.

Now, specifically here it looks like Mono had a bunch of other problems they had to fix to get to the right ballpark (not using the right data type, etc.), which is what the blogpost focuses on. And it's nice to see speedups for C# code there.

Still, though, if you need maximal perf, raw SIMD is necessary. Comparing to a C++ version with SIMD might have been interesting, for example. (Likely the reason Burst is "faster than C++" is that it happens to autovectorize that code better.)
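
For reference, "writing the SIMD manually" in C# usually means System.Numerics.Vector<T>; this little sum is my own sketch of the style, not code from the linked series:

    using System;
    using System.Numerics;

    Console.WriteLine(Sum(new[] { 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f })); // 45

    static float Sum(float[] values)
    {
        var acc = Vector<float>.Zero;
        int width = Vector<float>.Count;                // e.g. 8 floats per register with AVX
        int i = 0;
        for (; i <= values.Length - width; i += width)
            acc += new Vector<float>(values, i);        // one SIMD add handles `width` elements
        float sum = Vector.Dot(acc, Vector<float>.One); // horizontal add of the lanes
        for (; i < values.Length; i++)                  // scalar tail for the leftovers
            sum += values[i];
        return sum;
    }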


You raise a valid point, but given that GP ended their argument with "correctness > speed" I do not believe that it was the point they were going for.


No, the wrong thing was the extra precision they were doing before. When I use a 32-bit type, I absolutely do not expect operations to silently use 64-bit precision by default, especially considering that other CLR implementations (i.e. Microsoft's original one) do not do this. Least Astonishment > unsolicited extra "correctness".
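
For what it's worth, the difference is observable in plain C#; this is just a toy illustration of the two evaluation strategies, not Mono's actual codegen:

    using System;

    float f = 16_777_216f;                          // 2^24
    float pureFloat = f + 1f + 1f;                  // each +1 is rounded away: 16777216
    float viaDouble = (float)((double)f + 1d + 1d); // 64-bit intermediates survive: 16777218
    Console.WriteLine(pureFloat == viaDouble);      // False -- the two strategies really do differ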


This article was about improving floating-point performance in general; they just used the raytracer as a benchmark. What correctness are you talking about here? The fact that floats are no longer being promoted to doubles for some operations? It's pretty unnecessary for most things, especially for a game engine.


And if you want more correctness, then you should be using double or decimal; you shouldn't really be expecting that all the calculations for a float are done in double and then converted back to float.

Decimal in .NET is 128-bit, with a range of (-7.9 x 10^28 to 7.9 x 10^28) / (10^0 to 10^28) and 28-29 significant digits.



