From NAND to Raytracer: Raytracing on the Hack Computer

nwallin · on June 20, 2021

> For example, the old game Quake used Q16.16 for all of its code because the computers at that time didn’t have fast floating point math yet.

This isn't quite true, and the reality is actually way cooler than this.

Quake used a hybrid approach. It did both fixed point math on the ALUs and floating point math on the x87 FPU. The fixed point math on the ALU was used for game simulation, moving around, physics, stuff like that. The floating point math was used for graphics and drawing to the screen. Doing both sorts of math at the same time was non-trivial, but it had a huge upside: the ALU and FPU are independent of each other, so code executes in parallel. By doing FPU+ALU math simultaneously, you are effectively doubling the computational power accessible to the program, and as a result Quake looked head and shoulders more advanced than its contemporaries.

Not all CPUs had independent ALUs and FPUs. The Cyrix 6x86 ALU ops would delay its FPU ops and vice versa. So even though the 6x86 was faster than the Pentium at ALU ops and in the same ballpark for FPU ops, the 6x86 had absolutely terrible performance running Quake. This led to an undeserved reputation of the 6x86 of being slow and its benchmarks being fake marketing material amongst a certain segment of the population, which arguably led to Cyrix's decline and ultimate downfall.

--------------------------

Regarding square roots- the underlying architecture has an integer square root function, and fixed point numbers are represented as a left of decimal integer and a right of decimal integer. Why not just do integer square root on the left of decimal integer, and then do one (or two) iterations of Newton's method? This would be much faster and IMHO less complicated to code.

imalex · on June 21, 2021

Wow, that is indeed way cooler than what I knew! Thanks a lot for sharing (I wrote the blog post).

As for square roots, the reason why you can't use the underlying architecture is that it only works on the native i16 type. That's why I had to reimplement integer square root in the Int32 and then fixed point square root using that.

If you're saying that I could just ignore the Int32 part of the code and implement square roots just for the fixed point using the underlying architecture, that's definitely doable, but I'd be concerned about Newton's method (specifically the squaring of x_n) overflowing the fixed point. Maybe it'd work and I haven't thought it through enough. Thanks for the suggestion :D

nwallin · on June 21, 2021

> As for square roots, the reason why you can't use the underlying architecture is that it only works on the native i16 type.

My understanding of your code is that you've got `struct Number { int16 a,b,c,d; };` and the value of a number is equal to the floating point number 256.0a + b + c / 256.0 + d / 65536.0. (if the arch had floats, which it doesn't) If this assumption is wrong, my bad.

What I'm proposing is that you do `b = i16.sqrt(256a + b); a = 0;` then do 1-2 iterations of newton's method. For instance, if you have the value 1234.56789 then i16.sqrt(256*a+b) = i16.sqrt(1234) = 35. So your first approximation is x = 35.56789. Do x = (x + 1234.56789/x)/2, and with floats you get x = 35.139035. Do it again, you get x = 35.136418. The true value is also 35.136418, so there ya go.

> I'd be concerned about Newton's method (specifically the squaring of x_n) overflowing the fixed point.

I may have lost the plot. Are you calculating _inverse_ square root? x_n isn't squared for Newton's method of square root, but it is for inverse square root.

(Either way, the square of the square root of a number should be the same as the original number. So if you have overflow, you have other problems.)

Unrelated: What do you need pi and trig functions for? I wrote a ray tracer (actually a path tracer, same diff) about a year ago, and I only used trig/pi to set up the camera for the render. While the render was actually rendering, no trig was happening. It didn't matter for me, so I just used the standard library, but in your case it would probably be fine to use slow brutish methods.

imalex · on June 22, 2021

OK, I got what you're saying now, thanks! I got my wires crossed; I thought Newton's method was something else, but I was actually already implementing that.

I think it's a clever idea to use the native square root to seed the guess. That hadn't occurred to me. In my solution, I do integer square roots, and the fixed just dispatches to that, so I could drastically improve my guess if I used the native square root there. Nice idea!

I use pi just for normalization of the light power: https://github.com/aquach/from-nand-to-raytracer/blob/main/r...

And I don't use trig. I mention it just for fun. In the original Rust raytracer tutorial, they use trig to set up the field of view, but I hardcoded it to 90 degrees to remove the trig.

ant6n · on June 21, 2021

The parallel use of fixed point and floating point instructions was used in the software renderer of quake, where the division necessary to compute perspective correct texture mapping was issued to the fpu while rendering 16 pixels using integer instructions.

AndrewDucker · on June 20, 2021

Having done the original course I am impressed he got a ray tracer up and running on it!

XpertWizard · on June 20, 2021

Very impressive, huge fan of the course and love seeing projects written with HACK.

jweather · on June 21, 2021

This is a very cool project, and a fantastic writeup. Personally I would have been tempted to extend the architecture to add unsigned i16 support.

imalex · on June 21, 2021

That would've been fun too as a different challenge (I wrote the blog post). My main hesitation was that doing so might prevent me from using the built-in VM/hardware emulators they provide, so I wouldn't be able to easily run my solution.