I recently learned how Doom was ported to the SNES. It's quite impressive. The SNES hardware was nowhere near fast enough to do all the trig calculations needed for the game, but cartridge-based games had a trick up their sleeve: they could include actual hardware inside the cart that the game code could make use of. It was more expensive, but if you expected to sell a boatload of copies, it could be worth it. However, even using extra hardware wasn't enough in this case. So they pre-calculated lookup tables for sine, cosine, tangent etc. for every angle at the necessary precision. They were helped by the fact that the game's resolution was fairly low.
If you're interested, you can peruse the C code that was used to generate the tables. Here's the file for sine/cosine: https://github.com/RandalLinden/DOOM-FX/blob/master/source/m...
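If you just want the flavor of what such a generator looks like, here is a minimal sketch of the general idea rather than the actual DOOM-FX code from the link above; the ANGLES count and the 16.16 fixed-point scale are just illustrative assumptions:

```c
/* Minimal sketch of a trig-table generator: emit sine as 16.16 fixed-point
 * integers for a fixed number of angle steps. The real generator linked
 * above is more involved; ANGLES and FRACUNIT here are assumed values. */
#include <math.h>
#include <stdio.h>

#define ANGLES   1024    /* angle steps per full turn (assumed for the sketch) */
#define FRACUNIT 65536   /* 16.16 fixed point */

int main(void) {
    const double two_pi = 6.28318530717958647692;
    printf("static const int sine_table[%d] = {\n", ANGLES);
    for (int i = 0; i < ANGLES; i++)
        printf("    %ld,\n", lround(sin(two_pi * i / ANGLES) * FRACUNIT));
    printf("};\n");
    return 0;
}
```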
Yup, I remember watching a video about how the RAM bus is the bottleneck when running Super Mario 64 on the N64. The original implementation used trig lookup tables, but the person optimized it by instead using a Taylor series (I think) and some negation/shifting (roughly the shape sketched below).
Also enjoy his discovery that, while the game got flak online for years because the release build had files that were not optimized, it turns out most of the optimizations that were done actually made things worse due to the small instruction cache and limited RAM (I forget the exact details). Things like unrolling loops just increased the code size and required more slow code paging.
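On the trig substitution mentioned above: I haven't checked the actual SM64 changes, so this is only a sketch of what "a short polynomial plus some sign tricks" usually looks like, not the real patch:

```c
#include <stdio.h>

/* Degree-7 Taylor/odd-polynomial approximation of sine, via Horner's rule:
 * x - x^3/3! + x^5/5! - x^7/7!.
 * Assumes x has already been reduced to roughly [-pi, pi]; the symmetries
 * sin(-x) = -sin(x) and sin(pi - x) = sin(x) handle the rest with negation. */
static float sin_poly(float x) {
    float x2 = x * x;
    return x * (1.0f - x2 * (1.0f / 6.0f
                    - x2 * (1.0f / 120.0f
                    - x2 * (1.0f / 5040.0f))));
}

int main(void) {
    printf("%f\n", sin_poly(0.5f));   /* ~0.479426 */
    return 0;
}
```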
Abramowitz and Stegun's Handbook of Mathematical Functions[0] - I still use it to test whenever I need to implement any kind of fundamental math function that's not built into whatever environment I'm using. Haven't used it in some time but it comes in handy.
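A typical way to use it: spot-check an implementation against a handful of tabulated values. A rough sketch, where my_sin is just a stand-in for whatever function you're actually implementing and the reference constants are standard sine values of the kind the handbook tabulates:

```c
#include <math.h>
#include <stdio.h>

/* Stand-in for your own implementation under test. */
static double my_sin(double x) { return sin(x); }

int main(void) {
    /* { argument, reference value } */
    const double cases[][2] = {
        { 0.0, 0.000000000000000 },
        { 0.5, 0.479425538604203 },
        { 1.0, 0.841470984807897 },
        { 1.5, 0.997494986604054 },
    };
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        double got = my_sin(cases[i][0]);
        printf("sin(%.1f) = %.15f  (abs err %.2e)\n",
               cases[i][0], got, fabs(got - cases[i][1]));
    }
    return 0;
}
```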
It turns out that SNES Doom missed out on a big optimization that people figured out later on. If you use the SNES's mosaic feature combined with scrolling tricks, you can nearly double the fill rate: rather than writing each pixel twice, you let the mosaic hardware do the pixel doubling.
The approach is older than that. I remember my grandfather's engineering books from the 1950s – nearly every one of them had a large appendix with precalculated sine, cosine, tangent and logarithm lookup tables. And there was at least one book that contained only such tables and nothing else.
That is how engineers used to calculate before the advent of computers.
The classic of this genre is Abramowitz and Stegun's "Handbook of Mathematical Functions" - although the two listed names are merely those of the compilation's editors, as calculating the numerous tables of values (and sheets of mathematical identities) required hundreds of human computers working for years.
Ironically, on publication in 1964 it was just in time to see the dawn of the electronic computer age that would supplant it.
Once I tested lookup tables for a path tracer (ray tracer).
It is interesting that you can get very decent results even with low-quality tables. Of course there will be artifacts, but due to the randomness of a path tracer they are not always very noticeable.
I always wonder, when hearing about these old optimizations, why they aren't used in contemporary code. Wouldn't you want to squeeze every bit of performance even on modern hardware?
The "processor-memory performance gap" is a big reason why lookup tables aren't as clear of a win on modern hardware as they were on the SNES.
If it takes two CPU cycles to read from RAM, a lookup table will basically always be faster than doing the math at runtime. If it takes fifty cycles (because, while your RAM may be faster, your CPU is a lot faster), and your processor has more advanced hardware that can do more math per cycle, maybe just doing the math is faster.
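If you want to see where your own machine lands on that tradeoff, something like the rough sketch below works; note it flatters the table, since a 4096-entry table stays resident in L1 cache here, unlike in real code with other data competing for it:

```c
#include <math.h>
#include <stdio.h>
#include <time.h>

#define N 4096                      /* table entries for one full turn */
static float table[N];
static const float TWO_PI = 6.2831853f;

/* Nearest-entry lookup, no interpolation, so accuracy is limited. */
static float lut_sin(float x) {
    int i = (int)(x * (N / TWO_PI)) & (N - 1);
    return table[i];
}

int main(void) {
    for (int i = 0; i < N; i++)
        table[i] = sinf(TWO_PI * i / N);

    volatile float sink = 0.0f;     /* keeps the loops from being optimized away */
    const int iters = 10000000;

    clock_t t0 = clock();
    for (int i = 0; i < iters; i++)
        sink += lut_sin((i & 1023) * 0.006f);
    clock_t t1 = clock();
    for (int i = 0; i < iters; i++)
        sink += sinf((i & 1023) * 0.006f);
    clock_t t2 = clock();

    printf("table: %.3fs  sinf: %.3fs\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}
```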
Other optimizations are now applied automatically by compilers. For example, all modern compilers optimize integer division by compile-time constants, here’s an example: https://godbolt.org/z/1b8r5c5MG
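As a concrete illustration (the divisor and magic constant below are my own example, not necessarily what that godbolt link shows): an unsigned division by 5 can be replaced by a multiply with a precomputed "magic" reciprocal plus a shift, which is essentially what compilers emit instead of a div instruction:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* What the compiler conceptually emits for x / 5 on a 32-bit unsigned:
 * multiply by ceil(2^34 / 5) = 0xCCCCCCCD, then shift right by 34. */
static uint32_t div5_magic(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 34);
}

int main(void) {
    /* Spot-check the equivalence on a sweep plus the extreme value. */
    for (uint64_t x = 0; x <= UINT32_MAX; x += 9973)   /* prime stride */
        assert(div5_magic((uint32_t)x) == (uint32_t)x / 5);
    assert(div5_magic(UINT32_MAX) == UINT32_MAX / 5u);
    puts("magic-number division matches x / 5");
    return 0;
}
```

Signed division and some other divisors need an extra fixup step, which is why the compiled output usually has a couple more instructions than this.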
Squeezing performance out of modern hardware requires doing very different things.
Here’s an example about numerical computations. On paper, each core of my CPU can do 64 single-precision FLOPs each cycle. In reality, to achieve that performance a program needs to spam _mm256_fmadd_ps instructions while only loading at most 1 AVX vector per FMA, and only storing at most 1 AVX vector per two FMAs.
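To make that concrete, here's roughly the shape such a kernel has to take (a sketch, not tuned code): evaluating a degree-7 polynomial per element gives 7 _mm256_fmadd_ps per one vector load and one store, which fits the ratios above; actually hitting peak would further need unrolling with independent accumulators to hide the FMA latency.

```c
/* Compile with e.g. gcc -O2 -mavx2 -mfma */
#include <immintrin.h>
#include <stddef.h>

/* y[i] = c7*x^7 + ... + c1*x + c0 via Horner's rule:
 * 7 FMAs per 8 floats, against 1 vector load and 1 vector store.
 * Assumes n is a multiple of 8 to keep the sketch short. */
void poly8(const float *x, float *y, size_t n, const float c[8]) {
    __m256 c0 = _mm256_set1_ps(c[0]), c1 = _mm256_set1_ps(c[1]);
    __m256 c2 = _mm256_set1_ps(c[2]), c3 = _mm256_set1_ps(c[3]);
    __m256 c4 = _mm256_set1_ps(c[4]), c5 = _mm256_set1_ps(c[5]);
    __m256 c6 = _mm256_set1_ps(c[6]), c7 = _mm256_set1_ps(c[7]);
    for (size_t i = 0; i < n; i += 8) {
        __m256 v   = _mm256_loadu_ps(x + i);        /* 1 load  */
        __m256 acc = _mm256_fmadd_ps(c7, v, c6);    /* 7 FMAs  */
        acc = _mm256_fmadd_ps(acc, v, c5);
        acc = _mm256_fmadd_ps(acc, v, c4);
        acc = _mm256_fmadd_ps(acc, v, c3);
        acc = _mm256_fmadd_ps(acc, v, c2);
        acc = _mm256_fmadd_ps(acc, v, c1);
        acc = _mm256_fmadd_ps(acc, v, c0);
        _mm256_storeu_ps(y + i, acc);               /* 1 store */
    }
}
```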
Artifacts are ugly, so why force them on modern hardware when GPUs are extremely fast?
For reference: I was doing a path tracer in PHP :) so yeah, that renders like ancient hardware.
(The browser requested different buckets of the image, and a PHP script rendered and returned each bucket. So it was a kind of multi-threading, but still very slow.)
> However, even using extra hardware wasn't enough in this case. So they pre-calculated lookup tables for sine, cosine, tangent etc. for every angle at the necessary precision.
Is this really the order of events? I imagine the pre-calculated route is what you'd try first, and only go for extra hardware if that failed somehow.