That is puzzling to me as well, particularly because `top` and `htop` on my machine never show more than `50 MB`. So the first figure must be measuring something else, and I haven't found a good explanation for it. I came across people who suggest ignoring it, e.g. https://stackoverflow.com/questions/61666819/haskell-how-to-...
If anyone knows the exact reason, please enlighten me as well. Even this otherwise awesome tutorial ignores that value: https://youtu.be/R47959rD2yw?t=1306
`top` shows the amount of memory the process is using at that moment, but your profile is probably showing the total allocated over the whole run. That suggests there are LOTS of allocations that get freed immediately, instead of the program allocating memory once and reusing that heap memory. For an image of this size you could probably just put everything on the stack.
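To make the "total allocated" vs. "currently in use" distinction concrete, here is a rough sketch in Rust (the other language in this thread, not the Haskell program being discussed). It wraps the system allocator with two counters. The names `Counting`, `TOTAL`, `LIVE` and `churn` are made up for this illustration, and the counters only approximate what a profiler or `top` reports.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Cumulative bytes ever requested (roughly what an allocation profile reports).
static TOTAL: AtomicUsize = AtomicUsize::new(0);
// Bytes allocated and not yet freed (closer in spirit to what `top` shows).
static LIVE: AtomicUsize = AtomicUsize::new(0);

// Hypothetical wrapper that forwards to the system allocator and counts bytes.
struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = unsafe { System.alloc(layout) };
        if !p.is_null() {
            TOTAL.fetch_add(layout.size(), Ordering::Relaxed);
            LIVE.fetch_add(layout.size(), Ordering::Relaxed);
        }
        p
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static ALLOC: Counting = Counting;

/// Allocates and immediately drops `rounds` buffers of `size` bytes each.
/// Returns (growth in total bytes allocated, growth in live bytes).
fn churn(rounds: usize, size: usize) -> (usize, usize) {
    let total_before = TOTAL.load(Ordering::Relaxed);
    let live_before = LIVE.load(Ordering::Relaxed);
    for i in 0..rounds {
        let buf = vec![i as u8; size];
        // Keep the optimizer from removing the allocation entirely.
        std::hint::black_box(&buf);
        // `buf` is dropped here, so its memory is freed right away.
    }
    let total_grown = TOTAL.load(Ordering::Relaxed) - total_before;
    let live_grown = LIVE.load(Ordering::Relaxed).saturating_sub(live_before);
    (total_grown, live_grown)
}

fn main() {
    let (total, live) = churn(1_000, 1_024);
    println!("allocated in total: {total} bytes, still live afterwards: {live} bytes");
}
```

The cumulative counter grows by about a megabyte while the live counter barely moves, which is the same kind of gap as a big number in a profile next to a small number in `top`.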
When optimizing, excess heap allocations are the first thing to look at, since they are usually inside inner loops, which almost always means they are not necessary. It definitely should not be ignored by someone looking for speed.
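As a hedged illustration of what "heap allocations inside inner loops" tends to look like, here is a small Rust sketch (again not the program from the post; `row_sums_alloc` and `row_sums_reuse` are invented names and the per-pixel computation is a placeholder). The first version allocates a fresh buffer on every iteration, the second allocates once and reuses it.

```rust
/// Allocates a fresh Vec for every row: one heap allocation per iteration.
fn row_sums_alloc(width: usize, height: usize) -> Vec<u64> {
    let mut sums = Vec::with_capacity(height);
    for y in 0..height {
        // New heap buffer each time through the loop.
        let row: Vec<u64> = (0..width).map(|x| (x * y) as u64).collect();
        sums.push(row.iter().sum());
    }
    sums
}

/// Reuses one scratch buffer: the allocation happens once, outside the loop.
fn row_sums_reuse(width: usize, height: usize) -> Vec<u64> {
    let mut sums = Vec::with_capacity(height);
    let mut row: Vec<u64> = Vec::with_capacity(width);
    for y in 0..height {
        row.clear(); // drops the contents but keeps the capacity
        row.extend((0..width).map(|x| (x * y) as u64));
        sums.push(row.iter().sum());
    }
    sums
}

fn main() {
    let a = row_sums_alloc(4, 3);
    let b = row_sums_reuse(4, 3);
    assert_eq!(a, b);
    println!("{:?}", b);
}
```

Both functions return the same result; only the number of allocations differs. Whether it matters in practice depends on the allocator and the loop, so it is worth measuring rather than assuming.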
> In the case of an image of this size you could probably just put everything on the stack.
That sounds interesting and would probably make it more efficient. But how do I put things on the stack? As per this answer on Stack Overflow, the runtime allocates memory on the heap to call functions (and programmers probably don't have any control over that?):
> I don't know how to avoid heap memory allocations in Haskell or make sure memory is on the stack, but both are very easy to do in C++.
Yup. I know it's easy and the default behavior in C++.
This Haskell page says it is pretty common to allocate and deallocate memory immediately, and that GHC handles this pretty efficiently, which probably explains why Haskellers don't seem to care about this (though it seems weird to me).
To be fair, the Rust program isn't fully optimized yet. We tweaked a few things here and there, and the run time improved from 26s to 19s. I'm sure it can get better still (especially if the issues in the Rayon crate are addressed). At the same time, I think the Haskell version can be improved further too. I'll update the blog post, or write a new one, once I have some interesting data to share.
`-XStrict` is almost never optimal in Haskell, as a lot of the optimizations ultimately rely on lazy evaluation. I think it would be more appropriate to try `-XStrictData` instead.
It was indeed my first Haskell program, so I'll take your comment as a compliment. I have been a long-time C++ programmer (https://stackoverflow.com/users/415784/nawaz), so I know what kind of tools to look for in another language when it comes to making code performant. That helped me.