I have one remark, though: If your language allows for automatic differentiation already, why do you bother with a neural network in the first place?
I think you should have a good reason why you choose a neural network for your approximation of the inverse function and why it has exactly that amount of layers. For instance, why shouldn't a simple polynomial suffice? Could it be that your neural network ends up as an approximation of the Taylor expansion of your inverse function?
It's a good question, which I answer in much more depth here: https://www.reddit.com/r/MachineLearning/comments/ryw53x/d_f... . That answer is specifically about NNs vs. Fourier series, but it's the same point: polynomials, Fourier series, and NNs are all universal function approximators, so why use a NN? If you write out what happens, though, you can easily see the curse of dimensionality in action and see why a 100-dimensional function shouldn't be approximated by tensor products of polynomials. See the rest there.
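To make the counting argument concrete, here's a minimal sketch (my own illustration, not from the linked thread): a tensor-product polynomial basis with degree n in each of d dimensions needs (n+1)^d coefficients, which blows up exponentially with d.

```python
# Count the coefficients in a tensor-product polynomial basis:
# degree n per dimension in d dimensions gives (n + 1)**d terms,
# since every combination of per-dimension monomials appears.
def tensor_product_terms(degree: int, dims: int) -> int:
    return (degree + 1) ** dims

for dims in (1, 2, 3, 10, 100):
    print(f"degree-5 basis in {dims:>3} dimensions: "
          f"{tensor_product_terms(5, dims):.2e} coefficients")
```

Already at 10 dimensions you're past 60 million coefficients, and at 100 dimensions the count is astronomically large, which is the curse-of-dimensionality argument in one line.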
So it's not really about NNs vs. polynomials/Fourier/Chebyshev, etc., but rather about what the right thing to use is at a given time. In the universal differential equations paper (https://arxiv.org/abs/2001.04385), we demonstrate that some examples of automatic equation discovery are faster with Fourier series than with NNs (specifically, the discovery of a semilinear partial differential equation). That doesn't mean NNs are a bad way to do all of this; it just means they are one trainable object that is good for high-dimensional approximations, and libraries should let you move easily between classical basis functions and NNs to get the best performance.
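As a rough sketch of what "moving between classical basis functions and NNs" can look like (my own NumPy illustration, not code from the paper): because a truncated Fourier series is linear in its coefficients, fitting it is just a least-squares solve, with no gradient-based training loop at all.

```python
import numpy as np

# Fit a truncated Fourier series to samples of a 1D function by
# linear least squares. The basis [1, cos(kx), sin(kx)] for k = 1..K
# is linear in the coefficients, so no iterative training is needed.
def fourier_design_matrix(x, K):
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols.append(np.cos(k * x))
        cols.append(np.sin(k * x))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.exp(np.sin(x)) + 0.01 * rng.standard_normal(x.size)  # toy smooth target

A = fourier_design_matrix(x, K=8)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

x_test = np.linspace(0, 2 * np.pi, 5)
print(fourier_design_matrix(x_test, K=8) @ coeffs)  # approximation at test points
```

For a smooth, low-dimensional target like this, a handful of Fourier coefficients is enough; the trade-off flips once the input dimension grows.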
I think for more complicated examples like RL control systems a neural network is the natural choice. If you want to incorporate physics into your world model, then you'd need differentiable programming + NNs, right? Or am I misunderstanding the question?
If you're talking about the specific cannon problem, you don't need to do any learning at all: you can just solve the kinematics directly, so in some sense you could ask why you're using any approximation function at all.
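For reference, a minimal sketch of that direct solution (assuming the usual idealizations of no drag, level ground, and a fixed muzzle speed, which may or may not match the original cannon setup):

```python
import math

# Closed-form launch angle for a projectile on level ground with no
# air resistance: range r = v**2 * sin(2*theta) / g, so the low-arc
# solution is theta = 0.5 * asin(g * r / v**2).
def launch_angle(target_range: float, speed: float, g: float = 9.81) -> float:
    s = g * target_range / speed ** 2
    if s > 1.0:
        raise ValueError("target is out of range at this muzzle speed")
    return 0.5 * math.asin(s)

print(math.degrees(launch_angle(target_range=50.0, speed=30.0)))  # ~16.5 degrees
```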
Requiring an exact justification for a specific NN architecture is not productive. Simply put, such justification almost never exists, yet NNs clearly outperform simple polynomials on a wide variety of tasks. Even if you know that a NN performs better than a simple polynomial on a task, you're very unlikely to get an exact explanation of why that specific NN was chosen over the infinite number of other possible NNs. If you require such an explanation, then you're going to miss out on a lot of performance.
In low-dimensional spaces there are a lot of good techniques, but once you go above 10 dimensions or so, a NN is generally the only option. Pretty much everything else degrades exponentially as the dimension grows.
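The flip side of that exponential degradation is that a NN's parameter count grows only linearly in the input dimension for a fixed architecture; a back-of-the-envelope comparison (my own sketch, with an assumed fixed hidden width):

```python
# Compare a fixed-resolution grid (or tensor-product basis), which
# needs pts**d points in d dimensions, against an MLP whose first
# layer only adds d * width parameters as the dimension grows.
def grid_points(pts_per_dim: int, dims: int) -> int:
    return pts_per_dim ** dims

def mlp_params(dims: int, widths=(64, 64), outputs=1) -> int:
    sizes = [dims, *widths, outputs]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))  # weights + biases

for d in (2, 5, 10, 20):
    print(f"d={d:>2}: grid(10 pts/dim) = {grid_points(10, d):.1e} points, "
          f"MLP = {mlp_params(d)} parameters")
```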