I don't think you can scale or compare the numbers like that. Not every commit is a chance to break things for users, because however many commits get bundled into a single release, and the resources required to test a release don't scale per commit.
I think most of the recent breakage has come from server-side changes and feature flags, i.e. "whoops, people who were in this 1% experiment had their browser crash when we pushed an update to the spell check dictionary because we didn't test that combination".
Risks like that scale closer to the number of features or the number of devs, since devs can be fiddling with server-side config outside the release process.
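
To make the "untested combination" point concrete, here's a rough Python sketch. Everything in it (the flag names, the config names, the `untested_combinations` helper) is made up for illustration and isn't anyone's real tooling. It just lists which pairs of experiment state and server-side config never got a test run.

```python
from itertools import product


def untested_combinations(flag_states, server_configs, tested):
    """Return the (flag state, server config) pairs that no test run covered."""
    all_pairs = set(product(flag_states, server_configs))
    return sorted(all_pairs - set(tested))


# Hypothetical names, purely for illustration.
flag_states = ["experiment_off", "experiment_on_1pct"]
server_configs = ["dictionary_v1", "dictionary_v2"]

# Pairs the release process happened to exercise (assumed).
tested = [
    ("experiment_off", "dictionary_v1"),
    ("experiment_off", "dictionary_v2"),
    ("experiment_on_1pct", "dictionary_v1"),
]

print(untested_combinations(flag_states, server_configs, tested))
```

The one pair left over is the 1% experiment combined with the new dictionary, and nothing in that gap depends on how many commits went into the release.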