I think many people commenting on the model making bad predictions are missing the point.
The speaker argues that even though models are known to be inaccurate, companies like Tinder or insurance companies might still use the model's outputs since they have nothing better.
Therefore, at some point in the future (or already today?) you could suffer from bad model predictions because you are "not normal enough" for the model to predict well, and end up with a wrong predicted life expectancy and higher insurance bills.
Insurance companies hire very smart actuaries... and thus currently use actuarial models. Actuarial models aren't perfect either. However, throwing them out to use one of these machine prediction models would almost certainly be a disaster for the insurance company. And there is currently a lot of competition for life insurance.
Using this sort of data to enrich and refine existing data, vs throwing current stuff all out in favor of this newer data... that's what I'd expect (enrichment vs replacement). I'm fairly confident insurance companies have areas in their models where they know there's stuff they don't know. If more data can enrich their models to provide better accuracy, why wouldn't they?
The classic danger here is that it's very easy to accidentally overfit. What tends to happen when people take a perfectly good actuarial model and hear that they can "enrich" it with additional data is that they start modeling noise. That's obviously not good if you want to stay profitable as an insurance firm.
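As a rough, purely illustrative sketch of what I mean (made-up data and sklearn, nothing to do with any real actuarial model): bolt a pile of noise columns onto a regression and the in-sample fit improves while the out-of-sample fit degrades.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical setup: 200 policies, 3 genuinely predictive features,
    # plus 50 "enrichment" columns that are pure noise.
    n = 200
    X_signal = rng.normal(size=(n, 3))
    X_noise = rng.normal(size=(n, 50))
    y = X_signal @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

    for label, X in [("signal only", X_signal),
                     ("signal + noise columns", np.hstack([X_signal, X_noise]))]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = LinearRegression().fit(X_tr, y_tr)
        print(label,
              "train R^2:", round(model.score(X_tr, y_tr), 2),
              "test R^2:", round(model.score(X_te, y_te), 2))

The extra columns make the training fit look better, which is exactly why "more data" feels like progress right up until the test numbers come in.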
Yeah, that's fair (and not a problem unique to actuarial models).
That being said, enrichment with public data, claims data, etc. is generally incredibly effective, so as always, what data you add matters a lot more than whether or not you add data.
Perhaps, but the demo did not show anything an insurance company would not already know. But as a wider observation, financial products are already based on models even though none of them are perfect. Making those models better isn't really a bad thing.
If one of these models is on average better, then they would gain an advantage by using it. The problem is that for the "not normal enough" folks, it may be _harder_ to remedy an invalid classification, particularly if there are no fallbacks or workarounds. I was cued into this once by an ML book that gave an example of a fraud detection company using an objectively worse algorithm, because when it gave false positives it was easier to understand and hence easier to manually override. But if it is less profitable to operate this way, and there is no regulation around it, people getting falsely classified may be out of luck. That's where the discussion around regulation needs to happen, I think.
This is the worst anti-surveillance argument. The last thing I want is to be accurately predicted. As far as I'm concerned, once the models are perfected and they can accurately predict everything you will do or say, things will be far worse.
This isn't an unsolvable problem though, since you can calculate the strength of the model fit for a particular data point. You learn how to do this with linear models in Stat 101, so everyone who is paid to be a data scientist will no doubt understand this.
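As a sketch of what I mean, assuming an ordinary least squares model in statsmodels (toy data, not anything an insurer actually uses): the prediction interval is much wider for a point far from the training data, which is one way to flag the "not normal enough" cases.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.5, size=100)

    # Fit an ordinary least squares model.
    model = sm.OLS(y, sm.add_constant(X)).fit()

    # Two new points: one near the bulk of the training data,
    # one far outside it.
    x_new = sm.add_constant(np.array([[0.1, 0.2], [8.0, -9.0]]),
                            has_constant="add")
    pred = model.get_prediction(x_new)

    # The interval for the atypical point is far wider: the model is
    # effectively saying it can't predict confidently for this person.
    print(pred.summary_frame(alpha=0.05)[["mean", "obs_ci_lower", "obs_ci_upper"]])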
Unfortunately, the theory for linear models does not translate easily to deep learning models, which is what this demo uses.
The "strength of model fit" becomes much more complicated and is an active field of deep learning research.