The traditional way to train a biological network like this is to stimulate it based on what the game is doing and let its activity control the game. When something bad happens in the game, the network gets a burst of random stimulation that it can't make sense of. When things go well, the stimulation stays consistent and predictable. The network tries to minimize the amount of random stimulation it receives, and the way to do that is to play the game better.

I didn't go into the paper to see if that's exactly what they're doing, and I'm no expert. But from what I've read before, that's how this usually works, and I'm sure they're doing something similar to that.
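
To make the idea concrete, here's a toy Python sketch of that feedback loop. It is not the paper's method and has nothing to do with real neurons: the three-position "game", the numeric feedback signal, and every name in it (`feedback`, `train`, `greedy_policy`, `surprise`) are made up for illustration. The only thing it borrows from the description above is the rule that good outcomes give a predictable signal, bad outcomes give random noise, and the learner prefers whatever is least surprising.

```python
import random

N_POSITIONS = 3  # hypothetical toy game: ball and paddle each sit in one of 3 slots


def feedback(ball, paddle, rng):
    """Predictable signal on a hit, random noise on a miss (assumed values)."""
    if paddle == ball:
        return 1.0                      # consistent, predictable stimulation
    return rng.uniform(-1.0, 3.0)       # unpredictable stimulation


def train(steps=5000, epsilon=0.1, lr=0.1, seed=0):
    """Learn, per ball position, which paddle position is least surprising."""
    rng = random.Random(seed)
    # predicted[b][p]: running guess of the signal for ball b, paddle p
    predicted = [[0.0] * N_POSITIONS for _ in range(N_POSITIONS)]
    # surprise[b][p]: running average of squared prediction error
    surprise = [[0.0] * N_POSITIONS for _ in range(N_POSITIONS)]

    for _ in range(steps):
        ball = rng.randrange(N_POSITIONS)
        if rng.random() < epsilon:
            paddle = rng.randrange(N_POSITIONS)            # occasional exploration
        else:
            paddle = min(range(N_POSITIONS),
                         key=lambda p: surprise[ball][p])  # least surprising move

        signal = feedback(ball, paddle, rng)
        error = signal - predicted[ball][paddle]
        predicted[ball][paddle] += lr * error
        surprise[ball][paddle] += lr * (error ** 2 - surprise[ball][paddle])

    return surprise


def greedy_policy(surprise):
    """For each ball position, pick the paddle position with the lowest surprise."""
    return [min(range(len(row)), key=row.__getitem__) for row in surprise]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nothing in this loop is ever told that hitting the ball is "good". The learner only tracks how unpredictable the signal is after each move, and since misses are the only source of noise, avoiding surprise and playing well turn out to be the same thing in this toy setup.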