I don't know if that's so much a mistake as it is ambiguity though? To me, using the viewer's perspective in this case seems totally reasonable.
Does it still use the viewer's perspective if the prompt specifies "Put a strawberry in the _patient's left eye_"? If it does, then you're onto something. Otherwise I completely disagree with this.
I think "the left eye" in this particular case (a photo of a skull made of pancake batter) is still very slightly ambiguous. "The skull's left eye" would not be.
“The right socket” can only be read one way when talking about a body, just as you only have one right hand even though it is on my left when I'm looking at you.
If you are facing a wall-plate with two power sockets on it side by side and you are telling someone to plug something in, which one would be "the right socket", and which would be "the left socket"?
If above the wall-plate is a photo of a person and you are telling someone to draw a tattoo on the photo, which is "the right arm" and which is "the left arm"?
ETA: and if I were telling someone which socket to plug something into, it would absolutely be from the perspective of the person doing the plugging, not from inside the wall.
"Right hand" is practically a bigram that has more meaning, since handedness is such a common topic.
Also, context matters: if you're talking to someone, you would say "right shoulder" for _their_ right, since you know you're an observer with a different vantage point. Talking about a scene in a photo, "the right shoulder" to me would more often mean the right portion of the photo, even if it was the person's left shoulder.