
SVG generation is a surprisingly good benchmark for spatial reasoning because it forces the model to work in a coordinate system with no visual feedback loop. You have to hold a mental model of what the output looks like while emitting raw path data and transforms. It's closer to how a blind sculptor works than how an image diffusion model works.

What I find interesting is that Deep Think's chain-of-thought approach helps here — you can actually watch it reason about where the pedals should be relative to the wheels, which is something that trips up models that try to emit the SVG in one shot. The deliberative process maps well to compositional visual tasks.
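
As a concrete illustration of the kind of compositional placement involved, here is a minimal hand-written sketch (all coordinates invented for this example, not taken from any model output). The pedal group only lands somewhere sensible if its anchor point is derived from the two wheel centers, which is exactly the relationship a one-shot emitter tends to fumble:

  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
    <!-- wheels: centers 110 units apart on a shared axle line -->
    <circle cx="45" cy="85" r="28" fill="none" stroke="black"/>
    <circle cx="155" cy="85" r="28" fill="none" stroke="black"/>
    <!-- frame: rear axle to bottom bracket to front axle, plus seat and head tubes -->
    <path d="M45 85 L100 85 L155 85 M100 85 L80 45 M100 85 L135 50"
          fill="none" stroke="black"/>
    <!-- crank and pedals, grouped and translated to the bottom bracket at
         (100,85): a point that has to be computed from the wheel centers,
         not guessed independently -->
    <g transform="translate(100,85)">
      <line x1="-12" y1="12" x2="12" y2="-12" stroke="black"/>
      <rect x="8" y="-15" width="12" height="5" fill="black"/>
      <rect x="-20" y="10" width="12" height="5" fill="black"/>
    </g>
  </svg>

Writing this, you have to track where (100,85) sits relative to everything else without ever seeing the render, which is the blind-sculptor constraint from the first paragraph.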


Yeah, spatial reasoning has been a weak spot for LLMs. I’m actually building a new code exercise for my company right now where the candidate is allowed to use any AI they want, but it involves spatial reasoning. I ran Opus 4.6 and Codex 5.3 (xhigh) on it and both came back with passable answers, but doing it by hand I was able to double their score.

It’ll be interesting to see what happens if a candidate ever shows up and wants to use Deep Think. Might blow right through my exercise.



