I like the idea but, having built many UIs for many different applications over the years, I doubt it would work for anything non-trivial. The thing is, as soon as your UI reaches a certain level of complexity you need tight control over the look and feel, especially over the positioning of UI components relative to each other. Unfortunately, the latter depends a lot on how exactly your UI components look and behave.
For example, your text input box's label might be displayed to the left of the box, above it, or it might be one of those floating labels that are displayed inside the box but move to the top border as soon as you focus the box. Depending on that, you would choose the spacing to neighboring text boxes and buttons differently, and maybe also arrange things differently altogether, because each variant needs a different amount of space.
In other words: Your application would need to know what exactly the Wayland "UI layer" is going to display in order to send it precise instructions. But then your abstraction is very leaky.
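To make the leakiness concrete, here is a toy sketch of the layout math an application would have to do. Everything in it is hypothetical: the `LabelStyle` names, the pixel sizes and the `field_size`/`form_height` helpers are made up for illustration and aren't part of Wayland or any real toolkit. The point is only that the size of a form depends on how the label is drawn.

```python
from enum import Enum

class LabelStyle(Enum):
    LEFT = "left"          # label sits to the left of the box
    ABOVE = "above"        # label sits above the box
    FLOATING = "floating"  # label inside the box, moves to the top border on focus

# Hypothetical pixel sizes, chosen only for illustration.
BOX_WIDTH, BOX_HEIGHT = 200, 32
LABEL_WIDTH, LABEL_HEIGHT = 100, 18
GAP = 8  # space between a label and its box

def field_size(style: LabelStyle) -> tuple[int, int]:
    """Return (width, height) occupied by one labelled text field."""
    if style is LabelStyle.LEFT:
        return (LABEL_WIDTH + GAP + BOX_WIDTH, BOX_HEIGHT)
    if style is LabelStyle.ABOVE:
        return (BOX_WIDTH, LABEL_HEIGHT + GAP + BOX_HEIGHT)
    # Floating label overlaps the top border, so reserve half its height above the box.
    return (BOX_WIDTH, LABEL_HEIGHT // 2 + BOX_HEIGHT)

def form_height(style: LabelStyle, n_fields: int, spacing: int) -> int:
    """Total height of n_fields stacked vertically with `spacing` px between them."""
    _, h = field_size(style)
    return n_fields * h + (n_fields - 1) * spacing

for style in LabelStyle:
    print(style.value, field_size(style), form_height(style, 3, 12))
```

With three fields and 12 px of spacing, the same form comes out 120, 198 or 147 px tall depending on the label style. An application that only says "three text fields, stacked" can't pick sensible spacing or decide whether everything fits without knowing which style the UI layer will render.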