We are stuck in the old paradigm of characters taking up space on the screen and the idea that a markup language must support a classic dumb TUI. Just imagine: if we reserved some Unicode range of control characters for semantic markup and standardized the UX for it, we wouldn’t need to use normal characters as delimiters and escape them in strings.
The following would have parseable structure, but would be free of visual noise.
Title: Markup languages: decades of going in the wrong direction
Keywords: hypertext,
delimiters,
ˋ, ", \
People have suggested using control characters for CSV-style structured files. The problem is that they are impossible to edit.
Control characters are invisible, using them means changing text editors to display them. They are also, outside the usual ones, hard to type. ASCII ones have Ctrl combos, but editors used those for other things.
Also, what is the difference between using some new character to start block and "{" or "\n"? Why have a new thing to indicate a new level when we already have space and tab?
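For reference, the CSV suggestion mentioned above usually means the two ASCII control codes reserved for this job: unit separator (0x1F) and record separator (0x1E). A minimal sketch, assuming fields are plain strings; the function names `dump_rows` and `load_rows` are made up for illustration:

```python
# ASCII reserves these two control codes for exactly this purpose.
UNIT_SEP = "\x1f"    # US: separates fields within a record
RECORD_SEP = "\x1e"  # RS: separates records


def dump_rows(rows):
    """Serialize rows of strings. Fields must not contain US or RS."""
    for row in rows:
        for field in row:
            if UNIT_SEP in field or RECORD_SEP in field:
                raise ValueError("field contains a separator character")
    return RECORD_SEP.join(UNIT_SEP.join(row) for row in rows)


def load_rows(data):
    """Inverse of dump_rows for non-empty input."""
    if data == "":
        return []
    return [record.split(UNIT_SEP) for record in data.split(RECORD_SEP)]


# Commas, quotes and newlines need no escaping at all.
rows = [["name", "quote"], ["Ann", 'She said "hi, there"\nand left']]
assert load_rows(dump_rows(rows)) == rows
```

The parsing side is trivial; the objections about visibility and typing are about everything that happens before the file reaches this code.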
> Control characters are invisible, using them means changing text editors to display them. They are also, outside the usual ones, hard to type. ASCII ones have Ctrl combos, but editors used those for other things.
Yes. A change of paradigm does require a change of tooling. If some legacy tool doesn’t support a new format, that’s not a good reason to avoid new technology - either the tool evolves or a replacement emerges, and typing won’t be a problem. Classic formatting shortcuts from rich text editors (Ctrl-B etc.) can be repurposed, for example.
>Also, what is the difference between using some new character to start block and "{" or "\n"?
Any such delimiter has other uses in text. Dual use means extra ceremony with escaping and extra complexity. Whitespace as a delimiter has especially bad UX, because most editors don’t understand its semantics and it is very easy to make mistakes.
This case is much rarer than escaping quotes or whitespace. It will happen only if the content of the block contains unsanitized input. In that case a control character for escaping will help, or, if you can have a 2x range for control characters, you can use one bit for escaping. E.g. 0x1-0x7 - delimiters, 0x8-0xF - escaped delimiters.
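A rough sketch of the one-bit escaping idea, to make it concrete. Everything here is an assumption for illustration: the offsets 0x1-0x7 are placed in a Private Use Area block starting at U+E000 as a stand-in for a standardized range (raw 0x9 and 0xA would collide with tab and newline), and the names `BASE`, `escape` and `unescape` are hypothetical. Offset 0x8 has no delimiter counterpart, so this sketch leaves it unused.

```python
BASE = 0xE000      # hypothetical: stand-in for a standardized control range
ESCAPE_BIT = 0x8   # the single bit that marks a delimiter as escaped


def is_delimiter(ch):
    """True for the live delimiters at offsets 0x1-0x7."""
    return 0x1 <= ord(ch) - BASE <= 0x7


def is_escaped(ch):
    """True for escaped delimiters at offsets 0x9-0xF (0x8 is unused here)."""
    return 0x9 <= ord(ch) - BASE <= 0xF


def escape(text):
    """Set the escape bit on every delimiter found in untrusted content."""
    if any(is_escaped(ch) for ch in text):
        # Accepting these would make unescape() ambiguous.
        raise ValueError("input already contains escaped delimiters")
    return "".join(
        chr(ord(ch) | ESCAPE_BIT) if is_delimiter(ch) else ch for ch in text
    )


def unescape(text):
    """Clear the escape bit, restoring the original delimiter characters."""
    return "".join(
        chr(ord(ch) & ~ESCAPE_BIT) if is_escaped(ch) else ch for ch in text
    )


FIELD = chr(BASE + 0x1)  # hypothetical "field separator" delimiter
untrusted = "a" + FIELD + "b"
record = FIELD.join([escape(untrusted), "c"])
fields = [unescape(f) for f in record.split(FIELD)]
assert fields == [untrusted, "c"]
```

Because an escaped delimiter is a different code point, the parser can still split on the live delimiter without scanning for a preceding escape character.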
Markup languages should support TUI / plain terminal, because many people still use that as their IDE. If I can’t pipe a file around to standard *nix tooling, it’s not a good format.
You can pipe a file with Unicode control characters. If your terminal supports Unicode (it must), it can even display those control characters (e.g. as small curly braces) or choose another form of presenting the text. Markup languages do not have to support every legacy terminal - all new tech requires users to upgrade at some point.
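As one possible illustration of the display side, here is a small sketch that makes ASCII control characters visible by mapping them to the Unicode Control Pictures block (U+2400 onward). That is a different presentation than the curly braces suggested above, chosen only because the block already exists; the function name `show_controls` is made up.

```python
def show_controls(text):
    """Replace C0 control characters with their Unicode Control Pictures.

    Tab, newline and carriage return are left alone so the layout survives.
    """
    keep = {"\t", "\n", "\r"}
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 and ch not in keep:
            out.append(chr(0x2400 + code))  # e.g. 0x1F -> U+241F
        elif code == 0x7F:
            out.append("\u2421")            # DEL has its own picture
        else:
            out.append(ch)
    return "".join(out)


# Unit and record separators become visible glyphs.
assert show_controls("a\x1fb\x1ec\n") == "a\u241fb\u241ec\n"
```

Wrapped in a stdin-to-stdout loop, something like this could sit at the end of a pipeline the same way `cat -v` does.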