I wonder why anyone would want to disassemble an editor these days...
There is no mystery to it in principle. If you want to study an editor's source code, take Scintilla. It is a straightforward implementation, but it works well enough.
As for syntax highlighting...
You can use either regexps or, better, predefined tokenizers.
Pretty much all programming languages share the concepts of NMTOKENs (keywords, variable names), string literals, numbers and comments. You can write very fast tokenizers/scanners that split the text into these basic blocks. If needed, you can then pass individual blocks to a regexp for further processing. That keeps the whole thing very fast.
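Here is a minimal sketch of that two-stage approach in Python. It assumes a hypothetical C-like language with `//` and `/* */` comments and single- or double-quoted strings with backslash escapes. The names `scan`, `highlight`, `KEYWORDS` and `TODO_RE` are illustrative and do not come from Scintilla or any other editor. A hand-written scanner produces the basic blocks in one pass, and a regexp only runs inside the blocks that need it (here: looking for TODO/FIXME inside comments).

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str   # "name", "number", "string", "comment", "space" or "punct"
    start: int  # offset of the first character
    end: int    # offset one past the last character


def scan(src: str) -> Iterator[Token]:
    """Split src into basic blocks in a single left-to-right pass."""
    i, n = 0, len(src)
    while i < n:
        c, start = src[i], i
        if c.isspace():
            while i < n and src[i].isspace():
                i += 1
            kind = "space"
        elif c.isalpha() or c == "_":
            # NMTOKEN: keyword or identifier, told apart in the second stage
            while i < n and (src[i].isalnum() or src[i] == "_"):
                i += 1
            kind = "name"
        elif c.isdigit():
            # Loose number rule: digits followed by letters/digits/dots (0x1F, 1.5e3)
            while i < n and (src[i].isalnum() or src[i] == "."):
                i += 1
            kind = "number"
        elif c in "\"'":
            i += 1
            while i < n and src[i] != c:
                i += 2 if src[i] == "\\" else 1  # skip escaped characters
            i = min(i + 1, n)  # consume closing quote; tolerates unterminated strings
            kind = "string"
        elif src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end == -1 else end
            kind = "comment"
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
            kind = "comment"
        else:
            i += 1
            kind = "punct"
        yield Token(kind, start, i)


# Hypothetical keyword set and second-stage regexp, for illustration only.
KEYWORDS = {"if", "else", "while", "for", "return", "int", "char"}
TODO_RE = re.compile(r"\b(?:TODO|FIXME)\b")


def highlight(src: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, style) spans; whitespace gets no span."""
    spans = []
    for tok in scan(src):
        if tok.kind == "space":
            continue
        if tok.kind == "name":
            text = src[tok.start:tok.end]
            style = "keyword" if text in KEYWORDS else "identifier"
        else:
            style = tok.kind
        spans.append((tok.start, tok.end, style))
        if tok.kind == "comment":
            # Second stage: the regexp only looks at this one comment block
            for m in TODO_RE.finditer(src, tok.start, tok.end):
                spans.append((m.start(), m.end(), "todo"))
    return spans
```

The scanner never backtracks, so its cost is linear in the input; the more expensive regexp work is confined to the few blocks that actually need it.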
If you look at the article, you'll see that I wrote a full clone of Sublime Text's syntax highlighter before even opening a disassembler. I do in fact know how to do syntax highlighting.
And yes, I read a lot of Scintilla's code years ago.
I spent a couple of hours reading the disassembly purely out of curiosity, and to find exactly the kind of hidden goodies I ended up finding.
For markup languages, use a separate tokenizer. Almost 10 years ago I wrote this one: https://www.codeproject.com/Articles/14076/Fast-and-Compact-... ; it handles tens of megabytes of XML/HTML per second...
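Below is a minimal sketch of what a separate markup tokenizer can look like, in Python. This is not the code from the CodeProject article linked above. `scan_markup`, `MToken` and its token kinds are hypothetical names. The sketch assumes roughly well-formed input and ignores CDATA, entities and raw-text elements such as `<script>`. Like the scanner for programming languages, it walks the text once with plain character tests and uses no regexps.

```python
from typing import Iterator, NamedTuple


class MToken(NamedTuple):
    kind: str                  # "text", "comment", "start" or "end"
    name: str                  # tag name for "start"/"end", else ""
    value: str                 # text or comment body, else ""
    attrs: tuple = ()          # ((name, value), ...) for "start" tags
    self_closing: bool = False  # True for <br/>-style tags


def _scan_name(src: str, i: int):
    """Read a tag or attribute name; stops at whitespace, '/', '>' or '='."""
    start = i
    while i < len(src) and not src[i].isspace() and src[i] not in "/>=":
        i += 1
    return src[start:i], i


def _scan_start_tag(src: str, i: int):
    """Parse the rest of a start tag; i points just past '<'."""
    n = len(src)
    name, i = _scan_name(src, i)
    attrs = []
    while i < n:
        while i < n and src[i].isspace():
            i += 1
        if i >= n:
            break
        if src[i] == ">":
            return i + 1, MToken("start", name, "", tuple(attrs), False)
        if src.startswith("/>", i):
            return i + 2, MToken("start", name, "", tuple(attrs), True)
        attr, i = _scan_name(src, i)
        if not attr:
            i += 1  # stray '/' or '=': skip it so the loop always advances
            continue
        value = ""
        if i < n and src[i] == "=":
            i += 1
            if i < n and src[i] in "\"'":
                end = src.find(src[i], i + 1)
                end = n if end == -1 else end
                value = src[i + 1:end]
                i = min(end + 1, n)
            else:
                start = i  # unquoted value runs to whitespace or '>'
                while i < n and not src[i].isspace() and src[i] != ">":
                    i += 1
                value = src[start:i]
        attrs.append((attr, value))
    return i, MToken("start", name, "", tuple(attrs), False)  # unterminated tag


def scan_markup(src: str) -> Iterator[MToken]:
    """Single pass over XML/HTML-like text."""
    i, n = 0, len(src)
    while i < n:
        if src.startswith("<!--", i):
            end = src.find("-->", i + 4)
            yield MToken("comment", "", src[i + 4:n if end == -1 else end])
            i = n if end == -1 else end + 3
        elif src.startswith("</", i):
            end = src.find(">", i + 2)
            yield MToken("end", src[i + 2:n if end == -1 else end].strip(), "")
            i = n if end == -1 else end + 1
        elif src[i] == "<":
            i, tok = _scan_start_tag(src, i + 1)
            yield tok
        else:
            end = src.find("<", i)
            stop = n if end == -1 else end
            yield MToken("text", "", src[i:stop])
            i = stop
```

For example, `list(scan_markup('<p class="x">Hi<br/></p>'))` yields a `start` token for `p` carrying `(("class", "x"),)`, a `text` token `"Hi"`, a self-closing `start` token for `br` and an `end` token for `p`.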