
I feel very much the same.

When I want to parse something, I know all I need to do is turn it into a bunch of tokens and then loop over them, repeatedly reducing until nothing can be reduced any further; that lets me do infix and all of that good stuff.
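
In code, that loop looks roughly like this. A minimal Python sketch (the precedence table, token format, and function names are just assumptions for illustration):

    import re

    PRECEDENCE = {'*': 2, '/': 2, '+': 1, '-': 1}

    def tokenize(src):
        # Numbers and single-character binary operators only, for brevity.
        return [t if t in PRECEDENCE else float(t)
                for t in re.findall(r'\d+(?:\.\d+)?|[*/+-]', src)]

    def evaluate(tokens):
        tokens = list(tokens)
        # Keep reducing the highest-precedence operator until the token
        # list can't be reduced any further.
        while len(tokens) > 1:
            i = max(range(1, len(tokens), 2), key=lambda j: PRECEDENCE[tokens[j]])
            lhs, op, rhs = tokens[i - 1], tokens[i], tokens[i + 1]
            value = {'+': lhs + rhs, '-': lhs - rhs,
                     '*': lhs * rhs, '/': lhs / rhs}[op]
            tokens[i - 1:i + 2] = [value]
        return tokens[0]

    print(evaluate(tokenize("1 + 2 * 3 - 4")))  # 3.0

Ties go to the leftmost operator, so + and - come out left-associative.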

I don't quite understand how to bridge the gap between that understanding and all the theory.



When we taught parsing (formal language theory) we generally did so using a "Chomskian" hierarchy. In a trivial sense it would be something like: regex (finite automata) -> context-free (pushdown automata; CYK; LL(1); LL(k)) -> Turing-reduction crap.

I think it did a huge disservice to our students. It completely elided probably the three most important strategies for parsing "in practice", in order: recursive descent (RD), Pratt, and Earley. The time it spent on algorithmic reduction was insufficient to teach students the practice of translating algorithms into simpler but more efficient forms (parametric algorithm design).

I think it's because there's not a lot of formal "meat" on RD, Pratt, and Earley: the first two are engineering solutions, and the proofs for the last are too difficult for a lower-division course.

Anyways, learn Pratt and RD. That's all you need, in practice.
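
If it helps: a Pratt parser for infix expressions really is tiny. A minimal Python sketch (the binding powers, token format, and names are assumptions for illustration, not a canonical implementation):

    import re

    BINDING_POWER = {'+': 10, '-': 10, '*': 20, '/': 20}

    def tokenize(src):
        return re.findall(r'\d+|[*/+()-]', src)

    def parse(tokens, min_bp=0):
        # Prefix position: a number or a parenthesized sub-expression.
        token = tokens.pop(0)
        if token == '(':
            lhs = parse(tokens, 0)
            tokens.pop(0)  # consume the closing ')'
        else:
            lhs = int(token)
        # Infix loop: fold in operators that bind more tightly than min_bp;
        # recursing with the operator's own power handles precedence.
        while tokens and tokens[0] in BINDING_POWER and BINDING_POWER[tokens[0]] > min_bp:
            op = tokens.pop(0)
            rhs = parse(tokens, BINDING_POWER[op])
            lhs = (op, lhs, rhs)  # build an AST node
        return lhs

    print(parse(tokenize("1 + 2 * (3 - 4)")))
    # ('+', 1, ('*', 2, ('-', 3, 4)))

The strict > comparison makes every operator here left-associative.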


Your TL;DR is true. But this is where the beauty of the theory shows up: as you go further, you start seeing different levels of complexity, ambiguity, and practical considerations appear in grammars and their parsing!


Also error handling



