Thanks for the write-up. We're taking a similar approach at my startup to classify business reviews. The breakthrough for us came when we split the reviews into sentences and did N-gram analysis at the sentence level. The challenge is that the most significant N-grams (those with N > 2) occur so rarely that there isn't much data to train on. Our current approach is to try to coax patterns out of the N-grams (e.g. "salesman was rude" and "manager was mean" both become "[employee]=[negative]"). I do like the top 5 approach, and I'll see if I can work it into ours.
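
For anyone curious, here is a rough sketch of the sentence-level N-gram idea with pattern abstraction. It is not our production code: the word lists (`EMPLOYEE`, `NEGATIVE`), the function names, and the naive sentence splitter are all made up for illustration, and a real system would likely want a proper tokenizer and much larger lexicons.

```python
import re
from collections import Counter

# Hypothetical lexicons, invented for this example only.
EMPLOYEE = {"salesman", "manager", "cashier", "waiter"}
NEGATIVE = {"rude", "mean", "unhelpful"}


def split_sentences(review):
    # Naive splitter on ., ! and ?; an assumption, not a robust approach.
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def tokenize(sentence):
    # Lowercase and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", sentence.lower())


def abstract(token):
    # Replace a specific word with its category tag, if it has one.
    if token in EMPLOYEE:
        return "[employee]"
    if token in NEGATIVE:
        return "[negative]"
    return token


def ngrams(tokens, n):
    # All contiguous runs of n tokens, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_patterns(reviews, n=3):
    # Count abstracted N-grams, never crossing a sentence boundary.
    counts = Counter()
    for review in reviews:
        for sentence in split_sentences(review):
            tokens = [abstract(t) for t in tokenize(sentence)]
            counts.update(ngrams(tokens, n))
    return counts


reviews = [
    "The salesman was rude. Great prices though!",
    "Manager was mean. I will not come back.",
]
counts = count_patterns(reviews)
print(counts[("[employee]", "was", "[negative]")])  # 2
```

The point of abstracting before counting is that "salesman was rude" and "manager was mean" each appear once on their own, but collapse into a single pattern that appears twice, which helps a bit with the sparsity problem for N > 2.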