1. The implementation of Stroke Width Transform is not super good. So far, http://libccv.org/ has the best implementation of SWT. But again, you can neither make the head nor the tail of that implementation.
2. There are just too many false text regions and the text detection accuracy is no where near what you can call good. A mixed use of multiple OCR engines might give better results.
All that said, you can't take away the cleverness of the application of detecting text. Mind == Blown, on that area.
I actually modeled my implementation after libccv's implementation. Part of what libccv seems to do is to run it multiple times at different scales, which isn't something that's very computationally feasable for a pure javascript implementation. My implementation has a second stage color filter which refines the SWT (this is something of a tradeoff that improves accuracy for machine-generated text and reduces accuracy for natural scenes, and I'm under the impression that the corpus used by SWT focuses on the latter).
Ocrad is being used as the default because it runs locally and it's small enough that it's easy to ship with. The remote OCR engine uses Tesseract which gets much closer to acceptable in a lot of circumstances.
But there is a lot of work which can be done to improve it. I have a friend who constantly nags me for not having a solid test corpus to run regression analysis/parameter tuning/science. Certainly it lacks the rigor of an academic and scientific endeavor, but I've always imagined this as a sort of advanced proof of concept. I think the application of transparent and automatic computer vision, deserves to be part of the interaction paradigm for the next generation of operating systems and browsers.
1. The implementation of Stroke Width Transform is not super good. So far, http://libccv.org/ has the best implementation of SWT. But again, you can neither make the head nor the tail of that implementation.
2. There are just too many false text regions and the text detection accuracy is no where near what you can call good. A mixed use of multiple OCR engines might give better results.
All that said, you can't take away the cleverness of the application of detecting text. Mind == Blown, on that area.