@antimatter15, i have a project that does client-side image analysis and decomposes document structures. it looks like your OCR code would be a great replacement for the server-side Tesseract OCR i currently use :)

here's what the project does now with js + web workers:

http://i.imgur.com/QvXSkY2.png

processing time is < 1500ms in Chrome and < 2000ms in FF

the code is open source, though using it isn't polished yet. i'm slowly working on a blog post series detailing how to use the lib(s): https://github.com/leeoniya/pXY.js

a walkthrough of the base lib is here: http://o-0.me/pXY/



The OCR code is an Emscripten port of the GPL-licensed Ocrad program. I published it on GitHub a few months ago: http://antimatter15.github.io/ocrad.js/demo.html

But in my experience, the recognition quality isn't good enough to replace Tesseract if you have that capability.


it would be very useful to maybe use just part of the code (the part that detects where there is text, rather than what the text is).



