AI-Powered Document Processing and Workflow Automation
Kat Manalac
Nanonets OCR — Intelligent text extraction using OCR and deep learning
Transform unstructured, human-readable text into structured and validated data, using OCR and deep learning to extract relevant information and key fields. Digitize everything from documents and PDFs to number plates and utility meters.
Replies
Saransh Sinha
This looks great! I'm assuming that I'll need to set up the model with a set of my pre-existing formatted documents? Is there a minimum number that's needed?
Rushabh Nagda
@screenshake Thanks! 50 documents and you're good to go!
Prathamesh Juvatkar
Hello fellow hunters,
Thank you for stopping by to have a look at Nanonets' OCR product. I'm one of the co-founders of Nanonets and would like to give a quick overview of it. We set out to simplify OCR integration into your product, especially for automating manual data entry and validation processes in your pipelines. Through this integration, users can easily build production-ready OCR models. To give you a little background, Nanonets is a machine learning API that lets developers integrate cutting-edge ML into their products.
Let me give you a quick walkthrough of this feature.
1. Assume you have a large number of invoices generated every day, and an entire team dedicated to digitizing these images and extracting key fields from them.
2. With Nanonets, you can upload these images and teach your model what to look for. For example, for invoices you can build a model to extract the product names and prices.
3. Once your annotations are done and your model is built, integrating it is as easy as copying 2 lines of code :)
I would urge you to take a look at the product webpage. We have built the product with a lot of passion and would love to have your feedback on it. Happy to answer any questions.
Prathamesh
darius vasefi
Sounds like a good product, though I'd have to see how the ML part actually contributes to quality. I have an active project that needs this. We’re taking proposals if you’re interested.
Prathamesh Juvatkar
@dariusvasefi Interested for sure! Can you share an email address where we can talk in more detail about the proposal?
Alfonso C. Betancort
The most important questions, which I have not seen addressed and which are a must, are: what is the accuracy (in %) of the OCR output before and after training, and what happens when there’s a sudden change in the placement of the fields that was not part of the training set? There’s already plenty of software that addresses the same problem (some use AI, others use a different approach), but what makes all of them unusable in real-world scenarios, where data coherence is critical, is the failure rate, which forces you to outsource the manual review of all OCR output to contractors in third world countries (as expensive as having the contractor enter the whole dataset).
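To make the accuracy question above concrete, here is a minimal sketch of how field-level accuracy and the share of documents needing manual review could be measured against a hand-labeled set. The function names and the dictionary-per-document layout are hypothetical illustrations, not part of any Nanonets API, and exact string matching is an assumed (deliberately strict) comparison rule.

```python
from typing import Dict, List

Doc = Dict[str, str]  # hypothetical layout: field name -> extracted value


def field_accuracy(predicted: List[Doc], truth: List[Doc]) -> float:
    """Fraction of ground-truth fields whose extracted value matches exactly."""
    total = 0
    correct = 0
    for pred_doc, true_doc in zip(predicted, truth):
        for field, expected in true_doc.items():
            total += 1
            # A missing field counts as an error.
            if pred_doc.get(field, "").strip() == expected.strip():
                correct += 1
    return correct / total if total else 0.0


def review_rate(predicted: List[Doc], truth: List[Doc]) -> float:
    """Fraction of documents with at least one wrong or missing field."""
    if not truth:
        return 0.0
    flagged = 0
    for pred_doc, true_doc in zip(predicted, truth):
        if any(pred_doc.get(f, "").strip() != v.strip() for f, v in true_doc.items()):
            flagged += 1
    return flagged / len(truth)


# Made-up example data: one wrong field out of four.
truth = [{"total": "10.00", "vendor": "Acme"}, {"total": "5.50", "vendor": "Globex"}]
predicted = [{"total": "10.00", "vendor": "Acme"}, {"total": "5.60", "vendor": "Globex"}]

print(field_accuracy(predicted, truth))  # 0.75
print(review_rate(predicted, truth))     # 0.5
```

In this made-up data a single wrong field out of four already flags half of the documents for review, which is the gap between per-field accuracy and real review cost that the comment points at. Running the same measurement on a set of documents with shifted field placement would cover the second half of the question.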
Evgeny Pozdeev
Great job! The product looks awesome. On the landing page you have examples of documents in English and Czech. Does Nanonets work with Latin text only?
Rushabh Nagda
@yevgeniy_pozdeyev Hey, glad you liked it! Nanonets works with most languages, not only those in the Latin script. For example, we support Mandarin and Japanese characters as well.
Yash Agarwal
Isn't this template specific again? Or have you generalised it?
Rushabh Nagda
@yash_agarwal8 Hey, it isn't template specific. So if you have, say, 50 sets of different document types containing similar data, we're able to pull it out for you. Hope this helps.
shikhar khanna
Does this also work for handwritten documents?
Rushabh Nagda
@shikhar_khanna2 Hey Shikhar, that's a great question. Given enough examples, we're definitely able to make it work on handwritten text.
Anup Surana
Looks promising!! Are these ready-to-use APIs, or do you always use custom models?
Rushabh Nagda
@anup_surana Thanks! Currently you build your own custom models with a handful of your data. We've seen that one-size-fits-all models don't work out too well.
Earl Tate
Hey, great product. How long does it take to train a model after uploading the images?
Rushabh Nagda
@earlctate It generally takes 30 minutes to 3 hours. Currently, we're really backed up due to the PH traffic :)
Michael Kirk
Can Nanonets OCR handle line item extraction on invoices where: i. there are sub-headers that need to have their child line items nested beneath them? ii. the table sometimes has no bounding lines or column separators? iii. tables shift between pages on longer invoices?
Prathamesh Juvatkar
@michaelmkirk We do handle i. and ii. with our deep learning model. For table shifts, there is custom logic to merge multiple pages. But for that, you need to submit the whole invoice as a PDF so we get all pages together.
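As a rough illustration of what merging pages and nesting line items under sub-headers can involve, here is a minimal sketch. It is not Nanonets' actual logic: the row layout (`kind` and `text` keys) and the function name `merge_and_nest` are hypothetical, and it assumes rows have already been extracted per page in reading order.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]  # hypothetical: {"kind": "subheader" | "item", "text": ...}


def merge_and_nest(pages: List[List[Row]]) -> List[dict]:
    """Concatenate rows across pages and group items under the latest sub-header.

    The current sub-header is carried across page boundaries, so a table that
    continues on the next page keeps filling the same group.
    """
    groups: List[dict] = []
    current: Optional[dict] = None
    for page in pages:
        for row in page:
            if row["kind"] == "subheader":
                current = {"header": row["text"], "items": []}
                groups.append(current)
            else:
                if current is None:
                    # Items that appear before any sub-header get an unnamed group.
                    current = {"header": None, "items": []}
                    groups.append(current)
                current["items"].append(row["text"])
    return groups


# Made-up two-page invoice: the "Labor" table continues onto page 2.
pages = [
    [{"kind": "subheader", "text": "Labor"}, {"kind": "item", "text": "Install"}],
    [
        {"kind": "item", "text": "Testing"},
        {"kind": "subheader", "text": "Parts"},
        {"kind": "item", "text": "Cable"},
    ],
]

print(merge_and_nest(pages))
```

Because the grouping state has to survive page breaks, this kind of logic only works when all pages of an invoice arrive together, which matches the requirement above to submit the whole invoice as one PDF.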
Artem Galenko
Just checked your app! Awesome! Good luck!
Rushabh Nagda
Pramod Kumar
Awesome! Does it work for any specific file format or any image?
Prathamesh Juvatkar
@pramod_kk It works for most image types. For a few document digitization customers, we have processed PDFs as well. Are you looking for support for a specific file format?
Surya Rasp
Can we use it for rotated text?