Nanonets OCR : p/nanonets | Product Hunt

p/nanonets AI-Powered Document Processing and Workflow Automation

Nanonets OCR - Intelligent text extraction using OCR and deep learning

by

YC Female Founders Holiday Gift Guide 2017

5yr ago

Transform unstructured, human-readable text into structured and validated data using OCR + Deep Learning to extract relevant information and key fields. Digitize everything from documents and PDFs to number plates and utility meters.

Replies

Looks promising! Are these ready-to-use APIs, or do you always use custom models?

5yr ago

Nanonets

Maker

@anup_surana Thanks! Currently you build your own custom models with a small amount of your own data. We've seen that one-size-fits-all models don't work out too well.

5yr ago

Prathamesh Juvatkar

Nanonets

Maker

Hello fellow hunters,

Thank you for stopping by to have a look at Nanonets' OCR product. I'm one of the co-founders of Nanonets and I would like to give a quick overview of our OCR product. We set out to simplify OCR integration into your product, especially for automating manual data entry and validation processes in your pipelines. Through this integration, users can easily build production-ready OCR models. To give you a little bit of background, Nanonets is a machine learning API for developers to integrate cutting-edge ML into their products.

Let me give you a quick walkthrough of this feature.
1. Assume you have a large number of invoices that are generated every day. You have an entire team dedicated to digitizing and extracting key fields from these images.
2. With Nanonets, you can upload these images and teach your model what to look for. For example, in invoices, you can build a model to extract the product names and prices.
3. Once your annotations are done and your model is built, integrating it is as easy as copying 2 lines of code :)

I would urge you to take a look at the product webpage. We have built the product with a lot of passion and would love to have your feedback on it. Happy to answer any questions.

Prathamesh

5yr ago

UDAPTOR

Great job! The product looks awesome. On the landing page you have examples of documents in English and Czech. Does Nanonets work with Latin text only?

5yr ago

Nanonets

Maker

@yevgeniy_pozdeyev Hey, glad you liked it! Nanonets works with most languages, not only the Latin script. For example, we support Mandarin and Japanese characters as well.

5yr ago

HOMERUN

This looks great! I'm assuming that I'll need to set up the model with a set of my pre-existing formatted documents? Is there a minimum number that's needed?

5yr ago

Nanonets

Maker

@screenshake Thanks! 50 documents and you're good to go!

5yr ago

Hey, great product. How long does it take to train a model after uploading the images?

5yr ago

Nanonets

Maker

@earlctate It generally takes 30 minutes to 3 hours. Currently, we're really backed up due to the PH traffic :)

5yr ago

Isn't this template-specific again? Or have you generalised it?

5yr ago

Nanonets

Maker

@yash_agarwal8 Hey, it isn't template-specific. So if you have, say, 50 sets of different document types containing similar data, we're able to pull that data out for you. Hope this helps.

5yr ago

Awesome! Does it work for any specific file format or any image?

5yr ago

Prathamesh Juvatkar

Nanonets

Maker

@pramod_kk It works for most image types. For a few document digitization customers, we have processed PDFs as well. Are you looking for support for a specific file format?

5yr ago

Does this also work for handwritten documents?

5yr ago

Nanonets

Maker

@shikhar_khanna2 Hey Shikhar, that's a great question. Given enough examples, we're definitely able to make it work on handwritten text.

5yr ago

Atlassian

Just checked your app! Awesome! Good luck!

5yr ago

Nanonets

Maker

@unrealartemg Thanks!

5yr ago

Alfonso C. Betancort

The most important questions, which I have not seen addressed and which are a must, are: what is the accuracy (in %) of the OCR output before and after training, and what happens when there’s a sudden change in the placement of fields that was not part of the training set? There’s already plenty of software that addresses the same problem (some use AI, others use a different approach), but what makes all of them unusable in real-world scenarios, where data coherence is critical, is the failure rate, which forces you to outsource the manual review of all OCR output to contractors in third-world countries (as expensive as having the contractor enter the whole dataset).

5yr ago

Can Nanonets OCR handle line item extraction on invoices that: i. have sub-headers whose child line items need to be nested beneath them? ii. sometimes have no bounding lines around the table or column separators? iii. have tables that shift between pages on longer invoices?

5yr ago

Prathamesh Juvatkar

Nanonets

Maker

@michaelmkirk We do handle i. and ii. with our deep learning model. For table shifts, there is custom logic to merge multiple pages. But for that, you need to submit the whole invoice as a PDF so we get all pages together.

5yr ago

Sounds like a good product, though I'd have to see how the ML part actually contributes to quality. I have an active project that needs this. We’re taking proposals if you’re interested.

5yr ago

Prathamesh Juvatkar

Nanonets

Maker

@dariusvasefi Interested for sure! Can you share an email address where we can discuss the proposal in more detail?

5yr ago

Can we use it for rotated text?

5yr ago