OCR software for Hindi, Marathi, Gujarati, Tamil, and Sanskrit
Making sense of Indian documents
Our OCR programs for Indian scripts process Devanagari (Hindi, Marathi, Sanskrit), Gujarati, and Tamil texts. Use OCR programs for converting printed books, letters, or newspapers into digital text documents. OCR programs are valuable tools for a modern paperless office, because they help to transform printed content into digital data.
An OCR (optical character recognition) program can be thought of as a “computer typist”: You scan a page of text, and the OCR program takes care of typing the page. After a few seconds, the OCR program has produced a digital and searchable version of the printed Devanagari, Gujarati, or Tamil text. This digital text can be edited with any office program.
Using OCR software makes digitization much more efficient: Digitizing a page of Hindi text takes just a few seconds, and you can concentrate on the content instead of typing the page manually.
OCR software is useful for …
- Publishing houses, data entry companies and libraries: Digitize Hindi or Tamil books and newspapers
- Companies and administration: Create digital text documents from printed business letters, or convert printed records into digital ones
- and, of course, everybody interested in generating digital, computer-readable text documents.
ind.senz OCR programs recognize Devanagari (Hindi, Marathi, Sanskrit), Gujarati, and Tamil documents with high speed and accuracy:
- HindiOCR is designed for typed texts written in Hindi.
- MarathiOCR is designed for typed Marathi texts.
- TamilOCR is designed for printed or typed Tamil texts.
- GujaratiOCR is our latest OCR tool.
- SanskritOCR is suited for anyone who explores the vast Sanskrit literature, and especially the scientific community.
The OCRed digital Hindi texts can be stored as Unicode UTF-8 text, RTF (Rich Text Format), or as PDF files with text under image. You can open them with text editors such as OpenOffice or Microsoft Word®, and work with them as you would with a typed Hindi document.
HindiOCR yields accurate results for most modern Hindi fonts without training. It saves you the time otherwise needed for typing Hindi texts.
MarathiOCR transforms printed Marathi texts into text documents in Devanagari-Unicode encoding. MarathiOCR yields accurate results for a wide range of modern Marathi fonts without training, saving the time otherwise needed to type Devanagari texts.
Digitized texts can be stored in different output formats including plain Unicode UTF-8 or RTF (Rich Text Format), and can be opened with text editors such as OpenOffice or Microsoft Word® for further processing.
TamilOCR yields accurate results for a wide range of modern Tamil fonts without training, saving the time otherwise needed to type Tamil texts.
Digitized Gujarati texts can be stored in different output formats including plain Unicode UTF-8 or RTF (Rich Text Format), and can be opened with text editors such as OpenOffice or Microsoft Word® for further processing.
GujaratiOCR yields accurate results for a wide range of modern fonts without training, and saves the time needed for typing Gujarati texts.
Our SanskritOCR program converts printed Sanskrit texts into computer-readable, editable, and searchable digital documents in Unicode-Devanagari encoding. The recognized Sanskrit text can be stored as plain text, as RTF, or as searchable text-under-image PDF files.
The program has been developed for the scientific community, but is also useful for publishing houses and private users studying Sanskrit.
SanskritOCR contains all features of the professional versions of ind.senz OCR engines. This includes batch processing, full directory OCR, and PDF output.
Starting with version 18.104.22.168, the Sanskrit OCR engine uses new methods for handling unknown Sanskrit words. Nevertheless, due to the complexity of Sanskrit, the accuracy and speed of the program are slightly lower than those of our OCR for Hindi.
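Because the recognized text is stored as Unicode, it can also be searched and checked with a short script. The following is a minimal sketch in Python, assuming the OCR result has been saved as a UTF-8 plain-text file. The function names are hypothetical and are not part of any ind.senz program. The code ranges are the standard Unicode blocks for Devanagari, Gujarati, and Tamil.

```python
import unicodedata
from collections import Counter

# Unicode block ranges for the scripts discussed on this page.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Tamil": (0x0B80, 0x0BFF),
}


def read_ocr_text(path):
    """Read a UTF-8 text file (hypothetical OCR output) and normalize it to NFC."""
    with open(path, encoding="utf-8") as handle:
        return unicodedata.normalize("NFC", handle.read())


def count_scripts(text):
    """Count how many code points fall into each of the known script blocks."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[name] += 1
                break
    return counts


def find_lines(text, term):
    """Return the 1-based numbers of all lines that contain the search term."""
    term = unicodedata.normalize("NFC", term)
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if term in unicodedata.normalize("NFC", line)
    ]


if __name__ == "__main__":
    # Illustrative sample text; in practice this would come from read_ocr_text().
    sample = "Title\nनमस्ते दुनिया\nதமிழ்"
    print(count_scripts(sample))
    print(find_lines(sample, "नमस्ते"))
```

A script-count check of this kind can help to spot a file that was saved in the wrong encoding, since such a file would show few or no characters in the expected block.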