Cuneiform is an OCR system originally developed and open sourced by Cognitive technologies. This project aims to create a fully portable version of Cuneiform.
OCRFeederOCRFeeder is a document layout analysis and optical character recognition system.
Given the images it will automatically outline its contents, distinguish between what's graphics and text and perform OCR over the latter. It generates multiple formats being its main one ODT.
It
... [More] features a complete GTK graphical user interface that allows the users to correct any unrecognized characters, defined or correct bounding boxes, set paragraph styles, clean the input images, import PDFs, save and load the project, export everything to multiple formats, etc. OCRFeeder was developed as the project of the Master's Thesis in Computer Science of Joaquim Rocha. [Less]
hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages
... [More], and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information co-exist in the same file and survives editing and manipulation. hOCR markup is independent of the presentation.
There is a Public Specification for the hOCR Format. [Less]
This site uses cookies to give you the best possible experience.
By using the site, you consent to our use of cookies.
For more information, please see our
Privacy Policy