Moderate Activity
Analyzed about 15 hours ago, based on code collected 2 days ago.

Project Summary

The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available.

Tesseract will read a binary, grey or color image and output text, ALTO, PAGE XML, hOCR or PDF. It can read most common image formats.
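
The image-in, multiple-formats-out behaviour described above can be sketched with a small Python wrapper around the `tesseract` command-line tool. This is a minimal sketch under stated assumptions, not the project's own API: the helper names `build_command` and `run_ocr` are hypothetical, and it assumes the `tesseract` executable is installed along with the `txt`, `hocr`, `pdf` and `alto` config files that select each output format.

```python
import shutil
import subprocess

# Output formats mapped to the Tesseract config-file names that request them.
# Assumption: these config files ship with the installed Tesseract build.
CONFIGS = {"text": "txt", "hocr": "hocr", "pdf": "pdf", "alto": "alto"}


def build_command(image, output_base, formats=("text",), lang="eng"):
    """Return the argument list for a single tesseract invocation.

    The CLI shape is: tesseract IMAGE OUTPUTBASE [options] [configfile...]
    """
    unknown = [f for f in formats if f not in CONFIGS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    configs = [CONFIGS[f] for f in formats]
    return ["tesseract", str(image), str(output_base), "-l", lang, *configs]


def run_ocr(image, output_base, formats=("text",), lang="eng"):
    """Run tesseract once, writing one output file per requested format."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_command(image, output_base, formats, lang), check=True)
```

For example, `run_ocr("scan.png", "scan", formats=("text", "pdf"))` would ask Tesseract to write a plain-text file and a searchable PDF that share the output base name `scan`; the input file name here is only a placeholder.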

Since 2020, the Internet Archive has used Tesseract to extract text from its scanned documents.

Tags

character_recognition document google internetarchive lstm neural_network ocr recognition tesseract

Apache License 2.0

Permitted: Commercial Use, Modify, Distribute, Place Warranty, Sub-License, Private Use, Use Patent Claims

Forbidden: Hold Liable, Use Trademarks

Required: Include Copyright, State Changes, Include License, Include Notice

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

Project Security

Vulnerabilities per Version (last 10 releases)

There are no reported vulnerabilities

Security Confidence Index (scale: poor to favorable security track record)

Vulnerability Exposure Index (scale: many to few reported vulnerabilities)

Languages

HTML: 82%
JavaScript: 8%
Ruby: 5%
12 Other: 5%

30 Day Summary

Jun 30 2025 — Jul 30 2025

12 Month Summary

Jul 30 2024 — Jul 30 2025
  • 159 Commits
    Down 131 (45%) from previous 12 months
  • 18 Contributors
    Down 8 (30%) from previous 12 months

Ratings

6 users rate this project: 3.7/5.0

Static Analysis (Generated by Coverity Scan for tesseract-ocr)

Repository URL: https://github.com/tesseract-ocr/tesseract

Version: 5.5.1-13-g80142024

Last Analyzed: 2025-07-06
Lines of Code Analyzed: 401,255
Defect Density: 1.08

Defects by status for current build

Total defects: 805
Outstanding: 167
Fixed: 614

CWE Top 25 defects

ID  | CWE-Name                              | Number of Defects
190 | Integer Overflow or Wraparound        | 18
676 | Use of Potentially Dangerous Function | 1