tesseract-ocr

Low Activity

Analyzed about 19 hours ago, based on code collected 2 days ago.

Project Summary

The Tesseract OCR engine was one of the top three engines in the 1995 UNLV Accuracy Test. Little work was done on it between 1995 and 2006, but it is probably one of the most accurate open-source OCR engines available.

Tesseract will read a binary, grey or color image and output text, ALTO, PAGE XML, hOCR or PDF. It can read most common image formats.
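
As a rough illustration of the input and output formats listed above, the following Python sketch builds a command line for the `tesseract` executable and runs it. It assumes the CLI is installed and on the PATH, and that the output config names `txt`, `hocr`, `pdf`, `alto` and `page` are available in the installed version (the `page` config in particular is only present in recent releases). The helper names `build_command` and `run_ocr` are hypothetical and not part of Tesseract.

```python
import shutil
import subprocess

# Mapping from a readable format name to the tesseract config name.
# Assumption: these configs ship with the installed tesseract version.
OUTPUT_CONFIGS = {
    "text": "txt",
    "hocr": "hocr",
    "pdf": "pdf",
    "alto": "alto",
    "page": "page",
}


def build_command(image, output_base, formats=("text",), lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG CONFIG..."""
    configs = [OUTPUT_CONFIGS[name] for name in formats]
    return ["tesseract", str(image), str(output_base), "-l", lang, *configs]


def run_ocr(image, output_base, formats=("text",), lang="eng"):
    """Run tesseract; it writes files named OUTPUTBASE plus a per-format extension."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_command(image, output_base, formats, lang), check=True)
```

For example, `run_ocr("scan.png", "scan", formats=("text", "pdf"))` would ask Tesseract to write both a plain-text file and a searchable PDF for the hypothetical image `scan.png`.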

Since 2020 the Internet Archive has used Tesseract to get text for its scanned documents.

In a Nutshell, tesseract-ocr...

has had 9,519 commits made by 510 contributors
representing 4,052,095 lines of code

is mostly written in JavaScript
with an average number of source code comments

has a well established, mature codebase
maintained by a large development team
with decreasing Y-O-Y commits

took an estimated 1,201 years of effort (COCOMO model)
starting with its first commit in March, 2007
ending with its most recent commit 3 days ago
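
The effort estimate above can be approximated with a short calculation. This is a sketch only: it assumes the Basic COCOMO formula in organic mode (effort in person-months = 2.4 × KLOC^1.05), which the page does not state, and the function name is hypothetical. With the line count quoted above it gives a figure in the low 1,200s of person-years, in the same range as the page's 1,201 but not identical, so the site's exact inputs or coefficients may differ.

```python
def basic_cocomo_person_years(lines_of_code, a=2.4, b=1.05):
    """Estimate effort in person-years with Basic COCOMO.

    Assumption: organic-mode coefficients a=2.4 and b=1.05; the result
    of the formula is in person-months, so it is divided by 12.
    """
    kloc = lines_of_code / 1000  # thousands of lines of code
    person_months = a * kloc ** b
    return person_months / 12


estimate = basic_cocomo_person_years(4_052_095)
print(f"{estimate:.0f} person-years")
```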

Quick Reference

Project Links:

Homepage
Documentation
Forums ( 2 Links )
Issue Trackers

Code Locations:

(12 Locations)

Licenses

Apache License 2.0

Permitted: Commercial Use, Modify, Distribute, Place Warranty, Sub-License, Private Use, Use Patent Claims

Forbidden: Hold Liable, Use Trademarks

Required: Include Copyright, State Changes, Include License, Include Notice

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

Project Security

Vulnerabilities per Version ( last 10 releases )

There are no reported vulnerabilities

Project Vulnerability Report

Security Confidence Index: scale from poor to favorable security track record

Vulnerability Exposure Index: scale from many to few reported vulnerabilities

Languages

HTML	82%
JavaScript	8%
Ruby	5%
12 Other	5%

30 Day Summary

Nov 1 2025 — Dec 1 2025

3 Commits
1 Contributor

12 Month Summary

Dec 1 2024 — Dec 1 2025

71 Commits
Down 262 (78%) from previous 12 months
13 Contributors
Down 12 (48%) from previous 12 months
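
The year-over-year figures above can be checked with a little arithmetic: the previous period's count is the current count plus the drop, and the percentage is the drop divided by that previous count. The sketch below uses hypothetical function names and assumes the page truncates percentages to whole numbers, which is what reproduces the 78% shown for commits (262 out of 333 is about 78.7%).

```python
def previous_count(current, drop):
    """Count in the previous period, given the current count and the drop."""
    return current + drop


def percent_drop(current, drop):
    """Drop as a whole-number percentage of the previous period (truncated)."""
    return 100 * drop // previous_count(current, drop)


# Commits: 71 now, down 262 from the previous 12 months.
print(previous_count(71, 262), percent_drop(71, 262))
# Contributors: 13 now, down 12 from the previous 12 months.
print(previous_count(13, 12), percent_drop(13, 12))
```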

Most Recent Contributors

Stefan Weil
Zdenko Podobný
Amit Dovev
0xflotus
Klaus Rettinghaus
David A. Russo

Ratings

6 users rate this project:

3.7/5.0

Static Analysis ( Generated by Coverity Scan for tesseract-ocr )

Repository URL: https://github.com/tesseract-ocr/tesseract

Version: 5.5.1-16-g17b4d9e5

Last Analyzed: 2025-09-17
Lines of Code Analyzed: 401,248
Defect Density: 1.43

Defects by status for current build

Total defects: 807
Outstanding: 167
Fixed: 614

CWE Top 25 defects

ID	CWE-Name	Number of Defects
190	Integer Overflow or Wraparound	21
676	Use of Potentially Dangerous Function	1
