Analyzed over 3 years ago, based on code collected almost 5 years ago.

Project Summary

HITEC is a software package for high-accuracy automatic text categorization. Its engine is an implementation of UFEX (Universal Feature EXtractor) for textual documents. UFEX is the learning method behind HITEC's categorization performance; according to the project, HITEC outperformed its competitors on all document collections investigated. (For further details, read the white paper.)

HITEC applies a supervised learning method: it learns from training data (learning phase) and can then classify new documents into known categories (operational phase). The performance of categorization therefore depends strongly on the quality of the training data. For efficient training, HITEC requires:
- a fixed category system (usually ordered in a hierarchy), into which new, unknown documents will be classified during the operational phase;
- some relevant training documents for each category of the category system.
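The two training requirements above can be checked before any learning starts. The following is a minimal sketch in Python, not HITEC's own code or API: the `CATEGORY_PARENT` mapping, the function names, and the sample documents are all hypothetical, and it assumes that a document labelled with a category also counts as training material for that category's ancestors.

```python
from collections import Counter

# Hypothetical category hierarchy: child -> parent (None marks a root).
CATEGORY_PARENT = {
    "science": None,
    "physics": "science",
    "biology": "science",
}

def path_to_root(category, parent=CATEGORY_PARENT):
    """Return the category followed by its ancestors, up to the root."""
    path = []
    while category is not None:
        path.append(category)
        category = parent[category]
    return path

def categories_lacking_examples(training_docs, parent=CATEGORY_PARENT, minimum=1):
    """List categories that have fewer than `minimum` training documents.

    Assumption: a document labelled with a category also counts
    for every ancestor of that category.
    """
    counts = Counter()
    for _text, label in training_docs:
        for cat in path_to_root(label, parent):
            counts[cat] += 1
    return sorted(c for c in parent if counts[c] < minimum)

# Hypothetical training set of (text, category) pairs.
training_docs = [
    ("Quarks and leptons are elementary particles.", "physics"),
    ("Electrons orbit the nucleus.", "physics"),
]

# Categories that still need training documents before learning can begin.
print(categories_lacking_examples(training_docs))
```

A check like this only confirms that every category has some examples; it says nothing about whether those examples are relevant, which is the quality concern the text raises.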

During operation, HITEC returns, for each unknown document, a list of the most relevant categories ordered by confidence value. The greater this value, the more relevant HITEC deems the corresponding category to the document. The returned list can be further processed depending on the nature of the classification problem. If perfect accuracy is required, an expert can accept, revise, or reject the categories proposed by HITEC. If the accuracy of around 90% observed in tests is sufficient, the proposed categories can be accepted based on their confidence values.
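The post-processing described above (rank by confidence, then either accept automatically or hand over to an expert) might be sketched as follows. This is an illustration only, not HITEC's interface: the function names, the confidence scale from 0 to 1, the `top_k` limit, and the 0.8 acceptance threshold are all assumptions.

```python
def rank_categories(confidences, top_k=3):
    """Return the top_k (category, confidence) pairs, most confident first."""
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

def triage(ranked, accept_threshold=0.8):
    """Split ranked categories into auto-accepted ones and ones for expert review.

    The threshold is a hypothetical value; in practice it would be tuned
    against the accuracy the application needs.
    """
    accepted = [(c, v) for c, v in ranked if v >= accept_threshold]
    review = [(c, v) for c, v in ranked if v < accept_threshold]
    return accepted, review

# Hypothetical confidence values for one unknown document.
confidences = {"physics": 0.93, "biology": 0.41, "chemistry": 0.85, "history": 0.02}

ranked = rank_categories(confidences)
accepted, review = triage(ranked)
print("accepted:", accepted)
print("needs expert review:", review)
```

Raising the threshold sends more proposals to the expert, which trades throughput for accuracy.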

HITEC is implemented efficiently, so its high categorization performance comes with fast operation even on very large document collections. Once HITEC has been trained on a document collection, the operational phase runs in real time (see also the test pages). On an average PC, it can process hundreds of gigabytes in reasonable time during the training phase and work with thousands of categories.

Tags

fulltext-search indexer information_analysis information_retrieval search search_engine

License

GNU General Public License v2.0 or later

Permitted: Commercial Use, Modify, Distribute, Place Warranty
Forbidden: Sub-License, Hold Liable
Required: Distribute Original, Disclose Source, Include Copyright, State Changes, Include License

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

No vulnerabilities have been reported against this project.

Languages

C: 36%
C++: 34%
Autoconf: 16%
11 Other: 14%

30 Day Summary

Apr 3 2013 — May 3 2013

12 Month Summary

May 3 2012 — May 3 2013
  • 0 Commits
    Down -1 (100%) from previous 12 months
  • 0 Contributors
    Down -1 (100%) from previous 12 months
