Tags : Browse Projects

Natural Language Toolkit (NLTK)

  Analyzed 1 day ago

NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data, and documentation for research and development in natural language processing. It supports dozens of NLP tasks and has distributions for Windows, Mac OS X, and Linux.

234K lines of code

42 current contributors

2 months since last commit

45 users on Open Hub

Moderate Activity
5.0
 
Treex - NLP Framework

  Analyzed 1 day ago

Treex (formerly TectoMT) is a highly modular NLP software system implemented in the Perl programming language under Linux. It is primarily aimed at machine translation, making use of the ideas and technology created during the Prague Dependency Treebank project. It is also intended to significantly facilitate and accelerate the development of software solutions for many other NLP tasks, especially through the reusability of its numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces.

242K lines of code

4 current contributors

10 days since last commit

4 users on Open Hub

Low Activity
5.0
 
krdwrd

  Analyzed 1 day ago

Use the internet as a linguistic corpus: provide tools and infrastructure for the acquisition, visual annotation, merging, and storage of web pages as parts of bigger corpora. Develop a classification engine that learns to annotate pages automatically, and provide visual tools for inspecting the results.

3.35K lines of code

1 current contributor

over 4 years since last commit

2 users on Open Hub

Inactive
5.0
 
moses-for-mere-mortals

  Analyzed about 8 hours ago

This site offers a set of Bash scripts and Windows executable add-ins that together create a basic translation chain prototype capable of processing very large corpora. It uses Moses, a widely known statistical machine translation system. The idea is to help build a translation chain for the real world, but it should also enable a quick evaluation of Moses for actual translation work and guide users in their first steps with Moses.

The scripts cover installation, the creation of representative test files, training, translation, scoring, and the transfer of trainings between people or between several Moses installations. A Help/Short Tutorial (http://moses-for-mere-mortals.googlecode.com/files/Help.odt) and a demonstration corpus are available. The demonstration corpus is too small to do justice to the quality of results that can be achieved with Moses, but it gives a realistic view of the relative duration of the steps involved. Two Windows add-ins allow the creation of Moses input files from *.TMX translation memories (Extract_TMX_Corpus.exe) and the creation of *.TMX files from Moses output files (Moses2TMX.exe), which creates a synergy between machine translation and translation memories.

The scripts were tested in Ubuntu 9.04 (64-bit version). Documents used for corpora training should be perfectly aligned and saved in UTF-8 character encoding. Documents to be translated should also be in UTF-8 format. Users are expected, perhaps after trying the provided demonstration corpus, to be able to use the scripts immediately and get results with the real corpora they are interested in. Though already tested and used in actual work, this should be considered a work in progress. To protect users not yet fully acquainted with Moses, the scripts try to avoid mistakes that would cost them dearly in time or results, but they do not completely insulate them (especially from the consequences of malformed corpus files).
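The input requirements stated in the description (UTF-8 encoding, aligned documents) can be checked before any training run. The following is a minimal sketch, not part of the project's scripts. It assumes the parallel corpus is stored as two plain-text files with one segment per line; that layout and the names `read_utf8_lines` and `check_parallel_corpus` are hypothetical and chosen only for illustration. Matching line counts are a necessary condition for alignment, not proof of it.

```python
from pathlib import Path


def read_utf8_lines(path):
    """Return the lines of a file; raises UnicodeDecodeError if it is not valid UTF-8."""
    text = Path(path).read_bytes().decode("utf-8")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing newline
    return lines


def check_parallel_corpus(source_path, target_path):
    """Return a list of problems found; an empty list means the basic checks passed.

    Assumed layout (hypothetical): one segment per line, line N of the source
    file corresponds to line N of the target file.
    """
    problems = []
    sides = {}
    for name, path in (("source", source_path), ("target", target_path)):
        try:
            sides[name] = read_utf8_lines(path)
        except UnicodeDecodeError as err:
            problems.append(f"{name} file is not valid UTF-8: {err}")
    if len(sides) == 2:
        src, tgt = sides["source"], sides["target"]
        if len(src) != len(tgt):
            problems.append(
                f"line counts differ: {len(src)} source vs {len(tgt)} target"
            )
        for number, (s, t) in enumerate(zip(src, tgt), start=1):
            if not s.strip() or not t.strip():
                problems.append(f"empty segment on line {number}")
    return problems
```

Calling `check_parallel_corpus("corpus.en", "corpus.pt")` (file names are placeholders) returns an empty list when both files decode as UTF-8, have the same number of lines, and contain no empty segments.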

7.21K lines of code

0 current contributors

about 4 years since last commit

1 user on Open Hub

Inactive
0.0
 
LexAt Lexical/Corpus Statistics

  No analysis available

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (the Mihalcea algorithm).
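As an illustration of the kind of corpus statistic mentioned in the description, the sketch below computes pointwise mutual information for adjacent word pairs. It is not LexAt's implementation: the choice of adjacent pairs, the in-memory counters standing in for the SQL database, and the function name `pointwise_mutual_information` are all assumptions made for this example.

```python
import math
from collections import Counter


def pointwise_mutual_information(sentences):
    """Map each adjacent word pair (x, y) to log2( P(x, y) / (P(x) * P(y)) ).

    `sentences` is a list of token lists. Probabilities are plain relative
    frequencies; no smoothing is applied (an assumption of this sketch).
    """
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # adjacent pairs only

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (left, right), count in pair_counts.items():
        p_pair = count / total_pairs
        p_left = word_counts[left] / total_words
        p_right = word_counts[right] / total_words
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores
```

A positive score means the two words occur together more often than their individual frequencies would predict, which is the property a ranking function could draw on.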

0 lines of code

0 current contributors

Time since last commit not available

1 user on Open Hub

Activity Not Available
0.0
 
Mostly written in language not available
Licenses: apache_2

Affisix

  No analysis available

Affisix is a program for the automatic recognition of affixes. It takes a large number of words and, according to the user's settings, tries to determine which segments of these words are prefixes.

0 lines of code

0 current contributors

Time since last commit not available

1 user on Open Hub

Activity Not Available
4.0
   
Mostly written in language not available
Licenses: gpl3

Ruby LinkParser

  Analyzed about 3 hours ago

A high-level interface to the CMU Link Grammar. This binding wraps the link-grammar shared library provided by the AbiWord project for their grammar-checker.

2.41K lines of code

1 current contributor

about 1 year since last commit

1 user on Open Hub

Very Low Activity
0.0
 
opencorpora

  Analyzed 1 day ago

An engine for creating and annotating textual corpora

38.6K lines of code

3 current contributors

7 months since last commit

1 user on Open Hub

Very Low Activity
0.0
 
CSniper

  Analyzed 1 day ago

CSniper (Corpus Sniper) is a tool that implements (i) a web-based multi-user scenario for identifying and annotating non-canonical grammatical constructions in large corpora based on linguistic queries and (ii) evaluation of annotation quality by measuring inter-rater agreement. This annotation-by-query approach efficiently harnesses expert knowledge to identify instances of linguistic phenomena that are hard to identify by means of existing automatic annotation tools.
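Inter-rater agreement can be measured in several ways, and the description does not say which measure CSniper uses. As one common example for two annotators, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name `cohen_kappa` and the label values are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement, and a value near 0 indicates agreement no better than chance.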

23.6K lines of code

0 current contributors

over 2 years since last commit

0 users on Open Hub

Inactive
0.0
 
CorpusCatcher

Claimed by Translate

  Analyzed 1 day ago

CorpusCatcher is a corpus collection toolset. It can help you build language-specific or topic-specific corpora from publicly available web resources. This can be useful for many purposes, especially for gathering the data needed to build spell checkers.

813 lines of code

0 current contributors

about 12 years since last commit

0 users on Open Hub

Inactive
0.0
 