Tags : Browse Projects

Natural Language Toolkit (NLTK)

  Analyzed 1 day ago

NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data, and documentation for research and development in natural language processing. It supports dozens of NLP tasks and has distributions for Windows, Mac OS X, and Linux.

214K lines of code

56 current contributors

13 days since last commit

45 users on Open Hub

Moderate Activity
5.0
 

Text Encoding Initiative

  Analyzed 23 days ago

The TEI is an international and interdisciplinary community-based open standard used by research projects, libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.

473K lines of code

4 current contributors

6 months since last commit

3 users on Open Hub

Very Low Activity
5.0
 

Open-Content Text Corpus

  Analyzed 5 months ago

The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment info. The project is intended to be an open platform for academic and research projects of various kinds (tool-, markup-, or language-documentation-oriented) and for collaboration on multilingual corpus encoding in general and application of the TEI Guidelines for that purpose in particular. ("TEI" stands for the Text Encoding Initiative, http://www.tei-c.org/)

0 lines of code

0 current contributors

0 since last commit

2 users on Open Hub

Activity Not Available
0.0
 
Licenses: GPL-3.0+

krdwrd

  Analyzed 12 days ago

Use the internet as a linguistic corpus: provide tools and infrastructure for acquisition, visual annotation, merging, and storage of web pages as parts of bigger corpora. Develop a classification engine that learns to automatically annotate pages, and provide visual tools for inspection of results.

117K lines of code

1 current contributor

over 3 years since last commit

2 users on Open Hub

Inactive
5.0
 

Greenstone

  Analyzed 5 months ago

Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO.

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available
0.0
 
Licenses: GPL-2.0+

W2C (Web To Corpus)

  Analyzed over 4 years ago

A package of tools for automatically creating corpora from the web.

10.9K lines of code

0 current contributors

about 6 years since last commit

1 user on Open Hub

Activity Not Available
0.0
 

CORSIS

  Analyzed 5 months ago

CORSIS (formerly Tenka Text) is a performance-oriented, open-source library for corpus analysis. It uses typed assembly, task-specific compilers, and parallelization to deliver the best performance with elegant design. The project's demonstration GUI comes with Wordlister, an advanced, extremely fast graphical wordlist tool, and a regex concordance tool. CORSIS is the open-source answer to WordSmith Tools.

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available
0.0
 
Licenses: GPL-3.0+

IMS Open Corpus Workbench

  Analyzed 15 days ago

The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 M words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP.

2.11K lines of code

0 current contributors

over 9 years since last commit

1 user on Open Hub

Inactive
0.0
 
Licenses: No declared licenses

opencorpora

  Analyzed about 20 hours ago

An engine for creating and annotating textual corpora.

37K lines of code

3 current contributors

about 1 month since last commit

1 user on Open Hub

Low Activity
0.0
 

LexAt Lexical/Corpus Statistics

  Analyzed about 16 hours ago

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (Mihalcea algorithm).

9.59K lines of code

0 current contributors

over 8 years since last commit

1 user on Open Hub

Inactive
0.0
 