Tags : Browse Projects

ht://Dig

  Analyzed about 1 month ago

The ht://Dig system is a complete WWW indexing and searching system for a domain or intranet. This system is not meant to replace the need for internet-wide search systems like Lycos, Infoseek, Google, and AltaVista. Instead, it is meant to cover the search needs for a single company, campus, or even a particular sub-section of a Web site.

507K lines of code

1 current contributors

about 1 month since last commit

20 users on Open Hub

Activity Not Available
3.4
   
YaCy

  Analyzed 10 days ago

YaCy is a P2P search engine for the WWW, including a crawler and an HTTP proxy.

249K lines of code

15 current contributors

11 days since last commit

12 users on Open Hub

High Activity
4.71429
   
Grub

  Analyzed 9 days ago

Grub Next Generation is a distributed web crawling system (clients/servers) that helps build and maintain a free (as in freedom) index of the Web. At the moment, we also have a very simple search engine (as a proof of concept).

40.6K lines of code

0 current contributors

over 6 years since last commit

7 users on Open Hub

Inactive
4.0
   
Licenses: BSD-3-Clause, GPL-3.0+

Apache ManifoldCF

Claimed by Apache Software Foundation

Analyzed 18 days ago

ManifoldCF is an effort to provide an open source framework for connecting source content repositories, like Microsoft SharePoint and EMC Documentum, to target repositories or indexes, such as Apache Solr. ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.

319K lines of code

5 current contributors

21 days since last commit

6 users on Open Hub

Moderate Activity
0.0
 
OpenSearchServer

  Analyzed 9 days ago

OpenSearchServer is a powerful, enterprise-class search engine program. With the web user interface, the crawlers (web, file, database, ...), and its REST API, you will be able to quickly and easily integrate advanced full-text search capabilities into your application. OpenSearchServer runs on Windows and Linux/Unix/BSD. Features: multilingual lemmatization, spellcheck, stop words, synonyms, facets, filters, web crawler, database crawler, local and remote file system crawler, document indexing with OCR, REST with XML or JSON, and SOAP API.

113K lines of code

4 current contributors

about 1 month since last commit

4 users on Open Hub

Moderate Activity
5.0
 
Serialist

  Analyzed almost 6 years ago

Serialist crawls serial stories on the web (such as webcomics), and provides a web interface for users to navigate these serials, mark where they left off, and find out when new pages exist.

2.73K lines of code

2 current contributors

over 6 years since last commit

3 users on Open Hub

Activity Not Available
5.0
 
Ex-Crawler

  Analyzed 7 months ago

Ex-Crawler Project is divided into three subprojects. The main part is the Ex-Crawler daemon server, a highly configurable, flexible (Web) crawler written in Java. It comes with its own socket server for managing the server, its own user management, distributed grid / volunteer computing, and much more. Crawled information is stored in a database; MySQL, PostgreSQL, and MSSQL are currently supported. The second part is the graphical (Java Swing) distributed grid / volunteer computing client, which includes PC idle detection and much more. The third part is the web search engine, written in PHP. It comes with a CMS, multi-language detection and support, templates using Smarty, and an application framework partly forked from Joomla, so that Joomla components can be adapted quickly.

72.1K lines of code

0 current contributors

over 6 years since last commit

3 users on Open Hub

Activity Not Available
5.0
 
Murloc

  Analyzed 13 days ago

Murloc is a website parser framework written in PHP. It features a plugin system that allows quick and easy development of new parsers. It provides several functions that are more or less commonly needed by parsers. Murloc comes with a bunch of handy plugins. As it is written in PHP, it should run on pretty much any platform, although some of its features are missing on Microsoft Windows systems.

3.95K lines of code

0 current contributors

almost 7 years since last commit

3 users on Open Hub

Inactive
5.0
 
Eclipse SMILA

Claimed by Eclipse Foundation

Analyzed almost 3 years ago

The amount and diversity of information is growing exponentially, mainly in the area of unstructured data, like emails, text files, blogs, images, etc. Poor data accessibility, user rights integration, and the lack of semantic metadata are constraining factors for building next-generation enterprise search and other document-centric applications. Missing standards result in proprietary solutions with huge short- and long-term costs. SMILA is an extensible framework for building search solutions to access unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, like connectors to the most relevant data sources.

314K lines of code

5 current contributors

about 3 years since last commit

2 users on Open Hub

Activity Not Available
0.0
 
Monkey-Spider

  Analyzed 7 months ago

The Monkey-Spider is a crawler-based, low-interaction honeyclient project. It is not restricted to this use, but it is developed as such. The Monkey-Spider crawls Web sites to expose their threats to Web clients.

356 lines of code

0 current contributors

almost 8 years since last commit

2 users on Open Hub

Activity Not Available
0.0
 