Tags : Browse Projects


htDig2

  No analysis available

The ht://Dig system is a complete WWW indexing and searching system for a domain or intranet. This system is not meant to replace the need for internet-wide search systems like Lycos, Infoseek, Google, and AltaVista. Instead, it is meant to cover the search needs for a single company, campus, or even a particular sub-section of a Web site.

0 lines of code

0 current contributors

0 since last commit

19 users on Open Hub

Activity Not Available
3.4
   
Mostly written in language not available
Licenses: gpl

YaCy

  Analyzed about 11 hours ago

YaCy is a P2P search engine for the WWW, including a crawler and an HTTP proxy.

232K lines of code

8 current contributors

13 days since last commit

12 users on Open Hub

Moderate Activity
4.71429
   

Grub

  No analysis available

Grub Next Generation is a distributed web crawling system (clients/servers) that helps to build and maintain a free (as in freedom) index of the Web. At this moment we also have a very simple search engine (as a proof of concept).

0 lines of code

0 current contributors

0 since last commit

7 users on Open Hub

Activity Not Available
4.0
   
Mostly written in language not available
Licenses: BSD-3-Clause, gpl3_or_l...

Apache ManifoldCF

Claimed by Apache Software Foundation

  Analyzed 1 day ago

ManifoldCF is an effort to provide an open source framework for connecting source content repositories, such as Microsoft SharePoint and EMC Documentum, to target repositories or indexes, such as Apache Solr. ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.

353K lines of code

0 current contributors

about 1 month since last commit

5 users on Open Hub

Low Activity
0.0
 

crawler4j

  Analyzed about 3 hours ago

Crawler4j is an open source Java crawler that provides a simple interface for crawling the web. Using it, you can set up a multi-threaded web crawler in 5 minutes!

Sample Usage

First, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

    import java.util.ArrayList;
    import java.util.regex.Pattern;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.crawler.WebCrawler;
    import edu.uci.ics.crawler4j.url.WebURL;

    public class MyCrawler extends WebCrawler {

        Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
                + "|png|tiff?|mid|mp2|mp3|mp4"
                + "|wav|avi|mov|mpeg|ram|m4v|pdf"
                + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

        public My ...
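
The description above says the crawler class decides which URLs should be crawled. As a rough illustration of that decision only, the sketch below applies the same file-extension pattern to plain URL strings. It is a standalone example that does not use crawler4j at all: the class name `UrlFilterSketch`, the method `shouldVisit(String, String)`, and the `example.com` prefix are hypothetical names chosen for this sketch, not part of the library or of the truncated sample.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Standalone sketch (assumption: not the crawler4j API). It only shows how an
// extension pattern plus a site prefix can decide whether a URL is crawled.
final class UrlFilterSketch {

    // Same extension list as the sample in the project description:
    // matches URLs ending in a static-asset or binary file extension.
    private static final Pattern FILTERS = Pattern.compile(
            ".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Returns true when the URL stays under allowedPrefix and does not end
    // in one of the filtered extensions. Both inputs are lower-cased first
    // so that "LOGO.PNG" is treated like "logo.png".
    static boolean shouldVisit(String url, String allowedPrefix) {
        String href = url.toLowerCase(Locale.ROOT);
        String prefix = allowedPrefix.toLowerCase(Locale.ROOT);
        return !FILTERS.matcher(href).matches() && href.startsWith(prefix);
    }

    public static void main(String[] args) {
        String prefix = "http://www.example.com/"; // hypothetical site
        System.out.println(shouldVisit("http://www.example.com/index.html", prefix));
        System.out.println(shouldVisit("http://www.example.com/style.css", prefix));
        System.out.println(shouldVisit("http://other.example.org/page.html", prefix));
    }
}
```

Keeping the check as a pure function of two strings makes it easy to test without starting a crawl; in a real crawler the same check would sit inside whatever callback the library provides for accepting or rejecting URLs.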

8.29K lines of code

5 current contributors

over 3 years since last commit

4 users on Open Hub

Inactive
5.0
 

OpenSearchServer

  No analysis available

OpenSearchServer is a powerful, enterprise-class search engine program. With the web user interface, the crawlers (web, file, database, ...) and its REST API, you will be able to quickly and easily integrate advanced full-text search capabilities into your application. OpenSearchServer runs on Windows and Linux/Unix/BSD. Features: multilingual lemmatization, spellcheck, stop words, synonyms, facets, filters, web crawler, database crawler, local and remote file system crawler, document indexing with OCR, REST with XML or JSON, and SOAP API.

0 lines of code

0 current contributors

0 since last commit

4 users on Open Hub

Activity Not Available
5.0
 
Mostly written in language not available
Licenses: gpl3

LinkChecker

  Analyzed 4 months ago

Check websites and HTML documents for broken links.
* recursive and multithreaded checking
* output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
* HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
* restriction of link checking with regular expression filters for URLs
* proxy support
* username/password authorization for HTTP and FTP and Telnet

45.2K lines of code

10 current contributors

4 months since last commit

3 users on Open Hub

Activity Not Available
3.0
   

Ronin

  No analysis available

Ronin is a Ruby platform for exploit development and security research. Ronin allows for the rapid development and distribution of code, exploits or payloads over many common Source-Code-Management (SCM) systems.

0 lines of code

0 current contributors

0 since last commit

2 users on Open Hub

Activity Not Available
0.0
 
Mostly written in language not available
Licenses: gpl3, lgpl3

Monkey-Spider

  Analyzed about 23 hours ago

The Monkey-Spider is a crawler-based, low-interaction honeyclient project. It is not restricted to this use, but it is developed as such. The Monkey-Spider crawls Web sites to expose their threats to Web clients.

344 lines of code

0 current contributors

about 14 years since last commit

2 users on Open Hub

Inactive
0.0
 

Smart and Simple Web Crawler

  No analysis available

Simple framework to implement crawling technology in your own programs and libraries.

0 lines of code

0 current contributors

0 since last commit

2 users on Open Hub

Activity Not Available
3.0
   
Mostly written in language not available
Licenses: No declared licenses