Projects tagged ‘crawler’

htDig2

No analysis available

The ht://Dig system is a complete WWW indexing and searching system for a domain or intranet. This system is not meant to replace the need for internet-wide search systems like Lycos, Infoseek, Google, and AltaVista. Instead, it is meant to cover the search needs for a single company, campus, or ...

0 lines of code

0 current contributors

0 since last commit

19 users on Open Hub

Activity Not Available

0 Reviews


Mostly written in language not available

Licenses: gpl

Tags c crawler database development index indexing intranet networking search search-engine searchengine tools 1 more...

YaCy

Analyzed about 11 hours ago

YaCy is a P2P search engine for the WWW, including a crawler and an HTTP proxy.

232K lines of code

8 current contributors

13 days since last commit

12 users on Open Hub

Moderate Activity

0 Reviews


Mostly written in Java

Licenses: gpl

Tags crawler decentralized dht distrubuted-hash-table java linux macosx opensource p2p peer-to-peer proxy search 4 more...

Grub

No analysis available

Grub Next Generation is a distributed web crawling system (clients/servers) that helps build and maintain a free (as in freedom) index of the Web. At the moment we also have a very simple search engine as a proof of concept.

0 lines of code

0 current contributors

0 since last commit

7 users on Open Hub

Activity Not Available

0 Reviews


Mostly written in language not available

Licenses: BSD-3-Clause, gpl3_or_l...

Tags crawler distributed indexing search-engine

Apache ManifoldCF

Claimed by Apache Software Foundation

Analyzed 1 day ago

ManifoldCF is an effort to provide an open source framework for connecting source content repositories, such as Microsoft SharePoint and EMC Documentum, to target repositories or indexes, such as Apache Solr. ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.

353K lines of code

0 current contributors

about 1 month since last commit

5 users on Open Hub

Low Activity

0 Reviews


Mostly written in Java

Licenses: apache_2

Tags crawler ecm enterprisesearch indexing searchengine tools

crawler4j


Analyzed about 3 hours ago

Crawler4j is an open source Java crawler which provides a simple interface for crawling the web. Using it, you can set up a multi-threaded web crawler in 5 minutes! Sample usage: first, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and ...

8.29K lines of code

5 current contributors

over 3 years since last commit

4 users on Open Hub

Inactive

0 Reviews


Mostly written in Java

Licenses: apache_2

Tags crawler java multi-threaded opensource web webcrawler

OpenSearchServer

No analysis available

OpenSearchServer is a powerful, enterprise-class search engine program. With the web user interface, the crawlers (web, file, database, ...) and its REST API, you will be able to quickly and easily integrate advanced full-text search capabilities into your application. OpenSearchServer runs on Windows ...

0 lines of code

0 current contributors

0 since last commit

4 users on Open Hub

Activity Not Available

0 Reviews


Mostly written in language not available

Licenses: gpl3

Tags crawler engine full_text fulltext fulltext_search index indexation indexer indexing information_retrieval knowledgemanagement rest 5 more...

LinkChecker

Analyzed 4 months ago

Check websites and HTML documents for broken links.
* recursive and multithreaded checking
* output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
* HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
* restriction of link ...

45.2K lines of code

10 current contributors

4 months since last commit

3 users on Open Hub

Activity Not Available

0 Reviews


Mostly written in Python

Licenses: gpl

Tags aref crawler html link-checker link_checking loadtest robot spider w3c web-crawler webcrawler web-spider 1 more...

Ronin

No analysis available

Ronin is a Ruby platform for exploit development and security research. Ronin allows for the rapid development and distribution of code, exploits or payloads over many common Source-Code-Management (SCM) systems.

0 lines of code

0 current contributors

0 since last commit

2 users on Open Hub

Activity Not Available

0 Reviews


Mostly written in language not available

Licenses: gpl3, lgpl3

Tags asm attacks console crawler csrf developerfriendly distributed documentation exploit exploits framework free 44 more...

Monkey-Spider


Analyzed about 23 hours ago

The Monkey-Spider is a crawler-based, low-interaction honeyclient project. It is not restricted to this use, but it is developed as such. The Monkey-Spider crawls Web sites to expose their threats to Web clients.

344 lines of code

0 current contributors

about 14 years since last commit

2 users on Open Hub

Inactive

0 Reviews


Mostly written in Python

Licenses: gpl3_or_l...

Tags client crawler honeyclient webav webscanner

Smart and Simple Web Crawler


No analysis available

A simple framework for implementing crawling technology in your own programs and libraries.

0 lines of code

0 current contributors

0 since last commit

2 users on Open Hub

Activity Not Available

0 Reviews


Mostly written in language not available

Licenses: No declared licenses

Tags crawler java search search-engine seo sitemap
