crawler4j


Inactive

Analyzed about 23 hours ago, based on code collected about 23 hours ago.

Project Summary

Crawler4j is an open source Java web crawler that provides a simple interface for crawling the web. Using it, you can set up a multi-threaded web crawler in 5 minutes!

Sample Usage

First, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My
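
The sample above is cut off before its methods appear. As a rough illustration of how an extension filter like this one could be used to decide which URLs to crawl, here is a standalone sketch that relies only on java.util.regex. The class name UrlFilterSketch, the shouldVisit(String) helper, and the ALLOWED_PREFIX value are assumptions made for this example; they are not taken from the crawler4j API, whose actual method signatures are not shown on this page.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class UrlFilterSketch {

    // Same extension filter as in the sample: matches URLs ending in a
    // static-asset or binary file extension.
    private static final Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Hypothetical site prefix; an assumption for this sketch only.
    private static final String ALLOWED_PREFIX = "http://www.example.com/";

    // Returns true when the URL is on the allowed site and does not end
    // in one of the filtered extensions.
    public static boolean shouldVisit(String url) {
        String href = url.toLowerCase(Locale.ROOT);
        return !FILTERS.matcher(href).matches() && href.startsWith(ALLOWED_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(shouldVisit("http://www.example.com/index.html"));   // true
        System.out.println(shouldVisit("http://www.example.com/logo.PNG"));     // false
        System.out.println(shouldVisit("http://other.example.org/index.html")); // false
    }
}
```

Lowercasing the URL before matching keeps the pattern itself simple. Note that the pattern only inspects the end of the string, so a URL such as one with a query string after the extension would not be filtered by this sketch.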

In a Nutshell, crawler4j...

... has had 515 commits made by 43 contributors, representing 8,292 lines of code
... is mostly written in Java, with an average number of source code comments
... has a well-established, mature codebase, maintained by nobody, with stable Y-O-Y commits
... took an estimated 2 years of effort (COCOMO model), starting with its first commit in December 2011 and ending with its most recent commit almost 5 years ago

Quick Reference

Code Locations:

https://github.com/yasserg/crawl...


Licenses

Apache License 2.0

Permitted: Commercial Use, Modify, Distribute, Place Warranty, Sub-License, Private Use, Use Patent Claims

Forbidden: Hold Liable, Use Trademarks

Required: Include Copyright, State Changes, Include License, Include Notice

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.


Project Security

Vulnerabilities per Version (last 10 releases)

There are no reported vulnerabilities

