Low Activity
Analyzed 2 days ago, based on code collected 2 days ago.

Project Summary

Crawler4j is an open source Java crawler that provides a simple interface for crawling the web. Using it, you can set up a multi-threaded web crawler in 5 minutes!

Sample Usage

First, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My
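The sample above is cut off, so the following is only a minimal sketch of the URL-filtering decision it sets up, written without the crawler4j dependency so it can be run on its own. It is not the project's own code: the class name UrlFilterSketch, the method isCrawlable, and the prefix http://www.example.com/ are assumptions made for illustration. Only the extension pattern is taken from the sample.

```java
import java.util.regex.Pattern;

public class UrlFilterSketch {

    // Extension filter copied from the sample above: matches URLs that end
    // in a static-asset or binary file extension.
    private static final Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Hypothetical site prefix (an assumption, not from the original page):
    // only URLs under this prefix are considered for crawling.
    private static final String ALLOWED_PREFIX = "http://www.example.com/";

    // Returns true when the URL is under the allowed prefix and does not
    // end in one of the filtered extensions. The URL is lowercased first,
    // so the check is case-insensitive.
    public static boolean isCrawlable(String url) {
        String href = url.toLowerCase();
        return !FILTERS.matcher(href).matches() && href.startsWith(ALLOWED_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isCrawlable("http://www.example.com/index.html"));   // true
        System.out.println(isCrawlable("http://www.example.com/logo.PNG"));     // false
        System.out.println(isCrawlable("http://other.example.org/index.html")); // false
    }
}
```

In a real crawler, a check like this would presumably sit in the method of the WebCrawler subclass that decides whether a URL should be crawled, with a separate method handling each downloaded page.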

Tags

crawler java multi-threaded opensource web webcrawler


Apache License 2.0

Permitted
  • Commercial Use
  • Modify
  • Distribute
  • Place Warranty
  • Private Use
  • Use Patent Claims
  • Sub-License

Forbidden
  • Use Trademarks
  • Hold Liable

Required
  • Include Copyright
  • Include License
  • State Changes
  • Include Notice

These details are provided for information only. Nothing here is legal advice, and none of it should be used as such.


This project has no vulnerabilities reported against it.


Languages

  • Java: 80%
  • Groovy: 11%
  • XML: 9%
  • SQL: <1%

30 Day Summary

Sep 15 2018 — Oct 15 2018

12 Month Summary

Oct 15 2017 — Oct 15 2018
  • 88 Commits
    Down 34 (27%) from previous 12 months
  • 9 Contributors
    Down 7 (43%) from previous 12 months