Moderate Activity
Analyzed about 3 hours ago, based on code collected about 3 hours ago.

Project Summary

Crawler4j is an open source Java crawler that provides a simple interface for crawling the web. With it, you can set up a multi-threaded web crawler in 5 minutes!

Sample Usage

First, you need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The following is a sample implementation:

import java.util.ArrayList;
import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    Pattern filters = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

public My
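
As a rough illustration of what the filters pattern above does, here is a small standalone sketch that applies the same regular expression to a few URLs. It deliberately does not use any crawler4j classes. The class name UrlFilterSketch, the helper isCrawlable, the ALLOWED_PREFIX constant, and the example.com URLs are all assumptions made for this example, not part of the library.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class UrlFilterSketch {

    // Same extension filter as in the sample: matches URLs that end in a
    // static-resource or binary file extension.
    static final Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
            + "|png|tiff?|mid|mp2|mp3|mp4"
            + "|wav|avi|mov|mpeg|ram|m4v|pdf"
            + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

    // Hypothetical site prefix used to keep the crawl on one host.
    static final String ALLOWED_PREFIX = "https://www.example.com/";

    // Hypothetical helper: true if the URL is on the allowed site and does
    // not end in one of the filtered extensions.
    static boolean isCrawlable(String url) {
        // Lower-case first so that ".PNG" is treated like ".png".
        String href = url.toLowerCase(Locale.ROOT);
        return !FILTERS.matcher(href).matches() && href.startsWith(ALLOWED_PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isCrawlable("https://www.example.com/index.html"));   // true
        System.out.println(isCrawlable("https://www.example.com/logo.PNG"));     // false
        System.out.println(isCrawlable("https://other.example.org/index.html")); // false
    }
}
```

Because the pattern is anchored with $ and applied with matches(), only URLs whose final characters are one of the listed extensions are rejected; a URL with a query string after the extension would pass this particular filter.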

Tags

crawler java multi-threaded opensource web webcrawler

Apache License 2.0

Permitted
  • Commercial Use
  • Modify
  • Distribute
  • Place Warranty
  • Private Use
  • Use Patent Claims
  • Sub-License

Forbidden
  • Use Trademarks
  • Hold Liable

Required
  • Include Copyright
  • Include License
  • State Changes
  • Include Notice

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.


This project has no vulnerabilities reported against it.


Languages

  • Java: 75%
  • Groovy: 13%
  • XML: 9%
  • 3 Other: 3%

30 Day Summary

Nov 16 2018 — Dec 16 2018

12 Month Summary

Dec 16 2017 — Dec 16 2018
  • 145 Commits
    Up 44 (43%) from previous 12 months
  • 8 Contributors
    Down 9 (52%) from previous 12 months