Very High Activity
Analyzed 7 days ago, based on code collected 19 days ago.

Project Summary

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with disk-based systems like Hadoop.

To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells.

Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.
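The in-memory primitive described above can be sketched with Spark's RDD API. This is a minimal, hypothetical example (the file name `data.txt` and the log-level keywords are assumptions), which requires a Spark installation to run:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CacheExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Load the file once and keep it in memory across queries.
    val lines = sc.textFile("data.txt").cache()

    // Repeated queries hit the cached in-memory copy rather than
    // re-reading from disk, as a disk-based MapReduce job would.
    val errors = lines.filter(_.contains("ERROR")).count()
    val warnings = lines.filter(_.contains("WARN")).count()

    println(s"errors=$errors, warnings=$warnings")
    sc.stop()
  }
}
```

The `filter`/`count` calls read like operations on a local Scala collection, which is the "fast to write" half of the summary above.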

Tags

apache bigdata cluster clustercomputing distributed distributed_computing ec2 graph_computing hadoop hdfs in_memory java machine_learning mapreduce ml python scala sql streaming streamingdata

In a Nutshell, Apache Spark...

This project has no vulnerabilities reported against it.


Languages

  • Scala — 68%
  • Java — 16%
  • Python — 8%
  • 11 Other — 8%

30 Day Summary

Feb 8 2017 — Mar 10 2017

12 Month Summary

Mar 10 2016 — Mar 10 2017
  • 8238 Commits
    Down 1818 (18%) from previous 12 months
  • 451 Contributors
    Down 195 (30%) from previous 12 months

Ratings

8 users rate this project: 5.0/5.0