Projects tagged ‘hadoop’

Apache Spark

Claimed by Apache Software Foundation Analyzed about 4 hours ago

Apache Spark is an open-source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with ...

1.52M lines of code

374 current contributors

about 6 hours since last commit

56 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Scala

Licenses: apache_2

Apache HBase

Claimed by Apache Software Foundation Analyzed 1 day ago

HBase is the Hadoop database. It's an open-source, distributed, column-oriented store modeled after the Google paper, Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides ...

1.02M lines of code

120 current contributors

2 days since last commit

31 users on Open Hub

High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigtable columnstore database hadoop hbase

Apache Mahout

Claimed by Apache Software Foundation Analyzed 1 day ago

Apache Mahout's goal is to build scalable machine learning libraries. By scalable we mean: scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. ...

146K lines of code

0 current contributors

over 1 year since last commit

25 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags algorithms classifiers clustering collaborative_filtering data_mining datamining dimension_reduction distributed distributed_computing hadoop java library 5 more...

Apache Accumulo

Claimed by Apache Software Foundation Analyzed about 10 hours ago

Apache Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, ZooKeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can ...

511K lines of code

34 current contributors

about 20 hours since last commit

24 users on Open Hub

High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigtable database distributed hadoop key_value scalability

Apache Hive

Claimed by Apache Software Foundation No analysis available

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data and it also provides a simple query language called ...

0 lines of code

0 current contributors

0 since last commit

23 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags apache bigdata cluster clustercomputing distributed_computing hadoop hdfs java mapreduce orc spark sql 4 more...

Apache Pig

Claimed by Apache Software Foundation Analyzed about 6 hours ago

Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which ...

762K lines of code

4 current contributors

5 days since last commit

10 users on Open Hub

Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags hadoop pig

Apache Flink

Claimed by Apache Software Foundation Analyzed about 2 hours ago

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale. Learn more about Flink at http://flink.apache.org/

2.12M lines of code

323 current contributors

1 day since last commit

9 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags apache bigdata cluster distributed hadoop java machinelearning mapreduce scala streaming

Apache Avro

Claimed by Apache Software Foundation No analysis available

Avro is a serialization system.

0 lines of code

75 current contributors

0 since last commit

8 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags hadoop hdfs serialization

AppScale

Analyzed about 22 hours ago

AppScale is an open-source implementation of the Google AppEngine (GAE) cloud computing interface. AppScale enables execution of GAE applications on virtualized cluster systems. In particular, AppScale enables users to execute GAE applications using their own clusters with greater scalability and ...

1.23M lines of code

10 current contributors

over 5 years since last commit

7 users on Open Hub

Inactive

0 Reviews

Mostly written in Python

Licenses: apache_2, bsd

Tags appengine appscale cassandra cloudcomputing hadoop hbase hdfs hypertable memcachedb mongodb mysql platform-as-a-service 1 more...

Apache Impala

Claimed by Apache Software Foundation Analyzed 43 minutes ago

Apache Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Apache Hive. This ...

897K lines of code

64 current contributors

1 day since last commit

7 users on Open Hub

High Activity

0 Reviews

Mostly written in C++

Licenses: apache_2

Tags cloudera distributed hadoop impala query sql
