Tags : Browse Projects


Apache Spark


Claimed by Apache Software Foundation Analyzed 7 months ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

1.4M lines of code

380 current contributors

7 months since last commit

55 users on Open Hub

Activity Not Available
5.0

Apache Hive


Claimed by Apache Software Foundation Analyzed 7 months ago

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called Hive QL, which is based on SQL and enables users familiar with SQL to query this data. At the same time, this language allows traditional map/reduce programmers to plug in their custom mappers and reducers for more sophisticated analysis that may not be supported by the built-in capabilities of the language.

1.72M lines of code

114 current contributors

7 months since last commit

24 users on Open Hub

Activity Not Available
5.0

AppScale


Analyzed about 16 hours ago

AppScale is an open-source implementation of the Google AppEngine (GAE) cloud computing interface. AppScale enables execution of GAE applications on virtualized cluster systems. In particular, AppScale enables users to execute GAE applications using their own clusters with greater scalability and reliability than the GAE SDK provides. Moreover, AppScale executes automatically and transparently over cloud infrastructures such as the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Eucalyptus, the open-source implementation of the AWS interfaces.

1.14M lines of code

11 current contributors

1 day since last commit

7 users on Open Hub

High Activity
5.0

Apache Avro


Claimed by Apache Software Foundation Analyzed about 24 hours ago

Avro is a serialization system.

184K lines of code

65 current contributors

4 days since last commit

6 users on Open Hub

High Activity
0.0

StreamSets Data Collector


Claimed by StreamSets Analyzed about 15 hours ago

Open source software for the rapid development and reliable operation of complex data flows.

841K lines of code

57 current contributors

6 days since last commit

4 users on Open Hub

High Activity
5.0

Apache Flume


Claimed by Apache Software Foundation Analyzed 5 months ago

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

103K lines of code

8 current contributors

6 months since last commit

4 users on Open Hub

Activity Not Available
0.0

Apache Hama


Claimed by Apache Software Foundation Analyzed about 7 hours ago

Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations. It is currently in incubation as an incubator project of the Apache Software Foundation.

54.7K lines of code

0 current contributors

over 3 years since last commit

2 users on Open Hub

Inactive
0.0

Apache Whirr


Claimed by Apache Software Foundation Analyzed about 3 hours ago

Apache Whirr is a set of libraries for running cloud services. Whirr provides:

* A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
* A common service API. The details of provisioning are particular to the service.
* Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.

You can also use Whirr as a command line tool for deploying clusters.

26.9K lines of code

0 current contributors

almost 4 years since last commit

2 users on Open Hub

Inactive
0.0

DevOps Perl Tools


Analyzed about 7 hours ago

DevOps CLI Tools for Hadoop, Hive, HDFS file/snapshot age out, Solr / SolrCloud CLI, Ambari FreeIPA Kerberos, Config / Log Anonymizer, URL watcher for load balanced web farms, SQL ReCaser (Hive, Impala, Cassandra CQL, Couchbase N1QL, MySQL, PostgreSQL, Apache Drill, Microsoft SQL Server, Oracle, Pig Latin, Neo4j, InfluxDB, Dockerfiles), Nginx stats watcher, Datameer, Linux tools...

4.7K lines of code

1 current contributor

2 months since last commit

1 user on Open Hub

Very Low Activity
0.0
Licenses: No declared licenses

archon


Analyzed over 1 year ago

It is an OSGi-based distributed system controller used to build and manage Linux boxes.

463 lines of code

0 current contributors

almost 5 years since last commit

1 user on Open Hub

Activity Not Available
5.0