Projects tagged ‘bigdata’

deeplearning4j

Analyzed about 6 hours ago

Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library, designed to be used in business environments. Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. Vast ...

1.1M lines of code

17 current contributors

4 months since last commit

5 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Apache Ignite

Claimed by Apache Software Foundation

Analyzed about 18 hours ago

Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than possible with traditional disk-based or flash technologies.

1.51M lines of code

0 current contributors

2 days since last commit

4 users on Open Hub

High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags apache bigdata cache cluster cluster_computing distributed distributed_computing elastic grid grid_computing in_memory java 2 more...

Facebook Presto

Claimed by Facebook

Analyzed about 12 hours ago

Distributed SQL query engine for big data. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and ...

2.65M lines of code

125 current contributors

4 days since last commit

4 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigdata distributed engine facebook hadoop query sql

Apache Airavata

Claimed by Apache Software Foundation

Analyzed about 3 hours ago

Apache Airavata is a software toolkit currently used to build science gateways, though it has much wider potential use. It provides features to compose, manage, execute, and monitor small to large scale applications and workflows on computational resources ranging from local clusters to national ...

2.78M lines of code

15 current contributors

15 days since last commit

4 users on Open Hub

Moderate Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags apache bigdata clustercomputing distributed_computing highperformancecomputing workflow

StreamSets Data Collector

Claimed by StreamSets

No analysis available

Open source software for the rapid development and reliable operation of complex data flows.

0 lines of code

60 current contributors

0 since last commit

4 users on Open Hub

Activity Not Available

0 Reviews

Primary language not available

Licenses: apache_2

Tags azure bigdata cassandra cluster dataflow ec2 etl hadoop hdfs ingest jdbc kafka 5 more...

Apache Flume

Claimed by Apache Software Foundation

Analyzed about 7 hours ago

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

83.7K lines of code

3 current contributors

25 days since last commit

4 users on Open Hub

Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags apache apache-software-foundation bigdata data hadoop hdfs java mapreduce streamingdata

snowplow

Analyzed about 21 hours ago

Code base for computer science projects.

5.14K lines of code

15 current contributors

24 days since last commit

3 users on Open Hub

Moderate Activity

0 Reviews

Mostly written in SQL

Licenses: apache_2

Tags analytics bigdata emr event hadoop redshift s3 scala scalding

Apache Giraph

Claimed by Apache Software Foundation

Analyzed 4 months ago

Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault tolerance to the coordinator process by using ZooKeeper as its centralized coordination service. It implements a graph-processing framework that is launched as a typical Hadoop job to leverage existing ...

141K lines of code

5 current contributors

about 2 years since last commit

3 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigdata bsp fault_tolerance graph_processing graphs hadoop large_scale zookeeper

Crate Data

Claimed by Crate.IO

Analyzed 1 day ago

A massively scalable SQL data store. Zero administration required.

566K lines of code

30 current contributors

2 days since last commit

3 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigdata cloud cluster database distributed fulltext_search java lucene nosql realtime scaleable search 2 more...

HPCC Systems

Analyzed 1 day ago

HPCC (High Performance Computing Cluster) is an open source, massively parallel-processing computing platform that solves Big Data problems. The HPCC Systems architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client ...

1.67M lines of code

33 current contributors

1 day since last commit

3 users on Open Hub

Very High Activity

0 Reviews

Mostly written in C++

Licenses: apache_2, Creative_...

Tags bigdata ec2 ecl hadoop high_performance_computing highperformancecomputing hpcc
