Tags : Browse Projects

scikit-learn

  Analyzed 4 days ago

scikit-learn is a Python module integrating various machine learning algorithms under a common interface. It offers a wide range of methods such as Support Vector Machines, linear models (L1, L2 penalized), logistic regression, Gaussian mixture models and more. The large number of algorithms already implemented allows for easy comparison of accuracy and performance of various algorithms.

8.66M lines of code

398 current contributors

5 days since last commit

80 users on Open Hub

Very High Activity
5.0
 

Apache Spark

Claimed by Apache Software Foundation. Analyzed 3 days ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

1.28M lines of code

374 current contributors

4 days since last commit

56 users on Open Hub

Very High Activity
5.0
 

Natural Language Toolkit (NLTK)

  Analyzed 3 days ago

NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks, with distributions for Windows, Mac OS X and Linux.

234K lines of code

42 current contributors

about 2 months since last commit

45 users on Open Hub

Moderate Activity
5.0
 

WEKA

  Analyzed 4 days ago

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes.

780K lines of code

3 current contributors

over 1 year since last commit

38 users on Open Hub

Very Low Activity
3.93333
   
Licenses: No declared licenses

Apache Mahout

Claimed by Apache Software Foundation. Analyzed 3 days ago

Apache Mahout's goal is to build scalable machine learning libraries. By scalable we mean: scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However, we do not restrict contributions to Hadoop-based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms.

146K lines of code

0 current contributors

about 1 month since last commit

25 users on Open Hub

Low Activity
3.6
   

TensorFlow

  Analyzed 4 days ago

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

3.79M lines of code

798 current contributors

4 days since last commit

23 users on Open Hub

Very High Activity
5.0
 

YALE Open-Source Java Data Mining

  Analyzed 4 days ago

YALE (Yet Another Learning Environment) is the most comprehensive open-source software for intelligent data analysis, data mining, knowledge discovery, machine learning, predictive analytics, forecasting, and analytics in business intelligence (BI). YALE provides more than 400 data mining operators, a graphical user interface (GUI), an online tutorial with hands-on data mining applications, a comprehensive PDF tutorial, many visualization schemes for data sets and data mining results, and many different learning and meta-learning schemes ranging from decision tree and rule learners to neural networks, SVMs, ensemble methods, etc. YALE is implemented in Java and available under GPL (GNU General Public License) as well as under a developer license (OEM license) for closed-source developers.

751K lines of code

0 current contributors

over 9 years since last commit

17 users on Open Hub

Inactive
4.25
   
Licenses: No declared licenses

gCube

  Analyzed 4 days ago

gCube is a software system specifically designed and developed to enact the building and operation of large-scale infrastructures providing their users with a rich array of services suitable for supporting the co-creation of Virtual Research Environments and promoting the implementation of open science workflows and practices. It is at the heart of the D4Science.org infrastructure (www.d4science.org).

1.49M lines of code

15 current contributors

5 days since last commit

14 users on Open Hub

High Activity
4.66667
   

Apache OpenNLP

Claimed by Apache Software Foundation. Analyzed 3 days ago

Apache OpenNLP is a Java machine learning toolkit for natural language processing (NLP).

157K lines of code

8 current contributors

13 days since last commit

12 users on Open Hub

Moderate Activity
5.0
 

dlib C++ Library

  Analyzed 4 days ago

This project is a modern C++ library with a focus on portability and program correctness. It strives to be easy to use right and hard to use wrong. Thus, it comes with extensive documentation and thorough debugging modes. The library provides a platform abstraction layer for common tasks such as interfacing with network services, handling threads, or creating graphical user interfaces. Additionally, the library implements many useful algorithms such as data compression routines, linked lists, binary search trees, linear algebra and matrix utilities, machine learning algorithms, XML and text parsing, and many other general utilities.

450K lines of code

25 current contributors

6 days since last commit

11 users on Open Hub

Low Activity
4.75
   