Project Summary

Indus Learning Framework (ILF)

The Indus Learning Framework is a suite of machine learning algorithms that learn from datasets using sufficient statistics. The framework is particularly useful in the following scenarios:

When the data set is too large to fit into memory (e.g. the ARFF file is huge and Weka runs out of memory).

When access to the underlying data instances is not available (due to considerations such as security or cost) but the data source provides some statistics (such as count queries).

The current implementation of the framework provides Naive Bayes and Decision Trees. The framework is written so that it can be extended with further classifiers that are amenable to the sufficient statistics approach.
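To illustrate why count queries are enough for a classifier such as Naive Bayes, here is a minimal, self-contained sketch. It does not use the ILF classes: the CountSource interface, the CountBasedNaiveBayes class, the attribute name vote1, and the toy rows are all hypothetical, and the Laplace smoothing is an assumption made for the example rather than something the framework is known to do.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CountBasedNaiveBayes {

    /** Hypothetical stand-in for a data source that only answers count queries. */
    public interface CountSource {
        /** Number of rows carrying the given class label. */
        long countClass(String classLabel);
        /** Number of rows with the given class label and attribute = value. */
        long countJoint(String classLabel, String attribute, String value);
    }

    private final CountSource source;
    private final List<String> classLabels;
    private final Map<String, List<String>> domains; // attribute -> possible values

    public CountBasedNaiveBayes(CountSource source, List<String> classLabels,
                                Map<String, List<String>> domains) {
        this.source = source;
        this.classLabels = classLabels;
        this.domains = domains;
    }

    /** Picks the most probable class using only counts (Laplace-smoothed). */
    public String classify(Map<String, String> instance) {
        long total = 0;
        for (String c : classLabels) total += source.countClass(c);

        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : classLabels) {
            long classCount = source.countClass(c);
            // log P(class), smoothed by the number of classes
            double score = Math.log((classCount + 1.0) / (total + classLabels.size()));
            for (Map.Entry<String, String> e : instance.entrySet()) {
                int domainSize = domains.get(e.getKey()).size();
                long joint = source.countJoint(c, e.getKey(), e.getValue());
                // log P(attribute = value | class), smoothed by the domain size
                score += Math.log((joint + 1.0) / (classCount + domainSize));
            }
            if (score > bestScore) { bestScore = score; best = c; }
        }
        return best;
    }

    // Made-up rows {class, vote1}; a real source would never expose these directly.
    private static final String[][] ROWS = {
        {"democrat", "y"}, {"democrat", "y"}, {"democrat", "n"},
        {"republican", "n"}, {"republican", "n"}, {"republican", "y"}
    };

    /** Classifies a single vote1 value against the toy rows above. */
    public static String demo(String vote1) {
        CountSource src = new CountSource() {
            public long countClass(String c) {
                return Arrays.stream(ROWS).filter(r -> r[0].equals(c)).count();
            }
            public long countJoint(String c, String attribute, String value) {
                // The toy table has a single attribute, so the name is not inspected.
                return Arrays.stream(ROWS)
                        .filter(r -> r[0].equals(c) && r[1].equals(value)).count();
            }
        };
        CountBasedNaiveBayes nb = new CountBasedNaiveBayes(
                src,
                Arrays.asList("democrat", "republican"),
                Collections.singletonMap("vote1", Arrays.asList("y", "n")));
        return nb.classify(Collections.singletonMap("vote1", vote1));
    }

    public static void main(String[] args) {
        System.out.println(demo("y"));
        System.out.println(demo("n"));
    }
}
```

The classifier never sees an individual row: it only asks for class counts and class/attribute-value counts, which is the kind of statistic a restricted or very large data source can still provide.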

Refer to the user's guide wiki for how to run the various classifiers in the framework. Besides being run from the command line, the ILF provides an API that can be integrated into a target application.

Integration Samples

Sample 1

import airldm2.core.datatypes.relational.SingleRelationDataDescriptor;
import airldm2.core.datatypes.relational.RelationalDataSource;
import airldm2.util.SimpleArffFileReader;
import airldm2.classifiers.Evaluation;
import weka.classifiers.evaluation.ConfusionMatrix;
import weka.core.Utils;

........
.......

String[] options= {"-b", "-trainTable", "votes_train", "-testFile","sample/HouseVotesTrain.arff"};


String trainTableName = Utils.getOption("trainTable", options);
String testFile = Utils.getOption("testFile", options);

NaiveBayesClassifier classifier = new NaiveBayesClassifier();

SingleRelationDataDescriptor desc = null;


SimpleArffFileReader readTest = new SimpleArffFileReader(testFile);
LDTestInstances testInst = readTest.getTestInstances();
desc = (SingleRelationDataDescriptor )testInst.getDesc();

SSDataSource dataSource = new RelationalDataSource(trainTableName);
// Create a Large DataSet Instance and set its descriptor and source
LDInstances trainData = new LDInstances();
trainData.setDesc(desc);
trainData.setDataSource(dataSource);

ConfusionMatrix matrix = Evaluation.evlauateModel2(classifier, trainData, testInst, options);
System.out.println(matrix.toString("===Confusion Matrix==="));

Extension With Indus Integration Framework

The system can use a data integration system to learn from multiple disparate data sources. The current implementation has been extended to use the Indus Integration Framework. Users are referred to the code and an example included in the source tree (induse_extension_src).
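One reason count-based learning suits multiple data sources is that counts from row-disjoint sources can simply be added. The sketch below shows only that idea. It is not the Indus Integration Framework API: the CountSource interface, the combine helper, and the numbers are hypothetical, and it assumes the sources share a schema and hold disjoint rows.

```java
import java.util.Arrays;
import java.util.List;

public class SummedCounts {

    /** Hypothetical count-query interface; not the Indus Integration Framework API. */
    public interface CountSource {
        long count(String attribute, String value);
    }

    /** Presents several row-disjoint sources as one source by adding their counts. */
    public static CountSource combine(List<CountSource> parts) {
        return (attribute, value) -> {
            long total = 0;
            for (CountSource part : parts) {
                total += part.count(attribute, value);
            }
            return total;
        };
    }

    public static void main(String[] args) {
        // Two made-up sites, each answering a single count query.
        CountSource siteA = (a, v) -> "party".equals(a) && "democrat".equals(v) ? 120 : 0;
        CountSource siteB = (a, v) -> "party".equals(a) && "democrat".equals(v) ? 147 : 0;

        CountSource all = combine(Arrays.asList(siteA, siteB));
        System.out.println(all.count("party", "democrat"));
    }
}
```

A learner written against the combined source does not need to know how many underlying sources there are. Sources with differing schemas would additionally need mappings between attribute names and values, which this sketch leaves out.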

For feature requests contact neeraj.kaul@gmail.com

Tags

arfffiles decisiontrees largedatasets naivebayes sufficientstatistics weka

This Project has No vulnerabilities Reported Against it

Languages

XML 90%, Java 9%, 2 Other 1%
