Analyzed 10 months ago, based on code collected 10 months ago.

Project Summary

Indus Learning Framework (ILF)

The Indus Learning Framework is a suite of machine learning algorithms that learn from datasets using sufficient statistics. The framework is particularly useful in the following scenarios:

When the dataset is so large that it cannot fit into memory (e.g., the ARFF file is so large that Weka runs out of memory).

When access to the underlying data instances is not available (due to considerations such as security or cost) but the data source provides some statistics (such as answers to count queries).

The current implementation of the framework provides Naive Bayes and decision tree classifiers. The framework has been written so that it can be extended to include more classifiers that are amenable to the sufficient statistics approach.
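To illustrate the idea, here is a minimal sketch (not ILF code) of a Naive Bayes classifier that never touches individual instances: it is trained only from count queries of the form "how many instances have attribute A = v and class c". The `CountSource` interface, the class names, and the use of Laplace smoothing are all assumptions of this sketch; `InMemoryCountSource` exists only so the example can be run.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical data source that answers count queries only; callers never
 * see individual instances.
 */
interface CountSource {
    /** Number of instances whose class label is classValue. */
    long countClass(String classValue);

    /** Number of instances with attribute == value and class label classValue. */
    long countAttrClass(String attribute, String value, String classValue);
}

/** Naive Bayes classifier whose only input is count statistics. */
class CountNaiveBayes {
    private final CountSource source;
    private final List<String> classValues;
    private final Map<String, List<String>> attributeValues; // attribute -> its possible values

    CountNaiveBayes(CountSource source, List<String> classValues,
                    Map<String, List<String>> attributeValues) {
        this.source = source;
        this.classValues = classValues;
        this.attributeValues = attributeValues;
    }

    /** Returns the class label with the highest log-posterior score. */
    String predict(Map<String, String> instance) {
        long total = 0;
        for (String c : classValues) {
            total += source.countClass(c);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : classValues) {
            long nc = source.countClass(c);
            // log P(c), Laplace-smoothed
            double score = Math.log((nc + 1.0) / (total + classValues.size()));
            for (Map.Entry<String, String> e : instance.entrySet()) {
                long nvc = source.countAttrClass(e.getKey(), e.getValue(), c);
                int k = attributeValues.get(e.getKey()).size();
                // log P(attribute = value | c), Laplace-smoothed
                score += Math.log((nvc + 1.0) / (nc + k));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}

/** Demo-only source that computes the counts from rows held in memory. */
class InMemoryCountSource implements CountSource {
    private final List<Map<String, String>> rows;
    private final String classAttribute;

    InMemoryCountSource(List<Map<String, String>> rows, String classAttribute) {
        this.rows = rows;
        this.classAttribute = classAttribute;
    }

    @Override
    public long countClass(String classValue) {
        return rows.stream()
                .filter(r -> classValue.equals(r.get(classAttribute)))
                .count();
    }

    @Override
    public long countAttrClass(String attribute, String value, String classValue) {
        return rows.stream()
                .filter(r -> classValue.equals(r.get(classAttribute)))
                .filter(r -> value.equals(r.get(attribute)))
                .count();
    }
}
```

For example, with five training rows (two with `party=dem, vote=y`, two with `party=rep, vote=n`, one with `party=rep, vote=y`), `predict(Map.of("vote", "y"))` returns `"dem"`. Because `CountNaiveBayes` only ever calls the two count methods, the in-memory source could in principle be replaced by one that issues `SELECT COUNT(*)` queries against a database table.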

Refer to the user's guide wiki for how to run the various classifiers in the framework. Besides being run from the command line, the ILF provides an API that can be used to integrate it into a target application.

Integration Samples

Sample 1

```java
import airldm2.core.datatypes.relational.SingleRelationDataDescriptor;
import airldm2.core.datatypes.relational.RelationalDataSource;
import airldm2.util.SimpleArffFileReader;
import airldm2.classifiers.Evaluation;
import weka.classifiers.evaluation.ConfusionMatrix;
import weka.core.Utils;
// Imports for NaiveBayesClassifier, LDTestInstances, LDInstances and
// SSDataSource are not shown in the original sample.

// ...

// -trainTable names the relational table holding the training data;
// -testFile points to an ARFF file with the test instances.
String[] options = {"-b", "-trainTable", "votes_train", "-testFile", "sample/HouseVotesTrain.arff"};

// Weka's Utils.getOption returns the value that follows "-<flag>".
String trainTableName = Utils.getOption("trainTable", options);
String testFile = Utils.getOption("testFile", options);

NaiveBayesClassifier classifier = new NaiveBayesClassifier();

// Read the test instances from the ARFF file and reuse their descriptor
// for the training data.
SimpleArffFileReader readTest = new SimpleArffFileReader(testFile);
LDTestInstances testInst = readTest.getTestInstances();
SingleRelationDataDescriptor desc = (SingleRelationDataDescriptor) testInst.getDesc();

// The training data is accessed through a relational data source (the table named above).
SSDataSource dataSource = new RelationalDataSource(trainTableName);

// Create a large dataset instance and set its descriptor and data source
LDInstances trainData = new LDInstances();
trainData.setDesc(desc);
trainData.setDataSource(dataSource);

// Train on trainData, evaluate on testInst, and print the confusion matrix
ConfusionMatrix matrix = Evaluation.evlauateModel2(classifier, trainData, testInst, options);
System.out.println(matrix.toString("===Confusion Matrix==="));
```

Extension With Indus Integration Framework

The system can use a data integration system to learn from multiple disparate data sources. The current implementation has been extended to use the Indus Integration Framework. Users are referred to the code and the example included in the source tree induse_extension_src.
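One reason the sufficient statistics approach suits learning from multiple sources is that counts are additive when the sources hold disjoint sets of instances with the same attributes (horizontal partitioning): the count over the combined data is the sum of the per-source counts. The minimal sketch below assumes the counts have already been fetched from each source into maps keyed by a statistic name (a hypothetical key format such as `class:dem`). It is not the Indus Integration Framework's API, and it does not handle schema mapping between sources or vertically partitioned data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Merges count statistics from horizontally partitioned data sources. */
class CountMerger {
    /**
     * Sums counts that share a key across all sources; a key missing from a
     * source simply contributes nothing. Valid only when the sources hold
     * disjoint sets of instances.
     */
    static Map<String, Long> merge(List<Map<String, Long>> perSourceCounts) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> counts : perSourceCounts) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }
}
```

For example, merging `{class:dem=2}` from one source with `{class:dem=3, class:rep=1}` from another yields `{class:dem=5, class:rep=1}`, which a count-based learner can then use as if it came from a single source.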

For feature requests, contact neeraj.kaul@gmail.com.

Tags

arfffiles naivebayes largedatasets weka decisiontrees sufficientstatistics




Languages

XML: 90%
Java: 9%
2 other languages: 1%

30 Day Summary

Apr 9 2016 — May 9 2016

12 Month Summary

May 9 2015 — May 9 2016