Project Summary

  Analyzed about 1 month ago based on code collected about 1 month ago.

Indus Learning Framework (ILF)

The Indus Learning Framework is a suite of machine learning algorithms that learn from datasets using sufficient statistics. This framework is particularly useful in the following scenarios:

When the data set is huge and cannot fit into memory (e.g. an ARFF file is so large that Weka runs out of memory)

When access to the underlying data instances is not available (due to considerations such as security or cost) but the data source provides some statistics (such as count queries)

The current implementation of the framework provides Naive Bayes and Decision Trees. The framework has been written so that it can be extended to include more classifiers that are amenable to the sufficient statistics approach.
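To illustrate why count queries are enough for a classifier such as Naive Bayes, here is a minimal sketch in plain Java. It is not ILF code: the `CountSource` interface, the `toySource` helper, and the toy vote counts are all hypothetical and stand in for whatever actually answers the count queries. Laplace smoothing is an added assumption, not something the framework is known to use.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountBasedNaiveBayes {

    /** Hypothetical stand-in for a data source that only answers count queries. */
    interface CountSource {
        long countClass(String classValue);
        long countJoint(String attribute, String value, String classValue);
    }

    /** Picks the most probable class using counts only; raw instances are never read. */
    static String classify(CountSource src, List<String> classValues,
                           Map<String, String> instance,
                           Map<String, Integer> valuesPerAttribute) {
        long total = 0;
        for (String c : classValues) {
            total += src.countClass(c);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : classValues) {
            long classCount = src.countClass(c);
            // log P(c), Laplace-smoothed (smoothing is an assumption of this sketch)
            double score = Math.log((classCount + 1.0) / (total + classValues.size()));
            for (Map.Entry<String, String> e : instance.entrySet()) {
                long joint = src.countJoint(e.getKey(), e.getValue(), c);
                int k = valuesPerAttribute.get(e.getKey());
                // log P(attribute = value | c), Laplace-smoothed
                score += Math.log((joint + 1.0) / (classCount + k));
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    /** Toy counts held in memory; a real source might run SQL COUNT queries instead. */
    static CountSource toySource() {
        final Map<String, Long> counts = new HashMap<>();
        counts.put("democrat", 6L);
        counts.put("republican", 4L);
        counts.put("vote=y|democrat", 5L);
        counts.put("vote=n|democrat", 1L);
        counts.put("vote=y|republican", 1L);
        counts.put("vote=n|republican", 3L);
        return new CountSource() {
            public long countClass(String c) {
                return counts.getOrDefault(c, 0L);
            }
            public long countJoint(String a, String v, String c) {
                return counts.getOrDefault(a + "=" + v + "|" + c, 0L);
            }
        };
    }

    public static void main(String[] args) {
        List<String> classes = Arrays.asList("democrat", "republican");
        Map<String, Integer> k = Collections.singletonMap("vote", 2);
        CountSource src = toySource();
        System.out.println(classify(src, classes, Collections.singletonMap("vote", "y"), k)); // democrat
        System.out.println(classify(src, classes, Collections.singletonMap("vote", "n"), k)); // republican
    }
}
```

The classifier touches the data only through `countClass` and `countJoint`, so the same logic works whether the counts come from memory, a database, or a remote service.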

Refer to the user's guide wiki for how to run the various classifiers in the framework. Besides being run from the command line, the ILF provides an API that can be used to integrate it into a target application.

Integration Samples

sample 1

import airldm2.core.datatypes.relational.SingleRelationDataDescriptor;
import airldm2.core.datatypes.relational.RelationalDataSource;
import airldm2.util.SimpleArffFileReader;
import airldm2.classifiers.Evaluation;
import weka.classifiers.evaluation.ConfusionMatrix;
import weka.core.Utils;

........
.......

String[] options= {"-b", "-trainTable", "votes_train", "-testFile","sample/HouseVotesTrain.arff"};


String trainTableName = Utils.getOption("trainTable", options);
String testFile = Utils.getOption("testFile", options);

NaiveBayesClassifier classifier = new NaiveBayesClassifier();

SingleRelationDataDescriptor desc = null;


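// Read the test instances from the ARFF file and reuse their descriptor for the training data
SimpleArffFileReader readTest = new SimpleArffFileReader(testFile);
LDTestInstances testInst = readTest.getTestInstances();
desc = (SingleRelationDataDescriptor) testInst.getDesc();

// The training data is reached through a data source built on the named table
SSDataSource dataSource = new RelationalDataSource(trainTableName);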
// Create a Large DataSet Instance and set its descriptor and source
LDInstances trainData = new LDInstances();
trainData.setDesc(desc);
trainData.setDataSource(dataSource);

ConfusionMatrix matrix = Evaluation.evlauateModel2(classifier, trainData, testInst, options);
System.out.println(matrix.toString("===Confusion Matrix==="));
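For readers unfamiliar with the output of the sample above, the following sketch shows what a confusion matrix holds. It is plain Java and deliberately does not use Weka's `ConfusionMatrix` class. The class name `ConfusionMatrixSketch`, the labels, and the predictions are made up for illustration. In this sketch rows are actual classes and columns are predicted classes.

```java
public class ConfusionMatrixSketch {

    /** Returns the position of a label in the class list. */
    static int indexOf(String[] classes, String label) {
        for (int i = 0; i < classes.length; i++) {
            if (classes[i].equals(label)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown class: " + label);
    }

    /** Builds the matrix: cell [i][j] counts instances of actual class i predicted as class j. */
    static int[][] confusion(String[] classes, String[] actual, String[] predicted) {
        int[][] m = new int[classes.length][classes.length];
        for (int i = 0; i < actual.length; i++) {
            m[indexOf(classes, actual[i])][indexOf(classes, predicted[i])]++;
        }
        return m;
    }

    /** Accuracy is the diagonal sum divided by the total number of instances. */
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        String[] classes = {"democrat", "republican"};
        // Made-up labels and predictions, for illustration only
        String[] actual = {"democrat", "democrat", "republican", "republican", "democrat"};
        String[] predicted = {"democrat", "republican", "republican", "republican", "democrat"};
        int[][] m = confusion(classes, actual, predicted);
        for (int[] row : m) {
            System.out.println(row[0] + " " + row[1]);
        }
        System.out.println("accuracy = " + accuracy(m));
    }
}
```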

Extension With Indus Integration Framework

The system can use a data integration system to learn from multiple disparate data sources. The current implementation has been extended to use the Indus Integration Framework. Users are referred to the code and the example included in the source tree induse_extension_src
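One reason sufficient statistics suit multi-source learning is that counts are additive: if several sources hold disjoint sets of instances described with the same attributes and values, the count for the combined data is the sum of the per-source counts. The sketch below shows only that idea. It is not the Indus Integration Framework's API, and the `MergedCounts` class, the key format, and the numbers are hypothetical. It also assumes the sources already share one schema and vocabulary, which a real integration system would have to arrange.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergedCounts {

    /** Sums per-source counts key by key; valid when sources hold disjoint instances. */
    static Map<String, Long> merge(List<Map<String, Long>> sources) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> source : sources) {
            for (Map.Entry<String, Long> e : source.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical counts reported by two separate sources
        Map<String, Long> sourceA = new HashMap<>();
        sourceA.put("vote=y|democrat", 3L);
        sourceA.put("vote=n|democrat", 1L);

        Map<String, Long> sourceB = new HashMap<>();
        sourceB.put("vote=y|democrat", 2L);
        sourceB.put("vote=y|republican", 1L);

        Map<String, Long> merged = merge(Arrays.asList(sourceA, sourceB));
        System.out.println(merged.get("vote=y|democrat")); // 5
    }
}
```

A classifier that learns from counts can then be trained on the merged map exactly as if it had come from a single source.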

For feature requests contact neeraj.kaul@gmail.com

Languages

XML: 90%
Java: 9%
2 Other: 1%

Activity

30 Day Summary

Apr 26 2015 — May 26 2015

12 Month Summary

May 26 2014 — May 26 2015
  • 0 Commits, down 1 (100%) from the previous 12 months
  • 0 Contributors, down 1 (100%) from the previous 12 months
