Large-scale, powerful, and batteries included!
HadoopLDA trains LDA (Latent Dirichlet Allocation) models on large corpora in parallel on a Hadoop cluster. It uses distributed Gibbs sampling with built-in vocabulary selection. HadoopLDA is easy to use: a single command turns a huge collection of documents into a compact topic model file, and a bundled Java class (LdaModel) makes it easy to use the model in your own code.
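For readers new to Gibbs sampling for LDA, here is a minimal single-machine sketch of the collapsed Gibbs sampler. It is not HadoopLDA's code: the class TinyLdaGibbs, its method names, and the hyperparameters alpha and beta are hypothetical names chosen for illustration. It only shows the per-token resampling step that a distributed sampler would repeat on each shard of documents.

```java
import java.util.Random;

// Illustrative collapsed Gibbs sampler for LDA on one machine.
// Hypothetical sketch; HadoopLDA's actual implementation may differ.
class TinyLdaGibbs {
    final int numTopics, vocabSize;
    final double alpha, beta;   // symmetric Dirichlet priors (assumed values)
    final int[][] docs;         // docs[d][i] = word id of token i in document d
    final int[][] z;            // z[d][i]    = topic currently assigned to that token
    final int[][] docTopic;     // docTopic[d][k]  = tokens in document d with topic k
    final int[][] topicWord;    // topicWord[k][w] = times word w has topic k
    final int[] topicTotal;     // topicTotal[k]   = tokens with topic k
    final Random rng;

    TinyLdaGibbs(int[][] docs, int vocabSize, int numTopics,
                 double alpha, double beta, long seed) {
        this.docs = docs;
        this.vocabSize = vocabSize;
        this.numTopics = numTopics;
        this.alpha = alpha;
        this.beta = beta;
        this.rng = new Random(seed);
        z = new int[docs.length][];
        docTopic = new int[docs.length][numTopics];
        topicWord = new int[numTopics][vocabSize];
        topicTotal = new int[numTopics];
        // Start from a uniformly random topic assignment.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(numTopics);
                z[d][i] = k;
                docTopic[d][k]++;
                topicWord[k][docs[d][i]]++;
                topicTotal[k]++;
            }
        }
    }

    // One full pass: resample the topic of every token in every document.
    void sweep() {
        double[] p = new double[numTopics];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i];
                int old = z[d][i];
                // Remove the token's current assignment from the counts.
                docTopic[d][old]--;
                topicWord[old][w]--;
                topicTotal[old]--;
                // Unnormalized conditional probability of each topic.
                double sum = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    p[k] = (docTopic[d][k] + alpha)
                         * (topicWord[k][w] + beta)
                         / (topicTotal[k] + vocabSize * beta);
                    sum += p[k];
                }
                // Draw a topic in proportion to p.
                double u = rng.nextDouble() * sum;
                int k = 0;
                double acc = p[0];
                while (acc < u && k < numTopics - 1) {
                    k++;
                    acc += p[k];
                }
                // Record the new assignment.
                z[d][i] = k;
                docTopic[d][k]++;
                topicWord[k][w]++;
                topicTotal[k]++;
            }
        }
    }

    // Smoothed estimate of each topic's distribution over words.
    double[][] topicWordDistribution() {
        double[][] phi = new double[numTopics][vocabSize];
        for (int k = 0; k < numTopics; k++) {
            for (int w = 0; w < vocabSize; w++) {
                phi[k][w] = (topicWord[k][w] + beta)
                          / (topicTotal[k] + vocabSize * beta);
            }
        }
        return phi;
    }
}
```

A caller would build the sampler from documents encoded as arrays of word ids, call sweep() a few hundred times, and then read topicWordDistribution(). A common way to parallelize this is to run sweeps on separate document shards and merge the topic-word counts between iterations. Whether HadoopLDA does exactly this is not stated here.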
Scientific research? Comparing document similarity? Matching ads to users? Discovering corpus structure? You choose. With HadoopLDA and a Hadoop cluster, each becomes an easy task.
The source code, a binary package, and a Getting Started doc will be uploaded soon.
Use Patent Claims