Robin Anil

Apache Mahout 0.2 Released – Now classify, cluster and generate recommendations!

Posted by Robin Anil | Posted in classification, clustering, datamining, java, lucene, machine learning, mahout, map/reduce, recommendations | Posted on 18-11-2009



For the past two years, I have been working with an amazing bunch of people on a project called Mahout, while being paid by Google through its Summer of Code program. As the name says, it is trying to tame the young beast known as Hadoop. I have gained a lot from the community. Being part of the project has given me real exposure to Java, data mining and machine learning, and hands-on experience with distributed systems like Hadoop, HBase and Pig. The project is still in its infancy, but its ambitions are sky-high. I am happy to announce the second release of the project, and proud to be a part of it. I hope people will adopt it in their projects and that it becomes the de facto standard machine learning library, the way Lucene and Hadoop have become in their respective focus areas.

If you are already excited and want to take it for a ride, read Grant’s article on IBM developerWorks.
The release announcement follows below.

Apache Mahout 0.2 has been released and is now available for public download at http://www.apache.org/dyn/closer.cgi/lucene/mahout

Up to date maven artifacts can be found in the Apache repository at
https://repository.apache.org/content/repositories/releases/org/apache/mahout/

Apache Mahout is a subproject of Apache Lucene with the goal of delivering scalable machine learning algorithm implementations under the Apache license (http://www.apache.org/licenses/LICENSE-2.0).

Mahout is a machine learning library meant to scale in three ways: in terms of community, to support anyone interested in using machine learning; in terms of business, by providing the library under a commercially friendly, free software license; and in terms of computation, to the size of data we manage today.

Built on top of the powerful map/reduce paradigm of the Apache Hadoop project, Mahout lets you solve popular machine learning problems such as clustering, collaborative filtering and classification over terabytes of data across thousands of computers.

Implemented with scalability in mind, the latest release brings many performance optimizations, so the library performs well even in a single-node setup.

The complete changelist can be found here:

http://issues.apache.org/jira/browse/MAHOUT/fixforversion/12313278

New Mahout 0.2 features include:

Major performance enhancements in Collaborative Filtering, Classification and Clustering
New: Latent Dirichlet Allocation (LDA) implementation for topic modelling
New: Frequent Itemset Mining for mining top-k patterns from a list of transactions
New: Decision Forests implementation for Decision Tree classification (In Memory & Partial Data)
New: HBase storage support for Naive Bayes model building and classification
New: Generation of vectors from text documents for use with Mahout algorithms
Performance improvements in various Vector implementations
Tons of bug fixes and code cleanup

Getting started: New to Mahout?

Download Mahout at http://www.apache.org/dyn/closer.cgi/lucene/mahout
Check out the Quick start: http://cwiki.apache.org/MAHOUT
Read the Mahout Wiki: http://cwiki.apache.org/MAHOUT
Join the community by subscribing to mahout-user@lucene.apache.org
Give back: http://www.apache.org/foundation/getinvolved.html
Consider adding yourself to the Powered By wiki page: http://cwiki.apache.org/MAHOUT/poweredby.html

For more information on Apache Mahout, see http://lucene.apache.org/mahout
