Lab #3 - Introduction to Hadoop
Due: Tuesday, April 6, 2010 at 11:59PM
Overview
The purpose of this assignment is just to get you set up with the tools
of the trade. You are asked to work your way through the Hadoop tutorial
and to perform some basic tasks. Project #4 will involve a much more
extravagant application of Hadoop.
No Partners
Since the only goal of this assignment is to "turn a very small wheel once",
you are asked to do it by yourself. You will have trouble contributing
as an equal partner in the next lab unless you can first do something
small on your own.
Links
Datasets
Download and unzip your choice of the following:
Your Tasks
- Carefully follow the Hadoop tutorial
- Get yourself set up to use Hadoop in Standalone mode via the Eclipse
Plug-In (optional: you don't have to, but you can)
- Get the canonical "Word Count" example working. Turn in
not only the source code, but also the output produced when it is
run on the Declaration of Independence of the United States.
- Write your own Hadoop/MapReduce program that processes your
choice of Web traces and determines the most frequently accessed
object (much like Word Count, eh?)
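To see why the last two tasks are so similar, here is a minimal sketch of the map, shuffle, and reduce steps written with plain Java collections. It does not use the Hadoop API at all: the class name MapReduceSketch and every method in it are hypothetical stand-ins for what the framework does for you, and the real Word Count from the tutorial will look different. For the Web traces, the map step would presumably emit only the field naming the requested object rather than every token; which field that is depends on the trace format you choose.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration; NOT the Hadoop API.
public class MapReduceSketch {

    // Map phase: emit one (token, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (kept sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map, shuffle, and reduce over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    // Picks the key with the largest count; ties go to the first key in
    // iteration order. Returns null when there are no counts.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick the", "lazy the quick");
        Map<String, Integer> counts = wordCount(lines);
        System.out.println(counts);
        System.out.println(mostFrequent(counts));
    }
}
```

Finding the most frequent object is then one extra pass over the counts. In a real Hadoop job that pass might be a second job or a small post-processing step; the sketch makes no claim about which is better.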
Submission
- Submit just your Java source files and your Makefile (don't give us the jar files)
- Submit to AFS (same place as your homework - there will be a directory for you)
We're Here To Help!
As always -- remember, we're here to help!