Overview
Web-Based Information Management entails the design, creation,
instrumentation and usage of web sites and related indexing and searching
software. The course focuses first on web-based search engines: how to use
them optimally, how to design e-commerce sites that maximize customer attraction
via search engines, how to analyze competition, and how to architect both
topological and key-term page access paths in service of successful e-commerce
infrastructures. Then, the course focuses on key technological underpinnings,
primarily the hands-on creation of a search engine, including inverted indexing,
partial matching, query expansion, and spidering technology. Subsequently,
the course addresses related issues in web-based information architectures,
including automated text categorization (e.g., indexing web pages into
Yahoo-like taxonomies or auction-site catalogs), information extraction from
web pages, and a glimpse into larger-scale text and data mining methods.
Time permitting, the course will survey issues such as multilingual web access
and distributed information retrieval.
Course Information
Location: GSIA 152
Time: 8:30am - 9:50am Tue/Thu
Instructor: Prof. Jaime G. Carbonell (with expert guest lecturers)
- Office: NSH 4519
- Email: jgc@cs.cmu.edu
- Tel: 268-7279
Teaching assistant: Yan Liu
- Office: NSH 4506
- Email: yanliu@andrew.cmu.edu
- Tel: 268-8692
- Office Hours: 4:00pm-5:00pm Thu
Course secretary: TBA
- Office: NSH 4517
- Email: TBA
- Tel: 268-4788
Prerequisite Skills
- Basic programming skills (preferably Java)
- Familiarity with the Web (HTML, browsing, etc.)
- Fundamentals of Web programming
Textbook
- Required: Class notes and handouts
- Required: Understanding Search Engines: Mathematical Modeling and Text Retrieval
  by Michael W. Berry, Murray Browne
  Also available at http://www.siam.org/ or by phone at 1-800-447-7426
- Optional Course Materials:
  1. Understanding Search Engines: Mathematical Modeling and Text Retrieval (chapters 1-3)
  2. Large-Scale, Component-Based Development (chapter 2)
  3. Databases and Transaction Processing: An Application-Oriented Approach (chapter 4)
  4. The Digital Economy Fact Book (chapter 5)
  Two copies are on reserve in the Hunt Library.
- Optional: Advances in Information Retrieval
  Edited by Croft, Kluwer Academic Publishers, 2000
  [A more detailed state-of-the-art IR book]
- Optional: Machine Learning
  by Tom M. Mitchell, WCB McGraw-Hill
  [Tools for text categorization and data mining]
Grading
- 30% homework (2 programming assignments)
- 30% mini-project (optional presentation for extra credit)
- 15% midterm (closed book, calculator OK, no laptops; you may bring up to 5
pages of notes)
- 25% final exam (closed book, no laptops; you may bring up to 10 pages of notes)
Last Modified: Friday, October 24, 2003
yanliu@andrew.cmu.edu