Information Retrieval and Web Search
Instructor: Rada Mihalcea
Class mailing list

Spring 2008

TTh 12:30PM-01:50PM




  • The project description is now posted.
  • There is no class on 03/27. Instead, we will have a seminar on 03/28 at 10am in F223.
  • Mandatory reading for Thursday, 02/21: the paper by Salton et al. on term weighting approaches. A link to an online version of the paper has been sent to the class mailing list.
  • Assignment 1 is due on 02/21.
  • Welcome to CSCE5200. Sign up for the class mailing list.


Click here for a syllabus.


Instructor: Rada Mihalcea
Office: Research Park, F228, tel: 940-369-7630
Email: rada at cs unt edu
Class hours: TTh 12:30-01:50pm
Office hours: TTh 03:00-04:00pm or by appointment. Available electronically at any time.
Teaching assistant: Satya Mudunuru
Email: chandu at unt edu
Office hours: MW 12-2pm, F205
Course description: This course covers traditional material as well as recent advances in Information Retrieval (IR), the study of indexing, processing, and querying textual data. It covers basic retrieval models, algorithms, and IR system implementations. It also addresses more advanced topics in "intelligent" IR, including Natural Language Processing techniques and "smart" Web agents.



Date | Lecture | Reading material | NB
01/15/2008 | Course overview [ppt] | - | -
01/17/2008 | Introduction to IR models and methods [ppt] | - | -
01/22/2008 | Perl tutorial [ppt] | - | -
01/24/2008 | Perl tutorial [ppt] | - | -
01/29/2008 | Text processing [ppt] | Porter stemmer; [CM] Chap. 2: The term vocabulary & postings lists | -
01/31/2008 | Text properties [ppt]; Web spidering [ppt] | [CM] Chap. 5: Index compression, sect. 5.1; [CM] Chap. 20: Web crawling and indexes; Optional reading: [BY] chapter 6.3 | Lecturer: Andras Csomai
02/05/2008 | Practical problems in web spidering [ppt] | - | Lecturer: Andras Csomai; Assignment 1 issued
02/07/2008 | Vector space model [ppt] | [CM] Chap. 6: Scoring, term weighting and the vector space model | Lecturer: Andras Csomai
02/12/2008 | Boolean model and extensions [ppt] | [CM] Chap. 1: Boolean retrieval | Lecturer: Hakan Ceylan
02/14/2008 | Alternative IR models [ppt] | [CM] Chap. 11: Probabilistic IR; [CM] Chap. 18: LSA | Lecturer: Hakan Ceylan
02/19/2008 | Review IR models; IR evaluation and IR test collections [ppt] | [CM] Chap. 8: Evaluation in information retrieval | -
02/21/2008 | Term weighting schemes | [CM] Chap. 6: Scoring, term weighting and the vector space model; [KSJ] Term weighting approaches, p. 323 | Assignment 1 due; Assignment 2 issued
02/26/2008 | Relevance feedback [ppt] | [CM] Chap. 9 | -
02/28/2008 | Query expansion [ppt]; Text classification [ppt] | [CM] Chapter 13 | -
03/04/2008 | Text classification [ppt]; See also: Intro Machine Learning [ppt] | [CM] Chapter 13 | -
03/07/2008 | Learning language from its perceptual context; Invited speaker: Ray Mooney, University of Texas at Austin; F228, 11am | - | Note the unusual day/time/room
03/11/2008 | Review: Exam I preparation | All the material studied so far | -
03/13/2008 | Exam I | - | Assignment 2 due on 03/14
03/18/2008 | Spring break | - | -
03/20/2008 | Spring break | - | -
03/25/2008 | Link analysis. HITS. PageRank. | [CM] Chapter 21; Page, L. et al., The PageRank Citation Ranking: Bringing Order to the Web; Also check this page | Assignment 3 issued on 03/20
03/28/2008 | Keyword Extraction and Back-of-the-Book Indexing; Andras Csomai; F228, 10am | - | Note the unusual day/time/place
04/01/2008 | Question Answering [ppt] | Check the TREC Q&A site | -
04/03/2008 | Topic Sensitive PageRank | Haveliwala, "Topic-Sensitive PageRank" [pdf] | -
04/08/2008 | Special Topics: Topic Sensitive PageRank; Special Topics: Introduction to Information Extraction [ppt] | - | -
04/10/2008 | Special Topics: Cross-language Information Retrieval [ppt] | Check the Cross Language Evaluation Forum (CLEF) | Assignment 3 due
04/15/2008 | Special Topics: Web 2.0: Wikis and Blogs | - | -
04/17/2008 | Special Topics: Recommender Systems | - | -
04/22/2008 | Search Engine Technologies; Exam II preparation | All material studied so far (papers included) | -
04/24/2008 | Exam II | All material studied so far (papers included) | -
04/29/2008 | Project presentations I | - | -
05/01/2008 | Project presentations II | - | -



[CM] = Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval
[BY] = Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval
[KSJ] = Karen Sparck Jones and Peter Willett, Readings in Information Retrieval







Textbook

Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze

Recommended reading

These books are recommended but not mandatory.
You are not required to buy them; some of them can be consulted at the UNT library.
Readings in Information Retrieval
K. Sparck Jones and P. Willett

Modern Information Retrieval
Ricardo Baeza-Yates and Berthier Ribeiro-Neto

Information Retrieval: Data Structures and Algorithms
W. Frakes and R. Baeza-Yates




Resources

Other Information Retrieval courses on the Web

Search engines

Programming resources