Machine Learning: Clustering & Retrieval Course Syllabus
Full curriculum breakdown — modules, lessons, estimated time, and outcomes.
Overview
This course introduces clustering and retrieval methods in machine learning, with a focus on document retrieval and topic modeling. Learners gain hands-on experience implementing algorithms such as k-nearest neighbors, k-means, and latent Dirichlet allocation. The six modules total approximately 36 hours of learning, and each combines theory with practical programming assignments. Designed for self-paced study, the course includes real-world applications in text analysis and information retrieval and culminates in a case study in which learners build an end-to-end document retrieval system.
Module 1: Introduction to Clustering and Retrieval
Estimated time: 4 hours
- Overview of clustering tasks in machine learning
- Introduction to information retrieval systems
- Course structure and learning objectives
- Prerequisites and technical background review
Module 2: Nearest Neighbor Search
Estimated time: 6 hours
- Implementing k-NN for document retrieval
- Measuring similarity in text data using various metrics
- Optimizing k-NN search with KD-trees
- Scaling search using locality-sensitive hashing (LSH)
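
To give a flavor of the Module 2 topics, here is a minimal sketch of brute-force k-NN document retrieval with TF-IDF weights and cosine similarity. It is an illustration under stated assumptions, not the course's reference solution: the function names, the whitespace tokenizer, the smoothed IDF formula, and the four-sentence corpus are all hypothetical. KD-trees and LSH would replace the linear scan in `k_nearest`.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: word -> weight) per document."""
    # Assumption: lowercase whitespace tokenization is enough for this toy corpus.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: IDF is log(N / df) + 1, one of several common variants.
        vectors.append({
            word: count * (math.log(n_docs / doc_freq[word]) + 1.0)
            for word, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def k_nearest(query_index, vectors, k):
    """Indices of the k documents most similar to the query (linear scan)."""
    query = vectors[query_index]
    scored = [
        (cosine_similarity(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical toy corpus: two documents about cats, two about markets.
docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "stock markets fell sharply on monday",
    "investors sold stock as markets fell",
]
vectors = tf_idf_vectors(docs)
print(k_nearest(0, vectors, 1))  # nearest neighbor of the first cat document
print(k_nearest(2, vectors, 1))  # nearest neighbor of the first markets document
```

The linear scan costs one similarity computation per stored document for every query, which is the cost that the KD-tree and LSH lessons aim to reduce.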
Module 3: Clustering
Estimated time: 7 hours
- Applying k-means to cluster documents by topic
- Understanding convergence and initialization in k-means
- Parallelizing k-means using MapReduce for large datasets
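
The assignment and update steps that Module 3 studies can be sketched as follows. This is a minimal illustration of Lloyd's algorithm, assuming 2-D points that stand in for document vectors, random initialization from the data (not a smarter scheme such as k-means++), and hypothetical function names. The two steps also map naturally onto MapReduce: assignment is a map over points, and the centroid update is a reduce per cluster.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Converged once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(points, k=2)
print(assignments)
```

Because the result depends on the starting centroids, running with several seeds and keeping the solution with the lowest total squared distance is a common safeguard.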
Module 4: Mixture Models and EM
Estimated time: 6 hours
- Introduction to probabilistic clustering
- Fitting mixture-of-Gaussians models
- Understanding and implementing the expectation-maximization (EM) algorithm
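
As an illustration of the EM algorithm named in Module 4, the sketch below fits a one-dimensional mixture of Gaussians. It is a simplified example under stated assumptions: the data are scalars, not document vectors, the initial parameters are supplied by the caller, a fixed number of iterations replaces a convergence test, and all names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, init_means, init_vars, init_weights, n_iters=50):
    """Run EM for a 1-D Gaussian mixture; returns (means, variances, weights)."""
    means = list(init_means)
    variances = list(init_vars)
    weights = list(init_weights)
    k = len(means)
    for _ in range(n_iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [
                weights[j] * gaussian_pdf(x, means[j], variances[j])
                for j in range(k)
            ]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            soft_count = sum(r[j] for r in resp)
            weights[j] = soft_count / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / soft_count
            var = sum(
                r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)
            ) / soft_count
            # Assumption: a small floor keeps a variance from collapsing to zero.
            variances[j] = max(var, 1e-6)
    return means, variances, weights


# Hypothetical data: five values near 1 and five values near 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(
    data, init_means=[0.0, 6.0], init_vars=[1.0, 1.0], init_weights=[0.5, 0.5]
)
print(means, weights)
```

Unlike k-means, each point contributes to every component in proportion to its responsibility, which is what makes the clustering probabilistic.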
Module 5: Topic Modeling with LDA
Estimated time: 7 hours
- Performing mixed membership modeling with LDA
- Understanding the structure of latent Dirichlet allocation
- Implementing Gibbs sampling for inference in topic models
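
The Gibbs sampling lesson can be previewed with a compact sketch of a collapsed Gibbs sampler for LDA. Treat it as one possible formulation under stated assumptions: documents are pre-tokenized lists of words, the hyperparameters `alpha` and `beta` are symmetric and fixed, the final sample alone is used to estimate document-topic proportions, and every name here is hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (theta, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                probs = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                # Record the newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions from the final sample.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


# Hypothetical toy corpus of pre-tokenized documents.
docs = [
    ["cat", "kitten", "mat", "cat"],
    ["kitten", "cat", "mat"],
    ["stock", "markets", "investors", "stock"],
    ["markets", "investors", "stock"],
]
theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` gives one document's mixed membership across topics. On a corpus this small the sampled topics can vary with the seed, so the printed proportions are illustrative only.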
Module 6: Case Study and Applications
Estimated time: 6 hours
- Applying retrieval and clustering techniques to real-world datasets
- Building a document retrieval system end-to-end
- Comparing supervised and unsupervised learning in retrieval contexts
Prerequisites
- Familiarity with basic machine learning concepts
- Intermediate programming skills in Python
- Basic understanding of probability and statistics
What You'll Be Able to Do After the Course
- Implement document retrieval systems using k-NN
- Apply k-means and LDA for document clustering and topic modeling
- Optimize similarity search with KD-trees and LSH
- Use Gibbs sampling for inference in probabilistic models
- Design scalable clustering solutions using MapReduce