Data Science Capstone Course Syllabus
Full curriculum breakdown — modules, lessons, estimated time, and outcomes.
Overview: This capstone course guides you through the end-to-end development of a real-world data science project built on the SwiftKey dataset, a collection of text from blogs, news articles, and tweets. You'll apply skills in data cleaning, natural language processing, predictive modeling, and data product deployment. The course emphasizes independent work and peer review, and it culminates in a portfolio-ready interactive application and a professional presentation. The project takes approximately 13–16 hours in total (the sum of the module estimates below) and integrates concepts from the entire Data Science Specialization.
Module 1: Getting Started & Understanding the Project
Estimated time: 1 hour
- Introduction to the capstone objectives and expectations
- Overview of the SwiftKey dataset and its sources
- Understanding project milestones and deliverables
- Setting up the project environment
Module 2: Data Exploration and Cleaning
Estimated time: 3–4 hours
- Load and inspect raw text data from blogs, news, and tweets
- Identify and handle missing or inconsistent data
- Analyze language patterns and word frequency distributions
- Preprocess text using cleaning and normalization techniques
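The cleaning and word-frequency steps above can be sketched in a few lines. The course itself works in R; this is a Python illustration only, and the function names (`clean_line`, `word_frequencies`), the cleaning rules, and the sample sentences are all assumptions made for the example, not part of the course materials or the SwiftKey data.

```python
import re
from collections import Counter


def clean_line(line: str) -> str:
    """Lowercase a line and strip URLs, digits, and punctuation (illustrative rules)."""
    line = line.lower()
    line = re.sub(r"https?://\S+", " ", line)  # drop URLs
    line = re.sub(r"[^a-z' ]+", " ", line)     # keep letters, apostrophes, spaces
    return re.sub(r"\s+", " ", line).strip()   # collapse repeated whitespace


def word_frequencies(lines):
    """Count word occurrences across cleaned, non-empty lines."""
    counts = Counter()
    for raw in lines:
        cleaned = clean_line(raw)
        if cleaned:  # skip lines that are empty after cleaning
            counts.update(cleaned.split())
    return counts


# Hypothetical sample standing in for lines read from the blog, news, and tweet files.
sample = [
    "I can't wait for the weekend!!!",
    "Check this out: https://example.com 2024",
    "",
    "The weekend is here, the WEEKEND is here.",
]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Real text would likely need more rules than this (profanity filtering, non-English lines, contractions), so treat the regular expressions as a starting point rather than a recommended pipeline.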
Module 3: Model Building and Prediction
Estimated time: 3–4 hours
- Apply tokenization and n-gram methods to text data
- Build a next-word prediction model using NLP techniques
- Evaluate model performance and accuracy
- Optimize model parameters for better predictions
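One possible shape for the modeling and evaluation steps above is a count-based n-gram table with naive backoff, scored by top-k accuracy on held-out text. This is a Python sketch for illustration (the course uses R), not the course's prescribed method; it applies no smoothing such as Katz backoff or Kneser-Ney, and every name and sentence in it is hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Split lowercased text on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_ngram_table(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            table[context][tokens[i + n - 1]] += 1
    return table


def predict_next(tables, text, k=3):
    """Return up to k candidate next words, backing off to shorter contexts."""
    tokens = tokenize(text)
    for n in sorted(tables, reverse=True):  # try the highest order first
        if n - 1 > len(tokens):
            continue  # not enough words for this context length
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = tables[n].get(context)
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []


def top_k_accuracy(tables, sentences, k=3):
    """Share of held-out words that appear among the top-k predictions."""
    hits = total = 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(1, len(tokens)):
            guesses = predict_next(tables, " ".join(tokens[:i]), k)
            hits += tokens[i] in guesses
            total += 1
    return hits / total if total else 0.0


# Hypothetical toy corpus; real training data would come from the cleaned dataset.
train = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
tables = {n: build_ngram_table(train, n) for n in (1, 2, 3)}

print(predict_next(tables, "sat on"))
print(top_k_accuracy(tables, ["the dog ate the fish"], k=3))
```

The maximum n-gram order and `k` are the obvious parameters to tune; raising the order tends to improve predictions for seen contexts at the cost of larger tables, which matters once the model has to respond quickly inside an interactive app.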
Module 4: Developing a Data Product
Estimated time: 3 hours
- Design an interactive application using Shiny or similar tools
- Integrate the predictive model into the user interface
- Ensure responsiveness, usability, and real-time functionality
Module 5: Communicating Your Results
Estimated time: 2 hours
- Summarize project methodology and approach
- Highlight key findings and model performance
- Create a professional slide deck for presentation
Module 6: Final Project
Estimated time: 1–2 hours
- Submit the interactive data application
- Submit the final presentation deck
- Participate in peer review by evaluating others' projects
Prerequisites
- Completion of the Data Science Specialization courses
- Familiarity with R programming and RStudio
- Understanding of exploratory data analysis, regression, and basic machine learning
What You'll Be Able to Do After This Course
- Build and deploy a complete data science project from start to finish
- Apply natural language processing techniques to real-world text data
- Create interactive, user-facing data applications
- Communicate technical results effectively through presentations
- Work independently on open-ended data challenges with peer feedback