Data Science Capstone Course Syllabus

Full curriculum breakdown — modules, lessons, estimated time, and outcomes.

Overview: This capstone course guides learners through the end-to-end development of a real-world data science project using the SwiftKey dataset, a text corpus drawn from blogs, news articles, and tweets. You'll apply skills in data cleaning, natural language processing, predictive modeling, and data product deployment. The course emphasizes independent work, peer review, and the creation of a portfolio-ready interactive application. The project requires approximately 14–17 hours of total effort, integrates concepts from across the entire Data Science Specialization, and culminates in a professional presentation and a functional app.

Module 1: Getting Started & Understanding the Project

Estimated time: 1 hour

  • Introduction to the capstone objectives and expectations
  • Overview of the SwiftKey dataset and its sources
  • Understanding project milestones and deliverables
  • Setting up the project environment

Module 2: Data Exploration and Cleaning

Estimated time: 3–4 hours

  • Load and inspect raw text data from blogs, news, and tweets
  • Identify and handle missing or inconsistent data
  • Analyze language patterns and word frequency distributions
  • Preprocess text using cleaning and normalization techniques
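The cleaning and word-frequency steps above can be sketched in a few lines. The course itself works in R; this is a minimal Python illustration under assumed cleaning rules (lowercase everything, drop URLs, keep only ASCII letters and apostrophes) that the syllabus does not prescribe. `normalize`, `tokenize`, and `word_frequencies` are hypothetical helper names. The ASCII-only rule discards accented characters, so it would need adjusting for non-English text.

```python
import re
from collections import Counter

# Assumed cleaning rules, for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # web addresses
NON_WORD_RE = re.compile(r"[^a-z' ]+")         # anything but a-z, apostrophe, space
SPACE_RE = re.compile(r"\s+")                  # runs of whitespace


def normalize(line: str) -> str:
    """Lowercase a line, strip URLs and non-letter characters, collapse spaces."""
    text = line.lower()
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def tokenize(line: str) -> list:
    """Split a normalized line into words, trimming stray edge apostrophes."""
    words = (w.strip("'") for w in normalize(line).split())
    return [w for w in words if w]


def word_frequencies(lines) -> Counter:
    """Count word occurrences across an iterable of raw text lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))  # blank lines contribute nothing
    return counts


sample = ["I LOVE this!! http://example.com", "love it, love it", "   "]
print(word_frequencies(sample).most_common(2))  # [('love', 3), ('it', 2)]
```

On the full corpus, the same counts can be plotted to inspect how quickly word frequencies fall off.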

Module 3: Model Building and Prediction

Estimated time: 3–4 hours

  • Apply tokenization and n-gram methods to text data
  • Build a next-word prediction model using NLP techniques
  • Evaluate model performance and accuracy
  • Optimize model parameters for better predictions
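As one concrete instance of the n-gram approach, here is a minimal Python sketch of a count-based next-word predictor that backs off from trigram to bigram to unigram counts, plus a top-k accuracy check on held-out sentences. It is an unsmoothed simplification chosen for brevity, not necessarily the method the course expects; smoothing schemes such as Katz backoff or Kneser-Ney are common refinements. The names `BackoffPredictor` and `top_k_accuracy` are hypothetical, and the input is assumed to be already tokenized.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


class BackoffPredictor:
    """Next-word predictor that backs off from trigrams to bigrams to unigrams.

    Unsmoothed and simplified, for illustration only.
    """

    def __init__(self, sentences):
        self.unigrams = Counter()
        # Maps a context tuple (1 or 2 words) to a Counter of following words.
        self.followers = defaultdict(Counter)
        for tokens in sentences:
            self.unigrams.update(tokens)
            for n in (2, 3):
                for gram in ngrams(tokens, n):
                    self.followers[gram[:-1]][gram[-1]] += 1

    def predict(self, context, k=3):
        """Return up to k candidate next words for the given context tokens."""
        for size in (2, 1):  # longest context first, then back off
            if len(context) >= size:
                key = tuple(context[-size:])
                if key in self.followers:
                    return [w for w, _ in self.followers[key].most_common(k)]
        # No matching context: fall back to the most frequent words overall.
        return [w for w, _ in self.unigrams.most_common(k)]


def top_k_accuracy(model, sentences, k=3):
    """Fraction of held-out positions whose true next word is in the top-k predictions."""
    hits = total = 0
    for tokens in sentences:
        for i in range(1, len(tokens)):
            total += 1
            hits += tokens[i] in model.predict(tokens[:i], k)
    return hits / total if total else 0.0


train = [s.split() for s in ["i want to go home", "i want to eat", "i want to go out"]]
model = BackoffPredictor(train)
print(model.predict(["want", "to"], k=1))  # ['go']
print(model.predict(["we", "to"]))         # ['go', 'eat'] (backed off to the bigram "to")
print(top_k_accuracy(model, [["i", "want", "to", "eat"]], k=1))
```

Raising `k` trades precision for coverage, which mirrors how a prediction app typically offers a few suggestions rather than one.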

Module 4: Developing a Data Product

Estimated time: 3 hours

  • Design an interactive application using Shiny or similar tools
  • Integrate the predictive model into the user interface
  • Ensure responsiveness, usability, and real-time functionality
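One common way to keep an n-gram model responsive inside an interactive app is to prune it before deployment, so each lookup returns a short precomputed list. The Python sketch below assumes the model is stored as a mapping from context tuples to `Counter`s of next words; `prune_followers`, `min_count`, and `top_k` are hypothetical names, and the thresholds are illustrative rather than recommended values.

```python
from collections import Counter


def prune_followers(followers, min_count=2, top_k=3):
    """Return a compact {context: [word, ...]} table for fast lookup.

    Keeps at most top_k candidates per context, drops candidates seen fewer
    than min_count times, and drops contexts left with no candidates.
    """
    compact = {}
    for context, counts in followers.items():
        kept = [word for word, count in counts.most_common(top_k) if count >= min_count]
        if kept:
            compact[context] = kept
    return compact


followers = {
    ("want", "to"): Counter({"go": 5, "sleep": 2, "eat": 1}),
    ("to",): Counter({"go": 1}),
}
print(prune_followers(followers))  # {('want', 'to'): ['go', 'sleep']}
```

A dictionary lookup on the pruned table takes constant time on average, which helps predictions keep pace with typing; the trade-off is that rare continuations are no longer suggested.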

Module 5: Communicating Your Results

Estimated time: 2 hours

  • Summarize project methodology and approach
  • Highlight key findings and model performance
  • Create a professional slide deck for presentation

Module 6: Final Project

Estimated time: 1–2 hours

  • Submit the interactive data application
  • Submit the final presentation deck
  • Participate in peer review by evaluating others' projects

Prerequisites

  • Completion of the Data Science Specialization courses
  • Familiarity with R programming and RStudio
  • Understanding of exploratory data analysis, regression, and basic machine learning

What You'll Be Able to Do After This Course

  • Build and deploy a complete data science project from start to finish
  • Apply natural language processing techniques to real-world text data
  • Create interactive, user-facing data applications
  • Communicate technical results effectively through presentations
  • Work independently on open-ended data challenges with peer feedback