Getting and Cleaning Data Course Syllabus

Full curriculum breakdown: modules, lessons, estimated time, and outcomes.

This course introduces acquiring, cleaning, and transforming real-world data in R. It is aimed at beginners in data science and emphasizes practical skills for handling diverse data formats and sources such as APIs, databases, and web pages. You'll learn how to convert raw data into tidy, analysis-ready datasets while applying principles of reproducibility and documentation. The course runs approximately 19 hours and combines hands-on exercises with real-world examples.

Module 1: Introduction and Getting Raw Data

Estimated time: 2 hours

  • Understanding the difference between raw and tidy data
  • Downloading and reading data from local and online sources
  • Introduction to using data.table for fast data manipulation
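
As a rough illustration of the topics above, the following sketch reads a small file with data.table and computes a grouped summary. It assumes the data.table package is installed; the file contents and the column names (id, city, temp_c) are made up for the example and are not taken from the course materials.

```r
library(data.table)

# Write a small hypothetical CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,city,temp_c",
             "1,Oslo,4.5",
             "2,Lima,19.0",
             "3,Oslo,6.1"), csv_path)

# For an online source, download.file(url, destfile = csv_path) would fetch the file first.

# fread() detects the separator and column types automatically.
weather <- fread(csv_path)

# data.table syntax DT[i, j, by]: here j computes a mean and by groups the rows by city.
mean_by_city <- weather[, .(mean_temp = mean(temp_c)), by = city]
print(mean_by_city)
```

The same file could be read with base R's read.csv(); fread() is used here only because the module names data.table.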

Module 2: Reading and Cleaning Data

Estimated time: 1 hour

  • Accessing data from MySQL databases and web APIs
  • Importing and handling data in multiple formats (Excel, XML, JSON)
  • Preprocessing steps including trimming, renaming, filtering
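
The preprocessing steps named above (renaming, trimming, filtering) might look like the following in base R. This is a minimal sketch on an invented data frame; the column names and the threshold of 50 are assumptions chosen for illustration.

```r
# Hypothetical raw data with untidy column names and stray whitespace.
raw <- data.frame(
  "Customer Name" = c("  Ada ", "Grace", " Linus"),
  "Order Total"   = c(120, 15, 80),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Renaming: switch to lowercase, underscore-separated column names.
names(raw) <- tolower(gsub(" ", "_", names(raw)))

# Trimming: remove leading and trailing whitespace from the text column.
raw$customer_name <- trimws(raw$customer_name)

# Filtering: keep only orders of at least 50.
clean <- raw[raw$order_total >= 50, ]
print(clean)
```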

Module 3: Data Tidying and Transformation

Estimated time: 10 hours

  • Reshaping data using functions like melt and dcast
  • Merging datasets using merge
  • Dealing with missing values and inconsistent formatting
  • Practical cleaning and transformation with real-world datasets
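
The reshaping and merging functions listed above can be sketched as follows, using the melt(), dcast(), and merge() methods for data.table objects. The package is assumed to be installed, and the tables (students, exams, groups) are hypothetical.

```r
library(data.table)

# Hypothetical wide table: one row per student, one column per exam.
wide <- data.table(student = c("a", "b", "c"),
                   exam1 = c(90, NA, 70),
                   exam2 = c(85, 60, 75))

# melt(): wide -> long, one row per student/exam pair.
long <- melt(wide, id.vars = "student",
             variable.name = "exam", value.name = "score")

# Missing values: drop rows with no score.
long <- long[!is.na(score)]

# merge(): attach a lookup table by the shared key column.
groups <- data.table(student = c("a", "b", "c"),
                     group = c("x", "x", "y"))
merged <- merge(long, groups, by = "student")

# dcast(): long -> wide again, here the mean score per group and exam.
summary_wide <- dcast(merged, group ~ exam,
                      value.var = "score", fun.aggregate = mean)
print(summary_wide)
```

Dropping incomplete rows is only one way to handle missing values; imputing or flagging them may suit a given dataset better.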

Module 4: Reproducible Research and Final Project

Estimated time: 6 hours

  • Writing clean, reproducible code for data workflows
  • Creating R scripts and markdown documentation for analysis
  • Final project to demonstrate cleaning, transforming, and documenting data

Prerequisites

  • Basic knowledge of R programming
  • Familiarity with fundamental programming concepts
  • Access to an R and RStudio environment

What You'll Be Able to Do After

  • Acquire data from sources such as web pages, APIs, databases, and flat files
  • Clean and reshape datasets into tidy formats ready for analysis
  • Perform data manipulation using R and essential packages such as data.table
  • Work with different file formats: CSV, XML, JSON, Excel, HDF5
  • Apply principles of reproducible research in data processing workflows