What will you learn in the Getting and Cleaning Data Course
Acquire data from sources such as web pages, APIs, databases, and flat files
Clean and reshape datasets into tidy formats ready for analysis
Perform data manipulation using R and essential libraries like data.table
Work with different file formats: CSV, XML, JSON, Excel, HDF5
Apply principles of reproducible research in data processing workflows
Program Overview
1. Introduction and Getting Raw Data
Duration: 2 hours
Understanding the difference between raw and tidy data
Downloading and reading data from local and online sources
Introduction to using data.table for fast data manipulation
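As a rough illustration of reading a flat file with data.table, the sketch below writes a tiny CSV to a temporary path and loads it with fread. The file contents and the column names (id, value) are made up for the example and are not course material.

```r
library(data.table)

# Hypothetical sample file, written to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20", "3,30"), path)

# fread detects the separator and column types automatically
dt <- fread(path)

# Basic data.table filtering: keep rows where value exceeds 10
big <- dt[value > 10]
```

The same fread call also accepts a URL in place of a local path, which is one way to read from an online source.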
2. Reading and Cleaning Data
Duration: 1 hour
Accessing data from MySQL databases and web APIs
Importing and handling data in multiple formats (Excel, XML, JSON)
Preprocessing steps including trimming, renaming, filtering
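The preprocessing steps named above (trimming, renaming, filtering) might look like the following minimal sketch. The table and its columns (Name, AGE) are hypothetical and chosen only to show each step once.

```r
library(data.table)

# Hypothetical raw input with stray whitespace, inconsistent names, and a gap
raw <- data.table(
  Name = c("  Ann ", "Bob", " Cy"),
  AGE  = c(34L, NA, 27L)
)

# Trimming: strip leading and trailing whitespace (trimws is base R)
raw[, Name := trimws(Name)]

# Renaming: setnames changes column names in place
setnames(raw, c("Name", "AGE"), c("name", "age"))

# Filtering: keep only rows with a recorded age
clean <- raw[!is.na(age)]
```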
3. Data Tidying and Transformation
Duration: 10 hours
Reshaping data using functions like melt, dcast, and merge
Dealing with missing values and inconsistent formatting
Practical cleaning and transformation with real-world datasets
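To make the reshaping functions concrete, here is a small sketch of melt, dcast, and merge applied to a data.table. The data (subjects, visits, scores, groups) is invented for illustration and does not come from the course datasets.

```r
library(data.table)

# Hypothetical wide table: one row per subject, one column per visit
wide <- data.table(
  subject = c("a", "b"),
  visit1  = c(5.1, 6.3),
  visit2  = c(5.4, NA)
)

# melt: wide -> long, giving one row per subject/visit pair
long <- melt(wide, id.vars = "subject",
             variable.name = "visit", value.name = "score")

# dcast: long -> wide again, one column per visit
wide_again <- dcast(long, subject ~ visit, value.var = "score")

# merge: join a lookup table on the shared key column
groups <- data.table(subject = c("a", "b"),
                     group   = c("control", "treated"))
joined <- merge(long, groups, by = "subject")
```

The missing score for subject b is carried through as NA in every step, so handling it (dropping or imputing) remains an explicit decision rather than a side effect of reshaping.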
4. Reproducible Research and Final Project
Duration: 6 hours
Writing clean, reproducible code for data workflows
Creating R scripts and markdown documentation for analysis
Final project to demonstrate cleaning, transforming, and documenting data
Job Outlook
Data Analysts: Improve reliability and integrity of analysis pipelines
Data Scientists: Gain strong foundational skills in preprocessing
Researchers: Support reproducibility in scientific data workflows
Students and Beginners: Build readiness for advanced data science or machine learning
Specification: Getting and Cleaning Data