What you will learn in the Data Science Projects with Python course
Gain hands-on experience exploring, cleaning, and visualizing real-world datasets with pandas and Matplotlib
Build and evaluate logistic regression models, addressing overfitting through regularization and cross-validation
Train and tune decision tree and random forest classifiers to improve predictive accuracy
Master gradient boosting with XGBoost and interpret model outputs using SHAP values
Program Overview
Module 1: Introduction
⏳ 30 minutes
Topics: Role of ML in data science; essential Python libraries (pandas, scikit-learn)
Hands-on: Get set up in Jupyter, load the case-study data, and verify basic data integrity
Module 2: Data Exploration & Cleaning
⏳ 4 hours
Topics: Data-quality checks, handling missing values, categorical encoding
Hands-on: Perform end-to-end data cleaning and exploratory analysis on the credit dataset
Module 3: Introduction to scikit-learn & Model Evaluation
⏳ 3.5 hours
Topics: Synthetic data generation, train/test splitting, evaluation metrics (accuracy, ROC)
Hands-on: Train logistic regression, compute confusion matrix and ROC curve
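As a rough illustration of the evaluation step in this module, the sketch below counts confusion-matrix cells and computes one point on a ROC curve in plain Python. It is not taken from the course materials: the function names `confusion_counts` and `roc_point` and the sample labels and scores are assumptions made up for the example, and the course itself works with scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    # A sample is predicted positive when its score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

# Sweeping the threshold traces out the ROC curve point by point.
roc_points = [roc_point(y_true, scores, t) for t in (0.0, 0.5, 1.1)]
```

Each threshold yields one (FPR, TPR) pair; plotting the pairs across many thresholds gives the ROC curve.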
Module 4: Details of Logistic Regression & Feature Extraction
⏳ 4 hours
Topics: Feature-response relationships, univariate selection (F-test), sigmoid function
Hands-on: Implement feature selection, plot decision boundaries, and interpret coefficients
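The sigmoid function named in this module maps a linear combination of features to a probability between 0 and 1. The following is a minimal sketch, not the course's own code: `sigmoid` and `predict_proba` are hypothetical helper names, and the coefficients in the usage line are invented for illustration.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, so this form cannot overflow.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class under a logistic regression model."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return sigmoid(z)


# Hypothetical coefficients: a positive weight raises the predicted
# probability as its feature grows, a negative weight lowers it.
p = predict_proba([1.0, 2.0], [0.5, -0.25], 0.0)
```

Reading coefficients this way (sign and size relative to the feature's scale) is the kind of interpretation the hands-on exercise refers to.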
Module 5: The Bias-Variance Trade-Off
⏳ 3.5 hours
Topics: Gradient descent optimization, L1/L2 regularization, cross-validation pipelines
Hands-on: Apply regularization techniques and hyperparameter tuning in scikit-learn
Module 6: Decision Trees & Random Forests
⏳ 3.25 hours
Topics: Tree-based learning, node impurity, hyperparameter grid search, ensemble methods
Hands-on: Train and tune decision tree and random forest models; visualize performance
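Node impurity is what a decision tree tries to reduce at each split. The sketch below computes Gini impurity, one common impurity measure; the syllabus does not say which measure the course emphasizes, so treat the choice, and the function names `gini_impurity` and `weighted_split_impurity`, as assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_split_impurity(left_labels, right_labels):
    """Impurity of a candidate split, weighting each child by its size."""
    n = len(left_labels) + len(right_labels)
    return (
        len(left_labels) / n * gini_impurity(left_labels)
        + len(right_labels) / n * gini_impurity(right_labels)
    )


# A 50/50 node is maximally impure for two classes.
parent = gini_impurity([0, 1, 0, 1])
# A split that separates the classes perfectly leaves pure children.
children = weighted_split_impurity([0, 0], [1, 1])
```

A tree-growing algorithm would compare `parent` with `children` for each candidate split and prefer the split with the largest drop.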
Module 7: Gradient Boosting, XGBoost & SHAP Values
⏳ 3 hours
Topics: XGBoost hyperparameters (learning rate, early stopping), SHAP interpretability
Hands-on: Perform a randomized hyperparameter search and generate SHAP explanations for the case-study data
Module 8: Test-Set Analysis, Financial Insights & Delivery
⏳ 2.5 hours
Topics: Probability calibration, decile cost charts, business-impact analysis
Hands-on: Derive financial metrics (cost savings, ROI) and prepare client-ready deliverables
Module 9: Appendix – Local Jupyter Setup
⏳ 15 minutes
Topics: Recommended environment setup, Anaconda installation
Hands-on: Create and configure a local Jupyter Notebook for offline work
Job Outlook
Median annual wage for data scientists in the U.S.: $112,590
Data science jobs are projected to grow 36% from 2023 to 2033, far outpacing the average for all occupations
Roles include Data Scientist, ML Engineer, and Analytics Consultant across finance, healthcare, and tech
Expertise in end-to-end ML workflows unlocks opportunities in startups and enterprise data teams