Big Data Integration and Processing Course Syllabus

Full curriculum breakdown — modules, lessons, estimated time, and outcomes.

This course provides a beginner-friendly introduction to big data integration and processing, designed to equip learners with practical skills using industry-standard tools. Through hands-on exercises, you'll learn to retrieve, manipulate, and analyze data from both relational and NoSQL databases, use integration platforms such as Splunk and Datameer, and process large datasets on Hadoop and Spark. The course spans approximately 10–12 hours of content across six modules and includes assignments, discussions, and a final project. You'll gain foundational experience applicable to real-world data engineering tasks, with lifetime access to materials and a certificate upon completion.

Module 1: Welcome

Estimated time: 1 hour

  • Introduction to big data integration and processing concepts
  • Setting up the learning environment using Docker
  • Working with Jupyter notebooks for hands-on exercises
  • Accessing course materials and navigating the platform

Module 2: Retrieving Big Data (Part 1)

Estimated time: 1 hour

  • Understanding relational databases in big data contexts
  • Connecting to PostgreSQL databases
  • Querying data using SQL in PostgreSQL
  • Retrieving structured data for analysis
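
The retrieval pattern practiced in this module can be sketched in a few lines. This is a minimal illustration only: Python's built-in sqlite3 stands in for PostgreSQL so the snippet runs anywhere, and the table and column names are invented for the example (with PostgreSQL you would connect via a driver such as psycopg2 instead).

```python
import sqlite3

# Minimal sketch of SQL-based retrieval; sqlite3 stands in for
# PostgreSQL here, and the schema is illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tweets (user_id INTEGER, text TEXT)")
cur.executemany(
    "INSERT INTO tweets VALUES (?, ?)",
    [(1, "big data"), (2, "spark"), (1, "hadoop")],
)

# Retrieve structured data for analysis: count rows per user.
cur.execute(
    "SELECT user_id, COUNT(*) FROM tweets GROUP BY user_id ORDER BY user_id"
)
rows = cur.fetchall()  # [(1, 2), (2, 1)]
conn.close()
```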

Module 3: Retrieving Big Data (Part 2)

Estimated time: 2 hours

  • Introduction to NoSQL databases: MongoDB and Aerospike
  • Querying and aggregating data in MongoDB
  • Working with key-value data in Aerospike
  • Data manipulation using Pandas data frames
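
The grouping-and-summing pattern this module practices in MongoDB carries over directly to Pandas data frames. The sketch below shows that shared pattern; the data and column names are made up for illustration and are not from the course datasets.

```python
import pandas as pd

# Illustrative data frame; in the course, data like this would come
# from a MongoDB query result.
df = pd.DataFrame({
    "user": ["alice", "bob", "alice", "carol"],
    "tweets": [3, 1, 2, 5],
})

# Group and sum, analogous to a MongoDB $group stage:
#   {"$group": {"_id": "$user", "total": {"$sum": "$tweets"}}}
totals = df.groupby("user")["tweets"].sum().to_dict()
```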

Module 4: Big Data Integration

Estimated time: 2 hours

  • Introduction to data integration concepts
  • Using Splunk for real-time data monitoring and analysis
  • Applying Datameer for large-scale data integration
  • Practical examples of integrating heterogeneous data sources

Module 5: Big Data Processing

Estimated time: 3 hours

  • Introduction to Hadoop for distributed data processing
  • Running processing tasks on Spark
  • Understanding when to use Hadoop vs. Spark
  • Hands-on exercises with big data processing workflows
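
The word-count workflow that Hadoop and Spark distribute across a cluster can be sketched locally to show the data flow. This is only a single-machine illustration of the map and reduce steps, not how you would invoke either framework; in Spark the reduce step corresponds to reduceByKey on an RDD.

```python
from functools import reduce

# Sample input lines; in Hadoop/Spark these would be partitions of a
# large distributed dataset.
lines = ["big data spark", "hadoop spark"]

# Map step: each line emits (word, 1) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Reduce step: sum the counts per word.
def combine(acc, pair):
    word, count = pair
    acc[word] = acc.get(word, 0) + count
    return acc

counts = reduce(combine, mapped, {})
```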

Module 6: Final Project

Estimated time: 3 hours

  • Integrate data from PostgreSQL, MongoDB, and Aerospike
  • Process and aggregate data using Pandas and Spark
  • Submit a comprehensive report with insights and methodology

Prerequisites

  • Basic understanding of databases and data structures
  • Prior exposure to big data concepts (recommended)
  • Ability to install and configure Docker and virtual machines

What You'll Be Able to Do After Completing the Course

  • Retrieve and query data from relational and NoSQL databases
  • Manipulate and analyze large datasets using Pandas
  • Apply data integration tools like Splunk and Datameer
  • Execute big data processing tasks on Hadoop and Spark
  • Understand data integration needs in large-scale analytics
