Bioinformatics Algorithms Course Syllabus
Full curriculum breakdown — modules, lessons, estimated time, and outcomes.
Overview: This course offers a hands-on introduction to core bioinformatics algorithms, guiding you through sequence analysis, genome assembly, and evolutionary inference using real biological data and Python implementations. The 8 modules total about 66 hours of content (8 hours each for Modules 1-7 and 10 hours for Module 8), and each takes about a week to complete. Along the way you'll build algorithmic understanding while gaining practical skills in parsing biological data, performing alignments, assembling genomes, and reconstructing phylogenetic trees. The course concludes with a capstone project that integrates multiple techniques into a functional genomics workflow.
Module 1: Introduction to Bioinformatics & Sequence Data
Estimated time: 8 hours
- Biological sequence formats (FASTA, FASTQ)
- Scoring matrices (PAM, BLOSUM)
- Parsing DNA/RNA sequences from FASTA files
- Computing simple sequence similarity scores
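As a taste of the Module 1 exercises, here is a minimal sketch of FASTA parsing and a simple similarity score. The function names `parse_fasta` and `percent_identity` and the two demo records are illustrative assumptions, not course-provided code, and percent identity is only one of several possible similarity measures.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical two-record FASTA input, held in memory for the demo.
example = StringIO(">seq1 demo\nACGT\nACGT\n>seq2 demo\nACGTACGA\n")
records = list(parse_fasta(example))
identity = percent_identity(records[0][1], records[1][1])  # 87.5
```

In practice a library parser such as Biopython's would usually replace the hand-written one; the sketch only shows what such a parser has to do.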
Module 2: Pairwise Alignment with Dynamic Programming
Estimated time: 8 hours
- Global alignment using Needleman–Wunsch algorithm
- Local alignment using Smith–Waterman algorithm
- Implementation of affine gap penalties
- Python implementation of alignment algorithms
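The global alignment topic above can be sketched as follows. This is a minimal Needleman–Wunsch implementation under simplifying assumptions: a fixed match/mismatch score rather than a PAM or BLOSUM matrix, and a linear gap penalty rather than the affine penalties the module also covers. The scoring values are arbitrary illustrative defaults.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    Uses a linear gap penalty; affine gaps would need extra matrices.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")  # (1, "ACGT", "A-GT")
```

Smith–Waterman differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell instead of the corner.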
Module 3: Heuristic Alignment & BLAST
Estimated time: 8 hours
- Overview of the BLAST algorithm
- Word-size seeding and high-scoring segment pairs (HSPs)
- Using Biopython for BLAST searches
- Parsing BLAST output from searches against custom databases
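The word-seeding idea behind BLAST can be illustrated with a much-simplified sketch. It assumes exact word matches and extends each seed only while characters agree; real BLAST scores neighborhood words against a threshold and uses drop-off criteria during extension. The function names here are hypothetical and are not part of Biopython or the BLAST tools.

```python
from collections import defaultdict


def index_words(subject, k):
    """Map every length-k word in the subject to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = index_words(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


def extend_seed(query, subject, q, s, k):
    """Extend an exact seed left and right while characters agree.

    Returns (query_start, subject_start, length) of the ungapped match.
    """
    left = 0
    while q - left - 1 >= 0 and s - left - 1 >= 0 \
            and query[q - left - 1] == subject[s - left - 1]:
        left += 1
    right = 0
    while q + k + right < len(query) and s + k + right < len(subject) \
            and query[q + k + right] == subject[s + k + right]:
        right += 1
    return q - left, s - left, k + left + right


seeds = find_seeds("GATTACA", "TTGATTACAGG", k=4)
segment = extend_seed("GATTACA", "TTGATTACAGG", 1, 3, 4)  # (0, 2, 7)
```

Indexing the subject once and looking up query words is what makes seeding fast compared with filling a full dynamic programming matrix.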
Module 4: Multiple Sequence Alignment
Estimated time: 8 hours
- Progressive alignment methods (ClustalW)
- Iterative refinement techniques
- Consistency-based alignment strategies
- Visualizing conserved motifs across aligned proteins
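Before visualizing conserved motifs, it helps to quantify conservation per alignment column. The sketch below assumes the alignment is already computed and given as equal-length strings; the three short protein fragments are made-up data, and the gap character is counted as an ordinary symbol.

```python
from collections import Counter


def column_conservation(alignment):
    """Return a list of (most common symbol, fraction) for each column."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        result.append((symbol, count / len(column)))
    return result


# Hypothetical aligned fragments, for illustration only.
alignment = ["MKTAY", "MKSAY", "MRTAY"]
profile = column_conservation(alignment)
consensus = "".join(symbol for symbol, _ in profile)  # "MKTAY"
fully_conserved = [i for i, (_, frac) in enumerate(profile) if frac == 1.0]  # [0, 3, 4]
```

Runs of fully or highly conserved columns are natural candidates for the motifs a plot would highlight.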
Module 5: Genome Assembly Algorithms
Estimated time: 8 hours
- Overlap–layout–consensus approach
- De Bruijn graph-based assembly
- Error correction in sequencing reads
- Building de Bruijn graphs and extracting contigs
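The last item above, building a de Bruijn graph and extracting contigs, can be sketched as follows. The sketch assumes error-free reads and treats each distinct k-mer as one edge; contigs are taken to be maximal non-branching paths, and isolated cycles are ignored for brevity. The three reads are invented for the demo.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix
    return graph


def extract_contigs(graph):
    """Spell out maximal non-branching paths (isolated cycles are skipped)."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for node, successors in graph.items():
        outdeg[node] += len(successors)
        for succ in successors:
            indeg[succ] += 1
    nodes = set(indeg) | set(outdeg)

    def is_simple(v):
        # A node that can only be passed straight through.
        return indeg[v] == 1 and outdeg[v] == 1

    contigs = []
    for v in sorted(nodes):
        if is_simple(v) or outdeg[v] == 0:
            continue  # paths start only at branching or source nodes
        for w in graph[v]:
            path = v + w[-1]
            while is_simple(w):
                w = graph[w][0]
                path += w[-1]
            contigs.append(path)
    return contigs


# Hypothetical overlapping reads from the sequence ACGTTGCA.
reads = ["ACGTT", "GTTGC", "TTGCA"]
graph = de_bruijn(reads, k=3)
contigs = extract_contigs(graph)  # ["ACGTTGCA"]
```

With real reads, sequencing errors add spurious k-mers, which is why error correction precedes graph construction in the module.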
Module 6: Hidden Markov Models in Bioinformatics
Estimated time: 8 hours
- Components of hidden Markov models (HMMs)
- Viterbi and forward–backward algorithms
- Profile HMMs for protein family detection
- Training an HMM for gene prediction on bacterial sequences
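To make the Viterbi algorithm concrete, here is a minimal log-space sketch. The two-state model (a GC-rich state "H" and an AT-rich state "L") and all of its probabilities are toy assumptions chosen for illustration, not trained parameters or a real gene-prediction model.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for the observation sequence."""
    if not observations:
        return []
    # best[t][s] = log-probability of the best path ending in state s at time t
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s][observations[t]])
            back[t][s] = prev  # remember which predecessor was best
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy parameters (assumed, not estimated from data).
states = ["H", "L"]
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}
path = "".join(viterbi("GCGCATAT", states, start_p, trans_p, emit_p))  # "HHHHLLLL"
```

Working in log space avoids numerical underflow on long sequences, which matters once the observations are whole genes rather than eight bases.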
Module 7: Phylogenetic Inference & Tree Reconstruction
Estimated time: 8 hours
- Distance-based methods: UPGMA and neighbor-joining
- Character-based methods: maximum parsimony and maximum likelihood
- Constructing phylogenetic trees from aligned sequences
- Using scikit-bio for tree comparison and visualization
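Of the distance-based methods listed above, UPGMA is the simplest to sketch. The version below returns only the tree topology as nested tuples and omits branch lengths; it assumes at least one label and a symmetric distance matrix, and the four-taxon matrix is invented for the demo. It does not use scikit-bio.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Cluster taxa by UPGMA and return the topology as nested tuples."""
    # clusters maps a cluster id to (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    dist = {
        frozenset((i, j)): matrix[i][j]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: dist[frozenset(pair)],
        )
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted average keeps distances equal to the mean
            # over all leaf pairs between the two clusters.
            d_ac = dist[frozenset((a, c))]
            d_bc = dist[frozenset((b, c))]
            dist[frozenset((next_id, c))] = (
                size_a * d_ac + size_b * d_bc
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


# Hypothetical pairwise distances between four taxa.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
tree = upgma(labels, matrix)  # (("A", "B"), ("C", "D"))
```

UPGMA assumes a constant rate of evolution across lineages; neighbor-joining relaxes that assumption at the cost of a slightly more involved merge criterion.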
Module 8: Advanced Topics & Capstone Project
Estimated time: 10 hours
- Sequence clustering techniques
- Basics of variant calling
- Scalable algorithms for big genomic data
- End-to-end annotation of a draft bacterial genome
- Detection of gene models and variant sites
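The basics of variant calling can be sketched with a naive pileup. This assumes reads are already aligned to the reference without gaps and are given as (start position, sequence) pairs; the thresholds `min_depth` and `min_fraction` are arbitrary illustrative defaults, and real callers also model base quality, mapping quality, and ploidy.

```python
from collections import Counter, defaultdict


def call_variants(reference, reads, min_depth=3, min_fraction=0.8):
    """Report positions where the dominant read base differs from the reference.

    Returns a list of (position, reference_base, alternate_base, depth).
    """
    pileup = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1  # count each base covering this position

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, count = counts.most_common(1)[0]
        if (
            depth >= min_depth
            and base != reference[pos]
            and count / depth >= min_fraction
        ):
            variants.append((pos, reference[pos], base, depth))
    return variants


# Hypothetical reference and gapless read placements (0-based starts).
reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT"), (4, "ACGT")]
variants = call_variants(reference, reads)  # [(3, "T", "A", 3)]
```

The depth threshold is what keeps a single erroneous read from being reported as a variant site.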
Prerequisites
- Basic proficiency in Python programming
- Familiarity with fundamental biological concepts (DNA, RNA, proteins)
- Understanding of basic data structures and algorithms
What You'll Be Able to Do After
- Implement core bioinformatics algorithms from scratch in Python
- Analyze biological sequences using dynamic programming and heuristic methods
- Assemble genomes using de Bruijn graph approaches
- Apply hidden Markov models for gene and protein family prediction
- Reconstruct and interpret phylogenetic trees from sequence data