- Develop parallel code using OpenMP for multithreading.
- Implement MPI programs for distributed-memory communication.
- Write GPU kernels using CUDA for accelerated computation.
- Measure and compare performance and speedup across different architectures.
- Combine these techniques into end-to-end HPC application workflows.
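
As a rough illustration of the speedup objective, the sketch below computes the standard definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of workers). The function names and the timings in the usage lines are hypothetical, chosen only for illustration; they are not measurements from any real system.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Return speedup S = T_serial / T_parallel for two wall-clock times."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Return parallel efficiency E = S / p, where p is the worker count
    (threads, MPI ranks, or similar)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup(t_serial, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings: 12.0 s serial, 2.0 s on 8 workers.
    print(speedup(12.0, 2.0))        # 6.0
    print(efficiency(12.0, 2.0, 8))  # 0.75
```

An efficiency well below 1.0 usually suggests overhead from communication, synchronization, or load imbalance, which is what comparing runs across architectures is meant to expose.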

