This structured learning path guides you through the essential steps required to become proficient in CUDA programming, starting from foundational programming knowledge to advanced GPU computing concepts. The path emphasizes building a strong base in programming, understanding data structures, mastering C++, and diving into GPU architecture and CUDA-specific optimizations. Resources include both English and Polish materials, offering flexibility based on language preference.
C Programming:
Begin with C programming if you are unfamiliar with it. A solid understanding of C is mandatory before transitioning to C++ programming.
- 🇵🇱 Podstawy programowania. Język C
- 🇵🇱 Zaawansowane programowanie w języku C
- The C Programming Language (ANSI C) by Brian Kernighan and Dennis Ritchie
Data Structures:
Learn essential data structures and algorithms, a prerequisite for effective problem-solving and programming.
- 🎥 C++ Data Structures & Algorithms + LEETCODE Exercises
- Data Structures and Algorithms -> Leetcode
- 🇵🇱 📖 Algorytmy, struktury danych i techniki programowania by Paweł Wróblewski
- 🇵🇱 📖 C++. Algorytmy i struktury danych by Adam Drozdek
C++ Programming:
Master C++ programming; it serves as the foundation for CUDA development.
- 🎥 Beginning C++ Programming - From Beginner to Beyond
- 🇵🇱 🎥 C++ od Podstaw do Eksperta
- 🇵🇱 📖 Opus magnum C++11
- 🇵🇱 📖 Język C++ Kompendium Wiedzy by Bjarne Stroustrup
- 🎥 Back to Basics
- 📖 Modern C++ Tutorial: C++11/14/17/20 On the Fly
Parallel Computing:
Understand the basics of parallel computing and modern hardware architectures.
CUDA Programming:
Dive into CUDA, learning GPU programming techniques, optimizations, and advanced performance tuning.
- 🇵🇱 🎥 GPU Programming
- CUDA C++ Programming Guide
- 🇵🇱 CUDA - Tomasz Jasiukiewicz
- 🎥 CUDA Parallel Programming on NVIDIA GPUs - HW and SW
- CUDA Samples
- 🎥 CUDA Programming Course – High-Performance Computing with GPUs
- 📖 Programming Massively Parallel Processors by David B. Kirk, Wen-mei W. Hwu
- 🎥 CUDA training series
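
To make the material above concrete, here is a minimal CUDA C++ sketch of a vector-addition kernel, the kind of program most of the listed courses start with. The file name, kernel name, and array size are illustrative choices, not taken from any specific resource.

```cpp
// vector_add.cu - minimal sketch: add two float arrays on the GPU.
// Build (assuming a CUDA toolkit is installed): nvcc vector_add.cu -o vector_add
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of the output array.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard against the last, partially filled block
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;                  // 1M elements (arbitrary size)
    const size_t bytes = n * sizeof(float);

    // Allocate and initialize host data.
    float *h_a = (float*)malloc(bytes), *h_b = (float*)malloc(bytes), *h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device buffers and copy the inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %.1f (expected 3.0)\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The pattern here (compute a global thread index, bounds-check it, let cudaMemcpy handle host/device transfers) mirrors the introductory vector-add examples found in the CUDA C++ Programming Guide and CUDA Samples listed above.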
Triton:
Explore the Triton framework for writing efficient, high-performance GPU kernels.
GPU Architecture and Glossary:
Familiarize yourself with GPU architecture and terminology to deepen your understanding of hardware capabilities.
This comprehensive learning path equips you with the skills needed to progress from foundational programming to advanced CUDA development, paving the way for a career in GPU-accelerated computing.
This section focuses on understanding the fundamentals and optimization of matrix multiplication (Matmul), a cornerstone operation in CUDA programming and high-performance computing (HPC). The provided resources cover both CPU implementations and GPU optimizations, including the use of Tensor Cores on architectures like Ampere and Ada. These materials are essential for building a strong foundation in writing optimized CUDA code.
- Matmul on CPU: Analysis of efficient matrix multiplication implementations on CPUs, with detailed examples of optimizations:
- CUDA Matmul Optimizations:
- Theory and Basics:
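
Before diving into the optimized implementations linked above, it can help to see the baseline they improve on. Below is a sketch of a naive CUDA matmul kernel alongside a shared-memory tiled variant; the tile size, kernel names, and row-major N x N layout are illustrative assumptions, not code from the referenced articles.

```cpp
// matmul_tiled.cu - illustrative sketch: naive vs. shared-memory tiled matmul (C = A * B)
// for square N x N matrices in row-major layout. TILE and the names are arbitrary choices.
#define TILE 16

// Naive version: every thread reads its whole row of A and column of B from global memory.
__global__ void matmulNaive(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Tiled version: each block stages TILE x TILE tiles of A and B in shared memory,
// so each global-memory element is loaded once per tile instead of once per thread.
__global__ void matmulTiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        // Cooperative load of one tile of A and one tile of B (zero-pad out-of-range elements).
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        // Partial dot product over the staged tiles.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < N && col < N)
        C[row * N + col] = acc;
}
```

Both kernels would be launched with a dim3(TILE, TILE) block and a grid of ceil(N/TILE) x ceil(N/TILE) blocks; tiling is only the first of the optimization steps (register blocking, vectorized loads, Tensor Cores) that the resources above build toward.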
These resources provide a comprehensive theoretical and practical foundation in matrix multiplication, supporting your CUDA learning and deepening your understanding of algorithm optimization in GPU environments.
This section provides a curated collection of resources for learning, exploring, and mastering GPU programming. It covers various aspects of GPU development, including community engagement, architectural insights, tutorials, example implementations, benchmarking, and advanced tools. These resources cater to developers at different expertise levels, offering a pathway to build and optimize high-performance GPU applications.
Engage with fellow developers and experts in the field of GPU programming:
Understand the underlying architecture of GPUs to optimize code efficiently:
Learn the practical aspects of GPU programming with these tutorials:
Comprehensive courses to deepen your GPU programming skills:
Explore video tutorials and insights on GPU programming:
- Programming Massively Parallel Processors
- Simon Oz - GPU Programming
- CUDA Programming
- George Hotz Archive
Explore real-world examples and implementations:
- llm.c
- Fast LLM Inference From Scratch
- MNIST CUDA
- Softmax
- YALM: LLM Inference in C++/CUDA
- llm.cpp: Training and Inference
- CUTLASS Tutorial: Fast Matrix Multiplication with WGMMA on Hopper GPUs
Track performance and benchmarks of GPU kernels:
Compare GPU performance and analyze benchmarks:
- MI300X vs H100 vs H200 Training Benchmarks
- Forecasting GPU Performance
- Benchmarking Nvidia Hopper GPU Architecture
- Maximum Achievable Matmul FLOPS Finder
Understand key HPC algorithms like matrix multiplication:
Insights into GPU performance and its nuances:
- The GPU is Not Always Faster
- Series of articles explaining GPU programming -> Demystifying CPUs and GPUs: What You Need to Know, How the way a computer works, Terminology in parallel programming, Hello world Cuda-C, The operational mechanism of CPU-GPU, Memory Types in GPU, Using GPU memory, Synchronization and Asynchronization, Unified memory, Pinned memory, Streaming, Data Hazard, Warp Scheduler, Global Memory Coalescing, Atomic Function, Bandwidth — Throughput — Latency, Occupancy in GPU Part 1, Occupancy in GPU Part 2
Explore CUDA-based frameworks for specific use cases:
Explore state-of-the-art research in GPU programming:
Useful tools for tuning and analyzing GPU performance:
This resource list offers a comprehensive set of tools, tutorials, and materials to help developers advance their GPU programming expertise, from beginner to professional levels.