Day 1
The focus of Day 1 is vector addition, with an initial naive approach and an improved version.
Code Descriptions
1. vector_add.cu (Naive Vector Addition)
- Implements element-wise vector addition (C = A + B).
- Uses global memory and assigns one thread per element.
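As a rough illustration of the one-thread-per-element idea, here is a minimal sketch. It is not the repository's actual vector_add.cu: the names vectorAdd and runNaiveVectorAdd, the block size of 256, and the test values 3 and 4 are all assumptions made for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per element: thread i computes c[i] = a[i] + b[i].
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may extend past the end of the vector
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper (not from the repository): adds two vectors of n
// elements on the GPU and returns the number of wrong results, or -1 on a
// CUDA error.
int runNaiveVectorAdd(int n)
{
    std::vector<float> hA(n, 3.0f), hB(n, 4.0f), hC(n, 0.0f);
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Buffers live in global (device) memory.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess ||
        cudaMalloc(&dB, bytes) != cudaSuccess ||
        cudaMalloc(&dC, bytes) != cudaSuccess) {
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return -1;
    }
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    // Round up so that there is at least one thread per element.
    int threadsPerBlock = 256;  // assumed block size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hC[i] != 7.0f) ++mismatches;  // 3 + 4 is exact in float
    }
    return mismatches;
}

int main()
{
    int mismatches = runNaiveVectorAdd(1 << 20);
    if (mismatches < 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The grid size here grows with the vector length, which is the part the improved version changes.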
2. vector_add_optim.cu (Improved Vector Addition)
- Uses grid-stride loops, which keep memory accesses coalesced (neighboring threads read neighboring elements).
- Kernel optimizations:
  - Uses grid-stride loops so that every element is processed even when the vector has more elements than the grid has threads.
  - Launches the initWith() function on the GPU instead of running it on the CPU.
  - Dynamically selects the number of blocks based on the number of streaming multiprocessors (SMs).
- Result: Better GPU utilization, reduced memory latency, and improved performance compared with the naive version.
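The three optimizations listed above might be combined roughly as follows. This is a sketch under stated assumptions, not the repository's vector_add_optim.cu: only the name initWith() comes from the description above, while its signature, the kernel name vectorAddGridStride, the helper runOptimizedVectorAdd, the block size of 256, and the multiplier of 32 blocks per SM are all hypothetical choices.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride initialization on the GPU (signature assumed for this sketch).
__global__ void initWith(float value, float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;  // total number of threads in the grid
    for (int i = idx; i < n; i += stride) {
        a[i] = value;
    }
}

// Grid-stride addition: each thread handles elements idx, idx + stride, ...
__global__ void vectorAddGridStride(const float *a, const float *b, float *c, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = idx; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: returns the number of wrong results, or -1 on a CUDA error.
int runOptimizedVectorAdd(int n)
{
    // Query the SM count of the current device and size the grid from it.
    int device = 0, numSMs = 0;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device) != cudaSuccess) {
        return -1;
    }
    int threadsPerBlock = 256;  // assumed block size
    int blocks = 32 * numSMs;   // assumed multiplier; worth tuning per GPU

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess ||
        cudaMalloc(&dB, bytes) != cudaSuccess ||
        cudaMalloc(&dC, bytes) != cudaSuccess) {
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return -1;
    }

    // Inputs are filled on the GPU, so no host-to-device copies are needed.
    initWith<<<blocks, threadsPerBlock>>>(3.0f, dA, n);
    initWith<<<blocks, threadsPerBlock>>>(4.0f, dB, n);
    vectorAddGridStride<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

    std::vector<float> hC(n, 0.0f);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Kernels on the default stream run in order; this copy waits for them.
        err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hC[i] != 7.0f) ++mismatches;  // 3 + 4 is exact in float
    }
    return mismatches;
}

int main()
{
    int mismatches = runOptimizedVectorAdd(1 << 24);
    if (mismatches < 0) {
        printf("CUDA error\n");
        return 1;
    }
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Because the loop strides by the total thread count, the same launch configuration stays correct for any n that fits in an int, including vectors much larger than the grid.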
Profiling and Running
To compile and profile the CUDA code, use:
nvcc -o compiled_code_name source_code.cu
nsys profile --stats=true ./compiled_code_name
My Blog
https://medium.com/@poornima31298/cuda-an-easy-introduction-for-beginners-5e897b62d1bd