
Day 1

The focus of Day 1 is vector addition, with an initial naive approach and an improved version.


Code Descriptions

1️⃣ vector_add.cu (Naive Vector Addition)

  • Implements element-wise vector addition (C = A + B).
  • Uses global memory and assigns one thread per element.
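Below is a minimal sketch of the naive approach. It assumes unified memory (cudaMallocManaged), 256 threads per block, and input vectors filled with 1.0f and 2.0f on the CPU. The actual vector_add.cu may allocate and copy memory differently. The helper naiveVectorAddMaxError() is hypothetical and exists only to check the result.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Naive kernel: one thread per element, thread i computes C[i] = A[i] + B[i].
__global__ void vectorAdd(const float* A, const float* B, float* C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {  // the last block can contain threads past the end of the vectors
        C[i] = A[i] + B[i];
    }
}

// Hypothetical check helper: fills A with 1.0f and B with 2.0f on the CPU, launches
// one thread per element, and returns the largest |C[i] - 3.0f| (or -1.0f on a CUDA error).
float naiveVectorAddMaxError(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *A = nullptr, *B = nullptr, *C = nullptr;

    // Assumption: unified memory, so the same pointers are valid on host and device.
    cudaError_t err = cudaMallocManaged(&A, bytes);
    if (err == cudaSuccess) err = cudaMallocManaged(&B, bytes);
    if (err == cudaSuccess) err = cudaMallocManaged(&C, bytes);

    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {  // naive version: initialization runs on the CPU
            A[i] = 1.0f;
            B[i] = 2.0f;
        }
        int threadsPerBlock = 256;                                  // assumed block size
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // round up: one thread per element
        vectorAdd<<<blocks, threadsPerBlock>>>(A, B, C, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // Wait for the kernel before the CPU reads C; this also reports errors raised during execution.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float maxError = -1.0f;
    if (err == cudaSuccess) {
        maxError = 0.0f;
        for (int i = 0; i < n; ++i) {
            maxError = std::fmax(maxError, std::fabs(C[i] - 3.0f));
        }
    } else {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(A);  // cudaFree(nullptr) is a no-op, so partial allocations are safe to free
    cudaFree(B);
    cudaFree(C);
    return maxError;
}
```

Calling naiveVectorAddMaxError(1 << 20) from main() should return 0.0f, because 1.0f + 2.0f is exactly 3.0f in floating point. A negative return value signals a CUDA error.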

2️⃣ vector_add_optim.cu (Improved Vector Addition)

  • Improves on the naive version with grid-stride loops, which keep memory accesses coalesced and let a fixed-size grid cover a vector of any length.
  • Kernel optimizations:
    • Each thread processes several elements by striding through the vector, so every element is computed regardless of the grid size.
    • Runs initWith() as a GPU kernel instead of initializing the vectors on the CPU.
    • Dynamically selects the number of blocks based on the number of streaming multiprocessors (SMs) on the GPU.
  • Result: Better GPU utilization, reduced memory latency, and improved performance.
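The following is a minimal sketch of the three changes listed above. Several details are illustrative assumptions rather than necessarily what vector_add_optim.cu does: the initWith() signature, the hypothetical names addVectorsInto() and optimizedVectorAddMaxError(), unified memory, 256 threads per block, and a grid of 32 blocks per SM. The SM count is read with cudaDeviceGetAttribute(..., cudaDevAttrMultiProcessorCount, ...).

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and advances by the total
// number of threads in the grid, so any grid size covers all n elements.
__global__ void initWith(float value, float* a, int n) {  // assumed signature
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride) {
        a[i] = value;
    }
}

// Hypothetical name: grid-stride version of the addition kernel.
__global__ void addVectorsInto(float* result, const float* a, const float* b, int n) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    for (int i = index; i < n; i += stride) {
        result[i] = a[i] + b[i];
    }
}

// Hypothetical check helper: initializes on the GPU, sizes the grid from the SM count,
// and returns the largest |result[i] - 3.0f| (or -1.0f on a CUDA error).
float optimizedVectorAddMaxError(int n) {
    int deviceId = 0;
    int numberOfSMs = 0;
    cudaError_t err = cudaGetDevice(&deviceId);
    if (err == cudaSuccess) {
        err = cudaDeviceGetAttribute(&numberOfSMs, cudaDevAttrMultiProcessorCount, deviceId);
    }

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *result = nullptr;
    if (err == cudaSuccess) err = cudaMallocManaged(&a, bytes);
    if (err == cudaSuccess) err = cudaMallocManaged(&b, bytes);
    if (err == cudaSuccess) err = cudaMallocManaged(&result, bytes);

    if (err == cudaSuccess) {
        int threadsPerBlock = 256;              // assumed block size
        int numberOfBlocks = 32 * numberOfSMs;  // assumed heuristic: several blocks per SM
        // Initialize on the GPU instead of in a CPU loop. Kernels on the default
        // stream run in launch order, so addVectorsInto sees the initialized inputs.
        initWith<<<numberOfBlocks, threadsPerBlock>>>(1.0f, a, n);
        initWith<<<numberOfBlocks, threadsPerBlock>>>(2.0f, b, n);
        addVectorsInto<<<numberOfBlocks, threadsPerBlock>>>(result, a, b, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // Wait for all kernels before the CPU reads result.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float maxError = -1.0f;
    if (err == cudaSuccess) {
        maxError = 0.0f;
        for (int i = 0; i < n; ++i) {
            maxError = std::fmax(maxError, std::fabs(result[i] - 3.0f));
        }
    } else {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(a);  // cudaFree(nullptr) is a no-op
    cudaFree(b);
    cudaFree(result);
    return maxError;
}
```

The grid-stride loop is what makes the SM-based grid size safe. The kernels stay correct for any number of blocks, so the block count only controls how the work is spread across the SMs. On a working GPU, optimizedVectorAddMaxError(1 << 20) should return 0.0f.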

Profiling and Running

To compile and profile each CUDA program, run:

nvcc -o compiled_code_name source_code.cu
nsys profile --stats=true ./compiled_code_name

My Blog

https://medium.com/@poornima31298/cuda-an-easy-introduction-for-beginners-5e897b62d1bd