# Day 1

The focus of **Day 1** is **vector addition**, starting with a **naive approach** and then an **improved version**.

---

## Code Descriptions

### 1️⃣ `vector_add.cu` (Naive Vector Addition)

- Implements **element-wise vector addition** (`C = A + B`).
- Uses **global memory** and assigns **one thread per element**.

### 2️⃣ `vector_add_optim.cu` (Improved Vector Addition)

- Uses an **improved approach with grid-stride loops**, which keep global-memory accesses coalesced (adjacent threads access adjacent elements on every pass).
- **Kernel optimizations:**
  - Grid-stride loops ensure every element is computed, even when the array has more elements than the grid has threads.
  - Launches the `initWith()` kernel on the **GPU** instead of initializing the arrays on the CPU.
  - Dynamically selects the **number of blocks** based on the number of streaming multiprocessors (SMs).
- **Result:** Better GPU utilization, reduced memory latency, and improved performance.

---

## Profiling and Running

To compile and profile the CUDA programs, use:

```bash
nvcc -o compiled_code_name source_code.cu
nsys profile --stats=true ./compiled_code_name
```

## My Blog

https://medium.com/@poornima31298/cuda-an-easy-introduction-for-beginners-5e897b62d1bd
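## Example Sketch

This is a minimal, self-contained sketch of the two approaches described in this README, written for illustration. It is not the code in `vector_add.cu` or `vector_add_optim.cu`. It assumes unified memory (`cudaMallocManaged`), 256 threads per block, and 32 blocks per SM for the grid-stride launches. The names `addVectorsNaive`, `addVectorsStride`, `countMismatches`, `vectorAddMismatches`, and `ok`, as well as the `initWith(value, ptr, n)` signature, are hypothetical.

```cuda
// Minimal sketch (hypothetical names): naive vs. grid-stride vector addition.
// Assumes unified memory (cudaMallocManaged); not the repository's actual code.
#include <cstdio>
#include <cuda_runtime.h>

// Naive kernel: one thread per element. The launch must supply at least n threads.
__global__ void addVectorsNaive(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the arrays
        c[i] = a[i] + b[i];
    }
}

// Grid-stride kernel: each thread starts at its global index and advances by the
// total number of threads in the grid, so any grid size covers all n elements.
__global__ void addVectorsStride(const float *a, const float *b, float *c, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

// GPU-side initialization (assumed signature), also written as a grid-stride loop.
__global__ void initWith(float value, float *x, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        x[i] = value;
    }
}

// Returns true if no launch or execution error occurred; prints the error otherwise.
static bool ok(const char *step) {
    cudaError_t err = cudaGetLastError();                  // launch/configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // errors during execution
    if (err != cudaSuccess) std::printf("%s failed: %s\n", step, cudaGetErrorString(err));
    return err == cudaSuccess;
}

// Counts elements of c that differ from the expected sum (runs on the host).
static int countMismatches(const float *c, float expected, int n) {
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != expected) ++errors;
    }
    return errors;
}

// Runs both kernels on n elements; returns the number of wrong results, or -1 on a CUDA error.
int vectorAddMismatches(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    int errors = -1;

    // Size the grid-stride launches from the SM count (32 blocks per SM is an assumption).
    int device = 0, numSMs = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device);
    const int threads = 256;
    const int strideBlocks = 32 * numSMs;
    const int naiveBlocks = (n + threads - 1) / threads;  // one thread per element

    if (ok("setup")) {
        initWith<<<strideBlocks, threads>>>(3.0f, a, n);
        initWith<<<strideBlocks, threads>>>(4.0f, b, n);
        addVectorsNaive<<<naiveBlocks, threads>>>(a, b, c, n);
        if (ok("init + naive kernels")) {
            errors = countMismatches(c, 7.0f, n);
            // Clear c so leftover naive results cannot hide a bug in the stride kernel.
            initWith<<<strideBlocks, threads>>>(0.0f, c, n);
            addVectorsStride<<<strideBlocks, threads>>>(a, b, c, n);
            errors = ok("grid-stride kernel") ? errors + countMismatches(c, 7.0f, n) : -1;
        }
    }

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return errors;
}

int main() {
    const int n = 1 << 22;  // arbitrary example size
    int errors = vectorAddMismatches(n);
    if (errors < 0) return 1;  // the CUDA error was already printed
    std::printf("%d mismatches\n", errors);
    return errors == 0 ? 0 : 1;
}
```

It compiles and profiles with the same `nvcc` and `nsys profile --stats=true` commands listed under Profiling and Running. The grid-stride launch sizes its grid from the SM count rather than from `n`, so the same configuration works for any array length: the loop picks up whatever elements remain after each pass. In the `nsys` kernel summary each kernel is reported under its own name, which makes the naive and grid-stride versions easy to compare.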