# Day 3

The focus of **Day 3** is **tiled matrix multiplication**.

---

## Code Descriptions

### 1️⃣ `mat_mul_tiled.cu` (Improved matrix multiplication)

- Uses **shared memory** to store tiles of the input matrices.
- Reduces global memory traffic: each tile element is loaded from global memory once per block and then reused from shared memory, which hides much of the global memory access latency.
- Uses **tiling and thread-level parallelism** to optimize computation.

## Profiling and Running

To compile and profile the CUDA code, use:

```
nvcc -o compiled_code_name source_code.cu
nsys profile --stats=true ./compiled_code_name
```

## Resources referred to for matrix multiplication

- YouTube video by OMean1Sigma: https://www.youtube.com/watch?v=QmKNE3viwIE&t=172s
- YouTube video by OMean1Sigma: https://www.youtube.com/watch?v=Q3GgbfGTnVc&t=313s
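
## Sketch of a tiled kernel

To make the shared-memory tiling described for `mat_mul_tiled.cu` concrete, here is a minimal sketch of how such a kernel might look. It is not the actual contents of `mat_mul_tiled.cu`: the kernel name `matMulTiled`, the tile width `TILE = 16`, the matrix sizes, and the row-major layout are all assumptions chosen for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width; each block computes a TILE x TILE patch of C

// Computes C (M x N) = A (M x K) * B (K x N); all matrices are row-major.
__global__ void matMulTiled(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    // One tile of A and one tile of B, shared by all threads in the block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread owns
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread owns
    float acc = 0.0f;

    // Walk along the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;

        // Each thread loads one element of each tile; out-of-range slots get 0.
        As[threadIdx.y][threadIdx.x] =
            (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // tiles must be fully loaded before use

        // Partial dot product using only shared memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next load overwrites
    }

    if (row < M && col < N) C[row * N + col] = acc;
}

int main() {
    const int M = 70, K = 45, N = 33;  // deliberately not multiples of TILE
    std::vector<float> hA(M * K), hB(K * N), hC(M * N);
    for (auto& v : hA) v = static_cast<float>(rand() % 10);
    for (auto& v : hB) v = static_cast<float>(rand() % 10);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc((void**)&dA, hA.size() * sizeof(float));
    cudaMalloc((void**)&dB, hB.size() * sizeof(float));
    cudaMalloc((void**)&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, M, K, N);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against a straightforward CPU reference.
    float maxErr = 0.0f;
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < K; ++k) ref += hA[i * K + k] * hB[k * N + j];
            maxErr = fmaxf(maxErr, fabsf(ref - hC[i * N + j]));
        }
    printf("max abs error vs CPU reference: %g\n", maxErr);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return maxErr < 1e-3f ? 0 : 1;
}
```

With this layout, each element of a tile is read from global memory once per block and then reused `TILE` times from shared memory. The two `__syncthreads()` calls keep threads from reading a tile before it is fully loaded or overwriting it while others are still using it. Saved under a hypothetical name such as `tiled_sketch.cu`, the sketch should build and profile with the same `nvcc` and `nsys` commands shown in this README.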