
Day 3

The focus of Day 3 is tiled matrix multiplication.


Code Descriptions

1️⃣ mat_mul_tiled.cu (Improved matrix multiplication)

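  • Stores tiles of the input matrices in shared memory.
  • Reduces global memory traffic: each tile element is loaded from global memory once per block and then reused from shared memory, so with T × T tiles each input element is fetched roughly T times less often than in a naive kernel.
  • Combines tiling with thread-level parallelism: the threads of a block cooperatively load each tile, then compute their output elements in parallel.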
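The points above can be illustrated with a minimal sketch of a tiled kernel. This is not necessarily the code in mat_mul_tiled.cu: the names `matMulTiled` and `matMulTiledHost`, the tile width `TILE = 16`, and the row-major layout are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width; each block is TILE x TILE threads

// Computes C (M x N) = A (M x K) * B (K x N); all matrices are row-major.
__global__ void matMulTiled(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    __shared__ float As[TILE][TILE];  // tile of A shared by the block
    __shared__ float Bs[TILE][TILE];  // tile of B shared by the block

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk along the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Each thread loads one element per tile; out-of-range entries become 0.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before any thread reads them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // all reads finished before the next iteration overwrites
    }
    if (row < M && col < N) C[row * N + col] = acc;
}

// Hypothetical host helper: copies inputs to the device, launches the kernel,
// and copies the result back. Returns 0 on success, 1 on any CUDA error.
int matMulTiledHost(const float* hA, const float* hB, float* hC,
                    int M, int K, int N) {
    size_t szA = (size_t)M * K * sizeof(float);
    size_t szB = (size_t)K * N * sizeof(float);
    size_t szC = (size_t)M * N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaError_t err = cudaMalloc(&dA, szA);
    if (err == cudaSuccess) err = cudaMalloc(&dB, szB);
    if (err == cudaSuccess) err = cudaMalloc(&dC, szC);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, szA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, szB, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
        matMulTiled<<<grid, block>>>(dA, dB, dC, M, K, N);
        err = cudaGetLastError();                       // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, szC, cudaMemcpyDeviceToHost);

    cudaFree(dA);  // cudaFree(nullptr) is a no-op, so this is safe on early failure
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling `matMulTiledHost(A, B, C, M, K, N)` from a `main` function with host arrays is enough to try the kernel. The two `__syncthreads()` calls matter: the first keeps any thread from reading a tile that is only partly loaded, and the second keeps the next iteration from overwriting a tile that other threads are still reading.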

Profiling and Running

To compile and profile the CUDA code, run the following, replacing source_code.cu and compiled_code_name with the actual file names:

nvcc -o compiled_code_name source_code.cu
nsys profile --stats=true ./compiled_code_name

Resources consulted for matrix multiplication:

YouTube video by OMean1Sigma: https://www.youtube.com/watch?v=QmKNE3viwIE&t=172s

YouTube video by OMean1Sigma: https://www.youtube.com/watch?v=Q3GgbfGTnVc&t=313s