computing-systems-212 / Lab 4: Optimizing Caches / task2 / ANALYSIS.txt
To optimize both L1D and LLC behavior with tiling, I used a tile size of 8 for each loop variable.
A short calculation explains why.
Each tile is 8 × 8 elements and each element is 8 B; the elements are also accessed in cache-line order.
One tile therefore occupies 8 × 8 × 8 B = 512 B.
Three tiles are in use at once, so the working set is 3 × 512 B = 1536 B.
Since the L1D cache is 4096 B, a tile size of 8 is a safe choice: 1536 B < 4096 B.
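My implementation approach made it difficult to move to larger tile sizes because L1D was limited to 4 KB; for example, three 16 × 16 tiles of 8 B elements would need 3 × 2048 B = 6144 B, which no longer fits.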
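
The capacity argument above can be checked with a few lines of C. This is only a sketch: the function names `tile_bytes` and `tiles_fit` are hypothetical, and the check considers capacity alone, ignoring associativity and conflict misses.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one square tile of tile x tile elements. */
size_t tile_bytes(size_t tile, size_t elem_size)
{
    return tile * tile * elem_size;
}

/* Returns 1 if num_tiles tiles fit in a cache of cache_bytes, else 0.
 * Capacity check only: associativity and conflict misses are ignored. */
int tiles_fit(size_t tile, size_t elem_size, size_t num_tiles,
              size_t cache_bytes)
{
    assert(tile > 0 && elem_size > 0);
    return num_tiles * tile_bytes(tile, elem_size) <= cache_bytes;
}
```

With the numbers used here, `tiles_fit(8, 8, 3, 4096)` holds, while `tiles_fit(16, 8, 3, 4096)` does not.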

The first optimization I implemented to reduce L1D misses was loop interchange.
Simply swapping the order of the nested for loops in the .c file can reduce the miss rate of memory accesses.
The technique takes advantage of temporal reuse and spatial locality by accessing data in the order in which it is laid out in memory.
Loop interchange helped reduce the L1D miss rate: with the outer loops iterating over i, j, k, the best order for the next three loops was the reverse, kk, jj, ii.
I determined this by reviewing memory addresses in gdb and by using cachegrind's instruction and detailed outputs.
I then checked my findings against the total number of D accesses. In this case D accesses increased, but
that is acceptable: I loop significantly more, and tiling splits the work into submatrices, which results in more accesses.
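
As a generic illustration of loop interchange (not the code from this lab), the sketch below multiplies two row-major matrices in two loop orders. The names `matmul_ijk` and `matmul_ikj` and the dimension `N` are assumptions made for the example. Both functions compute the same result; only the order in which memory is touched differs.

```c
#include <stddef.h>

#define N 64 /* hypothetical matrix dimension */

/* i-j-k order: the inner loop walks b down a column,
 * so consecutive accesses to b are N doubles apart. */
void matmul_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* i-k-j order: the inner loop walks b and c along a row,
 * so consecutive accesses fall in the same cache line. */
void matmul_ikj(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (size_t i = 0; i < N; i++)
        for (size_t k = 0; k < N; k++) {
            double aik = a[i][k];
            for (size_t j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
}
```

Running each variant under cachegrind is one way to compare their D1 miss counts; the exact numbers depend on the cache configuration.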

In addition, tiling can drastically reduce misses at any cache level.
It takes advantage of temporal reuse because the nested loops are organized so that recently used data is accessed again while it is still in the cache.
It takes advantage of spatial reuse because the loops are incremented by specific amounts so that the closest memory is accessed next.
The tiling follows the commonly implemented style introduced in lecture, but with the loops organized according to the choices made for loop interchange.
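
A minimal sketch of a tiled matrix multiply in this style is shown below. It uses the tile size of 8 and the loop order described above (i, j, k over tiles, then kk, jj, ii within a tile). The function name `matmul_tiled`, the dimension `N`, and the array indexing are assumptions, since the lab's source is not reproduced here. `N` is assumed to be a multiple of `TILE`.

```c
#include <stddef.h>

#define N 64   /* hypothetical matrix dimension; assumed a multiple of TILE */
#define TILE 8 /* tile size chosen in the analysis above */

/* Accumulates a * b into c one TILE x TILE block at a time.
 * The caller must zero c before the call. */
void matmul_tiled(double a[N][N], double b[N][N], double c[N][N])
{
    /* Outer three loops step from tile to tile. */
    for (size_t i = 0; i < N; i += TILE)
        for (size_t j = 0; j < N; j += TILE)
            for (size_t k = 0; k < N; k += TILE)
                /* Inner three loops stay inside the current tiles. */
                for (size_t kk = k; kk < k + TILE; kk++)
                    for (size_t jj = j; jj < j + TILE; jj++)
                        for (size_t ii = i; ii < i + TILE; ii++)
                            c[ii][jj] += a[ii][kk] * b[kk][jj];
}
```

While the inner three loops run, only one tile each of a, b, and c is touched, which is what keeps the working set within the 1536 B computed earlier.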

==3582011== I   refs:      101,275,503
==3582011== I1  misses:            356
==3582011== LLi misses:            353
==3582011== I1  miss rate:        0.00%
==3582011== LLi miss rate:        0.00%
==3582011==
==3582011== D   refs:       37,006,279  (29,604,757 rd   + 7,401,522 wr)
==3582011== D1  misses:        275,861  (   275,733 rd   +       128 wr)
==3582011== LLd misses:        181,231  (   181,116 rd   +       115 wr)
==3582011== D1  miss rate:         0.7% (       0.9%     +       0.0%  )
==3582011== LLd miss rate:         0.5% (       0.6%     +       0.0%  )
==3582011==
==3582011== LL refs:           276,217  (   276,089 rd   +       128 wr)
==3582011== LL misses:         181,584  (   181,469 rd   +       115 wr)
==3582011== LL miss rate:          0.1% (       0.1%     +       0.0%  )
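
The miss rates in the output above can be recomputed from the raw counts as a sanity check. The helper name `miss_rate_percent` is hypothetical; the sketch simply divides misses by references.

```c
/* Miss rate as a percentage; returns 0 when there are no references. */
double miss_rate_percent(unsigned long long misses, unsigned long long refs)
{
    if (refs == 0)
        return 0.0;
    return 100.0 * (double)misses / (double)refs;
}
```

For example, `miss_rate_percent(275861ULL, 37006279ULL)` gives a D1 miss rate between 0.74% and 0.75%, consistent with the 0.7% that cachegrind reports.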