____________ P4 WRITEUP ____________

- Name: (FILL THIS in)
- NetID: (THE kauf0095 IN kauf0095@umn.edu)

Answer the questions below according to the project specification. Write
your answers directly in this text file and submit it along with your
code.

PROBLEM 1: sumdiag_OPTM()
=========================

Do your timing study on loginNN.cselabs.umn.edu

(A) Paste Source Code
~~~~~~~~~~~~~~~~~~~~~

Paste a copy of your source code for the function `sumdiag_OPTM()'

####################### YOUR ANSWER HERE #########################

// OPTIMIZED CODE HERE
int sumdiag_OPTM(matrix_t *mat, vector_t *vec) {
  // Initialize variables; copying struct fields into locals lets the
  // compiler keep them in registers
  long mrow = mat->rows;
  long lenV = vec->len;
  int minus = mrow-1;
  matrix_t matM = *mat;
  vector_t vecX = *vec;

  // Check for a bad vector length
  if(lenV != (mrow + minus)){
    printf("sumdiag_OPTM: bad sizes\n");
    return 1;
  }

  for(int i=0; i<lenV; i++){  // initialize vector of diagonal sums
    VSET(vecX,i,0);           // to all 0s
  }

  int dtemp;                  // tracks the current diagonal
  int j;
  for(int i=0; i<mrow; i++){  // walk the 2D array sequentially, adding to the correct diagonal each time
    dtemp = minus-i;          // beginning diagonal decreases each row
    for(j=0; j<mrow-4; j+=4){
      VSET(vecX,dtemp, VGET(vecX,dtemp) + MGET(matM,i,j));    // add current element to its diagonal's running sum
      dtemp++;                                                // continue to the next diagonal
      VSET(vecX,dtemp, VGET(vecX,dtemp) + MGET(matM,i,j+1));  // unrolled to increase pipeline efficiency
      dtemp++;
      VSET(vecX,dtemp, VGET(vecX,dtemp) + MGET(matM,i,j+2));
      dtemp++;
      VSET(vecX,dtemp, VGET(vecX,dtemp) + MGET(matM,i,j+3));
      dtemp++;
    }
    for(; j<mrow; j++){       // clean up if the rows/cols are not divisible by 4
      VSET(vecX,dtemp, VGET(vecX,dtemp) + MGET(matM,i,j));
      dtemp++;
    }
  }
  return 0;
}

##################################################################

(B) Timing on loginNN.cselabs.umn.edu
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Paste a copy of the results of running `sumdiag_benchmark' on
loginNN.cselabs.umn.edu (like login01, login02, ..., login06) in the
space below which shows how your
performance optimizations improved on the baseline codes.

####################### YOUR ANSWER HERE #########################

smi01868@csel-remote-lnx-01:/home/smi01868/CSCI2021/project/p4-code $ ./sumdiag_benchmark 5
==== Matrix Diagonal Sum Benchmark Version 3 ====
------ Tuned for csel-remote-lnx-NN machines --------
  SIZE       BASE       OPTM  SPDUP POINTS
   512 1.2932e-02 2.2270e-03   5.81   0.63
  1024 5.8467e-02 9.1880e-03   6.36   1.33
  1101 6.6553e-02 1.0414e-02   6.39   1.44
  2048 3.2850e-01 3.7550e-02   8.75   3.13
  4099 1.5549e+00 1.5326e-01  10.15   6.69
  6001 3.5301e+00 3.2744e-01  10.78  10.05
  8192 6.7258e+00 6.1834e-01  10.88  13.77
RAW POINTS: 37.05
TOTAL POINTS: 35 / 35

##################################################################

(C) Optimizations
~~~~~~~~~~~~~~~~~

Describe in some detail the optimizations you used to speed the code
up. THE CODE SHOULD CONTAIN SOME COMMENTS already to describe these but
in the section below, describe in English the techniques you used to
make the code run faster. Format your descriptions into discrete chunks
such as:

Optimization 1: Blah bla blah... This should make run faster because
yakkety yakeety yak.

Optimization 2: Blah bla blah... This should make run faster because
yakkety yakeety yak.

...

Optimization N: Blah bla blah... This should make run faster because
yakkety yakeety yak.

Full credit solutions will have at least THREE optimizations and
describe WHY these improved performance in at least a couple sentences.

####################### YOUR ANSWER HERE #########################

Optimization 1: Re-ordering memory accesses. In the original function,
the code jumped around the matrix when accessing values. I exploited
the spatial locality of the cache by going through the matrix
sequentially rather than jumping around.
Spatial locality means that elements of the matrix close to the one I
am currently accessing (say element X) are pulled into the cache along
with it, so accessing the values near element X is fast since cache
memory has minimal access time. Thus, going through the matrix
sequentially makes better use of the cache than jumping through the
matrix, resulting in more efficient code.

Optimization 2: Creating variables for repeated references. Originally,
every time the program needed to access data in the matrix or the
vector, it went through main memory to locate the value (for example
mat->rows, mat->cols). For the optimization, I copied the value
mat->rows into a local variable to increase access speed. The variable
will likely be placed in a register, and memory in a register can be
accessed much faster than main memory since the data does not have to
travel as far. Thus, creating variables for repeated references results
in more efficient code.

Optimization 3: Removing function calls. The original function used the
functions mget(), vget(), and vset(). To optimize the code I replaced
each function call with its macro counterpart MGET(), VGET(), and
VSET(). Reducing function calls increases performance because it
removes call overhead and eliminates barriers to the compiler's natural
optimizations. Thus, removing function calls results in more efficient
code.

##################################################################

PROBLEM 2: Timing Search Algorithms
===================================

Do your timing study on loginNN.cselabs.umn.edu. In most cases, report
times larger than 1e-03 seconds as times shorter than this are
unreliable. Run searches for more repetitions to lengthen run times.

(A) Min Size for Algorithmic Differences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Determine the size of input array where one starts to see a measurable
difference in the performance of the linear and logarithmic algorithms.
Produce a timing table which includes all algorithms which clearly
demonstrates an uptick in the times associated with some while others
remain much lower. Identify the size at which this appears to occur.
SHOW A TIMING TABLE to support your conclusions and ensure that the
times reported are larger than 1e-03.

####################### YOUR ANSWER HERE #########################

smi01868@csel-remote-lnx-04:/home/smi01868/CSCI2021/project/p4-code $ ./search_benchmark 1 15 4
  SIZE NSEARCH      array       list     binary       tree
     2      16 3.0000e+00 3.0000e+00 3.0000e+00 3.0000e+00
     4      32 3.0000e+00 3.0000e+00 3.0000e+00 3.0000e+00
     8      64 4.0000e+00 4.0000e+00 4.0000e+00 5.0000e+00
    16     128 8.0000e+00 8.0000e+00 8.0000e+00 8.0000e+00
    32     256 2.2000e+01 2.4000e+01 1.8000e+01 1.9000e+01
    64     512 7.4000e+01 8.7000e+01 4.1000e+01 4.1000e+01
   128    1024 2.8200e+02 3.5900e+02 9.4000e+01 9.8000e+01
   256    2048 1.0650e+03 8.5700e+02 7.5000e+01 7.0000e+01
   512    4096 1.4240e+03 1.8540e+03 1.5700e+02 1.3200e+02
  1024    8192 5.5820e+03 9.3400e+03 3.1700e+02 2.8600e+02
  2048   16384 2.2020e+04 8.9738e+04 6.6600e+02 7.0500e+02
  4096   32768 8.7143e+04 4.7379e+05 1.3870e+03 1.3860e+03
  8192   65536 3.4963e+05 2.4436e+06 2.8180e+03 3.0020e+03
 16384  131072 1.3946e+06 1.6408e+07 6.1480e+03 6.5960e+03
 32768  262144 5.6127e+06 8.1282e+07 1.2150e+04 1.3252e+04

Around size 8192 (65536 searches), there is a sharp uptick in both the
array and list times while binary and tree stay relatively flat. In the
table above, at size 8192 both binary and tree remain in the e+03
range, while array and list continue to climb, jumping from e+04 ->
e+05 -> e+06.

##################################################################

(B) Linear Search in List vs Array
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Determine whether the linear array and linked list search remain
approximately at the same performance level as size increases to large
data or whether one begins to become favorable over the other.
Determine the approximate size at which this divergence becomes
obvious. Discuss reasons WHY this difference arises. SHOW A TIMING
TABLE to support your conclusions and ensure that the times reported
are larger than 1e-03.

####################### YOUR ANSWER HERE #########################

smi01868@csel-remote-lnx-04:/home/smi01868/CSCI2021/project/p4-code $ ./search_benchmark 1 30 2 la ll
  SIZE NSEARCH      array       list
     2       8 3.0000e+00 3.0000e+00
     4      16 3.0000e+00 3.0000e+00
     8      32 3.0000e+00 4.0000e+00
    16      64 5.0000e+00 6.0000e+00
    32     128 1.4000e+01 1.6000e+01
    64     256 3.8000e+01 4.5000e+01
   128     512 1.4100e+02 1.8100e+02
   256    1024 5.2700e+02 7.0900e+02
   512    2048 1.6220e+03 9.6500e+02
  1024    4096 2.8880e+03 4.5830e+03
  2048    8192 1.1410e+04 5.0843e+04
  4096   16384 4.3753e+04 2.2121e+05
  8192   32768 1.7460e+05 1.2309e+06
 16384   65536 6.9682e+05 8.2710e+06
 32768  131072 2.7883e+06 4.0188e+07
 65536  262144 1.1197e+07 1.0418e+08
131072  524288 4.4754e+07 2.7049e+08

As the size increases, the array becomes more favorable than the list.
The divergence becomes obvious at around size 4096 (16384 searches),
when the array is at about 4e+04 while the list is at about 2e+05.

In linear array search, the function uses bracket indexing to access
the information in the array, e.g. array[i] == query. On the other
hand, linkedlist_search uses -> to access the information in the linked
list, e.g. int data = n->data. When using ->, the node is likely stored
only in main memory, which takes a long time to access. With linear
array search, due to the spatial locality of the cache, there is a high
chance that the next value in the array is already in the cache, which
has fast memory access. Therefore, the difference in how the two
methods get their data, often from the cache for the array and from
main memory for the linked list, results in the diverging performance.
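To illustrate the two access patterns described above, here is a
minimal sketch of both linear searches; the type and function names are
my own illustrations, not the project's exact code. The array loop
touches consecutive addresses, while the list loop chases pointers:

```c
#include <stddef.h>

// Hypothetical list node; the project's node type may differ.
typedef struct node {
    int data;
    struct node *next;
} node_t;

// Linear search over a contiguous array: successive elements are
// adjacent in memory, so each cache line fetched holds several
// upcoming elements (spatial locality).
int array_search(const int *arr, int len, int query) {
    for (int i = 0; i < len; i++) {
        if (arr[i] == query) {
            return 1;
        }
    }
    return 0;
}

// Linear search over a linked list: each node may live anywhere on the
// heap, so following n->next can miss the cache on every step.
int list_search(const node_t *head, int query) {
    for (const node_t *n = head; n != NULL; n = n->next) {
        if (n->data == query) {
            return 1;
        }
    }
    return 0;
}
```

Both loops do the same number of comparisons; only their memory access
patterns differ, which is what the timings above expose.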
##################################################################

(C) Binary Search in Tree vs Array
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compare the binary array search and binary tree search on small to very
large arrays. Determine if there is a size at which the performance of
these two begins to diverge. If so, describe why this might be
happening based on your understanding of the data structures and the
memory system. If not, describe why you believe there is little
performance difference between the two. SHOW A TIMING TABLE to support
your conclusions and ensure that the times reported are larger than
1e-03.

####################### YOUR ANSWER HERE #########################

smi01868@csel-remote-lnx-04:/home/smi01868/CSCI2021/project/p4-code $ ./search_benchmark 1 100 900 ba bt
   SIZE    NSEARCH     binary       tree
      2       3600 3.2000e+01 3.4000e+01
      4       7200 7.2000e+01 8.1000e+01
      8      14400 1.9500e+02 1.9600e+02
     16      28800 4.4000e+02 4.4100e+02
     32      57600 1.0630e+03 1.0430e+03
     64     115200 9.9100e+02 9.0000e+02
    128     230400 2.3270e+03 4.6440e+03
    256     460800 1.2398e+04 1.3786e+04
    512     921600 3.1359e+04 2.6952e+04
   1024    1843200 6.8069e+04 6.4004e+04
   2048    3686400 1.5046e+05 1.4501e+05
   4096    7372800 3.2167e+05 3.2753e+05
   8192   14745600 6.4016e+05 6.6616e+05
  16384   29491200 1.3187e+06 1.4061e+06
  32768   58982400 2.7361e+06 3.0113e+06
  65536  117964800 5.7120e+06 7.7553e+06
 131072  235929600 1.1749e+07 2.1240e+07
 262144  471859200 2.4283e+07 5.0962e+07
 524288  943718400 5.0166e+07 1.1486e+08
1048576 1887436800 1.0387e+08 2.4530e+08

At very large sizes, there is a clear divergence in the performance of
the binary array and binary tree: the array is more than twice as fast
by size 1048576. This performance difference may be due to the cache.
Elements of an array are more likely to be stored in the cache because
of its spatial locality, so elements of the array will have relatively
fast access. With trees, the memory is not as likely to be in the cache
since you have to access it by following pointers with ->.
These pointer dereferences mean you are getting data from main memory,
which can be much slower than getting values from cache memory. Thus,
the divergence seen in the performance of the binary array and tree is
due to where the values for each are accessed from.

##################################################################

(D) Caching Effects on Algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is commonly believed that memory systems that feature a Cache will
lead to arrays performing faster than linked structures such as Linked
Lists and Binary Search Trees. Describe whether your timings confirm or
refute this belief. Address both types of algorithms in your answer:
- What effects does Cache have on Linear Search in arrays and lists and
  why?
- What effects does Cache have on Binary Search in arrays and trees and
  why?

####################### YOUR ANSWER HERE #########################

For linear search in arrays and lists: due to the spatial locality of
the cache, more items of an array are likely to be in the cache, which
results in faster memory access. With a linked list, you always need to
use -> to get the data from the next node, and that access is much
slower than a cache access since the node is stored in main memory.

Similarly, for binary search in arrays and trees: the cache's spatial
locality causes more items of the array to be in the cache, resulting
in faster memory access. With a binary search tree, to traverse the
tree and get data you must use ->, which is slower than a cache access
since you are getting information from main memory.

Therefore, overall, my timings confirm the belief that memory systems
featuring a cache make array structures perform faster than linked
structures, as the values in an array are more likely to be in the
cache than the values in a linked structure.
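The contrast for the binary algorithms can be sketched in miniature;
the names and signatures below are illustrative assumptions, not the
project's actual functions. Both do O(log n) comparisons, but the array
version probes one contiguous block while the tree version dereferences
a heap pointer at every step:

```c
#include <stddef.h>

// Hypothetical tree node; the project's node type may differ.
typedef struct tnode {
    int data;
    struct tnode *left, *right;
} tnode_t;

// Binary search on a sorted array: all probes land inside one
// contiguous block, and the last few probes cluster together, so
// many of the touched cache lines are reused.
int binary_array_search(const int *arr, int len, int query) {
    int lo = 0, hi = len - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // overflow-safe midpoint
        if (arr[mid] == query) {
            return 1;
        } else if (arr[mid] < query) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return 0;
}

// Binary search in a BST: every step follows a pointer to a node that
// may be anywhere on the heap, so each hop risks a cache miss.
int binary_tree_search(const tnode_t *root, int query) {
    while (root != NULL) {
        if (root->data == query) {
            return 1;
        } else if (root->data < query) {
            root = root->right;
        } else {
            root = root->left;
        }
    }
    return 0;
}
```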
##################################################################

(E) OPTIONAL MAKEUP CREDIT
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you decided to make use of a table of function pointers/structs
which is worth makeup credit, describe your basic design for this
below.

####################### YOUR ANSWER HERE #########################

I created a struct that has the following fields:
- name:    used to check if the table entry is null
- func:    holds the address of the search function
- create:  holds the address of the make_evens function for that
           particular type
- destroy: holds the address of the free function for that particular
           type

When checking the arguments passed to main, I add each selected type's
function pointer struct to an array holding all of the function
pointers needed for that specific run. Then I took the code structure
from my original program and altered it to use function pointers
instead. One major difference is an extra for loop used to iterate
through the indices of the array of function pointer structs. This loop
sits inside the (maxpow-minpow) loop, since we want to call each
separate function once per output line on the console. Anywhere the
original code made a direct function call, I replaced it with
Function[indexOfFunctionArray].func. In addition, instead of having
typed variables for the tree/list/array, I used a void pointer so the
code generalizes to all three types of structures. After making these
few small tweaks, I added in the clock calls to time the function calls
and printed the results in the same fashion as I did originally.

##################################################################
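A rough sketch of the table-of-structs design described in part (E);
all identifiers here are hypothetical stand-ins for the actual project
code, and the dummy search exists only so the dispatch loop has
something to call:

```c
#include <stddef.h>
#include <string.h>

// One table entry per searchable type; field names are illustrative.
typedef struct {
    char *name;                           // NULL name marks the end of the table
    int (*func)(void *coll, int query);   // search function for this type
    void *(*create)(int size);            // make_evens-style constructor
    void (*destroy)(void *coll);          // matching free function
} searchfuncs_t;

// Dummy stand-in search so the dispatch below runs: treats coll as an
// int array whose first element holds the element count.
static int dummy_search(void *coll, int query) {
    int *arr = coll;
    for (int i = 1; i <= arr[0]; i++) {
        if (arr[i] == query) {
            return 1;
        }
    }
    return 0;
}

// Dispatch: walk the table until the NULL-name sentinel, calling the
// matching entry; void* lets one loop cover array/list/tree alike.
int run_named_search(searchfuncs_t *table, const char *name,
                     void *coll, int query) {
    for (int i = 0; table[i].name != NULL; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].func(coll, query);
        }
    }
    return -1;                            // name not found in table
}
```

In the real benchmark the outer size loop would call each entry's
create, time repeated func calls with the clock functions, and then
call destroy, exactly as described above.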