Assignment 2 of CSC148 (Intro to Computer Science) University of Toronto 2023 A2 - Treemaps
Learning goals After completing this assignment, you will be able to:
Introduction
As we’ve discussed in class, trees are a fundamental data structure used to model hierarchical data. Often, a tree represents a hierarchical categorization of a base set of data, where the leaves represent the data values themselves, and internal nodes represent groupings of this data. Files on a computer can be viewed this way: regular files (e.g., PDF documents, video files, Python source code) are grouped into directories, which are then grouped into larger directories, etc.
Sometimes the data in a tree has some useful notion of size. For example, in a tree representing the departments of a company, the size of a node could be the dollar amount of all sales for that department, or the number of employees in that department. Or in a tree representing a computer’s file system, the size of a node could be the size of the file.
A treemap is a visualisation technique to show a tree’s structure according to the weights (or sizes) of its data values. It uses rectangles to show subtrees, scaled to reflect the proportional sizes of each piece of data.
Treemaps are used in many contexts. Among the more popular uses are news headlines, various kinds of financial information, and computer disk usage.
For this assignment, you will write an interactive treemap visualisation tool that you can use to visualize hierarchical data.
It will have a general API (implemented with inheritance, naturally!) and you will define specific subclasses that will allow you to visualize different kinds of data.
Task 1: TMTree basics In this task, you will implement:
The TMTree class is similar to the basic Tree class that you have already worked with, but with some differences. A few of the most conceptually important ones are discussed here.
First, the class has a data_size instance attribute that is used by the treemap algorithm to determine the size of the rectangle that represents this tree. data_size is defined recursively as follows:
Second, the class not only stores its subtrees, but it also stores its parent as an attribute. This allows us to traverse both up and down the tree, which you will see is necessary when we want to mutate the tree.
Third, we’ll also track some information inside our tree related to how to display it. More on this shortly.
Fourth, we don’t bother with a representation for an empty TMTree, as we’ll always be visualizing non-empty trees in this assignment.
To get started, you should do the following:
In the subsequent parts of this assignment, your program will allow the user to interact with the visualizer to change the way the data is displayed to them. At any one time, parts of the tree will be displayed and others not, depending on what nodes are expanded.
We’ll use the terminology displayed-tree to refer to the part of the data tree that is displayed in the visualizer. It is defined as follows:
We will require that (1) if a tree is expanded, then its parent is expanded, and (2) if a tree is not expanded, then its children are not expanded. Note that this requirement is naturally encoded as a representation invariant in the TMTree class.
Note: the displayed-tree is not a separate tree! It is just the part of the data tree that is displayed.
Task 2: The Treemap algorithm
In this task, you will implement:
The next component of our program is the treemap algorithm itself. It operates on the data tree and a 2-D window to fill with the visualization, and generates a list of rectangles to display, based on the tree structure and data_size attribute for each node.
For all rectangles in this program, we’ll use the pygame representation of a rectangle, which is a tuple of four integers (x, y, width, height), where (x, y) represents the upper-left corner of the rectangle. Note that in pygame, the origin is in the upper-left corner and the y-axis points down. So, the lower-right corner of a rectangle has coordinates (x + width, y + height). Each unit on both axes is a pixel on the screen.
We are now ready to see the treemap algorithm.
Note: We’ll use “size” to refer to the data_size attribute in the algorithm below.
Input: An instance of TMTree, which is part of the displayed-tree, and a pygame rectangle (the display area to fill).
Output: A list of pygame rectangles and each rectangle’s colour, where each rectangle corresponds to a leaf in the displayed-tree for this data tree, and has area proportional to the leaf’s data_size.
Treemap Algorithm:
Note I: Unless explicitly written as “displayed-tree”, all occurrences of the word “tree” below refer to a data tree.
Note II: (Very Important) The overall algorithm, as described, actually corresponds to the combination of first calling TMTree.update_rectangles to set the rect attribute of all nodes in the tree and then later calling TMTree.get_rectangles to actually obtain the rectangles and colours corresponding to only the displayed-tree’s leaves.
If the tree is a leaf in the displayed-tree, return a list containing just a single rectangle that covers the whole display area, as well as the colour of that leaf.
Otherwise, if the display area’s width is greater than its height: divide the display area into vertical rectangles (by this, we mean the x-coordinates vary but the y-coordinates are fixed), one smaller rectangle per subtree of the displayed-tree, in proportion to the sizes of the subtrees (don’t use this tree’s data_size attribute, as it may actually be larger than the sizes of the subtrees because of how we defined it!)
Example: suppose the input rectangle is (0, 0, 200, 100), and the displayed-tree for the input tree has three subtrees, with sizes 10, 25, and 15.
The first subtree has 20% of the total size, so its smaller rectangle has width 40 (20% of 200): (0, 0, 40, 100). The second subtree should have width 100 (50% of 200), and starts immediately after the first one: (40, 0, 100, 100). The third subtree has width 60, and takes up the remaining space: (140, 0, 60, 100). Note that the three rectangles’ edges overlap, which is expected.
If the height is greater than or equal to the width, then make horizontal rectangles (by this, we mean the y-coordinates vary, but the x-coordinates are fixed) instead of vertical ones, and do the analogous operations as in step 2.
Recurse on each of the subtrees in the displayed-tree, passing in the corresponding smaller rectangle from step 2 or 3.
Note about rounding: because you’re calculating proportions here, the exact values will often be floats instead of integers. For all but the last rectangle, always truncate the value (round down to the nearest integer). In other words, if you calculate that a rectangle should be (0, 0, 10.9, 100), “round” (or truncate) the width down to (0, 0, 10, 100). Then a rectangle below it would start at (0, 100), and a rectangle beside it would start at (10, 0).
However, the final (rightmost or bottommost) edge of the last smaller rectangle must always be equal to the outer edge of the input rectangle. This means that the last subtree might end up a bit bigger than its true proportion of the total size, but that’s okay for this assignment.
Important followup to Note II above: You will implement this algorithm in the update_rectangles method in TMTree. Note that rather than returning the rectangles for only the leaves of the displayed-tree, the code will instead set the rect attribute of each node in the entire tree to correspond to what its rectangle would be if it were a leaf in the displayed-tree. Later, we can retrieve the rectangles for only the leaf nodes of the displayed-tree by using the get_rectangles method.
Note: Sometimes a TMTree will be sufficiently small that its rectangle has very little area (it has a width or height very close to zero). That is to be expected, and you don’t need to do anything special to deal with such cases. Just be aware that sometimes you won’t be able to actually see all the displayed rectangles when running the visualizer, especially if the window is small.
Tips: For the recursive step, a good stepping stone is to think about when the tree has height 2, and each leaf has the same data_size value. In this case, the original rectangle should be partitioned into equal-sized rectangles.
Warning: make sure to follow the exact rounding procedure in the algorithm description above. If you deviate from this, you might get slightly incorrect rectangle bounds.
Important: The docstring of the get_rectangles method refers to the displayed-tree. For now, the whole tree is the displayed-tree, as all trees start expanded. Make sure that you come back later and further test this method on trees where the displayed-tree is a subset of the full data tree to ensure that your code is fully correct.
Task 3: Selecting a tree In this task, you will implement:
The first step to making our treemap interactive is to allow the user to select a node in the tree. To do this, complete the body of the get_tree_at_position method. This method takes a position on the display and returns the displayed-tree leaf corresponding to that position in the treemap.
IMPORTANT: Ties should be broken by choosing the rectangle on the left for a vertical boundary, or the rectangle above for a horizontal boundary. That is, you should return the first leaf of the displayed-tree that contains the position, when traversing the tree in the natural order.
In the visualizer, the user can click on a rectangle to select it. The text display updates to show two things:
Clicking on the same rectangle again unselects it. Clicking on a different rectangle selects that one instead.
Reminder: Each rectangle corresponds to a leaf of the displayed-tree, so we can only select leaves of the displayed-tree.
Task 4: Making the display interactive
In this task, you will implement:
Now that we can select nodes, we can move on to making the treemap more interactive.
In addition to displaying the treemap, the pygame graphical display responds to a few different events (by event, we mean the user presses a key, hovers, or clicks on the window). The first such event was actually clicking on a rectangle to select it, which we implemented in the previous task. We will now implement the rest of the actions associated with events, by implementing methods in the TMTree class which the visualizer calls:
The next four events change which rectangles are included in the displayed-tree.
The following two events allow the user to actually mutate the data tree, and see the changes reflected in the display.
Details:
Very Important: Whenever you modify a tree’s data_size, the data_size attributes of all its ancestors need to be updated too! You must make use of the parent_tree attribute to do this: it’s a widely-used technique for traversing up a tree (this should feel a lot like traversing a linked list!).
Task 5: File System Data In this task, you will implement:
You will also:
Consider a directory on your computer (e.g., the assignments directory you have in your csc148 directory). It contains a mixture of other directories (“subdirectories”) and some files; these subdirectories themselves contain other directories, and even more files! This is a natural hierarchical structure, and can be represented by a tree.
The _name attribute will always store the name of the directory or file (e.g. preps or prep8.py), not its path; e.g., /Users/courses/csc148/preps/prep8/prep8.py.
The data_size attribute of a file is simply how much space (in bytes) the file takes up on the computer. Note: to prevent empty files from being represented by a rectangle with zero area, we will always add a +1 to file sizes. The data_size of a directory corresponds to the size of all files contained in the directory, including its subdirectories. As with files, we’ll also add a +1 to directory sizes for this assignment.
In our code, we will represent this filesystem data using the FileTree and DirectoryTree classes.
To complete step 1, you will need to read the Python os module documentationLinks to an external site. to learn how to get data about your computer’s files in a Python program. You may also find the provided print_dirs.py to be a useful example to base your implementation on.
We recommend using the following functions:
Note: Since the order that os.listdir returns file names is dependent on your operating system, we have provided a helper, ordered_listdir which you MUST use in your implementation. You MUST add the subtrees in the order they are produced from ordered_listdir.
You may NOT use the os.path.walk function — it does a lot of the recursive work for you, which defeats the purpose of this part!
Warning: Make sure to either use os.path.join (recommended) or make use of string concatenation and the os.sep string to build larger paths from smaller ones, because different operating systems use different separators for file paths.
Requirements for step 3:
Task 6: Chess! In this task, you will implement:
You will also:
Our last dataset consists of games extracted from a database of chess games played by WGMs. While you don’t have to know the exact rules of chess for this part, there are a few important things to note:
Similar to the filesystem data initialization process, we will again have you do this in two steps:
A few important notes about the structure:
We also need to update the functionality of our ChessTree class to reflect what events should be ignored for this new kind of data. In particular, given the interpretation of the nodes, we don’t want to support either changing the sizes of nodes or moving nodes. As you did with the file system related subclasses, appropriately override the relevant methods in a way consistent with the provided code.
This project was completed on Pycharm. This project was given a 78.52% grading (class avg: 77.19%)