CSC401 Writing Assignment 2

Sample Solution to Writing Assignment 2, CSC401, Spring 2007

1. Trees

a. (Exercise C-2.20) Define the internal path length, I(T), of a tree T to be the sum of the depths of all the internal nodes in T. Likewise, define the external path length, E(T), of a tree T to the sum of the depths of all external nodes in T. Show that if T is a binary tree with n internal nodes, then E(T)= I(T) + 2n.

Solution: Proof by induction on the number of internal nodes, n.

Base case: n=1, I(T) = 0, E(T) = 2 = I(T) + 2n

Assumption: when 0< n < k, it holds.

Induction: when n = k > 1, partition the tree into three parts: the root, the left subtree L, the right subtree R. Assume L has kl internal nodes, and R has kr internal nodes, then k = kl + kr + 1 (the root). Since each nodes (internal and external) in L and R has one more length of the depth in T, and the number of external nodes is 1 plus the number of internal nodes for all binary tree,

        I(T) = (I(L) + kl) + (I(R) + kr);        E(T) = (E(L) + k1 + 1) + (E(R) + kr + 1).

Thus, E(T) = E(L) + E(R) + k + 1

                  = I(L) + 2kl + I(R) + 2kr + k + 1

                  = I(T) + kl + kr + k + 1

                  = I(T) + 2k.

b. (Exercise C-2.22) Let T be a tree with n nodes, and for any node in T, let d_v denote the depth of v in T. The distance between two nodes v and w in T is d_v+d_w-2d_u, where u is the LCA of v and w (where the LCA (the lowest common ancestor) u between two nodes v and w is defined as the lowest node in T that has both v and w as descendents, allowing a node to be a descendent of itself). The diameter of T is the maximum distance between two nodes in T. Describe an efficient algorithm for finding the diameter of T. What is the running time of your method?

Solution: Let d_uv be the diameter of T. First observe that both u and v are tree leaves; if not, we can find a path of higher length only by considering one of u’s or v’s children. Moreover, one of u and v has to be a leaf of highest depth. This can be proved by contradiction. Consider a tree and some leaf-to-leaf path P that does not include a leaf of highest depth. The consider a leaf of greatest depth v; it is always possible to find a new path P’ that starts at v and is longer than P. This yields the desired contradiction.

The main idea of the algorithm is to find a leaf v of highest depth and starting from this leaf to keep moving towards the tree's root. At each visited node u (including v, but excluding the tree's root), the height of u's sibling is computed and the current length L_max of the longest path in which v belongs is updated accordingly. When we are at T's root, L_max contains the diameter of T.

With this idea, an algorithm of time O(n) can be developed.

2. Heaps

a. (Exercise C-2.30) Show that the problem of finding the kth smallest element in a heap takes at least Ω(k) time in the worst case.

Hint: You should visit at least k nodes in a heap.

b. (Exercise C-2.32) Let T be a heap storing n keys. Give an efficient algorithm for reporting all the keys in T that are smaller than or equal to a given query key x (which is not necessarily in T). Note that the keys do not need to be reported in sorted order. Ideally, your algorithm should run in O(k) time, where k is the number of keys reported.

Solution: Starting at the root of the tree, recursively search the left and right subtrees if the root of the subtree has a key value less than x. This algorithm takes O(k) time, because there is no node in T storing a key larger than x that has a descendent storing a key less than x.

3. AVL and (2,4) Trees

a. (Exercise C-3.11) Let D be an ordered dictionary with n items implemented with an AVL tree. Show how to implement the following method for D in time O(logn):

countAllInRange(k₁, k₂): Compute and return the number of items in D

with key k such that k₁ ≤ k ≤ k₂.

Note that this method returns a single integer.

Hint: You will need to extend the AVL tree data structure, adding a new field to each internal node and ways of maintaining this field during updates.

Solution: For each node of the tree, maintain the size of the corresponding subtree, defined as the number of internal nodes in that subtree. While performing the search operation in both the insertion and deletion, the subtree sizes can be either incremented or decremented. During the rebalancing, care must be taken to update the subtree sizes of the three nodes involved (labeled a, b, and c by the restructure algorithm).
To calculate the number of nodes in a range (k1, k2), search for both k1 and k2, and let P1 and P2 be the associated search paths. Call v the last node common to the two paths. Traverse path P1 from v to k1. For each internal node w ≠ v encountered, if the right child of w is in not in P1, add one plus the size of the subtree of the child to the current sum. Similarly, traverse path P2 from v to k2. For each internal node w ≠ v encountered, if the left child of w is in not in P2, add one plus the size of the subtree of the left to the current sum. Finally, add one to the current sum (for the key stored at node v).

b. (Exercise C-3.14) Let T and U be (2,4) trees storing n and m items, respectively, such that all the items in T have keys less than the keys of all the items in U. Describe an O(logn + logm) time method for joining T and U into a single tree that stores all the items in T and U (destroying the old versions of T and U).

Solution: Assume, without loss of generality, that n ≠ m. Clearly, we can not use the general insertion/deletion tree operations to perform the joining; such a solution would take O(nlogn) time.

Let T and U have height ht and hu respectively. Our task is to "manually" join the two trees into one (2-4) tree V in logarithmic time. V must have all the properties of a )2-4) tree (that is, the size property, the depth property and the property of V being a multi-way search tree). The idea here is very simple.

Remove either the largest element of T or the minimum element of U, say element e. Without loss of generality, assume that after this removal ht >= hu. If ht=hu, create a new node v that stores e that was initially removed and make T's and U's root v's left and right children. If ht > hu, insert e into the rightmost node of thee T at height ht - hu - 1 and link this node to the root of tree U. If ht < hu, the approach is symmetric to the one described above.

Readily, the time complexity is O(logn). Only one element is removed and re-inserted; it can be located in time proportional to tree's height and the insertion and deletion operations take each O(logn) time. The height of a tree - if it is not stored separately - can be computed in O(logn) time.

The algorithm is omitted.