Abstract This paper defines M-Trees, describes their characteristics, shows the relationship between M-Trees and conventional Binary Trees, then presents a Pascal program for constructing an M-Tree. The algorithm in this paper can be used to build, efficiently, an M-balanced Binary Tree given ordered input. Trees A tree, as referenced in this paper, is one of the fundamental abstractions in Computer Science. When drawing a tree, one usually abandons natures time honored tradition and places the root of the tree at the top, with the leaves at the bottom. The term 'node' is used to designate an item in the tree that contains information. A node is a record that contains an information portion and two or more link portions. Formally, a tree is defined to be: 0 or more nodes, such that, the link portions of the nodes point to roots of sub-trees. In the purest sense, a given node in a tree may have 0 or more sub-trees. A tree is usually referred to as an n-ary tree, indicating that there may be as many as N subtrees coming off of a given node (where N >= 2). binary trees Most definitions of trees refer to binary trees (little b, little t). They include a definition of a node that is usually quite similar to the following Pascal description: type node_ptr = ^ node; type node = record left_link : node_ptr; information_portion : any_type; right_link : node_ptr end Note that the definition is directly recursive. In fact, the link fields point to other nodes (which are themselves roots of subtrees). Also note, that there are two (2) link fields, therefore, there will be (at most) two (2) subtrees coming off of a given node. This type of tree is also known as a bifurcating arborescence (bifurcate means to divide into halves, and an arborescence is a tree-like structure). Binary Trees This paper differentiates between binary trees (lower case b and lower case t) and Binary Trees (upper case B and upper case T). A Binary Tree has the same structure as a binary tree, but the definition incorporates an additional rule. For any given node (x), all the the information portions of all of the nodes in the left-subtree of x are less than the information portion of x, and all the the information portions of all of the nodes in the right-subtree of x are greater than the information portion of x. (Nodes whose information portion are equal to the information portion of X can be on the left OR on the right, just as long as the rule is consistently applied. For this paper, nodes, whose information portions are equal, will not be considered.) Height of trees The height of a tree is defined to be the number of links from the root of the tree to the bottom (i.e. a leaf node) of the tree FOLLOWING THE LONGEST PATH. Balance A subtree is said to be balanced, within a factor of K, if, for all nodes X, the height of the left-subtree of X differs from the height of the right-subtree of X by, at most, K. For AVL trees, K is unity (1). A prefectly balanced tree is a tree with the following property: The height of the left-subtree of X is equal to the height of the right-subtree of X, for all nodes X. M-Trees An M-Tree is, first of all, a Binary Tree. It's uniqueness is in it's structure. A tree is said to be M-Balanced, if, and only if, the following conditions are true: 1) The left-subtree of the root is perfectly balanced. 2) The right-subtree of the root contains whatever nodes are left over after balancing the left subtree, and 3) The height of the right-subtree of the root is less than or equal to the height of the left-subtree of the root. In a typical M-Tree, the left-subtree of the root contains the majority of the nodes, and the right-subtree of the root is a 'stub of a subtree' containing just a few nodes. Formally, given M nodes to be entered into the tree, and given N (which is the largest power of 2 <= N), N-1 nodes will be in the left-subtree of the root, and M-N-1 nodes will be in the right-subtree of the root. The algorithm The algorithm, recursively coded in Pascal, turned out to be surprisingly succinct. The reader's attention should be directed to the program listing below. (For non-Pascal programmers, the line numbers are not part of the language, and are used for explanatory purposes only.) 1 program mtrees (input,ouput); 2 const 3 max_depth = 26; { for a maximum of 67,108,864 records } 4 n = 200; { number of items to be read in } 5 6 type 7 node_ptr = ^ node; 8 node = 9 record 10 ll : node_ptr; 11 info : integer; 12 rl : node_ptr; 13 end; 14 var 15 i,count,value : integer; 16 root : node_ptr; At lines 8 through 13, the record 'node' is seen to have a structure typical of any binary tree - an information portion (info) and two link portions. At line 16, the variable that points to the root of the tree is declared. 17 procedure get_a_number (var current_node : node_ptr); 18 { generate the next number { in ascending sequence }, fetch a node, and 19 put the number into the info portion of the node.} 20 var 21 p : node_ptr; 22 begin 23 count := count + 1; 24 value := value + trunc(random*5)+1; 25 new(p); 26 with p^ do 27 begin 28 info := value; 29 ll := nil; 30 rl := nil 31 end; 32 current_node := p; 33 end; Lines 17 through 33 make up a procedure that generates a number (larger than the previous number), fetches a node, and places the number into the information portion of the node. Also, the two links of the node are set to nil (i.e. they point to nothing). 34 procedure create_a_subtree (level :integer; var ret : node_pointer); 35 var 36 this_node : node_pointer; 37 begin 38 if level = max_depth then 39 { this is code for the bottom level } 40 if count < n then get_a_number (ret) 41 else ret := nil 42 else 43 begin 44 { this is code for a middle (or top) level } 45 create_a_subtree (level+1, ret); 46 if (ret <> nil) and (count < n) then 47 begin 48 get_a_number (this_node); 49 this_node^.left_link := ret; 50 ret := this_node; 51 if count < n 52 then create_a_subtree (level+1, this_node^.right_link) 53 end 54 end 55 end; Line numbers 34 through 54 make up the procedure create_a_subtree. All of the meaningful work is done in this procedure. There are essentially two cases: either a node is to be inserted at the bottom of the tree (becoming a leaf) or it is not. At line 38, if level equals max_depth then the node is to be inserted at the bottom. The only requirement (for this case) is to generate the number, put it into a node then send the address of the node back to the caller. If the new node is NOT to be placed at the bottom of the tree (as a leaf node), then recurse and create a subtree (line 45), fetch another node with a new number (line 48), hook the subtree onto the left link (line 49), then recurse and create a right subtree for this node (line 52). 56 begin 57 count := 0; 58 value := 0; 59 create_a_subtree (1,root); 60 end. The last 5 lines make up the 'main' program. Some initialization is required before the procedure create_a_subtree is invoked. 'Count' is a variable used to count nodes, and 'value' is used to ensure that the generated numbers will always be in order (see line 24). This completes the description of the algorithm. Variations The procedure 'get a number' can be modified to cut nodes out of an existing Binary Tree. These nodes can then be placed into an M-Tree using the same algorithm as above. Using this approach, a skewed binary tree can be converted to an M-Tree effeciently, both in terms of time AND space. 1 procedure get_a_node ( j, father : node_ptr; var cut_node : node_ptr); 2 begin 3 while j^.left_link <> nil do 4 begin 5 father := j; 6 j := j^.left_link 7 end; 8 father^.left_link := j^.rl; 9 cut_node := j; 10 end; The while loop (lines 3 through 7), moves a pointer (j) to the first node (logically) in the tree. Line 8 disconnects that node from the tree, and line 9 ensures that the address is returned to the caller. Application The first question that might arise is: Why build a tree if the data is already in order? One particular application comes to mind, that of adding records to a masterfile. Normally, the masterfile will already be in order, and just a few records are to be added to the existing file (new customers, and the like). The masterfile could be read using the M-Tree algorithm, and then the new records could be inserted using a conventional Binary Tree insertion algorithm. References References to Binary Trees can be found in almost any book on Data Structures. The first authoritative publication on Data Structures was Fundamental Algorithms (Volume 1 of a 3 volume set) written by Donald Knuth (1968). The notation in Knuth's book is now dated; however, the three volume set is still widely used and referenced by many researchers in Computer Science.