Abstract
This paper defines M-Trees, describes their characteristics, shows the
relationship between M-Trees and conventional Binary Trees, then presents a
Pascal program for constructing an M-Tree.  The algorithm in this paper can
be used to build, efficiently, an M-balanced Binary Tree given ordered input.

Trees
A tree, as referenced in this paper, is one of the fundamental abstractions in
Computer Science.  When drawing a tree, one usually abandons natures time
honored tradition and places the root of the tree at the top, with the leaves
at the bottom.  The term 'node' is used to designate an item in the tree that
contains information.  A node is a record that contains an information portion
and two or more link portions.  Formally, a tree is defined to be: 0 or more
nodes, such that, the link portions of the nodes point to roots of sub-trees.
In the purest sense, a given node in a tree may have 0 or more sub-trees.  A
tree is usually referred to as an n-ary tree, indicating that there may be as
many as N subtrees coming off of a given node (where N >= 2).

binary trees

Most definitions of trees refer to binary trees (little b, little t). They
include a definition of a node that is usually quite similar to the
following Pascal description:

  type
    node_ptr = ^ node;
  type
    node =
      record
        left_link : node_ptr;
        information_portion : any_type;
        right_link : node_ptr
      end

Note that the definition is directly recursive.  In fact, the link fields
point to other nodes (which are themselves roots of subtrees).  Also note,
that there are two (2) link fields, therefore, there will be (at most) two (2)
subtrees coming off of a given node.  This type of tree is also known as a
bifurcating arborescence (bifurcate means to divide into halves, and an
arborescence is a tree-like structure).

Binary Trees

This paper differentiates between binary trees (lower case b and lower case t)
and Binary Trees (upper case B and upper case T).  A Binary Tree has the same
structure as a binary tree, but the definition incorporates an additional rule.
For any given node (x), all the the information portions of all of the nodes in
the left-subtree of x are less than the information portion of x, and all the
the information portions of all of the nodes in the right-subtree of x are
greater than the information portion of x.  (Nodes whose information portion
are equal to the information portion of X can be on the left OR on the right,
just as long as the rule is consistently applied.  For this paper,  nodes,
whose information portions are equal, will not be considered.)

Height of trees

The height of a tree is defined to be the number of links from the root of the
tree to the bottom (i.e. a leaf node) of the tree FOLLOWING THE LONGEST PATH.

Balance

A subtree is said to be balanced, within a factor of K, if, for all nodes X,
the height of the left-subtree of X differs from the height of the
right-subtree of X by, at most, K.  For AVL trees, K is unity (1). A prefectly
balanced tree is a tree with the following property: The height of the
left-subtree of X is equal to the height of the right-subtree of X, for all
nodes X.

M-Trees

An M-Tree is, first of all, a Binary Tree.  It's uniqueness is in it's
structure.  A tree is said to be M-Balanced, if, and only if, the following
conditions are true:

  1)  The left-subtree of the root is perfectly balanced.

  2)  The right-subtree of the root contains whatever nodes are left over
      after balancing the left subtree, and

  3)  The height of the right-subtree of the root is less than or equal to
      the height of the left-subtree of the root. In a typical M-Tree, the
      left-subtree of the root contains the majority of the nodes, and the
      right-subtree of the root is a 'stub of a subtree' containing just a
      few nodes.  Formally, given M nodes to be entered into the tree, and
      given N (which is the largest power of 2 <= N), N-1 nodes will be in
      the left-subtree of the root, and M-N-1 nodes will be in the
      right-subtree of the root.

The algorithm

The algorithm, recursively coded in Pascal, turned out to be surprisingly
succinct.  The reader's attention should be directed to the program listing
below.  (For non-Pascal programmers, the line numbers are not part of the
language, and are used for explanatory purposes only.)

 1   program mtrees (input,ouput);
 2     const
 3       max_depth = 26;  { for a maximum of 67,108,864 records }
 4       n = 200;         { number of items to be read in }
 5
 6     type
 7       node_ptr = ^ node;
 8       node =
 9         record
10           ll   : node_ptr;
11           info : integer;
12           rl   : node_ptr;
13         end;
14     var
15       i,count,value : integer;
16       root : node_ptr;


At lines 8 through 13, the record 'node' is seen to have a structure typical
of any binary tree - an information portion (info) and two link portions.
At line 16, the variable that points to the root of the tree is declared.

17     procedure get_a_number (var current_node : node_ptr);
18     { generate the next number { in ascending sequence }, fetch a node, and
19       put the number into the info portion of the node.}
20       var
21         p : node_ptr;
22       begin
23         count := count + 1;
24         value := value + trunc(random*5)+1;
25         new(p);
26         with p^ do
27           begin
28             info := value;
29             ll   := nil;
30             rl   := nil
31           end;
32         current_node := p;
33       end;

Lines 17 through 33 make up a procedure that generates a number (larger than
the previous number), fetches a node, and places the number into the
information portion of the node.  Also, the two links of the node are set to
nil (i.e. they point to nothing).

34     procedure create_a_subtree (level :integer; var ret : node_pointer);
35       var
36         this_node : node_pointer;
37       begin
38         if level = max_depth then
39          { this is code for the bottom level }
40            if count < n then get_a_number (ret)
41             else ret := nil
42         else
43           begin
44           { this is code for a middle (or top) level }
45             create_a_subtree (level+1, ret);
46             if (ret  <> nil) and (count < n) then
47               begin
48                 get_a_number (this_node);
49                 this_node^.left_link := ret;
50                 ret := this_node;
51                 if count < n
52                 then create_a_subtree (level+1, this_node^.right_link)
53               end
54           end
55     end;

Line numbers 34 through 54 make up the procedure create_a_subtree.  All of
the meaningful work is done in this procedure. There are essentially two
cases: either a node is to be inserted at the bottom of the tree (becoming a
leaf) or it is not.  At line 38, if level equals max_depth then the node is
to be inserted at the bottom.  The only requirement (for this case) is to
generate the number, put it into a node then send the address of the node
back to the caller.  If the new node is NOT to be placed at the bottom of
the tree (as a leaf node), then recurse and create a subtree (line 45),
fetch another node with a new number (line 48), hook the subtree onto the
left link (line 49), then recurse and create a right subtree for this node
(line 52).

56     begin
57       count := 0;
58       value := 0;
59       create_a_subtree (1,root);
60     end.

The last 5 lines make up the 'main' program.  Some initialization is required
before the procedure create_a_subtree is invoked.  'Count' is a variable used
to count nodes, and 'value' is used to ensure that the generated numbers will
always be in order (see line 24).

This completes the description of the algorithm.

Variations

The procedure 'get a number' can be modified to cut nodes out of an existing
Binary Tree.  These nodes can then be placed into an M-Tree using the same
algorithm as above.  Using this approach, a skewed binary tree can be
converted to an M-Tree effeciently, both in terms of time AND space.


 1   procedure get_a_node ( j, father : node_ptr; var cut_node : node_ptr);
 2     begin
 3       while j^.left_link <> nil do
 4         begin
 5           father := j;
 6           j := j^.left_link
 7         end;
 8       father^.left_link := j^.rl;
 9       cut_node := j;
10     end;

The while loop (lines 3 through 7), moves a pointer (j) to the first node
(logically) in the tree.  Line 8 disconnects that node from the tree, and
line 9 ensures that the address is returned to the caller.

Application

The first question that might arise is: Why build a tree if the data is
already in order?  One particular application comes to mind, that of adding
records to a masterfile.  Normally, the masterfile will already be in order,
and just a few records are to be added to the existing file (new customers,
and the like).  The masterfile could be read using the M-Tree algorithm, and
then the new records could be inserted using a conventional Binary Tree
insertion algorithm.

References

References to Binary Trees can be found in almost any book on Data Structures.
The first authoritative publication on Data Structures was Fundamental
Algorithms (Volume 1 of a 3 volume set) written by Donald Knuth (1968).
The notation in Knuth's book is now dated; however, the three volume set
is still widely used and referenced by many researchers in Computer Science.