Skip to main content

Implementing IntMap in MoonBit

· 8 min read

Key-value containers are an essential part of the standard library in modern programming languages, and their widespread use means that the performance of their basic operations is very important. Most key-value containers in functional languages are implemented based on some kind of balanced binary search tree, which performs well in lookup and insertion operations, but poorly when merging two key-value containers. Hash tables, commonly used in imperative languages, are also not good at merging operations.

IntMap is an immutable key-value container specialized for integers. It can only use integers as keys, and by sacrificing some generality, it achieves efficient merge/intersection operations. This article will start from the simplest binary trie and gradually improve it to IntMap.

Binary Trie

A binary trie is a binary tree that uses the binary representation of each key to determine its position. The binary representation of a key is a finite string of 0s and 1s. If the current bit is 0, it recurses to the left child; if the current bit is 1, it recurses to the right child.

///|
enum BinTrie[T] {
  
BinTrie[T]
Empty
(T) -> BinTrie[T]
Leaf
(

type parameter T

T
)
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch
(
BinTrie[T]
left
~ :
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
[

type parameter T

T
],
BinTrie[T]
right
~ :
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
[

type parameter T

T
])
}

To find the value corresponding to a key in a binary trie, you simply read the binary bits of the key one by one, moving left or right according to their value, until you reach a leaf node.

Here, the order of reading binary bits is from the least significant bit to the most significant bit of the integer.

fn[T] 
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
::
fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup
(
BinTrie[T]
self
:
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
[

type parameter T

T
],
UInt
key
:
UInt
UInt
) ->

type parameter T

T
? {
match
BinTrie[T]
self
{
BinTrie[T]
Empty
=>
T?
None
(T) -> BinTrie[T]
Leaf
(
T
value
) =>
(T) -> T?
Some
(
T
value
)
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch
(
BinTrie[T]
left
~,
BinTrie[T]
right
~) =>
if
UInt
key
fn Mod::mod(self : UInt, other : UInt) -> UInt

Calculates the remainder of dividing one unsigned integer by another.

Parameters:

  • self : The unsigned integer dividend.
  • other : The unsigned integer divisor.

Returns the remainder of the division operation.

Throws a panic if other is zero.

Example:

let a = 17U
let b = 5U
inspect(a % b, content="2") // 17 divided by 5 gives quotient 3 and remainder 2
inspect(7U % 4U, content="3")
%
2U
fn Eq::equal(self : UInt, other : UInt) -> Bool

Compares two unsigned 32-bit integers for equality.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand to compare with.

Returns true if both integers have the same value, false otherwise.

Example:

let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
==
0 {
BinTrie[T]
left
.
fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup
(
UInt
key
fn Div::div(self : UInt, other : UInt) -> UInt

Performs division between two unsigned 32-bit integers. The operation follows standard unsigned integer division rules, where the result is truncated towards zero.

Parameters:

  • self : The dividend (the number to be divided).
  • other : The divisor (the number to divide by).

Returns an unsigned 32-bit integer representing the quotient of the division.

Example:

let a = 42U
let b = 5U
inspect(a / b, content="8") // Using infix operator
/
2)
} else {
BinTrie[T]
right
.
fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup
(
UInt
key
fn Div::div(self : UInt, other : UInt) -> UInt

Performs division between two unsigned 32-bit integers. The operation follows standard unsigned integer division rules, where the result is truncated towards zero.

Parameters:

  • self : The dividend (the number to be divided).
  • other : The divisor (the number to divide by).

Returns an unsigned 32-bit integer representing the quotient of the division.

Example:

let a = 42U
let b = 5U
inspect(a / b, content="8") // Using infix operator
/
2)
} } }

To avoid creating too many empty trees, we don't directly call the value constructor, but instead use the branch method.

fn[T] 
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
::
fn[T] BinTrie::br(left : BinTrie[T], right : BinTrie[T]) -> BinTrie[T]
br
(
BinTrie[T]
left
:
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
[

type parameter T

T
],
BinTrie[T]
right
:
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
[

type parameter T

T
]) ->
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie
[

type parameter T

T
] {
match (
BinTrie[T]
left
,
BinTrie[T]
right
) {
(
BinTrie[T]
Empty
,
BinTrie[T]
Empty
) =>
BinTrie[T]
Empty
_ =>
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch
(
BinTrie[T]
left
~,
BinTrie[T]
right
~)
} }

Patricia Tree

The Patricia Tree stores more information than a binary trie to speed up lookups. At each fork, it retains the common prefix of all keys in the subtree (although here it's calculated from the least significant bit, we still use the term prefix) and marks the current branching bit with an unsigned integer. This greatly reduces the number of branches that need to be traversed during a lookup.

///|
enum PatriciaTree[T] {
  
PatriciaTree[T]
Empty
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
~ :
Int
Int
,
T
value
~ :

type parameter T

T
)
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
~ :
UInt
UInt
,
UInt
mask
~ :
UInt
UInt
,
PatriciaTree[T]
left
~ :
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
PatriciaTree[T]
right
~ :
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
]
) } ///| fn[T]
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
::
fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup
(
PatriciaTree[T]
self
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
Int
key
:
Int
Int
) ->

type parameter T

T
? {
match
PatriciaTree[T]
self
{
PatriciaTree[T]
Empty
=>
T?
None
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
=
Int
k
,
T
value
~) => if
Int
k
fn Eq::equal(self : Int, other : Int) -> Bool

Compares two integers for equality.

Parameters:

  • self : The first integer to compare.
  • other : The second integer to compare.

Returns true if both integers have the same value, false otherwise.

Example:

inspect(42 == 42, content="true")
inspect(42 == -42, content="false")
==
Int
key
{
(T) -> T?
Some
(
T
value
) } else {
T?
None
}
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
~,
UInt
mask
~,
PatriciaTree[T]
left
~,
PatriciaTree[T]
right
~) =>
if
Bool
!
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix
Bool
(
UInt
key
Bool
=
Int
key
Bool
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
Bool
(),
UInt
prefix
Bool
~,
UInt
mask
Bool
~)
{
T?
None
} else if
fn zero(k : UInt, mask~ : UInt) -> Bool
zero
(
Int
key
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
(),
UInt
mask
~) {
PatriciaTree[T]
left
.
fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup
(
Int
key
)
} else {
PatriciaTree[T]
right
.
fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup
(
Int
key
)
} } } ///| fn
fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix
(
UInt
key
:
UInt
UInt
,
UInt
mask
~ :
UInt
UInt
) ->
UInt
UInt
{
UInt
key
fn BitAnd::land(self : UInt, other : UInt) -> UInt

Performs a bitwise AND operation between two unsigned 32-bit integers. For each bit position, the result is 1 if the bits at that position in both operands are 1, and 0 otherwise.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns an unsigned 32-bit integer representing the result of the bitwise AND operation.

Example:

let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
&
(
UInt
mask
fn Sub::sub(self : UInt, other : UInt) -> UInt

Performs subtraction between two unsigned 32-bit integers. When the result would be negative, the function wraps around using modular arithmetic (2^32).

Parameters:

  • self : The first unsigned 32-bit integer (minuend).
  • other : The second unsigned 32-bit integer to subtract from the first (subtrahend).

Returns a new unsigned 32-bit integer representing the difference between the two numbers. If the result would be negative, it wraps around to a positive number by adding 2^32 repeatedly until the result is in range.

Example:

let a = 5U
let b = 3U
inspect(a - b, content="2")
let c = 3U
let d = 5U
inspect(c - d, content="4294967294") // wraps around to 2^32 - 2
-
1U)
} ///| fn
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix
(
UInt
key
~ :
UInt
UInt
,
UInt
prefix
~ :
UInt
UInt
,
UInt
mask
~ :
UInt
UInt
) ->
Bool
Bool
{
fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix
(
UInt
key
,
UInt
mask
~)
fn Eq::equal(self : UInt, other : UInt) -> Bool

Compares two unsigned 32-bit integers for equality.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand to compare with.

Returns true if both integers have the same value, false otherwise.

Example:

let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
==
UInt
prefix
} ///| fn
fn zero(k : UInt, mask~ : UInt) -> Bool
zero
(
UInt
k
:
UInt
UInt
,
UInt
mask
~ :
UInt
UInt
) ->
Bool
Bool
{
(
UInt
k
fn BitAnd::land(self : UInt, other : UInt) -> UInt

Performs a bitwise AND operation between two unsigned 32-bit integers. For each bit position, the result is 1 if the bits at that position in both operands are 1, and 0 otherwise.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns an unsigned 32-bit integer representing the result of the bitwise AND operation.

Example:

let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
&
UInt
mask
)
fn Eq::equal(self : UInt, other : UInt) -> Bool

Compares two unsigned 32-bit integers for equality.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand to compare with.

Returns true if both integers have the same value, false otherwise.

Example:

let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
==
0
}

Now the branch method can be further optimized to ensure that Branch nodes do not contain Empty subtrees.

///|
fn[T] 
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
::
fn[T] PatriciaTree::branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
branch
(
UInt
prefix
~ :
UInt
UInt
,
UInt
mask
~ :
UInt
UInt
,
PatriciaTree[T]
left
~ :
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
PatriciaTree[T]
right
~ :
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
) ->
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
] {
match (
PatriciaTree[T]
left
,
PatriciaTree[T]
right
) {
(
PatriciaTree[T]
Empty
,
PatriciaTree[T]
right
) =>
PatriciaTree[T]
right
(
PatriciaTree[T]
left
,
PatriciaTree[T]
Empty
) =>
PatriciaTree[T]
left
_ =>
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
~,
UInt
mask
~,
PatriciaTree[T]
left
~,
PatriciaTree[T]
right
~)
} }

Insertion and Merging

Now that the type definitions are established, the next step is to implement the insertion and merging operations. Since an insertion operation can also be viewed as merging a tree with only one leaf node into an existing tree, we will prioritize introducing the implementation of the merge operation.

We first discuss a shortcut: suppose we have two non-empty trees, t0 and t1, whose longest common prefixes are p0 and p1, respectively, and p0 and p1 do not contain each other. In this case, no matter how large t0 and t1 are, the cost of merging them is the same, because only a new Branch node needs to be created. We implement this using the helper function join.

The gen_mask function, which generates a mask, utilizes a property of two's complement representation of integers to find the lowest branching bit.

Assume the binary representation of the input x is

00100100000

Then, x.lnot() gives

11011011111

Adding one gives

11011100000

After a bitwise AND with the original x, we get:

00000100000
///|
fn[T] 
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join
(
UInt
p0
:
UInt
UInt
,
PatriciaTree[T]
t0
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
UInt
p1
:
UInt
UInt
,
PatriciaTree[T]
t1
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
) ->
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
] {
let
UInt
mask
=
fn gen_mask(p0 : UInt, p1 : UInt) -> UInt
gen_mask
(
UInt
p0
,
UInt
p1
)
if
fn zero(k : UInt, mask~ : UInt) -> Bool
zero
(
UInt
p0
,
UInt
mask
~) {
PatriciaTree::
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix
(
UInt
p0
,
UInt
mask
~),
UInt
mask
~,
PatriciaTree[T]
left
=
PatriciaTree[T]
t0
,
PatriciaTree[T]
right
=
PatriciaTree[T]
t1
)
} else { PatriciaTree::
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix
(
UInt
p0
,
UInt
mask
~),
UInt
mask
~,
PatriciaTree[T]
left
=
PatriciaTree[T]
t1
,
PatriciaTree[T]
right
=
PatriciaTree[T]
t0
)
} } ///| fn
fn gen_mask(p0 : UInt, p1 : UInt) -> UInt
gen_mask
(
UInt
p0
:
UInt
UInt
,
UInt
p1
:
UInt
UInt
) ->
UInt
UInt
{
fn
(UInt) -> UInt
lowest_bit
(
UInt
x
:
UInt
UInt
) ->
UInt
UInt
{
UInt
x
fn BitAnd::land(self : UInt, other : UInt) -> UInt

Performs a bitwise AND operation between two unsigned 32-bit integers. For each bit position, the result is 1 if the bits at that position in both operands are 1, and 0 otherwise.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns an unsigned 32-bit integer representing the result of the bitwise AND operation.

Example:

let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
&
(
UInt
x
.
fn UInt::reinterpret_as_int(self : UInt) -> Int

reinterpret the unsigned int as signed int For number within the range of 0..=2^31-1, the value is the same. For number within the range of 2^31..=2^32-1, the value is negative

reinterpret_as_int
().
fn Neg::neg(self : Int) -> Int

Performs arithmetic negation on an integer value, returning its additive inverse.

Parameters:

  • self : The integer value to negate.

Returns the negation of the input value. For all inputs except Int::min_value(), returns the value with opposite sign. When the input is Int::min_value(), returns Int::min_value() due to two's complement representation.

Example:

inspect(-42, content="-42")
inspect(42, content="42")
inspect(2147483647, content="2147483647") // negating near min value
neg
().
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
())
}
(UInt) -> UInt
lowest_bit
(
UInt
p0
fn BitXOr::lxor(self : UInt, other : UInt) -> UInt

Performs a bitwise XOR (exclusive OR) operation between two unsigned 32-bit integers. Each bit in the result is set to 1 if the corresponding bits in the operands are different, and 0 if they are the same.

Parameters:

  • self : The first unsigned 32-bit integer operand.
  • other : The second unsigned 32-bit integer operand.

Returns the result of the bitwise XOR operation.

Example:

let a = 0xFF00U // Binary: 1111_1111_0000_0000
let b = 0x0F0FU // Binary: 0000_1111_0000_1111
inspect(a ^ b, content="61455") // Binary: 1111_0000_0000_1111
^
UInt
p1
)
}

Everything is ready, and we can now start writing the insert_with function. The handling of Empty and Leaf branches is very straightforward, while for Branch, we call join when the prefixes do not contain each other, and otherwise, we recursively descend into one of the branches based on the branch bit.

///|
fn[T] 
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
::
fn[T] PatriciaTree::insert_with(self : PatriciaTree[T], k : Int, v : T, combine~ : (T, T) -> T) -> PatriciaTree[T]
insert_with
(
PatriciaTree[T]
self
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
Int
k
:
Int
Int
,
T
v
:

type parameter T

T
,
(T, T) -> T
combine
~ : (

type parameter T

T
,

type parameter T

T
) ->

type parameter T

T
,
) ->
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
] {
fn
(PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
tree
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
]) ->
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
] {
match
PatriciaTree[T]
tree
{
PatriciaTree[T]
Empty
=>
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
=
Int
k
,
T
value
=
T
v
)
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
PatriciaTree[T]
(
Int
key
PatriciaTree[T]
~,
T
value
PatriciaTree[T]
~) as tree
=>
if
Int
key
fn Eq::equal(self : Int, other : Int) -> Bool

Compares two integers for equality.

Parameters:

  • self : The first integer to compare.
  • other : The second integer to compare.

Returns true if both integers have the same value, false otherwise.

Example:

inspect(42 == 42, content="true")
inspect(42 == -42, content="false")
==
Int
k
{
PatriciaTree::
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
~,
T
value
=
(T, T) -> T
combine
(
T
v
,
T
value
))
} else {
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join
(
Int
k
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
(),
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
=
Int
k
,
T
value
=
T
v
),
Int
key
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
(),
PatriciaTree[T]
tree
,
) }
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
PatriciaTree[T]
(
UInt
prefix
PatriciaTree[T]
~,
UInt
mask
PatriciaTree[T]
~,
PatriciaTree[T]
left
PatriciaTree[T]
~,
PatriciaTree[T]
right
PatriciaTree[T]
~) as tree
=>
if
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix
(
UInt
key
=
Int
k
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
(),
UInt
prefix
~,
UInt
mask
~) {
if
fn zero(k : UInt, mask~ : UInt) -> Bool
zero
(
Int
k
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
(),
UInt
mask
~) {
PatriciaTree::
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
~,
UInt
mask
~,
PatriciaTree[T]
left
=
(PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
left
),
PatriciaTree[T]
right
~)
} else { PatriciaTree::
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
~,
UInt
mask
~,
PatriciaTree[T]
left
~,
PatriciaTree[T]
right
=
(PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
right
))
} } else {
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join
(
Int
k
.
fn Int::reinterpret_as_uint(self : Int) -> UInt

reinterpret the signed int as unsigned int, when the value is non-negative, i.e, 0..=2^31-1, the value is the same. When the value is negative, it turns into a large number, for example, -1 turns into 2^32-1

reinterpret_as_uint
(),
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
=
Int
k
,
T
value
=
T
v
),
UInt
prefix
,
PatriciaTree[T]
tree
)
} } }
(PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
self
)
}

Merge operations generally follow the same logic, with the slight difference that they also consider cases where the prefix and mask are identical.

///|
fn[T] 
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
::
fn[T] PatriciaTree::union_with(combine~ : (T, T) -> T, left : PatriciaTree[T], right : PatriciaTree[T]) -> PatriciaTree[T]
union_with
(
(T, T) -> T
combine
~ : (

type parameter T

T
,

type parameter T

T
) ->

type parameter T

T
,
PatriciaTree[T]
left
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
PatriciaTree[T]
right
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
) ->
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
] {
fn
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
left
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
],
PatriciaTree[T]
right
:
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
]) ->
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree
[

type parameter T

T
] {
match (
PatriciaTree[T]
left
,
PatriciaTree[T]
right
) {
(
PatriciaTree[T]
Empty
,
PatriciaTree[T]
t
) | (
PatriciaTree[T]
t
,
PatriciaTree[T]
Empty
) =>
PatriciaTree[T]
t
(
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
~,
T
value
~),
PatriciaTree[T]
t
) =>
PatriciaTree[T]
t
.
fn[T] PatriciaTree::insert_with(self : PatriciaTree[T], k : Int, v : T, combine~ : (T, T) -> T) -> PatriciaTree[T]
insert_with
(
Int
key
,
T
value
,
(T, T) -> T
combine
~)
(
PatriciaTree[T]
t
,
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf
(
Int
key
~,
T
value
~)) =>
PatriciaTree[T]
t
.
fn[T] PatriciaTree::insert_with(self : PatriciaTree[T], k : Int, v : T, combine~ : (T, T) -> T) -> PatriciaTree[T]
insert_with
(
Int
key
,
T
value
,
(T, T) -> T
combine
=fn(
T
x
,
T
y
) {
(T, T) -> T
combine
(
T
y
,
T
x
) })
(
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
PatriciaTree[T]
(
UInt
prefix
PatriciaTree[T]
=
UInt
p
PatriciaTree[T]
,
UInt
mask
PatriciaTree[T]
=
UInt
m
PatriciaTree[T]
,
PatriciaTree[T]
left
PatriciaTree[T]
=
PatriciaTree[T]
s0
PatriciaTree[T]
,
PatriciaTree[T]
right
PatriciaTree[T]
=
PatriciaTree[T]
s1
PatriciaTree[T]
) as s
,
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
PatriciaTree[T]
(
UInt
prefix
PatriciaTree[T]
=
UInt
q
PatriciaTree[T]
,
UInt
mask
PatriciaTree[T]
=
UInt
n
PatriciaTree[T]
,
PatriciaTree[T]
left
PatriciaTree[T]
=
PatriciaTree[T]
t0
PatriciaTree[T]
,
PatriciaTree[T]
right
PatriciaTree[T]
=
PatriciaTree[T]
t1
PatriciaTree[T]
) as t
,
) => if
UInt
m
fn Eq::equal(self : UInt, other : UInt) -> Bool

Compares two unsigned 32-bit integers for equality.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand to compare with.

Returns true if both integers have the same value, false otherwise.

Example:

let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
==
UInt
n
(Bool, Bool) -> Bool
&&
UInt
p
fn Eq::equal(self : UInt, other : UInt) -> Bool

Compares two unsigned 32-bit integers for equality.

Parameters:

  • self : The first unsigned integer operand.
  • other : The second unsigned integer operand to compare with.

Returns true if both integers have the same value, false otherwise.

Example:

let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
==
UInt
q
{
// The trees have the same prefix. Merge the subtrees PatriciaTree::
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
UInt
p
,
UInt
mask
=
UInt
m
,
PatriciaTree[T]
left
=
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
s0
,
PatriciaTree[T]
t0
),
PatriciaTree[T]
right
=
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
s1
,
PatriciaTree[T]
t1
),
) } else if
UInt
m
fn Compare::op_lt(x : UInt, y : UInt) -> Bool
<
UInt
n
(Bool, Bool) -> Bool
&&
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix
(
UInt
key
=
UInt
q
,
UInt
prefix
=
UInt
p
,
UInt
mask
=
UInt
m
) {
// q contains p. Merge t with a subtree of s if
fn zero(k : UInt, mask~ : UInt) -> Bool
zero
(
UInt
q
,
UInt
mask
=
UInt
m
) {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
UInt
p
,
UInt
mask
=
UInt
m
,
PatriciaTree[T]
left
=
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
s0
,
PatriciaTree[T]
t
),
PatriciaTree[T]
right
=
PatriciaTree[T]
s1
)
} else {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
UInt
p
,
UInt
mask
=
UInt
m
,
PatriciaTree[T]
left
=
PatriciaTree[T]
s0
,
PatriciaTree[T]
right
=
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
s1
,
PatriciaTree[T]
t
))
} } else if
UInt
m
fn Compare::op_gt(x : UInt, y : UInt) -> Bool
>
UInt
n
(Bool, Bool) -> Bool
&&
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix
(
UInt
key
=
UInt
p
,
UInt
prefix
=
UInt
q
,
UInt
mask
=
UInt
n
) {
// p contains q. Merge s with a subtree of t. if
fn zero(k : UInt, mask~ : UInt) -> Bool
zero
(
UInt
p
,
UInt
mask
=
UInt
n
) {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
UInt
q
,
UInt
mask
=
UInt
n
,
PatriciaTree[T]
left
=
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
s
,
PatriciaTree[T]
t0
),
PatriciaTree[T]
right
=
PatriciaTree[T]
t1
)
} else {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch
(
UInt
prefix
=
UInt
q
,
UInt
mask
=
UInt
n
,
PatriciaTree[T]
left
=
PatriciaTree[T]
t0
,
PatriciaTree[T]
right
=
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
s
,
PatriciaTree[T]
t1
))
} } else {
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join
(
UInt
p
,
PatriciaTree[T]
s
,
UInt
q
,
PatriciaTree[T]
t
)
} } }
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go
(
PatriciaTree[T]
left
,
PatriciaTree[T]
right
)
}

Big-endian Patricia Tree

The Big-endian Patricia Tree changes the order of calculating branching bits from the most significant bit to the least significant bit, building upon the Little-endian Patricia Tree.

What are the benefits of doing this?

  • Better locality. In a Big-endian Patricia Tree, integer keys of similar size are placed close to each other.

  • Facilitates efficient sequential traversal of keys, simply by implementing a standard pre-order/post-order traversal.

  • Merging is often faster. In practice, integer keys in an intmap are usually contiguous. In this case, a Big-endian Patricia Tree will have longer common prefixes, making merge operations faster.

  • In a Big-endian Patricia Tree, if keys are treated as unsigned integers, every key in the right subtree is greater than the key of the current node (conversely, the left subtree contains smaller keys). When writing a lookup function, you only need to use unsigned integer comparison to determine which branch to follow next. On most machines, this can be done with a single instruction, which is low-cost.

Since the final version of the IntMap implementation is not significantly different from the Little Endian Patricia Tree described earlier, it will not be elaborated on here. Readers who are interested can refer to the implementation in this repository: https://github.com/moonbit-community/intmap

A Bug in the Original Implementation

Although the idea behind IntMap's implementation is quite concise and clear, it is still possible to make some very subtle mistakes when writing the specific implementation code. Even the original paper's author was not immune when writing the SML implementation of IntMap, and this issue was later inherited by OCaml's Ptset/Ptmap modules. It wasn't until the paper QuickChecking Patricia Trees, published in 2018, that this problem was discovered.

Specifically, because SML and OCaml languages do not provide unsigned integer types, the masks in the IntMap type were stored as int in the implementations of these two languages. However, when comparing masks in the union_with function, they all forgot that unsigned integer comparison should be used.