Implementing IntMap in MoonBit

Key-value containers are an essential part of the standard library in modern programming languages, and their widespread use means that the performance of their basic operations is very important. Most key-value containers in functional languages are implemented based on some kind of balanced binary search tree, which performs well in lookup and insertion operations, but poorly when merging two key-value containers. Hash tables, commonly used in imperative languages, are also not good at merging operations.
IntMap is an immutable key-value container specialized for integers. It can only use integers as keys, and by sacrificing some generality, it achieves efficient merge/intersection operations. This article will start from the simplest binary trie and gradually improve it to IntMap.
Binary Trie
A binary trie is a binary tree that uses the binary representation of each key to determine its position. The binary representation of a key is a finite string of 0s and 1s. If the current bit is 0, it recurses to the left child; if the current bit is 1, it recurses to the right child.
///|
enum BinTrie[T] {
BinTrie[T]
Empty
(T) -> BinTrie[T]
Leaf(type parameter T
T)
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch(BinTrie[T]
left~ : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T], BinTrie[T]
right~ : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T])
}
To find the value corresponding to a key in a binary trie, you simply read the binary bits of the key one by one, moving left or right according to their value, until you reach a leaf node.
Here, the order of reading binary bits is from the least significant bit to the most significant bit of the integer.
fn[T] enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie::fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup(BinTrie[T]
self : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T], UInt
key : UInt
UInt) -> type parameter T
T? {
match BinTrie[T]
self {
BinTrie[T]
Empty => T?
None
(T) -> BinTrie[T]
Leaf(T
value) => (T) -> T?
Some(T
value)
(left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch(BinTrie[T]
left~, BinTrie[T]
right~) =>
if UInt
key fn Mod::mod(self : UInt, other : UInt) -> UInt
Calculates the remainder of dividing one unsigned integer by another.
Parameters:
self : The unsigned integer dividend.
other : The unsigned integer divisor.
Returns the remainder of the division operation.
Throws a panic if other is zero.
Example:
let a = 17U
let b = 5U
inspect(a % b, content="2") // 17 divided by 5 gives quotient 3 and remainder 2
inspect(7U % 4U, content="3")
% 2U fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== 0 {
BinTrie[T]
left.fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup(UInt
key fn Div::div(self : UInt, other : UInt) -> UInt
Performs division between two unsigned 32-bit integers. The operation follows
standard unsigned integer division rules, where the result is truncated
towards zero.
Parameters:
self : The dividend (the number to be divided).
other : The divisor (the number to divide by).
Returns an unsigned 32-bit integer representing the quotient of the division.
Example:
let a = 42U
let b = 5U
inspect(a / b, content="8") // Using infix operator
/ 2)
} else {
BinTrie[T]
right.fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T?
lookup(UInt
key fn Div::div(self : UInt, other : UInt) -> UInt
Performs division between two unsigned 32-bit integers. The operation follows
standard unsigned integer division rules, where the result is truncated
towards zero.
Parameters:
self : The dividend (the number to be divided).
other : The divisor (the number to divide by).
Returns an unsigned 32-bit integer representing the quotient of the division.
Example:
let a = 42U
let b = 5U
inspect(a / b, content="8") // Using infix operator
/ 2)
}
}
}
To avoid creating too many empty trees, we don't directly call the value constructor, but instead use the branch method.
fn[T] enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie::fn[T] BinTrie::br(left : BinTrie[T], right : BinTrie[T]) -> BinTrie[T]
br(BinTrie[T]
left : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T], BinTrie[T]
right : enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T]) -> enum BinTrie[T] {
Empty
Leaf(T)
Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}
BinTrie[type parameter T
T] {
match (BinTrie[T]
left, BinTrie[T]
right) {
(BinTrie[T]
Empty, BinTrie[T]
Empty) => BinTrie[T]
Empty
_ => (left~ : BinTrie[T], right~ : BinTrie[T]) -> BinTrie[T]
Branch(BinTrie[T]
left~, BinTrie[T]
right~)
}
}
Patricia Tree
The Patricia Tree stores more information than a binary trie to speed up lookups. At each fork, it retains the common prefix of all keys in the subtree (although here it's calculated from the least significant bit, we still use the term prefix) and marks the current branching bit with an unsigned integer. This greatly reduces the number of branches that need to be traversed during a lookup.
///|
enum PatriciaTree[T] {
PatriciaTree[T]
Empty
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key~ : Int
Int, T
value~ : type parameter T
T)
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(
UInt
prefix~ : UInt
UInt,
UInt
mask~ : UInt
UInt,
PatriciaTree[T]
left~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
PatriciaTree[T]
right~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T]
)
}
///|
fn[T] enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree::fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup(PatriciaTree[T]
self : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T], Int
key : Int
Int) -> type parameter T
T? {
match PatriciaTree[T]
self {
PatriciaTree[T]
Empty => T?
None
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key=Int
k, T
value~) => if Int
k fn Eq::equal(self : Int, other : Int) -> Bool
Compares two integers for equality.
Parameters:
self : The first integer to compare.
other : The second integer to compare.
Returns true if both integers have the same value, false otherwise.
Example:
inspect(42 == 42, content="true")
inspect(42 == -42, content="false")
== Int
key { (T) -> T?
Some(T
value) } else { T?
None }
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix~, UInt
mask~, PatriciaTree[T]
left~, PatriciaTree[T]
right~) =>
if Bool
!fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefixBool
(UInt
keyBool
=Int
keyBool
.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uintBool
(), UInt
prefixBool
~, UInt
maskBool
~) {
T?
None
} else if fn zero(k : UInt, mask~ : UInt) -> Bool
zero(Int
key.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(), UInt
mask~) {
PatriciaTree[T]
left.fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup(Int
key)
} else {
PatriciaTree[T]
right.fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T?
lookup(Int
key)
}
}
}
///|
fn fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix(UInt
key : UInt
UInt, UInt
mask~ : UInt
UInt) -> UInt
UInt {
UInt
key fn BitAnd::land(self : UInt, other : UInt) -> UInt
Performs a bitwise AND operation between two unsigned 32-bit integers. For
each bit position, the result is 1 if the bits at that position in both
operands are 1, and 0 otherwise.
Parameters:
self : The first unsigned 32-bit integer operand.
other : The second unsigned 32-bit integer operand.
Returns an unsigned 32-bit integer representing the result of the bitwise AND
operation.
Example:
let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
& (UInt
mask fn Sub::sub(self : UInt, other : UInt) -> UInt
Performs subtraction between two unsigned 32-bit integers. When the result
would be negative, the function wraps around using modular arithmetic (2^32).
Parameters:
self : The first unsigned 32-bit integer (minuend).
other : The second unsigned 32-bit integer to subtract from the first
(subtrahend).
Returns a new unsigned 32-bit integer representing the difference between the
two numbers. If the result would be negative, it wraps around to a positive
number by adding 2^32 repeatedly until the result is in range.
Example:
let a = 5U
let b = 3U
inspect(a - b, content="2")
let c = 3U
let d = 5U
inspect(c - d, content="4294967294") // wraps around to 2^32 - 2
- 1U)
}
///|
fn fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix(UInt
key~ : UInt
UInt, UInt
prefix~ : UInt
UInt, UInt
mask~ : UInt
UInt) -> Bool
Bool {
fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix(UInt
key, UInt
mask~) fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== UInt
prefix
}
///|
fn fn zero(k : UInt, mask~ : UInt) -> Bool
zero(UInt
k : UInt
UInt, UInt
mask~ : UInt
UInt) -> Bool
Bool {
(UInt
k fn BitAnd::land(self : UInt, other : UInt) -> UInt
Performs a bitwise AND operation between two unsigned 32-bit integers. For
each bit position, the result is 1 if the bits at that position in both
operands are 1, and 0 otherwise.
Parameters:
self : The first unsigned 32-bit integer operand.
other : The second unsigned 32-bit integer operand.
Returns an unsigned 32-bit integer representing the result of the bitwise AND
operation.
Example:
let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
& UInt
mask) fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== 0
}
Now the branch method can be further optimized to ensure that Branch nodes do not contain Empty subtrees.
///|
fn[T] enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree::fn[T] PatriciaTree::branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
branch(
UInt
prefix~ : UInt
UInt,
UInt
mask~ : UInt
UInt,
PatriciaTree[T]
left~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
PatriciaTree[T]
right~ : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
match (PatriciaTree[T]
left, PatriciaTree[T]
right) {
(PatriciaTree[T]
Empty, PatriciaTree[T]
right) => PatriciaTree[T]
right
(PatriciaTree[T]
left, PatriciaTree[T]
Empty) => PatriciaTree[T]
left
_ => (prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix~, UInt
mask~, PatriciaTree[T]
left~, PatriciaTree[T]
right~)
}
}
Insertion and Merging
Now that the type definitions are established, the next step is to implement the insertion and merging operations. Since an insertion operation can also be viewed as merging a tree with only one leaf node into an existing tree, we will prioritize introducing the implementation of the merge operation.
We first discuss a shortcut: suppose we have two non-empty trees, t0 and t1, whose longest common prefixes are p0 and p1, respectively, and p0 and p1 do not contain each other. In this case, no matter how large t0 and t1 are, the cost of merging them is the same, because only a new Branch node needs to be created. We implement this using the helper function join.
The gen_mask function, which generates a mask, utilizes a property of two's complement representation of integers to find the lowest branching bit.
Assume the binary representation of the input x is
00100100000
Then, x.lnot() gives
11011011111
Adding one gives
11011100000
After a bitwise AND with the original x, we get:
00000100000
///|
fn[T] fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join(
UInt
p0 : UInt
UInt,
PatriciaTree[T]
t0 : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
UInt
p1 : UInt
UInt,
PatriciaTree[T]
t1 : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
let UInt
mask = fn gen_mask(p0 : UInt, p1 : UInt) -> UInt
gen_mask(UInt
p0, UInt
p1)
if fn zero(k : UInt, mask~ : UInt) -> Bool
zero(UInt
p0, UInt
mask~) {
PatriciaTree::(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix=fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix(UInt
p0, UInt
mask~), UInt
mask~, PatriciaTree[T]
left=PatriciaTree[T]
t0, PatriciaTree[T]
right=PatriciaTree[T]
t1)
} else {
PatriciaTree::(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix=fn get_prefix(key : UInt, mask~ : UInt) -> UInt
get_prefix(UInt
p0, UInt
mask~), UInt
mask~, PatriciaTree[T]
left=PatriciaTree[T]
t1, PatriciaTree[T]
right=PatriciaTree[T]
t0)
}
}
///|
fn fn gen_mask(p0 : UInt, p1 : UInt) -> UInt
gen_mask(UInt
p0 : UInt
UInt, UInt
p1 : UInt
UInt) -> UInt
UInt {
fn (UInt) -> UInt
lowest_bit(UInt
x : UInt
UInt) -> UInt
UInt {
UInt
x fn BitAnd::land(self : UInt, other : UInt) -> UInt
Performs a bitwise AND operation between two unsigned 32-bit integers. For
each bit position, the result is 1 if the bits at that position in both
operands are 1, and 0 otherwise.
Parameters:
self : The first unsigned 32-bit integer operand.
other : The second unsigned 32-bit integer operand.
Returns an unsigned 32-bit integer representing the result of the bitwise AND
operation.
Example:
let a = 0xF0F0U // 1111_0000_1111_0000
let b = 0xFF00U // 1111_1111_0000_0000
inspect(a & b, content="61440") // 1111_0000_0000_0000 = 61440
& (UInt
x.fn UInt::reinterpret_as_int(self : UInt) -> Int
reinterpret the unsigned int as signed int
For number within the range of 0..=2^31-1,
the value is the same. For number within the range of 2^31..=2^32-1,
the value is negative
reinterpret_as_int().fn Neg::neg(self : Int) -> Int
Performs arithmetic negation on an integer value, returning its additive
inverse.
Parameters:
self : The integer value to negate.
Returns the negation of the input value. For all inputs except
Int::min_value(), returns the value with opposite sign. When the input is
Int::min_value(), returns Int::min_value() due to two's complement
representation.
Example:
inspect(-42, content="-42")
inspect(42, content="42")
inspect(2147483647, content="2147483647") // negating near min value
neg().fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint())
}
(UInt) -> UInt
lowest_bit(UInt
p0 fn BitXOr::lxor(self : UInt, other : UInt) -> UInt
Performs a bitwise XOR (exclusive OR) operation between two unsigned 32-bit
integers. Each bit in the result is set to 1 if the corresponding bits in the
operands are different, and 0 if they are the same.
Parameters:
self : The first unsigned 32-bit integer operand.
other : The second unsigned 32-bit integer operand.
Returns the result of the bitwise XOR operation.
Example:
let a = 0xFF00U // Binary: 1111_1111_0000_0000
let b = 0x0F0FU // Binary: 0000_1111_0000_1111
inspect(a ^ b, content="61455") // Binary: 1111_0000_0000_1111
^ UInt
p1)
}
Everything is ready, and we can now start writing the insert_with function. The handling of Empty and Leaf branches is very straightforward, while for Branch, we call join when the prefixes do not contain each other, and otherwise, we recursively descend into one of the branches based on the branch bit.
///|
fn[T] enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree::fn[T] PatriciaTree::insert_with(self : PatriciaTree[T], k : Int, v : T, combine~ : (T, T) -> T) -> PatriciaTree[T]
insert_with(
PatriciaTree[T]
self : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
Int
k : Int
Int,
T
v : type parameter T
T,
(T, T) -> T
combine~ : (type parameter T
T, type parameter T
T) -> type parameter T
T,
) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
fn (PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
tree : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T]) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
match PatriciaTree[T]
tree {
PatriciaTree[T]
Empty => (key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key=Int
k, T
value=T
v)
(key~ : Int, value~ : T) -> PatriciaTree[T]
LeafPatriciaTree[T]
(Int
keyPatriciaTree[T]
~, T
valuePatriciaTree[T]
~) as tree =>
if Int
key fn Eq::equal(self : Int, other : Int) -> Bool
Compares two integers for equality.
Parameters:
self : The first integer to compare.
other : The second integer to compare.
Returns true if both integers have the same value, false otherwise.
Example:
inspect(42 == 42, content="true")
inspect(42 == -42, content="false")
== Int
k {
PatriciaTree::(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key~, T
value=(T, T) -> T
combine(T
v, T
value))
} else {
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join(
Int
k.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(),
(key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key=Int
k, T
value=T
v),
Int
key.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(),
PatriciaTree[T]
tree,
)
}
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
BranchPatriciaTree[T]
(UInt
prefixPatriciaTree[T]
~, UInt
maskPatriciaTree[T]
~, PatriciaTree[T]
leftPatriciaTree[T]
~, PatriciaTree[T]
rightPatriciaTree[T]
~) as tree =>
if fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix(UInt
key=Int
k.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(), UInt
prefix~, UInt
mask~) {
if fn zero(k : UInt, mask~ : UInt) -> Bool
zero(Int
k.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(), UInt
mask~) {
PatriciaTree::(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix~, UInt
mask~, PatriciaTree[T]
left=(PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
left), PatriciaTree[T]
right~)
} else {
PatriciaTree::(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix~, UInt
mask~, PatriciaTree[T]
left~, PatriciaTree[T]
right=(PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
right))
}
} else {
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join(Int
k.fn Int::reinterpret_as_uint(self : Int) -> UInt
reinterpret the signed int as unsigned int, when the value is
non-negative, i.e, 0..=2^31-1, the value is the same. When the
value is negative, it turns into a large number,
for example, -1 turns into 2^32-1
reinterpret_as_uint(), (key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key=Int
k, T
value=T
v), UInt
prefix, PatriciaTree[T]
tree)
}
}
}
(PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
self)
}
Merge operations generally follow the same logic, with the slight difference that they also consider cases where the prefix and mask are identical.
///|
fn[T] enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree::fn[T] PatriciaTree::union_with(combine~ : (T, T) -> T, left : PatriciaTree[T], right : PatriciaTree[T]) -> PatriciaTree[T]
union_with(
(T, T) -> T
combine~ : (type parameter T
T, type parameter T
T) -> type parameter T
T,
PatriciaTree[T]
left : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
PatriciaTree[T]
right : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T],
) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
fn (PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
left : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T], PatriciaTree[T]
right : enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T]) -> enum PatriciaTree[T] {
Empty
Leaf(key~ : Int, value~ : T)
Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}
PatriciaTree[type parameter T
T] {
match (PatriciaTree[T]
left, PatriciaTree[T]
right) {
(PatriciaTree[T]
Empty, PatriciaTree[T]
t) | (PatriciaTree[T]
t, PatriciaTree[T]
Empty) => PatriciaTree[T]
t
((key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key~, T
value~), PatriciaTree[T]
t) => PatriciaTree[T]
t.fn[T] PatriciaTree::insert_with(self : PatriciaTree[T], k : Int, v : T, combine~ : (T, T) -> T) -> PatriciaTree[T]
insert_with(Int
key, T
value, (T, T) -> T
combine~)
(PatriciaTree[T]
t, (key~ : Int, value~ : T) -> PatriciaTree[T]
Leaf(Int
key~, T
value~)) =>
PatriciaTree[T]
t.fn[T] PatriciaTree::insert_with(self : PatriciaTree[T], k : Int, v : T, combine~ : (T, T) -> T) -> PatriciaTree[T]
insert_with(Int
key, T
value, (T, T) -> T
combine=fn(T
x, T
y) { (T, T) -> T
combine(T
y, T
x) })
(
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
BranchPatriciaTree[T]
(UInt
prefixPatriciaTree[T]
=UInt
pPatriciaTree[T]
, UInt
maskPatriciaTree[T]
=UInt
mPatriciaTree[T]
, PatriciaTree[T]
leftPatriciaTree[T]
=PatriciaTree[T]
s0PatriciaTree[T]
, PatriciaTree[T]
rightPatriciaTree[T]
=PatriciaTree[T]
s1PatriciaTree[T]
) as s,
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
BranchPatriciaTree[T]
(UInt
prefixPatriciaTree[T]
=UInt
qPatriciaTree[T]
, UInt
maskPatriciaTree[T]
=UInt
nPatriciaTree[T]
, PatriciaTree[T]
leftPatriciaTree[T]
=PatriciaTree[T]
t0PatriciaTree[T]
, PatriciaTree[T]
rightPatriciaTree[T]
=PatriciaTree[T]
t1PatriciaTree[T]
) as t,
) =>
if UInt
m fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== UInt
n (Bool, Bool) -> Bool
&& UInt
p fn Eq::equal(self : UInt, other : UInt) -> Bool
Compares two unsigned 32-bit integers for equality.
Parameters:
self : The first unsigned integer operand.
other : The second unsigned integer operand to compare with.
Returns true if both integers have the same value, false otherwise.
Example:
let a = 42U
let b = 42U
let c = 24U
inspect(a == b, content="true")
inspect(a == c, content="false")
== UInt
q {
// The trees have the same prefix. Merge the subtrees
PatriciaTree::(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(
UInt
prefix=UInt
p,
UInt
mask=UInt
m,
PatriciaTree[T]
left=(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
s0, PatriciaTree[T]
t0),
PatriciaTree[T]
right=(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
s1, PatriciaTree[T]
t1),
)
} else if UInt
m fn Compare::op_lt(x : UInt, y : UInt) -> Bool
< UInt
n (Bool, Bool) -> Bool
&& fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix(UInt
key=UInt
q, UInt
prefix=UInt
p, UInt
mask=UInt
m) {
// q contains p. Merge t with a subtree of s
if fn zero(k : UInt, mask~ : UInt) -> Bool
zero(UInt
q, UInt
mask=UInt
m) {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix=UInt
p, UInt
mask=UInt
m, PatriciaTree[T]
left=(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
s0, PatriciaTree[T]
t), PatriciaTree[T]
right=PatriciaTree[T]
s1)
} else {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix=UInt
p, UInt
mask=UInt
m, PatriciaTree[T]
left=PatriciaTree[T]
s0, PatriciaTree[T]
right=(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
s1, PatriciaTree[T]
t))
}
} else if UInt
m fn Compare::op_gt(x : UInt, y : UInt) -> Bool
> UInt
n (Bool, Bool) -> Bool
&& fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool
match_prefix(UInt
key=UInt
p, UInt
prefix=UInt
q, UInt
mask=UInt
n) {
// p contains q. Merge s with a subtree of t.
if fn zero(k : UInt, mask~ : UInt) -> Bool
zero(UInt
p, UInt
mask=UInt
n) {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix=UInt
q, UInt
mask=UInt
n, PatriciaTree[T]
left=(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
s, PatriciaTree[T]
t0), PatriciaTree[T]
right=PatriciaTree[T]
t1)
} else {
(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T]) -> PatriciaTree[T]
Branch(UInt
prefix=UInt
q, UInt
mask=UInt
n, PatriciaTree[T]
left=PatriciaTree[T]
t0, PatriciaTree[T]
right=(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
s, PatriciaTree[T]
t1))
}
} else {
fn[T] join(p0 : UInt, t0 : PatriciaTree[T], p1 : UInt, t1 : PatriciaTree[T]) -> PatriciaTree[T]
join(UInt
p, PatriciaTree[T]
s, UInt
q, PatriciaTree[T]
t)
}
}
}
(PatriciaTree[T], PatriciaTree[T]) -> PatriciaTree[T]
go(PatriciaTree[T]
left, PatriciaTree[T]
right)
}
Big-endian Patricia Tree
The Big-endian Patricia Tree changes the order of calculating branching bits from the most significant bit to the least significant bit, building upon the Little-endian Patricia Tree.
What are the benefits of doing this?
-
Better locality. In a Big-endian Patricia Tree, integer keys of similar size are placed close to each other.
-
Facilitates efficient sequential traversal of keys, simply by implementing a standard pre-order/post-order traversal.
-
Merging is often faster. In practice, integer keys in an intmap are usually contiguous. In this case, a Big-endian Patricia Tree will have longer common prefixes, making merge operations faster.
-
In a Big-endian Patricia Tree, if keys are treated as unsigned integers, every key in the right subtree is greater than the key of the current node (conversely, the left subtree contains smaller keys). When writing a lookup function, you only need to use unsigned integer comparison to determine which branch to follow next. On most machines, this can be done with a single instruction, which is low-cost.
Since the final version of the IntMap implementation is not significantly different from the Little Endian Patricia Tree described earlier, it will not be elaborated on here. Readers who are interested can refer to the implementation in this repository: https://github.com/moonbit-community/intmap
A Bug in the Original Implementation
Although the idea behind IntMap's implementation is quite concise and clear, it is still possible to make some very subtle mistakes when writing the specific implementation code. Even the original paper's author was not immune when writing the SML implementation of IntMap, and this issue was later inherited by OCaml's Ptset/Ptmap modules. It wasn't until the paper QuickChecking Patricia Trees, published in 2018, that this problem was discovered.
Specifically, because SML and OCaml languages do not provide unsigned integer types, the masks in the IntMap type were stored as int in the implementations of these two languages. However, when comparing masks in the union_with function, they all forgot that unsigned integer comparison should be used.