Implementing IntMap in MoonBit

· 8 min read

Key-value containers are an essential part of the standard library in modern programming languages, and their widespread use means that the performance of their basic operations is very important. Most key-value containers in functional languages are implemented based on some kind of balanced binary search tree, which performs well in lookup and insertion operations, but poorly when merging two key-value containers. Hash tables, commonly used in imperative languages, are also not good at merging operations.

IntMap is an immutable key-value container specialized for integer keys. By giving up generality in the key type, it gains efficient merge and intersection operations. This article starts from the simplest binary trie and refines it step by step into IntMap.

Binary Trie

A binary trie is a binary tree that uses the binary representation of each key to determine its position. The binary representation of a key is a finite string of 0s and 1s. If the current bit is 0, it recurses to the left child; if the current bit is 1, it recurses to the right child.

///|
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}

To find the value corresponding to a key in a binary trie, you simply read the binary bits of the key one by one, moving left or right according to their value, until you reach a leaf node.

Here, the order of reading binary bits is from the least significant bit to the most significant bit of the integer.

fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T? {
  match self {
    Empty => None
    Leaf(value) => Some(value)
    Branch(left~, right~) =>
      if key % 2U == 0 {
        left.lookup(key / 2)
      } else {
        right.lookup(key / 2)
      }
  }
}
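
As a quick sanity check, the sketch below builds a small trie by hand and looks up a few keys. The helper name `example_trie` and the stored values 10 and 20 are assumptions made for illustration; the type and `lookup` are repeated only so the block stands alone.

```moonbit
///|
enum BinTrie[T] {
  Empty
  Leaf(T)
  Branch(left~ : BinTrie[T], right~ : BinTrie[T])
}

///|
fn[T] BinTrie::lookup(self : BinTrie[T], key : UInt) -> T? {
  match self {
    Empty => None
    Leaf(value) => Some(value)
    Branch(left~, right~) =>
      if key % 2U == 0U {
        left.lookup(key / 2U)
      } else {
        right.lookup(key / 2U)
      }
  }
}

///|
/// Hypothetical hand-built trie: key 1 (binary 01) maps to 10,
/// key 2 (binary 10) maps to 20.
fn example_trie() -> BinTrie[Int] {
  Branch(
    // lowest bit is 0: key 2 lives here, and its next bit is 1
    left=Branch(left=Empty, right=Leaf(20)),
    // lowest bit is 1: key 1 lives here, and its next bit is 0
    right=Branch(left=Leaf(10), right=Empty),
  )
}

///|
test "bits are consumed from least significant to most significant" {
  let trie = example_trie()
  inspect(trie.lookup(1U), content="Some(10)")
  inspect(trie.lookup(2U), content="Some(20)")
  inspect(trie.lookup(0U), content="None")
  inspect(trie.lookup(3U), content="None")
}
```

Note that a `Leaf` answers as soon as it is reached, without checking any remaining bits of the key, so this simple trie relies on every stored key sitting at a depth that uses up all of its significant bits.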

To avoid creating too many empty trees, we do not call the Branch constructor directly. Instead we go through the br helper, which collapses a branch with two empty children into Empty.

fn[T] BinTrie::br(left : BinTrie[T], right : BinTrie[T]) -> BinTrie[T] {
  match (left, right) {
    (Empty, Empty) => Empty
    _ => Branch(left~, right~)
  }
}

Patricia Tree

The Patricia tree stores more information than a binary trie to speed up lookups. At each fork, it keeps the common prefix of all keys in the subtree (we still call it a prefix even though it is taken from the least significant bit upward) and records the current branching bit as an unsigned integer mask. This greatly reduces the number of branches traversed during a lookup.

///|
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}

///|
fn[T] PatriciaTree::lookup(self : PatriciaTree[T], key : Int) -> T? {
  match self {
    Empty => None
    Leaf(key=k, value~) => if k == key { Some(value) } else { None }
    Branch(prefix~, mask~, left~, right~) =>
      if !match_prefix(key=key.reinterpret_as_uint(), prefix~, mask~) {
        None
      } else if zero(key.reinterpret_as_uint(), mask~) {
        left.lookup(key)
      } else {
        right.lookup(key)
      }
  }
}

///|
fn get_prefix(key : UInt, mask~ : UInt) -> UInt {
  key & (mask - 1U)
}

///|
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool {
  get_prefix(key, mask~) == prefix
}

///|
fn zero(k : UInt, mask~ : UInt) -> Bool {
  (k & mask) == 0
}
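
To make the three helpers concrete, here is a small worked example on 4-bit keys. It is only a sketch: the helper bodies are repeated from above so the block is self-contained, and the chosen keys (11, 15, 9) and the mask 4 are illustrative values, not taken from the article.

```moonbit
///|
fn get_prefix(key : UInt, mask~ : UInt) -> UInt {
  key & (mask - 1U)
}

///|
fn match_prefix(key~ : UInt, prefix~ : UInt, mask~ : UInt) -> Bool {
  get_prefix(key, mask~) == prefix
}

///|
fn zero(k : UInt, mask~ : UInt) -> Bool {
  (k & mask) == 0U
}

///|
test "helpers on key 0b1011 with branching bit 0b0100" {
  let key = 11U // 0b1011
  let mask = 4U // 0b0100, the branching bit
  // The prefix is the part of the key below the branching bit: 0b11.
  inspect(get_prefix(key, mask~), content="3")
  inspect(match_prefix(key~, prefix=3U, mask~), content="true")
  // Bit 0b0100 of the key is 0, so a lookup would continue to the left.
  inspect(zero(key, mask~), content="true")
  // Key 0b1111 shares the prefix but has the branching bit set: go right.
  inspect(zero(15U, mask~), content="false")
  // Key 0b1001 has low bits 0b01, so it cannot be in this subtree at all.
  inspect(match_prefix(key=9U, prefix=3U, mask~), content="false")
}
```

The last assertion is the case that lets `lookup` return `None` at a `Branch` without descending any further.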

Now the branch method can be further optimized to ensure that Branch nodes do not contain Empty subtrees.

///|
fn[T] PatriciaTree::branch(
  prefix~ : UInt,
  mask~ : UInt,
  left~ : PatriciaTree[T],
  right~ : PatriciaTree[T],
) -> PatriciaTree[T] {
  match (left, right) {
    (Empty, right) => right
    (left, Empty) => left
    _ => Branch(prefix~, mask~, left~, right~)
  }
}

Insertion and Merging

Now that the type definitions are established, the next step is to implement the insertion and merging operations. Since an insertion operation can also be viewed as merging a tree with only one leaf node into an existing tree, we will prioritize introducing the implementation of the merge operation.

We first discuss a shortcut. Suppose we have two non-empty trees, t0 and t1, whose longest common prefixes are p0 and p1 respectively, and neither prefix contains the other. In this case the cost of merging them is constant no matter how large t0 and t1 are, because only one new Branch node needs to be created. We implement this with the helper function join.

The gen_mask function, which generates a mask, utilizes a property of two's complement representation of integers to find the lowest branching bit.

Assume the binary representation of the input x is

00100100000

Then, x.lnot() gives

11011011111

Adding one gives

11011100000

After a bitwise AND with the original x, we get:

00000100000
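
The same steps can be checked in code. This is a minimal sketch: the function name `lowest_set_bit` is an assumption made for illustration, and the input is the 11-bit example above written in hexadecimal (0x120).

```moonbit
///|
/// Isolates the lowest set bit of `x` using the two's complement identity
/// x & (not x + 1). `lowest_set_bit` is a hypothetical name for this sketch.
fn lowest_set_bit(x : UInt) -> UInt {
  let flipped = x.lnot() // every bit inverted
  let negated = flipped + 1U // two's complement negation of x
  x & negated
}

///|
test "lowest set bit of 0b00100100000" {
  // 0x120 = 0b001_0010_0000 = 288
  inspect(lowest_set_bit(0x120U), content="32") // 0b000_0010_0000
  inspect(lowest_set_bit(12U), content="4") // 0b1100 -> 0b0100
  inspect(lowest_set_bit(1U), content="1")
}
```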

///|
fn[T] join(
  p0 : UInt,
  t0 : PatriciaTree[T],
  p1 : UInt,
  t1 : PatriciaTree[T],
) -> PatriciaTree[T] {
  let mask = gen_mask(p0, p1)
  if zero(p0, mask~) {
    PatriciaTree::Branch(prefix=get_prefix(p0, mask~), mask~, left=t0, right=t1)
  } else {
    PatriciaTree::Branch(prefix=get_prefix(p0, mask~), mask~, left=t1, right=t0)
  }
}

///|
fn gen_mask(p0 : UInt, p1 : UInt) -> UInt {
  fn lowest_bit(x : UInt) -> UInt {
    x & (x.reinterpret_as_int().neg().reinterpret_as_uint())
  }

  lowest_bit(p0 ^ p1)
}
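
The following sketch joins two single-leaf trees to show what `join` produces. It repeats the definitions it needs so it can be compiled on its own. One deliberate difference from the code above: `gen_mask` computes the negation with wrap-around subtraction (`0U - diff`) instead of the reinterpret/neg chain, which should give the same result for 32-bit values. The keys 5 and 7 and their values are illustrative.

```moonbit
///|
enum PatriciaTree[T] {
  Empty
  Leaf(key~ : Int, value~ : T)
  Branch(prefix~ : UInt, mask~ : UInt, left~ : PatriciaTree[T], right~ : PatriciaTree[T])
}

///|
fn get_prefix(key : UInt, mask~ : UInt) -> UInt {
  key & (mask - 1U)
}

///|
fn zero(k : UInt, mask~ : UInt) -> Bool {
  (k & mask) == 0U
}

///|
fn gen_mask(p0 : UInt, p1 : UInt) -> UInt {
  let diff = p0 ^ p1
  // 0U - diff wraps around to the two's complement negation of diff.
  diff & (0U - diff)
}

///|
fn[T] join(
  p0 : UInt,
  t0 : PatriciaTree[T],
  p1 : UInt,
  t1 : PatriciaTree[T],
) -> PatriciaTree[T] {
  let mask = gen_mask(p0, p1)
  if zero(p0, mask~) {
    PatriciaTree::Branch(prefix=get_prefix(p0, mask~), mask~, left=t0, right=t1)
  } else {
    PatriciaTree::Branch(prefix=get_prefix(p0, mask~), mask~, left=t1, right=t0)
  }
}

///|
test "joining the leaves for keys 5 and 7" {
  // 5 = 0b101 and 7 = 0b111 first differ (counting from the low end) at 0b010.
  inspect(gen_mask(5U, 7U), content="2")
  let tree = join(5U, Leaf(key=5, value=50), 7U, Leaf(key=7, value=70))
  match tree {
    Branch(prefix~, mask~, left=Leaf(key=l, value=_), right=Leaf(key=r, value=_)) => {
      inspect(prefix, content="1") // shared low bit 0b1
      inspect(mask, content="2")
      inspect(l, content="5") // bit 0b010 of 5 is 0, so it goes left
      inspect(r, content="7")
    }
    _ => fail("expected a Branch with two leaves")
  }
}
```

However large the two input trees are, `join` only allocates the one `Branch` node, which is why this case is cheap.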

Everything is ready, and we can now start writing the insert_with function. The handling of Empty and Leaf branches is very straightforward, while for Branch, we call join when the prefixes do not contain each other, and otherwise, we recursively descend into one of the branches based on the branch bit.

///|
fn[T] PatriciaTree::insert_with(
  self : PatriciaTree[T],
  k : Int,
  v : T,
  combine~ : (T, T) -> T,
) -> PatriciaTree[T] {
  fn go(tree : PatriciaTree[T]) -> PatriciaTree[T] {
    match tree {
      Empty => Leaf(key=k, value=v)
      Leaf(key~, value~) as tree =>
        if key == k {
          PatriciaTree::Leaf(key~, value=combine(v, value))
        } else {
          join(
            k.reinterpret_as_uint(),
            Leaf(key=k, value=v),
            key.reinterpret_as_uint(),
            tree,
          )
        }
      Branch(prefix~, mask~, left~, right~) as tree =>
        if match_prefix(key=k.reinterpret_as_uint(), prefix~, mask~) {
          if zero(k.reinterpret_as_uint(), mask~) {
            PatriciaTree::Branch(prefix~, mask~, left=go(left), right~)
          } else {
            PatriciaTree::Branch(prefix~, mask~, left~, right=go(right))
          }
        } else {
          join(k.reinterpret_as_uint(), Leaf(key=k, value=v), prefix, tree)
        }
    }
  }

  go(self)
}

Merge operations generally follow the same logic, with the slight difference that they also consider cases where the prefix and mask are identical.

///|
fn[T] PatriciaTree::union_with(
  combine~ : (T, T) -> T,
  left : PatriciaTree[T],
  right : PatriciaTree[T],
) -> PatriciaTree[T] {
  fn go(left : PatriciaTree[T], right : PatriciaTree[T]) -> PatriciaTree[T] {
    match (left, right) {
      (Empty, t) | (t, Empty) => t
      (Leaf(key~, value~), t) => t.insert_with(key, value, combine~)
      (t, Leaf(key~, value~)) =>
        t.insert_with(key, value, combine=fn(x, y) { combine(y, x) })
      (
        Branch(prefix=p, mask=m, left=s0, right=s1) as s,
        Branch(prefix=q, mask=n, left=t0, right=t1) as t,
      ) =>
        if m == n && p == q {
          // The trees have the same prefix. Merge the subtrees
          PatriciaTree::Branch(
            prefix=p,
            mask=m,
            left=go(s0, t0),
            right=go(s1, t1),
          )
        } else if m < n && match_prefix(key=q, prefix=p, mask=m) {
          // q contains p. Merge t with a subtree of s
          if zero(q, mask=m) {
            Branch(prefix=p, mask=m, left=go(s0, t), right=s1)
          } else {
            Branch(prefix=p, mask=m, left=s0, right=go(s1, t))
          }
        } else if m > n && match_prefix(key=p, prefix=q, mask=n) {
          // p contains q. Merge s with a subtree of t.
          if zero(p, mask=n) {
            Branch(prefix=q, mask=n, left=go(s, t0), right=t1)
          } else {
            Branch(prefix=q, mask=n, left=t0, right=go(s, t1))
          }
        } else {
          join(p, s, q, t)
        }
    }
  }

  go(left, right)
}

Big-endian Patricia Tree

The Big-endian Patricia Tree builds on the Little-endian Patricia Tree, but examines branching bits starting from the most significant bit and working down to the least significant bit.

What are the benefits of doing this?

  • Better locality. In a Big-endian Patricia Tree, integer keys of similar size are placed close to each other.

  • Facilitates efficient sequential traversal of keys, simply by implementing a standard pre-order/post-order traversal.

  • Merging is often faster. In practice, integer keys in an intmap are usually contiguous. In this case, a Big-endian Patricia Tree will have longer common prefixes, making merge operations faster.

  • In a Big-endian Patricia Tree, if keys are treated as unsigned integers, every key in the right subtree is greater than the key of the current node (conversely, the left subtree contains smaller keys). When writing a lookup function, you only need to use unsigned integer comparison to determine which branch to follow next. On most machines, this can be done with a single instruction, which is low-cost.

Since the final version of the IntMap implementation is not significantly different from the Little-endian Patricia Tree described earlier, it is not elaborated on here. Interested readers can refer to the implementation in this repository: https://github.com/moonbit-community/intmap
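One piece that does change is how the branching bit is found: in the big-endian variant it is the highest bit at which two keys differ. The following is a minimal sketch of that computation, assuming 32-bit UInt keys. The names highest_bit and branching_bit_be are hypothetical and are not taken from the repository above.

```moonbit
///|
/// Hypothetical helper: isolate the most significant set bit of `x`.
/// Returns 0 when `x` is 0.
fn highest_bit(x : UInt) -> UInt {
  // Smear the highest set bit into every lower position...
  let mut y = x
  y = y | (y >> 1)
  y = y | (y >> 2)
  y = y | (y >> 4)
  y = y | (y >> 8)
  y = y | (y >> 16)
  // ...then keep only the top bit of the smeared value.
  y ^ (y >> 1)
}

///|
/// Hypothetical helper: big-endian branching bit, i.e. the highest bit
/// at which the two keys differ.
fn branching_bit_be(p0 : UInt, p1 : UInt) -> UInt {
  highest_bit(p0 ^ p1)
}

test "branching_bit_be" {
  // 0b1010 and 0b1100 differ in bits 0b0110; the highest of those is 0b0100.
  inspect(branching_bit_be(0b1010, 0b1100), content="4")
  inspect(highest_bit(0), content="0")
}
```

Because the mask is now a high bit, keys that share a long run of leading bits end up in the same subtree, which is what the locality argument above relies on.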

A Bug in the Original Implementation

Although the idea behind IntMap's implementation is quite concise and clear, it is still possible to make some very subtle mistakes when writing the specific implementation code. Even the original paper's author was not immune when writing the SML implementation of IntMap, and this issue was later inherited by OCaml's Ptset/Ptmap modules. It wasn't until the paper QuickChecking Patricia Trees, published in 2018, that this problem was discovered.

Specifically, because SML and OCaml do not provide unsigned integer types, both implementations stored the masks in the IntMap type as int. However, when comparing masks in the union_with function, both overlooked that the comparison must be unsigned: a mask whose most significant bit is set is negative when read as a signed int, so a signed comparison orders it incorrectly.
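The effect can be reproduced in a few lines. The sketch below assumes 32-bit Int values and the core library's Int::reinterpret_as_uint; it only illustrates the comparison problem and is not the code from the SML or OCaml implementations.

```moonbit
test "mask comparison: signed vs unsigned" {
  // Mask for the most significant bit of a 32-bit word (bit pattern 0x80000000).
  let high : Int = 1 << 31
  // Mask for the least significant bit.
  let low : Int = 1
  // A signed comparison sees `high` as a negative number, so it looks smaller.
  inspect(high > low, content="false")
  // Reinterpreting the same bits as unsigned restores the intended order.
  inspect(
    high.reinterpret_as_uint() > low.reinterpret_as_uint(),
    content="true",
  )
}
```

Storing masks as UInt from the start, as the MoonBit code in this article does, avoids the problem entirely.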

Implementing the Shunting Yard Algorithm in MoonBit

· 12 min read

What is the Shunting Yard Algorithm?

In the implementation of programming languages or interpreters, how to handle mathematical expressions has always been a classic problem. We want to be able to understand "infix expressions" (like 3 + 4 * 2) just like humans do, and correctly consider operator precedence and parentheses.

In 1961, Edsger Dijkstra proposed the famous Shunting Yard algorithm, which provides a mechanical way to convert infix expressions to postfix expressions (RPN) or abstract syntax trees (AST). The algorithm's name comes from railway marshalling yards: train cars are sorted by shunting between tracks, and in expression processing, we use two stacks to store and manage operands and operators. Imagine the process of calculating 3 + 4 * 2 in your head:

  1. You know that multiplication has higher precedence, so you need to calculate 4 * 2 first.
  2. During this process, you temporarily "remember" the preceding 3 and +.
  3. Once the multiplication result is available, you add it to 3.

Dijkstra's insight is that this process of "temporarily remembering something and coming back to it later" can be simulated with stacks. Just as a marshalling yard parks train cars on sidings and shunts them out when needed, the algorithm controls the order of operations by moving numbers and operators between stacks:

  • Train cars are sorted by moving between tracks;
  • Operators and numbers in mathematical expressions can also be correctly sorted and calculated by moving between stacks.

Dijkstra abstracted our scattered, chaotic human calculation process into a clear, mechanical workflow, allowing computers to process expressions using the same logic.

Basic Flow of the Shunting Yard Algorithm

The Shunting Yard algorithm ensures that expressions are parsed with correct precedence and associativity by maintaining two stacks:

  1. Initialization

    Create two empty stacks:

    • Operator stack (op_stack), used to temporarily store unprocessed operators and parentheses;
    • Value stack (val_stack), used to store operands and partially constructed sub-expressions.
  2. Scan input tokens one by one

    • If token is a number or variable: Push directly into val_stack.

    • If token is an operator:

      1. Check the top element of op_stack.
      2. If the top element is an operator (not a left parenthesis), and either its precedence is higher than that of the current operator or the two have equal precedence and the top operator is left-associative, pop the top operator, combine it with two operands from val_stack to form a new sub-expression, and push the result back into val_stack.
      3. Repeat this process until the condition is no longer met, then push the current operator into op_stack.
    • If token is a left parenthesis: Push into op_stack as a delimiter marker.

    • If token is a right parenthesis: Continuously pop operators from op_stack and combine them with operands from the top of val_stack to form sub-expressions, until a left parenthesis is encountered; the left parenthesis itself is discarded and does not enter val_stack.

  3. Clear the operator stack

    After all tokens have been scanned, if there are still operators in op_stack, pop them one by one and combine them with operands from val_stack to form larger expressions, until the operator stack is empty.

  4. End condition

    Finally, val_stack should contain only one element, which is the complete abstract syntax tree or postfix expression. If the number of elements in the stack is not one, or there are unmatched parentheses, it indicates that the input expression contains errors.
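The flow above builds sub-expressions on a value stack. The same steps can also emit postfix (RPN) output directly, by appending operands and popped operators to an output list instead. The following is a minimal sketch under stated assumptions: tokens are already split into strings, the helper names op_info and to_rpn are hypothetical, and malformed input such as unmatched parentheses is not detected.

```moonbit
///|
/// Hypothetical helper: (precedence, is_left_associative) for an operator
/// token, or None when the token is not an operator.
fn op_info(tok : String) -> (Int, Bool)? {
  match tok {
    "+" | "-" => Some((10, true))
    "*" | "/" => Some((20, true))
    "^" => Some((30, false))
    _ => None
  }
}

///|
/// Convert pre-split infix tokens to postfix (RPN) order.
fn to_rpn(tokens : Array[String]) -> Array[String] {
  let output : Array[String] = []
  let op_stack : Array[String] = []
  for tok in tokens {
    if tok == "(" {
      op_stack.push(tok)
    } else if tok == ")" {
      // Pop operators until the matching "(" is found; the "(" is discarded.
      while op_stack.pop() is Some(top) {
        if top == "(" {
          break
        }
        output.push(top)
      }
    } else if op_info(tok) is Some((prec, _)) {
      // Pop every operator that must be applied before the incoming one.
      while op_stack.last() is Some(top) {
        match op_info(top) {
          Some((top_prec, top_left)) =>
            if top_prec > prec || (top_prec == prec && top_left) {
              op_stack.pop() |> ignore
              output.push(top)
            } else {
              break
            }
          // A "(" on top of the stack stops the popping.
          None => break
        }
      }
      op_stack.push(tok)
    } else {
      // Anything else is treated as an operand.
      output.push(tok)
    }
  }
  // Flush the remaining operators.
  while op_stack.pop() is Some(top) {
    output.push(top)
  }
  output
}

test "to_rpn" {
  assert_eq(to_rpn(["3", "+", "4", "*", "2"]), ["3", "4", "2", "*", "+"])
  assert_eq(to_rpn(["2", "^", "3", "^", "2"]), ["2", "3", "2", "^", "^"])
}
```

The only difference from the AST-building version is what happens when an operator is popped: here it is appended to the output, whereas the AST version combines it with two operands from the value stack.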

Example Walkthrough

Let's use the parsing of (1 + 2) * (3 - 4) ^ 2 as an example to demonstrate how the two stacks change during the token reading process, helping us better understand the Shunting Yard algorithm:

| Step | Token Read | Operator Stack (op_stack) | Value Stack (val_stack) | Description |
|------|------------|---------------------------|-------------------------|-------------|
| 1 | ( | [(] | [] | Left parenthesis pushed into operator stack |
| 2 | 1 | [(] | [1] | Number pushed into value stack |
| 3 | + | [(, +] | [1] | Operator pushed into operator stack |
| 4 | 2 | [(, +] | [1, 2] | Number pushed into value stack |
| 5 | ) | [] | [1 + 2] | Pop until left parenthesis: 1 and 2 combined into 1 + 2 |
| 6 | * | [*] | [1 + 2] | Operator pushed into operator stack |
| 7 | ( | [*, (] | [1 + 2] | Left parenthesis pushed into operator stack |
| 8 | 3 | [*, (] | [1 + 2, 3] | Number pushed into value stack |
| 9 | - | [*, (, -] | [1 + 2, 3] | Operator pushed into operator stack |
| 10 | 4 | [*, (, -] | [1 + 2, 3, 4] | Number pushed into value stack |
| 11 | ) | [*] | [1 + 2, 3 - 4] | Pop until left parenthesis: 3 and 4 combined into 3 - 4 |
| 12 | ^ | [*, ^] | [1 + 2, 3 - 4] | Power operator pushed into operator stack (higher precedence than *, so no pop) |
| 13 | 2 | [*, ^] | [1 + 2, 3 - 4, 2] | Number pushed into value stack |
| 14 | End of input | [] | [(1 + 2) * (3 - 4) ^ 2] | Clear operator stack: first pop ^, combine 3 - 4 with 2; then pop *, combine 1 + 2 with the result |

In this example, there are several noteworthy points:

  • Parentheses are processed first: In the first group of parentheses (1 + 2), the operator + waits in the operator stack until the right parenthesis is encountered, and is then combined with 1 and 2. The second group (3 - 4) is processed in exactly the same way.

  • Precedence: When * is encountered, it is pushed into the operator stack. When the power operator ^ arrives later, it has higher precedence than *, so it is pushed directly without popping *.

  • Associativity: The power operator ^ is typically defined as right-associative, meaning that a ^ b ^ c is parsed as a ^ (b ^ c). This example contains only one ^, so associativity does not affect the result here; it would matter for an input such as 2 ^ 3 ^ 2.

  • Final result: After the input ends, the operator stack is cleared in order, forming the complete expression:

(1 + 2) * ((3 - 4) ^ 2)

Implementing the Shunting Yard Algorithm in MoonBit

First, we need to define the types for expressions and tokens:

enum Expr {
  Literal(Int)
  BinExpr(String, Expr, Expr)
} derive(Show)

enum Token {
  Literal(Int)
  Op(String)
  LeftParen
  RightParen
} derive(Show)

We can leverage MoonBit's regular expression matching syntax to quickly implement a simple tokenizer:

pub fn tokenize(input : StringView) -> Array[Token] raise {
  let tokens = []
  for str = input {
    lexmatch str {
      "[0-9]+" as n, rest => {
        tokens.push(Token::Literal(@strconv.parse_int(n)))
        continue rest
      }
      "[\-+*/^]" as o, rest => {
        tokens.push(Token::Op(o.to_string()))
        continue rest
      }
      "\(", rest => {
        tokens.push(Token::LeftParen)
        continue rest
      }
      "\)", rest => {
        tokens.push(Token::RightParen)
        continue rest
      }
      "[ \n\r\t]+", rest => continue rest
      "$", _ => break
      _ => fail("Invalid input")
    }
  }
  tokens
}

The tokenize function splits the input string into a series of tokens:

  • Matches numbers [0-9]+ and converts them to Token::Literal;
  • Matches arithmetic and power operators [-+*/^] and converts them to Token::Op;
  • Matches parentheses ( and ) and converts them to LeftParen and RightParen respectively;
  • Skips whitespace characters like spaces and newlines;
  • Reports an error if it encounters characters that don't match any rule.

Thanks to lexmatch and regular expressions, the entire tokenization process is both concise and efficient.

Next, we define a global operator table to store operator precedence and associativity:

priv enum Associativity {
  Left
  Right
}

priv struct OpInfo {
  precedence : Int
  associativity : Associativity
}

let op_table : Map[String, OpInfo] = {
  "+": { precedence: 10, associativity: Left },
  "-": { precedence: 10, associativity: Left },
  "*": { precedence: 20, associativity: Left },
  "/": { precedence: 20, associativity: Left },
  "^": { precedence: 30, associativity: Right },
}

Here, we define the precedence and associativity of common operators through op_table:

  • + and - have the lowest precedence (10) and are left-associative;
  • * and / have higher precedence (20) and are also left-associative;
  • ^ (power operation) has the highest precedence (30) and is right-associative.

Next, we define a helper function to determine whether we need to process (pop) the top operator when encountering a new operator:

fn should_pop(top_op_info~ : OpInfo, incoming_op_info~ : OpInfo) -> Bool {
  top_op_info.precedence > incoming_op_info.precedence ||
  (
    top_op_info.precedence == incoming_op_info.precedence &&
    top_op_info.associativity is Left
  )
}

The logic of should_pop is at the core of the Shunting Yard algorithm:

  • If the precedence of the top operator is higher than the new operator, we should process the top operator first;
  • If they have equal precedence and the top operator is left-associative, we should also process the top operator first;
  • Otherwise, keep the top operator and push the new operator directly into the stack.
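The three rules can be checked against the operator table. The test below is a small usage sketch that assumes it lives in the same package as the op_table and should_pop definitions from this article; the test name is made up for illustration.

```moonbit
test "should_pop rules" {
  // `*` on the stack, `+` incoming: the top has higher precedence, so pop.
  inspect(
    should_pop(top_op_info=op_table["*"], incoming_op_info=op_table["+"]),
    content="true",
  )
  // `-` on the stack, `-` incoming: equal precedence and left-associative, so pop.
  inspect(
    should_pop(top_op_info=op_table["-"], incoming_op_info=op_table["-"]),
    content="true",
  )
  // `^` on the stack, `^` incoming: equal precedence but right-associative, so keep.
  inspect(
    should_pop(top_op_info=op_table["^"], incoming_op_info=op_table["^"]),
    content="false",
  )
  // `+` on the stack, `*` incoming: the top has lower precedence, so keep.
  inspect(
    should_pop(top_op_info=op_table["+"], incoming_op_info=op_table["*"]),
    content="false",
  )
}
```

The second and third cases are what make 10 - 3 - 2 group to the left and 2 ^ 3 ^ 2 group to the right.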

Next, we implement the expression parsing function:

pub fn parse_expr(tokens : Array[Token]) -> Expr {
  let op_stack : Array[String] = []
  let val_stack : Array[Expr] = []
  fn push_binary_expr(top_op) {
    let right = val_stack.pop().unwrap()
    let left = val_stack.pop().unwrap()
    val_stack.push(Expr::BinExpr(top_op, left, right))
  }

  for token in tokens {
    match token {
      Literal(n) => val_stack.push(Expr::Literal(n))
      Op(incoming_op) => {
        let incoming_op_info = op_table[incoming_op]
        while true {
          match op_stack.last() {
            None => break
            Some(top_op) =>
              if top_op != "(" &&
                should_pop(top_op_info=op_table[top_op], incoming_op_info~) {
                op_stack.pop() |> ignore
                push_binary_expr(top_op)
              } else {
                break
              }
          }
        }
        op_stack.push(incoming_op)
      }
      LeftParen => op_stack.push("(")
      RightParen =>
        while op_stack.pop() is Some(top_op) {
          if top_op != "(" {
            push_binary_expr(top_op)
          } else {
            break
          }
        }
    }
  }
  while op_stack.pop() is Some(top_op) {
    push_binary_expr(top_op)
  }
  val_stack.pop().unwrap()
}

parse_expr is the core implementation of the entire Shunting Yard algorithm:

  1. Data structure preparation

    • op_stack stores operators and parentheses;
    • val_stack stores operands or partially constructed sub-expressions;
    • The internal function push_binary_expr encapsulates a small step: pop two operands from the value stack, combine them with an operator, generate a new BinExpr node, and push it back into the value stack.
  2. Iterate through tokens

    • Numbers: Push directly into val_stack.
    • Operators: Repeatedly check the top operator in op_stack. While should_pop reports that it must be applied first (higher precedence, or equal precedence and left-associative), pop it and construct a sub-expression; once the condition no longer holds, push the new operator into the stack.
    • Left parenthesis: Push into op_stack to separate sub-expressions.
    • Right parenthesis: Continuously pop operators and combine them with operands from the value stack to form sub-expressions, until a matching left parenthesis is encountered.
  3. Clear the operator stack

    After iteration is complete, there may still be operators remaining in op_stack, which need to be popped one by one and combined with operands from the value stack until the operator stack is empty.

  4. Return result

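    Finally, the value stack should contain only one element, which is the complete abstract syntax tree. If this is not the case, the input expression contains syntax errors. Note that the code above does not check this explicitly: it pops the top element, and unwrap panics when an operand is missing.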

Finally, we can define a simple eval function for testing:

pub fn eval(expr : Expr) -> Int {
  match expr {
    Literal(n) => n
    BinExpr(op, left, right) =>
      match op {
        "+" => eval(left) + eval(right)
        "-" => eval(left) - eval(right)
        "*" => eval(left) * eval(right)
        "/" => eval(left) / eval(right)
        "^" => {
          fn pow(base : Int, exp : Int) -> Int {
            if exp == 0 {
              1
            } else {
              base * pow(base, exp - 1)
            }
          }

          pow(eval(left), eval(right))
        }
        _ => abort("Invalid operator")
      }
  }
}

///|
pub fn parse_and_eval(input : String) -> Int raise {
  eval(parse_expr(tokenize(input)))
}

And verify our implementation with some simple test cases:

test "parse_and_eval" {
  inspect(parse_and_eval("1 + 2 * 3"), content="7")
  inspect(parse_and_eval("2 ^ 3 ^ 2"), content="512")
  inspect(parse_and_eval("(2 ^ 3) ^ 2"), content="64")
  inspect(parse_and_eval("(1 + 2) * 3"), content="9")
  inspect(parse_and_eval("10 - (3 + 2)"), content="5")
  inspect(parse_and_eval("2 * (3 + 4)"), content="14")
  inspect(parse_and_eval("(5 + 3) / 2"), content="4")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("10 / 2 - 1"),
String
content
="4")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("1 + 2 + 3"),
String
content
="6")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("10 - 5 - 2"),
String
content
="3")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("5"),
String
content
="5")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("(1 + 2) * (3 + 4)"),
String
content
="21")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("2 ^ (1 + 2)"),
String
content
="8")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("1 + 2 * 3 - 4 / 2 + 5"),
String
content
="10")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("((1 + 2) * 3) ^ 2 - 10"),
String
content
="71")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("100 / (2 * 5) + 3 * (4 - 1)"),
String
content
="19")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("2 ^ 2 * 3 + 1"),
String
content
="13")
fn inspect(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
fn parse_and_eval(input : String) -> Int raise
parse_and_eval
("1 + 2 * 3 ^ 2 - 4 / 2"),
String
content
="17")
}

Summary

The core idea of the Shunting Yard algorithm lies in using two stacks to explicitly manage the computation process:

  • The value stack (val_stack) stores numbers and partially combined sub-expressions;
  • The operator stack (op_stack) stores unprocessed operators and parentheses.

By defining operator precedence and associativity, and continuously comparing and popping top operators during token scanning, the Shunting Yard algorithm ensures that expressions are combined into abstract syntax trees (AST) in the correct order. Finally, when all tokens have been read and the operator stack is cleared, what remains in the value stack is the complete expression tree.

This method intuitively simulates our manual calculation approach: first "remember" content that cannot be calculated immediately, then retrieve and process it when conditions are appropriate. Its process is clear and implementation is concise, making it very suitable as a starting point for learning expression parsing.

MoonBit Pearl previously published an article introducing Pratt parsing. Both are classic answers to the question of how to parse operator precedence and associativity correctly, but they take very different approaches. Shunting Yard uses loops and explicit data structures: it manages unprocessed symbols and partial sub-expressions through the operator and value stacks, so the whole process resembles manipulating two stacks by hand and is easy to trace. A Pratt parser, by contrast, is based on recursive descent. Each token defines how it is parsed in different contexts, and parsing progress lives on the language runtime's call stack: each recursive call effectively pushes unfinished state onto that stack, and combining resumes when the call returns.

In other words, a Pratt parser hides the "stack" inside its recursive calls, while Shunting Yard makes this state management explicit, simulating it directly with loops and data structures. Shunting Yard can therefore be seen as a transcription of the mechanisms implicit in a Pratt parser's call stack into explicit stack operations. The former proceeds mechanically and suits quick implementations of a fixed operator set; the latter is more flexible and feels more natural when handling prefix, postfix, or custom operators.
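To make the contrast concrete, here is a minimal sketch of the Pratt-style approach. It is not the code from the earlier Pratt article: the Tok type, prec, and pratt_eval are hypothetical names introduced only for this illustration, the input is assumed to be already tokenized, and only the four left-associative binary operators are handled. The point is that the pending left operand and operator wait in a stack frame during the recursive call, where Shunting Yard would keep them on its two explicit stacks.

```moonbit
///| Hypothetical token type for this sketch; the article's tokenizer may differ.
enum Tok {
  Num(Int)
  Op(Char)
}

///| Binding power of a binary operator; a larger value binds tighter.
fn prec(op : Char) -> Int {
  match op {
    '+' | '-' => 1
    '*' | '/' => 2
    _ => 0
  }
}

///| Evaluates the expression starting at `pos`, consuming only operators
/// whose precedence is at least `min_prec`. Returns the value and the index
/// of the first token that was not consumed. Start with `min_prec = 1`.
fn pratt_eval(tokens : Array[Tok], pos : Int, min_prec : Int) -> (Int, Int) {
  let mut lhs = match tokens[pos] {
    Num(n) => n
    Op(_) => abort("expected a number")
  }
  let mut i = pos + 1
  while i < tokens.length() {
    match tokens[i] {
      Op(op) if prec(op) >= min_prec => {
        // While the recursive call runs, `lhs` and `op` wait in this stack
        // frame: the call stack stands in for the two explicit stacks.
        // Passing `prec(op) + 1` makes the operator left-associative.
        let (rhs, next) = pratt_eval(tokens, i + 1, prec(op) + 1)
        lhs = match op {
          '+' => lhs + rhs
          '-' => lhs - rhs
          '*' => lhs * rhs
          _ => lhs / rhs
        }
        i = next
      }
      _ => break
    }
  }
  (lhs, i)
}

test "pratt sketch" {
  let tokens : Array[Tok] = [Num(1), Op('+'), Num(2), Op('*'), Num(3)]
  inspect(pratt_eval(tokens, 0, 1).0, content="7")
}
```

Parentheses, the right-associative ^ operator, and error reporting are left out to keep the sketch short.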

Building Secure WebAssembly Tools with MoonBit and Wassette

· 8 min read

Welcome to the world of MoonBit and Wassette! This tutorial will guide you step-by-step in building a secure tool based on the WebAssembly Component Model. Through a practical weather query application example, you will learn how to leverage MoonBit's efficiency and Wassette's security features to create powerful AI tools.

Introduction to Wassette and MCP

MCP (Model Context Protocol) is a protocol for AI models to interact with external tools. When an AI needs to perform a specific task (such as network access or data query), it calls the corresponding tool through MCP. This mechanism extends the capabilities of AI but also brings security challenges.

Wassette is a runtime developed by Microsoft based on the WebAssembly Component Model, providing a secure environment for AI systems to execute external tools. It solves potential security risks through sandbox isolation and precise permission control.

Wassette allows tools to run in an isolated environment, with permissions strictly limited by a policy file and interfaces clearly defined by WIT (WebAssembly Interface Type). WIT interfaces are also used to generate data formats for tool interaction.

Overall Process

Before we start, let's understand the overall process:

Let's start this journey!

Step 1: Install Necessary Tools

First, we need to install four tools (we assume the MoonBit toolchain is already installed):

  • wasm-tools: A WebAssembly toolset for processing and manipulating Wasm files
  • wit-deps: A WebAssembly Interface Type dependency manager
  • wit-bindgen: A WebAssembly Interface Type binding generator for generating language bindings
  • wassette: A runtime based on the Wasm Component Model for executing our tools

Among them, wasm-tools, wit-deps, and wit-bindgen can be installed via cargo (requires Rust to be installed):

cargo install wasm-tools
cargo install wit-deps-cli
cargo install wit-bindgen-cli

Or download from GitHub Releases:

Step 2: Define the Interface

Interface definition is the core of the entire workflow. We use the WebAssembly Interface Type (WIT) format to define the component's interface.

First, create the project directory and necessary subdirectories:

mkdir -p weather-app/wit
cd weather-app

Create deps.toml

Create a deps.toml file in the wit directory to define project dependencies:

cli = "https://github.com/WebAssembly/wasi-cli/archive/refs/tags/v0.2.7.tar.gz"
http = "https://github.com/WebAssembly/wasi-http/archive/refs/tags/v0.2.7.tar.gz"

These dependencies specify the WASI (WebAssembly System Interface) components we will use:

  • cli: Provides command-line interface functionality. Not used in this example.
  • http: Provides HTTP client and server functionality. The client functionality is used in this example.

Then, run wit-deps update. This command will fetch the dependencies and expand them in the wit/deps/ directory.

Create world.wit

Next, create a world.wit file to define our component interface. WIT is a declarative interface description language designed for the WebAssembly Component Model. It allows us to define how components interact with each other without worrying about specific implementation details. For more details, you can check the Component Model manual.

package peter-jerry-ye:weather@0.1.0;

world w {
  import wasi:http/outgoing-handler@0.2.7;
  export get-weather: func(city: string) -> result<string, string>;
}

This WIT file defines:

  • A package named peter-jerry-ye:weather with version 0.1.0
  • A world named w, which is the main interface of the component
  • An import of the outgoing-handler interface of WASI HTTP, which is used to send outgoing requests
  • An exported function named get-weather that takes a city name string and returns a result (a weather information string on success, or an error message string on failure)

Step 3: Generate Code

Now that we have defined the interface, the next step is to generate the corresponding code skeleton. We use the wit-bindgen tool to generate binding code for MoonBit:

# Make sure you are in the project root directory
wit-bindgen moonbit --derive-eq --derive-show --derive-error wit

This command reads the files in the wit directory and generates the corresponding MoonBit code. The export wrappers and the stub to implement are placed in the gen directory.

Note: The current version of the generated code may contain some warnings, which will be fixed in future updates.

The generated directory structure should look like this:

.
├── ffi/
├── gen/
│   ├── ffi.mbt
│   ├── moon.pkg.json
│   ├── world
│   │   └── w
│   │       ├── moon.pkg.json
│   │       └── stub.mbt
│   └── world_w_export.mbt
├── interface/
├── moon.mod.json
├── Tutorial.md
├── wit/
└── world/

These generated files include:

  • Basic FFI (Foreign Function Interface) code (ffi/)
  • Generated import functions (world/, interface/)
  • Wrappers for exported functions (gen/)
  • The stub.mbt file to be implemented

Step 4: Modify the Generated Code

Now we need to modify the generated stub file to implement our weather query functionality. The main files to edit are gen/world/w/stub.mbt and moon.pkg.json in the same directory. Before that, let's add dependencies to facilitate implementation:

moon update
moon add moonbitlang/x

Then list the packages the stub imports in gen/world/w/moon.pkg.json:

{
  "import": [
    "peter-jerry-ye/weather/interface/wasi/http/types",
    "peter-jerry-ye/weather/interface/wasi/http/outgoingHandler",
    "peter-jerry-ye/weather/interface/wasi/io/poll",
    "peter-jerry-ye/weather/interface/wasi/io/streams",
    "peter-jerry-ye/weather/interface/wasi/io/error",
    "moonbitlang/x/encoding"
  ]
}

Let's look at the generated stub code:

// Generated by `wit-bindgen` 0.44.0.

///|
pub fn get_weather(city : String) -> Result[String, String] {
  ... // This is the part we need to implement
}

Now, we need to add the implementation code to request weather information using an HTTP client. Edit the gen/world/w/stub.mbt file as follows:

///|
pub fn get_weather(city : String) -> Result[String, String] {
  (try? get_weather_(city)).map_err(_.to_string())
}

///| Use MoonBit's error handling mechanism to simplify implementation
fn get_weather_(city : String) -> String raise {
  let request = @types.OutgoingRequest::outgoing_request(
    @types.Fields::fields(),
  )
  if request.set_authority(Some("wttr.in")) is Err(_) {
    fail("Invalid Authority")
  }
  if request.set_path_with_query(Some("/\{city}?format=3")) is Err(_) {
    fail("Invalid path with query")
  }
  if request.set_method(Get) is Err(_) {
    fail("Invalid Method")
  }
  let future_response = @outgoingHandler.handle(request, None).unwrap_or_error()
  defer future_response.drop()
  let pollable = future_response.subscribe()
  defer pollable.drop()
  pollable.block()
  let response = future_response.get().unwrap().unwrap().unwrap_or_error()
  defer response.drop()
  let body = response.consume().unwrap()
  defer body.drop()
  let stream = body.stream().unwrap()
  defer stream.drop()
  let decoder = @encoding.decoder(UTF8)
  let builder = StringBuilder::new()
  loop stream.blocking_read(1024) {
    Ok(bytes) => {
      decoder.decode_to(
        bytes.unsafe_reinterpret_as_bytes()[:],
        builder,
        stream=true,
      )
      continue stream.blocking_read(1024)
    }
    Err(Closed) => decoder.decode_to("", builder, stream=false)
    Err(LastOperationFailed(e)) => {
      defer e.drop()
      fail(e.to_debug_string())
    }
  }
  builder.to_string()
}

This code performs the following steps:

  1. Creates an HTTP request to the wttr.in weather service
  2. Sets the request path, including the city name and format parameters
  3. Sends the request and waits for the response
  4. Extracts the content from the response
  5. Decodes the content and returns the weather information string
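The wrapper shape used by get_weather (run a raising function under try?, then convert the error to a string) can be tried in isolation. The sketch below uses a hypothetical helper, half_of_even, in place of get_weather_. It makes no network calls and assumes only the core library's fail, Result::map_err, and Show::to_string, all of which appear in the code above.

```moonbit
///| Hypothetical fallible helper; it stands in for `get_weather_`.
fn half_of_even(n : Int) -> Int raise {
  if n % 2 != 0 {
    fail("\{n} is odd")
  }
  n / 2
}

///| Same wrapper shape as `get_weather`: `try?` captures a raised error as
/// `Err`, and `map_err` turns that error into its string form.
fn half_of_even_checked(n : Int) -> Result[Int, String] {
  (try? half_of_even(n)).map_err(_.to_string())
}

test "try? wrapper" {
  inspect(half_of_even_checked(10), content="Ok(5)")
  assert_true(half_of_even_checked(3) is Err(_))
}
```

The body stays in plain raising style, and only the exported boundary deals with Result, which is what the WIT signature result<string, string> requires.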

Step 5: Build the Project

Now that we have implemented the functionality, the next step is to build the project.

moon build --target wasm
wasm-tools component embed wit target/wasm/release/build/gen/gen.wasm -o core.wasm --encoding utf16
wasm-tools component new core.wasm -o weather.wasm

After a successful build, a weather.wasm file will be generated in the project root directory. This is our WebAssembly component.

You can then load it into Wassette:

wassette component load file://$(pwd)/weather.wasm

Step 6 (Optional): Configure Security Policy

Wassette strictly controls the permissions of WebAssembly components — a key part of ensuring tool security. Through fine-grained permission control, we can ensure the tool only performs expected operations.

In this example, we want it to access wttr.in, so we can grant permission using:

wassette permission grant network weather wttr.in

Step 7: Interact with AI

Finally, we can use Wassette to run our component and interact with AI. For example, in VSCode Copilot, modify .vscode/mcp.json:

{
  "servers": {
    "wassette": {
      "command": "wassette",
      "args": ["serve", "--disable-builtin-tools", "--stdio"],
      "type": "stdio"
    }
  },
  "inputs": []
}

After restarting Wassette, you can ask AI:

Using Wassette, load the component ./weather.wasm (note the use of the file:// scheme) and query the weather for Shenzhen.

The AI will call load-component and get-weather in sequence, returning:

The component has been successfully loaded. The weather in Shenzhen is: ☀️ +30°C.

Summary

At this point, we have successfully created a secure MCP tool based on the WebAssembly Component Model, which can:

  1. Define clear interfaces
  2. Utilize the efficiency of MoonBit
  3. Run in Wassette's secure sandbox
  4. Interact with AI

Wassette is currently at version 0.3.4 and still lacks some MCP concepts, such as prompts, workspaces, reverse retrieval of user instructions, and AI generation capabilities. But it demonstrates how quickly an MCP tool can be built using the Wasm Component Model.

MoonBit will continue to improve its component model capabilities, including adding asynchronous support from the upcoming WASIp3 and simplifying the development process. Stay tuned!

Let's flood a HashMap!

· 14 min read
Rynco Maekawa

This article gives a brief introduction to the structure of a hash table, demonstrates the hash flooding attack, a common attack on it, and shows how to mitigate it when implementing this data structure.

Everybody loves hashmaps.

They provide a blazing fast average O(1) access* to associate any value to any key, asking for only two things in return: an equality comparer and a hash function, nothing more. This unique property makes hashmaps often more efficient than other associative data structures like search trees. As a result, hashmaps are nowadays one of the most used data structures in programming languages.

From the humble dict in Python, to databases and distributed systems, and even JavaScript objects, they're everywhere. They power database indexing systems, enable efficient caching mechanisms, and form the backbone of web frameworks for routing requests. Modern compilers use them for symbol tables, operating systems rely on them for process management, and virtually every web application uses them to manage user state.

Whether you're building a web server, parsing JSON values, dealing with configurations, or just counting word frequencies, chances are you'll reach for a hashmap. They've become so fundamental that many developers take their $O(1)$ magic for granted -- but the 1 in $O(1)$ has got some strings* attached.

The anatomy of a hashmap

A hashmap is made of two parts: a bucket array and a hash function.

struct MyHashMap[K, V] {
  buckets : Array[Bucket[K, V]]
  hash_fn : (K) -> UInt
}

The bucket array contains a list of what we call "buckets". Each bucket stores some data we have inserted.

The hash function H associates each key with an integer. This integer is used to find an index in the bucket array to store our value. Usually, the index is derived by simply taking the integer modulo the size of the bucket array, i.e. index = H(key) % bucket_array_size. The hashmap expects the function to satisfy two important properties:

  1. The same key is always converted to the same number. i.e., if a == b, then H(a) == H(b).

    This property ensures that, once we have found a bucket to insert using a key, we can always find the same bucket where it has been inserted, using the same key.

  2. The resulting number is distributed uniformly across the space of possible results for different keys.

    This property ensures that different keys are unlikely to have the same associated integer, and in consequence, unlikely to be mapped to the same bucket in the array, allowing us to retrieve the value efficiently.
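As a minimal sketch of the index calculation above, written in Python rather than MoonBit for brevity: `toy_hash` is a hypothetical stand-in for H (a 32-bit FNV-1a over the key's UTF-8 bytes, picked only for illustration), and `bucket_index` is an assumed helper name, not part of any library.

```python
def toy_hash(key: str) -> int:
    """Hypothetical stand-in for H: 32-bit FNV-1a over the UTF-8 bytes of the key."""
    acc = 0x811C9DC5
    for b in key.encode("utf-8"):
        acc = ((acc ^ b) * 0x01000193) & 0xFFFFFFFF  # keep the value within 32 bits
    return acc


def bucket_index(key: str, bucket_array_size: int) -> int:
    """index = H(key) % bucket_array_size"""
    return toy_hash(key) % bucket_array_size


# Property 1: equal keys always map to the same bucket.
assert bucket_index("apple", 8) == bucket_index("apple", 8)

# Property 2: count how many of 8000 different keys land in each of 8 buckets.
counts = [0] * 8
for i in range(8000):
    counts[bucket_index(f"key{i}", 8)] += 1
print(counts)
```

If the hash function distributes keys well, the printed counts should all be of a similar size; a heavily skewed list would indicate that many keys share a bucket.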

Now, you may ask, what would happen if two keys map to the same bucket? This brings us to the realm of hash collisions.

Hash collisions

When two keys have the same hash value, or more broadly, when two keys map to the same bucket, a hash collision occurs.

Since a hashmap determines everything based on the hash value (or bucket index), the two keys now look the same to the hashmap itself: they should be put in the same place, yet they are still unequal and must not overwrite each other.

Hashmap designers have a couple of strategies to deal with collisions, which fall into two broad categories:

  • The chaining method puts these keys in the same bucket. Each bucket now may contain the data for a number of keys, instead of just one. When searching for a colliding key, all keys in the bucket are searched at once.

    struct ChainingBucket[K, V] {
      values : Array[(K, V)]
    }

    Java's HashMap is a popular example of this approach.

  • The open addressing method still has one key per bucket, but uses a separate strategy to choose another bucket index when keys collide. When searching for a key, buckets are searched in the order given by the strategy until it is obvious that there are no more keys that could match.

    struct OpenAddressBucket[K, V] {
      hash : Int
      key : K
      value : V
    }

    MoonBit's standard library Map is an example of this approach.
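To make the open addressing idea more concrete, here is a toy sketch in Python. It assumes linear probing (try the next slot, wrapping around) as the collision strategy, which is only one of several possible strategies and is not claimed to be what any particular library does. The class name `LinearProbingMap` and its methods are hypothetical, and the map has a fixed size with no deletion or resizing.

```python
from typing import Callable, Iterator, Optional


class LinearProbingMap:
    """Toy open-addressing map: one key per slot, fixed size, no deletion."""

    def __init__(self, n_slots: int, hash_fn: Callable[[str], int]) -> None:
        self.slots: list = [None] * n_slots
        self.hash_fn = hash_fn

    def _probe(self, key: str) -> Iterator[int]:
        """Yield slot indices in probing order: the home slot, then the following ones."""
        start = self.hash_fn(key) % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key: str, value: int) -> None:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None or slot[0] == key:
                self.slots[i] = (key, value)  # take a free slot or overwrite the same key
                return
        raise RuntimeError("map is full")

    def get(self, key: str) -> Optional[int]:
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return None  # an empty slot ends the search: the key cannot be further along
            if slot[0] == key:
                return slot[1]
        return None


# A deliberately bad hash function: every key collides on slot 0.
m = LinearProbingMap(n_slots=4, hash_fn=lambda key: 0)
m.insert("a", 1)
m.insert("b", 2)  # slot 0 is taken, so "b" moves on to slot 1
m.insert("a", 3)  # same key: overwrites the entry in slot 0
print(m.slots)    # [('a', 3), ('b', 2), None, None]
print(m.get("b"))  # 2
```

Note how looking up "b" has to step over the slot occupied by "a" first: colliding keys lengthen the probe sequence in the same way they lengthen a chaining bucket.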

In either case, when a hash collision happens, we have no choice but to search through everything corresponding to the bucket we've found, to determine whether the key we are looking for is there or not.

Using a chaining hashmap (for simplicity), the whole operation looks something like this:

typealias ChainingBucket as Bucket

/// Search for the place where the key is stored.
///
/// Returns `(bucket, index, number_of_searches_done)`
fn[K : Eq, V] MyHashMap::search(
  self : MyHashMap[K, V],
  key : K
) -> (Int, Int?, Int) {
  let hash = (self.hash_fn)(key)
  let bucket = (hash % self.buckets.length().reinterpret_as_uint()).reinterpret_as_int()

  // Result
  let mut found_index = None
  let mut n_searches = 0

  // Search through all key-value pairs in the bucket.
  for index, keyvalue in self.buckets[bucket].values {
    n_searches += 1
    if keyvalue.0 == key { // Check if the key matches.
      found_index = Some(index)
      break
    }
  }
  return (bucket, found_index, n_searches)
}

/// Insert a new key-value pair.
///
/// Returns the number of searches done.
fn[K : Eq, V] MyHashMap::insert(
  self : MyHashMap[K, V],
  key : K,
  value : V
) -> Int {
  let (bucket, index, n_searches) = self.search(key)
  if index is Some(index) {
    self.buckets[bucket].values[index] = (key, value)
  } else {
    self.buckets[bucket].values.push((key, value))
  }
  n_searches
}

This is the string attached to the $O(1)$ access magic -- we'd have to search through everything if we're unlucky. This gives the hashmap a worst-case complexity of $O(n)$, where $n$ is the number of keys in the hashmap.

Crafting a collision

For most hash functions we use for hashmaps, unlucky collisions are rare. This means that we usually don't need to worry about the worst-case scenario and can enjoy the $O(1)$ speed the vast majority of the time.

That is, unless someone, maybe some black-suited hackerman with some malicious intent, forces you into one.

Hash functions are usually designed to be deterministic and fast, so even without advanced cryptanalysis of the function itself, we can still find some keys that will collide with each other by brute force. [1]

fn find_collision(
  bucket_count : Int,
  target_bucket : Int,
  n_collision_want : Int,
  hash_fn : (String) -> UInt,
) -> Array[String] {
  let result = []
  let bucket_count = bucket_count.reinterpret_as_uint()
  let target_bucket = target_bucket.reinterpret_as_uint()
  for i = 0; ; i = i + 1 {
    // Generate some string key.
    let s = i.to_string(radix=36)

    // Calculate the hash value
    let hash = hash_fn(s)
    let bucket_index = hash % bucket_count
    let bucket_index = if bucket_index < 0 {
      bucket_index + bucket_count
    } else {
      bucket_index
    }

    // Check if it collides with our target bucket.
    if bucket_index == target_bucket {
      result.push(s)
      if result.length() >= n_collision_want {
        break
      }
    }
  }
  result
}

Hash flooding attack

With colliding values in hand, we (in the role of malicious hackermen) can now attack hashtables to constantly exploit their worst-case complexity.

Consider the following case: you are inserting keys into the same hashmap, but every key hashes into the same bucket. With each insert, the hashmap must search through all the existing keys in the bucket to determine whether the new key is already there.

The first insertion compares with 0 keys, the second with 1 key, the third with 2 keys, and so on: the number of keys compared grows linearly with each insertion. For $n$ insertions, the total number of keys compared is:

$$0 + 1 + \dots + (n - 1) = \frac{n(n - 1)}{2} = \frac{n^2 - n}{2}$$

The whole sequence of $n$ insertions now takes $O(n^2)$ comparisons to complete [2], as opposed to $O(n)$ comparisons in the average case. The operation will now take far more time than it ought to.
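The n(n - 1)/2 total can be checked with a few lines of Python. This is a sketch under the assumption that all keys are distinct and all land in one chaining bucket; `count_compares` is a hypothetical helper that models that single bucket as a plain list.

```python
def count_compares(n_keys: int) -> int:
    """Insert n_keys distinct keys into one shared bucket and count key comparisons."""
    bucket = []
    compares = 0
    for key in range(n_keys):
        found = False
        for existing in bucket:  # the hashmap must scan the whole bucket
            compares += 1
            if existing == key:
                found = True
                break
        if not found:
            bucket.append(key)
    return compares


for n in (10, 100, 1000):
    assert count_compares(n) == n * (n - 1) // 2

print(count_compares(1000))  # 499500
```

For 1000 colliding keys this gives 499500 comparisons, which is 1000 × 999 / 2.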

The attack is not limited to insertion. Every time an attacked key is searched for, the same number of keys is compared, so every single operation that would have been $O(1)$ now becomes $O(n)$. Hashmap operations that would otherwise take negligible time become severely slower, making it far easier for the attacker to deplete the program's resources.

This is what we call a hash flooding attack, which takes its name from flooding the same bucket of the hashmap with colliding keys.

We can demonstrate this with the hashmap implementation we wrote earlier:

/// A simple string hasher via the Fowler-Noll-Vo hash function.
/// https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
fn string_fnv_hash(s : String) -> UInt {
  // In reality this should directly operate on the underlying array of the string
  let s_bytes = @encoding/utf16.encode(s)
  let mut acc : UInt = 0x811c9dc5
  for b in s_bytes {
    acc = (acc ^ b.to_uint()) * 0x01000193
  }
  acc
}

fn test_attack(
  n_buckets : Int,
  keys : Array[String],
  hash_fn : (String) -> UInt,
) -> Int {
  let map = {
    buckets: Array::makei(n_buckets, _ => { values: [] }),
    hash_fn,
  }
  let mut total_searches = 0
  for key in keys {
    total_searches += map.insert(key, 0)
  }
  total_searches
}

test {
  println("Demonstrate hash flooding attack")
  let bucket_count = 2048
  let target_bucket_id = 42
  let n_collision_want = 1000

  //
  println("First, try to insert non-colliding keys.")
  let non_colliding_keys = Array::makei(n_collision_want, i => (i * 37).to_string(radix=36))
  let n_compares_nc = test_attack(
    bucket_count,
    non_colliding_keys,
    string_fnv_hash,
  )
  println(
    "Total compares for \{n_collision_want} non-colliding keys: \{n_compares_nc}",
  )
  println("")

  //
  println("Now, we want all keys to collide into bucket #\{target_bucket_id}.")
  let colliding_keys = find_collision(
    bucket_count,
    target_bucket_id,
    n_collision_want,
    string_fnv_hash,
  )
  println("Found \{colliding_keys.length()} colliding keys.")
  let n_compares_c = test_attack(bucket_count, colliding_keys, string_fnv_hash)
  println(
    "Total compares for \{n_collision_want} colliding keys: \{n_compares_c}",
  )

  //
  let increase = n_compares_c.to_double() / n_compares_nc.to_double()
  println("The number of compares increased by a factor of \{increase}")
}

The output of the code above is:

Demonstrate hash flooding attack
First, try to insert non-colliding keys.
Total compares for 1000 non-colliding keys: 347

Now, with colliding keys...
Found 1000 colliding keys.
Total compares for 1000 colliding keys: 499500
The number of compares increased by a factor of 1439.4812680115274

As the output shows, inserting the colliding keys takes more than 1,000 times as many comparisons!

In reality, although the number of buckets in a hashmap is not fixed as in our example, it often follows a certain growth sequence, such as doubling or stepping through a list of predefined prime numbers. This growth pattern makes the bucket count very predictable. Thus, an attacker can initiate a hash flooding attack even if they don't know the exact bucket count.
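A small Python sketch illustrates why a doubling sequence is so convenient for an attacker. It assumes power-of-two bucket counts and index = hash % bucket_count; real hashmaps may post-process the hash differently. `toy_hash` (CRC-32 from the standard `zlib` module) is only a stand-in for the map's hash function, and `find_colliding_keys` is a hypothetical helper.

```python
import zlib


def toy_hash(key: str) -> int:
    """Stand-in hash function (CRC-32); any deterministic hash works for this demo."""
    return zlib.crc32(key.encode("utf-8"))


def find_colliding_keys(bucket_count: int, n_want: int) -> list:
    """Brute-force keys that land in bucket 0 of a table with bucket_count buckets."""
    keys = []
    i = 0
    while len(keys) < n_want:
        key = f"key{i}"
        if toy_hash(key) % bucket_count == 0:
            keys.append(key)
        i += 1
    return keys


# Search once, against the largest table size the attacker expects.
keys = find_colliding_keys(bucket_count=4096, n_want=10)

# The same keys also collide for every smaller power-of-two bucket count,
# because h % 4096 == 0 implies h % 2**k == 0 for all k <= 12.
for smaller in (16, 64, 256, 1024, 4096):
    assert all(toy_hash(k) % smaller == 0 for k in keys)
```

In other words, under these assumptions the attacker does not need to know the current bucket count: keys prepared for a large table keep colliding while the table is still small.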

Mitigating hash flooding attacks

A hash flooding attack works because the attacker knows exactly how the hash function works, and how it determines where a key is inserted into the hashmap. If we change either of them, the attack will no longer work.

Seeded hash function

By far, the easiest way to do this is to prevent the attacker from knowing how the hash algorithm exactly works. This might sound impossible, but the properties of the hash function actually only need to hold within a single hashmap!

When dealing with hashmaps, we don't need a single, global "hash value" that can be used everywhere, because hashmaps don't care about what happens outside them. Simply swap out the hash function from table to table, and you get something that's unpredictable to the attacker.

But hey, you may say, "we don't have an infinite supply of different hash algorithms!"

Well, you do. Remember that hash functions need to distribute values across the result space as uniformly as possible? That means that, for a good hash function, a slight change in the input causes a large change in the output. So, to get a hash function unique to each table, we only need to feed it some data unique to the table before feeding it the data we want to hash. This data is called a "seed" for the hash function, and each table can now use a different seed.

Let's demonstrate how the seed solves the problem with a seeded hash function and two tables with different seeds:

/// A modified version of the FNV hash before to allow a seed to be used.
fn string_fnv_hash_seeded(seed : UInt) -> (String) -> UInt {
  let seed_bytes = seed.to_le_bytes()
  fn string_fnv_hash(s : String) -> UInt {
    let s_bytes = @encoding/utf16.encode(s)
    let mut acc : UInt = 0x811c9dc5
    // Mix in the seed bytes.
    for b in seed_bytes {
      acc = (acc ^ b.to_uint()) * 0x01000193
    }
    // Hash the string bytes.
    for b in s_bytes {
      acc = (acc ^ b.to_uint()) * 0x01000193
    }
    acc
  }

  string_fnv_hash
}

test {
  println("Demonstrate flooding attack mitigation")
  let bucket_count = 2048
  let target_bucket_id = 42
  let n_collision_want = 1000
  // The first table has a seed of 42.
  let seed1 : UInt = 42
  println("We find collisions using the seed \{seed1}")
  let hash_fn1 = string_fnv_hash_seeded(seed1)
  let colliding_keys = find_collision(
    bucket_count, target_bucket_id, n_collision_want, hash_fn1,
  )
  let n_compares_c = test_attack(bucket_count, colliding_keys, hash_fn1)
  println(
    "Total compares for \{n_collision_want} colliding keys with seed \{seed1}: \{n_compares_c}",
  )
  println("")
  // The second table has a different seed
  let seed2 : UInt = 100
  println(
    "We now use a different seed for the second table, this time \{seed2}",
  )
  let hash_fn2 = string_fnv_hash_seeded(seed2)
  let n_compares_nc = test_attack(bucket_count, colliding_keys, hash_fn2)
  println(
    "Total compares for \{n_collision_want} keys that were meant to collide with seed \{seed1}: \{n_compares_nc}",
  )
}

The output of the program above was:

Demonstrate flooding attack mitigation
We find collisions using 42
Total compares for 1000 colliding keys with seed 42: 499500

We now use a different seed for the second table, this time 100
Total compares for 1000 keys that were meant to collide with seed 42: 6342

We can see that the keys that collided in the first table no longer collide in the second (see footnote 3). Therefore, we have successfully mitigated the hash flooding attack using this simple trick.

As for where the seed that randomizes each hashmap comes from: for programs with access to an external random source (like Linux's /dev/urandom), using that is generally the best choice. For programs without such access (such as within a WebAssembly sandbox), a per-process random seed is also a preferable solution (this is what Python does). Even simpler, a counter that increments with each seeding attempt could be good enough -- guessing how many hashmaps have been created can still be quite hard for an attacker.
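The choice of seed source can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `new_table_seed` and the module-level counter are hypothetical names, and the fallback mirrors the "simple counter" idea rather than any particular library.

```python
import itertools
import os

# Fallback source: a counter that increments with each seeding attempt.
_seed_counter = itertools.count(1)


def new_table_seed() -> int:
    """Return a 32-bit seed for a newly created hashmap."""
    try:
        # Preferred: ask the operating system for random bytes.
        return int.from_bytes(os.urandom(4), "little")
    except NotImplementedError:
        # No OS randomness source is available: fall back to the counter.
        return next(_seed_counter) & 0xFFFFFFFF


print(new_table_seed())
```

Each table would call `new_table_seed()` once at creation and pass the result to its seeded hash function.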

Other choices

Java uses a different solution: it falls back to a binary search tree (a red-black tree) when too many values occupy the same bucket. Yes, this requires keys to be comparable in addition to being hashable, but it guarantees O(log n) worst-case complexity, which is far better than O(n).
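The effect of such a fallback on the number of comparisons can be sketched in Python. This is not Java's implementation: a sorted list with binary search stands in for the red-black tree (so lookups take O(log n) comparisons, although inserting into a list still shifts elements), and `TREEIFY_THRESHOLD` is a hypothetical cut-off chosen for the sketch.

```python
TREEIFY_THRESHOLD = 8  # hypothetical cut-off chosen for this sketch


class Bucket:
    """One hash bucket: a plain list that becomes a sorted list when it grows."""

    def __init__(self):
        self.keys = []
        self.is_sorted = False
        self.compares = 0  # number of key comparisons performed so far

    def _find(self, key):
        """Return (index, found); index is the insertion point for a new key."""
        if self.is_sorted:
            lo, hi = 0, len(self.keys)
            while lo < hi:  # binary search: O(log n) comparisons
                mid = (lo + hi) // 2
                self.compares += 1
                if self.keys[mid] < key:
                    lo = mid + 1
                else:
                    hi = mid
            if lo < len(self.keys):
                self.compares += 1
                return lo, self.keys[lo] == key
            return lo, False
        for i, existing in enumerate(self.keys):  # linear scan: O(n) comparisons
            self.compares += 1
            if existing == key:
                return i, True
        return len(self.keys), False

    def insert(self, key):
        index, found = self._find(key)
        if found:
            return
        self.keys.insert(index, key)
        if not self.is_sorted and len(self.keys) > TREEIFY_THRESHOLD:
            # Too many keys in one bucket: switch to the ordered representation.
            self.keys.sort()
            self.is_sorted = True


bucket = Bucket()
for i in range(1000):
    bucket.insert(f"key{i:04d}")
print(bucket.compares)
```

With all 1000 keys forced into one bucket, the comparison count stays far below the 499500 seen with a plain list, because each lookup after the switch needs only about log2(n) comparisons.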

Why does it matter to us?

Due to the ubiquitous nature of hashmaps, it's extremely easy to find some hashmap in a program where you can control the keys, especially in Web programs. Headers, cookies, query parameters and JSON bodies are all key-value pairs, and often stored in hashmaps, which might be vulnerable to hash flooding attacks.

A malicious attacker with enough knowledge of the program (programming language, frameworks, etc.) can then try to send carefully crafted request payloads to the Web API endpoints. These requests take a lot longer to handle, so if a regular denial-of-service (DoS) attack takes n requests/s to bring down a server, a hash flooding attack might need only a tiny fraction of that number, often an order of magnitude less -- making it far more efficient for the attacker. This turns the DoS attack into a HashDoS attack.

Fortunately, by introducing even slightly unpredictable patterns (such as per-process randomness or keyed hashing) into hashmaps, we can make such attacks significantly harder, often impractical. Also, since such an attack depends heavily on the language, framework, architecture and implementation of the target application, crafting one is already quite hard, and modern, well-configured systems are even harder to exploit.

Takeaways

Hashmaps give us powerful, constant-time average access -- but that "constant" depends on assumptions an attacker can sometimes break. A targeted hash-flooding attack forces many keys into the same bucket and turns O(1) operations into O(n), enabling highly efficient resource exhaustion.

The good news is that the mitigations are simple and practical: introduce some unpredictability into your hashmaps, use side-channel information when the hash alone is not enough, or rehash when the behavior doesn't look right. With these, we can keep our hashmaps fast and secure.

Footnotes

  1. Side note, this is also similar to how Bitcoin mining works -- finding a value to add to an existing string, so the hash of the entire thing (with bits reversed), modulo some given value, is zero.

  2. There's even a Tumblr blog for unexpected quadratic complexity in programming languages, Accidentally Quadratic. You can even find a hashmap-related one here! -- It's almost a manually-introduced hash flooding attack.

  3. You may notice that this number is still slightly higher than the one we got with randomly generated, non-colliding keys. This might be because FNV is not designed for the best output quality. Since the two seeds are pretty close to each other, the results might still have some similarity. Using a better hash function (or even a cryptographically secure one like SipHash) would greatly reduce this effect.

Write a HTTP file server in MoonBit

· 17 min read

In this article, I will introduce MoonBit's async programming support and the moonbitlang/async library by writing a simple HTTP file server. If you have used Python before, you may know that it has a very convenient builtin HTTP server module. You can launch an HTTP file server that shares the current directory by running python -m http.server from the command line, which is useful for LAN file sharing. In this article, we will write a program with similar functionality in MoonBit and learn about MoonBit's async programming support along the way. We will also implement a useful extra feature that python -m http.server lacks: downloading a whole directory as a .zip file.

A brief history of async programming

Async programming enables programs to handle multiple tasks at the same time. For example, a file server may have many users accessing it at once. The server needs to serve all of them simultaneously while keeping every user's experience as smooth as possible. In a typical async program, such as a server, each task spends most of its time waiting for IO operations and only a small portion on actual computation. So we don't need much computing power to handle many tasks. The key is to switch between tasks frequently: if a task starts waiting for IO, stop processing it and switch to a task that is ready to run.

In the past, async programming was usually implemented via multi-threading: every task in the program corresponds to an operating system thread. However, OS threads are resource-heavy, and context switches between OS threads are expensive. So today, async programming is usually implemented via event loops. In an event-loop-based async program, the whole program is structured as one big loop. In every iteration of the loop, the program checks a list of completed IO operations and resumes the tasks blocked on them, until they issue another IO request and enter the waiting state again. In this programming paradigm, context switches between tasks happen in user space, on a single OS thread, so switching between tasks is very cheap.
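The shape of such a loop can be illustrated with a toy Python scheduler. This is a sketch under simplifying assumptions, not how moonbitlang/async works: generators stand in for tasks, a bare `yield` stands in for "issue an IO request and wait", and every waiting task is treated as immediately ready again, whereas a real loop would park it until the operating system reports the IO as complete.

```python
from collections import deque


def task(name, steps, log):
    """A toy task: do one step of work, then 'wait for IO' by yielding."""
    for i in range(steps):
        log.append(f"{name}: step {i}")
        yield  # hand control back to the event loop


def run_event_loop(tasks):
    """Resume ready tasks one after another on a single thread."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # run the task until its next "IO request"
        except StopIteration:
            continue  # the task has finished; drop it
        ready.append(current)  # simplification: treat it as ready again


log = []
run_event_loop([task("A", 2, log), task("B", 3, log)])
print(log)
# ['A: step 0', 'B: step 0', 'A: step 1', 'B: step 1', 'B: step 2']
```

The two tasks interleave even though nothing runs in parallel: all switching happens inside `run_event_loop`, in user space.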

Although event loops solve the performance problem, writing event-loop-based programs by hand is very painful. The code of a single task needs to be split across multiple iterations of the event loop, which significantly damages the readability of the program logic. Fortunately, like most other modern programming languages, MoonBit provides native async programming support. Users can write async code just like normal, synchronous code. The MoonBit compiler automatically splits async code into multiple parts, while the moonbitlang/async library provides the event loop, various IO primitives, and a scheduler that actually runs the async code.

Async programming in MoonBit

In MoonBit, you can declare an async function using the async fn syntax. Async functions look exactly the same as normal, synchronous functions, except that they may be interrupted in the middle at run time, so that the program can switch between multiple tasks.

Unlike most other languages, MoonBit doesn't need special marks such as await when calling async functions. The compiler automatically infers which function calls are async. However, if you read async MoonBit code in an IDE or text editor that supports MoonBit, you will see async function calls rendered in italics, and function calls that may raise errors rendered with an underline. So you can still easily find all async function calls when reading code.

For async programming, it is also necessary to have an event loop, a task scheduler and various IO primitives. In MoonBit, these are implemented in the moonbitlang/async library. moonbitlang/async provides support for async primitives such as network IO, file IO and process creation, as well as many useful task management facilities. In the following parts, we will learn about various features of moonbitlang/async while writing the HTTP file server.

The structure of an HTTP server

The structure of a typical HTTP server is:

  • the server listens on a TCP socket, waiting for incoming connections from users
  • after accepting a TCP connection from a user, the server reads the user's request from the TCP connection, processes it, and sends the result back to the user.

Every task described above must be performed asynchronously: while handling the first user's request, the server should keep waiting for new connections and react to the connection request of the next user. If many users connect to the server at the same time, the server should handle the requests from all users in parallel. When handling user requests, all time-consuming operations, such as network IO and file IO, should be asynchronous: they should not block the program or affect the handling of other tasks.

moonbitlang/async provides a helper function @http.run_server, which automatically sets up an HTTP server and runs it:

async fn server_main(path~ : String, port~ : Int) -> Unit {
  @http.run_server(@socket.Addr::parse("[::]:\{port}"), fn(conn, addr) {
    @pipe.stderr.write("received new connection from \{addr}\n")
    handle_connection(path, conn)
  })
}

server_main accepts two parameters. path is the directory to serve, and port is the port to listen on. In moonbitlang/async, all async code is cancellable, and cancellation is performed by raising an error in the cancelled code. So MoonBit assumes every async fn may raise an error by default, eliminating the need to explicitly mark async fn with raise.

In server_main, we use @http.run_server to create an HTTP server and run it. @http is the default alias for moonbitlang/async/http, which provides HTTP support for moonbitlang/async. The first parameter of @http.run_server is the address to listen on. Here we ask the server to listen on [::]:port, which means listening on port on any network interface. moonbitlang/async provides native IPv4/IPv6 dual-stack support, so the server can accept both IPv4 and IPv6 connections. The second parameter of @http.run_server is a callback function used for handling client requests. The callback receives two parameters. The first is the connection from the user, represented by the type @http.ServerConnection and created automatically by @http.run_server. The second is the network address of the user. Here, we use a function handle_connection to handle the request; its implementation will be given later. @http.run_server automatically creates a new task and runs handle_connection in it. So the server may run multiple instances of handle_connection in parallel, handling multiple user connections at the same time.

Handle user request

Now, let's implement the handle_connection function. handle_connection accepts two parameters: base_path is the directory being served, and conn is the connection from the user. The implementation of handle_connection is as follows:

async fn handle_connection(
  base_path : String,
  conn : @http.ServerConnection,
) -> Unit {
  for {
    let request = conn.read_request()
    conn.skip_request_body()
    guard request.meth is Get else {
      conn
      ..send_response(501, "Not Implemented")
      ..write("This request is not implemented")
      ..end_response()
    }
    let (path, download_zip) = match request.path {
      [..path, .."?download_zip"] => (path.to_string(), true)
      path => (path, false)
    }
    if download_zip {
      serve_zip(conn, base_path + path)
    } else {
      let file = @fs.open(base_path + path, mode=ReadOnly) catch {
        _ => {
          conn
          ..send_response(404, "NotFound")
          ..write("File not found")
          ..end_response()
          continue
        }
      }
      defer file.close()
      if file.kind() is Directory {
        if download_zip {

        } else {
          serve_directory(conn, file.as_dir(), path~)
        }
      } else {
        server_file(conn, file, path~)
      }
    }
  }
}

In handle_connection, the program reads requests from the user connection and handles them in one big loop. In every iteration, we first read the next request from the user via conn.read_request(). conn.read_request() only reads the header part of an HTTP request, which allows large request bodies to be read in a streaming fashion. Since our file server only handles GET requests, the request body is irrelevant. So we use conn.skip_request_body() to skip the body of the user's request, so that the next request can be processed normally.

If we meet a request that is not a GET, the else block of the guard statement is executed. The code after the guard statement is skipped, and the program enters the next iteration directly and handles the next request. In the else block, we use conn.send_response(..) to send a "Not Implemented" response back to the user. conn.send_response(..) only sends the header part of the response. After send_response, we use conn.write(..) to write the body of the response to the connection. After writing all the desired content, we use conn.end_response() to tell the library that the response body is complete.

Here, we want to implement a useful feature absent in python -m http.server: download the whole directory as a zip file. If the requested URL has the shape /path/to/directory?download_zip, we package /path/to/directory into a .zip file and send it to the user. This feature is implemented using the serve_zip function to be given later.
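The URL convention described here amounts to checking for a fixed suffix. A minimal Python sketch of that check follows; the function name `split_download_zip` is hypothetical and only mirrors the pattern match on the request path in handle_connection.

```python
def split_download_zip(request_path: str) -> tuple[str, bool]:
    """Split a request path into (path, download_zip)."""
    suffix = "?download_zip"
    if request_path.endswith(suffix):
        # Strip the marker and report that a zip download was requested.
        return request_path[: -len(suffix)], True
    return request_path, False


print(split_download_zip("/path/to/directory?download_zip"))
# ('/path/to/directory', True)
print(split_download_zip("/path/to/file.txt"))
# ('/path/to/file.txt', False)
```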

Since we are implementing a file server, the requested path in a user's GET request maps directly to a file system path under base_path. @fs is the default alias of moonbitlang/async/fs, the package for file system IO support in moonbitlang/async. Here, we use @fs.open to open the requested file. If the @fs.open operation fails, we send the user a 404 response, notifying the user that the requested file does not exist.

If the requested file is successfully opened, we need to send its content to the user. Before that, we use defer file.close() to ensure that the opened file will be closed correctly. We can obtain the kind of the file via file.kind(). In a file server, directories need some special handling. Since we cannot send a directory over the network, we need to serve an HTML page to the user, which lists the contents of the directory with links to the page of each file in it. This part of the server is implemented in the serve_directory function, whose definition will be provided later.

If the requested file is a regular file, we simply send the content of the file to the user. This is implemented via the server_file function:

async fn server_file(
  conn : @http.ServerConnection,
  file : @fs.File,
  path~ : String,
) -> Unit {
  // Pick the Content-Type from the file name suffix.
  let content_type = match path {
    [.., .. ".png"] => "image/png"
    [.., .. ".jpg"] | [.., .. ".jpeg"] => "image/jpeg"
    [.., .. ".html"] => "text/html"
    [.., .. ".css"] => "text/css"
    [.., .. ".js"] => "text/javascript"
    [.., .. ".mp4"] => "video/mp4"
    [.., .. ".mpv"] => "video/mpv"
    [.., .. ".mpeg"] => "video/mpeg"
    [.., .. ".mkv"] => "video/x-matroska"
    _ => "application/octet-stream"
  }
  conn
  ..send_response(200, "OK", extra_headers={ "Content-Type": content_type })
  ..write_reader(file)
  ..end_response()
}

In the HTTP response header, we fill in the Content-Type field based on the suffix of the requested file. With the correct Content-Type, users can view images, videos and HTML files directly in the browser. For other files, Content-Type is set to application/octet-stream, which tells the browser to download the file automatically.
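The suffix-based lookup can be sketched in Python as a small table plus a default. The table below is a hypothetical subset of the suffixes handled by the server, not an exhaustive MIME registry.

```python
# Hypothetical subset of suffixes; anything else falls back to a download.
CONTENT_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".html": "text/html",
    ".css": "text/css",
    ".js": "text/javascript",
    ".mp4": "video/mp4",
    ".mkv": "video/x-matroska",
}


def content_type_for(path: str) -> str:
    """Return the Content-Type for a path based on its suffix."""
    for suffix, content_type in CONTENT_TYPES.items():
        if path.endswith(suffix):
            return content_type
    return "application/octet-stream"


print(content_type_for("/photos/cat.jpeg"))  # image/jpeg
print(content_type_for("/archive.tar.gz"))   # application/octet-stream
```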

As before, we use conn.send_response to send the response header. The extra_headers parameter lets us set extra header fields for the response. The body of the response is the content of the file. Here, conn.write_reader(..) streams the content of the file to the user. Suppose the user requests a video file and plays it in the browser. If we read the whole video file into memory before sending it, the user only sees a response from the server after the whole file has been loaded, resulting in poor latency. Loading the whole video file is also a huge waste of memory. write_reader, on the other hand, automatically splits the file into small chunks and sends the content chunk by chunk. This way, users can start playing the video immediately, and the server saves a lot of memory.
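The idea behind chunked streaming can be sketched in Python. This is an illustration of the technique, not the implementation of write_reader: `stream_reader`, its `write` callback and the chunk size are hypothetical.

```python
import io


def stream_reader(reader, write, chunk_size=64 * 1024):
    """Copy `reader` to `write` chunk by chunk; return the number of bytes sent."""
    sent = 0
    while True:
        chunk = reader.read(chunk_size)  # at most chunk_size bytes are held in memory
        if not chunk:
            return sent  # end of file
        write(chunk)
        sent += len(chunk)


# Usage with an in-memory "file" and a list standing in for the connection.
received = []
total = stream_reader(io.BytesIO(b"abcdefghij"), received.append, chunk_size=4)
print(total, received)
# 10 [b'abcd', b'efgh', b'ij']
```

Memory use is bounded by `chunk_size` regardless of the file size, and the first chunk reaches the client before the rest of the file has been read.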

Next, let's implement the serve_directory function:

async fn serve_directory(
  conn : @http.ServerConnection,
  dir : @fs.Directory,
  path~ : String,
) -> Unit {
  let files = dir.read_all()
  files.sort()
  conn
  ..send_response(200, "OK", extra_headers={ "Content-Type": "text/html" })
  ..write("<!DOCTYPE html><html><head></head><body>")
  ..write("<h1>\{path}</h1>\n")
  ..write("<div style=\"margin: 1em; font-size: 15pt\">\n")
  ..write("<a href=\"\{path}?download_zip\">download as zip</a><br/><br/>\n")
  if path[:-1].rev_find("/") is Some(index) {
    let parent = if index == 0 { "/" } else { path[:index].to_string() }
    conn.write("<a href=\"\{parent}\">..</a><br/><br/>\n")
  }
  for file in files {
    let file_url = if path[path.length() - 1] != '/' {
      "\{path}/\{file}"
    } else {
      "\{path}\{file}"
    }
    conn.write("<a href=\"\{file_url}\">\{file}</a><br/>\n")
  }
  conn
  ..write("</div></body></html>")
  ..end_response()
}

Here, we first read the list of files in the directory and sort them. Next, we build an HTML page based on the content of the directory. The body of the HTML page is the list of files in the directory, where each file corresponds to an <a> HTML link showing the name of the file. Users can jump to the page of a file by clicking its link. If the requested directory is not the root directory, we add a special link .. at the beginning of the page, which jumps to the parent of the current directory. Finally, the page also contains a download as zip link, which jumps to the zip download URL for the current directory.
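The two URL computations in serve_directory, the parent link and the per-file link, can be mirrored in Python to check the edge cases. The function names `parent_url` and `file_url` are hypothetical; the logic follows the MoonBit code above.

```python
def parent_url(path: str):
    """Return the URL of the parent directory, or None for the root."""
    # Drop the last character first so a trailing "/" is not mistaken
    # for the separator in front of the last path component.
    index = path[:-1].rfind("/")
    if index < 0:
        return None  # no separator left: `path` is the root
    return "/" if index == 0 else path[:index]


def file_url(path: str, name: str) -> str:
    """Join a directory URL and a file name without doubling the slash."""
    return f"{path}{name}" if path.endswith("/") else f"{path}/{name}"


print(parent_url("/"))               # None
print(parent_url("/docs"))           # /
print(parent_url("/docs/img/"))      # /docs
print(file_url("/docs", "a.txt"))    # /docs/a.txt
print(file_url("/docs/", "a.txt"))   # /docs/a.txt
```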

Implement the download as zip feature

Finally, let's implement the "download as zip" feature. Here, for simplicity, we use the zip command for compression. The implementation of serve_zip is as follows:

async fn serve_zip(
  conn : @http.ServerConnection,
  path : String,
) -> Unit {
  let full_path = @fs.realpath(path)
  let zip_name = if full_path[:].rev_find("/") is Some(i) {
    full_path[i+1:].to_string()
  } else {
    path
  }
  @async.with_task_group(fn(group) {
    let (we_read_from_zip, zip_write_to_us) = @process.read_from_process()
    defer we_read_from_zip.close()
    group.spawn_bg(fn() {
      let exit_code = @process.run(
        "zip",
        ["-q", "-r", "-", path],
        stdout=zip_write_to_us,
      )
      if exit_code != 0 {
        fail("zip failed with exit code \{exit_code}")
      }
    })
    conn
    ..send_response(200, "OK", extra_headers={
      "Content-Type": "application/octet-stream",
      "Content-Disposition": "filename=\{zip_name}.zip",
    })
    ..write_reader(we_read_from_zip)
    ..end_response()
  })
}

At the beginning of serve_zip, we first compute the file name for the .zip file. Next, we create a new task group using @async.with_task_group. The task group is the core construct for task management in moonbitlang/async: all tasks must be spawned in a task group. But before we get into the details of with_task_group, let's first look at the rest of serve_zip. First, we use @process.read_from_process to create a temporary pipe. Data written to one end of the pipe can be read from the other end, so the pipe can be used to obtain the output of a system command. We pass the write end of the pipe, zip_write_to_us, to the zip command and let zip write its compressed output there. Meanwhile, we read the output of the zip command from the read end of the pipe, we_read_from_zip, and send it to the user.

To accomplish the above job, we first spawn a new task in the task group using group.spawn_bg(..). group.spawn_bg(..) accepts a function as its argument and runs the function in a new background task, in parallel with other code in the program. Within the new task, we use @process.run to launch the zip command. @process is the default alias of moonbitlang/async/process, which provides process spawning and manipulation support for moonbitlang/async. The arguments passed to zip have the following meaning:

  • -q: do not output log
  • -r: recursively compress the whole directory
  • -: write the result of compression to stdout
  • path: the directory to compress

When launching zip with @process.run, the stdout=zip_write_to_us part redirects the stdout of zip to zip_write_to_us, so that we can obtain the output of zip. Compared to creating a temporary .zip file to store the result, using a pipe is more efficient because:

  • the data exchange with zip is completely in-memory, which is more efficient than disk IO
  • we can send partial compression result on-the-fly while zip is still working, reducing latency

@process.run waits until zip finishes and returns the exit code of zip. If the zip command fails with a non-zero exit code, we raise an error.

Outside the new task in spawn_bg, we use conn.send_response(..) to initiate a response to the user, and send the output of zip to the user via conn.write_reader(we_read_from_zip). The Content-Disposition HTTP header allows us to specify the file name for the .zip file. This part of the code runs in parallel with the @process.run task.

So far everything looks reasonable. But why do we need to create a new task group here? Why doesn't moonbitlang/async just provide a global task-spawning API, like many other languages do? There is a phenomenon in async programming: it is relatively easy to write an async program that works correctly when everything goes well, but much harder to write an async program that behaves correctly when things go wrong. For the serve_zip example:

  • what should we do if the zip command fails?
  • what should we do if some network error occurs, or the user closes the connection?

If the zip command fails, the whole serve_zip function should fail too. Since the user has already received some incomplete data, it is hard to recover the connection to a normal state, so we have to close the whole connection. If a network error occurs while sending data, we should stop the zip command immediately, because its result is no longer useful. Keeping the zip command running is just a waste of resources. In the worst case, the pipe for communication with zip may fill up since we are no longer reading from it, and zip may block forever on writing to the pipe and become a zombie process.

In the code above, we did not perform any explicit error handling. However, when the aforementioned error cases occur, our program still behaves correctly and handles all edge cases. The magic lies in the @async.with_task_group function, and the structured concurrency paradigm behind it. The semantics of @async.with_task_group(f) are as follows:

  • it will create a new task group group, and run f(group) inside the new group
  • f can spawn new tasks in the group via group.spawn_bg(..)
  • with_task_group will only return after all tasks inside the group terminate
  • if any task inside the group fails, with_task_group will fail as well, and all other remaining tasks in the group are automatically cancelled

The last point here is the key to ensuring correct error handling behavior:

  • if the zip command fails, the task that calls @process.run will raise an error, failing the whole task. The error will be propagated to the whole task group since no one is catching it. with_task_group will automatically cancel the response-sending task, propagate the error upwards and close the connection.
  • if a network error occurs, the main response-sending task will fail. The error will also get propagated to the whole task group, and the zip task will be cancelled. When @process.run is cancelled, it automatically terminates the zip command by sending a SIGTERM signal.
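The propagation rules above can be modelled in a few lines of TypeScript. This is only a sketch of the idea, not how moonbitlang/async is implemented: the names CancelToken, TaskGroup, withTaskGroup and runDemo are hypothetical, and cancellation here is cooperative (tasks poll a flag), whereas the text above describes the library cancelling tasks for you.

```typescript
// Cooperative cancellation flag shared by every task in one group (hypothetical).
class CancelToken {
  cancelled = false;
}

class TaskGroup {
  readonly token = new CancelToken();
  private tasks: Promise<void>[] = [];
  private failed = false;
  private firstError: unknown = undefined;

  // Start a background task. On failure, record the first error and cancel the group.
  spawnBg(task: (token: CancelToken) => Promise<void>): void {
    const guarded = task(this.token).catch((err: unknown) => {
      if (!this.failed) {
        this.failed = true;
        this.firstError = err;
      }
      this.token.cancelled = true;
    });
    this.tasks.push(guarded);
  }

  // Wait for every task, including ones spawned while waiting, then rethrow the first failure.
  async join(): Promise<void> {
    for (let i = 0; i < this.tasks.length; i++) {
      await this.tasks[i];
    }
    if (this.failed) {
      throw this.firstError;
    }
  }
}

// Returns only after all tasks in the group have terminated; fails if any of them failed.
async function withTaskGroup(body: (group: TaskGroup) => Promise<void>): Promise<void> {
  const group = new TaskGroup();
  group.spawnBg(() => body(group));
  await group.join();
}

// Yield to other tasks without relying on timers.
const tick = (): Promise<void> => Promise.resolve();

async function runDemo(): Promise<{ error: string; chunksSent: number }> {
  let chunksSent = 0;
  let error = "";
  try {
    await withTaskGroup(async (group) => {
      // Stand-in for the zip task: fails after a few steps.
      group.spawnBg(async () => {
        for (let i = 0; i < 3; i++) await tick();
        throw new Error("zip failed");
      });
      // Stand-in for the response-sending code: stops once the group is cancelled.
      for (let i = 0; i < 100; i++) {
        if (group.token.cancelled) return;
        chunksSent++;
        await tick();
      }
    });
  } catch (err) {
    error = (err as Error).message;
  }
  return { error, chunksSent };
}

runDemo().then((result) => console.log(result));
```

In this model the failure of the background task stops the sender long before it has sent all 100 chunks, and the error surfaces from withTaskGroup only after both tasks have finished.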

So, when writing async programs using moonbitlang/async, users only need to insert task groups at appropriate places based on the structure of the program; all the remaining error handling details are handled automatically by with_task_group. This is the power of the structured concurrency paradigm of moonbitlang/async: it guides users to write async programs with a clearer structure, and makes programs behave correctly even when things go wrong.

Run the server

We have implemented all the features of the HTTP file server, so now we can actually run it. MoonBit provides native support for async code: users can use async fn main to define the entry point of an async program, or use async test to test async code directly. Here, we let the HTTP server serve the content of the current working directory and listen on port 8000:

async test {
  server_main(path=".", port=8000)
}

To use the file server, just run the source code of this document via moon test /path/to/this/document.mbt.md, and open the address http://127.0.0.1:8000 in your browser.

Other features of moonbitlang/async can be found in its API document and GitHub repo.

Interacting with JavaScript in MoonBit: A First Look

· 13 min read


Introduction

In today's software world, no programming language ecosystem can be an isolated island. As an emerging general-purpose language, MoonBit's success in the vast technological landscape hinges on its seamless integration with existing ecosystems.

MoonBit provides multiple compilation backends, including JavaScript, which opens the door to the vast JavaScript ecosystem. This integration capability greatly expands MoonBit's application scenarios for both front-end browser development and Node.js applications. It allows developers to leverage the type safety and high performance of MoonBit while reusing a wide range of existing JavaScript libraries.

In this article, using Node.js as our example, we'll explore MoonBit's JavaScript FFI step-by-step. We'll cover various topics from basic function calls to complex type and error handling, demonstrating how to build an elegant bridge between the MoonBit and JavaScript worlds.

Prerequisites

Before we begin, let's configure our project. If you don't have an existing project, you can use the moon new tool to create a new MoonBit project.

To let the MoonBit toolchain know that our target platform is JavaScript, we need to add the following content to the moon.mod.json file in the project's root directory:

{
  "preferred-target": "js"
}

This configuration tells the compiler to use the JavaScript backend by default when executing commands like moon build or moon check. Of course, if you want to specify it temporarily on the command line, you can achieve the same effect with the --target=js option.

Building the Project

After completing the above configuration, simply run the familiar build command in the project's root directory:

> moon build

After the command executes successfully, since our project includes an executable entry by default, you can find the build artifacts in the target/js/debug/build/ directory. MoonBit conveniently generates three files for us:

  • .js file: The compiled JavaScript source code.
  • .js.map file: A Source Map file for debugging.
  • .d.ts file: A TypeScript declaration file, which is convenient for integration into TypeScript projects.

First JavaScript API Call

MoonBit's FFI design is principled and consistent. Similar to calling into C or other languages, we define an external function through a declaration with the extern keyword:

extern "js" fn consoleLog(msg : String) -> Unit = "(msg) => console.log(msg)"

This line of code is the core of enabling our FFI call. Let's break it down:

  • extern "js": Declares that this is an external function pointing to the JavaScript environment.

  • fn consoleLog(msg : String) -> Unit: This is the function's type signature in MoonBit. It accepts a parameter of type String and returns a unit value (Unit).

  • "(msg) => console.log(msg)": The string literal on the right side of the equals sign is the essence of this declaration, containing the native JavaScript function to be executed.

    Here, we use a concise arrow function. The MoonBit compiler will embed this code as is into the final generated .js file, enabling the call from MoonBit to JavaScript.

    Tip If your JavaScript code snippet is relatively complex, you can use the #| syntax to define multi-line strings to improve readability.

Once this FFI declaration is ready, we can call consoleLog in our MoonBit code just like a normal function:

test "hello" {
  consoleLog("Hello from JavaScript!")
}

Run moon test, and you will see the message printed by JavaScript's console.log in the console. Our first bridge is successfully built!

Interfacing with JavaScript Types

Establishing the call flow is just the first step. The real challenge lies in handling type differences between the two languages. MoonBit is a statically typed language, while JavaScript is dynamically typed. Establishing a safe and reliable type mapping between them is a key consideration in FFI design.

Below, we'll cover how to interface with different JavaScript types in MoonBit, starting from the easiest cases.

JavaScript Types Requiring No Conversion

The simplest case involves types in MoonBit whose underlying compiled representation in JavaScript corresponds directly to a native JavaScript type. In this case, we can pass them directly without any conversion.

The common "zero-cost" interface types are shown below:

| MoonBit Type | Corresponding JavaScript Type |
| --- | --- |
| String | string |
| Bool | boolean |
| Int, UInt, Float, Double | number |
| BigInt | bigint |
| Bytes | Uint8Array |
| Array[T] | Array<T> |
| Function Type | Function |

Based on these mappings, we can bind many simple JavaScript functions. In fact, in the previous example of binding the console.log function, we have already used the correspondence between the String type in MoonBit and the string type in JavaScript.

Note: Maintaining the Internal Invariants of MoonBit Types

A crucial detail is that all of MoonBit's standard numeric types (Int, Float, etc.) map to the number type in JavaScript, i.e., IEEE 754 double-precision floating-point numbers. This means that when an integer value crosses the FFI boundary into JavaScript, its behavior will follow floating-point semantics, which may lead to unexpected results from MoonBit's perspective, such as differences in integer overflow behavior:

extern "js" fn incr(x : Int) -> Int = "(x) => x + 1"

test "incr" {
  // In MoonBit, @int.max_value + 1 will overflow and wrap around
  inspect(@int.max_value + 1, content="-2147483648")
  // In JavaScript, it is treated as a floating-point number and does not overflow
  inspect(incr(@int.max_value), content="2147483648") // ???
}

This is essentially illegal because, according to the internal invariant of the Int type in MoonBit, its value cannot be 2147483648 (which exceeds the maximum value allowed by the type). This may cause unexpected behavior in downstream MoonBit code that relies on this invariant. Similar issues may arise when handling other data types across the FFI boundary, so be sure to keep this in mind when writing FFI code.
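One common way to keep a result inside the 32-bit range is to truncate it on the JavaScript side before it is returned. The TypeScript sketch below contrasts the two behaviors; the names incrUnchecked and incrWrapped are hypothetical, and whether truncation is the right fix depends on what the binding is supposed to mean.

```typescript
const INT_MAX = 2147483647; // largest 32-bit signed integer

// Mirrors the FFI body "(x) => x + 1": plain double-precision addition.
const incrUnchecked = (x: number): number => x + 1;

// `| 0` converts the result to a 32-bit signed integer,
// so it wraps around like two's complement arithmetic.
const incrWrapped = (x: number): number => (x + 1) | 0;

console.log(incrUnchecked(INT_MAX)); // 2147483648, outside the Int range
console.log(incrWrapped(INT_MAX)); // -2147483648, same as MoonBit's wrap-around
```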

External JavaScript Types

Of course, the JavaScript world is much richer than these basic types. We will quickly encounter undefined, null, symbol, and various complex host objects, which have no direct counterparts in MoonBit.

For this situation, MoonBit provides the #external annotation. This annotation acts as a contract, telling the compiler: "Please trust me, this type actually exists in the external world (JavaScript). You don't need to care about its internal structure, just treat it as an opaque handle."

For example, we can define a type that represents JavaScript's undefined like this:

#external
type Undefined

extern "js" fn Undefined::new() -> Self = "() => undefined"

However, a standalone Undefined type isn't very useful, as undefined typically appears as part of a union type, like string | undefined.

A more practical approach is to create an Optional[T] type that precisely maps to T | undefined in JavaScript, and which can be easily converted to and from MoonBit's built-in Option[T] (aliased as T?).

To achieve this, we first need a type to represent any JavaScript value, similar to TypeScript's any. This is where #external is useful:

#external
pub type Value

Consequently, we need methods to get the undefined value and to check if a given value is undefined:

extern "js" fn Value::undefined() -> Value =
  #| () => undefined

extern "js" fn Value::is_undefined(self : Self) -> Bool =
  #| (n) => Object.is(n, undefined)

For easier debugging, we'll implement the Show trait for our Value type, allowing it to be printed:

pub impl Show for Value with output(self, logger) {
  logger.write_string(self.to_string())
}

pub extern "js" fn Value::to_string(self : Value) -> String =
  #| (self) =>
  #|   self === undefined ? 'undefined'
  #|   : self === null ? 'null'
  #|   : self.toString()
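The two special cases in the JavaScript body matter because calling a method on undefined or null throws a TypeError. A minimal TypeScript sketch of the same logic, where the name valueToString is hypothetical:

```typescript
// Same branching as the FFI body above, written as a standalone function.
function valueToString(value: unknown): string {
  if (value === undefined) return "undefined";
  if (value === null) return "null";
  // Any other ordinary JavaScript value is assumed to provide toString().
  return (value as { toString(): string }).toString();
}

console.log(valueToString(undefined)); // "undefined"
console.log(valueToString(null)); // "null"
console.log(valueToString(42)); // "42"
```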

Next comes the 'magic' of the conversion process. We'll define two special conversion functions:

fn[T] Value::cast_from(value : T) -> Value = "%identity"

fn[T] Value::cast(self : Self) -> T = "%identity"

What is %identity

%identity is a special intrinsic provided by MoonBit for zero-cost type casting. It performs type checking at compile time, but has no effect at runtime. It essentially tells the compiler: "Trust me, I know the real type of this value; just treat it as the target type."

This is a double-edged sword: it provides powerful expressiveness at the FFI boundary, but misuse can break type safety. Therefore, its use should be strictly limited to FFI-related code.
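TypeScript's type assertions offer a rough analogy for this risk, since they are also checked only at compile time and erased from the emitted JavaScript. The sketch below is an analogy rather than a description of how %identity is compiled; castFrom and cast are hypothetical stand-ins for Value::cast_from and Value::cast.

```typescript
// Hypothetical stand-in for the opaque Value type.
type Value = unknown;

function castFrom<T>(value: T): Value {
  return value; // no runtime work
}

function cast<T>(value: Value): T {
  return value as T; // compile-time assertion only, nothing is checked at runtime
}

// A correct round trip preserves the value.
const roundTrip = cast<number>(castFrom(42));

// A wrong cast still type-checks, but the runtime value remains a string.
const wrong = cast<number>(castFrom("hello"));

console.log(typeof roundTrip); // "number"
console.log(typeof wrong); // "string"
```

The second cast shows how misuse can break type safety: the compiler believes wrong is a number, and the mistake only surfaces later, far from the cast.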

With these building blocks, we can construct Optional[T]:

#external
type Optional[_] // Corresponds to T | undefined

/// Create an undefined Optional
fn[T] Optional::undefined() -> Optional[T] {
  Value::undefined().cast()
}

/// Check if an Optional is undefined
fn[T] Optional::is_undefined(self : Optional[T]) -> Bool {
  self |> Value::cast_from |> Value::is_undefined
}

/// Unwrap T from Optional[T], panic if it is undefined
fn[T] Optional::unwrap(self : Self[T]) -> T {
  guard !self.is_undefined() else {
    abort("Cannot unwrap an undefined value")
  }
  Value::cast_from(self).cast()
}

/// Convert Optional[T] to MoonBit's built-in T?
fn[T] Optional::to_option(self : Optional[T]) -> T? {
  guard !Value::cast_from(self).is_undefined() else { None }
  Some(Value::cast_from(self).cast())
}

/// Create Optional[T] from MoonBit's built-in T?
fn[T] Optional::from_option(value : T?) -> Optional[T] {
  guard value is Some(v) else { Optional::undefined() }
  Value::cast_from(v).cast()
}

test "Optional from and to Option" {
  let optional = Optional::from_option(Some(3))
  inspect(optional.unwrap(), content="3")
  inspect(optional.is_undefined(), content="false")
  inspect(optional.to_option(), content="Some(3)")
  let optional : Optional[Int] = Optional::from_option(None)
  inspect(optional.is_undefined(), content="true")
  inspect(optional.to_option(), content="None")
}

With this setup, we've successfully crafted a safe and ergonomic representation for T | undefined within MoonBit's type system. The same method can also be used to interface with other JavaScript-specific types like null, symbol, RegExp, etc.
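Seen from the JavaScript side, the conversion is a small mapping between T | undefined and a tagged option. The TypeScript sketch below models it under that assumption; the Option type and the helper names are hypothetical and only mirror Optional::to_option and Optional::from_option.

```typescript
// Hypothetical TypeScript model of MoonBit's Option[T].
type Option<T> = { tag: "Some"; value: T } | { tag: "None" };

function some<T>(value: T): Option<T> {
  return { tag: "Some", value };
}

function none<T>(): Option<T> {
  return { tag: "None" };
}

// T | undefined -> Option<T>, using the same undefined check as Value::is_undefined.
function toOption<T>(optional: T | undefined): Option<T> {
  return Object.is(optional, undefined) ? none<T>() : some(optional as T);
}

// Option<T> -> T | undefined
function fromOption<T>(option: Option<T>): T | undefined {
  return option.tag === "Some" ? option.value : undefined;
}

console.log(toOption(3)); // { tag: "Some", value: 3 }
console.log(fromOption(none<number>())); // undefined
```

One limitation of this representation is that it cannot distinguish a present value that is itself undefined from an absent one, so nesting such optionals loses information. Note also that null is not undefined and is therefore treated as a present value.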

Handling JavaScript Errors

A robust FFI layer must handle errors gracefully. By default, if JavaScript code throws an exception during an FFI call, it won't be caught by MoonBit's try-catch mechanism. Instead, it will crash the entire program:

// This is an FFI call that will throw an exception
extern "js" fn boom_naive() -> Value raise = "(u) => undefined.toString()"

test "boom_naive" {
  // This code will directly crash the test process instead of returning a `Result` via `try?`
  inspect(try? boom_naive()) // failed: TypeError: Cannot read properties of undefined (reading 'toString')
}

The correct approach is to wrap the call in a try...catch block on the JavaScript side, and then pass either the successful result or the caught error back to MoonBit. While we could do this directly in the JavaScript code of our extern "js" declaration, a more reusable solution exists:

First, let's define an Error_ type to encapsulate JavaScript errors:

suberror Error_ Value

pub impl Show for Error_ with output(self, logger) {
  logger.write_string("@js.Error: ")
  let Error_(inner) = self
  logger.write_object(inner)
}

Next, we'll define a core FFI wrapper function, Error_::wrap_ffi. Its role is to execute an operation (op) in the JavaScript realm and, depending on the outcome, call either a success (on_ok) or error (on_error) callback:

extern "js" fn Error_::wrap_ffi(
  op : () -> Value,
  on_ok : (Value) -> Unit,
  on_error : (Value) -> Unit,
) -> Unit =
  #| (op, on_ok, on_error) => { try { on_ok(op()); } catch (e) { on_error(e); } }

Finally, using this FFI function and MoonBit closures, we can create a more idiomatic Error_::wrap function that returns a T raise Error_:

fn[T] Error_::wrap(
  op : () -> Value,
  map_ok~ : (Value) -> T = Value::cast,
) -> T raise Error_ {
  // Define a variable to pass the result in and out of the closure
  let mut res : Result[Value, Error_] = Ok(Value::undefined())
  // Call the FFI, passing two closures that will modify the value of res based on the JS execution result
  Error_::wrap_ffi(op, fn(v) { res = Ok(v) }, fn(e) { res = Err(Error_(e)) })
  // Check the value of res and return the corresponding result or throw an error
  match res {
    Ok(v) => map_ok(v)
    Err(e) => raise e
  }
}

Now, we can safely call the function that previously threw an exception, and we can handle possible errors with pure MoonBit code:

extern "js" fn boom() -> Value = "(u) => undefined.toString()"

test "boom" {
  let result = try? Error_::wrap(boom)
  inspect(
    (result : Result[Value, Error_]),
    content="Err(@js.Error: TypeError: Cannot read properties of undefined (reading 'toString'))",
  )
}

Interfacing with External JavaScript APIs

Having mastered the key techniques for bridging types and handling errors, it's time to turn our attention to the wider world: the Node.js and NPM ecosystem. The entry point to all of it is a binding for the require() function:

extern "js" fn require_ffi(path : String) -> Value = "(path) => require(path)"

/// A more convenient wrapper that supports chained property access, e.g., require("a", keys=["b", "c"])
pub fn require(path : String, keys~ : Array[String] = []) -> Value {
  keys.fold(init=require_ffi(path), Value::get_with_string)
}

// ... where the definition of Value::get_with_string is as follows:
fn[T] Value::get_with_string(self : Self, key : String) -> T {
  self.get_ffi(Value::cast_from(key)).cast()
}

extern "js" fn Value::get_ffi(self : Self, key : Self) -> Self = "(obj, key) => obj[key]"
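On the JavaScript side, the fold over keys amounts to a chain of property lookups. The following sketch mirrors that behavior with `Array.prototype.reduce`; `resolveKeys` and `fakeModule` are hypothetical names, and a plain object stands in for the value that `require(path)` would return.

```javascript
// Mirrors keys.fold(init=root, Value::get_with_string): start from a root object
// and follow each key in turn.
function resolveKeys(root, keys = []) {
  return keys.reduce((obj, key) => obj[key], root);
}

// Stand-in for a loaded module (hypothetical; in the article the root is require(path)).
const fakeModule = { a: { b: { c: (x) => x + 1 } } };

const same = resolveKeys(fakeModule); // no keys: the module itself
const inc = resolveKeys(fakeModule, ["a", "b", "c"]); // fakeModule.a.b.c

console.log(same === fakeModule, inc(41)); // true 42
```

One design consequence worth keeping in mind: a function retrieved this way is detached from its parent object, so library functions that rely on `this` may need extra binding before they are cast to a MoonBit function type.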

With this require function, we can easily load Node.js's built-in modules, such as the node:path module, and call its methods:

// Load the basename function of the node:path module
let basename : (String) -> String = require("node:path", keys=["basename"]).cast()

test "require Node API" {
  inspect(basename("/foo/bar/baz/asdf/quux.html"), content="quux.html")
}

More excitingly, we can use the same method to call the vast collection of third-party libraries on NPM. Let's take simple-statistics, a popular statistics library, as an example.

First, we need to initialize package.json and install dependencies, just like in a standard JavaScript project. Here we use pnpm, but npm or yarn work as well:

> pnpm init
> pnpm install simple-statistics

Once the preparation is complete, we can directly require this library in our MoonBit code and get the standardDeviation function from it:

let standard_deviation : (Array[Double]) -> Double = require(
  "simple-statistics",
  keys=["standardDeviation"],
).cast()

Now, whether we use moon run or moon test, MoonBit can correctly load dependencies via Node.js and execute the code, returning the expected result.

test "require external lib" {
  inspect(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]), content="2")
}
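The expected content "2" can be checked by hand. The arithmetic below assumes that standardDeviation computes the population standard deviation (dividing by the number of samples N rather than N - 1), which is the reading consistent with the value in the test:

```latex
\begin{align}
\mu &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 \\
\sigma^2 &= \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4 \\
\sigma &= \sqrt{4} = 2
\end{align}
```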

This is quite powerful: with just a few lines of FFI code, we've connected MoonBit's type-safe world with NPM's vast and mature ecosystem.

Conclusion

In this article, we've explored the fundamentals of interacting with JavaScript in MoonBit, from basic type interfacing to error handling and, finally, the integration of external libraries. These features bridge the gap between MoonBit's static type system and JavaScript's dynamic typing: developers keep MoonBit's type safety and modern language features while gaining access to the vast JavaScript ecosystem.

Of course, with great power comes great responsibility. While the FFI is powerful, we must handle type conversions and error boundaries carefully to ensure program robustness.

Mastering these FFI techniques is a crucial skill for developers wanting to extend MoonBit applications with JavaScript libraries. By applying these techniques, we can build high-quality applications that leverage both the strengths of MoonBit and the rich resources of the JavaScript ecosystem.

To learn more about MoonBit's ongoing progress in JavaScript interoperability, please check out the web frontend of mooncakes.io and its underlying UI library, rabbit-tea, both built with MoonBit.

Two Approaches to Regex Engines: Derivative and Thompson VM

· 11 min read

Regular expression engines can be implemented using fundamentally different approaches, each with distinct trade-offs in performance, memory usage, and implementation complexity. This article explores two mathematically equivalent but practically different methods for regex matching: Brzozowski derivatives and Thompson's virtual machine approach.

Both methods operate on the same abstract syntax tree representation, providing a unified foundation for direct performance comparison. The key insight is how these seemingly different approaches solve identical problems through different computational strategies—one through algebraic transformation, the other through program execution.

Conventions & Definitions

To establish a common foundation, both regex engines start with a shared AST representation that captures the essential structure of regular expressions in a tree format:

enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)

Additionally, we provide smart constructors to simplify regex construction:

fn Ast::chr(chr : Char) -> Ast {
  Chr(chr)
}

fn Ast::seq(self : Ast, other : Ast) -> Ast {
  Seq(self, other)
}

fn Ast::rep(self : Ast, n? : Int) -> Ast {
  Rep(self, n)
}

fn Ast::opt(self : Ast) -> Ast {
  Opt(self)
}

The AST defines four fundamental regex operations:

  1. Chr(Char) matches a single literal character.
  2. Seq(Ast, Ast) matches one pattern followed by another through concatenation.
  3. Rep(Ast, Int?) repeats a pattern either unlimited times when None or exactly n times when Some(n).
  4. Opt(Ast) makes a pattern optional, equivalent to pattern? in standard regex syntax.

For example, we can build the regex (ab*)?—an optional sequence of 'a' followed by zero or more 'b's—as:

Ast::chr('a').seq(Ast::chr('b').rep()).opt()

Brzozowski Derivative

The derivative-based approach transforms regular expressions algebraically using formal language theory. For each input character, it computes the "derivative" of the regex by asking: "what remains to be matched after consuming this character?" This creates a new regex representing the remaining pattern.

We extend the basic Ast type to represent derivatives and nullability explicitly:

enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)

The constructors in Exp represent:

  1. Nil represents an impossible pattern that can never match anything.
  2. Eps matches the empty string.
  3. Chr(Char) matches a single character.
  4. Alt(Exp, Exp) represents alternation, providing choice between patterns.
  5. Seq(Exp, Exp) represents concatenation of two patterns.
  6. Rep(Exp) represents repetition of a pattern.

We use the Exp::of_ast function to convert the Ast into the more expressive Exp format:

fn Exp::of_ast(ast : Ast) -> Exp {
  match ast {
    Chr(c) => Chr(c)
    Seq(a, b) => Seq(Exp::of_ast(a), Exp::of_ast(b))
    Rep(a, None) => Rep(Exp::of_ast(a))
    Rep(a, Some(n)) => {
      let sec = Exp::of_ast(a)
      let mut exp = sec
      for _ in 1..<n {
        exp = Seq(exp, sec)
      }
      exp
    }
    Opt(a) => Alt(Exp::of_ast(a), Eps)
  }
}

We also provide smart constructors for Exp to simplify pattern building:

fn Exp::seq(a : Exp, b : Exp) -> Exp {
  match (a, b) {
    (Nil, _) | (_, Nil) => Nil
    (Eps, b) => b
    (a, Eps) => a
    (a, b) => Seq(a, b)
  }
}

However, the smart constructor for Alt is strictly necessary: it ensures that the constructed Exp is normalized with respect to "similarity", as defined in the original paper by Brzozowski. Two regexes are similar if one can be reduced to the other by applying the following rules:

\begin{align}
& A \mid \emptyset &&\rightarrow A \\
& A \mid B &&\rightarrow B \mid A \\
& A \mid (B \mid C) &&\rightarrow (A \mid B) \mid C
\end{align}

Therefore, we normalize the Alt construction to always use the same associativity and order of alternatives:

fn Exp::alt(a : Exp, b : Exp) -> Exp {
  match (a, b) {
    (Nil, b) => b
    (a, Nil) => a
    (Alt(a, b), c) => a.alt(b.alt(c))
    (a, b) => {
      if a == b {
        a
      } else if a > b {
        Alt(b, a)
      } else {
        Alt(a, b)
      }
    }
  }
}

The nullable function determines if a pattern can match the empty string without consuming input:

fn Exp::nullable(self : Exp) -> Bool {
  match self {
    Nil => false
    Eps => true
    Chr(_) => false
    Alt(l, r) => l.nullable() || r.nullable()
    Seq(l, r) => l.nullable() && r.nullable()
    Rep(_) => true
  }
}

The deriv function computes the derivative of a pattern with respect to a character, transforming the pattern based on the rules defined in the Brzozowski derivative. We have reordered the rules to match the order in the deriv function:

\begin{align}
D_{a} \emptyset &= \emptyset \\
D_{a} \epsilon &= \emptyset \\
D_{a} a &= \epsilon \\
D_{a} b &= \emptyset & \text{ for }(a \neq b) \\
D_{a} (P \mid Q) &= (D_{a} P) \mid (D_{a} Q) \\
D_{a} (P \cdot Q) &= (D_{a} P \cdot Q) \mid (\nu(P) \cdot D_{a} Q) \\
D_{a} (P\ast) &= D_{a} P \cdot P\ast
\end{align}

fn Exp::deriv(self : Exp, c : Char) -> Exp {
  match self {
    Nil => self
    Eps => Nil
    Chr(d) if d == c => Eps
    Chr(_) => Nil
    Alt(l, r) => l.deriv(c).alt(r.deriv(c))
    Seq(l, r) => {
      let dl = l.deriv(c)
      if l.nullable() {
        dl.seq(r).alt(r.deriv(c))
      } else {
        dl.seq(r)
      }
    }
    Rep(e) => e.deriv(c).seq(self)
  }
}

To simplify our implementation, we only perform strict matching—the pattern must match the entire input string. Therefore, we only check for nullability after the entire input has been consumed:

fn Exp::matches(self : Exp, s : String) -> Bool {
  loop (self, s.view()) {
    (Nil, _) => {
      return false
    }
    (e, []) => {
      return e.nullable()
    }
    (e, [c, .. s]) => {
      continue (e.deriv(c), s)
    }
  }
}
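As a worked example, here is how the rules play out when the regex (ab*)? from the Conventions section is matched against the input "ab". The derivation writes the regex as R and treats ν(P) as ε when P is nullable and ∅ otherwise; the simplifications correspond to what the Exp::seq and Exp::alt smart constructors do with Eps and Nil.

```latex
\begin{align}
R &= (a \cdot b\ast) \mid \epsilon \\
D_{a} R &= D_{a} (a \cdot b\ast) \mid D_{a} \epsilon \\
&= \big( (D_{a} a \cdot b\ast) \mid (\nu(a) \cdot D_{a} (b\ast)) \big) \mid \emptyset \\
&= (\epsilon \cdot b\ast) \mid \emptyset \mid \emptyset && \text{since } \nu(a) = \emptyset \\
&= b\ast \\
D_{b} (b\ast) &= D_{b} b \cdot b\ast = \epsilon \cdot b\ast = b\ast \\
\nu(b\ast) &= \epsilon
\end{align}
```

After both characters have been consumed, the remaining regex is b*, which is nullable, so matches returns true for "ab".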

Virtual Machine

The VM approach compiles regular expressions into bytecode instructions for a simple virtual machine. This method transforms the pattern-matching problem into program execution, where the VM simulates all possible paths through a non-deterministic finite automaton simultaneously.

Ken Thompson's 1968 paper described a regex engine that compiled patterns into IBM 7094 machine code. The key insight was to avoid exponential backtracking by maintaining multiple execution threads that advance through input in lockstep, processing one character at a time across all possible matching paths.

Instruction Set and Program Representation

The VM operates on four fundamental instructions that correspond to NFA operations:

enum Ops {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show)

Each instruction serves a specific purpose in NFA simulation. Done marks successful completion of pattern matching, equivalent to Thompson's original match. Char(c) consumes input character c and advances to the next instruction. Jump(addr) jumps unconditionally to the instruction at address addr (Thompson's jmp). Fork(addr) creates two execution paths: one continues to the next instruction, the other jumps to addr (Thompson's split).

The Fork instruction is crucial for handling non-determinism in patterns like alternation and repetition, where multiple execution paths must be explored simultaneously. This maps directly to NFA ε-transitions, where execution can spontaneously branch without consuming input.

We define a Prg that wraps an array of instructions with convenience methods for building and manipulating bytecode programs.

struct Prg(Array[Ops]) derive(Show)

fn Prg::push(self : Prg, inst : Ops) -> Unit {
  self.0.push(inst)
}

fn Prg::length(self : Prg) -> Int {
  self.0.length()
}

fn Prg::op_set(self : Prg, index : Int, inst : Ops) -> Unit {
  self.0[index] = inst
}

AST Compilation to Bytecode

The Prg::of_ast function translates AST patterns into VM instructions using standard NFA construction techniques:

  1. Seq(a, b):

    code for a
    code for b
    
  2. Rep(a, None) (unbounded repetition):

        Fork L1, L2
    L1: code for a
        Jump L1
    L2:
    
  3. Rep(a, Some(n)) (fixed repetition):

    code for a
    code for a
    ... (n times) ...
    
  4. Opt(a) (optional):

        Fork L1, L2
    L1: code for a
    L2:
    

Note that, unlike the two-label Fork L1, L2 in the listings above, the Fork constructor only accepts one address: L1 is always the instruction immediately after the Fork, so only L2 needs to be stored.

fn Prg::of_ast(ast : Ast) -> Prg {
  fn compile(prog : Prg, ast : Ast) -> Unit {
    match ast {
      Chr(chr) => prog.push(Char(chr))
      Seq(l, r) => {
        compile(prog, l)
        compile(prog, r)
      }
      Rep(e, None) => {
        let fork = prog.length()
        prog.push(Fork(0))
        compile(prog, e)
        prog.push(Jump(fork))
        prog[fork] = Fork(prog.length())
      }
      Rep(e, Some(n)) =>
        for _ in 0..<n {
          compile(prog, e)
        }
      Opt(e) => {
        let fork_inst = prog.length()
        prog.push(Fork(0))
        compile(prog, e)
        prog[fork_inst] = Fork(prog.length())
      }
    }
  }

  let prog : Prg = []
  compile(prog, ast)
  prog.push(Done)
  prog
}
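To make the backpatching of the Fork target concrete, here is a minimal, self-contained sketch that compiles a* by hand. The names Re, Inst, and emit are hypothetical stand-ins for the article's Ast, Ops, and compile, and only the character and unbounded-repetition cases are modeled.

```moonbit
// Hypothetical mini-AST: a single character or an unbounded repetition.
enum Re {
  Chr(Char)
  Star(Re)
}

// Hypothetical instruction set with the same shape as the article's Ops.
enum Inst {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show, Eq)

fn emit(prog : Array[Inst], re : Re) -> Unit {
  match re {
    Chr(c) => prog.push(Char(c))
    Star(e) => {
      let fork = prog.length() // remember where the Fork lives
      prog.push(Fork(0)) // placeholder target, patched below
      emit(prog, e)
      prog.push(Jump(fork)) // loop back to the Fork
      prog[fork] = Fork(prog.length()) // exit target: first instruction after the loop
    }
  }
}

test "compile a*" {
  let prog : Array[Inst] = []
  emit(prog, Star(Chr('a')))
  prog.push(Done)
  let expected : Array[Inst] = [Fork(3), Char('a'), Jump(0), Done]
  assert_eq(prog, expected)
}
```

The placeholder Fork(0) is needed because the exit address is only known once the loop body has been emitted.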

VM Execution Loop

In Rob Pike's implementation, the VM runs one step past the end of the input string to handle the final acceptance state. To make this explicit, our matches function implements the core VM execution loop in two phases:

Phase 1 handles character processing. For each input character, it processes all active threads in the current context. Char instructions that match the current character create new threads in the next context. Jump and Fork instructions immediately spawn new threads in the current context. After processing all threads, it swaps contexts and continues with the next character.

Phase 2 handles final acceptance. After consuming all input, it processes remaining threads looking for Done instructions. It handles any final Jump/Fork instructions that don't consume input. It returns true if any thread reaches a Done instruction.

fn Prg::matches(self : Prg, data : @string.View) -> Bool {
  let Prg(prog) = self
  let mut curr = Ctx::new(prog.length())
  let mut next = Ctx::new(prog.length())
  curr.add(0)
  for c in data {
    while curr.pop() is Some(pc) {
      match prog[pc] {
        Done => ()
        Char(char) if char == c => {
          next.add(pc + 1)
        }
        Jump(jump) => curr.add(jump)
        Fork(fork) => {
          curr.add(fork)
          curr.add(pc + 1)
        }
        _ => ()
      }
    }
    let temp = curr
    curr = next
    next = temp
    next.reset()
  }
  while curr.pop() is Some(pc) {
    match prog[pc] {
      Done => return true
      Jump(x) => curr.add(x)
      Fork(x) => {
        curr.add(x)
        curr.add(pc + 1)
      }
      _ => ()
    }
  }
  false
}
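The two phases can also be seen in isolation in the following self-contained sketch. It is a simplified variation, not the article's implementation: Inst, schedule, and run are hypothetical names, plain arrays replace Ctx, and a pc stays marked as seen for the whole step instead of being unmarked when it is popped.

```moonbit
// Hypothetical instruction set mirroring the article's Ops.
enum Inst {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
}

// Schedule pc on the worklist unless it was already scheduled in this step.
fn schedule(list : Array[Int], seen : FixedArray[Bool], pc : Int) -> Unit {
  if !seen[pc] {
    seen[pc] = true
    list.push(pc)
  }
}

fn run(prog : Array[Inst], input : String) -> Bool {
  let seen = FixedArray::make(prog.length(), false)
  let mut curr : Array[Int] = []
  schedule(curr, seen, 0)
  // Phase 1: consume the input one character at a time.
  for c in input {
    let next : Array[Int] = []
    while curr.pop() is Some(pc) {
      match prog[pc] {
        Char(ch) if ch == c => next.push(pc + 1)
        Jump(target) => schedule(curr, seen, target)
        Fork(target) => {
          schedule(curr, seen, target)
          schedule(curr, seen, pc + 1)
        }
        _ => ()
      }
    }
    // Start the next step with a fresh "seen" set covering the new threads.
    seen.fill(false)
    for pc in next {
      seen[pc] = true
    }
    curr = next
  }
  // Phase 2: no input left, follow Jump/Fork and look for Done.
  while curr.pop() is Some(pc) {
    match prog[pc] {
      Done => return true
      Jump(target) => schedule(curr, seen, target)
      Fork(target) => {
        schedule(curr, seen, target)
        schedule(curr, seen, pc + 1)
      }
      _ => ()
    }
  }
  false
}

test "a*b" {
  // Hand-assembled program for a*b.
  let prog : Array[Inst] = [Fork(3), Char('a'), Jump(0), Char('b'), Done]
  inspect(run(prog, "aab"), content="true")
  inspect(run(prog, "b"), content="true")
  inspect(run(prog, "aa"), content="false")
}
```

Because each pc is processed at most once per step in this variation, the work per input character is bounded by the program length.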

In the original blog post, Rob Pike uses a recursive function to handle Fork and Jump instructions so that threads are executed according to their priorities. Instead, we use a stack-like structure to manage all threads of execution, which naturally respects thread priority:

struct Ctx {
  deque : @deque.Deque[Int]
  visit : FixedArray[Bool]
}

fn Ctx::new(length : Int) -> Ctx {
  { deque: @deque.new(), visit: FixedArray::make(length, false) }
}

fn Ctx::add(self : Ctx, pc : Int) -> Unit {
  if !self.visit[pc] {
    self.deque.push_back(pc)
    self.visit[pc] = true
  }
}

fn Ctx::pop(self : Ctx) -> Int? {
  match self.deque.pop_back() {
    Some(pc) => {
      self.visit[pc] = false
      Some(pc)
    }
    None => None
  }
}

fn Ctx::reset(self : Ctx) -> Unit {
  self.deque.clear()
  self.visit.fill(false)
}

The visit array is used to drop low-priority threads. When a new thread is added, we first check if it is already in the deque using the visit array. If it is, we drop it; otherwise, we add it to the deque and mark it as visited. This mechanism is necessary to avoid infinite loops or exponential blowup when the regex contains patterns that can be expanded indefinitely, such as (a?)*.
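The deduplication step can be illustrated with a small self-contained sketch. Worklist is a hypothetical stand-in for Ctx that uses a plain Array as its stack, so only the add logic is shown.

```moonbit
// Hypothetical stand-in for Ctx: a stack of pending pcs plus a membership bitmap.
struct Worklist {
  stack : Array[Int]
  queued : FixedArray[Bool]
}

fn Worklist::new(size : Int) -> Worklist {
  { stack: [], queued: FixedArray::make(size, false) }
}

fn Worklist::add(self : Worklist, pc : Int) -> Unit {
  // Only enqueue a pc that is not already waiting on the stack.
  if !self.queued[pc] {
    self.stack.push(pc)
    self.queued[pc] = true
  }
}

test "duplicate threads are dropped" {
  let w = Worklist::new(4)
  w.add(2)
  w.add(1)
  w.add(2) // already queued, so this call is a no-op
  inspect(w.stack, content="[2, 1]")
}
```

The membership check is a constant-time array lookup, which is why a FixedArray[Bool] indexed by pc is used rather than a search through the stack.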

Benchmarks and Performance Analysis

The benchmark compares both approaches on a pathological case that challenges many regex implementations:

test (b : @bench.T) {
  let n = 15
  let txt = "a".repeat(n)
  let chr = Ast::chr('a')
  let ast : Ast = chr.opt().rep(n~).seq(chr.rep(n~))
  let exp = Exp::of_ast(ast)
  b.bench(name="derive", () => exp.matches(txt) |> ignore())
  let tvm = Prg::of_ast(ast)
  b.bench(name="thompson", () => tvm.matches(txt) |> ignore())
}

The pattern (a?){n}a{n}, matched against a string of n 'a' characters, is a classic exponential blowup case for backtracking engines. Each of the n optional a? terms can either consume a character or match nothing, so a naive implementation may explore up to 2^n combinations before it finds the one that succeeds.
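To get a feel for the size of that search space, the sketch below models a naive greedy backtracking matcher for this pattern and counts its recursive calls. It is an illustration only: backtrack is a hypothetical helper, and the a{n} tail is simplified to the check that exactly need characters remain once every a? has been decided.

```moonbit
// opts  : number of a? terms still to decide
// chars : number of 'a' characters still unconsumed
// need  : number of characters the trailing a{n} requires
// Returns (matched, number of calls made).
fn backtrack(opts : Int, chars : Int, need : Int) -> (Bool, Int) {
  if opts == 0 {
    return (chars == need, 1)
  }
  let mut calls = 1
  // Greedy choice first: let this a? consume one character.
  if chars > 0 {
    let (ok, sub) = backtrack(opts - 1, chars - 1, need)
    calls += sub
    if ok {
      return (true, calls)
    }
  }
  // Fallback: let this a? match the empty string.
  let (ok, sub) = backtrack(opts - 1, chars, need)
  (ok, calls + sub)
}

test "backtracking visits the whole choice tree" {
  let (ok, calls) = backtrack(3, 3, 3)
  inspect(ok, content="true")
  inspect(calls, content="15") // 2^(3+1) - 1 nodes of the binary choice tree
}
```

In this model the only successful assignment is the one where every a? matches nothing, and the greedy order tries it last, so the call count is 2^(n+1) - 1.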

name     time (mean ± σ)         range (min … max)
derive     41.78 µs ±   0.14 µs    41.61 µs …  42.13 µs  in 10 ×   2359 runs
thompson   12.79 µs ±   0.04 µs    12.74 µs …  12.84 µs  in 10 ×   7815 runs

The benchmark results show that the VM approach is significantly faster than the derivative-based approach for this case. The derivative method frequently allocates intermediate regex structures, leading to higher overhead and slower performance. In contrast, the VM executes a fixed set of instructions and rarely allocates new structures once the deque grows to its full size.

However, the derivative approach is easier to reason about. We can easily prove termination of the algorithm, as the number of derivatives to be computed is bounded by the size of the AST and strictly decreases with each recursive application of the deriv function. The VM approach, on the other hand, can potentially run indefinitely if the input Prg contains infinite loops, and requires careful handling of thread priority to avoid infinite loops and exponential blowup in the number of threads.

Prettyprinter: Declarative Structured Data Formatting with Function Composition

· 8 min read

When working with structured data, printing it in a clear and adaptable format is a common challenge. This comes up often in debugging, logging, and code generation. For instance, an array literal [a,b,c] should ideally print on one line if the screen is wide enough, but gracefully wrap and indent when space is limited.

Traditional solutions often rely on manually concatenating strings while tracking indentation levels. This approach is not only tedious, but also error-prone.

A more elegant solution is to use function composition. With this approach, we build a prettyprinter: a system where users combine primitive formatting functions into a Doc structure that describes the intended layout. Given a maximum width, the prettyprinter automatically chooses the most readable formatting.

This makes the printing process declarative—you specify what the layout should look like under different conditions, and the system figures out how to render it.

SimpleDoc Primitives

We begin with a minimal representation called SimpleDoc. It consists of just four primitives:

enum SimpleDoc {
  Empty
  Line
  Text(String)
  Cat(SimpleDoc, SimpleDoc)
}
  • Empty: represents an empty string
  • Line: represents a newline
  • Text(String): plain text without line breaks
  • Cat(SimpleDoc, SimpleDoc): concatenates two SimpleDocs

Using these primitives, we can implement a simple rendering function. It flattens a SimpleDoc into a string using a stack-based traversal:

fn SimpleDoc::render(doc : SimpleDoc) -> String {
  let buf = StringBuilder::new()
  let stack = [doc]
  while stack.pop() is Some(doc) {
    match doc {
      Empty => ()
      Line => {
        buf..write_string("\n")
      }
      Text(text) => {
        buf.write_string(text)
      }
      Cat(left, right) => stack..push(right)..push(left)
    }
  }
  buf.to_string()
}

A quick test shows that SimpleDoc is exactly as expressive as String: Empty corresponds to "", Line corresponds to "\n", Text("a") corresponds to "a", and Cat(Text("a"), Text("b")) corresponds to "a" + "b".

test "simple doc" {
  let doc : SimpleDoc = Cat(Text("hello"), Cat(Line, Text("world")))
  inspect(
    doc.render(),
    content=(
      #|hello
      #|world
    ),
  )
}

At this stage, the SimpleDoc doesn’t yet handle indentation or layout choices—but we’re about to fix that.

ExtendDoc: Nest, Choice, Group

To handle real-world formatting, we extend SimpleDoc with three new primitives:

enum ExtendDoc {
  Empty
  Line
  Text(String)
  Cat(ExtendDoc, ExtendDoc)
  Nest(Int, ExtendDoc)
  Choice(ExtendDoc, ExtendDoc)
  Group(ExtendDoc)
}
  • Nest: Nest(n, doc) indents doc by n additional spaces after each line break. Nested levels accumulate.

  • Choice: Choice(ExtendDoc, ExtendDoc) stores two alternative layouts. Usually, the first is the more compact layout without line breaks, and the second is the layout with Lines. The renderer uses the first layout in compact mode and the second otherwise.

  • Group: Group(ExtendDoc) wraps an ExtendDoc and chooses between the compact and non-compact layouts based on the available width. If the remaining space is sufficient, it prints compactly; otherwise, it falls back to the layout with line breaks.
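As a rough illustration of how these primitives combine, the sketch below builds a document for the array literal [a, b, c] and renders only its compact form. Doc is a hypothetical copy of ExtendDoc and compact is a hypothetical helper that always takes the first alternative of a Choice; it is not the article's renderer.

```moonbit
// Hypothetical copy of ExtendDoc so the example is self-contained.
enum Doc {
  Empty
  Line
  Text(String)
  Cat(Doc, Doc)
  Nest(Int, Doc)
  Choice(Doc, Doc)
  Group(Doc)
}

// Render as if everything fits: ignore indentation and pick the first,
// compact alternative of every Choice.
fn compact(doc : Doc) -> String {
  match doc {
    Empty => ""
    Line => "\n"
    Text(s) => s
    Cat(a, b) => compact(a) + compact(b)
    Nest(_, d) | Group(d) => compact(d)
    Choice(a, _) => compact(a)
  }
}

test "compact layout of an array literal" {
  // A separator that is ", " when compact and "," plus a line break otherwise.
  let sep : Doc = Cat(Text(","), Choice(Text(" "), Line))
  let body : Doc = Cat(Text("a"), Cat(sep, Cat(Text("b"), Cat(sep, Text("c")))))
  let doc : Doc = Group(Cat(Text("["), Cat(Nest(2, body), Text("]"))))
  inspect(compact(doc), content="[a, b, c]")
}
```

The same doc value describes both layouts; which one is printed is decided at render time from the available width.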

Measuring Space

To know whether compact layout fits, we need a way to estimate how many characters a document would require:

let max_space = 9999

fn ExtendDoc::space(self : Self) -> Int {
  match self {
    Empty => 0
    Line => max_space
    Text(str) => str.length()
    Cat(a, b) => a.space() + b.space()
    Nest(_, a) | Choice(a, _) | Group(a) => a.space()
  }
}

Here, Line is treated as requiring “infinite” space. This guarantees that if a Group contains a line break, it won’t attempt to print compactly.
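The effect of the "infinite" Line can be checked with a small self-contained sketch. Doc and space are hypothetical copies of ExtendDoc and ExtendDoc::space, and fits is an assumed helper showing one way a renderer might compare the estimate against the remaining width.

```moonbit
// Hypothetical copy of ExtendDoc so the example is self-contained.
enum Doc {
  Empty
  Line
  Text(String)
  Cat(Doc, Doc)
  Nest(Int, Doc)
  Choice(Doc, Doc)
  Group(Doc)
}

let max_space : Int = 9999

// Characters needed by the compact layout; a Line counts as "too wide".
fn space(doc : Doc) -> Int {
  match doc {
    Empty => 0
    Line => max_space
    Text(s) => s.length()
    Cat(a, b) => space(a) + space(b)
    Nest(_, a) | Choice(a, _) | Group(a) => space(a)
  }
}

// Assumed helper: does doc fit on the current line starting at column?
fn fits(doc : Doc, column : Int, width : Int) -> Bool {
  column + space(doc) <= width
}

test "a Line never fits compactly" {
  let flat : Doc = Cat(Text("[a,"), Text(" b]"))
  inspect(space(flat), content="6")
  inspect(fits(flat, 0, 80), content="true")
  let broken : Doc = Cat(Text("[a,"), Cat(Line, Text("b]")))
  inspect(fits(broken, 0, 80), content="false")
}
```

Using a large sentinel instead of a separate "contains a line break" flag keeps space a plain Int, at the cost of assuming no real line is ever 9999 characters wide.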

Rendering ExtendDoc

We extend SimpleDoc::render to implement ExtendDoc::render. Since after printing a substructure we need to return to the original indentation level, the stack must also store two states for each pending ExtendDoc: indentation and whether compact mode is active. We also maintain a column variable to track the number of characters already used on the current line, in order to calculate remaining space. Finally, the function adds a width parameter to specify the maximum line width.

fn ExtendDoc::render(doc : ExtendDoc, width~ : Int = 80) -> String {
  let buf = StringBuilder::new()
  let stack = [(0, false, doc)] // default: no indentation, non-compact mode
  let mut column = 0
  while stack.pop() is Some((indent, fit, doc)) {
    match doc {
      Empty => ()
      Line => {
        buf..write_string("\n")
        for _ in 0..<indent {
          buf.write_string(" ")
        }
        column = indent
      }
      Text(text) => {
        buf.write_string(text)
        column += text.
fn String::length(self : String) -> Int

Returns the number of UTF-16 code units in the string. Note that this is not necessarily equal to the number of Unicode characters (code points) in the string, as some characters may be represented by multiple UTF-16 code units.

Parameters:

  • string : The string whose length is to be determined.

Returns the number of UTF-16 code units in the string.

Example:

inspect("hello".length(), content="5")
inspect("🤣".length(), content="2") // Emoji uses two UTF-16 code units
inspect("".length(), content="0") // Empty string
length
()
}
(ExtendDoc, ExtendDoc) -> ExtendDoc
Cat
(
ExtendDoc
left
,
ExtendDoc
right
) =>
Array[(Int, Bool, ExtendDoc)]
stack
..
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
((
Int
indent
,
Bool
fit
,
ExtendDoc
right
))..
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
((
Int
indent
,
Bool
fit
,
ExtendDoc
left
))
(Int, ExtendDoc) -> ExtendDoc
Nest
(
Int
n
,
ExtendDoc
doc
) =>
Array[(Int, Bool, ExtendDoc)]
stack
..
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
((
Int
indent
fn Add::add(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
Int
n
,
Bool
fit
,
ExtendDoc
doc
))
(ExtendDoc, ExtendDoc) -> ExtendDoc
Choice
(
ExtendDoc
a
,
ExtendDoc
b
) =>
Array[(Int, Bool, ExtendDoc)]
stack
.
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(if
Bool
fit
{ (
Int
indent
,
Bool
fit
,
ExtendDoc
a
) } else { (
Int
indent
,
Bool
fit
,
ExtendDoc
b
) })
(ExtendDoc) -> ExtendDoc
Group
(
ExtendDoc
doc
) => {
let
Bool
fit
=
Bool
fit
(Bool, Bool) -> Bool
||
Int
column
fn Add::add(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
ExtendDoc
doc
.
fn ExtendDoc::space(self : ExtendDoc) -> Int
space
()
fn Compare::op_le(x : Int, y : Int) -> Bool
<=
Int
width
Array[(Int, Bool, ExtendDoc)]
stack
.
fn[T] Array::push(self : Array[T], value : T) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
((
Int
indent
,
Bool
fit
,
ExtendDoc
doc
))
} } }
StringBuilder
buf
.
fn StringBuilder::to_string(self : StringBuilder) -> String

Returns the current content of the StringBuilder as a string.

to_string
()
}

Let’s use ExtendDoc to describe (expr) and render it at different widths:

let softline : ExtendDoc = Choice(Empty, Line)

impl Add for ExtendDoc with op_add(a, b) {
  Cat(a, b)
}

test "tuple" {
  let tuple : ExtendDoc = Group(
    Text("(") + Nest(2, softline + Text("expr")) + softline + Text(")"),
  )
  inspect(tuple.render(width=40), content="(expr)")
  inspect(
    tuple.render(width=5),
    content=(
      #|(
      #|  expr
      #|)
    ),
  )
}

Here, softline is defined as a choice between Empty and Line. Since rendering starts in non-compact mode, we wrap the whole expression with Group. When the width is sufficient, the entire expression prints on one line; otherwise, it automatically wraps with indentation. To improve readability, we overloaded the + operator for ExtendDoc.
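To see why the enclosing Group matters, here is a minimal sketch of the same document without it. It assumes the ExtendDoc type, the + overload, softline and ExtendDoc::render from this article are in scope in the same package; the test name and the binding `bare` are made up for illustration. Because rendering starts in non-compact mode and nothing ever switches it on, every softline should resolve to Line even when the width is generous:

```moonbit
// Sketch only: relies on ExtendDoc, softline, `+` and render defined in this article.
test "softline without an enclosing Group" {
  // Same layout as `tuple`, but not wrapped in Group.
  let bare : ExtendDoc = Text("(") +
    Nest(2, softline + Text("expr")) +
    softline +
    Text(")")
  // No Group means `fit` stays false, so each Choice picks its second branch (Line).
  inspect(
    bare.render(width=40),
    content=(
      #|(
      #|  expr
      #|)
    ),
  )
}
```

Wrapping `bare` in Group restores the one-line output at width 40, as in the "tuple" test.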

Composition Functions

In practice, users rely more on higher-level combinators built from the ExtendDoc primitives—like the softline above. Let’s introduce some useful functions for structured printing.

softline & softbreak

let softbreak : ExtendDoc = Choice(Text(" "), Line)

Similar to softline, except that in compact mode it inserts a space. Note that within the same Group, all Choices follow the same compact or non-compact decision.

let abc : ExtendDoc = Text("abc")

let def : ExtendDoc = Text("def")

let ghi : ExtendDoc = Text("ghi")

test "softbreak" {
  let doc : ExtendDoc = Group(abc + softbreak + def + softbreak + ghi)
  inspect(doc.render(width=20), content="abc def ghi")
  inspect(
    doc.render(width=10),
    content=(
      #|abc
      #|def
      #|ghi
    ),
  )
}
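The rule that all Choices in one Group share a decision also implies that a nested Group decides for itself. The following sketch illustrates this under the assumption that abc, def, ghi, softbreak and render from this article are in scope; the test name is invented. The outer Group needs 11 columns and does not fit in 10, so its softbreak becomes a line break, while the inner Group needs only 7 columns and stays compact:

```moonbit
// Sketch only: relies on abc, def, ghi, softbreak and render defined in this article.
test "nested groups decide independently" {
  let doc : ExtendDoc = Group(abc + softbreak + Group(def + softbreak + ghi))
  // Wide enough for everything: both groups print compactly.
  inspect(doc.render(width=20), content="abc def ghi")
  // Too narrow for the outer group, but the inner group still fits after the break.
  inspect(
    doc.render(width=10),
    content=(
      #|abc
      #|def ghi
    ),
  )
}
```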

autoline & autobreak

let autoline : ExtendDoc = Group(softline)

let autobreak : ExtendDoc = Group(softbreak)

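autoline and autobreak fit as many ExtendDocs as possible on each line, similar to word wrapping in a text editor. Because each one is wrapped in its own Group, the choice between staying on the line and breaking is made separately at every occurrence rather than once for the whole enclosing Group.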

test {
  let doc : ExtendDoc = Group(
    abc + autobreak + def + autobreak + ghi,
  )
  inspect(doc.render(width=10), content="abc def ghi")
  inspect(
    doc.render(width=5),
    content=(
      #|abc def
      #|ghi
    ),
  )
  inspect(
    doc.render(width=3),
    content=(
      #|abc
      #|def
      #|ghi
    ),
  )
}

sepby

fn sepby(xs : Array[ExtendDoc], sep : ExtendDoc) -> ExtendDoc {
  match xs {
    [] => Empty
    [x, .. xs] => xs.fold(init=x, (a, b) => a + sep + b)
  }
}

sepby inserts a separator sep between ExtendDocs.
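The boundary cases follow directly from the two match arms. As a small sketch (assuming sepby, ExtendDoc and render from this article are in scope; the test name and the local `sep` are illustrative), an empty array should yield Empty, a single element should come back without any separator, and two elements should get exactly one:

```moonbit
// Sketch only: relies on sepby, ExtendDoc and render defined in this article.
test "sepby edge cases" {
  let sep : ExtendDoc = Text(",")
  // [] matches the first arm and renders as the empty string.
  inspect(sepby([], sep).render(), content="")
  // One element: the fold runs over an empty view and returns the initial value.
  inspect(sepby([Text("abc")], sep).render(), content="abc")
  // Two elements: a single separator is placed between them.
  inspect(sepby([Text("abc"), Text("def")], sep).render(), content="abc,def")
}
```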

let comma : ExtendDoc = Text(",")

test {
  let layout = Group(sepby([abc, def, ghi], comma + softbreak))
  inspect(layout.render(width=40), content="abc, def, ghi")
  inspect(
    layout.render(width=10),
    content=(
      #|abc,
      #|def,
      #|ghi
    ),
  )
}
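sepby also combines naturally with autobreak from the previous section. The sketch below is one possible way to build a word-wrapping paragraph; `fill_words` is a hypothetical helper that does not appear in the article, and the code assumes sepby, autobreak and render are in scope. Since every autobreak carries its own Group, no outer Group is needed:

```moonbit
// Hypothetical helper (not part of the article): joins words with autobreak.
fn fill_words(words : Array[String]) -> ExtendDoc {
  sepby(words.map(w => Text(w)), autobreak)
}

test "fill_words" {
  let doc = fill_words(["abc", "def", "ghi"])
  // Same expectations as the autobreak test above.
  inspect(doc.render(width=10), content="abc def ghi")
  inspect(
    doc.render(width=5),
    content=(
      #|abc def
      #|ghi
    ),
  )
}
```

As in the autobreak test, each decision only checks whether the separator itself still fits, so the last word on a line may run past the requested width.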

surround

fn surround(m : ExtendDoc, l : ExtendDoc, r : ExtendDoc) -> ExtendDoc {
  l + m + r
}

surround wraps an ExtendDoc with left and right delimiters.

test {
  inspect(surround(abc, Text("("), Text(")")).render(), content="(abc)")
}
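surround becomes more useful once the wrapped document carries its own soft breaks. The following sketch rebuilds the earlier tuple layout on top of it; `parens` is a hypothetical helper introduced only for this example, and the code assumes surround, softline and render from this article are in scope:

```moonbit
// Hypothetical helper (not part of the article): parenthesize with optional line breaks.
fn parens(inner : ExtendDoc) -> ExtendDoc {
  Group(surround(Nest(2, softline + inner) + softline, Text("("), Text(")")))
}

test "parens" {
  // Fits on one line: both softlines collapse to Empty.
  inspect(parens(Text("expr")).render(width=40), content="(expr)")
  // Needs 6 columns but only 5 are available: both softlines become line breaks.
  inspect(
    parens(Text("expr")).render(width=5),
    content=(
      #|(
      #|  expr
      #|)
    ),
  )
}
```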

Printing JSON

Using the functions above, we can implement a JSON pretty-printer. The function recursively processes each JSON element and generates the appropriate layout.

fn pretty(x : Json) -> ExtendDoc {
  fn comma_list(xs, l, r) {
    (Nest(2, softline + sepby(xs, comma + softbreak)) + softline)
    |> surround(l, r)
    |> Group
  }

  match x {
    Array(elems) => {
      let elems = elems.iter().map(pretty).collect()
      comma_list(elems, Text("["), Text("]"))
    }
    Object(pairs) => {
      let pairs = pairs
        .iter()
        .map(p => Group(Text(p.0.escape()) + Text(": ") + pretty(p.1)))
        .collect()
      comma_list(pairs, Text("{"), Text("}"))
    }
    String(s) => Text(s.escape())
    Number(i) => Text(i.to_string())
    False => Text("false")
    True => Text("true")
    Null => Text("null")
  }
}

When rendered, the JSON automatically adapts to different widths:

test {
  let json : Json = {
    "key1": "string",
    "key2": [12345, 67890],
    "key3": [
      { "field1": 1, "field2": 2 },
      { "field1": 1, "field2": 2 },
      { "field1": [1, 2], "field2": 2 },
    ],
  }
  inspect(
    pretty(json).render(width=80),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [12345, 67890],
      #|  "key3": [
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": [1, 2], "field2": 2}
      #|  ]
      #|}
    ),
  )
  inspect(
    pretty(json).render(width=30),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [12345, 67890],
      #|  "key3": [
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": 1, "field2": 2},
      #|    {
      #|      "field1": [1, 2],
      #|      "field2": 2
      #|    }
      #|  ]
      #|}
    ),
  )
  inspect(
    pretty(json).render(width=20),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [
      #|    12345,
      #|    67890
      #|  ],
      #|  "key3": [
      #|    {
      #|      "field1": 1,
      #|      "field2": 2
      #|    },
      #|    {
      #|      "field1": 1,
      #|      "field2": 2
      #|    },
      #|    {
      #|      "field1": [
      #|        1,
      #|        2
      #|      ],
      #|      "field2": 2
      #|    }
      #|  ]
      #|}
    ),
  )
}

Conclusion

By combining a small set of primitives with function composition, we can build a flexible, declarative prettyprinter that adapts structured data layouts to the available screen width.

This approach scales well: you describe layout intentions with combinators like sepby, surround, or autobreak, and the rendering engine takes care of indentation, line breaks, and fitting.

The current implementation can be further improved:

  • Memoizing space calculations to improve performance.
  • Adding a ribbon parameter to balance whitespace vs. content density.
  • Supporting advanced layouts like hanging indents or mandatory line breaks.

For a deeper dive, see Philip Wadler’s classic paper A prettier printer, as well as prettyprinter libraries in Haskell, OCaml, and other languages.

Mini-adapton: incremental computation in MoonBit

· 10 min read

Introduction

Let's first illustrate what incremental computation looks like with a spreadsheet-like example. First, define a dependency graph like this:

In this graph, t1's value is computed from n1 + n2 and t2's value is computed from t1 + n3.

When we want to get the value of t2, the computation defined in the graph will be done: first t1 is computed by n1 + n2, then t2 is computed by t1 + n3. This process is the same as non-incremental computation.

However, when we start to change the values of n1, n2, or n3, things get different. Say we swap the values of n1 and n2, then get t2's value. In non-incremental computation, both t1 and t2 are recomputed. But recomputing t2 is actually unnecessary, since neither of its dependencies, t1 and n3, has changed (swapping n1 and n2 does not change t1's value).

The following code example does exactly what we described above. We use Cell::new to define n1, n2, and n3, which hold plain values and need no computation, and Thunk::new to define t1 and t2, which are computed from other nodes.

test {
  // a counter to record the times of t2's computation
  let mut cnt = 0
  // start define the graph
  let n1 = Cell::new(1)
  let n2 = Cell::new(2)
  let n3 = Cell::new(3)
  let t1 = Thunk::new(fn() { n1.get() + n2.get() })
  let t2 = Thunk::new(fn() {
    cnt += 1
    t1.get() + n3.get()
  })
  // get the value of t2
  inspect(t2.get(), content="6")
  inspect(cnt, content="1")
  // swap value of n1 and n2
  n1.set(2)
  n2.set(1)
  inspect(t2.get(), content="6")
  // t2 does not recompute
  inspect(cnt, content="1")
}

In this article, we will show how to implement an incremental computation library in MoonBit with the API used in the example above:

Cell::new
Cell::get
Cell::set
Thunk::new
Thunk::get

Problem Analysis and Solution

To implement the library, there are three main problems to solve:

Build up dependency graph on the fly

Since MoonBit currently has no metaprogramming mechanism, a library has no easy way to build the dependency graph statically. Therefore, we need to construct the dependency graph on the fly. Since all we care about is which cells and thunks a thunk depends on, a good time to build the graph is when the user calls Thunk::get. Take the code above as an example:

let n1 = Cell::new(1)
let n2 = Cell::new(2)
let n3 = Cell::new(3)
let t1 = Thunk::new(fn() { n1.get() + n2.get() })
let t2 = Thunk::new(fn() { t1.get() + n3.get() })
t2.get()

When the user calls t2.get(), we learn at runtime that t1.get() and n3.get() are called inside it. Therefore, t1 and n3 are dependencies of t2, and we can construct a subgraph:

The same thing happens when t1.get() is called inside t2.get(): n1 and n2 are recorded as dependencies of t1.

So here is the plan:

  1. We declare a stack to record which thunk we are currently getting. We use a stack because we are essentially recording the call stack of every get.
  2. Whenever we call get on a node, we mark that node as a dependency of the thunk on top of the stack. If the node is a thunk, we push it onto the stack.
  3. Whenever a thunk's get finishes, we pop it off the stack.

Let's walk through the full process for the example above under this algorithm:

  1. When we call t2.get, we push t2 onto the stack.

  2. When we call t1.get inside t2.get, we mark t1 as a dependency of t2 and push t1 onto the stack.

  3. When we call n1.get inside t1.get, we mark n1 as a dependency of t1.

  4. The same goes for n2.

  5. When t1.get finishes, we pop t1 off the stack.

  6. When we call n3.get, we mark n3 as a dependency of t2.

Besides the edge from dependent to dependency, we should also record an edge from dependency to dependent, so that we can easily traverse the graph backwards when needed.

In the code below, we use outgoing_edges to refer to edges from parent (dependent) to child (dependency), and incoming_edges to refer to edges in the opposite direction.
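
The stack-based plan above can be tried out in isolation. The following is a minimal sketch, not the library's implementation: nodes are identified by plain strings, and active_thunks, recorded_edges, record_dependency, get_cell, and get_thunk are hypothetical names introduced only for this illustration.

```moonbit
// Stack of thunks whose `get` is currently running (hypothetical name).
let active_thunks : Array[String] = []

// Recorded edges, written as "dependent -> dependency" (hypothetical name).
let recorded_edges : Array[String] = []

// Mark `name` as a dependency of the thunk on top of the stack, if there is one.
fn record_dependency(name : String) -> Unit {
  if active_thunks.length() > 0 {
    let top = active_thunks[active_thunks.length() - 1]
    recorded_edges.push(top + " -> " + name)
  }
}

// Getting a cell only records an edge; a cell is never pushed onto the stack.
fn get_cell(name : String, value : Int) -> Int {
  record_dependency(name)
  value
}

// Getting a thunk records an edge, then runs `body` with the thunk on top of the stack.
fn get_thunk(name : String, body : () -> Int) -> Int {
  record_dependency(name)
  active_thunks.push(name)
  let value = body()
  active_thunks.unsafe_pop() |> ignore
  value
}

test {
  let t2 = get_thunk("t2", fn() {
    let t1 = get_thunk("t1", fn() { get_cell("n1", 1) + get_cell("n2", 2) })
    t1 + get_cell("n3", 3)
  })
  inspect(t2, content="6")
  inspect(recorded_edges.length(), content="4")
  inspect(recorded_edges[0], content="t2 -> t1")
  inspect(recorded_edges[3], content="t2 -> n3")
}
```

The edges are recorded in the same order as the six steps listed above: t2 -> t1, t1 -> n1, t1 -> n2, and finally t2 -> n3 once t1 has been popped off the stack.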

A mechanism to mark outdated nodes

Whenever we call Cell::set, the node itself and all nodes that depend on it should be marked as outdated. This will be one of the criteria for determining whether a thunk needs to be recomputed. It is essentially a recursive backward traversal starting from a leaf of the graph. We can describe the process as pseudo MoonBit code:

fn dirty(node: Node) -> Unit {
  for n in node.incoming_edges {
    n.set_dirty(true)
    // recurse into the dependent `n`, not into `node` itself
    dirty(n)
  }
}
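
To see the backward traversal at work on the example graph, here is a small self-contained sketch. DemoNode and mark_dirty are hypothetical names used only for this illustration, with a plain struct standing in for the graph nodes. The sketch also skips dependents that are already dirty, which is an added assumption that avoids visiting the same subgraph twice.

```moonbit
// Illustration-only node: a dirty flag plus the nodes that depend on it.
struct DemoNode {
  mut is_dirty : Bool
  dependents : Array[DemoNode]
}

// Walk backwards from `node` and mark every transitive dependent as dirty.
fn mark_dirty(node : DemoNode) -> Unit {
  for dependent in node.dependents {
    if not(dependent.is_dirty) {
      dependent.is_dirty = true
      mark_dirty(dependent)
    }
  }
}

test {
  // t2 depends on t1 and n3; t1 depends on n1.
  let t2 = DemoNode::{ is_dirty: false, dependents: [] }
  let t1 = DemoNode::{ is_dirty: false, dependents: [t2] }
  let n1 = DemoNode::{ is_dirty: false, dependents: [t1] }
  let n3 = DemoNode::{ is_dirty: false, dependents: [t2] }
  mark_dirty(n1)
  inspect(t1.is_dirty, content="true")
  inspect(t2.is_dirty, content="true")
  inspect(n3.is_dirty, content="false")
}
```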

Determine whether a thunk needs to be recomputed

Whenever we call Thunk::get, we need to determine whether the thunk really needs to be recomputed. The dirty mechanism described in the last subsection is not enough for this: if we only used dirtiness to decide, there would be unneeded computation. Let's look at the example from the beginning again:

n1.set(2)
n2.set(1)
inspect(t2.get(), content="6")

After we swap the values of n1 and n2, the nodes n1, n2, t1, and t2 are all marked as dirty, but when we call t2.get, there is no need to recompute t2, since the value of t1 does not change.

This tells us that, besides dirtiness, we also need to record whether a node's value differs from its previous value. A node needs to be recomputed only if it is dirty and the value of at least one of its dependencies has changed.

We can describe the algorithm as the pseudo MoonBit code below:

fn propagate(self: Node) -> Unit {
  // When a node is dirty, it might need to be recomputed
  if self.is_dirty() {
    // after recomputing, it's no longer dirty
    self.set_dirty(false)
    for dependency in self.outgoing_edges() {
      // recursively recompute every dependency
      dependency.propagate()
      // If a dependency's value changed, the node needs to be recomputed
      if dependency.is_changed() {
        // remove all incoming_edges and outgoing_edges, since they will be reconstructed during evaluate
        self.incoming_edges().clear()
        self.outgoing_edges().clear()
        self.evaluate()
        return
      }
    }
  }
}
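
The pseudocode above can be exercised on the swap example with a simplified, self-contained sketch. Everything here is an assumption made for illustration: SketchNode, sketch_cell, sketch_evaluate, and sketch_propagate are hypothetical names, the edges are fixed up front instead of being rebuilt during evaluation, and the test sets the flags by hand to mimic the state right after n1 and n2 are swapped.

```moonbit
// Illustration-only node with static dependencies.
struct SketchNode {
  mut is_dirty : Bool
  mut is_changed : Bool
  mut value : Int
  mut runs : Int // how many times `compute` has been re-run
  compute : () -> Int
  dependencies : Array[SketchNode]
}

// A cell has no dependencies, so its `compute` is never called here.
fn sketch_cell(value : Int, changed : Bool) -> SketchNode {
  SketchNode::{
    is_dirty: changed,
    is_changed: changed,
    value,
    runs: 0,
    compute: fn() { value },
    dependencies: [],
  }
}

// Re-run the computation and remember whether the result differs.
fn sketch_evaluate(node : SketchNode) -> Unit {
  let value = (node.compute)()
  node.is_changed = value != node.value
  node.value = value
  node.runs = node.runs + 1
}

// Recompute `node` only if it is dirty and some dependency's value changed.
fn sketch_propagate(node : SketchNode) -> Unit {
  if node.is_dirty {
    node.is_dirty = false
    for dependency in node.dependencies {
      sketch_propagate(dependency)
      if dependency.is_changed {
        sketch_evaluate(node)
        return
      }
    }
  }
}

test {
  // State right after swapping n1 and n2: both cells are dirty and changed.
  let n1 = sketch_cell(2, true)
  let n2 = sketch_cell(1, true)
  let n3 = sketch_cell(3, false)
  let t1 = SketchNode::{
    is_dirty: true,
    is_changed: false,
    value: 3,
    runs: 0,
    compute: fn() { n1.value + n2.value },
    dependencies: [n1, n2],
  }
  let t2 = SketchNode::{
    is_dirty: true,
    is_changed: false,
    value: 6,
    runs: 0,
    compute: fn() { t1.value + n3.value },
    dependencies: [t1, n3],
  }
  sketch_propagate(t2)
  inspect(t1.runs, content="1") // t1 is re-run because n1 changed
  inspect(t1.is_changed, content="false") // but its value is still 3
  inspect(t2.runs, content="0") // so t2 is not recomputed
  inspect(t2.value, content="6")
}
```

In this sketch t1 is recomputed once, its value stays at 3, and t2 is therefore left untouched, which matches the behaviour the counter in the opening example checks for.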

Implementation

Given the algorithms described in the last section, the implementation should be quite straightforward.

First, let's define Cell:

struct Cell[A] {
  mut is_dirty : Bool
  mut value : A
  mut is_changed : Bool
  incoming_edges : Array[&Node]
}

Since a Cell can only be a leaf node in the dependency graph, it does not have outgoing_edges. The trait Node here is used to abstract over the nodes in the dependency graph.

Then, let's define Thunk:

struct Thunk[A] {
  mut is_dirty : Bool
  mut value : A?
  mut is_changed : Bool
  thunk : () -> A
  incoming_edges : Array[&Node]
  outgoing_edges : Array[&Node]
}

Thunk's value is optional, since it only exists after we first call Thunk::get.

We can easily add a new function for both types:

fn[A : Eq] Cell::new(value : A) -> Cell[A] {
  Cell::{ is_changed: false, value, incoming_edges: [], is_dirty: false }
}

fn[A : Eq] Thunk::new(thunk : () -> A) -> Thunk[A] {
  Thunk::{
    value: None,
    is_changed: false,
    thunk,
    incoming_edges: [],
    outgoing_edges: [],
    is_dirty: false,
  }
}

Thunk and Cell are the two kinds of nodes in the dependency graph. We can use the trait Node mentioned above to abstract over them:

trait Node {
  is_dirty(Self) -> Bool
  set_dirty(Self, Bool) -> Unit
  incoming_edges(Self) -> Array[&Node]
  outgoing_edges(Self) -> Array[&Node]
  is_changed(Self) -> Bool
  evaluate(Self) -> Unit
}

And implement the trait for both types:

impl[A] Node for Cell[A] with incoming_edges(self) {
  self.incoming_edges
}

impl[A] Node for Cell[A] with outgoing_edges(_self) {
  []
}

impl[A] Node for Cell[A] with is_dirty(self) {
  self.is_dirty
}

impl[A] Node for Cell[A] with set_dirty(self, new_dirty) {
  self.is_dirty = new_dirty
}

impl[A] Node for Cell[A] with is_changed(self) {
  self.is_changed
}

impl[A] Node for Cell[A] with evaluate(_self) {
  ()
}

impl[A : Eq] Node for Thunk[A] with is_changed(self) {
  self.is_changed
}

impl[A : Eq] Node for Thunk[A] with outgoing_edges(self) {
  self.outgoing_edges
}

impl[A : Eq] Node for Thunk[A] with incoming_edges(self) {
  self.incoming_edges
}

impl[A : Eq] Node for Thunk[A] with is_dirty(self) {
  self.is_dirty
}

impl[A : Eq] Node for Thunk[A] with set_dirty(self, new_dirty) {
  self.is_dirty = new_dirty
}

impl[A : Eq] Node for Thunk[A] with evaluate(self) {
  // push self onto the top of node_stack
  // now self is the active target
  node_stack.push(self)
  // `self.thunk` might contain calls to `source.get()`,
  // such as `s1.get()`, `s2.get()` and `s3.get()`
  //
  // when `Thunk::get` or `Cell::get` is called,
  // it treats `node_stack.last()` as its target.
  // if the source is a `Cell`, it only records `incoming_edges`.
  // if the source is a `Thunk`, it records both `incoming_edges` and `outgoing_edges`, connecting the two nodes.
  //
  let value = (self.thunk)()
  self.is_changed = match self.value {
    None => true
    Some(v) => v != value
  }
  self.value = Some(value)
  // pop self from node_stack
  // now self is no longer the active target
  node_stack.unsafe_pop() |> ignore
}

The only complicated implementation is Thunk's evaluate. Here we first need to push the thunk onto the stack for dependency recording. node_stack is defined as below:

let node_stack : Array[&Node] = []

Next, evaluate runs the actual computation and compares the result with the previous value to update self.is_changed. is_changed is used later to decide whether a thunk needs to be recomputed.

dirty and propagate are almost the same as the pseudocode described above:

fn &Node::dirty(self : &Node) -> Unit {
  for dependent in self.incoming_edges() {
    if not(dependent.is_dirty()) {
      dependent.set_dirty(true)
      dependent.dirty()
    }
  }
}

fn &Node::propagate(self : &Node) -> Unit {
  if self.is_dirty() {
    self.set_dirty(false)
    for dependency in self.outgoing_edges() {
      dependency.propagate()
      if dependency.is_changed() {
        self.incoming_edges().clear()
        self.outgoing_edges().clear()
        self.evaluate()
        return
      }
    }
  }
}

With this foundation in place, the three main APIs, Cell::get, Cell::set, and Thunk::get, are easy to implement.

To get the value of a cell, we simply return the value field of the struct. Before that, however, we need to record the cell as a dependency if get is called inside Thunk::get.

fn[A] Cell::get(self : Cell[A]) -> A {
  if node_stack.last() is Some(target) {
    target.outgoing_edges().push(self)
    self.incoming_edges.push(target)
  }
  self.value
}

Whenever we set a cell, we first make sure that the two states is_changed and is_dirty are updated correctly, and then mark every dependent as dirty.

fn[A : Eq] Cell::set(self : Cell[A], new_value : A) -> Unit {
  if self.value != new_value {
    self.is_changed = true
    self.value = new_value
    self.set_dirty(true)
    &Node::dirty(self)
  }
}

In Thunk::get, as in Cell::get, we first record self as a dependency. After that, we pattern match on self.value. If it is None, this is the first time the user asks for the thunk's value, so we can safely evaluate it. If it is Some, we call propagate so that only the thunks that really need it are recomputed.

fn[A : Eq] Thunk::get(self : Thunk[A]) -> A {
  if node_stack.last() is Some(target) {
    target.outgoing_edges().push(self)
    self.incoming_edges.push(target)
  }
  match self.value {
    None => self.evaluate()
    Some(_) => &Node::propagate(self)
  }
  self.value.unwrap()
}
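
To see the pieces work together, here is a minimal Python sketch of the same design (Python rather than MoonBit, so it can be run as-is). It mirrors Cell::get, Cell::set, Thunk::get, dirty, and propagate from above. The names record_dependency, _UNSET, and runs are illustrative additions and not part of the original code. One deliberate deviation: on re-evaluation the sketch clears only outgoing_edges, so that a dependent which ends up not being recomputed stays registered with its dependency.

```python
node_stack = []      # thunks being evaluated; the last entry is the active target
_UNSET = object()    # hypothetical sentinel standing in for MoonBit's `None`


def record_dependency(source):
    """If a thunk is currently being evaluated, link it to the node it reads."""
    if node_stack:
        target = node_stack[-1]
        target.outgoing_edges.append(source)   # target depends on source
        source.incoming_edges.append(target)   # source is read by target


def dirty(node):
    """Mark every transitive dependent of `node` as dirty."""
    for dependent in node.incoming_edges:
        if not dependent.is_dirty:
            dependent.is_dirty = True
            dirty(dependent)


def propagate(node):
    """Re-evaluate `node` only if one of its dependencies actually changed."""
    if node.is_dirty:
        node.is_dirty = False
        for dependency in list(node.outgoing_edges):
            propagate(dependency)
            if dependency.is_changed:
                node.outgoing_edges.clear()  # re-recorded during evaluate()
                node.evaluate()
                return


class Cell:
    def __init__(self, value):
        self.value = value
        self.is_dirty = False
        self.is_changed = False
        self.incoming_edges = []
        self.outgoing_edges = []  # always empty: a cell reads nothing

    def evaluate(self):
        pass  # an input cell has nothing to compute

    def get(self):
        record_dependency(self)
        return self.value

    def set(self, new_value):
        if self.value != new_value:
            self.is_changed = True
            self.value = new_value
            self.is_dirty = True
            dirty(self)


class Thunk:
    def __init__(self, thunk):
        self.thunk = thunk
        self.value = _UNSET
        self.is_dirty = False
        self.is_changed = False
        self.incoming_edges = []
        self.outgoing_edges = []
        self.runs = 0  # illustrative counter: how often the thunk was computed

    def evaluate(self):
        node_stack.append(self)      # self becomes the active target
        try:
            value = self.thunk()
        finally:
            node_stack.pop()         # self is no longer the active target
        self.is_changed = self.value is _UNSET or self.value != value
        self.value = value
        self.runs += 1

    def get(self):
        record_dependency(self)
        if self.value is _UNSET:
            self.evaluate()
        else:
            propagate(self)
        return self.value


a = Cell(1)
parity = Thunk(lambda: a.get() % 2)
label = Thunk(lambda: "odd" if parity.get() else "even")

print(label.get(), parity.runs, label.runs)  # odd 1 1
a.set(3)  # parity is recomputed, but its value stays 1
print(label.get(), parity.runs, label.runs)  # odd 2 1
a.set(4)
print(label.get(), parity.runs, label.runs)  # even 3 2
```

The second label.get() shows the early cutoff: parity is recomputed, but because its value is unchanged, label is left alone.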


A Guide to MoonBit Python Integration

· 12 min read

Introduction

Python, with its concise syntax and vast ecosystem, has become one of the most popular programming languages today. However, discussions around its performance bottlenecks and the maintainability of its dynamic typing system in large-scale projects have never ceased. To address these challenges, the developer community has explored various optimization paths.

The python.mbt tool, officially launched by MoonBit, offers a new perspective. It allows developers to call Python code directly within the MoonBit environment. This combination aims to merge MoonBit's static type safety and high-performance potential with Python's mature ecosystem. Through python.mbt, developers can leverage MoonBit's static analysis capabilities, modern build and testing tools, while enjoying Python's rich library functions, making it possible to build large-scale, high-performance system-level software.

This article aims to delve into the working principles of python.mbt and provide a practical guide. It will answer common questions such as: How does python.mbt work? Is it slower than native Python due to an added intermediate layer? What are its advantages over existing tools like C++'s pybind11 or Rust's PyO3? To answer these questions, we first need to understand the basic workflow of the Python interpreter.

How the Python Interpreter Works

The Python interpreter executes code in three main stages:

  1. Parsing: This stage includes lexical analysis and syntax analysis. The interpreter breaks down human-readable Python source code into tokens and then organizes these tokens into a tree-like structure, the Abstract Syntax Tree (AST), based on syntax rules.

    For example, for the following Python code:

    def add(x, y):
      return x + y
    
    a = add(1, 2)
    print(a)
    

    We can use Python's ast module to view its generated AST structure:

    Module(
        body=[
            FunctionDef(
                name='add',
                args=arguments(
                    args=[
                        arg(arg='x'),
                        arg(arg='y')]),
                body=[
                    Return(
                        value=BinOp(
                            left=Name(id='x', ctx=Load()),
                            op=Add(),
                            right=Name(id='y', ctx=Load())))]),
            Assign(
                targets=[
                    Name(id='a', ctx=Store())],
                value=Call(
                    func=Name(id='add', ctx=Load()),
                    args=[
                        Constant(value=1),
                        Constant(value=2)])),
            Expr(
                value=Call(
                    func=Name(id='print', ctx=Load()),
                    args=[
                        Name(id='a', ctx=Load())]))])
    
  2. Compilation: Next, the Python interpreter compiles the AST into a lower-level, more linear intermediate representation called bytecode. This is a platform-independent instruction set designed for the Python Virtual Machine (PVM).

    Using Python's dis module, we can view the bytecode corresponding to the above code:

      2           LOAD_CONST               0 (<code object add>)
                  MAKE_FUNCTION
                  STORE_NAME               0 (add)
    
      5           LOAD_NAME                0 (add)
                  PUSH_NULL
                  LOAD_CONST               1 (1)
                  LOAD_CONST               2 (2)
                  CALL                     2
                  STORE_NAME               1 (a)
    
      6           LOAD_NAME                2 (print)
                  PUSH_NULL
                  LOAD_NAME                1 (a)
                  CALL                     1
                  POP_TOP
                  RETURN_CONST             3 (None)
    
  3. Execution: Finally, the Python Virtual Machine (PVM) executes the bytecode instructions one by one. Each instruction corresponds to a C function call in the CPython interpreter's underlying layer. For example, LOAD_NAME looks up a variable, and BINARY_OP performs a binary operation. It is this process of interpreting and executing instructions one by one that is the main source of Python's performance overhead. A simple 1 + 2 operation involves the entire complex process of parsing, compilation, and virtual machine execution.

Understanding this process helps us grasp the basic approaches to Python performance optimization and the design philosophy of python.mbt.
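
The three stages can also be driven by hand from Python itself. The following is a small sketch using only the standard library (ast, dis, and the built-ins compile and exec); the variable names are illustrative, and the exact bytecode printed by dis depends on the Python version.

```python
import ast
import dis

SOURCE = """
def add(x, y):
    return x + y

a = add(1, 2)
print(a)
"""

# Stage 1, parsing: source text -> Abstract Syntax Tree
tree = ast.parse(SOURCE)
print(ast.dump(tree, indent=4))

# Stage 2, compilation: AST -> bytecode, wrapped in a code object
code = compile(tree, filename="<example>", mode="exec")
dis.dis(code)

# Stage 3, execution: the virtual machine runs the bytecode
namespace = {}
exec(code, namespace)  # the program itself prints 3
```

Every plain `python script.py` run performs these steps implicitly before any user code executes.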

Paths to Optimizing Python Performance

Currently, there are two mainstream methods for improving Python program performance:

  1. Just-In-Time (JIT) Compilation: Projects like PyPy analyze a running program and compile frequently executed "hotspot" bytecode into highly optimized native machine code, thereby bypassing the PVM's interpretation and significantly speeding up computationally intensive tasks. However, JIT is not a silver bullet; it cannot solve the inherent problems of Python's dynamic typing, such as the difficulty of effective static analysis in large projects, which poses challenges for software maintenance.
  2. Native Extensions: Developers can use languages like C++ (with pybind11) or Rust (with PyO3) to directly call Python functions or to write performance-critical modules that are then called from Python. This method can achieve near-native performance, but it requires developers to be proficient in both Python and a complex system-level language, presenting a steep learning curve and a high barrier to entry for most Python programmers.

python.mbt is also a native extension. But compared to languages like C++ and Rust, it attempts to find a new balance between performance, ease of use, and engineering capabilities, with a greater emphasis on using Python features directly within the MoonBit language.

  1. High-Performance Core: MoonBit is a statically typed, compiled language whose code can be efficiently compiled into native machine code. Developers can implement computationally intensive logic in MoonBit to achieve high performance from the ground up.
  2. Seamless Python Calls: python.mbt interacts directly with CPython's C-API to call Python modules and functions. This means call overhead is minimized, bypassing Python's parsing and compilation stages and going straight to the virtual machine execution layer.
  3. Gentler Learning Curve: Compared to C++ and Rust, MoonBit's syntax is more modern and concise. It also has comprehensive support for functional programming, a documentation system, unit testing, and static analysis tools, making it more friendly to developers accustomed to Python.
  4. Improved Engineering and AI Collaboration: MoonBit's strong type system and clear interface definitions make code intent more explicit and easier for static analysis tools and AI-assisted programming tools to understand. This helps maintain code quality in large projects and improves the efficiency and accuracy of collaborative coding with AI.

Using Pre-wrapped Python Libraries in MoonBit

To facilitate developer use, MoonBit will officially wrap mainstream Python libraries once the build system and IDE are mature. After wrapping, users can use these Python libraries in their projects just like importing regular MoonBit packages. Let's take the matplotlib plotting library as an example.

First, add the matplotlib dependency in your project's root moon.mod.json or via the terminal:

moon update
moon add Kaida-Amethyst/matplotlib

Then, declare the import in the moon.pkg.json of the sub-package where you want to use the library. Here, we follow Python's convention and set an alias plt:

{
  "import": [
    {
      "path": "Kaida-Amethyst/matplotlib",
      "alias": "plt"
    }
  ]
}

After configuration, you can call matplotlib in your MoonBit code to create plots:

let sin : (Double) -> Double = @math.sin

fn main {
  let x = Array::makei(100, fn(i) { i.to_double() * 0.1 })
  let y = x.map(sin)

  // To ensure type safety, the wrapped subplots interface always returns a tuple of a fixed type.
  // This avoids the dynamic behavior in Python where the return type depends on the arguments.
  let (_, axes) = plt::subplots(1, 1)

  // Use the .. cascade call syntax
  axes[0][0]
  ..plot(x, y, color=Green, linestyle=Dashed, linewidth=2)
  ..set_title("Sine of x")
  ..set_xlabel("x")
  ..set_ylabel("sin(x)")

  @plt.show()
}

Currently, on macOS and Linux, MoonBit's build system can automatically handle dependencies. On Windows, users may need to manually install a C compiler and configure the Python environment. Future MoonBit IDEs will aim to simplify this process.

Using Unwrapped Python Modules in MoonBit

The Python ecosystem is vast, and even with AI technology, relying solely on official wrappers is not realistic. Fortunately, we can use the core features of python.mbt to interact directly with any Python module. Below, we demonstrate this process using the simple time module from the Python standard library.

Introducing python.mbt

First, ensure your MoonBit toolchain is up to date, then add the python.mbt dependency:

moon update
moon add Kaida-Amethyst/python

Next, import it in your package's moon.pkg.json:

{
  "import": ["Kaida-Amethyst/python"]
}

python.mbt automatically handles the initialization (Py_Initialize) and shutdown of the Python interpreter, so developers don't need to manage it manually.

Importing Python Modules

Use the @python.pyimport function to import modules. To avoid performance loss from repeated imports, it is recommended to use a closure technique to cache the imported module object:

// Define a struct to hold the Python module object for enhanced type safety
pub struct TimeModule {
  time_mod : PyModule
}

// Define a function that returns a closure for getting a TimeModule instance
fn import_time_mod() -> () -> TimeModule {
  // The import operation is performed only on the first call
  guard @python.pyimport("time") is Some(time_mod) else {
    println("Failed to load Python module: time")
    panic("ModuleLoadError")
  }
  let time_mod = TimeModule::{ time_mod }
  // The returned closure captures the time_mod variable
  fn () { time_mod }
}

// Create a global time_mod "getter" function
let time_mod : () -> TimeModule = import_time_mod()

In subsequent code, we should always call time_mod() to get the module, not import_time_mod.
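
The closure-caching pattern itself is not specific to MoonBit. As a rough analogy only, the same shape can be sketched in plain Python with importlib from the standard library. Python already caches imported modules in sys.modules, so this sketch illustrates the pattern rather than a real performance need; get_time_mod is a hypothetical name.

```python
import importlib


def import_time_mod():
    # The import runs once, when import_time_mod() itself is called.
    time_mod = importlib.import_module("time")

    def get_time_mod():
        # The closure captures time_mod and returns the same object each time.
        return time_mod

    return get_time_mod


# Create the global "getter" once; afterwards only time_mod() is called.
time_mod = import_time_mod()

print(time_mod() is time_mod())  # True
print(time_mod().time() > 0)     # True
```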

Converting Between MoonBit and Python Objects

To call Python functions, we need to convert between MoonBit objects and Python objects (PyObject).

  1. Integers: Use PyInteger::from to create a PyInteger from an Int64, and to_int64() for the reverse conversion.

    test "py_integer_conversion" {
      let n : Int64 = 42
      let py_int = PyInteger::from(n)
      inspect(py_int, content="42")
      assert_eq(py_int.to_int64(), 42L)
    }
  2. Floats: Use PyFloat::from and to_double.

    test "py_float_conversion" {
      let n : Double = 3.5
      let py_float = PyFloat::from(n)
      inspect(py_float, content="3.5")
      assert_eq(py_float.to_double(), 3.5)
    }
  3. Strings: Use PyString::from and to_string.

    test "py_string_conversion" {
      let py_str = PyString::from("hello")
      inspect(py_str, content="'hello'")
      assert_eq(py_str.to_string(), "hello")
    }
  4. Lists: You can create an empty PyList and append elements, or create one directly from an Array[&IsPyObject].

    test "py_list_from_array" {
      let one = PyInteger::from(1)
      let two = PyFloat::from(2.0)
      let three = PyString::from("three")
      let arr : Array[&IsPyObject] = [one, two, three]
      let list = PyList::from(arr)
      inspect(list, content="[1, 2.0, 'three']")
    }
  5. Tuples: PyTuple requires specifying the size first, then filling elements one by one using the set method.

    test "py_tuple_creation" {
      let tuple = PyTuple::new(3)
      tuple
      ..set(0, PyInteger::from(1))
      ..set(1, PyFloat::from(2.0))
      ..set(2, PyString::from("three"))
      inspect(tuple, content="(1, 2.0, 'three')")
    }
  6. Dictionaries: PyDict mainly supports strings as keys. Use new to create a dictionary and set to add key-value pairs. For non-string keys, use set_by_obj.

    test "py_dict_creation" {
      let dict = PyDict::new()
      dict
      ..set("one", PyInteger::from(1))
      ..set("two", PyFloat::from(2.0))
      inspect(dict, content="{'one': 1, 'two': 2.0}")
    }
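
The expected strings passed to inspect in items 4 to 6 look like the text Python itself produces for the equivalent native containers. The following is a minimal pure-Python sketch, independent of python.mbt, that shows those textual forms. That python.mbt's Show output follows Python's own representation is an assumption here, not something the article states.

```python
# Native Python counterparts of the PyList, PyTuple and PyDict built above.
py_list = [1, 2.0, "three"]
py_tuple = (1, 2.0, "three")
py_dict = {"one": 1, "two": 2.0}

# repr() yields Python's textual form of each container, which is what the
# `content=` strings in the MoonBit tests resemble (assumption, see above).
list_text = repr(py_list)
tuple_text = repr(py_tuple)
dict_text = repr(py_dict)

print(list_text, tuple_text, dict_text, sep="\n")
```

Note that string elements appear with single quotes, which is why the expected content is written as 'three' rather than "three".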

When getting elements from Python composite types, python.mbt performs runtime type checking and returns an Optional[PyObjectEnum] to ensure type safety.

test "py_list_get" {
  let list = PyList::new()
  list.append(PyInteger::from(1))
  list.append(PyString::from("hello"))
  inspect(list.get(0).unwrap(), content="PyInteger(1)")
  inspect(list.get(1).unwrap(), content="PyString('hello')")
  inspect(list.get(2), content="None") // Index out of bounds returns None
}
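
The idea of a checked lookup that yields either a type-tagged value or nothing can be sketched in plain Python. This is only a conceptual illustration: `checked_get` and the `(type_tag, value)` pair are hypothetical names invented for this sketch and are not part of python.mbt.

```python
from typing import Any, Optional, Tuple


def checked_get(items: list, index: int) -> Optional[Tuple[str, Any]]:
    """Return (type_tag, value) for a valid index, or None when out of bounds."""
    if not 0 <= index < len(items):
        # Mirrors the None result for index 2 in the test above.
        return None
    value = items[index]
    # Tag the value with its runtime type name, loosely analogous to
    # wrapping it in a variant such as PyInteger or PyString.
    return (type(value).__name__, value)


items = [1, "hello"]
first = checked_get(items, 0)
second = checked_get(items, 1)
missing = checked_get(items, 2)
```

The caller has to inspect the tag, or handle None, before using the value, which is the discipline the Optional return type enforces on the MoonBit side.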

Calling Functions in a Module

Calling a function is a two-step process: first, get the function object with get_attr, then execute the call with invoke. The return value of invoke is a PyObject that requires pattern matching and type conversion.
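
Seen from the Python side, the two steps correspond to an attribute lookup followed by a call with a tuple of positional arguments. The sketch below uses only the Python standard library; `call_by_name` is a hypothetical helper written for illustration, not an API of python.mbt.

```python
import time


def call_by_name(module, name, args=(), kwargs=None):
    """Look up a callable attribute on a module, then call it."""
    # Step 1: the analogue of get_attr. A missing attribute yields None.
    func = getattr(module, name, None)
    if not callable(func):
        raise AttributeError(f"get function `{name}` failed!")
    # Step 2: the analogue of invoke, with positional args packed in a tuple.
    return func(*args, **(kwargs or {}))


now = call_by_name(time, "time")           # no arguments
slept = call_by_name(time, "sleep", (0.01,))  # one positional argument
```

Here `time.time` returns a float and `time.sleep` returns None, which is why a wrapper has to inspect and convert the result of the call rather than assume its type.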

Here is the MoonBit wrapper for time.sleep and time.time:

// Wrap time.sleep
pub fn sleep(seconds : Double) -> Unit {
  let lib = time_mod()
  guard lib.time_mod.get_attr("sleep") is Some(PyCallable(f)) else {
    println("get function `sleep` failed!")
    panic()
  }
  let args = PyTuple::new(1)
  args.set(0, PyFloat::from(seconds))
  match (try? f.invoke(args)) {
    Ok(_) => Ok(())
    Err(e) => {
      println("invoke `sleep` failed!")
      panic()
    }
  }
}

// Wrap time.time
pub fn time() -> Double {
  let lib = time_mod()
  guard lib.time_mod.get_attr("time") is Some(PyCallable(f)) else {
    println("get function `time` failed!")
    panic()
  }
  match (try? f.invoke()) {
    Ok(Some(PyFloat(t))) => t.to_double()
    _ => {
      println("invoke `time` failed!")
      panic()
    }
  }
}

After wrapping, we can use them in a type-safe way in MoonBit:

test "sleep" {
  let start = time().unwrap()
  sleep(1)
  let end = time().unwrap()
  println("start = \{start}")
  println("end = \{end}")
}

Practical Advice

  1. Define Clear Boundaries: Treat python.mbt as the "glue layer" connecting MoonBit and the Python ecosystem. Keep core computation and business logic in MoonBit to leverage its performance and type system advantages, and only use python.mbt when necessary to call Python-exclusive libraries.

  2. Use ADTs Instead of String Magic: Many Python functions accept specific strings as arguments to control behavior. In MoonBit wrappers, these "magic strings" should be converted to Algebraic Data Types (ADTs), i.e., enums. This leverages MoonBit's type system to move runtime value checks to compile time, greatly enhancing code robustness.

  3. Thorough Error Handling: The examples in this article use panic or return simple strings for brevity. In production code, you should define dedicated error types and pass and handle them through the Result type, providing clear error context.

  4. Map Keyword Arguments: Python functions extensively use keyword arguments (kwargs), such as plot(color='blue', linewidth=2). This can be elegantly mapped to MoonBit's Labeled Arguments. When wrapping, prioritize using labeled arguments to provide a similar development experience.

    For example, a Python function that accepts kwargs:

    # graphics.py
    def draw_line(points, color="black", width=1):
        # ... drawing logic ...
        print(f"Drawing line with color {color} and width {width}")
    

    Its MoonBit wrapper can be designed as:

    fn draw_line(points : Array[Point], color~ : Color = Black, width~ : Int = 1) -> Unit {
      let points : PyList = ... // convert Array[Point] to PyList

      // construct args
      let args = PyTuple::new(1)
      args.set(0, points)

      // construct kwargs
      let kwargs = PyDict::new()
      kwargs
      ..set("color", PyString::from(color))
      ..set("width", PyInteger::from(width))
      // `f` is the callable obtained with get_attr, as in the wrappers above
      match (try? f.invoke(args~, kwargs~)) {
        Ok(_) => ()
        _ => {
          // handle error
        }
      }
    }
    
  5. Beware of Dynamism: Always remember that Python is dynamically typed. Any data obtained from Python should be treated as "untrusted" and must undergo strict type checking and validation. Avoid using unwrap as much as possible; instead, use pattern matching to safely handle all possible cases.
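
To make tip 4 concrete, this is what the one-element argument tuple and the keyword dictionary amount to on the Python side. It is a small sketch in plain Python. The `draw_line` here is a stand-in for the function in graphics.py that returns its message instead of printing it, so the result can be checked, and the sample points are made up for illustration.

```python
def draw_line(points, color="black", width=1):
    # Stand-in for graphics.draw_line: returns the message instead of printing it.
    return f"Drawing line with color {color} and width {width}"


# Positional arguments travel as a tuple, keyword arguments as a dict,
# the same split the MoonBit wrapper builds with PyTuple and PyDict.
args = ([(0, 0), (1, 1)],)
kwargs = {"color": "blue", "width": 2}

message = draw_line(*args, **kwargs)
default_message = draw_line(*args)  # omitted kwargs fall back to the defaults

print(message)
print(default_message)
```

Because omitted keyword arguments fall back to Python's defaults, a wrapper could in principle leave unspecified labeled arguments out of the dictionary altogether, though the wrapper above always passes both.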

Conclusion

This article has outlined the working principles of python.mbt and demonstrated how to use it to call Python code in MoonBit, whether through pre-wrapped libraries or by interacting directly with Python modules. python.mbt is not just a tool; it represents a fusion philosophy: combining MoonBit's static analysis, high performance, and engineering advantages with Python's vast and mature ecosystem. We hope this article provides developers in the MoonBit and Python communities with a new, more powerful option for building future software.