Two Approaches to Regex Engines: Derivative and Thompson VM

Regular expression engines can be implemented using fundamentally different approaches, each with distinct trade-offs in performance, memory usage, and implementation complexity. This article explores two mathematically equivalent but practically different methods for regex matching: Brzozowski derivatives and Thompson's virtual machine approach.

Both methods operate on the same abstract syntax tree representation, providing a unified foundation for direct performance comparison. The key insight is how these seemingly different approaches solve identical problems through different computational strategies—one through algebraic transformation, the other through program execution.

Conventions & Definitions

To establish a common foundation, both regex engines start with a shared AST representation that captures the essential structure of regular expressions in a tree format:

enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)

Additionally, we provide smart constructors to simplify regex construction:

fn Ast::chr(chr : Char) -> Ast {
  Chr(chr)
}

fn Ast::seq(self : Ast, other : Ast) -> Ast {
  Seq(self, other)
}

fn Ast::rep(self : Ast, n? : Int) -> Ast {
  Rep(self, n)
}

fn Ast::opt(self : Ast) -> Ast {
  Opt(self)
}

The AST defines four fundamental regex operations:

  1. Chr(Char) matches a single literal character.
  2. Seq(Ast, Ast) matches one pattern followed by another through concatenation.
  3. Rep(Ast, Int?) repeats a pattern zero or more times when the count is None, or exactly n times when it is Some(n).
  4. Opt(Ast) makes a pattern optional, equivalent to pattern? in standard regex syntax.

For example, we can build the regex (ab*)?—an optional sequence of 'a' followed by zero or more 'b's—as:

Ast::chr('a').seq(Ast::chr('b').rep()).opt()
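
As a quick sanity check, the smart constructors should produce the same tree as writing the enum constructors out by hand. The following is a minimal sketch, assuming it lives in the same package as the definitions above; the test name and the `expected` and `bounded` bindings are illustrative and not part of the original code.

```moonbit
test "smart constructors build the expected Ast" {
  // (ab*)? written with the smart constructors from this section
  let ast = Ast::chr('a').seq(Ast::chr('b').rep()).opt()
  // The same tree spelled out with the raw enum constructors
  let expected : Ast = Opt(Seq(Chr('a'), Rep(Chr('b'), None)))
  assert_eq(ast, expected)
  // Passing the optional argument yields a bounded repetition: a{3}
  let bounded : Ast = Rep(Chr('a'), Some(3))
  assert_eq(Ast::chr('a').rep(n=3), bounded)
}
```

The type annotations on `expected` and `bounded` are there because several enums in this article share constructor names such as Chr and Seq.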

Brzozowski Derivative

The derivative-based approach transforms regular expressions algebraically using formal language theory. For each input character, it computes the "derivative" of the regex by asking: "what remains to be matched after consuming this character?" This creates a new regex representing the remaining pattern.

We extend the basic Ast type to represent derivatives and nullability explicitly:

enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)

The constructors in Exp represent:

  1. Nil represents an impossible pattern that can never match anything.
  2. Eps matches the empty string.
  3. Chr(Char) matches a single character.
  4. Alt(Exp, Exp) represents alternation, providing choice between patterns.
  5. Seq(Exp, Exp) represents concatenation of two patterns.
  6. Rep(Exp) represents repetition of a pattern.

We use the Exp::of_ast function to convert the Ast into the more expressive Exp format:

fn Exp::of_ast(ast : Ast) -> Exp {
  match ast {
    Chr(c) => Chr(c)
    Seq(a, b) => Seq(Exp::of_ast(a), Exp::of_ast(b))
    Rep(a, None) => Rep(Exp::of_ast(a))
    Rep(a, Some(n)) => {
      // Unroll a{n} into n copies of a (n = 0 still yields one copy here)
      let sec = Exp::of_ast(a)
      let mut exp = sec
      for _ in 1..<n {
        exp = Seq(exp, sec)
      }
      exp
    }
    // a? is desugared to a | eps
    Opt(a) => Alt(Exp::of_ast(a), Eps)
  }
}
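
To make the desugaring concrete, here is a small check of what the conversion is expected to produce for two patterns. This is a sketch under the assumption that it sits in the same package as Ast and Exp; the test name and local bindings are ours.

```moonbit
test "of_ast desugars Opt and bounded Rep" {
  // (ab*)? becomes (a . b*) | eps
  let ast = Ast::chr('a').seq(Ast::chr('b').rep()).opt()
  let expected : Exp = Alt(Seq(Chr('a'), Rep(Chr('b'))), Eps)
  assert_eq(Exp::of_ast(ast), expected)
  // a{3} is unrolled into a left-nested chain of Seq nodes
  let unrolled : Exp = Seq(Seq(Chr('a'), Chr('a')), Chr('a'))
  assert_eq(Exp::of_ast(Ast::chr('a').rep(n=3)), unrolled)
}
```

Note that of_ast uses the raw Seq and Alt constructors, so the result is not yet normalized; normalization only happens once derivatives are taken.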

We also provide smart constructors for Exp to simplify pattern building:

fn Exp::seq(a : Exp, b : Exp) -> Exp {
  match (a, b) {
    (Nil, _) | (_, Nil) => Nil
    (Eps, b) => b
    (a, Eps) => a
    (a, b) => Seq(a, b)
  }
}

The smart constructor for Alt, however, is strictly necessary: it ensures that every constructed Exp is normalized up to "similarity", as described in Brzozowski's original paper. Two regexes are similar if one can be reduced to the other by applying the following rules:

\begin{align}
& A \mid \emptyset &&\rightarrow A \\
& A \mid B &&\rightarrow B \mid A \\
& A \mid (B \mid C) &&\rightarrow (A \mid B) \mid C
\end{align}

Therefore, we normalize the Alt construction to always use the same associativity and order of alternatives:

fn Exp::alt(a : Exp, b : Exp) -> Exp {
  match (a, b) {
    (Nil, b) => b
    (a, Nil) => a
    (Alt(a, b), c) => a.alt(b.alt(c))
    (a, b) => {
      if a == b {
        a
      } else if a > b {
        Alt(b, a)
      } else {
        Alt(a, b)
      }
    }
  }
}
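
The effect of this normalization can be checked on a few small cases. The sketch below assumes the Exp definitions above are in scope; the bindings a, b and c are illustrative.

```moonbit
test "alt normalizes similar expressions to one representative" {
  let a : Exp = Chr('a')
  let b : Exp = Chr('b')
  let c : Exp = Chr('c')
  // A | Nil reduces to A, on either side
  assert_eq(a.alt(Nil), a)
  assert_eq(Exp::alt(Nil, a), a)
  // Identical alternatives collapse into one
  assert_eq(a.alt(a), a)
  // The order of alternatives does not matter
  assert_eq(a.alt(b), b.alt(a))
  // The grouping of alternatives does not matter
  assert_eq(a.alt(b).alt(c), a.alt(b.alt(c)))
}
```

Because similar expressions end up structurally equal, the derived Eq and Hash instances can be used to recognize an expression that has already been seen.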

The nullable function determines if a pattern can match the empty string without consuming input:

fn Exp::nullable(self : Exp) -> Bool {
  match self {
    Nil => false
    Eps => true
    Chr(_) => false
    Alt(l, r) => l.nullable() || r.nullable()
    Seq(l, r) => l.nullable() && r.nullable()
    Rep(_) => true
  }
}
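
A few small cases illustrate the rules. This is a sketch that assumes the definitions above are in the same package; the binding names are ours.

```moonbit
test "nullable on a few small patterns" {
  let a : Exp = Chr('a')
  // A single character always needs one character of input
  assert_false(a.nullable())
  // a* can match zero repetitions
  let star : Exp = Rep(a)
  assert_true(star.nullable())
  // a . a* still needs the leading a
  let both : Exp = Seq(a, star)
  assert_false(both.nullable())
  // a | eps can take the empty branch
  let either : Exp = Alt(a, Eps)
  assert_true(either.nullable())
}
```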

The deriv function computes the derivative of a pattern with respect to a character, transforming the pattern based on the rules defined in the Brzozowski derivative. We have reordered the rules to match the order in the deriv function:

\begin{align}
D_{a} \emptyset &= \emptyset \\
D_{a} \epsilon &= \emptyset \\
D_{a} a &= \epsilon \\
D_{a} b &= \emptyset & \text{ for }(a \neq b) \\
D_{a} (P \mid Q) &= (D_{a} P) \mid (D_{a} Q) \\
D_{a} (P \cdot Q) &= (D_{a} P \cdot Q) \mid (\nu(P) \cdot D_{a} Q) \\
D_{a} (P\ast) &= D_{a} P \cdot P\ast \\
\end{align}

fn Exp::deriv(self : Exp, c : Char) -> Exp {
  match self {
    Nil => self
    Eps => Nil
    Chr(d) if d == c => Eps
    Chr(_) => Nil
    Alt(l, r) => l.deriv(c).alt(r.deriv(c))
    Seq(l, r) => {
      let dl = l.deriv(c)
      if l.nullable() {
        dl.seq(r).alt(r.deriv(c))
      } else {
        dl.seq(r)
      }
    }
    Rep(e) => e.deriv(c).seq(self)
  }
}
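
As a worked example of these rules, consider the regex from the first section, $R = (a \cdot b\ast) \mid \epsilon$, on the input "ab". The derivation below is our own illustration; it assumes the usual convention that $\nu(P)$ is $\epsilon$ when $P$ is nullable and $\emptyset$ otherwise, and it applies the simplifications performed by the smart constructors ($\epsilon \cdot P \rightarrow P$, $\emptyset \cdot P \rightarrow \emptyset$, $P \mid \emptyset \rightarrow P$).

\begin{align}
D_{a} R &= D_{a} (a \cdot b\ast) \mid D_{a} \epsilon \\
&= \big( (D_{a} a \cdot b\ast) \mid (\nu(a) \cdot D_{a} (b\ast)) \big) \mid \emptyset \\
&= \big( (\epsilon \cdot b\ast) \mid (\emptyset \cdot D_{a} (b\ast)) \big) \mid \emptyset \\
&= b\ast \\
D_{b} (b\ast) &= D_{b} b \cdot b\ast \\
&= \epsilon \cdot b\ast \\
&= b\ast
\end{align}

After both characters have been consumed, the remaining expression is $b\ast$, which is nullable, so "ab" is accepted.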

To simplify our implementation, we only perform strict matching—the pattern must match the entire input string. Therefore, we only check for nullability after the entire input has been consumed:

fn Exp::matches(self : Exp, s : String) -> Bool {
  loop (self, s.view()) {
    (Nil, _) => {
      return false
    }
    (e, []) => {
      return e.nullable()
    }
    (e, [c, .. s]) => {
      continue (e.deriv(c), s)
    }
  }
}
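
Putting the pieces together, the derivative engine can be exercised on the running example. The following is a minimal sketch, assuming all definitions from this section are in the same package; the test name and chosen inputs are ours.

```moonbit
test "derivative matching of (ab*)?" {
  let ast = Ast::chr('a').seq(Ast::chr('b').rep()).opt()
  let exp = Exp::of_ast(ast)
  // The whole pattern is optional, so the empty string matches
  assert_true(exp.matches(""))
  assert_true(exp.matches("a"))
  assert_true(exp.matches("abbb"))
  // The derivative becomes Nil on the first character
  assert_false(exp.matches("b"))
  // Matching is strict: the trailing 'a' is not covered by b*
  assert_false(exp.matches("aba"))
}
```

The early return on Nil means that a failing input such as "b" is rejected without scanning the rest of the string.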

Virtual Machine

The VM approach compiles regular expressions into bytecode instructions for a simple virtual machine. This method transforms the pattern-matching problem into program execution, where the VM simulates all possible paths through a non-deterministic finite automaton simultaneously.

Ken Thompson's 1968 paper described a regex engine that compiled patterns into IBM 7094 machine code. The key insight was to avoid exponential backtracking by maintaining multiple execution threads that advance through input in lockstep, processing one character at a time across all possible matching paths.

Instruction Set and Program Representation

The VM operates on four fundamental instructions that correspond to NFA operations:

enum Ops {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show)

Each instruction serves a specific purpose in NFA simulation. Done marks successful completion of pattern matching and corresponds to Thompson's original match. Char(c) consumes the input character c and advances to the next instruction. Jump(addr) jumps unconditionally to the instruction at address addr (Thompson's jmp). Fork(addr) creates two execution paths: one continues with the next instruction, the other jumps to addr (Thompson's split).

The Fork instruction is crucial for handling non-determinism in patterns like alternation and repetition, where multiple execution paths must be explored simultaneously. This maps directly to NFA ε-transitions, where execution can spontaneously branch without consuming input.

We define a Prg that wraps an array of instructions with convenience methods for building and manipulating bytecode programs.

struct Prg(Array[Ops]) derive(Show)

fn Prg::push(self : Prg, inst : Ops) -> Unit {
  self.0.push(inst)
}

fn Prg::length(self : Prg) -> Int {
  self.0.length()
}

fn Prg::op_set(self : Prg, index : Int, inst : Ops) -> Unit {
  self.0[index] = inst
}

AST Compilation to Bytecode

The Prg::of_ast function translates AST patterns into VM instructions using standard NFA construction techniques:

  1. Seq(a, b):

    code for a
    code for b
    
  2. Rep(a, None) (unbounded repetition):

    L0: Fork L1, L2
    L1: code for a
        Jump L0
    L2:
    
  3. Rep(a, Some(n)) (fixed repetition):

    code for a
    code for a
    ... (n times) ...
    
  4. Opt(a) (optional):

        Fork L1, L2
    L1: code for a
    L2:
    

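Note that the Fork constructor accepts only one address: execution always continues with the instruction right after the Fork, so only the second target needs to be stored. In the listings above, Fork L1, L2 is therefore encoded as Fork(L2), with L1 as the implicit fall-through.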

fn Prg::of_ast(ast : Ast) -> Prg {
  fn compile(prog : Prg, ast : Ast) -> Unit {
    match ast {
      Chr(chr) => prog.push(Char(chr))
      Seq(l, r) => {
        compile(prog, l)
        compile(prog, r)
      }
      Rep(e, None) => {
        let fork = prog.length()
        prog.push(Fork(0))
        compile(prog, e)
        prog.push(Jump(fork))
        prog[fork] = Fork(prog.length())
      }
      Rep(e, Some(n)) =>
        for _ in 0..<n {
          compile(prog, e)
        }
      Opt(e) => {
        let fork_inst = prog.length()
        prog.push(Fork(0))
        compile(prog, e)
        prog[fork_inst] = Fork(prog.length())
      }
    }
  }

  let prog : Prg = []
  compile(prog, ast)
  prog.push(Done)
  prog
}
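
To see the compilation scheme in action, we can compile the running example (ab*)? and inspect the result. The listing in the comment was derived by hand from the code above, and the test is a sketch that assumes the definitions in this section are in the same package; the test name and the `ops` binding are ours.

```moonbit
test "bytecode for (ab*)?" {
  // Expected layout:
  //   0: Fork 5     (Opt: run the body, or skip straight to Done)
  //   1: Char 'a'
  //   2: Fork 5     (Rep: run the loop body, or leave the loop)
  //   3: Char 'b'
  //   4: Jump 2     (back to the Fork that heads the loop)
  //   5: Done
  let Prg(ops) = Prg::of_ast(Ast::chr('a').seq(Ast::chr('b').rep()).opt())
  assert_eq(ops.length(), 6)
  assert_true(ops[0] is Fork(5))
  assert_true(ops[1] is Char('a'))
  assert_true(ops[2] is Fork(5))
  assert_true(ops[3] is Char('b'))
  assert_true(ops[4] is Jump(2))
  assert_true(ops[5] is Done)
}
```

Both Fork instructions start out as the placeholder Fork(0) and are patched once the length of the compiled body is known, which is why the compiler needs op_set.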

VM Execution Loop

In Rob Pike's implementation, the VM executes one-past the end of the input string to handle the final acceptance state. To make this explicit, our matches function implements the core VM execution loop using a two-phase approach:

Phase 1 handles character processing. For each input character, it processes all active threads in the current context. Char instructions that match the current character create new threads in the next context. Jump and Fork instructions immediately spawn new threads in the current context. After processing all threads, it swaps contexts and continues with the next character.

Phase 2 handles final acceptance. After consuming all input, it processes remaining threads looking for Done instructions. It handles any final Jump/Fork instructions that don't consume input. It returns true if any thread reaches a Done instruction.

fn Prg::matches(self : Prg, data : @string.View) -> Bool {
  let Prg(prog) = self
  let mut curr = Ctx::new(prog.length())
  let mut next = Ctx::new(prog.length())
  curr.add(0)
  for c in data {
    while curr.pop() is Some(pc) {
      match prog[pc] {
        Done => ()
        Char(char) if char == c => {
          next.add(pc + 1)
        }
        Jump(jump) => curr.add(jump)
        Fork(fork) => {
          curr.add(fork)
          curr.add(pc + 1)
        }
        _ => ()
      }
    }
    let temp = curr
    curr = next
    next = temp
    next.reset()
  }
  while curr.pop() is Some(pc) {
    match prog[pc] {
      Done => return true
      Jump(x) => curr.add(x)
      Fork(x) => {
        curr.add(x)
        curr.add(pc + 1)
      }
      _ => ()
    }
  }
  false
}
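
Since both engines are built from the same Ast, a simple way to gain confidence in the VM is to check that it agrees with the derivative matcher. The following is a minimal sketch, assuming every definition in this article (including Ctx) is in the same package; the test name and the input list are ours.

```moonbit
test "VM agrees with the derivative matcher on (ab*)?" {
  let ast = Ast::chr('a').seq(Ast::chr('b').rep()).opt()
  let exp = Exp::of_ast(ast)
  let prg = Prg::of_ast(ast)
  // Whole-string matches
  assert_true(prg.matches("".view()))
  assert_true(prg.matches("abbb".view()))
  // A thread that reaches Done while input remains is dropped
  assert_false(prg.matches("aba".view()))
  // Both engines should give the same verdict on every input
  for s in ["", "a", "ab", "abbb", "b", "aba", "ba"] {
    assert_eq(prg.matches(s.view()), exp.matches(s))
  }
}
```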

In the original blog post, Rob Pike uses a recursive function to handle Fork and Jump instructions so that threads are executed according to their priorities. Instead, we use a stack-like structure to manage all threads of execution, which naturally respects thread priority:

struct Ctx {
  deque : @deque.Deque[Int]
  visit : FixedArray[Bool]
}

fn Ctx::new(length : Int) -> Ctx {
  { deque: @deque.new(), visit: FixedArray::make(length, false) }
}

fn Ctx::add(self : Ctx, pc : Int) -> Unit {
  if !self.visit[pc] {
    self.deque.push_back(pc)
    self.visit[pc] = true
  }
}

fn Ctx::pop(self : Ctx) -> Int? {
  match self.deque.pop_back() {
    Some(pc) => {
      self.visit[pc] = false
      Some(pc)
    }
    None => None
  }
}

fn
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(self : Ctx) -> Unit
reset
(
Ctx
self
:
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
) ->
Unit
Unit
{
Ctx
self
.
@deque.Deque[Int]
deque
.
(self : @deque.Deque[Int]) -> Unit

Clears the deque, removing all values.

This method has no effect on the allocated capacity of the deque, only setting the length to 0.

Example

  let dv = @deque.of([1, 2, 3, 4, 5])
  dv.clear()
  inspect(dv.length(), content="0")
clear
()
Ctx
self
.
FixedArray[Bool]
visit
.
(self : FixedArray[Bool], value : Bool, start? : Int, end? : Int) -> Unit

Fill the array with a given value.

This method fills all or part of a FixedArray with the given value.

Parameters

  • value: The value to fill the array with
  • start: The starting index (inclusive, default: 0)
  • end: The ending index (exclusive, optional)

If end is not provided, fills from start to the end of the array. If start equals end, no elements are modified.

Panics

  • Panics if start is negative or greater than or equal to the array length
  • Panics if end is provided and is less than start or greater than array length
  • Does nothing if the array is empty

Example

// Fill entire array
let fa : FixedArray[Int] = [0, 0, 0, 0, 0]
fa.fill(3)
inspect(fa, content="[3, 3, 3, 3, 3]")

// Fill from index 1 to 3 (exclusive)
let fa2 : FixedArray[Int] = [0, 0, 0, 0, 0]
fa2.fill(9, start=1, end=3)
inspect(fa2, content="[0, 9, 9, 0, 0]")

// Fill from index 2 to end
let fa3 : FixedArray[String] = ["a", "b", "c", "d"]
fa3.fill("x", start=2)
inspect(fa3, content=(
  #|["a", "b", "x", "x"]
))
fill
(false)
}

The visit array is used to drop low-priority threads. When a new thread is added, we first check if it is already in the deque using the visit array. If it is, we drop it; otherwise, we add it to the deque and mark it as visited. This mechanism is necessary to avoid infinite loops or exponential blowup when the regex contains patterns that can be expanded indefinitely, such as (a?)*.
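As a minimal sketch of this deduplication in isolation, the snippet below uses a hypothetical WorkList type (an Array-backed stand-in for Ctx, introduced here only for illustration) and shows that adding the same program counter twice leaves a single entry:

```moonbit
// Hypothetical stand-in for Ctx: a work list that holds each pc at most once.
struct WorkList {
  items : Array[Int]
  seen : FixedArray[Bool]
}

fn WorkList::new(size : Int) -> WorkList {
  { items: [], seen: FixedArray::make(size, false) }
}

fn WorkList::add(self : WorkList, pc : Int) -> Unit {
  // A pc that is already queued is dropped instead of being queued again.
  if !self.seen[pc] {
    self.items.push(pc)
    self.seen[pc] = true
  }
}

test "duplicate pcs are dropped" {
  let w = WorkList::new(4)
  w.add(2)
  w.add(2) // ignored: pc 2 is already queued
  w.add(3)
  inspect(w.items.length(), content="2")
}
```

Because the work list can never hold more entries than there are instructions, the number of live threads stays bounded by the program size.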

Benchmarks and Performance Analysis

The benchmark demonstrates both approaches on a pathological case that challenges many regex implementations:

test (b : @bench.T) {
  let n = 15
  let txt = "a".repeat(n)
  let chr = Ast::chr('a')
  let ast : Ast = chr.opt().rep(n~).seq(chr.rep(n~))
  let exp = Exp::of_ast(ast)
  b.bench(name="derive", () => exp.matches(txt) |> ignore())
  let tvm = Prg::of_ast(ast)
  b.bench(name="thompson", () => tvm.matches(txt) |> ignore())
}

The pattern (a?){n}a{n} is a classic exponential-blowup case for backtracking engines. Against a string of n 'a' characters, each of the n optional a? groups can either consume a character or match the empty string, so a naive implementation may explore up to 2^n combinations before it finds the match.
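To make the blowup concrete, here is a rough sketch of what a naive backtracking matcher would do on this pattern. The backtrack function is a hypothetical helper written only for this illustration and is not part of either engine; it assumes the input consists solely of 'a' characters, so only the number of remaining characters matters:

```moonbit
// Hypothetical helper: naive backtracking over (a?){k}a{n} with `rest`
// characters of input left. Returns (matched, number of calls made).
fn backtrack(k : Int, n : Int, rest : Int) -> (Bool, Int) {
  if k == 0 {
    // Only a{n} remains: it matches exactly when n characters are left.
    return (rest == n, 1)
  }
  let mut calls = 1
  if rest > 0 {
    // Greedy branch: this a? consumes one character.
    let (ok, c) = backtrack(k - 1, n, rest - 1)
    calls += c
    if ok {
      return (true, calls)
    }
  }
  // Fallback branch: this a? matches the empty string.
  let (ok, c) = backtrack(k - 1, n, rest)
  (ok, calls + c)
}

test "call count roughly doubles with each extra a?" {
  inspect(backtrack(3, 3, 3).1, content="15")
  inspect(backtrack(15, 15, 15).1, content="65535")
}
```

For n = 15 the sketch makes 2^16 - 1 calls before it succeeds, which is the kind of work both engines in this article avoid.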

name     time (mean ± σ)         range (min … max)
derive     41.78 µs ±   0.14 µs    41.61 µs …  42.13 µs  in 10 ×   2359 runs
thompson   12.79 µs ±   0.04 µs    12.74 µs …  12.84 µs  in 10 ×   7815 runs

The benchmark results show that the VM approach is significantly faster than the derivative-based approach for this case. The derivative method frequently allocates intermediate regex structures, leading to higher overhead and slower performance. In contrast, the VM executes a fixed set of instructions and rarely allocates new structures once the deque grows to its full size.

However, the derivative approach is easier to reason about. We can easily prove termination of the algorithm, as the number of derivatives to be computed is bounded by the size of the AST and strictly decreases with each recursive application of the deriv function. The VM approach, on the other hand, can run indefinitely if the input Prg contains infinite loops, so it requires careful handling of thread priority to avoid both non-termination and exponential blowup in the number of threads.

Prettyprinter: Declarative Structured Data Formatting with Function Composition

· 8 min read

When working with structured data, printing it in a clear and adaptable format is a common challenge. This comes up often in debugging, logging, and code generation. For instance, an array literal [a,b,c] should ideally print on one line if the screen is wide enough, but gracefully wrap and indent when space is limited.

Traditional solutions often rely on manually concatenating strings while tracking indentation levels. This approach is not only tedious, but also error-prone.

A more elegant solution is to use function composition. With this approach, we build a prettyprinter: a system where users combine primitive formatting functions into a Doc structure that describes the intended layout. Given a maximum width, the prettyprinter automatically chooses the most readable formatting.

This makes the printing process declarative—you specify what the layout should look like under different conditions, and the system figures out how to render it.

SimpleDoc Primitives

We begin with a minimal representation called SimpleDoc. It consists of just four primitives:

enum SimpleDoc {
  Empty
  Line
  Text(String)
  Cat(SimpleDoc, SimpleDoc)
}

  • Empty: represents an empty string
  • Line: represents a newline
  • Text(String): plain text without line breaks
  • Cat(SimpleDoc, SimpleDoc): concatenates two SimpleDocs

Using these primitives, we can implement a simple rendering function. It flattens a SimpleDoc into a string using a stack-based traversal:

fn SimpleDoc::render(doc : SimpleDoc) -> String {
  let buf = StringBuilder::new()
  let stack = [doc]
  while stack.pop() is Some(doc) {
    match doc {
      Empty => ()
      Line => {
        buf..write_string("\n")
      }
      Text(text) => {
        buf.write_string(text)
      }
      Cat(left, right) => stack..push(right)..push(left)
    }
  }
  buf.to_string()
}

A quick test shows that SimpleDoc is exactly as expressive as String: Empty corresponds to "", Line corresponds to "\n", Text("a") corresponds to "a", and Cat(Text("a"), Text("b")) corresponds to "a" + "b".

test "simple doc" {
  let doc : SimpleDoc = Cat(Text("hello"), Cat(Line, Text("world")))
  inspect(
    doc.render(),
    content=(
      #|hello
      #|world
    ),
  )
}

At this stage, the SimpleDoc doesn’t yet handle indentation or layout choices—but we’re about to fix that.

ExtendDoc: Nest, Choice, Group

To handle real-world formatting, we extend SimpleDoc with three new primitives:

enum ExtendDoc {
  Empty
  Line
  Text(String)
  Cat(ExtendDoc, ExtendDoc)
  Nest(Int, ExtendDoc)
  Choice(ExtendDoc, ExtendDoc)
  Group(ExtendDoc)
}

  • Nest: Nest(Int, ExtendDoc) indents the document by the given number of spaces after each line break. Nested levels accumulate.

  • Choice: Choice(ExtendDoc, ExtendDoc) stores two alternative layouts. Usually, the first parameter is the more compact layout without line breaks, and the second is the layout with Lines. The renderer uses the first layout in compact mode and the second otherwise.

  • Group: Group(ExtendDoc) groups an ExtendDoc and decides between compact and non-compact layout based on the available width. If the remaining space is sufficient, it prints compactly; otherwise, it falls back to the layout with line breaks.

Measuring Space

To know whether compact layout fits, we need a way to estimate how many characters a document would require:

let max_space = 9999

fn ExtendDoc::space(self : Self) -> Int {
  match self {
    Empty => 0
    Line => max_space
    Text(str) => str.length()
    Cat(a, b) => a.space() + b.space()
    Nest(_, a) | Choice(a, _) | Group(a) => a.space()
  }
}

Here, Line is treated as requiring “infinite” space. This guarantees that if a Group contains a line break, it won’t attempt to print compactly.
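The effect of the sentinel can be checked on a cut-down document type. MiniDoc and mini_max_space below are hypothetical names used only for this sketch; they mirror three of the ExtendDoc constructors and the same measuring rule:

```moonbit
// Hypothetical miniature of ExtendDoc, reduced to three constructors.
enum MiniDoc {
  Line
  Text(String)
  Cat(MiniDoc, MiniDoc)
}

// Same sentinel idea as max_space: far larger than any realistic width.
let mini_max_space = 9999

fn MiniDoc::space(self : MiniDoc) -> Int {
  match self {
    Line => mini_max_space
    Text(s) => s.length()
    Cat(a, b) => a.space() + b.space()
  }
}

test "a document containing Line never fits" {
  let flat : MiniDoc = Cat(Text("abc"), Text("def"))
  let broken : MiniDoc = Cat(Text("abc"), Cat(Line, Text("def")))
  inspect(flat.space(), content="6")
  inspect(broken.space(), content="10005")
}
```

Any document that contains a Line measures at least 9999, so a comparison against an ordinary width such as 80 always fails for it.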

Rendering ExtendDoc

We extend SimpleDoc::render to implement ExtendDoc::render. Because rendering must return to the enclosing indentation level after printing a substructure, each stack entry stores two extra pieces of state alongside the pending ExtendDoc: the indentation and whether compact mode is active. A column variable tracks how many characters are already used on the current line, so the remaining space can be computed. Finally, a width parameter specifies the maximum line width.

fn ExtendDoc::render(doc : ExtendDoc, width~ : Int = 80) -> String {
  let buf = StringBuilder::new()
  let stack = [(0, false, doc)] // default: no indentation, non-compact mode
  let mut column = 0
  while stack.pop() is Some((indent, fit, doc)) {
    match doc {
      Empty => ()
      Line => {
        buf..write_string("\n")
        for _ in 0..<indent {
          buf.write_string(" ")
        }
        column = indent
      }
      Text(text) => {
        buf.write_string(text)
        column += text.length()
      }
      Cat(left, right) =>
        stack..push((indent, fit, right))..push((indent, fit, left))
      Nest(n, doc) => stack..push((indent + n, fit, doc))
      Choice(a, b) =>
        stack.push(if fit { (indent, fit, a) } else { (indent, fit, b) })
      Group(doc) => {
        let fit = fit || column + doc.space() <= width
        stack.push((indent, fit, doc))
      }
    }
  }
  buf.to_string()
}

Let’s use ExtendDoc to describe (expr) and print it at different widths:

let softline : ExtendDoc = Choice(Empty, Line)

impl Add for ExtendDoc with op_add(a, b) {
  Cat(a, b)
}

test "tuple" {
  let tuple : ExtendDoc = Group(
    Text("(") + Nest(2, softline + Text("expr")) + softline + Text(")"),
  )
  inspect(tuple.render(width=40), content="(expr)")
  inspect(
    tuple.render(width=5),
    content=(
      #|(
      #|  expr
      #|)
    ),
  )
}

Here, softline is defined as a choice between Empty and Line. Since rendering starts in non-compact mode, we wrap the whole expression with Group. When the width is sufficient, the entire expression prints on one line; otherwise, it automatically wraps with indentation. To improve readability, we overloaded the + operator for ExtendDoc.

Composition Functions

In practice, users rely more on higher-level combinators built from the ExtendDoc primitives—like the softline above. Let’s introduce some useful functions for structured printing.

softline & softbreak

let softbreak : ExtendDoc = Choice(Text(" "), Line)

Similar to softline, except that in compact mode it inserts a space. Note that within the same Group, all Choices follow the same compact or non-compact decision.

let abc : ExtendDoc = Text("abc")

let def : ExtendDoc = Text("def")

let ghi : ExtendDoc = Text("ghi")

test "softbreak" {
  let doc : ExtendDoc = Group(abc + softbreak + def + softbreak + ghi)
  inspect(doc.render(width=20), content="abc def ghi")
  inspect(
    doc.render(width=10),
    content=(
      #|abc
      #|def
      #|ghi
    ),
  )
}

autoline & autobreak

let autoline : ExtendDoc = Group(softline)
let autobreak : ExtendDoc = Group(softbreak)

autoline and autobreak fit as much content as possible on each line and break only when the next piece no longer fits, much like word wrap in a text editor.

test {
  let doc : ExtendDoc = Group(
    abc + autobreak + def + autobreak + ghi,
  )
  inspect(doc.render(width=10), content="abc def ghi")
  inspect(
    doc.render(width=5),
    content=(
      #|abc def
      #|ghi
    ),
  )
  inspect(
    doc.render(width=3),
    content=(
      #|abc
      #|def
      #|ghi
    ),
  )
}

sepby

fn sepby(xs : Array[ExtendDoc], sep : ExtendDoc) -> ExtendDoc {
  match xs {
    [] => Empty
    [x, .. xs] => xs.fold(init=x, (a, b) => a + sep + b)
  }
}

sepby inserts a separator sep between ExtendDocs.

let comma : ExtendDoc = Text(",")

test {
  let layout = Group(sepby([abc, def, ghi], comma + softbreak))
  inspect(layout.render(width=40), content="abc, def, ghi")
  inspect(
    layout.render(width=10),
    content=(
      #|abc,
      #|def,
      #|ghi
    ),
  )
}

surround

fn surround(m : ExtendDoc, l : ExtendDoc, r : ExtendDoc) -> ExtendDoc {
  l + m + r
}

surround wraps an ExtendDoc with left and right delimiters.

test {
  inspect(surround(abc, Text("("), Text(")")).render(), content="(abc)")
}

Printing JSON

Using the functions above, we can implement a JSON prettyprinter. The pretty function recursively processes each JSON element and generates the appropriate layout.

fn pretty(x : Json) -> ExtendDoc {
  fn comma_list(xs, l, r) {
    (Nest(2, softline + sepby(xs, comma + softbreak)) + softline)
    |> surround(l, r)
    |> Group
  }

  match x {
    Array(elems) => {
      let elems = elems.iter().map(pretty).collect()
      comma_list(elems, Text("["), Text("]"))
    }
    Object(pairs) => {
      let pairs = pairs
        .iter()
        .map(p => Group(Text(p.0.escape()) + Text(": ") + pretty(p.1)))
        .collect()
      comma_list(pairs, Text("{"), Text("}"))
    }
    String(s) => Text(s.escape())
    Number(i) => Text(i.to_string())
    False => Text("false")
    True => Text("true")
    Null => Text("null")
  }
}

When rendered, the JSON automatically adapts to different widths:

test {
  let json : Json = {
    "key1": "string",
    "key2": [12345, 67890],
    "key3": [
      { "field1": 1, "field2": 2 },
      { "field1": 1, "field2": 2 },
      { "field1": [1, 2], "field2": 2 },
    ],
  }
  inspect(
    pretty(json).render(width=80),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [12345, 67890],
      #|  "key3": [
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": [1, 2], "field2": 2}
      #|  ]
      #|}
    ),
  )
  inspect(
    pretty(json).render(width=30),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [12345, 67890],
      #|  "key3": [
      #|    {"field1": 1, "field2": 2},
      #|    {"field1": 1, "field2": 2},
      #|    {
      #|      "field1": [1, 2],
      #|      "field2": 2
      #|    }
      #|  ]
      #|}
    ),
  )
  inspect(
    pretty(json).render(width=20),
    content=(
      #|{
      #|  "key1": "string",
      #|  "key2": [
      #|    12345,
      #|    67890
      #|  ],
      #|  "key3": [
      #|    {
      #|      "field1": 1,
      #|      "field2": 2
      #|    },
      #|    {
      #|      "field1": 1,
      #|      "field2": 2
      #|    },
      #|    {
      #|      "field1": [
      #|        1,
      #|        2
      #|      ],
      #|      "field2": 2
      #|    }
      #|  ]
      #|}
    ),
  )
}

Conclusion

By combining a small set of primitives with function composition, we can build a flexible, declarative prettyprinter that adapts structured data layouts to the available screen width.

This approach scales well: you describe layout intentions with combinators like sepby, surround, or autobreak, and the rendering engine takes care of indentation, line breaks, and fitting.

The current implementation can be improved further:

  • Memoizing space calculations to improve performance.
  • Adding a ribbon parameter to balance whitespace against content density.
  • Supporting advanced layouts such as hanging indents or mandatory line breaks.

For a deeper dive, see Philip Wadler's classic paper A prettier printer, as well as prettyprinter libraries in Haskell, OCaml, and other languages.

Mini-adapton: incremental computation in MoonBit

· 10 min read

Introduction

Let's first illustrate what incremental computation looks like with a spreadsheet-like example. First, define a dependency graph like this:

In this graph, t1's value is computed from n1 + n2 and t2's value is computed from t1 + n3.

When we ask for the value of t2, the computation defined by the graph runs: first t1 is computed as n1 + n2, then t2 is computed as t1 + n3. So far this is the same as non-incremental computation.

Things change once we start modifying n1, n2, or n3. Say we swap the values of n1 and n2 and then ask for t2 again. In non-incremental computation, both t1 and t2 are recomputed. But recomputing t2 is unnecessary, because neither of its dependencies, t1 and n3, has changed (swapping n1 and n2 does not change the value of t1).

The following example does exactly what we described above. We use Cell::new to define n1, n2, and n3, which hold plain values and need no computation, and Thunk::new to define t1 and t2, which are computed from other nodes.

test {
  // a counter that records how many times t2 is computed
  let mut cnt = 0
  // define the graph
  let n1 = Cell::new(1)
  let n2 = Cell::new(2)
  let n3 = Cell::new(3)
  let t1 = Thunk::new(fn() { n1.get() + n2.get() })
  let t2 = Thunk::new(fn() {
    cnt += 1
    t1.get() + n3.get()
  })
  // get the value of t2
  inspect(t2.get(), content="6")
  inspect(cnt, content="1")
  // swap the values of n1 and n2
  n1.set(2)
  n2.set(1)
  inspect(t2.get(), content="6")
  // t2 is not recomputed
  inspect(cnt, content="1")
}

In this article, we will show how to implement an incremental computation library in MoonBit with the API used in the above example:

Cell::new
Cell::get
Cell::set
Thunk::new
Thunk::get

Problem Analysis and Solution

To implement the library, there are three main problems to solve:

Build up dependency graph on the fly

Because this is a MoonBit library, there is no easy way to build the dependency graph statically: MoonBit does not currently have a metaprogramming mechanism. We therefore construct the dependency graph on the fly. Since all we care about is which cells and thunks a thunk depends on, a good time to build the graph is when the user calls Thunk::get. Take the code above as an example:

let n1 = Cell::new(1)
let n2 = Cell::new(2)
let n3 = Cell::new(3)
let t1 = Thunk::new(fn() { n1.get() + n2.get() })
let t2 = Thunk::new(fn() { t1.get() + n3.get() })
t2.get()

When the user calls t2.get(), we can observe at runtime that t1.get() and n3.get() are called inside it. Therefore t1 and n3 are dependencies of t2, and we can construct a subgraph with edges from t2 to t1 and from t2 to n3.

The same happens when t1.get() is called inside t2.get(): n1 and n2 are recorded as dependencies of t1.

So here is the plan:

  1. We declare a stack to record which thunk we are currently getting. A stack fits because we are essentially recording the call stack of every get.
  2. Whenever get is called on a node, we mark that node as a dependency of the thunk on top of the stack. If the node is a thunk, we push it onto the stack.
  3. Whenever a thunk's get finishes, we pop it off the stack.

Let's see the full process of above example under this algorithm:

  1. when we call t2.get, push t2 on the stack.

  2. when we call t1.get inside t2.get, mark t1 as a dependency of t2 and push t1 onto the stack.

  3. when we call n1.get inside t1.get, mark n1 as a dependency of t1.

  4. same story goes for n2.

  5. when t1.get finishes, pop it from the stack.

  6. when we call n3.get, mark n3 as a dependency of t2.

  7. when t2.get finishes, pop it from the stack.

Besides the edge from dependent to dependency, we should also record an edge from dependency to dependent, so that we can easily traverse the graph backwards when needed.

In the code below, outgoing_edges refers to edges from a parent (dependent) to a child (dependency), and incoming_edges refers to the opposite direction.
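The stack-based recording can be sketched in isolation. This is a minimal illustration, not the library's code: the Tracker type and its enter, leave and record methods are hypothetical names, and nodes are represented by plain strings so that the recorded edges are easy to read.

```moonbit
// Hypothetical helper for illustration only; not part of the library.
struct Tracker {
  stack : Array[String] // thunks whose `get` is currently running
  edges : Array[(String, String)] // (dependent, dependency) pairs
}

fn Tracker::new() -> Tracker {
  Tracker::{ stack: [], edges: [] }
}

// Every `get` records the node being read as a dependency of the stack top.
fn Tracker::record(self : Tracker, name : String) -> Unit {
  if self.stack.last() is Some(top) {
    self.edges.push((top, name))
  }
}

// A thunk's `get` records itself, then becomes the new stack top.
fn Tracker::enter(self : Tracker, name : String) -> Unit {
  self.record(name)
  self.stack.push(name)
}

// When a thunk's `get` finishes, it is popped off the stack.
fn Tracker::leave(self : Tracker) -> Unit {
  self.stack.unsafe_pop() |> ignore
}

test {
  let tr = Tracker::new()
  tr.enter("t2") // t2.get()
  tr.enter("t1") //   t1.get() inside t2
  tr.record("n1") //     n1.get() inside t1
  tr.record("n2") //     n2.get() inside t1
  tr.leave() //   t1.get() finished
  tr.record("n3") //   n3.get() inside t2
  tr.leave() // t2.get() finished
  // recorded edges: t2 -> t1, t1 -> n1, t1 -> n2, t2 -> n3
  inspect(tr.edges.length(), content="4")
  inspect(tr.stack.length(), content="0")
}
```

The outermost call records nothing because the stack is empty at that point, which matches the walkthrough: t2 has no dependent in this example.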

A mechanism to mark outdated node

Whenever we call Cell::set, the cell itself and all nodes that depend on it should be marked as outdated. This will be one of the criteria for deciding whether a thunk needs to be recomputed. The marking is a recursive backward traversal starting from a leaf of the graph. We can describe the process in pseudo MoonBit code:

fn dirty(node: Node) -> Unit {
  for n in node.incoming_edges {
    n.set_dirty(true)
    dirty(n)
  }
}
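To see the traversal run, here is a small sketch on an index-based graph. The mark_dirty function and the array layout are assumptions made for this illustration; the library itself works on node objects. The check for already-dirty nodes is included so that shared dependents are not walked twice.

```moonbit
// Hypothetical index-based graph: incoming[i] lists the dependents of node i.
fn mark_dirty(
  incoming : Array[Array[Int]],
  dirty : Array[Bool],
  node : Int
) -> Unit {
  for dependent in incoming[node] {
    // skip nodes that are already marked, together with everything above them
    if not(dirty[dependent]) {
      dirty[dependent] = true
      mark_dirty(incoming, dirty, dependent)
    }
  }
}

test {
  // 0 = n1, 1 = n2, 2 = n3, 3 = t1, 4 = t2
  let incoming = [[3], [3], [4], [4], []]
  let dirty = [false, false, false, false, false]
  // simulate n1.set(...): the cell itself, then all of its dependents
  dirty[0] = true
  mark_dirty(incoming, dirty, 0)
  inspect(dirty, content="[true, false, false, true, true]")
}
```

Setting n1 marks n1, t1 and t2, while n2 and n3 stay clean because nothing they depend on has changed.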

Determine whether a thunk needs to be recomputed

Whenever we call Thunk::get, we need to determine whether it really needs to be recomputed. The dirty mechanism described in the last subsection is not enough on its own: if dirtiness were the only criterion, we would do unneeded computation. Consider the example given at the beginning:

n1.set(2)
n2.set(1)
inspect(t2.get(), content="6")

After we swap the values of n1 and n2, the nodes n1, n2, t1, and t2 are all marked as dirty. But when we call t2.get, there is no need to recompute t2, since the value of t1 does not change.

This tells us that, besides dirtiness, we also need to record whether a node's value differs from its previous value. A node needs to be recomputed only if it is dirty and the value of at least one of its dependencies has changed.

We can describe the algorithm as the pseudo MoonBit code below:

fn propagate(self: Node) -> Unit {
  // When a node is dirty, it might need to be recomputed
  if self.is_dirty() {
    // after recomputing, it's no longer dirty
    self.set_dirty(false)
    for dependency in self.outgoing_edges() {
      // recursively recompute every dependency
      dependency.propagate()
      // If a dependency's value changed, the node needs to be recomputed
      if dependency.is_changed() {
        // remove all incoming_edges and outgoing_edges, since they will be reconstructed during evaluate
        self.incoming_edges().clear()
        self.outgoing_edges().clear()
        self.evaluate()
        return
      }
    }
  }
}
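The effect of this algorithm on the swap example can be checked with a simplified sketch. Several things here are assumptions for illustration: the Graph type is hypothetical, nodes are array indices, every thunk just sums its dependencies, and the edges are fixed instead of being cleared and rebuilt during evaluation. The initial state is the one right after n1 and n2 have been swapped and the dirty flags have been set.

```moonbit
// Hypothetical fixed-shape graph; thunks compute the sum of their dependencies.
struct Graph {
  deps : Array[Array[Int]] // outgoing edges: node -> its dependencies
  values : Array[Int] // current (possibly cached) value of each node
  dirty : Array[Bool]
  changed : Array[Bool]
  recomputed : Array[Int] // how many times each node was re-evaluated
}

fn Graph::evaluate(self : Graph, node : Int) -> Unit {
  let mut sum = 0
  for d in self.deps[node] {
    sum = sum + self.values[d]
  }
  // a node counts as changed only if its new value differs from the cached one
  self.changed[node] = sum != self.values[node]
  self.values[node] = sum
  self.recomputed[node] = self.recomputed[node] + 1
}

fn Graph::propagate(self : Graph, node : Int) -> Unit {
  if self.dirty[node] {
    self.dirty[node] = false
    for d in self.deps[node] {
      self.propagate(d)
      if self.changed[d] {
        self.evaluate(node)
        return
      }
    }
  }
}

test {
  // 0 = n1, 1 = n2, 2 = n3, 3 = t1 = n1 + n2, 4 = t2 = t1 + n3
  let g = Graph::{
    deps: [[], [], [], [0, 1], [3, 2]],
    values: [2, 1, 3, 3, 6],
    dirty: [true, true, false, true, true],
    changed: [true, true, false, false, false],
    recomputed: [0, 0, 0, 0, 0],
  }
  g.propagate(4) // corresponds to t2.get()
  // t1 is recomputed once, t2 is not recomputed at all
  inspect(g.recomputed, content="[0, 0, 0, 1, 0]")
  inspect(g.values[4], content="6")
}
```

t1 is re-evaluated because one of its dependencies changed, but its value stays 3, so its changed flag is false and t2 keeps its cached value.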

Implementation

Given the algorithms described in the last section, the implementation should be quite straightforward.

First, let's define Cell:

struct Cell[A] {
  mut is_dirty : Bool
  mut value : A
  mut is_changed : Bool
  incoming_edges : Array[&Node]
}

Since a Cell can only be a leaf node in the dependency graph, it does not have outgoing_edges. The trait Node is used here to abstract over the nodes of the dependency graph.

Then, let's define Thunk:

struct Thunk[A] {
  mut is_dirty : Bool
  mut value : A?
  mut is_changed : Bool
  thunk : () -> A
  incoming_edges : Array[&Node]
  outgoing_edges : Array[&Node]
}

Thunk's value is optional, since it only exists after we first call Thunk::get.

We can easily add a constructor, new, for both types:

fn[A : Eq] Cell::new(value : A) -> Cell[A] {
  Cell::{ is_changed: false, value, incoming_edges: [], is_dirty: false }
}

fn[A : Eq] Thunk::new(thunk : () -> A) -> Thunk[A] {
  Thunk::{
    value: None,
    is_changed: false,
    thunk,
    incoming_edges: [],
    outgoing_edges: [],
    is_dirty: false,
  }
}

Thunk and Cell are the two kinds of node in the dependency graph. We can use the trait Node mentioned above to abstract over them:

trait Node {
  is_dirty(Self) -> Bool
  set_dirty(Self, Bool) -> Unit
  incoming_edges(Self) -> Array[&Node]
  outgoing_edges(Self) -> Array[&Node]
  is_changed(Self) -> Bool
  evaluate(Self) -> Unit
}

And implement the trait for both types:

impl[A] Node for Cell[A] with incoming_edges(self) {
  self.incoming_edges
}

impl[A] Node for Cell[A] with outgoing_edges(_self) {
  []
}

impl[A] Node for Cell[A] with is_dirty(self) {
  self.is_dirty
}

impl[A] Node for Cell[A] with set_dirty(self, new_dirty) {
  self.is_dirty = new_dirty
}

impl[A] Node for Cell[A] with is_changed(self) {
  self.is_changed
}

impl[A] Node for Cell[A] with evaluate(_self) {
  ()
}

impl[A : Eq] Node for Thunk[A] with is_changed(self) {
  self.is_changed
}

impl[A : Eq] Node for Thunk[A] with outgoing_edges(self) {
  self.outgoing_edges
}

impl[A : Eq] Node for Thunk[A] with incoming_edges(self) {
  self.incoming_edges
}

impl[A : Eq] Node for Thunk[A] with is_dirty(self) {
  self.is_dirty
}

impl[A : Eq] Node for Thunk[A] with set_dirty(self, new_dirty) {
  self.is_dirty = new_dirty
}

impl[A : Eq] Node for Thunk[A] with evaluate(self) {
  // push self into node_stack top
  // now self is active target
  node_stack.push(self)
  // `self.thunk` might contains `source.get()`,
  // such as `s1.get()`, `s2.get()` and `s3.get()`
  //
  // when call `Thunk::get` or `Cell::get`,
  // they will treat `node_stack.last()` as themself's target.
  // if source is `Cell`, then it only record `incoming_edges`.
  // if source is `Thunk`, then it record `incoming_edges` and `outgoing_edges`, connect each other.
  //
  let value = (self.thunk)()
  self.is_changed = match self.value {
    None => true
    Some(v) => v != value
  }
  self.value = Some(value)
  // pop self from node_stack
  // now self is no longer active target
  node_stack.unsafe_pop() |> ignore
}

The only complicated implementation is Thunk's evaluate. We first push the thunk onto the stack for dependency recording. node_stack is defined as follows:

let node_stack : Array[&Node] = []

Then we do the real computation and compare the result with the last value to update self.is_changed, which is used later to decide whether a thunk needs to be recomputed.
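The comparison step can be tried on its own with a reduced sketch. The Memo type and its refresh method are hypothetical names used only here, the value type is fixed to Int for brevity, and an array stands in for the two cells.

```moonbit
// Hypothetical reduced version of a thunk: no dependency tracking, only caching.
struct Memo {
  mut value : Int?
  mut is_changed : Bool
  compute : () -> Int
}

fn Memo::refresh(self : Memo) -> Unit {
  let value = (self.compute)()
  self.is_changed = match self.value {
    None => true // first evaluation always counts as a change
    Some(v) => v != value
  }
  self.value = Some(value)
}

test {
  let cells = [1, 2] // stands in for n1 and n2
  let m = Memo::{
    value: None,
    is_changed: false,
    compute: fn() { cells[0] + cells[1] },
  }
  m.refresh()
  inspect(m.is_changed, content="true")
  // swap the two inputs: the sum is recomputed but stays the same
  cells[0] = 2
  cells[1] = 1
  m.refresh()
  inspect(m.is_changed, content="false")
  inspect(m.value, content="Some(3)")
}
```

The second refresh does run the computation, but because the result equals the cached value, anything depending on it would not need to be recomputed.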

dirty and propagate are almost the same as the pseudo code described above:

fn &Node::dirty(self : &Node) -> Unit {
  for dependent in self.incoming_edges() {
    if not(dependent.is_dirty()) {
      dependent.set_dirty(true)
      dependent.dirty()
    }
  }
}

fn &Node::propagate(self : &Node) -> Unit {
  if self.is_dirty() {
    self.set_dirty(false)
    for dependency in self.outgoing_edges() {
      dependency.propagate()
      if dependency.is_changed() {
        self.incoming_edges().clear()
        self.outgoing_edges().clear()
        self.evaluate()
        return
      }
    }
  }
}

With the foundation we have built, the three main APIs, Cell::get, Cell::set, and Thunk::get, are easy to implement.

To get the value of a cell, we simply return the value field of the struct. But before that, we first need to record the cell as a dependency if it is called inside Thunk::get.

fn[A] Cell::get(self : Cell[A]) -> A {
  if node_stack.last() is Some(target) {
    target.outgoing_edges().push(self)
    self.incoming_edges.push(target)
  }
  self.value
}

Whenever we set a cell, we need to first make sure that the two states is_changed and dirty are updated correctly. Then mark every dependent as dirty.

fn[A : Eq] Cell::set(self : Cell[A], new_value : A) -> Unit {
  if self.value != new_value {
    self.is_changed = true
    self.value = new_value
    self.set_dirty(true)
    &Node::dirty(self)
  }
}

In Thunk::get, as in Cell::get, we first need to record self as a dependency. After that we pattern match on self.value. If it is None, this is the first time the user has asked for the thunk's value, so we can simply evaluate it. If it is Some, we use propagate to make sure that we only recompute the thunks that really need it.

fn[A : Eq] Thunk::get(self : Thunk[A]) -> A {
  if node_stack.last() is Some(target) {
    target.outgoing_edges().push(self)
    self.incoming_edges.push(target)
  }
  match self.value {
    None => self.evaluate()
    Some(_) => &Node::propagate(self)
  }
  self.value.unwrap()
}

A Guide to MoonBit Python Integration

· 12 min read

Introduction

Python, with its concise syntax and vast ecosystem, has become one of the most popular programming languages today. However, discussions around its performance bottlenecks and the maintainability of its dynamic typing system in large-scale projects have never ceased. To address these challenges, the developer community has explored various optimization paths.

The python.mbt tool, officially launched by MoonBit, offers a new perspective. It allows developers to call Python code directly within the MoonBit environment. This combination aims to merge MoonBit's static type safety and high-performance potential with Python's mature ecosystem. Through python.mbt, developers can leverage MoonBit's static analysis capabilities, modern build and testing tools, while enjoying Python's rich library functions, making it possible to build large-scale, high-performance system-level software.

This article aims to delve into the working principles of python.mbt and provide a practical guide. It will answer common questions such as: How does python.mbt work? Is it slower than native Python due to an added intermediate layer? What are its advantages over existing tools like C++'s pybind11 or Rust's PyO3? To answer these questions, we first need to understand the basic workflow of the Python interpreter.

How the Python Interpreter Works

The Python interpreter executes code in three main stages:

  1. Parsing: This stage includes lexical analysis and syntax analysis. The interpreter breaks down human-readable Python source code into tokens and then organizes these tokens into a tree-like structure, the Abstract Syntax Tree (AST), based on syntax rules.

    For example, for the following Python code:

    def add(x, y):
      return x + y
    
    a = add(1, 2)
    print(a)
    

    We can use Python's ast module to view its generated AST structure:

    Module(
        body=[
            FunctionDef(
                name='add',
                args=arguments(
                    args=[
                        arg(arg='x'),
                        arg(arg='y')]),
                body=[
                    Return(
                        value=BinOp(
                            left=Name(id='x', ctx=Load()),
                            op=Add(),
                            right=Name(id='y', ctx=Load())))]),
            Assign(
                targets=[
                    Name(id='a', ctx=Store())],
                value=Call(
                    func=Name(id='add', ctx=Load()),
                    args=[
                        Constant(value=1),
                        Constant(value=2)])),
            Expr(
                value=Call(
                    func=Name(id='print', ctx=Load()),
                    args=[
                        Name(id='a', ctx=Load())]))])
    
  2. Compilation: Next, the Python interpreter compiles the AST into a lower-level, more linear intermediate representation called bytecode. This is a platform-independent instruction set designed for the Python Virtual Machine (PVM).

    Using Python's dis module, we can view the bytecode corresponding to the above code:

      2           LOAD_CONST               0 (<code object add>)
                  MAKE_FUNCTION
                  STORE_NAME               0 (add)
    
      5           LOAD_NAME                0 (add)
                  PUSH_NULL
                  LOAD_CONST               1 (1)
                  LOAD_CONST               2 (2)
                  CALL                     2
                  STORE_NAME               1 (a)
    
      6           LOAD_NAME                2 (print)
                  PUSH_NULL
                  LOAD_NAME                1 (a)
                  CALL                     1
                  POP_TOP
                  RETURN_CONST             3 (None)
    
  3. Execution: Finally, the Python Virtual Machine (PVM) executes the bytecode instructions one by one. Each instruction corresponds to a C function call in the CPython interpreter's underlying layer. For example, LOAD_NAME looks up a variable, and BINARY_OP performs a binary operation. It is this process of interpreting and executing instructions one by one that is the main source of Python's performance overhead. A simple 1 + 2 operation involves the entire complex process of parsing, compilation, and virtual machine execution.

Understanding this process helps us grasp the basic approaches to Python performance optimization and the design philosophy of python.mbt.

Paths to Optimizing Python Performance

Currently, there are two mainstream methods for improving Python program performance:

  1. Just-In-Time (JIT) Compilation: Projects like PyPy analyze a running program and compile frequently executed "hotspot" bytecode into highly optimized native machine code, thereby bypassing the PVM's interpretation and significantly speeding up computationally intensive tasks. However, JIT is not a silver bullet; it cannot solve the inherent problems of Python's dynamic typing, such as the difficulty of effective static analysis in large projects, which poses challenges for software maintenance.
  2. Native Extensions: Developers can use languages like C++ (with pybind11) or Rust (with PyO3) to directly call Python functions or to write performance-critical modules that are then called from Python. This method can achieve near-native performance, but it requires developers to be proficient in both Python and a complex system-level language, presenting a steep learning curve and a high barrier to entry for most Python programmers.

python.mbt is also a native extension. But compared to languages like C++ and Rust, it attempts to find a new balance between performance, ease of use, and engineering capabilities, with a greater emphasis on using Python features directly within the MoonBit language.

  1. High-Performance Core: MoonBit is a statically typed, compiled language whose code can be efficiently compiled into native machine code. Developers can implement computationally intensive logic in MoonBit to achieve high performance from the ground up.
  2. Seamless Python Calls: python.mbt interacts directly with CPython's C-API to call Python modules and functions. This means call overhead is minimized, bypassing Python's parsing and compilation stages and going straight to the virtual machine execution layer.
  3. Gentler Learning Curve: Compared to C++ and Rust, MoonBit's syntax is more modern and concise. It also has comprehensive support for functional programming, a documentation system, unit testing, and static analysis tools, making it more friendly to developers accustomed to Python.
  4. Improved Engineering and AI Collaboration: MoonBit's strong type system and clear interface definitions make code intent more explicit and easier for static analysis tools and AI-assisted programming tools to understand. This helps maintain code quality in large projects and improves the efficiency and accuracy of collaborative coding with AI.

Using Pre-wrapped Python Libraries in MoonBit

To facilitate developer use, MoonBit will officially wrap mainstream Python libraries once the build system and IDE are mature. After wrapping, users can use these Python libraries in their projects just like importing regular MoonBit packages. Let's take the matplotlib plotting library as an example.

First, add the matplotlib dependency in your project's root moon.mod.json or via the terminal:

moon update
moon add Kaida-Amethyst/matplotlib

Then, declare the import in the moon.pkg.json of the sub-package where you want to use the library. Here, we follow Python's convention and set an alias plt:

{
  "import": [
    {
      "path": "Kaida-Amethyst/matplotlib",
      "alias": "plt"
    }
  ]
}

After configuration, you can call matplotlib in your MoonBit code to create plots:

let sin : (Double) -> Double = @math.sin

fn main {
  let x = Array::makei(100, fn(i) { i.to_double() * 0.1 })
  let y = x.map(sin)

  // To ensure type safety, the wrapped subplots interface always returns a tuple of a fixed type.
  // This avoids the dynamic behavior in Python where the return type depends on the arguments.
  let (_, axes) = plt::subplots(1, 1)

  // Use the .. cascade call syntax
  axes[0][0]
  ..plot(x, y, color=Green, linestyle=Dashed, linewidth=2)
  ..set_title("Sine of x")
  ..set_xlabel("x")
  ..set_ylabel("sin(x)")

  @plt.show()
}

Currently, on macOS and Linux, MoonBit's build system can automatically handle dependencies. On Windows, users may need to manually install a C compiler and configure the Python environment. Future MoonBit IDEs will aim to simplify this process.

Using Unwrapped Python Modules in MoonBit

The Python ecosystem is vast, and even with AI technology, relying solely on official wrappers is not realistic. Fortunately, we can use the core features of python.mbt to interact directly with any Python module. Below, we demonstrate this process using the simple time module from the Python standard library.

Introducing python.mbt

First, ensure your MoonBit toolchain is up to date, then add the python.mbt dependency:

moon update
moon add Kaida-Amethyst/python

Next, import it in your package's moon.pkg.json:

{
  "import": ["Kaida-Amethyst/python"]
}

python.mbt automatically handles the initialization (Py_Initialize) and shutdown of the Python interpreter, so developers don't need to manage it manually.

Importing Python Modules

Use the @python.pyimport function to import modules. To avoid performance loss from repeated imports, it is recommended to use a closure technique to cache the imported module object:

// Define a struct to hold the Python module object for enhanced type safety
pub struct TimeModule {
  time_mod : PyModule
}

// Define a function that returns a closure for getting a TimeModule instance
fn import_time_mod() -> () -> TimeModule {
  // The import operation is performed only on the first call
  guard @python.pyimport("time") is Some(time_mod) else {
    println("Failed to load Python module: time")
    panic("ModuleLoadError")
  }
  let time_mod = TimeModule::{ time_mod }
  // The returned closure captures the time_mod variable
  fn () { time_mod }
}

// Create a global time_mod "getter" function
let time_mod : () -> TimeModule = import_time_mod()

In subsequent code, we should always call time_mod() to get the module, not import_time_mod.
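For readers more familiar with Python, the same closure-caching idea can be sketched on the Python side. This is only an analogy under the assumption that the pattern carries over directly; the names `import_time_mod` and `time_mod` mirror the MoonBit code above, and `importlib.import_module` stands in for `@python.pyimport`.

```python
import importlib


def import_time_mod():
    # The import happens once, when this outer function runs.
    mod = importlib.import_module("time")

    def getter():
        # The closure simply returns the captured module object.
        return mod

    return getter


# Create the getter once; every later call reuses the same module object.
time_mod = import_time_mod()
print(time_mod() is time_mod())
```

Calling `import_time_mod` again would repeat the import step, which is why only the getter should be used afterwards.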

Converting Between MoonBit and Python Objects

To call Python functions, we need to convert between MoonBit objects and Python objects (PyObject).

  1. Integers: Use PyInteger::from to create a PyInteger from an Int64, and to_int64() for the reverse conversion.

    test "py_integer_conversion" {
      let n : Int64 = 42
      let py_int = PyInteger::from(n)
      inspect(py_int, content="42")
      assert_eq(py_int.to_int64(), 42L)
    }
  2. Floats: Use PyFloat::from and to_double.

    test "py_float_conversion" {
      let n : Double = 3.5
      let py_float = PyFloat::from(n)
      inspect(py_float, content="3.5")
      assert_eq(py_float.to_double(), 3.5)
    }
  3. Strings: Use PyString::from and to_string.

    test "py_string_conversion" {
      let py_str = PyString::from("hello")
      inspect(py_str, content="'hello'")
      assert_eq(py_str.to_string(), "hello")
    }
  4. Lists: You can create an empty PyList and append elements, or create one directly from an Array[&IsPyObject].

    test "py_list_from_array" {
      let one = PyInteger::from(1)
      let two = PyFloat::from(2.0)
      let three = PyString::from("three")
      let arr : Array[&IsPyObject] = [one, two, three]
      let list = PyList::from(arr)
      inspect(list, content="[1, 2.0, 'three']")
    }
  5. Tuples: PyTuple requires specifying the size first, then filling elements one by one using the set method.

    test "py_tuple_creation" {
      let tuple = PyTuple::new(3)
      tuple
      ..set(0, PyInteger::from(1))
      ..set(1, PyFloat::from(2.0))
      ..set(2, PyString::from("three"))
      inspect(tuple, content="(1, 2.0, 'three')")
    }
  6. Dictionaries: PyDict mainly supports strings as keys. Use new to create a dictionary and set to add key-value pairs. For non-string keys, use set_by_obj.

    test "py_dict_creation" {
      let dict = PyDict::new()
      dict
      ..set("one", PyInteger::from(1))
      ..set("two", PyFloat::from(2.0))
      inspect(dict, content="{'one': 1, 'two': 2.0}")
    }
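The expected strings in the tests above look like Python's own textual representation of the same values. The sketch below reproduces them in plain Python for comparison; it is an observation about the output format, not a statement about how python.mbt implements `Show` internally.

```python
# Build the same values that the MoonBit tests construct.
samples = [1, 2.0, "three"]

# Python's textual form of each composite value.
list_text = str(samples)
tuple_text = str(tuple(samples))
dict_text = str({"one": 1, "two": 2.0})

# repr() of a string keeps the quotes, matching the "'hello'" expectation.
str_text = repr("hello")

for text in (list_text, tuple_text, dict_text, str_text):
    print(text)
```

This also explains why the string test expects `'hello'` with quotes from `inspect`, while `to_string()` yields the bare `hello`.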

When getting elements from Python composite types, python.mbt performs runtime type checking and returns an Optional[PyObjectEnum] to ensure type safety.

test "py_list_get" {
  let list = PyList::new()
  list.append(PyInteger::from(1))
  list.append(PyString::from("hello"))
  inspect(list.get(0).unwrap(), content="PyInteger(1)")
  inspect(list.get(1).unwrap(), content="PyString('hello')")
  inspect(list.get(2), content="None") // Index out of bounds returns None
}

Calling Functions in a Module

Calling a function is a two-step process: first, get the function object with get_attr, then execute the call with invoke. The return value of invoke is a PyObject that requires pattern matching and type conversion.
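Seen from the Python side, the two steps correspond roughly to an attribute lookup followed by a call with a tuple of positional arguments. The sketch below illustrates this with the standard `time` module; mapping `getattr` to `get_attr` and the call to `invoke` is an assumed analogy, not a description of python.mbt's internals.

```python
import importlib

time_module = importlib.import_module("time")

# Step 1: look up the function object by name (the role of get_attr).
func = getattr(time_module, "time", None)
if not callable(func):
    raise AttributeError("time.time is not callable")

# Step 2: call it with a tuple of positional arguments (the role of invoke).
args = ()
result = func(*args)

# The result is dynamically typed, so check it before converting.
if not isinstance(result, float):
    raise TypeError("expected a float from time.time()")
print(result)
```

Both the lookup and the result check can fail at runtime, which is why the MoonBit wrappers guard each step with pattern matching.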

Here is the MoonBit wrapper for time.sleep and time.time:

// Wrap time.sleep
pub fn sleep(seconds : Double) -> Unit {
  let lib = time_mod()
  guard lib.time_mod.get_attr("sleep") is Some(PyCallable(f)) else {
    println("get function `sleep` failed!")
    panic()
  }
  let args = PyTuple::new(1)
  args.set(0, PyFloat::from(seconds))
  match (try? f.invoke(args)) {
    // The function returns Unit, so a successful call yields ()
    Ok(_) => ()
    Err(e) => {
      println("invoke `sleep` failed!")
      panic()
    }
  }
}

// Wrap time.time
pub fn time() -> Double {
  let lib = time_mod()
  guard lib.time_mod.get_attr("time") is Some(PyCallable(f)) else {
    println("get function `time` failed!")
    panic()
  }
  match (try? f.invoke()) {
    Ok(Some(PyFloat(t))) => t.to_double()
    _ => {
      println("invoke `time` failed!")
      panic()
    }
  }
}

After wrapping, we can use them in a type-safe way in MoonBit:

test "sleep" {
  // time() already returns a Double, so no unwrap is needed
  let start = time()
  sleep(1)
  let end = time()
  println("start = \{start}")
  println("end = \{end}")
}

Practical Advice

  1. Define Clear Boundaries: Treat python.mbt as the "glue layer" connecting MoonBit and the Python ecosystem. Keep core computation and business logic in MoonBit to leverage its performance and type system advantages, and only use python.mbt when necessary to call Python-exclusive libraries.

  2. Use ADTs Instead of String Magic: Many Python functions accept specific strings as arguments to control behavior. In MoonBit wrappers, these "magic strings" should be converted to Algebraic Data Types (ADTs), i.e., enums. This leverages MoonBit's type system to move runtime value checks to compile time, greatly enhancing code robustness.

  3. Thorough Error Handling: The examples in this article use panic or return simple strings for brevity. In production code, you should define dedicated error types and pass and handle them through the Result type, providing clear error context.

  4. Map Keyword Arguments: Python functions extensively use keyword arguments (kwargs), such as plot(color='blue', linewidth=2). This can be elegantly mapped to MoonBit's Labeled Arguments. When wrapping, prioritize using labeled arguments to provide a similar development experience.

    For example, a Python function that accepts kwargs:

    # graphics.py
    def draw_line(points, color="black", width=1):
        # ... drawing logic ...
        print(f"Drawing line with color {color} and width {width}")
    

    Its MoonBit wrapper can be designed as:

    fn draw_line(points : Array[Point], color~ : Color = Black, width~ : Int = 1) -> Unit {
      let points : PyList = ... // convert Array[Point] to PyList

      // construct args
      let args = PyTuple::new(1)
      args..set(0, points)

      // construct kwargs
      let kwargs = PyDict::new()
      kwargs
      ..set("color", PyString::from(color))
      ..set("width", PyInteger::from(width))
      // f is the PyCallable for draw_line, obtained with get_attr as shown earlier
      match (try? f.invoke(args~, kwargs~)) {
        Ok(_) => ()
        _ => {
          // handle error
        }
      }
    }
    
  5. Beware of Dynamism: Always remember that Python is dynamically typed. Any data obtained from Python should be treated as "untrusted" and must undergo strict type checking and validation. Avoid using unwrap as much as possible; instead, use pattern matching to safely handle all possible cases.
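Returning to the keyword-argument mapping in item 4: on the Python side, a call made with a positional tuple and a keyword dictionary behaves like `f(*args, **kwargs)`. The sketch below shows this with a variant of the `draw_line` function; as an assumption for illustration, it returns the message instead of printing it so the result can be inspected, and the sample points are made up.

```python
def draw_line(points, color="black", width=1):
    # Stand-in for the drawing logic: report the settings that were received.
    return f"Drawing line with color {color} and width {width}"


# Positional arguments travel as a tuple, keyword arguments as a dict.
args = ([(0, 0), (1, 1)],)
kwargs = {"color": "green", "width": 2}

message = draw_line(*args, **kwargs)
print(message)

# Omitting kwargs falls back to the defaults declared in Python.
default_message = draw_line(*args)
print(default_message)
```

Because the defaults live on the Python side, a wrapper can either forward its own defaults explicitly, as the MoonBit example does, or omit the keys and let Python apply them.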

Conclusion

This article has outlined the working principles of python.mbt and demonstrated how to use it to call Python code in MoonBit, whether through pre-wrapped libraries or by interacting directly with Python modules. python.mbt is not just a tool; it represents a fusion philosophy: combining MoonBit's static analysis, high performance, and engineering advantages with Python's vast and mature ecosystem. We hope this article provides developers in the MoonBit and Python communities with a new, more powerful option for building future software.

A Guide to MoonBit C-FFI

· 16 min read


Introduction

MoonBit is a modern functional programming language featuring a robust type system, highly readable syntax, and a toolchain designed for AI. However, reinventing the wheel is not always the best approach. Countless time-tested, high-performance libraries are written in C (or languages with a C-compatible ABI, like C++, Rust). From low-level hardware manipulation to complex scientific computing and graphics rendering, the C ecosystem is a treasure trove of powerful tools.

So, can we make the modern MoonBit work in harmony with these classic C libraries, allowing the pioneers of the new world to wield the powerful tools of the old? The answer is a resounding yes. Through the C Foreign Function Interface (C-FFI), MoonBit can call C functions, bridging these two worlds.

This article will be your guide, leading you step-by-step through the mysteries of MoonBit's C-FFI. We will use a concrete example—creating MoonBit bindings for a C math library called mymath—to learn how to handle different data types, pointers, structs, and even function pointers.

Prerequisites

To connect to any C library, we need to know the functions in its header file, how to find the header file, and how to find the library file. For our task, the header file for the C math library is mymath.h. It defines the various functions and types we want to call from MoonBit. We'll assume mymath is installed on the system, and we'll use -I/usr/include to find the header file and -L/usr/lib -lmymath to link the library during compilation. Here is a part of our mymath.h:

// mymath.h

// --- Basic Functions ---
void print_version();
int version_major();
int is_normal(double input);

// --- Floating-Point Calculations ---
float sinf(float input);
float cosf(float input);
float tanf(float input);
double sin(double input);
double cos(double input);
double tan(double input);

// --- Strings and Pointers ---
int parse_int(char* str);
char* version();
int tan_with_errcode(double input, double* output);

// --- Array Operations ---
int sin_array(int input_len, double* inputs, double* outputs);
int cos_array(int input_len, double* inputs, double* outputs);
int tan_array(int input_len, double* inputs, double* outputs);

// --- Structs and Complex Types ---
typedef struct {
  double real;
  double img;
} Complex;

Complex* new_complex(double r, double i);
void multiply(Complex* a, Complex* b, Complex** result);
void init_n_complexes(int n, Complex** complex_array);

// --- Function Pointers ---
void for_each_complex(int n, Complex** arr, void (*call_back)(Complex*));

The Groundwork

Before writing any FFI code, we need to build the bridge between MoonBit and C code.

Compiling to Native

First, the MoonBit code needs to be compiled into native machine code. This can be done with the following command:

moon build --target native

This command will compile your MoonBit project into C code and then use the system's C compiler (like GCC or Clang) to compile it into a final executable. The compiled C files are located in the target/native/release/build/ directory, stored in subdirectories corresponding to the package name. For example, main/main.mbt will be compiled to target/native/release/build/main/main.c.

Configuring Linkage

Compilation alone is not enough. We also need to tell the MoonBit compiler how to find and link to our mymath library. This is configured in the project's moon.pkg.json file.

{
  "supported-targets": ["native"],
  "link": {
    "native": {
      "cc": "clang",
      "cc-flags": "-I/usr/include",
      "cc-link-flags": "-L/usr/lib -lmymath"
    }
  }
}
  • cc: Specifies the compiler to use for C code, e.g., clang or gcc.
  • cc-flags: Flags needed when compiling C files, typically used to specify header search paths (-I).
  • cc-link-flags: Flags needed during linking, typically used to specify library search paths (-L) and the specific libraries to link (-l).

We also need a "glue" C file, which we'll name cwrap.c, to include the C library's header file and MoonBit's runtime header file.

// cwrap.c
#include <mymath.h>
#include <moonbit.h>

This glue file also needs to be declared to the MoonBit compiler via moon.pkg.json:

{
  // ... other configurations
  "native-stub": ["cwrap.c"]
}

With these configurations in place, our project is ready to link with the mymath library.

The First FFI Call

With everything set up, let's make our first true cross-language call. To declare an external C function in MoonBit, the syntax is as follows:

extern "C" fn moonbit_function_name(arg: Type) -> ReturnType = "c_function_name"
  • extern "C": Tells the MoonBit compiler that this is an external C function.
  • moonbit_function_name: The function name used in the MoonBit code.
  • "c_function_name": The name of the C function to link to.

Let's try it out with the simplest function in mymath.h, version_major:

extern "C" fn version_major() -> Int = "version_major"

Note: MoonBit has powerful Dead Code Elimination (DCE). If you only declare the FFI function above but never actually call it in your code (e.g., in the main function), the compiler will consider it unused code and will not include its declaration in the final generated C code. So, make sure you call it at least once!

The real challenge lies in handling the data type differences between the two languages. The more complex cases below assume some familiarity with C.

3.1 Basic Types

For basic numeric types, there is a direct and clear correspondence between MoonBit and C.

MoonBit Type | C Type | Notes
Int | int32_t |
Int64 | int64_t |
UInt | uint32_t |
UInt64 | uint64_t |
Float | float |
Double | double |
Bool | int32_t | MoonBit passes Bool as an int32_t holding 0 or 1, not as C99's _Bool.
Unit | void (return value) | Used to represent that a C function has no return value.
Byte | uint8_t |

Based on this table, we can easily write FFI declarations for most of the simple functions in mymath.h:

extern "C" fn print_version() -> Unit = "print_version"
extern "C" fn version_major() -> Int = "version_major"
// The return value is semantically a boolean, using MoonBit's Bool type is clearer
extern "C" fn is_normal(input: Double) -> Bool = "is_normal"
extern "C" fn sinf(input: Float) -> Float = "sinf"
extern "C" fn cosf(input: Float) -> Float = "cosf"
extern "C" fn tanf(input: Float) -> Float = "tanf"
extern "C" fn sin(input: Double) -> Double = "sin"
extern "C" fn cos(input: Double) -> Double = "cos"
extern "C" fn tan(input: Double) -> Double = "tan"

3.2 Strings

Things get interesting when we encounter strings. You might instinctively map C's char* to MoonBit's String, but this is a common pitfall.

MoonBit's String and C's char* have completely different memory layouts. char* is a pointer to a null-terminated (\0) sequence of bytes, while MoonBit's String is a GC-managed, complex object containing length information and UTF-16 encoded data.

Passing Arguments: From MoonBit to C

When we need to pass a MoonBit string to a C function that accepts a char* (like parse_int), we need to perform a manual conversion. A recommended approach is to convert it to the Bytes type.

// A helper function to convert a MoonBit String to the null-terminated byte array expected by C
fn string_to_c_bytes(s : String) -> Bytes {
  let mut arr = s.to_bytes().to_array()
  // Ensure it's null-terminated
  if arr.last() != Some(0) {
    arr.push(0)
  }
  Bytes::from_array(arr)
}

// FFI declaration, note the parameter type is Bytes
#borrow(s) // Tell the compiler we are just borrowing s, don't increase its reference count
extern "C" fn __parse_int(s: Bytes) -> Int = "parse_int"

// Wrap it in a user-friendly MoonBit function
fn parse_int(str: String) -> Int {
  let s = string_to_c_bytes(str)
  __parse_int(s)
}
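
One caveat is worth checking before relying on the helper above. The documentation quoted for to_bytes says a String holds UTF-16 code units in little-endian order, so the bytes of ASCII text alternate with zero bytes, and a C function reading a char* would stop after the first character. The following is a minimal sketch of one possible fix on the C side: a glue helper that narrows such a buffer to a null-terminated ASCII string. The name utf16le_to_ascii and its parameters are hypothetical; this is not part of mymath or the MoonBit runtime.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical glue helper: narrows UTF-16LE bytes to a null-terminated
// ASCII string. Returns the number of characters written (excluding the
// terminator), or -1 if a code unit is not ASCII or `out` is too small.
int utf16le_to_ascii(const uint8_t *utf16, size_t byte_len,
                     char *out, size_t out_cap) {
  size_t units = byte_len / 2; // a trailing odd byte is ignored
  if (out_cap < units + 1) {
    return -1;
  }
  for (size_t i = 0; i < units; i++) {
    // Reassemble one little-endian 16-bit code unit.
    uint16_t unit = (uint16_t)(utf16[2 * i] | (utf16[2 * i + 1] << 8));
    if (unit == 0 || unit > 0x7F) {
      return -1; // embedded NUL or non-ASCII: not representable here
    }
    out[i] = (char)unit;
  }
  out[units] = '\0';
  return (int)units;
}
```

For the text "42", the UTF-16LE bytes are 0x34 0x00 0x32 0x00: strlen on that buffer sees a single character, while the helper recovers both. An alternative, if the input may contain non-ASCII text, would be to encode the string as UTF-8 on the MoonBit side before passing it across the boundary.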

The #borrow Annotation: The borrow annotation is an optimization hint. It tells the compiler that the C function only "borrows" this parameter and will not take ownership of it. This can avoid unnecessary reference counting operations and prevent potential memory leaks.

Return Values: From C to MoonBit

Conversely, when a C function returns a char* (like version), the situation is more complex. We absolutely must not declare it to return Bytes or String directly:

// Incorrect!
extern "C" fn version() -> Bytes = "version"

This is because the C function returns a raw pointer, which lacks the header information required by the MoonBit GC. A direct conversion like this will lead to a runtime crash.

The correct approach is to treat the returned char* as an opaque handle, and then write a conversion function in the C "glue" code to manually convert it into a valid MoonBit string.

MoonBit side:

// 1. Declare an external type to represent the C string pointer
#extern
type CStr

// 2. Declare an FFI function that calls the C wrapper
extern "C" fn CStr::to_string(self: Self) -> String = "cstr_to_moonbit_str"

// 3. Declare the original C function, which returns our opaque type
extern "C" fn __version() -> CStr = "version"

// 4. Wrap it in a safe MoonBit function
fn version() -> String {
  __version().to_string()
}

C side (add to cwrap.c):

#include <string.h> // for strlen

// This function is responsible for correctly converting a char* to a moonbit_string_t with a GC header
moonbit_string_t cstr_to_moonbit_str(char *ptr) {
  if (ptr == NULL) {
    return moonbit_make_string(0, 0);
  }
  int32_t len = strlen(ptr);
  // moonbit_make_string allocates a MoonBit string object with a GC header
  moonbit_string_t ms = moonbit_make_string(len, 0);
  for (int i = 0; i < len; i++) {
    // Assuming ASCII input; the unsigned char cast avoids sign-extending bytes >= 0x80
    ms[i] = (uint16_t)(unsigned char)ptr[i];
  }
  // Note: Whether to free(ptr) depends on the C library's API contract.
  // If the memory returned by version() needs to be freed by the caller, it should be freed here.
  return ms;
}

This pattern, while a bit cumbersome at first glance, ensures memory safety and is the standard way to handle C string return values.

3.3 The Art of Pointers: Passing by Reference and Arrays

C extensively uses pointers for "output parameters" and passing arrays. MoonBit provides specialized types for this.

"Output" Parameters for a Single Value

When a C function uses a pointer to return an additional value, like tan_with_errcode(double input, double* output), MoonBit uses the Ref[T] type.

extern "C" fn tan_with_errcode(input: Double, output: Ref[Double]) -> Int = "tan_with_errcode"

Ref[T] in MoonBit is a struct containing a single field of type T. When passed to C, MoonBit passes the address of this struct. From C's perspective, a pointer to struct { T val; } is equivalent in memory address to a pointer to T, so it works directly.
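
The pointer equivalence described above can be checked in plain C. The sketch below is only an illustration: half_with_errcode is a hypothetical stand-in for an API shaped like tan_with_errcode, and RefDouble models only the data part of Ref[Double], not the actual MoonBit object with its header.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical C API with an output parameter: writes input / 2 to *output
// and returns 0, or returns -1 without writing when output is NULL.
int half_with_errcode(double input, double *output) {
  if (output == NULL) {
    return -1;
  }
  *output = input / 2.0;
  return 0;
}

// Models the data part of Ref[Double]: a struct with a single field.
typedef struct {
  double val;
} RefDouble;

// Calls the C API through a pointer to the struct, as the FFI would.
double call_through_ref(double input) {
  RefDouble box = {0.0};
  // C guarantees that a pointer to a struct, suitably converted,
  // points to its first member.
  double *as_double = (double *)&box;
  assert((void *)as_double == (void *)&box.val);
  int err = half_with_errcode(input, as_double);
  assert(err == 0);
  return box.val;
}
```

Because the struct has exactly one field, the callee's write through double* lands in val, which is why the MoonBit side can read the result from the Ref after the call.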

Arrays: Passing Collections of Data

When a C function needs to process an array (e.g., double* inputs), MoonBit uses the FixedArray[T] type. FixedArray[T] is a contiguous block of T elements in memory, and its pointer can be passed directly to C.

extern "C" fn sin_array(len: Int, inputs: FixedArray[Double], outputs: FixedArray[Double]) -> Int = "sin_array"
extern "C" fn cos_array(len: Int, inputs: FixedArray[Double], outputs: FixedArray[Double]) -> Int = "cos_array"
extern "C" fn tan_array(len: Int, inputs: FixedArray[Double], outputs: FixedArray[Double]) -> Int = "tan_array"
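
To see what the C side expects from these declarations, here is a minimal sketch of a function with the same shape as sin_array. The name square_array and its error-code convention are assumptions made for illustration (mymath's real behaviour is not specified in the header), and squaring stands in for sin so the sketch needs no math library.

```c
#include <stddef.h>

// Hypothetical function in the style of sin_array: applies an operation
// element-wise. The callee sees only a pointer to the first element, so
// the length has to be passed separately.
int square_array(int input_len, const double *inputs, double *outputs) {
  if (input_len < 0 || inputs == NULL || outputs == NULL) {
    return -1; // assumed error code for invalid arguments
  }
  for (int i = 0; i < input_len; i++) {
    outputs[i] = inputs[i] * inputs[i];
  }
  return 0; // assumed success code
}
```

The practical consequence for the MoonBit caller is that both FixedArray arguments must already hold at least len elements before the call, since C cannot grow or bounds-check them.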

3.4 External Types: Embracing Opaque C Structs

For C structs, like Complex, the best practice is usually to treat it as an "Opaque Type". We only create a reference (or handle) to it in MoonBit, without caring about its internal fields.

This is achieved with the #extern type syntax:

#extern
type Complex

This declaration tells MoonBit: "There is an external type named Complex. You don't need to know its internal structure, just treat it as a pointer-sized handle." In the generated C code, the Complex type will be treated as void*. This is usually safe because all operations on Complex are done within the C library, and the MoonBit side is only responsible for passing the pointer.

Based on this principle, we can write FFIs for the Complex-related functions in mymath.h:

// C: Complex* new_complex(double r, double i);
// Returns a pointer to Complex, which is a Complex handle in MoonBit
extern "C" fn new_complex(r: Double, i: Double) -> Complex = "new_complex"

// C: void multiply(Complex* a, Complex* b, Complex** result);
// Complex* corresponds to Complex, and Complex** corresponds to Ref[Complex]
extern "C" fn multiply(a: Complex, b: Complex, res: Ref[Complex]) -> Unit = "multiply"

// C: void init_n_complexes(int n, Complex** complex_array);
// Complex** is used as an array here, corresponding to FixedArray[Complex]
extern "C" fn init_n_complexes(n: Int, complex_array: FixedArray[Complex]) -> Unit = "init_n_complexes"

Best Practice: Encapsulate Raw FFIs. Directly exposing FFI functions can be confusing for users (e.g., Ref and FixedArray). It is strongly recommended to build a more user-friendly API for MoonBit users on top of the FFI declarations.

// Define methods on the Complex type to hide FFI details
fn Complex::mul(self: Complex, other: Complex) -> Complex {
  // Create a temporary Ref to receive the result
  let res: Ref[Complex] = Ref::{ val: new_complex(0, 0) }
  multiply(self, other, res)
  res.val // Return the result
}

fn init_n(n: Int) -> Array[Complex] {
  // Use FixedArray::make to create the array
  let arr = FixedArray::make(n, new_complex(0, 0))
  init_n_complexes(n, arr)
  // Convert FixedArray to the more user-friendly Array
  Array::from_fixed_array(arr)
}

3.5 Function Pointers: When C Needs to Call Back

The most complex function in mymath.h is for_each_complex, which takes a function pointer as an argument.

void for_each_complex(int n, Complex** arr, void (*call_back)(Complex*));

A common misconception is to try to map MoonBit's closure type (Complex) -> Unit directly to a C function pointer. This is not possible because a MoonBit closure is internally a struct with two parts: a pointer to the actual function code, and a pointer to its captured environment data.

To pass a pure, environment-free function pointer, MoonBit provides the FuncRef type:

extern "C" fn for_each_complex(
  n: Int,
  arr: FixedArray[Complex],
  call_back: FuncRef[(Complex) -> Unit] // Use FuncRef to wrap the function type
) -> Unit = "for_each_complex"

Any function type wrapped in FuncRef will be converted to a standard C function pointer when passed to C.
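
The difference between a closure and a bare function pointer can be sketched in C. Everything below is illustrative: for_each_complex_sketch mirrors the signature from mymath.h, while the Closure struct only shows the general "code plus environment" shape and is not MoonBit's actual closure layout.

```c
typedef struct {
  double real;
  double img;
} Complex;

// Same shape as for_each_complex in mymath.h: the callback is a bare code
// address with no slot for captured data.
void for_each_complex_sketch(int n, Complex **arr,
                             void (*call_back)(Complex *)) {
  for (int i = 0; i < n; i++) {
    call_back(arr[i]);
  }
}

// What a closure needs in addition: an environment pointer next to the code.
typedef struct {
  void (*code)(void *env, Complex *c);
  void *env;
} Closure;

// An environment-free callback can only reach global state.
static double real_sum = 0.0;

void add_real(Complex *c) {
  real_sum += c->real;
}

// Sums the real parts by routing every element through the callback.
double sum_reals(int n, Complex **arr) {
  real_sum = 0.0;
  for_each_complex_sketch(n, arr, add_real);
  return real_sum;
}
```

Since the C API has no parameter through which an environment could travel, a callback that needs state has to keep it somewhere global, which is the same restriction FuncRef imposes by rejecting functions that capture variables.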

How to declare a FuncRef? Just use let. As long as the function does not capture external variables, the declaration will succeed.

fn print_complex(c: Complex) -> Unit { ... }

fn main {
  let print_complex: FuncRef[(Complex) -> Unit] = (c) => print_complex(c)
  // ...
}

Advanced Topic: GC Management

We have covered most of the type conversion issues, but there is still a very important issue: memory management. C relies on manual malloc/free, while MoonBit has automatic garbage collection (GC). When a C library creates an object (like new_complex), who is responsible for freeing it?

Can we do without GC?

Some library authors may choose not to implement GC, leaving all destruction operations to the user. This approach has its merits in some libraries, such as some high-performance computing libraries, graphics libraries, etc. To improve performance or stability, they may abandon some GC features, but this raises the bar for programmers. Most libraries still need to provide GC to enhance the user experience.

Ideally, we want MoonBit's GC to automatically manage the lifecycle of these C objects. MoonBit provides two mechanisms to achieve this.

4.1 The Simple Case

If the C struct is very simple and you are sure that its memory layout is stable across all platforms, you can redefine it directly in MoonBit.

// mymath.h: typedef struct { double real; double img; } Complex;
// MoonBit:
struct Complex {
  r: Double,
  i: Double
}

By doing this, Complex becomes a true MoonBit object. The MoonBit compiler will automatically manage its memory and add a GC header. When you pass it to a C function, MoonBit will pass a pointer to its data part, which is usually feasible.

But this method has significant limitations:

  • It requires you to know the exact memory layout, alignment, etc., of the C struct, which can be fragile.
  • If a C function returns a Complex*, you cannot use it directly. You must, like handling string return values, write a C wrapper function to copy the data from the C struct into a newly created MoonBit Complex object with a GC header.

Therefore, this method is only suitable for the simplest cases. For most scenarios, we recommend a more robust finalizer solution.
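
If you do mirror a struct on the MoonBit side, one way to make the layout assumption explicit is to assert it in the C glue code. The sketch below uses C11 _Static_assert with offsetof; it assumes the Complex definition from mymath.h and a C11-capable compiler, and the two accessor functions are hypothetical helpers added only so the values can be inspected at runtime.

```c
#include <stddef.h>

typedef struct {
  double real;
  double img;
} Complex;

// Compile-time checks (C11): the build fails if the layout is not the
// expected two adjacent doubles.
_Static_assert(offsetof(Complex, real) == 0, "real must come first");
_Static_assert(offsetof(Complex, img) == sizeof(double),
               "img must directly follow real");
_Static_assert(sizeof(Complex) == 2 * sizeof(double),
               "no trailing padding expected");

// Runtime view of the same facts, e.g. for logging from a test.
size_t complex_size(void) { return sizeof(Complex); }
size_t complex_img_offset(void) { return offsetof(Complex, img); }
```

Such checks do not make the approach portable, but they turn a silent layout mismatch into a compile error on the platform where it would occur.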

4.2 The Complex Situation: Using Finalizers

This is a more general and safer method. The core idea is to create a MoonBit object to "wrap" the C pointer and tell the MoonBit GC that when this wrapper object is collected, a specific C function (a finalizer) should be called to release the underlying C pointer.

This process involves several steps:

1. Declare two types in MoonBit

#extern
type C_Complex // Represents the raw, opaque C pointer

type Complex C_Complex // A MoonBit type that wraps a C_Complex internally

type Complex C_Complex is a special declaration that creates a MoonBit object type named Complex, which has an internal field of type C_Complex. We can access this internal field with the .inner() method.

2. Provide a finalizer and wrapper functions in C

We need a C function to free the Complex object, and a function to create our GC-enabled MoonBit wrapper object.

C side (add to cwrap.c):

// The mymath library should provide a function to free Complex, let's assume it's free_complex
// void free_complex(Complex* c);

// We need a void* version of the finalizer for the MoonBit GC to use
void free_complex_finalizer(void* obj) {
    // The layout of a MoonBit external object is { void (*finalizer)(void*); T data; }
    // We need to extract the real Complex pointer from obj
    // Assuming the MoonBit Complex wrapper has only one field
    Complex* c_obj = *((Complex**)obj);
    free_complex(c_obj); // Call the real finalizer, if provided by the mymath library
    // free(c_obj); // If it was allocated with standard malloc
}

// Define what the MoonBit Complex wrapper looks like in C
typedef struct {
  Complex* val;
} MoonBit_Complex;

// Function to create the MoonBit wrapper object
MoonBit_Complex* new_mbt_complex(Complex* c_complex) {
  // `moonbit_make_external_obj` is the key
  // It creates a GC-managed external object and registers its finalizer.
  MoonBit_Complex* mbt_complex = moonbit_make_external_obj(
      &free_complex_finalizer,
      sizeof(MoonBit_Complex)
  );
  mbt_complex->val = c_complex;
  return mbt_complex;
}

3. Use the wrapper function in MoonBit

Now, instead of calling new_complex directly, we call our wrapper function new_mbt_complex.

// FFI declaration pointing to our C wrapper function
extern "C" fn __new_managed_complex(c_complex: C_Complex) -> Complex = "new_mbt_complex"

// The original C new_complex function returns a raw pointer
extern "C" fn __new_unmanaged_complex(r: Double, i: Double) -> C_Complex = "new_complex"

// The final, safe, GC-friendly new function provided to the user
fn Complex::new(r: Double, i: Double) -> Complex {
  let c_ptr = __new_unmanaged_complex(r, i)
  __new_managed_complex(c_ptr)
}

Now, when an object created by Complex::new is no longer used in MoonBit, the GC will automatically call free_complex_finalizer, safely freeing the memory allocated by the C library.
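
The lifecycle can be imitated in standalone C to make the moving parts visible. This is a mock under stated assumptions, not the MoonBit runtime: MockExternal, mock_wrap, and mock_collect are hypothetical names, the struct layout is invented for the illustration, and mock_collect merely plays the role of the collector reclaiming the wrapper.

```c
#include <stdlib.h>

typedef struct {
  double real;
  double img;
} Complex;

// Mock of a managed wrapper: a finalizer plus the wrapped C pointer.
typedef struct {
  void (*finalizer)(void *payload_slot);
  Complex *payload;
} MockExternal;

static int finalizer_calls = 0;

// Finalizer: receives the address of the payload slot and frees the C object.
static void free_complex_finalizer(void *payload_slot) {
  Complex *c = *(Complex **)payload_slot;
  free(c);
  finalizer_calls++;
}

// Allocates a Complex and wraps it, in the spirit of new_mbt_complex.
MockExternal *mock_wrap(double r, double i) {
  Complex *c = malloc(sizeof *c);
  MockExternal *obj = malloc(sizeof *obj);
  if (c == NULL || obj == NULL) {
    free(c);
    free(obj);
    return NULL;
  }
  c->real = r;
  c->img = i;
  obj->finalizer = free_complex_finalizer;
  obj->payload = c;
  return obj;
}

// Stands in for the collector: runs the finalizer, then drops the wrapper.
void mock_collect(MockExternal *obj) {
  obj->finalizer(&obj->payload);
  free(obj);
}

int mock_finalizer_calls(void) { return finalizer_calls; }
```

The point to take away is the ordering: the finalizer runs exactly once, while the wrapper still exists, and it is the only place where the C object is freed. Freeing the same pointer manually elsewhere would lead to a double free.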

When we need to pass our managed Complex object to other C functions, we just use the .inner() method:

// Assume there is a C function `double length(Complex*);`
extern "C" fn length(c_complex: C_Complex) -> Double = "length"

fn Complex::length(self: Self) -> Double {
  // self.inner() returns the internal C_Complex (i.e., the C pointer)
  length(self.inner())
}

Conclusion

This article has guided you through C-FFI in MoonBit, from basic types to complex struct types and function pointers, and finally discussed how MoonBit's GC can manage C objects. We hope it helps readers who are developing their own libraries.

Dancing with LLVM: A Moonbit Chronicle (Part 2) - LLVM Backend Generation

· 17 min read


Introduction

In the process of programming language design, the frontend is responsible for understanding and verifying the structure and semantics of a program, while the compiler backend takes on the task of translating these abstract concepts into executable machine code. The implementation of the backend not only requires a deep understanding of the target architecture but also mastery of complex optimization techniques to generate efficient code.

LLVM (Low Level Virtual Machine), as a comprehensive modern compiler infrastructure, provides us with a powerful and flexible solution. By converting a program into LLVM Intermediate Representation (IR), we can leverage LLVM's mature toolchain to compile the code to various target architectures, including RISC-V, ARM, and x86.

Moonbit's LLVM Ecosystem

Moonbit officially provides two important LLVM-related projects:

  • llvm.mbt: Moonbit language bindings for the original LLVM, providing direct access to the llvm-c interface. It requires the installation of the complete LLVM toolchain, can only generate for native backends, and requires you to handle compilation and linking yourself, but it can generate IR that is fully compatible with the original LLVM.
  • MoonLLVM: A pure Moonbit implementation of an LLVM-like system. It can generate LLVM IR without external dependencies and supports JavaScript and WebAssembly backends.

This article chooses llvm.mbt as our tool. Its API design is inspired by the highly acclaimed inkwell library in the Rust ecosystem.

In the previous article, "Dancing with LLVM: A Moonbit Chronicle (Part 1) - Implementing the Frontend," we completed the conversion from source code to a typed abstract syntax tree. This article will build on that achievement, focusing on the core techniques and implementation details of code generation.


Chapter 1: Representing the LLVM Type System in Moonbit

Before diving into code generation, we first need to understand how llvm.mbt represents LLVM's various concepts within Moonbit's type system. LLVM's type system is quite complex, containing multiple levels such as basic types, composite types, and function types.

Trait Objects: An Abstract Representation of Types

In the API design of llvm.mbt, you will frequently encounter the core concept of &Type. This is not a concrete struct or enum, but a Trait Object—which can be understood as the functional equivalent of an abstract base class in object-oriented programming.

// &Type is a trait object representing any LLVM type
let some_type: &Type = context.i32_type()

Type Identification and Conversion

To determine the specific type of a &Type, we need to perform a runtime type check using the as_type_enum interface:

pub fn identify_type(ty: &Type) -> String {
  match ty.as_type_enum() {
    IntType(int_ty) => "Integer type with \{int_ty.get_bit_width()} bits"
    FloatType(float_ty) => "Floating point type"
    PointerType(ptr_ty) => "Pointer type"
    FunctionType(func_ty) => "Function type"
    ArrayType(array_ty) => "Array type"
    StructType(struct_ty) => "Structure type"
    VectorType(vec_ty) => "Vector type"
    ScalableVectorType(svec_ty) => "Scalable vector type"
    MetadataType(meta_ty) => "Metadata type"
  }
}
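
The same runtime check can be pictured in C as a switch over a tagged value. The sketch below is only an analogy for how a type enum carries a tag plus variant-specific data; TypeKind, TypeDesc, and describe_type are hypothetical names and have nothing to do with the llvm.mbt or llvm-c APIs.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { KIND_INT, KIND_FLOAT, KIND_POINTER } TypeKind;

// Hypothetical tagged value playing the role of a type enum.
typedef struct {
  TypeKind kind;
  unsigned bit_width; // meaningful only when kind == KIND_INT
} TypeDesc;

// Writes a description into buf and returns snprintf's result, i.e. the
// number of characters the full description requires.
int describe_type(TypeDesc ty, char *buf, size_t cap) {
  switch (ty.kind) {
    case KIND_INT:
      return snprintf(buf, cap, "Integer type with %u bits", ty.bit_width);
    case KIND_FLOAT:
      return snprintf(buf, cap, "Floating point type");
    case KIND_POINTER:
      return snprintf(buf, cap, "Pointer type");
  }
  return -1; // unknown tag
}
```

Unlike the MoonBit match, the C switch gets no exhaustiveness guarantee from the type system, which is one reason the pattern-matching version is the safer way to write this dispatch.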

Safe Type Conversion Strategies

When we are certain that a &Type has a specific type, there are several conversion methods to choose from:

  1. Direct Conversion (for deterministic scenarios)

    let ty: &Type = context.i32_type()
    let i32_ty = ty.into_int_type() // Direct conversion, errors are handled by llvm.mbt
    let bit_width = i32_ty.get_bit_width() // Call a method specific to IntType
  2. Defensive Conversion (recommended for production environments)

    let ty: &Type = get_some_type() // An unknown type obtained from somewhere
    guard ty.as_type_enum() is IntType(i32_ty) else {
      raise CodeGenError("Expected integer type, got \{ty}")
    }
    // Now it's safe to use i32_ty
    let bit_width = i32_ty.get_bit_width()

Constructing Composite Types

LLVM supports various composite types, which are usually constructed through methods of basic types:

pub fn create_composite_types(context: @llvm.Context) -> Unit {
  let i32_ty = context.i32_type()
  let f64_ty = context.f64_type()

  // Array type: [16 x i32]
  let i32_array_ty = i32_ty.array_type(16)

  // Function type: i32 (i32, i32)
  let add_func_ty = i32_ty.fn_type([i32_ty, i32_ty])

  // Struct type: {i32, f64}
  let struct_ty = context.struct_type([i32_ty, f64_ty])

  // Pointer type (all pointers are opaque in LLVM 18+)
  let ptr_ty = i32_ty.ptr_type()

  // Output type information for verification
  println("Array type: \{i32_array_ty}") // [16 x i32]
  println("Function type: \{add_func_ty}") // i32 (i32, i32)
  println("Struct type: \{struct_ty}") // {i32, f64}
  println("Pointer type: \{ptr_ty}") // ptr
}

Important Reminder: Opaque Pointers

As of LLVM 18, all pointer types use the opaque pointer design (opaque pointers became the default in LLVM 15, and typed pointers were removed in LLVM 17). This means that regardless of the type they point to, all pointers are represented as ptr in the IR, and the specific type information they point to is no longer visible in the type system.


Chapter 2: The LLVM Value System and the BasicValue Concept

Compared to the type system, LLVM's value system is more complex. llvm.mbt, consistent with inkwell, divides values into two important abstract layers: Value and BasicValue. The difference lies in distinguishing the source of value creation from the way values are used:

  • Value: Focuses on how a value is produced (e.g., constants, instruction results).
  • BasicValue: Focuses on what basic type a value has (e.g., integer, float, pointer).

Practical Application Example

pub fn demonstrate_value_system(context: Context, builder: Builder) -> Unit {
  let i32_ty = context.i32_type()

  // Create two integer constants - these are directly IntValue
  let const1 = i32_ty.const_int(10) // Value: IntValue, BasicValue: IntValue
  let const2 = i32_ty.const_int(20) // Value: IntValue, BasicValue: IntValue

  // Perform an addition operation - the result is an InstructionValue
  let add_result = builder.build_int_add(const1, const2)

  // In different contexts, we need different perspectives:
  // As an instruction to check its properties
  let instruction = add_result.as_instruction()
  println("Instruction opcode: \{instruction.get_opcode()}")

  // As a basic value to get its type
  let basic_value = add_result.into_basic_value()
  println("Result type: \{basic_value.get_type()}")

  // As an integer value for subsequent calculations
  let int_value = add_result.into_int_value()
  let final_result = builder.build_int_mul(int_value, const1)
}

Complete Classification of Value Types

  1. ValueEnum: All possible value types

    pub enum ValueEnum {
      IntValue(IntValue) // Integer value
      FloatValue(FloatValue) // Floating-point value
      PointerValue(PointerValue) // Pointer value
      StructValue(StructValue) // Struct value
      FunctionValue(FunctionValue) // Function value
      ArrayValue(ArrayValue) // Array value
      VectorValue(VectorValue) // Vector value
      PhiValue(PhiValue) // Phi node value
      ScalableVectorValue(ScalableVectorValue) // Scalable vector value
      MetadataValue(MetadataValue) // Metadata value
      CallSiteValue(CallSiteValue) // Call site value
      GlobalValue(GlobalValue) // Global value
      InstructionValue(InstructionValue) // Instruction value
    } derive(Show)

  2. BasicValueEnum: Values that have a basic type

    pub enum BasicValueEnum {
      ArrayValue(ArrayValue) // Array value
      IntValue(IntValue) // Integer value
      FloatValue(FloatValue) // Floating-point value
      PointerValue(PointerValue) // Pointer value
      StructValue(StructValue) // Struct value
      VectorValue(VectorValue) // Vector value
      ScalableVectorValue(ScalableVectorValue) // Scalable vector value
    } derive(Show)

💡 Best Practices for Value Conversion

In the actual code generation process, we often need to convert between different value perspectives:

pub fn value_conversion_patterns(instruction_result : &Value) -> Unit raise {
  // Pattern 1: I know what type this is, convert directly
  let int_val = instruction_result.into_int_value()

  // Pattern 2: I just need a basic value, I don't care about the specific type
  let basic_val = instruction_result.into_basic_value()

  // Pattern 3: Defensive programming, check before converting
  match instruction_result.as_value_enum() {
    // Handle integer values
    IntValue(int_val) => handle_integer(int_val)
    // Handle float values
    FloatValue(float_val) => handle_float(float_val)
    _ => raise CodeGenError("Unexpected value type")
  }
}

Through this two-layer abstraction, llvm.mbt maintains the integrity of the LLVM value system while providing an intuitive and easy-to-use interface for Moonbit developers.


Chapter 3: Practical LLVM IR Generation

Now that we understand the type and value systems, let's demonstrate how to use llvm.mbt to generate LLVM IR with a complete example. This example will implement a simple muladd function, showing the entire process from initialization to instruction generation.

Infrastructure Initialization

Any LLVM program begins by establishing three core components:

pub fn initialize_llvm() -> (Context, Module, Builder) {
  // 1. Create an LLVM context - a container for all LLVM objects
  let context = @llvm.Context::create()

  // 2. Create a module - a container for functions and global variables
  let module = context.create_module("demo_module")

  // 3. Create an IR builder - used to generate instructions
  let builder = context.create_builder()
  (context, module, builder)
}

A Simple Function Generation Example

Let's implement a function that calculates (a * b) + c:

pub fn generate_muladd_function() -> String {
  // Initialize LLVM infrastructure
  let (context, module, builder) = initialize_llvm()

  // Define the function signature
  let i32_ty = context.i32_type()
  let func_type = i32_ty.fn_type([i32_ty, i32_ty, i32_ty])
  let func_value = module.add_function("muladd", func_type)

  // Create the function entry basic block
  let entry_block = context.append_basic_block(func_value, "entry")
  builder.position_at_end(entry_block)

  // Get the function parameters
  let arg_a = func_value.get_nth_param(0).unwrap().into_int_value()
  let arg_b = func_value.get_nth_param(1).unwrap().into_int_value()
  let arg_c = func_value.get_nth_param(2).unwrap().into_int_value()

  // Generate calculation instructions
  let mul_result = builder.build_int_mul(arg_a, arg_b).into_int_value()
  let add_result = builder.build_int_add(mul_result, arg_c)

  // Generate the return instruction
  let _ = builder.build_return(add_result)

  // Output the generated IR
  module.dump()
}

Generated LLVM IR

Running the above code will produce the following LLVM Intermediate Representation:

; ModuleID = 'demo_module'
source_filename = "demo_module"

define i32 @muladd(i32 %0, i32 %1, i32 %2) {
entry:
  %3 = mul i32 %0, %1
  %4 = add i32 %3, %2
  ret i32 %4
}
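
As a cross-check on the IR above, the following Python sketch models what the generated muladd function computes. It is not llvm.mbt code, and the helper names wrap_i32 and muladd are illustrative. It assumes the plain mul and add instructions shown above (no nsw or nuw flags), which wrap around on overflow in two's complement.

```python
def wrap_i32(value: int) -> int:
    """Reduce a Python int to the signed 32-bit range, like an i32 register."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def muladd(a: int, b: int, c: int) -> int:
    """Mirror the two arithmetic instructions of the generated function."""
    product = wrap_i32(a * b)      # %3 = mul i32 %0, %1
    return wrap_i32(product + c)   # %4 = add i32 %3, %2


print(muladd(3, 4, 5))            # 17
print(muladd(2**31 - 1, 2, 0))    # -2, because the product wraps around
```

A model like this can serve as a reference when comparing the compiled function's results against expected values.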

💡 Code Generation Best Practices

  1. Naming Conventions

    Build methods for instructions that produce a value take a labeled name argument, which names the instruction's result in the generated IR.

    let mul_result = builder.build_int_mul(lhs, rhs, name="temp_product")
    let final_result = builder.build_int_add(mul_result, offset, name="final_sum")
  2. Error Handling

    Report errors with raise rather than panic, and handle explicitly any case that cannot be ruled out in advance, such as a missing function parameter.

    // Check for operations that might fail
    match func_value.get_nth_param(index) {
      Some(param) => param.into_int_value()
      None => raise CodeGenError("Function parameter \{index} not found")
    }
    

Chapter 4: TinyMoonbit Compiler Implementation

Now let's turn our attention to the actual compiler implementation, converting the abstract syntax tree we built in the previous article into LLVM IR.

Type Mapping: From Parser to LLVM

First, we need to establish a mapping between the TinyMoonbit type system and the LLVM type system:

pub struct CodeGen {
  parser_program : Program // AST representation of the source program
  llvm_context : @llvm.Context // LLVM context
  llvm_module : @llvm.Module // LLVM module
  builder : @llvm.Builder // IR builder
  llvm_functions : Map[String, @llvm.FunctionValue] // Function map
}

pub fn CodeGen::convert_type(self : Self, parser_type : Type) -> &@llvm.Type raise {
  match parser_type {
    Type::Unit => self.llvm_context.void_type() as &@llvm.Type
    Type::Bool => self.llvm_context.bool_type()
    Type::Int => self.llvm_context.i32_type()
    Type::Double => self.llvm_context.f64_type()
    // Can be extended with more types as needed
  }
}
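
To make the mapping concrete, here is a small Python sketch of the same table, expressed as the textual LLVM IR spelling of each type. The Ty enum and LLVM_TYPE_NAMES dictionary are hypothetical stand-ins for the parser's Type enum, not part of llvm.mbt, and the Bool to i1 entry is an assumption based on LLVM's usual boolean representation.

```python
from enum import Enum


class Ty(Enum):
    """Stand-in for the parser's Type enum (hypothetical Python mirror)."""
    UNIT = "Unit"
    BOOL = "Bool"
    INT = "Int"
    DOUBLE = "Double"


# Textual LLVM IR type for each source type.
# Assumption: bool_type() corresponds to i1, LLVM's usual boolean type.
LLVM_TYPE_NAMES = {
    Ty.UNIT: "void",
    Ty.BOOL: "i1",
    Ty.INT: "i32",
    Ty.DOUBLE: "double",
}


def convert_type(ty: Ty) -> str:
    """Return the LLVM IR spelling of a source type, or raise if unmapped."""
    try:
        return LLVM_TYPE_NAMES[ty]
    except KeyError:
        raise ValueError(f"no LLVM mapping for type {ty.name}") from None


print(convert_type(Ty.INT))     # i32
print(convert_type(Ty.DOUBLE))  # double
```

Keeping the mapping in one place means that adding a new source type only requires one new entry.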

Environment Management: Mapping Variables to Values

During the code generation phase, we need to maintain a mapping from variable names to LLVM values:

pub struct Env {
  parent : Env? // Reference to the parent environment
  symbols : Map[String, &@llvm.Value] // Local variable map

  // Global information
  codegen : CodeGen // Reference to the code generator
  parser_function : Function // AST of the current function
  llvm_function : @llvm.FunctionValue // LLVM representation of the current function
}

pub fn Env::get_symbol(self : Self, name : String) -> &@llvm.Value? {
  match self.symbols.get(name) {
    Some(value) => Some(value)
    None =>
      match self.parent {
        Some(parent_env) => parent_env.get_symbol(name)
        None => None
      }
  }
}
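
The lookup rule is easier to see in isolation. The following Python sketch reproduces the parent-chain search with a hypothetical Scope class whose symbol table maps names to plain strings instead of LLVM values. It is an illustration of the idea, not the article's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """Hypothetical stand-in for Env: a symbol table plus a parent link."""
    parent: Optional[Scope] = None
    symbols: dict = field(default_factory=dict)

    def get_symbol(self, name: str) -> Optional[str]:
        # Walk outward from the innermost scope until the name is found.
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None

    def sub_env(self) -> Scope:
        # A child scope starts empty and reaches outer names through the chain.
        return Scope(parent=self)


function_scope = Scope(symbols={"x": "%x.addr"})
branch_scope = function_scope.sub_env()
branch_scope.symbols["y"] = "%y.addr"

print(branch_scope.get_symbol("x"))    # %x.addr (found in the parent)
print(function_scope.get_symbol("y"))  # None (inner names do not leak outward)
```

Because the search starts in the innermost scope, a name declared in a branch shadows an outer variable of the same name without overwriting it.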

Variable Handling: Memory Allocation Strategy

As a systems-level language, TinyMoonbit supports variable reassignment. In LLVM IR's SSA (Static Single Assignment) form, we need to use the alloca + load/store pattern to implement mutable variables:

pub fn Stmt::emit(self : Self, env : Env) -> Unit raise {
  match self {
    // Variable declaration: e.g., let x : Int = 5;
    Let(var_name, var_type, init_expr) => {
      // Convert the type and allocate stack space
      let llvm_type = env.codegen.convert_type(var_type)
      let alloca = env.codegen.builder.build_alloca(llvm_type, var_name)

      // Record the allocated pointer in the symbol table
      env.symbols.set(var_name, alloca as &@llvm.Value)

      // Calculate the value of the initialization expression
      let init_value = init_expr.emit(env).into_basic_value()

      // Store the initial value into the allocated memory
      let _ = env.codegen.builder.build_store(alloca, init_value)
    }
    // Variable assignment: x = 10;
    Assign(var_name, rhs_expr) => {
      // Get the memory address of the variable from the symbol table
      guard let Some(var_ptr) = env.get_symbol(var_name) else {
        raise CodeGenError("Undefined variable: \{var_name}")
      }

      // Calculate the value of the right-hand side expression
      let rhs_value = rhs_expr.emit(env).into_basic_value()

      // Store the new value into the variable's memory
      let _ = env.codegen.builder.build_store(var_ptr, rhs_value)
    }
    // Other statement types...
    _ => { /* Handle other statements */ }
  }
}

Design Decision: Why use alloca?

In functional languages, immutable variables can be directly mapped to SSA values. However, TinyMoonbit supports variable reassignment, which conflicts with the SSA principle of "each variable is assigned only once."

The alloca + load/store pattern is the standard way to handle mutable variables:

  • alloca: Allocates memory space on the stack.
  • store: Writes a value to memory.
  • load: Reads a value from memory.

When optimizations run, LLVM's mem2reg pass promotes simple allocas (those whose address is only used by loads and stores) back to SSA values, so the frontend does not need to construct phi nodes itself.
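
The pattern can be sketched with a toy emitter that prints the textual IR a declaration, a reassignment, and a later read would produce. This Python class is purely illustrative: FunctionEmitter and its methods are hypothetical names, it handles only i32 constants, and it does not use llvm.mbt.

```python
class FunctionEmitter:
    """Collects textual IR lines for one function body (illustrative only)."""

    def __init__(self) -> None:
        self.lines = []
        self.next_temp = 1  # unnamed SSA values are numbered %1, %2, ...

    def declare(self, name: str, init: int) -> None:
        # let name : Int = init  ->  reserve a stack slot, then store into it
        self.lines.append(f"%{name} = alloca i32, align 4")
        self.assign(name, init)

    def assign(self, name: str, value: int) -> None:
        # name = value  ->  overwrite the slot; no SSA value is redefined
        self.lines.append(f"store i32 {value}, ptr %{name}, align 4")

    def read(self, name: str) -> str:
        # Every use of the variable becomes a fresh load into a new SSA value.
        temp = f"%{self.next_temp}"
        self.next_temp += 1
        self.lines.append(f"{temp} = load i32, ptr %{name}, align 4")
        return temp


emitter = FunctionEmitter()
emitter.declare("x", 5)   # let x : Int = 5;
emitter.assign("x", 10)   # x = 10;
emitter.read("x")         # a later use of x
print("\n".join(emitter.lines))
```

Note that reassignment only adds another store: the SSA rule is never violated because the pointer %x is defined once and only the memory behind it changes.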

Expression Code Generation

Expression code generation is relatively straightforward, mainly involving calling the corresponding instruction-building methods based on the expression type:

fn Expr::emit(self : Self, env : Env) -> &@llvm.Value raise {
  match self {
    AtomExpr(atom_expr, ..) => atom_expr.emit(env)
    // Integer negation: computed as 0 - value
    Unary("-", expr, ty=Some(Int)) => {
      let value = expr.emit(env).into_int_value()
      let zero = env.codegen.llvm_context.i32_type().const_zero()
      env.codegen.builder.build_int_sub(zero, value)
    }
    // Floating-point negation: uses the dedicated negation instruction
    Unary("-", expr, ty=Some(Double)) => {
      let value = expr.emit(env).into_float_value()
      env.codegen.builder.build_float_neg(value)
    }
    // Integer addition
    Binary("+", lhs, rhs, ty=Some(Int)) => {
      let lhs_val = lhs.emit(env).into_int_value()
      let rhs_val = rhs.emit(env).into_int_value()
      env.codegen.builder.build_int_add(lhs_val, rhs_val)
    }
    // ... others
  }
}

Technical Detail: Floating-Point Negation

Note that floating-point negation uses build_float_neg instead of subtracting the operand from zero. This is because:

  1. IEEE 754 Semantics: Negation only flips the sign bit, and subtraction from zero does not always give the same result. For example, 0.0 - (+0.0) yields +0.0, whereas negating +0.0 yields -0.0.
  2. Performance Considerations: A dedicated negation instruction is usually at least as cheap as a subtraction on modern processors.
  3. No Arithmetic Involved: Because only the sign bit changes, the result never depends on rounding behavior.
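
The signed-zero case can be checked directly with ordinary IEEE 754 doubles. The Python sketch below compares the two strategies. The function names are illustrative, with negate_by_subtraction standing in for subtracting from zero and negate_by_sign_flip standing in for a dedicated negation.

```python
import math


def negate_by_subtraction(x: float) -> float:
    """Negation implemented as 0.0 - x."""
    return 0.0 - x


def negate_by_sign_flip(x: float) -> float:
    """Negation that only changes the sign, as a dedicated instruction does."""
    return -x


def sign_bit_set(x: float) -> bool:
    # copysign transfers the sign bit, so this distinguishes -0.0 from +0.0
    return math.copysign(1.0, x) < 0


print(sign_bit_set(negate_by_subtraction(0.0)))  # False: 0.0 - 0.0 is +0.0
print(sign_bit_set(negate_by_sign_flip(0.0)))    # True: -(+0.0) is -0.0
print(negate_by_subtraction(2.5) == negate_by_sign_flip(2.5))  # True
```

For every nonzero finite input the two approaches agree, which is why the difference is easy to miss in testing.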

Chapter 5: Implementation of Control Flow Instructions

Control flow is the backbone of program logic, including conditional branches and loop structures. In LLVM IR, control flow is implemented through Basic Blocks and branch instructions. Each basic block represents a sequence of instructions with no internal jumps, and blocks are connected by branch instructions.

Conditional Branches: Implementing if-else Statements

Conditional branches require creating multiple basic blocks to represent different execution paths:

fn Stmt::emit(self : Self, env : Env) -> Unit raise {
  let ctx = env.codegen.llvm_context
  let func = env.llvm_function
  let builder = env.codegen.builder
  match self {
    If(cond, then_stmts, else_stmts) => {
      let cond_val = cond.emit(env).into_int_value()

      // Create three basic blocks
      let then_block = ctx.append_basic_block(func)
      let else_block = ctx.append_basic_block(func)
      let merge_block = ctx.append_basic_block(func)

      // Create the jump instruction
      let _ = builder.build_conditional_branch(cond_val, then_block, else_block)

      // Generate code for the then_block
      builder.position_at_end(then_block)
      let then_env = env.sub_env()
      then_stmts.each(s => s.emit(then_env))
      let _ = builder.build_unconditional_branch(merge_block)

      // Generate code for the else_block
      builder.position_at_end(else_block)
      let else_env = env.sub_env()
      else_stmts.each(s => s.emit(else_env))
      let _ = builder.build_unconditional_branch(merge_block)

      // After code generation is complete, the builder's position should be on the merge_block
      builder.position_at_end(merge_block)
    }
    // ...
  }
}

Generated LLVM IR Example

For the following TinyMoonbit code:

if x > 0 {
  y = x + 1;
} else {
  y = x - 1;
}

It will generate LLVM IR similar to this:

  %1 = load i32, ptr %x, align 4
  %2 = icmp sgt i32 %1, 0
  br i1 %2, label %if.then, label %if.else

if.then:                                          ; preds = %0
  %3 = load i32, ptr %x, align 4
  %4 = add i32 %3, 1
  store i32 %4, ptr %y, align 4
  br label %if.end

if.else:                                          ; preds = %0
  %5 = load i32, ptr %x, align 4
  %6 = sub i32 %5, 1
  store i32 %6, ptr %y, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  ; Subsequent code...

Loop Structures: Implementing while Statements

The implementation of loops requires special attention to the correct connection of the condition check and the loop body:

fn Stmt::emit(self : Self, env : Env) -> Unit raise {
  let ctx = env.codegen.llvm_context
  let func = env.llvm_function
  let builder = env.codegen.builder
  match self {
    While(cond, body) => {
      // Generate three blocks
      let cond_block = ctx.append_basic_block(func)
      let body_block = ctx.append_basic_block(func)
      let merge_block = ctx.append_basic_block(func)

      // First, unconditionally jump to the cond block
      let _ = builder.build_unconditional_branch(cond_block)
      builder.position_at_end(cond_block)

      // Generate code within the cond block, as well as the conditional jump instruction
      let cond_val = cond.emit(env).into_int_value()
      let _ = builder.build_conditional_branch(cond_val, body_block, merge_block)
      builder.position_at_end(body_block)

      // Generate code for the body block, with an unconditional jump to the cond block at the end
      let body_env = env.sub_env()
      body.each(s => s.emit(body_env))
      let _ = builder.build_unconditional_branch(cond_block)

      // After code generation is finished, position the builder on the merge block
      builder.position_at_end(merge_block)
    }
    // ...
  }
}

Generated LLVM IR Example

For the TinyMoonbit code:

while i < 10 {
  i = i + 1;
}

It will generate:

  br label %while.cond

while.cond:                                       ; preds = %while.body, %0
  %1 = load i32, ptr %i, align 4
  %2 = icmp slt i32 %1, 10
  br i1 %2, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
  %3 = load i32, ptr %i, align 4
  %4 = add i32 %3, 1
  store i32 %4, ptr %i, align 4
  br label %while.cond

while.end:                                        ; preds = %while.cond
  ; Subsequent code...
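
To see how the three blocks cooperate, the following Python sketch walks the same control-flow graph by hand, following each branch target in turn. The function run_while_cfg is a hypothetical illustration: the dictionary stands in for the stack slot %i, and the labels match the IR above.

```python
def run_while_cfg(i: int, limit: int = 10):
    """Execute the loop's blocks by following branch targets."""
    memory = {"i": i}          # stands in for the alloca slot %i
    visited = []
    label = "while.cond"       # the preceding block branches straight to the condition
    while label != "while.end":
        visited.append(label)
        if label == "while.cond":
            # br i1 %2, label %while.body, label %while.end
            label = "while.body" if memory["i"] < limit else "while.end"
        else:  # while.body
            memory["i"] += 1        # load, add 1, store
            label = "while.cond"    # back edge: br label %while.cond
    return memory["i"], visited


final_i, visited = run_while_cfg(8)
print(final_i)   # 10
print(visited)   # ['while.cond', 'while.body', 'while.cond', 'while.body', 'while.cond']
```

The condition block runs one more time than the body, which is why it needs two predecessors: the block before the loop and the loop body itself.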

💡 Control Flow Design Points

  1. Basic Block Naming Strategy

    append_basic_block also accepts a labeled name argument.

    // Use descriptive block names for easier debugging and understanding
    let then_block = context.append_basic_block(func, name="if.then")
    let else_block = context.append_basic_block(func, name="if.else")
    let merge_block = context.append_basic_block(func, name="if.end")
  2. Scope Management

    // Create a separate scope for each branch and loop body
    let branch_env = env.sub_env()
    branch_stmts.each(stmt => stmt.emit(branch_env))
  3. Builder Position Management

    At the end, be sure to place the instruction builder on the correct basic block.

    // Always ensure the builder points to the correct basic block
    builder.position_at_end(merge_block)
    // Generate instructions in this block...
    

Chapter 6: From LLVM IR to Machine Code

After generating the complete LLVM IR, we need to convert it into assembly code for the target machine. Although llvm.mbt provides a complete target machine configuration API, for learning purposes, we can use a simpler method.

Compiling with the llc Toolchain

The most direct method is to output the generated LLVM IR to a file and then use the LLVM toolchain to compile it:

Call the Module's dump function, or pass the module to println.

let gen : CodeGen = ...
let prog = gen.llvm_module
prog.dump() // dump is recommended as it will be slightly faster than println, with the same effect
// or println(prog)

Complete Compilation Flow Example

Let's look at a complete compilation flow from source code to assembly code:

  1. TinyMoonbit Source Code

    fn factorial(n : Int) -> Int {
      if n <= 1 {
        return 1;
      }
      return n * factorial(n - 1);
    }

    fn main() -> Unit {
      let result : Int = factorial(5);
      print_int(result);
    }
  2. Generated LLVM IR

    ; ModuleID = 'tinymoonbit'
    source_filename = "tinymoonbit"
    
    define i32 @factorial(i32 %0) {
    entry:
      %1 = alloca i32, align 4
      store i32 %0, ptr %1, align 4
      %2 = load i32, ptr %1, align 4
      %3 = icmp sle i32 %2, 1
      br i1 %3, label %4, label %6
    
    4:                                                ; preds = %entry
      ret i32 1
    
    6:                                                ; preds = %entry
      %7 = load i32, ptr %1, align 4
      %8 = load i32, ptr %1, align 4
      %9 = sub i32 %8, 1
      %10 = call i32 @factorial(i32 %9)
      %11 = mul i32 %7, %10
      ret i32 %11
    }
    
    define void @main() {
    entry:
      %0 = alloca i32, align 4
      %1 = call i32 @factorial(i32 5)
      store i32 %1, ptr %0, align 4
      %2 = load i32, ptr %0, align 4
      call void @print_int(i32 %2)
      ret void
    }
    
    declare void @print_int(i32 %0)
    
  3. Generating RISC-V Assembly with llc

    # Generate llvm ir
    moon run main --target native > fact.ll
    
    # Generate RISC-V 64-bit assembly code
    llc -march=riscv64 -mattr=+m -o fact.s fact.ll
    
  4. Generated RISC-V Assembly Snippet

    factorial:
    .Lfunc_begin0:
    	.cfi_startproc
    	addi	sp, sp, -32
    	.cfi_def_cfa_offset 32
    	sd	ra, 24(sp)
    	.cfi_offset ra, -8
    	sd	s0, 16(sp)
    	.cfi_offset s0, -16
    	addi	s0, sp, 32
    	.cfi_def_cfa s0, 0
    	sw	a0, -20(s0)
    	lw	a0, -20(s0)
    	li	a1, 1
    	blt	a1, a0, .LBB0_2
    	li	a0, 1
    	j	.LBB0_3
    .LBB0_2:
    	lw	a0, -20(s0)
    	lw	a1, -20(s0)
    	addi	a1, a1, -1
    	sw	a0, -24(s0)
    	mv	a0, a1
    	call	factorial
    	lw	a1, -24(s0)
    	mul	a0, a1, a0
    .LBB0_3:
    	ld	ra, 24(sp)
    	ld	s0, 16(sp)
    	addi	sp, sp, 32
    	ret
    

Conclusion

Through this two-part series, we have completed a fully functional, albeit simple, compiler implementation, from the lexical analysis of a character stream, through the construction of an abstract syntax tree, to the generation of LLVM IR and machine code output.

Review

Part 1:

  • An elegant lexer based on pattern matching
  • Implementation of a recursive descent parser
  • A complete type-checking system
  • Scope management with an environment chain

Part 2:

  • A deep dive into the LLVM type and value systems
  • Variable management strategies in SSA form
  • Correct implementation of control flow instructions
  • A complete code generation pipeline

Moonbit's Advantages in Compiler Development

Through this practical project, we have gained a deep appreciation for Moonbit's unique value in the field of compiler construction:

  1. Expressive Pattern Matching: Greatly simplifies the complexity of AST processing and type analysis.
  2. Functional Programming Paradigm: Immutable data structures and pure functions make the compiler logic clearer and more reliable.
  3. Modern Type System: Trait objects, generics, and error handling mechanisms provide ample abstraction capabilities.
  4. Excellent Engineering Features: Features like derive and JSON serialization significantly improve development efficiency.

Final Words

Compiler technology represents the perfect combination of computer science theory and engineering practice. With a modern tool like Moonbit, we can explore this ancient yet vibrant field in a more elegant and efficient way.

We hope this series of articles will provide readers with a powerful aid on their journey into compiler design.

Recommended Learning Resources


Dancing with LLVM: A Moonbit Chronicle (Part 1) - Implementing the Frontend

· 16 min read


Introduction

Programming language design and compiler implementation have long been considered among the most challenging topics in computer science. The traditional path to learning compilers often requires students to first master a complex set of theoretical foundations:

  • Automata Theory: Finite state machines and regular expressions
  • Type Theory: The mathematical underpinnings of λ-calculus and type systems
  • Computer Architecture: Low-level implementation from assembly language to machine code

However, Moonbit, a functional programming language designed for the modern development landscape, offers a fresh perspective. It not only features a rigorous type system and exceptional memory safety guarantees but, more importantly, its rich syntax and toolchain tailored for the AI era make it an ideal choice for learning and implementing compilers.

Series Overview

This series of articles will delve into the core concepts and best practices of modern compiler implementation by building a small programming language compiler called TinyMoonbit.

  • Part 1: Focuses on the implementation of the language frontend, including lexical analysis, parsing, and type checking, ultimately generating an abstract syntax tree with complete type annotations.
  • Part 2: Dives into the code generation phase, utilizing Moonbit's official llvm.mbt binding library to convert the abstract syntax tree into LLVM intermediate representation and finally generate RISC-V assembly code.

TinyMoonbit Language Design

TinyMoonbit is a systems-level programming language with an abstraction level comparable to C. Although its syntax heavily borrows from Moonbit, TinyMoonbit is not a subset of the Moonbit language. Instead, it is a simplified version designed to test the feature completeness of llvm.mbt while also serving an educational purpose.

Note: Due to space constraints, the TinyMoonbit implementation discussed in this series is simpler than the actual TinyMoonbit. For the complete version, please refer to TinyMoonbitLLVM.

Core Features

TinyMoonbit provides the fundamental features required for modern systems programming:

  • Low-level Memory Operations: Direct pointer manipulation and memory management
  • Control Flow Structures: Conditional branches, loops, and function calls
  • Type Safety: Static type checking and explicit type declarations
  • Simplified Design: To reduce implementation complexity, advanced features like type inference and closures are not supported.

Syntax Example

Let's demonstrate TinyMoonbit's syntax with a classic implementation of the Fibonacci sequence:

extern fn print_int(x : Int) -> Unit;

// Recursive implementation of the Fibonacci sequence
fn fib(n : Int) -> Int {
  if n <= 1 {
    return n;
  }
  return fib(n - 1) + fib(n - 2);
}

fn main {
  print_int(fib(10));
}

Compilation Target

After the complete compilation process, the compiler produces the following LLVM Intermediate Representation for the code above:

; ModuleID = 'tinymoonbit'
source_filename = "tinymoonbit"

define i32 @fib(i32 %0) {
entry:
  %1 = alloca i32, align 4
  store i32 %0, ptr %1, align 4
  %2 = load i32, ptr %1, align 4
  %3 = icmp sle i32 %2, 1
  br i1 %3, label %4, label %6

4:                                                ; preds = %entry
  %5 = load i32, ptr %1, align 4
  ret i32 %5

6:                                                ; preds = %entry
  %7 = load i32, ptr %1, align 4
  %8 = sub i32 %7, 1
  %9 = call i32 @fib(i32 %8)
  %10 = load i32, ptr %1, align 4
  %11 = sub i32 %10, 2
  %12 = call i32 @fib(i32 %11)
  %13 = add i32 %9, %12
  ret i32 %13
}

define void @main() {
entry:
  %0 = call i32 @fib(i32 10)
  call void @print_int(i32 %0)
  ret void
}

declare void @print_int(i32 %0)

Chapter 2: Lexical Analysis

Lexical Analysis is the first stage of the compilation process. Its core mission is to convert a continuous stream of characters into a sequence of meaningful tokens. This seemingly simple conversion process is, in fact, the cornerstone of the entire compiler pipeline.

From Characters to Symbols: Token Design and Implementation

Consider the following code snippet:

let x : Int = 5;

After being processed by the lexer, it will produce the following sequence of tokens:

(Keyword "let") → (Identifier "x") → (Symbol ":") →
(Type "Int") → (Operator "=") → (IntLiteral 5) → (Symbol ";")

This conversion process needs to handle various complex situations:

  1. Whitespace Filtering: Skipping spaces, tabs, and newlines.
  2. Keyword Recognition: Distinguishing reserved words from user-defined identifiers.
  3. Numeric Parsing: Correctly identifying the boundaries of integers and floating-point numbers.
  4. Operator Handling: Differentiating between single-character and multi-character operators.
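
The numeric branch of the lexer is elided in this article. As a rough illustration of the boundary problem named in item 3, the sketch below splits a leading number off a string view and reports whether it has a fractional part. The name scan_number and its return shape are assumptions made for this example and are not part of TinyMoonbit. It reuses the loop and string-view patterns that the lexer in this chapter relies on, and it treats a '.' as part of the number only when a digit follows it.

```moonbit
// Hypothetical helper (not from the article): scans a leading numeric literal.
// Returns (literal text, has fractional part, remaining input).
fn scan_number(input : @string.View) -> (String, Bool, @string.View) {
  let chars : Array[Char] = Array::new()
  // Integer part: consume consecutive digits.
  let after_int = loop input {
    ['0'..='9' as c, ..rest] => {
      chars.push(c)
      continue rest
    }
    _ as rest => break rest
  }
  match after_int {
    // A '.' belongs to the number only if a digit follows it.
    ['.', '0'..='9' as d, ..rest] => {
      chars.push('.')
      chars.push(d)
      let after_frac = loop rest {
        ['0'..='9' as c, ..more] => {
          chars.push(c)
          continue more
        }
        _ as more => break more
      }
      (String::from_array(chars), true, after_frac)
    }
    // Otherwise the literal is an integer and the '.' is left for later rules.
    _ => (String::from_array(chars), false, after_int)
  }
}
```

Under these assumptions, an input starting with 3.14 would be reported as a floating-point literal, while 42 followed by a '.' and a non-digit would be reported as the integer 42 with the '.' left in the remaining input. An input with no leading digit yields an empty literal, so a real lexer would only call such a helper after seeing a digit.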

Token Type System Design

Based on the TinyMoonbit syntax specification, we classify all possible symbols into the following token types:

pub enum Token {
  Bool(Bool)       // Boolean values: true, false
  Int(Int)         // Integers: 1, 2, 3, ...
  Double(Double)   // Floating-point numbers: 1.0, 2.5, 3.14, ...
  Keyword(String)  // Reserved words: let, if, while, fn, return
  Upper(String)    // Type identifiers: start with an uppercase letter, e.g., Int, Double, Bool
  Lower(String)    // Variable identifiers: start with a lowercase letter, e.g., x, y, result
  Symbol(String)   // Operators and punctuation: +, -, *, :, ;, ->
  Bracket(Char)    // Brackets: (, ), [, ], {, }
  EOF              // End-of-file marker
} derive(Show, Eq)

Leveraging Pattern Matching

Moonbit's pattern matching lets us implement the lexer concisely. Compared to the traditional hand-written finite state machine approach, this pattern-matching-based implementation is more intuitive and easier to understand.

Core Analysis Function

pub fn lex(code : String) -> Array[Token] {
  let tokens = Array::new()
  loop code[:] {
    // Skip whitespace characters
    [' ' | '\n' | '\r' | '\t', ..rest] => continue rest
    // Handle single-line comments
    [.."//", ..rest] =>
      continue loop rest {
        ['\n' | '\r', ..rest_str] => break rest_str
        [_, ..rest_str] => continue rest_str
        [] as rest_str => break rest_str
      }
    // Recognize multi-character operators (order is important!)
    [.."->", ..rest] => { tokens.push(Symbol("->")); continue rest }
    [.."==", ..rest] => { tokens.push(Symbol("==")); continue rest }
    [.."!=", ..rest] => { tokens.push(Symbol("!=")); continue rest }
    [.."<=", ..rest] => { tokens.push(Symbol("<=")); continue rest }
    [..">=", ..rest] => { tokens.push(Symbol(">=")); continue rest }
    // Recognize single-character operators and punctuation
    [':' | '.' | ',' | ';' | '+' | '-' | '*' | '/' | '%' | '>' | '<' | '=' as c, ..rest] => {
      tokens.push(Symbol("\{c}"))
      continue rest
    }
    // Recognize brackets
    ['(' | ')' | '[' | ']' | '{' | '}' as c, ..rest] => {
      tokens.push(Bracket(c))
      continue rest
    }
    // Recognize identifiers and literals
    ['a'..='z', ..] as code => {
      let (tok, rest) = lex_ident(code);
      tokens.push(tok)
      continue rest
    }
    ['A'..='Z', ..] => { ... }
    ['0'..='9', ..] => { ... }
    // Reached the end of the file
    [] => { tokens.push(EOF); break tokens }
  }
}

Keyword Recognition Strategy

Identifier parsing requires special handling for keyword recognition:

pub fn lex_ident(rest : @string.View) -> (Token, @string.View) {
  // Predefined keyword map
  let keyword_map = Map::from_array([
    ("let", Token::Keyword("let")),
    ("fn", Token::Keyword("fn")),
    ("if", Token::Keyword("if")),
    ("else", Token::Keyword("else")),
    ("while", Token::Keyword("while")),
    ("return", Token::Keyword("return")),
    ("extern", Token::Keyword("extern")),
    ("true", Token::Bool(true)),
    ("false", Token::Bool(false)),
  ])
  // Collect the characters of the identifier until the first non-identifier character
  let identifier_chars = Array::new()
  let remaining = loop rest {
    ['a'..='z' | 'A'..='Z' | '0'..='9' | '_' as c, ..rest_str] => {
      identifier_chars.push(c)
      continue rest_str
    }
    _ as rest_str => break rest_str
  }
  let ident = String::from_array(identifier_chars)
  // Keywords and boolean literals take priority; anything else is a lowercase identifier
  let token = keyword_map.get(ident).or_else(() => Token::Lower(ident))
  (token, remaining)
}
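
Because the identifier is scanned to its end before the keyword table is consulted, a name such as letter is never mistaken for the keyword let followed by ter. The sketch below isolates that classification step. It swaps the Map lookup used above for a match on string literals, purely to stay self-contained; the Word type and the classify function are hypothetical names introduced for this example.

```moonbit
// Hypothetical type for illustration only; the article uses Token instead.
enum Word {
  Keyword(String)
  Ident(String)
} derive(Show, Eq)

// Classifies a complete word. The caller is assumed to have already
// scanned up to the word boundary, as the identifier loop above does.
fn classify(word : String) -> Word {
  match word {
    "let" | "fn" | "if" | "else" | "while" | "return" | "extern" =>
      Word::Keyword(word)
    _ => Word::Ident(word)
  }
}
```

With this shape, classify("let") produces a keyword and classify("letter") produces an identifier, which is the behavior the scan-then-lookup strategy is meant to guarantee.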

💡 In-depth Analysis of Moonbit Syntax Features

The lexer above demonstrates several Moonbit features that are particularly useful in compiler development:

  1. Functional Loop Construct

    loop initial_value {
      pattern1 => continue new_value1
      pattern2 => continue new_value2
      pattern3 => break final_value
    }
    

    loop is not a traditional loop structure but a functional loop:

    • It accepts an initial parameter as the loop state.
    • It handles different cases through pattern matching.
    • continue passes the new state to the next iteration.
    • break terminates the loop and returns the final value.
  2. String Views and Pattern Matching

    Moonbit's string pattern matching feature greatly simplifies text processing:

    // Match a single character
    ['a', ..rest] => // Starts with the character 'a'
    
    // Match a character range
    ['a'..='z' as c, ..rest] => // A lowercase letter, bound to the variable c
    
    // Match a string literal
    [.."hello", ..rest] => // Equivalent to ['h','e','l','l','o', ..rest]
    
    // Match multiple possible characters
    [' ' | '\t' | '\n', ..rest] => // Any whitespace character
    
  3. The Importance of Pattern Matching Priority

    ⚠️ Important Reminder: The order of matching is crucial.

    When writing pattern matching rules, you must place more specific patterns before more general ones. For example:

    // ✅ Correct order
    loop code[:] {
      [.."->", ..rest] => { ... }     // Match multi-character operators first
      ['-' | '>' as c, ..rest] => { ... }  // Then match single characters
    }
    
    // ❌ Incorrect order - "->" will never be matched
    loop code[:] {
      ['-' | '>' as c, ..rest] => { ... }
      [.."->", ..rest] => { ... }     // This will never be executed
    }
    

By using this pattern-matching-based approach, we not only avoid complex state machine implementations but also achieve a clearer and more maintainable code structure.
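
To make the ordering rule concrete, here is a minimal, self-contained sketch that combines the functional loop with string-view patterns. The function lex_ops is a hypothetical name for this example. It collects only the operators "->", "-" and ">" from its input and skips every other character, and it assumes the same code[:] view syntax used by the lexer above.

```moonbit
// Hypothetical example: extracts arrow and single-character operators.
fn lex_ops(code : String) -> Array[String] {
  let out : Array[String] = Array::new()
  loop code[:] {
    // The two-character operator must be tried first.
    [.."->", ..rest] => {
      out.push("->")
      continue rest
    }
    // Reached only when the input does not start with "->".
    ['-' | '>' as c, ..rest] => {
      out.push("\{c}")
      continue rest
    }
    // Any other character is skipped in this sketch.
    [_, ..rest] => continue rest
    // End of input: the loop evaluates to the collected operators.
    [] => break out
  }
}
```

For the input a->b-c, the first arm consumes "->" as a single operator and the second arm later picks up the lone "-". If the two operator arms were swapped, the same input would yield "-" and ">" as separate symbols and the arrow would never be recognized.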


Chapter 3: Parsing and Abstract Syntax Tree Construction

Syntactic Analysis (or Parsing) is the second core stage of the compiler. Its task is to reorganize the sequence of tokens produced by lexical analysis into a hierarchical Abstract Syntax Tree (AST). This process not only verifies whether the program conforms to the language's grammatical rules but also provides a structured data representation for subsequent semantic analysis and code generation.

Abstract Syntax Tree Design: A Structured Representation of the Program

Before building the parser, we need to carefully design the structure of the AST. This design determines how the program's syntactic structure is represented and how subsequent compilation stages will process these structures.

1. Core Type System

First, we define the representation of the TinyMoonbit type system in the AST:

pub enum Type {
  Unit    // Unit type, represents no return value
  Bool    // Boolean type: true, false
  Int     // 32-bit signed integer
  Double  // 64-bit double-precision floating-point number
} derive(Show, Eq, ToJson)

pub fn parse_type(type_name : String) -> Type {
  match type_name {
    "Unit" => Type::Unit
    "Bool" => Type::Bool
    "Int" => Type::Int
    "Double" => Type::Double
    _ => abort("Unknown type: \{type_name}")
  }
}

2. Layered AST Node Design

We use a layered design to clearly represent the different abstraction levels of the program:

  1. Atomic Expressions (AtomExpr): the most basic, indivisible expression units.

    pub enum AtomExpr {
      Bool(Bool)                                  // Boolean literal
      Int(Int)                                    // Integer literal
      Double(Double)                              // Floating-point literal
      Var(String, mut ty~ : Type?)                // Variable reference
      Paren(Expr, mut ty~ : Type?)                // Parenthesized expression
      Call(String, Array[Expr], mut ty~ : Type?)  // Function call
    } derive(Show, Eq, ToJson)

  2. Compound Expressions (Expr): more complex structures that can contain operators and multiple sub-expressions.

    pub enum Expr {
      AtomExpr(AtomExpr, mut ty~ : Type?)          // Wrapper for atomic expressions
      Unary(String, Expr, mut ty~ : Type?)         // Unary operation: -, !
      Binary(String, Expr, Expr, mut ty~ : Type?)  // Binary operation: +, -, *, /, ==, !=, etc.
    } derive(Show, Eq, ToJson)

  3. Statements (Stmt): the executable units in the program.

    pub enum Stmt {
      Let(String, Type, Expr)             // Variable declaration: let x : Int = 5;
      Assign(String, Expr)                // Assignment statement: x = 10;
      If(Expr, Array[Stmt], Array[Stmt])  // Conditional branch: if-else
      While(Expr, Array[Stmt])            // Loop statement: while
      Return(Expr?)                       // Return statement: return expr;
      Expr(Expr)                          // Expression statement
    } derive(Show, Eq, ToJson)

  4. Top-Level Structures: function definitions and the complete program.

    pub struct Function {
      name : String                   // Function name
      params : Array[(String, Type)]  // Parameter list: [(param_name, type)]
      ret_ty : Type                   // Return type
      body : Array[Stmt]              // Sequence of statements in the function body
    } derive(Show, Eq, ToJson)

    // The program is defined as a map from function names to function definitions
    pub type Program Map[String, Function]

Design Highlight: Mutability of Type Annotations

Notice that every expression node except the literal atoms (Bool, Int, Double) carries a mut ty~ : Type? field. This design allows us to fill in type information during the type-checking phase without having to rebuild the entire AST.

Recursive Descent Parsing: A Top-Down Construction Strategy

Recursive Descent is a top-down parsing method where the core idea is to write a corresponding parsing function for each grammar rule. In Moonbit, pattern matching makes the implementation of this method exceptionally elegant.
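
Before looking at the TinyMoonbit parser, the idea of one function per grammar rule can be shown on a much smaller grammar. The sketch below is not part of TinyMoonbit: the Tok type, the two-rule grammar, and the function names are assumptions made for this example. To stay short it evaluates the expression instead of building an AST, and it reports failure with Option rather than a raised error. It uses ArrayView as the article's parser does.

```moonbit
// Hypothetical token type for a tiny grammar:
//   expr := term ('+' term)*
//   term := Num | '(' expr ')'
enum Tok {
  Num(Int)
  Plus
  LParen
  RParen
} derive(Show, Eq)

// One function for the rule `term`.
fn parse_term(tokens : ArrayView[Tok]) -> Option[(Int, ArrayView[Tok])] {
  match tokens {
    [Num(n), ..rest] => Some((n, rest))
    // A parenthesized term recurses into the `expr` rule.
    [LParen, ..rest] =>
      match parse_expr(rest) {
        Some((value, [RParen, ..remaining])) => Some((value, remaining))
        _ => None
      }
    _ => None
  }
}

// One function for the rule `expr`.
fn parse_expr(tokens : ArrayView[Tok]) -> Option[(Int, ArrayView[Tok])] {
  match parse_term(tokens) {
    None => None
    Some((first, rest)) =>
      // Fold further `+ term` pairs into the accumulated value.
      loop first, rest {
        acc, [Plus, ..after_plus] =>
          match parse_term(after_plus) {
            Some((value, remaining)) => continue acc + value, remaining
            None => break None
          }
        acc, remaining => break Some((acc, remaining))
      }
  }
}
```

Each function consumes a prefix of the token view and returns the unconsumed remainder, which is the same contract the TinyMoonbit parsing functions follow with their (result, remaining tokens) tuples.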

Parsing Atomic Expressions

pub fn parse_atom_expr(
  tokens : ArrayView[Token]
) -> (AtomExpr, ArrayView[Token]) raise {
  match tokens {
    // Parse literals
    [Bool(b), ..rest] => (AtomExpr::Bool(b), rest)
    [Int(i), ..rest] => (AtomExpr::Int(i), rest)
    [Double(d), ..rest] => (AtomExpr::Double(d), rest)
    // Parse function calls: func_name(arg1, arg2, ...)
    [Lower(func_name), Bracket('('), ..rest] => {
      let (args, rest) = parse_argument_list(rest)
      match rest {
        [Bracket(')'), ..remaining] =>
          (AtomExpr::Call(func_name, args, ty=None), remaining)
        _ => raise SyntaxError("Expected ')' after function arguments")
      }
    }
    // Parse variable references
    [Lower(var_name), ..rest] => (AtomExpr::Var(var_name, ty=None), rest)
    // Parse parenthesized expressions: (expression)
    [Bracket('('), ..rest] => {
      let (expr, rest) = parse_expression(rest)
      match rest {
        [Bracket(')'), ..remaining] =>
          (AtomExpr::Paren(expr, ty=None), remaining)
        _ => raise SyntaxError("Expected ')' after expression")
      }
    }
    _ => raise SyntaxError("Expected atomic expression")
  }
}

Parsing Statements

Statement parsing needs to dispatch to different handler functions based on the starting keyword:

pub fn parse_stmt(tokens : ArrayView[Token]) -> (Stmt, ArrayView[Token]) {
  match tokens {
    // Parse let statements
    [Keyword("let"), Lower(var_name), Symbol(":"), ..] => { /* ... */ }
    // Parse if/while/return statements
    [Keyword("if"), .. rest] => parse_if_stmt(rest)
    [Keyword("while"), .. rest] => parse_while_stmt(rest)
    [Keyword("return"), .. rest] => { /* ... */ }
    // Parse assignment statements
    [Lower(_), Symbol("="), .. rest] => parse_assign_stmt(tokens)
    // Parse single expression statements
    [Lower(_), Symbol("="), .. rest] => parse_single_expr_stmt(tokens)
    _ => { /* Error handling */ }
  }
}
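The keyword-based dispatch can also be sketched outside MoonBit. The TypeScript below is a minimal illustration, not the article's parser: the `Token` shape, the `StmtKind` names, and the `classifyStmt` helper are all hypothetical. It also assumes that anything which is neither a keyword statement nor an assignment is treated as an expression statement.

```typescript
// Hypothetical token and statement shapes for illustration only.
type Token = { kind: "keyword" | "lower" | "symbol" | "int"; text: string };
type StmtKind = "Let" | "If" | "While" | "Return" | "Assign" | "Expr";

// Decide which statement parser should handle the tokens starting at `pos`.
function classifyStmt(tokens: Token[], pos: number): StmtKind {
  const first = tokens[pos];
  const second = tokens[pos + 1];
  if (first === undefined) throw new Error("Unexpected end of input");
  if (first.kind === "keyword") {
    switch (first.text) {
      case "let": return "Let";
      case "if": return "If";
      case "while": return "While";
      case "return": return "Return";
      default: throw new Error(`Unexpected keyword '${first.text}'`);
    }
  }
  // `name = ...` is an assignment; anything else falls through to an
  // expression statement (an assumption of this sketch).
  if (first.kind === "lower" && second !== undefined &&
      second.kind === "symbol" && second.text === "=") {
    return "Assign";
  }
  return "Expr";
}

const assign: Token[] = [
  { kind: "lower", text: "x" },
  { kind: "symbol", text: "=" },
  { kind: "int", text: "1" },
];
console.log(classifyStmt(assign, 0)); // Assign
```

Looking one token ahead is enough here because an assignment and an expression statement can both start with a lowercase identifier.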

Challenge: Handling Operator Precedence

The most complex part of expression parsing is handling operator precedence. We need to ensure that 1 + 2 * 3 is correctly parsed as 1 + (2 * 3) and not (1 + 2) * 3.
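One common way to get this grouping is precedence climbing. The TypeScript below is a minimal sketch of that technique and is not necessarily how the article's parser does it. The `Tok` and `Ast` shapes, the `PRECEDENCE` table, and the helper names are assumptions made for illustration.

```typescript
// Hypothetical token and AST shapes, not the article's MoonBit types.
type Tok = { kind: "num"; value: number } | { kind: "op"; value: string };
type Ast =
  | { tag: "Num"; value: number }
  | { tag: "Binary"; op: string; lhs: Ast; rhs: Ast };

// Higher number = binds tighter. The table itself is an assumption.
const PRECEDENCE: Record<string, number> = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Parse an expression whose operators all have precedence >= minPrec.
function parseExpr(tokens: Tok[], pos: number, minPrec: number): [Ast, number] {
  const first = tokens[pos];
  if (first === undefined || first.kind !== "num") {
    throw new Error(`Expected number at token ${pos}`);
  }
  let lhs: Ast = { tag: "Num", value: first.value };
  pos += 1;
  while (pos < tokens.length) {
    const tok = tokens[pos];
    if (tok.kind !== "op") break;
    const prec = PRECEDENCE[tok.value];
    if (prec === undefined || prec < minPrec) break;
    // Left-associative: the right operand may only contain tighter operators.
    const [rhs, next] = parseExpr(tokens, pos + 1, prec + 1);
    lhs = { tag: "Binary", op: tok.value, lhs, rhs };
    pos = next;
  }
  return [lhs, pos];
}

// Render the tree with explicit parentheses so the grouping is visible.
function show(ast: Ast): string {
  return ast.tag === "Num"
    ? String(ast.value)
    : `(${show(ast.lhs)} ${ast.op} ${show(ast.rhs)})`;
}

// Whitespace-separated tokenizer, sufficient for this sketch.
function tokenize(src: string): Tok[] {
  return src.split(/\s+/).filter((s) => s.length > 0).map((s): Tok =>
    /^\d+$/.test(s) ? { kind: "num", value: Number(s) } : { kind: "op", value: s }
  );
}

const [ast] = parseExpr(tokenize("1 + 2 * 3"), 0, 1);
console.log(show(ast)); // (1 + (2 * 3))
```

Passing `prec + 1` to the recursive call is what makes operators of equal precedence group to the left, so `1 - 2 - 3` becomes `((1 - 2) - 3)`.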

💡 Application of Advanced MoonBit Features

Automatic Derivation Feature

pub enum Expr {
  // ...
} derive(Show, Eq, ToJson)

MoonBit's derive feature automatically generates common implementations for types. Here we use three:

  • Show: Provides debugging output functionality.
  • Eq: Supports equality comparison.
  • ToJson: Serializes to JSON format, which is convenient for debugging and persistence.

These automatically generated features are extremely useful in compiler development, especially during the debugging and testing phases.

Error Handling Mechanism

pub fn parse_expression(tokens : ArrayView[Token]) -> (Expr, ArrayView[Token]) raise {
  // The 'raise' keyword indicates that this function may throw an exception
}

MoonBit's raise mechanism provides structured error handling, allowing syntax errors to be accurately located and reported.

Through this layered design and recursive descent parsing strategy, we have built a parser that is both flexible and efficient, laying a solid foundation for the subsequent type-checking phase.


Chapter 4: Type Checking and Semantic Analysis

Semantic Analysis is a crucial intermediate stage in compiler design. While parsing ensures the program's structure is correct, it doesn't mean the program is semantically valid. Type Checking, as the core component of semantic analysis, is responsible for verifying the type consistency of all operations in the program, ensuring type safety and runtime correctness.

Scope Management: Building the Environment Chain

The primary challenge in type checking is correctly handling variable scopes. At different levels of the program (global, function, block), the same variable name may refer to different entities. We adopt the classic design of an Environment Chain to solve this problem:

pub struct TypeEnv[K, V] {
  parent : TypeEnv[K, V]? // Reference to the parent environment
  data : Map[K, V] // Variable bindings in the current environment
}

The core of the environment chain is the variable lookup algorithm, which follows the rules of lexical scoping:

pub fn TypeEnv::get[K : Eq + Hash, V](self : Self[K, V], key : K) -> V? {
  match self.data.get(key) {
    Some(value) => Some(value) // Found in the current environment
    None =>
      match self.parent {
        Some(parent_env) => parent_env.get(key) // Recursively search the parent environment
        None => None // Reached the top-level environment, variable not defined
      }
  }
}

Design Principle: Lexical Scoping

This design ensures that variable lookup follows lexical scoping rules:

  1. First, search in the current scope.
  2. If not found, recursively search in the parent scope.
  3. Continue until the variable is found or the global scope is reached.
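The three lookup steps above can be sketched in TypeScript as follows. This is a simplified analogue written for illustration. The class keeps the name `TypeEnv`, but its string keys, its `set` method, and the sample bindings are assumptions and not part of the article's code.

```typescript
// A hypothetical TypeScript analogue of the article's TypeEnv.
class TypeEnv<V> {
  private data = new Map<string, V>();
  constructor(private parent?: TypeEnv<V>) {}

  // Bind a name in the current scope only.
  set(key: string, value: V): void {
    this.data.set(key, value);
  }

  // Look in the current scope first, then walk outward through the parents.
  get(key: string): V | undefined {
    if (this.data.has(key)) return this.data.get(key);
    return this.parent?.get(key);
  }
}

const globalEnv = new TypeEnv<string>();
globalEnv.set("x", "Int");
globalEnv.set("y", "Bool");

const blockEnv = new TypeEnv<string>(globalEnv);
blockEnv.set("x", "Double"); // shadows the outer x

console.log(blockEnv.get("x")); // Double
console.log(blockEnv.get("y")); // Bool
console.log(blockEnv.get("z")); // undefined
console.log(globalEnv.get("x")); // Int
```

Because an inner scope only adds bindings to its own map, shadowing never disturbs the outer scope, and leaving a block amounts to dropping the inner environment.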

Type Checker Architecture

Environment management alone is not sufficient to complete the type-checking task. Some operations (like function calls) need to access global program information. Therefore, we design a comprehensive type checker:

pub struct TypeChecker {
  local_env : TypeEnv[String, Type] // Local variable environment
  current_func : Function // The function currently being checked
  program : Program // Complete program information
}

Implementing Type Checking for Selected Nodes

The core of the type checker is to apply the corresponding type rules to different AST nodes. The following is the implementation of expression type checking:

pub fn Expr::check_type(self : Self, env : TypeEnv[String, Type]) -> Type raise {
  match self {
    // Type checking for atomic expressions
    AtomExpr(atom_expr, ..) as node => {
      let ty = atom_expr.check_type(env)
      node.ty = Some(ty) // Fill in the type information
      ty
    }
    // Type checking for unary operations
    Unary("-", expr, ..) as node => {
      let ty = expr.check_type(env)
      node.ty = Some(ty)
      ty
    }
    // Type checking for binary operations
    Binary("+", lhs, rhs, ..) as node => {
      let lhs_type = lhs.check_type(env)
      let rhs_type = rhs.check_type(env)
      // Ensure operand types are consistent
      guard lhs_type == rhs_type else {
        raise TypeCheckError(
          "Binary operation requires matching types, got \{lhs_type} and \{rhs_type}"
        )
      }
      let result_type = match op {
        // Comparison operators always return a boolean value
        "==" | "!=" | "<" | "<=" | ">" | ">=" => Type::Bool
        // Arithmetic operators, etc., maintain the operand type
        _ => lhs_type
      }
      node.ty = Some(result_type)
      result_type
    }
  }
}
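The binary-operation rule (matching operand types, comparisons yield Bool, everything else keeps the operand type) can be sketched in TypeScript. This is a simplified illustration under stated assumptions: the `Ty` and `Expr` shapes and the `checkType` function are hypothetical and cover only literals and binary nodes, not the article's full `Expr` enum or its environment.

```typescript
// Hypothetical, simplified shapes; the article's Expr has more variants.
type Ty = "Int" | "Double" | "Bool";
type Expr =
  | { tag: "Lit"; ty: Ty }
  | { tag: "Binary"; op: string; lhs: Expr; rhs: Expr; ty?: Ty };

const COMPARISONS = new Set(["==", "!=", "<", "<=", ">", ">="]);

// Compute the type of `expr` and record it on binary nodes in place.
function checkType(expr: Expr): Ty {
  if (expr.tag === "Lit") return expr.ty;
  const lhsTy = checkType(expr.lhs);
  const rhsTy = checkType(expr.rhs);
  if (lhsTy !== rhsTy) {
    throw new Error(
      `Binary operation requires matching types, got ${lhsTy} and ${rhsTy}`
    );
  }
  // Comparisons always produce Bool; other operators keep the operand type.
  const result: Ty = COMPARISONS.has(expr.op) ? "Bool" : lhsTy;
  expr.ty = result; // annotate the node instead of rebuilding the tree
  return result;
}

const sum: Expr = {
  tag: "Binary",
  op: "+",
  lhs: { tag: "Lit", ty: "Int" },
  rhs: { tag: "Lit", ty: "Int" },
};
const cmp: Expr = { tag: "Binary", op: "<", lhs: sum, rhs: { tag: "Lit", ty: "Int" } };

console.log(checkType(cmp)); // Bool
console.log(sum.ty); // Int
```

Checking the outer comparison also annotates the inner sum, since the checker visits children before deciding the parent's type.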

💡 MoonBit Enum Modification Trick

During the type-checking process, we need to fill in type information for the AST nodes. MoonBit provides an elegant way to modify the mutable fields of enum variants:

pub enum Expr {
  AtomExpr(AtomExpr, mut ty~ : Type?)
  Unary(String, Expr, mut ty~ : Type?)
  Binary(String, Expr, Expr, mut ty~ : Type?)
} derive(Show, Eq, ToJson)

By using the as binding in pattern matching, we can get a reference to the enum variant and modify its mutable fields:

match expr {
  AtomExpr(atom_expr, ..) as node => {
    let ty = atom_expr.check_type(env)
    node.ty = Some(ty) // Modify the mutable field ty
  }
  // ...
}

This design avoids the overhead of rebuilding the entire AST while maintaining a functional programming style.


Complete Compilation Flow Demonstration

After the three stages of lexical analysis, parsing, and type checking, our compiler frontend is now able to convert source code into a fully typed abstract syntax tree. Let's demonstrate the complete process with a simple example:

Source Code Example

fn add(x: Int, y: Int) -> Int {
  return x + y;
}

Compilation Output: Typed AST

Using the derive(ToJson) feature, we can output the final AST in JSON format for inspection:

{
  "functions": {
    "add": {
      "name": "add",
      "params": [
        ["x", { "$tag": "Int" }],
        ["y", { "$tag": "Int" }]
      ],
      "ret_ty": { "$tag": "Int" },
      "body": [
        {
          "$tag": "Return",
          "0": {
            "$tag": "Binary",
            "0": "+",
            "1": {
              "$tag": "AtomExpr",
              "0": {
                "$tag": "Var",
                "0": "x",
                "ty": { "$tag": "Int" }
              },
              "ty": { "$tag": "Int" }
            },
            "2": {
              "$tag": "AtomExpr",
              "0": {
                "$tag": "Var",
                "0": "y",
                "ty": { "$tag": "Int" }
              },
              "ty": { "$tag": "Int" }
            },
            "ty": { "$tag": "Int" }
          }
        }
      ]
    }
  }
}

From this JSON output, we can clearly see:

  1. Complete Function Signature: Including the parameter list and return type.
  2. Type-Annotated AST Nodes: Each expression carries type information.
  3. Structured Program Representation: Provides a clear data structure for the subsequent code generation phase.

Conclusion

In this article, we have walked through the complete implementation of a compiler frontend. From a stream of characters to a typed abstract syntax tree, we have seen the advantages the MoonBit language brings to compiler construction:

Core Takeaways

  1. The Power of Pattern Matching: MoonBit's string pattern matching and structural pattern matching greatly simplify the implementation of lexical analysis and parsing.
  2. Functional Programming Paradigm: The combination of the loop construct, environment chains, and immutable data structures provides a solution that is both elegant and efficient.
  3. Expressive Type System: Through mutable fields in enums and trait objects, we can build data structures that are both type-safe and flexible.
  4. Engineering Features: Features like derive, structured error handling, and JSON serialization significantly improve development efficiency.

Looking Ahead to Part 2

Having mastered the implementation of the frontend, the next article will guide us into the more exciting code generation phase. We will:

  • Delve into the design philosophy of LLVM Intermediate Representation.
  • Explore how to use MoonBit's official llvm.mbt binding library.
  • Implement the complete conversion from AST to LLVM IR.
  • Generate executable RISC-V assembly code.

Building a compiler is a complex and challenging process, but as we have shown in this article, MoonBit provides powerful and elegant tools for this task. Let's continue this compiler construction journey in the next part.


Dependency Injection in FP: The Reader Monad

· 10 min read

Developers familiar with hexagonal architecture know that to keep core business logic pure and independent, we place "side effects" like database calls and external API interactions into "ports" and "adapters." These are then injected into the application layer using Dependency Injection (DI). It's safe to say that classic object-oriented and layered architectures rely heavily on DI.

But when I started building things in MoonBit, I had no idea how to apply any of this.

I wanted to follow best practices in a functionally-oriented environment like MoonBit, but with no classes, no interfaces, and no DI containers, how was I supposed to implement DI?

This led me to a crucial question: In a field as mature as software engineering, was there truly no established, functional-native solution for something as fundamental as dependency injection?

There is one. In the functional world, the solution is a monad: the Reader Monad.

First, What is a Monad?

A Monad can be understood as a "wrapper" or a "context."

Think of a normal function as an assembly line. You put a bag of flour in at one end and expect instant noodles to come out the other. But this simple picture hides the complexities the assembly line has to handle:

  • What if there's no flour? (null)
  • What if the dough is too dry and jams the machine? (Throwing exceptions)
  • The ingredient machine needs to read today's recipe: is it beef or chicken flavor? (Reading external configuration)
  • The packaging machine at the end needs to log how many packages it has processed today. (Updating a counter)

A Monad is the master control system for this complex assembly line. It bundles your data together with the context of the processing flow, ensuring the entire process runs smoothly and safely.

In software development, the Monad family has several common members:

  • Option (Maybe): Handles cases where a value might be missing. The box either has something in it or it's empty.
  • Result (Either): Handles operations that might fail. The box is either green (success) and contains a result, or it's red (failure) and contains an error.
  • State Monad: Manages situations that require modifying state. This box produces a result while also updating a counter on its side. Think of React's useState.
  • Future (or Promise): Deals with values that will exist in the future. This box gives you a "pickup slip," promising to deliver the goods later.
  • Reader Monad: The box can consult an "environment" at any time, but it cannot modify it.

The Reader Monad

The idea behind the Reader Monad dates back to the 1990s, gaining popularity in purely functional languages like Haskell. To uphold the strict rule of "purity" (i.e., functions cannot have side effects), developers needed an elegant way for multiple functions to share a common configuration environment. The Reader Monad was born to resolve this tension.

And today, its applications are widespread:

  • Application Configuration Management: Passing around global configurations like database connection pools, API keys, or feature flags.
  • Request Context Injection: In web services, bundling information like the currently logged-in user into an environment that can be accessed by all functions in the request handling chain.
  • Hexagonal Architecture: It's used to create a firewall between the core business logic (Domain/Application Layer) and external infrastructure (Infrastructure Layer).

In short, the Reader Monad is a specialized tool for handling read-only environmental dependencies. It solves two key problems:

  • Parameter Drilling: It saves us from passing a configuration object down through many layers of functions.
  • Decoupling Logic and Configuration: Business logic cares about what to do, not where the configuration comes from. This keeps the code clean and extremely easy to test.

The Core API

A Reader library typically includes a few core functions.

Reader::pure

This is like placing a value directly into a standard container. It takes an ordinary value and wraps it into the simplest possible Reader computation—one that doesn't depend on any environment. pure is often the last step in a pipeline, taking your final calculated result and putting it back into the Reader context, effectively "packaging" it.

typealias @reader.Reader

// `pure` creates a computation that ignores the environment.
let pure_reader : Reader[String, Int] = Reader::pure(100)

test {
  // No matter what the environment is (e.g., "hello"), the result is always 100.
  assert_eq(pure_reader.run("hello"), 100)
}

Reader::bind

This is the "connector" of the assembly line. It links different processing steps together, like connecting the "kneading" step to the "rolling" step to form a complete production line. Its purpose is sequencing. bind handles the plumbing behind the scenes; you define the steps, and it ensures the output of one computation is passed as the input to the next.

fnalias @reader.ask

// Step 1: Define a Reader that reads a value from the environment (an Int).
let step1 : Reader[Int, Int] = ask()

// Step 2: Define a function that takes the result of Step 1
// and returns a new Reader computation.
fn step2_func(n : Int) -> Reader[Int, Int] {
  Reader::pure(n * 2)
}

// Use `bind` to chain the two steps together.
let computation : Reader[Int, Int] = step1.bind(step2_func)

test {
  // Run the entire computation with an environment of 5.
  // Flow: `ask()` gets 5 from the environment -> `bind` passes 5 to `step2_func`
  // -> `step2_func` calculates 5*2=10 -> the result is `pure(10)`.
  assert_eq(computation.run(5), 10)
}

Reader::map

This is like changing the value inside the container without touching the container itself. It simply transforms the result. Often, we just want to perform a simple conversion on a result, and using map is more direct and expresses intent more clearly than using the more powerful bind.

// `map` transforms the result without affecting the dependency.
let reader_int : Reader[Unit, Int] = Reader::pure(5)
let reader_string : Reader[Unit, String] = reader_int.map(fn(n) { "Value is \{n}" })

test {
  assert_eq(reader_string.run(()), "Value is 5")
}

ask

ask is like a worker on the assembly line who can, at any moment, look up at the "production recipe" hanging on the wall. This is our primary means of actually reading from the environment. While bind passes the environment along implicitly, ask is what you use when you need to explicitly find out what's written in that recipe.

// `ask` retrieves the entire environment.
let ask_reader : Reader[String, String] = ask()
let result : String = ask_reader.run("This is the environment")

test {
  assert_eq(result, "This is the environment")
}

A common helper, asks, is just a convenient shorthand for chaining ask and map.
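To make the relationship between these functions concrete, here is a minimal Reader written in TypeScript. It is a sketch for illustration only and does not reproduce the API of the `@reader` package used above. The class layout and the sample values are assumptions.

```typescript
// A hypothetical minimal Reader; not the API of the article's @reader package.
class Reader<R, A> {
  constructor(readonly run: (env: R) => A) {}

  // Wrap a plain value; the environment is ignored.
  static pure<R, A>(value: A): Reader<R, A> {
    return new Reader<R, A>(() => value);
  }

  // Transform the result; the environment type is unchanged.
  map<B>(f: (a: A) => B): Reader<R, B> {
    return new Reader<R, B>((env) => f(this.run(env)));
  }

  // Sequence two computations, passing the same environment to both.
  bind<B>(f: (a: A) => Reader<R, B>): Reader<R, B> {
    return new Reader<R, B>((env) => f(this.run(env)).run(env));
  }
}

// Read the whole environment.
function ask<R>(): Reader<R, R> {
  return new Reader<R, R>((env) => env);
}

// Shorthand for ask().map(f).
function asks<R, A>(f: (env: R) => A): Reader<R, A> {
  return ask<R>().map(f);
}

const doubled = ask<number>().bind((n) => Reader.pure<number, number>(n * 2));
console.log(doubled.run(5)); // 10

const label = asks((cfg: { name: string }) => `Hello, ${cfg.name}`);
console.log(label.run({ name: "MoonBit" })); // Hello, MoonBit
```

In this sketch a Reader is only a wrapped function from environment to result. Nothing runs until `run` is called with a concrete environment, which is what lets the same computation be reused with different configurations.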

DI vs. Reader Monad

Let's consider a classic example: developing a UserService that needs a Logger to record logs and a Database to fetch data.

In a traditional DI setup, you might have a UserService class that declares its Logger and Database dependencies in its constructor. At runtime, you create instances of the logger and database and "inject" them when creating the UserService instance.

interface Logger {
  info(message: string): void
}
interface Database {
  getUserById(id: number): { name: string } | undefined
}

class UserService {
  constructor(
    private logger: Logger,
    private db: Database
  ) {}

  getUserName(id: number): string | undefined {
    this.logger.info(`Querying user with id: ${id}`)
    const user = this.db.getUserById(id)
    return user?.name
  }
}

const myLogger: Logger = { info: (msg) => console.log(`[LOG] ${msg}`) }
const myDb: Database = {
  getUserById: (id) => (id === 1 ? { name: 'MoonbitLang' } : undefined)
}

const userService = new UserService(myLogger, myDb)
const userName = userService.getUserName(1) // "MoonbitLang"

With the Reader Monad, the approach is different. The getUserName function doesn't hold any dependencies itself. Instead, it's defined as a "computation description." It declares that it needs an AppConfig environment (which contains the logger and database) to run. This function is completely decoupled from the concrete implementations of its dependencies.

fnalias @reader.asks

struct User {
  name : String
}

trait Logger {
  info(Self, String) -> Unit
}

trait Database {
  getUserById(Self, Int) -> User?
}

struct AppConfig {
  logger : &Logger
  db : &Database
}

fn getUserName(id : Int) -> Reader[AppConfig, String?] {
  asks(config => {
    config.logger.info("Querying user with id: \{id}")
    let user = config.db.getUserById(id)
    user.map(obj => obj.name)
  })
}

struct LocalDB {}

impl Database for LocalDB with getUserById(_, id) {
  if id == 1 {
    Some({ name: "MoonbitLang" })
  } else {
    None
  }
}

struct LocalLogger {}

impl Logger for LocalLogger with info(_, content) {
  println("\{content}")
}

test "Test UserName" {
  let appConfig = AppConfig::{ db: LocalDB::{  }, logger: LocalLogger::{  } }
  assert_eq(getUserName(1).run(appConfig).unwrap(), "MoonbitLang")
}

This characteristic makes the Reader Monad a perfect match for hexagonal architecture. The core principle of this architecture is Dependency Inversion — the core business logic should not depend on concrete infrastructure.

The getUserName function is a prime example. It only depends on the AppConfig abstraction (the "port"), with no knowledge of whether the underlying implementation is MySQL, PostgreSQL, or a mock database for testing.

But what problem can't it solve? State modification.

The environment in a Reader Monad is always "read-only." Once injected, it cannot be changed throughout the computation. If you need a mutable state, you'll have to turn to its sibling, the State Monad.

So, the benefit is clear: you can read configuration from anywhere in your computation. The drawback is just as clear: it can only read.
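
To make the "read-only environment" idea concrete, here is a minimal sketch of what a Reader can look like. MiniReader, its run_fn field, and Env are hypothetical names invented for this illustration; this is not the source of the colmugx/reader package, whose internals may differ.

```moonbit
// A hypothetical, minimal Reader: a wrapper around a function from
// an environment R to a result A. Not the colmugx/reader source.
struct MiniReader[R, A] {
  run_fn : (R) -> A
}

// Lift a plain value; the environment is ignored.
fn[R, A] MiniReader::pure(value : A) -> MiniReader[R, A] {
  { run_fn: fn(_env) { value } }
}

// Build a Reader by projecting something out of the environment.
fn[R, A] MiniReader::asks(f : (R) -> A) -> MiniReader[R, A] {
  { run_fn: f }
}

// Supply the environment and execute the computation.
fn[R, A] MiniReader::run(self : MiniReader[R, A], env : R) -> A {
  (self.run_fn)(env)
}

// Chain two computations; both receive the same, unchanged environment.
fn[R, A, B] MiniReader::bind(
  self : MiniReader[R, A],
  f : (A) -> MiniReader[R, B]
) -> MiniReader[R, B] {
  { run_fn: fn(env) { f(self.run(env)).run(env) } }
}

// Hypothetical environment used only by this sketch.
struct Env {
  greeting : String
}

test "MiniReader threads one environment through the chain" {
  let ask_greeting : MiniReader[Env, String] = MiniReader::asks(
    fn(env : Env) { env.greeting },
  )
  let program = ask_greeting.bind(fn(g) { MiniReader::pure("\{g}, MoonBit") })
  assert_eq(program.run(Env::{ greeting: "Hello" }), "Hello, MoonBit")
}
```

Note that bind passes the same env to both steps and never returns a modified one, which is exactly why a Reader can read the environment everywhere but cannot update it.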

A Simple i18n Utility

Frontend developers are likely familiar with libraries like i18next for internationalization (i18n). The core pattern involves injecting an i18n instance into the entire application using something like React Context. Any component can then access translation functions from this context. This is, in essence, a form of dependency injection.

This brings us back to our original goal: finding a DI pattern to support i18n in a CLI tool. Here’s a simple demonstration.

So first, let's install the dependencies.

moon add colmugx/reader

And then, we define the environment and dictionary types our i18n library will need. The environment, which we can call I18nConfig, would hold the current language (e.g., "en_US") and a dictionary. The dictionary would be a map of locales to their respective translation maps, where each translation map holds key-value pairs of translation keys and their translated strings.

typealias String as Locale

typealias String as TranslationKey

typealias String as TranslationValue

typealias Map[TranslationKey, TranslationValue] as Translations

typealias Map[Locale, Translations] as Dict

struct I18nConfig {
  // 'mut' is used here for demonstration purposes to easily change the language.
  mut lang : Locale
  dict : Dict
}

Next, we create our translation function, t. This function takes a translation key as input and returns a Reader. This Reader describes a computation that, when run, will use asks to access the I18nConfig from the environment. It will look up the current language, find the corresponding dictionary, and then find the translation for the given key. If anything is not found, it gracefully defaults to returning the original key.

fn t(key : TranslationKey) -> Reader[I18nConfig, TranslationValue] {
  asks(config => config.dict
    .get(config.lang)
    .map(lang_map => lang_map.get(key).unwrap_or(key))
    .unwrap_or(key))
}

And that's it. The core logic is surprisingly simple.
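
The fallback behavior is easier to check in isolation. The sketch below strips away the Reader and keeps only the two-level lookup; lookup is a hypothetical helper written for this illustration, not part of any library.

```moonbit
// Hypothetical helper: the same two-level lookup as t, without the Reader.
fn lookup(
  dict : Map[String, Map[String, String]],
  lang : String,
  key : String
) -> String {
  match dict.get(lang) {
    // Locale found: use its translation, or fall back to the key itself.
    Some(translations) => translations.get(key).unwrap_or(key)
    // Unknown locale: fall back to the key itself.
    None => key
  }
}

test "lookup falls back to the key" {
  let dict : Map[String, Map[String, String]] = {
    "en_US": { "welcome": "Welcome To" },
  }
  assert_eq(lookup(dict, "en_US", "welcome"), "Welcome To")
  // Known locale, unknown key.
  assert_eq(lookup(dict, "en_US", "goodbye"), "goodbye")
  // Unknown locale.
  assert_eq(lookup(dict, "fr_FR", "welcome"), "welcome")
}
```

Returning the key rather than raising an error means an untranslated string still shows up in the UI as a recognizable placeholder.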

Now, let's imagine our CLI tool needs to display a welcome message in the language specified by the operating system's LANG environment variable.

We can define a welcome_message function that takes some content as input. It uses our t function to get the translation for the "welcome" key and then uses bind to chain another Reader computation that combines the translated text with the provided content.

RUN IT

fn welcome_message(content : String) -> Reader[I18nConfig, String] {
  t("welcome").bind(welcome_text => Reader::pure("\{welcome_text} \{content}"))
}

test {
  let dict : Dict = {
    "en_US": { "welcome": "Welcome To" },
    "zh_CN": { "welcome": "欢迎来到" },
  }
  // Assuming your system language (LANG) is zh_CN
  let app_config = I18nConfig::{ lang: "zh_CN", dict }
  let msg = welcome_message("MoonbitLang")
  assert_eq(msg.run(app_config), "欢迎来到 MoonbitLang")
  // Switch the language
  app_config.lang = "en_US"
  assert_eq(msg.run(app_config), "Welcome To MoonbitLang")
}

And with that, I'd like to say: Welcome to MoonbitLang.

MoonBit Pearls Vol 4: Choreographic Programming with Moonchor

· 24 min read

Traditional distributed programming is notoriously painful, primarily because we need to reason about the implicit global behavior while writing the explicit local programs that actually run on each node. This fragmented implementation makes programs difficult to debug and understand, and deprives them of the type checking that programming languages provide. Choreographic programming makes the global behavior explicit by allowing developers to write a single program that describes the communication among multiple participants, which is then projected onto each participant to achieve the global behavior.

Choreographic programming has been implemented in two distinct ways:

  • As a completely new programming language (e.g., Choral), where developers write Choral programs that will be compiled into participant-specific Java programs.
  • As a library (e.g., HasChor), leveraging Haskell's type system to ensure static properties of choreographic programming while seamlessly integrating with Haskell's ecosystem.

MoonBit's functional programming features and powerful type system make it particularly suitable for building choreographic programming libraries.

This article demonstrates the core concepts and basic usage of choreographic programming using MoonBit's moonchor library through several examples.

Guided Tour: Bookstore Application

Let's examine a bookstore application involving two roles: Buyer and Seller. The core logic is as follows:

  1. The buyer sends the desired book title to the seller.
  2. The seller queries the database and informs the buyer of the price.
  3. The buyer decides whether to purchase the book.
  4. If the buyer decides to purchase, the seller deducts the book from inventory and sends the estimated delivery date to the buyer.
  5. Otherwise, the interaction terminates.

Traditional Implementation

Here, we focus on core logic rather than implementation details, using send and recv functions to represent message passing. In the traditional approach, we need to develop two separate applications for buyer and seller. We assume the following helper functions and types exist:

fn get_title() -> String {
  "Homotopy Type Theory"
}

fn get_price(title : String) -> Int {
  50
}

fn get_budget() -> Int {
  100
}

fn get_delivery_date(title : String) -> String {
  "2025-10-01"
}

enum Role {
  Buyer
  Seller
}

async fn[T] send(msg : T, target : Role) -> Unit {
  ...
}

async fn[T] recv(source : Role) -> T {
  ...
}

The buyer's application:

async fn book_buyer() -> Unit {
  let title = get_title()
  send(title, Seller)
  let price = recv(Seller)
  if price <= get_budget() {
    send(true, Seller)
    let delivery_date = recv(Seller)
    println("The book will be delivered on: \{delivery_date}")
  } else {
    send(false, Seller)
  }
}

The seller's application:

async fn book_seller() -> Unit {
  let title = recv(Buyer)
  let price = get_price(title)
  send(price, Buyer)
  let decision = recv(Buyer)
  if decision {
    let delivery_date = get_delivery_date(title)
    send(delivery_date, Buyer)
  }
}

These two implementations suffer from at least the following issues:

  1. No type safety guarantee: Note that both send and recv are generic functions. Type safety is only ensured when the types of sending and receiving messages match; otherwise, runtime errors may occur during (de)serialization. The compiler cannot verify type safety at compile time because it cannot determine which send corresponds to which recv. Type safety is dependent on the developer not making mistakes.

  2. Potential deadlocks: If the developer accidentally forgets to write some send in the buyer's program, both buyer and seller may wait indefinitely for each other's messages and be stuck. Alternatively, if a buyer's connection is temporarily interrupted during network communication, the seller will keep waiting for the buyer's message. Both scenarios lead to deadlocks.

  3. Explicit synchronization required: To communicate the purchase decision, the buyer must explicitly send a Bool message. Subsequent coordination requires ensuring both buyer and seller follow the same execution path at the if price <= get_budget() and if decision branches - a property that cannot be guaranteed at compile time.

The root cause of these problems lies in splitting what should be a unified coordination logic into two separate implementations based on implementation requirements. Next, we'll examine how choreographic programming addresses these issues.

moonchor Implementation

With choreographic programming, we can write the buyer's and seller's logic in a single function, which behaves differently depending on the role it is run as. We first use moonchor's API to define the buyer and seller roles. To provide stronger static guarantees, a role in moonchor is not only a value but also a distinct type that implements the Location trait.

struct Buyer {} derive(Show, Hash)

impl @moonchor.Location for Buyer with name(_) {
  "buyer"
}

struct Seller {} derive(Show, Hash)

impl @moonchor.Location for Seller with name(_) {
  "seller"
}

let buyer : Buyer = Buyer::{  }

let seller : Seller = Seller::{  }

Buyer and Seller types don't contain any fields. Types implementing the Location trait only need to provide a name method that returns a string as the role's identifier. This name method is critically important - it serves as the definitive identity marker for roles and provides a final verification mechanism when type checking cannot guarantee type safety. Never assign the same name to different roles, as this will lead to unexpected runtime errors. Later we'll examine how types provide a certain level of safety and why relying solely on types is insufficient.

Next, we define the core logic of the bookstore application, which is referred to as a choreography:

async fn bookshop(ctx : @moonchor.ChoreoContext) -> Unit {
  let title_at_buyer = ctx.locally(buyer, _unwrapper => get_title())
  let title_at_seller = ctx.comm(buyer, seller, title_at_buyer)
  let price_at_seller = ctx.locally(seller, fn(unwrapper) {
    let title = unwrapper.unwrap(title_at_seller)
    get_price(title)
  })
  let price_at_buyer = ctx.comm(seller, buyer, price_at_seller)
  let decision_at_buyer = ctx.locally(buyer, fn(unwrapper) {
    let price = unwrapper.unwrap(price_at_buyer)
    price < get_budget()
  })
  if ctx.broadcast(buyer, decision_at_buyer) {
    let delivery_date_at_seller = ctx.locally(seller, unwrapper => get_delivery_date(
      unwrapper.unwrap(title_at_seller),
    ))
    let delivery_date_at_buyer = ctx.comm(
      seller, buyer, delivery_date_at_seller,
    )
    ctx.locally(buyer, fn(unwrapper) {
      let delivery_date = unwrapper.unwrap(delivery_date_at_buyer)
      println("The book will be delivered on \{delivery_date}")
    }) |> ignore
  }
}

This program is somewhat lengthy, so let's analyze it line by line.

The function parameter ctx: @moonchor.ChoreoContext is the context object provided by moonchor to applications, containing all interfaces for choreographic programming on the application side. First, we use ctx.locally to execute an operation get_title() that only needs to run at the buyer role. The first parameter of ctx.locally is the role. The second parameter is a closure where the content is the operation to execute, with the return value being wrapped as the return value of ctx.locally. Here, get_title() returns a String, while title_at_buyer has type @moonchor.Located[String, Buyer], indicating this value exists at the buyer role and cannot be used by other roles. If you attempt to use title_at_buyer at the seller role, the compiler will report an error stating that Buyer and Seller are not the same type.

Next, the buyer needs to send the book title to the seller, which we implement using ctx.comm. The first parameter of ctx.comm is the sender role, the second is the receiver role, and the third is the message to send. Here, the return value title_at_seller has type @moonchor.Located[String, Seller], indicating this value exists at the seller role. As you might have guessed, ctx.comm corresponds precisely to the send and recv operations. However, here type safety is guaranteed: ctx.comm is a generic function that ensures (1) the sent and received messages have the same type, and (2) the sender and receiver roles correspond to the type parameters of the parameter and return types, namely @moonchor.Located[T, Sender] and @moonchor.Located[T, Receiver].

Moving forward, the seller queries the database to get the book price. At this step we use the unwrapper parameter passed to the ctx.locally closure. This parameter is an object for unpacking Located types, whose type signature also includes a role type parameter. We can understand how it works by examining the signature of Unwrapper::unwrap: fn[T, L] Unwrapper::unwrap(_ : Unwrapper[L], v : Located[T, L]) -> T. This means that in ctx.locally(seller, unwrapper => ...), unwrapper has type Unwrapper[Seller] and title_at_seller has type Located[String, Seller], so unwrapper.unwrap(title_at_seller) yields a result of type String. This explains why we can use title_at_seller in the closure but not title_at_buyer: the latter has type Located[String, Buyer], which does not match Unwrapper[Seller].
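
The role check described above relies on a phantom type parameter. The following self-contained sketch imitates that mechanism under simplifying assumptions: the Located, Unwrapper, and locally definitions here are hypothetical stand-ins written for illustration, and moonchor's real definitions may differ (the compiler may also warn about the unused type parameter L).

```moonbit
// Hypothetical stand-ins that mimic the shape of moonchor's types.
struct Buyer {}

struct Seller {}

// L is a phantom parameter: it records the role but stores no data.
struct Located[T, L] {
  value : T
}

struct Unwrapper[L] {}

// The role L of the unwrapper must match the role L of the value.
fn[T, L] Unwrapper::unwrap(_self : Unwrapper[L], v : Located[T, L]) -> T {
  v.value
}

// Run a computation "at" a role and tag the result with that role.
fn[T, L] locally(_role : L, computation : (Unwrapper[L]) -> T) -> Located[T, L] {
  let unwrapper : Unwrapper[L] = Unwrapper::{  }
  { value: computation(unwrapper) }
}

test "unwrap accepts values located at the same role" {
  let title_at_seller = locally(Seller::{  }, fn(_u) { "Homotopy Type Theory" })
  let price_at_seller = locally(Seller::{  }, fn(u) {
    if u.unwrap(title_at_seller) == "Homotopy Type Theory" {
      50
    } else {
      0
    }
  })
  assert_eq(price_at_seller.value, 50)
  // The next line would be rejected by the type checker, because the
  // closure receives an Unwrapper[Buyer] but the value is at Seller:
  // locally(Buyer::{  }, fn(u) { u.unwrap(title_at_seller) })
}
```

In this sketch the test reads the value field directly only to check the result; an actual library would presumably keep that field private so that unwrap is the only way in.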

Knowledge of Choice

Explicit synchronization in the subsequent process is critical. We need a dedicated section to explain that. In choreographic programming, this synchronization is referred to as Knowledge of Choice. In the example above, the buyer needs to know whether to purchase the book, and the seller needs to know the buyer's decision. We use ctx.broadcast to implement this functionality.

The first parameter of ctx.broadcast is the sender's role, and the second parameter is the message to be shared with all other roles. In this example, both buyer and seller need to know the purchase decision, so the buyer broadcasts this decision decision_at_buyer to all participants (here only the seller) via ctx.broadcast. Interestingly, the return value of broadcast is a plain type rather than a Located type, meaning it can be used by all roles directly at the top level without needing to be unwrapped with unwrapper in locally. This allows us to use MoonBit's native if conditional statements for subsequent flows, ensuring both buyer and seller follow the same branch.

As the name suggests, ctx.broadcast serves to broadcast a value throughout the entire choreography. It can broadcast not just Bool types but any other type as well. Its results can be applied not only to if conditions but also to while loops or any other scenarios requiring common knowledge.

Launch Code

How does such a choreography run? moonchor provides the run_choreo function to launch a choreography. Currently, due to MoonBit's multi-backend feature, providing stable, portable TCP servers and cross-process communication interfaces presents challenges. Therefore, we'll use coroutines and channels to explore the actual execution process of choreographies. The complete launch code is as follows:

test "Blog: bookshop" {
  let backend = @moonchor.make_local_backend([buyer, seller])
  @toolkit.run_async(() => @moonchor.run_choreo(backend, bookshop, buyer))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, bookshop, seller))
}

The above code launches two coroutines that execute the same choreography at the buyer and seller respectively. This can also be understood as the bookshop function being projected (also called EPP, endpoint projection) into two completely different versions: the "buyer version" and "seller version". In this example, the first parameter of run_choreo is a Backend type object that provides the underlying communication mechanism required for choreographic programming. We use the make_local_backend function to create a local backend (not to be confused with MoonBit's multi-backend mentioned earlier), which can run in local processes using the channel API provided by peter-jerry-ye/async/channel as the communication foundation. In the future, moonchor will provide more backend implementations, such as HTTP.

APIs and Some Underlying Principles

We have gained a preliminary understanding of choreographic programming and moonchor. Next, we will formally introduce the APIs we've used along with some unused ones, while explaining some of their underlying principles.

Roles

In moonchor, we define roles by implementing the Location trait. The trait is declared as follows:

pub(open) trait Location : Show + Hash {
  name(Self) -> String
}

The Location trait object implements Eq:

impl Eq for &Location with op_equal(self, other) {
  self.name() == other.name()
}

If two roles' name methods return the same string, they are considered the same role; otherwise, they are not. When determining whether a value belongs to a certain role, the name method serves as the definitive arbiter. This means values can have the same type but actually represent different roles. This feature is particularly important when handling dynamically generated roles. For example, in the bookstore scenario, there might be multiple buyers, and the seller needs to handle multiple buyer requests simultaneously, dynamically generating buyer roles based on server connections. In this case, the buyer type would be defined as:

struct DynamicBuyer {
  id : String
} derive(Show, Hash)

impl @moonchor.Location for DynamicBuyer with name(self) {
  "buyer-\{self.id}"
}
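
A small self-contained sketch can show why the name, rather than the type, decides role identity. The Named trait and same_role function are hypothetical stand-ins for moonchor's Location trait and its Eq implementation, and DynamicBuyer is redeclared here only so the example runs on its own.

```moonbit
// Hypothetical stand-in for moonchor's Location trait.
trait Named {
  name(Self) -> String
}

struct DynamicBuyer {
  id : String
}

impl Named for DynamicBuyer with name(self) {
  "buyer-\{self.id}"
}

// Mirrors the name-based equality above: identity is decided by name alone.
fn[A : Named, B : Named] same_role(a : A, b : B) -> Bool {
  a.name() == b.name()
}

test "same type, different roles" {
  let alice = DynamicBuyer::{ id: "alice" }
  let bob = DynamicBuyer::{ id: "bob" }
  let alice_again = DynamicBuyer::{ id: "alice" }
  // Two values with the same name denote the same role.
  assert_eq(same_role(alice, alice_again), true)
  // Same type, different names: the type checker cannot tell them apart,
  // so only the runtime name comparison distinguishes these roles.
  assert_eq(same_role(alice, bob), false)
}
```

This is the trade-off of dynamic roles: a single type can stand for many participants, but mixing them up is caught only at runtime.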

Located Values

Since values located at different roles may coexist in a choreography, we need a way to distinguish which role each value is located at. In moonchor, this is represented by the Located[T, L] type, indicating a value of type T located at role L.

type Located[T, L]

type Unwrapper[L]

Located Values are constructed via ChoreoContext::locally or ChoreoContext::comm. Both functions return a Located value.

To use a Located Value, we employ the unwrap method of the Unwrapper object. These concepts have already been demonstrated in the bookstore application example and won't be elaborated further here.

Local Computation

The most common API we've seen in examples is ChoreoContext::locally, which is used to perform a local computation at a specific role. Its signature is as follows:

type ChoreoContext

fn[T, L : Location] locally(
  self : ChoreoContext,
  location : L,
  computation : (Unwrapper[L]) -> T
) -> Located[T, L] {
  ...
}

This API executes the computation closure at the specified location role and wraps the result as a Located Value. The computation closure takes a single parameter - an unwrapper object of type Unwrapper[L], which is used within the closure to unpack Located[T, L] values into T types. This API binds computation results to specific roles, ensuring values can only be used at their designated roles. Attempting to use a value at another role or process values from different roles with this unwrapper will trigger compiler errors.

Communication

The ChoreoContext::comm API handles value transmission between roles. Its declaration is as follows:

trait Message : ToJson + @json.FromJson {}

async fn[T : Message, From : Location, To : Location] comm(
  self : ChoreoContext,
  from : From,
  to : To,
  value : Located[T, From]
) -> Located[T, To] {
  ...
}

Sending and receiving typically require serialization and deserialization. In moonchor's current implementation, Json is the message carrier for convenience. In the future, byte streams may be adopted as a more efficient and universal carrier.

ChoreoContext::comm has three type parameters: the message type T to send, plus the sender and receiver role types From and To. From and To are the types of the from and to parameters, and they also appear in the type of the value parameter (Located[T, From]) and in the return type (Located[T, To]). This ensures type safety during message (de)serialization between sender and receiver, and guarantees that send and receive operations are properly paired, preventing accidental deadlocks.
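The pairing of send and receive can be pictured with a small Python model of what each endpoint does for a single comm step. This is only an illustration under assumptions: the per-pair queues, the `channel` helper, the string role names, and the example payload are hypothetical, and moonchor's actual transport is not shown in this article.

```python
import json
import queue
from typing import Any, Dict, Optional, Tuple

# One FIFO channel per ordered (sender, receiver) pair; payloads travel as JSON text.
channels: Dict[Tuple[str, str], "queue.Queue[str]"] = {}


def channel(sender: str, receiver: str) -> "queue.Queue[str]":
    return channels.setdefault((sender, receiver), queue.Queue())


def comm(me: str, sender: str, receiver: str, value: Optional[Any]) -> Optional[Any]:
    """What endpoint `me` does for one `comm(sender, receiver, value)` step."""
    if me == sender:
        channel(sender, receiver).put(json.dumps(value))  # serialize and send
        return None  # the result is located at the receiver, not here
    if me == receiver:
        return json.loads(channel(sender, receiver).get(timeout=1))  # receive and decode
    return None  # roles that are not involved skip the step


# The same choreography line, seen from both endpoints:
at_buyer = comm("buyer", "buyer", "seller", {"title": "Example Book"})
at_seller = comm("seller", "buyer", "seller", None)
```

Because both endpoints execute the same line, every send has exactly one matching receive, and the channel is empty again afterwards.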

Broadcast

To share a value among multiple roles, we use the ChoreoContext::broadcast API, which has one role broadcast a value to all other roles. Its signature is as follows:

async fn[T : Message, L : Location] ChoreoContext::broadcast(
  self : ChoreoContext,
  loc : L,
  value : Located[T, L]
) -> T {
  ...
}

The broadcast API is similar to the communication API, with two key differences:

  1. Broadcast doesn't require specifying receiver roles - it defaults to all roles in the choreography;
  2. The broadcast return value isn't a Located Value, but rather the message's type.

These characteristics reveal broadcast's purpose: enabling all roles to access the same value, allowing operations on this value at the choreography's top level rather than being confined within ChoreoContext::locally. For example, in the bookstore case, both buyer and seller need consensus on the purchase decision to ensure subsequent processes remain synchronized.
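The difference from comm can be sketched in Python as follows. This is a hypothetical model, not moonchor's code: the fixed `ROLES` list, the queue-based channels, and the string role names are assumptions made for illustration.

```python
import json
import queue

ROLES = ["buyer", "seller"]
# One FIFO channel per ordered pair of distinct roles.
channels = {(a, b): queue.Queue() for a in ROLES for b in ROLES if a != b}


def broadcast(me, sender, value=None):
    """What endpoint `me` does for one broadcast step; every role gets the plain value."""
    if me == sender:
        for other in ROLES:
            if other != sender:
                channels[(sender, other)].put(json.dumps(value))
        return value  # the sender already holds the value
    return json.loads(channels[(sender, me)].get(timeout=1))


# Both endpoints end up with an ordinary boolean, not a located value.
decision_at_buyer = broadcast("buyer", "buyer", True)
decision_at_seller = broadcast("seller", "buyer")
```

Since the result is an ordinary value at every role, each endpoint can branch on it at the top level of the choreography and stay in step with the others.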

Backend and Execution

The API for running a choreography is as follows:

type Backend

typealias async (ChoreoContext) -> T as Choreo[T]

async fn[T, L : Location] run_choreo(
  backend : Backend,
  choreography : Choreo[T],
  role : L
) -> T {
  ...
}

It takes three parameters: a backend, a user-written choreography, and the role to execute. The backend contains the concrete implementation of the communication mechanism, while the execution role specifies where this choreography should run. For example, in previous cases, the buyer's program needs to pass a value of type Buyer here, while the seller needs to pass a value of type Seller.

moonchor provides a local backend based on coroutines and channels:

fn make_local_backend(locations : Array[&Location]) -> Backend {
  ...
}

This function establishes communication channels between all roles specified in the parameters and provides the concrete communication implementation, namely the send and recv methods. The local backend can only be used for monolithic concurrent programs, not for truly distributed applications. However, the backend is pluggable: with other backends built on stable network communication APIs, moonchor can be used to build distributed programs.
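The idea of running one choreography once per role over a shared local backend can be modeled in a few lines of Python. This is a sketch under assumptions, not moonchor's backend: it uses threads and `queue.Queue` where moonchor uses coroutines and channels, and the `ping_pong` choreography and the function shapes are hypothetical.

```python
import queue
import threading


def make_local_backend(roles):
    """One in-memory FIFO queue per ordered pair of roles (stand-in for a real transport)."""
    return {(a, b): queue.Queue() for a in roles for b in roles if a != b}


def run_choreo(backend, choreography, role):
    """Run the shared choreography from the point of view of a single role."""
    def comm(sender, receiver, value=None):
        if role == sender:
            backend[(sender, receiver)].put(value)
            return None
        if role == receiver:
            return backend[(sender, receiver)].get(timeout=5)
        return None

    return choreography(role, comm)


def ping_pong(role, comm):
    # Every role executes these same two lines; `comm` decides who sends and who receives.
    ping = comm("client", "server", "ping" if role == "client" else None)
    return comm("server", "client", ping + "-pong" if role == "server" else None)


backend = make_local_backend(["client", "server"])
results = {}
threads = [
    threading.Thread(target=lambda r=r: results.update({r: run_choreo(backend, ping_pong, r)}))
    for r in ["client", "server"]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Swapping `make_local_backend` for something that sends over a network, while keeping `run_choreo` and the choreography unchanged, is the kind of pluggability the paragraph above describes.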

(Optional Reading) Case Study: Multi-Replica KVStore

In this section, we'll explore a more complicated case study: implementing a multi-replica KVStore using moonchor. We'll still use only moonchor's core APIs while fully leveraging MoonBit's generics and first-class functions. Our goal is to explore how MoonBit's expressiveness can extend what choreographic programming can do.

Basic Implementation

First, let's prepare by defining two roles: Client and Server:

struct Server {} derive(Hash, Show)

struct Client {} derive(Hash, Show)

impl @moonchor.Location for Server with name(_) {
  "server"
}

impl @moonchor.Location for Client with name(_) {
  "client"
}

let server : Server = Server::{ }

let client : Client = Client::{ }

To implement a KVStore like Redis, we need to implement two basic interfaces: get and put (corresponding to Redis's get and set). The simplest implementation uses a Map data structure to store key-value pairs:

struct ServerState {
  db : Map[String, Int]
}

fn ServerState::new() -> ServerState {
  { db: {} }
}

For the KVStore, get and put requests are sent by clients over the network. Before receiving requests, we don't know their specific content. Therefore, we need to define a request type Request that includes the request type and parameters:

enum Request {
  Get(String)
  Put(String, Int)
} derive(ToJson, FromJson)

For convenience, our KVStore only supports String keys and Int values. Next, we define a Response type to represent the server's response to requests:

typealias Int? as Response

The response is an optional integer. For Put requests, the response is None; for Get requests, the response is the corresponding value wrapped in Some, or None if the key doesn't exist.

fn handle_request(state : ServerState, request : Request) -> Response {
  match request {
    Request::Get(key) => state.db.get(key)
    Request::Put(key, value) => {
      state.db[key] = value
      None
    }
  }
}
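The request and response semantics can be checked quickly with a Python model of the handler. This is an illustrative sketch only: requests are encoded as tuples (a hypothetical encoding standing in for the Request enum) and the store is a plain dict.

```python
from typing import Dict, Optional


def handle_request(db: Dict[str, int], request: tuple) -> Optional[int]:
    """Apply one request to the store; requests are ("get", key) or ("put", key, value)."""
    if request[0] == "get":
        return db.get(request[1])  # the value, or None when the key is absent
    if request[0] == "put":
        _, key, value = request
        db[key] = value
        return None  # a Put carries no payload in its response
    raise ValueError(f"unknown request: {request!r}")


db: Dict[str, int] = {}
responses = [
    handle_request(db, ("put", "key1", 42)),
    handle_request(db, ("get", "key1")),
    handle_request(db, ("get", "missing")),
]
```

Note that with an optional integer as the response type, a Put acknowledgement and a Get for a missing key are both represented by the empty value, so the caller has to know which request it sent in order to interpret the response.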

Our goal is to define two functions, put and get, to simulate the client's request initiation process. Their respective tasks are:

  1. Generate the request at the Client, wrapping the key-value pair;
  2. Send the request to the Server;
  3. The Server processes the request using the handle_request function;
  4. Send the response back to the Client.

As we can see, the logic of put and get functions is similar. We can abstract the three processes (2, 3, and 4) into a single function called access_server.

async fn put_v1(
  ctx : @moonchor.ChoreoContext,
  state_at_server : @moonchor.Located[ServerState, Server],
  key : String,
  value : Int
) -> Unit {
  let request = ctx.locally(client, _unwrapper => Request::Put(key, value))
  access_server_v1(ctx, request, state_at_server) |> ignore
}

async fn get_v1(
  ctx : @moonchor.ChoreoContext,
  state_at_server : @moonchor.Located[ServerState, Server],
  key : String
) -> @moonchor.Located[Response, Client] {
  let request = ctx.locally(client, _unwrapper => Request::Get(key))
  access_server_v1(ctx, request, state_at_server)
}

async fn access_server_v1(
  ctx : @moonchor.ChoreoContext,
  request : @moonchor.Located[Request, Client],
  state_at_server : @moonchor.Located[ServerState, Server]
) -> @moonchor.Located[Response, Client] {
  let request_at_server = ctx.comm(client, server, request)
  let response = ctx.locally(server, fn(unwrapper) {
    let request = unwrapper.unwrap(request_at_server)
    let state = unwrapper.unwrap(state_at_server)
    handle_request(state, request)
  })
  ctx.comm(server, client, response)
}

With this, our KVStore implementation is complete. We can write a simple choreography to test it:

async fn kvstore_v1(ctx : @moonchor.ChoreoContext) -> Unit {
  let state_at_server = ctx.locally(server, _unwrapper => ServerState::new())
  put_v1(ctx, state_at_server, "key1", 42)
  put_v1(ctx, state_at_server, "key2", 41)
  let v1_at_client = get_v1(ctx, state_at_server, "key1")
  let v2_at_client = get_v1(ctx, state_at_server, "key2")
  ctx.locally(client, fn(unwrapper) {
    let v1 = unwrapper.unwrap(v1_at_client).unwrap()
    let v2 = unwrapper.unwrap(v2_at_client).unwrap()
    if v1 + v2 == 83 {
      println("The server is working correctly")
    } else {
      panic()
    }
  }) |> ignore
}

test "kvstore v1" {
  let backend = @moonchor.make_local_backend([server, client])
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v1, server))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v1, client))
}

This program stores two numbers 42 and 41 under "key1" and "key2" respectively, then retrieves these values from the server and verifies their sum equals 83. If any request returns None or the calculation result isn't 83, the program will panic.

Double Replication

Now, let's enhance the KVStore with fault tolerance. The simplest approach is to create a backup replica that maintains identical data to the primary replica, while performing consistency checks during Get requests.

We'll create a new role for the backup replica:

struct Backup {} derive(Hash, Show)

impl @moonchor.Location for Backup with name(_) {
  "backup"
}

let backup : Backup = Backup::{ }

Define a function to check consistency: this function verifies whether all replica responses are identical, and panics if inconsistencies are found.

fn check_consistency(responses : Array[Response]) -> Unit {
  match responses.pop() {
    None => return
    Some(f) =>
      for res in responses {
        if res != f {
          panic()
        }
      }
  }
}
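The check itself is simple enough to model and exercise in isolation. The Python sketch below is an assumption-laden stand-in, not a translation: it raises an exception where the MoonBit version panics, and it reads the last response as the reference instead of removing it from the array with pop.

```python
from typing import List, Optional


def check_consistency(responses: List[Optional[int]]) -> None:
    """Raise if the replicas disagree; an empty list is trivially consistent."""
    if not responses:
        return
    reference = responses[-1]  # compare every response against the last one
    for res in responses:
        if res != reference:
            raise RuntimeError(f"inconsistent replicas: {responses!r}")


# Identical answers pass silently, including the "no value" case.
check_consistency([42, 42])
check_consistency([None, None])
```

A call such as `check_consistency([42, None])` raises, which models a primary and a backup that have diverged.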

Most other components remain unchanged. We only need to add replica handling in the access_server function. The new access_server_v2 logic works as follows: after receiving a request, the Server forwards it to Backup; then Server and Backup process the request separately; after processing, Backup sends the response back to Server, where Server performs consistency checks on both results.

async fn put_v2(
  ctx : @moonchor.ChoreoContext,
  state_at_server : @moonchor.Located[ServerState, Server],
  state_at_backup : @moonchor.Located[ServerState, Backup],
  key : String,
  value : Int
) -> Unit {
  let request = ctx.locally(client, _unwrapper => Request::Put(key, value))
  access_server_v2(ctx, request, state_at_server, state_at_backup) |> ignore
}

async fn get_v2(
  ctx : @moonchor.ChoreoContext,
  state_at_server : @moonchor.Located[ServerState, Server],
  state_at_backup : @moonchor.Located[ServerState, Backup],
  key : String
) -> @moonchor.Located[Response, Client] {
  let request = ctx.locally(client, _unwrapper => Request::Get(key))
  access_server_v2(ctx, request, state_at_server, state_at_backup)
}

async fn access_server_v2(
  ctx : @moonchor.ChoreoContext,
  request : @moonchor.Located[Request, Client],
  state_at_server : @moonchor.Located[ServerState, Server],
  state_at_backup : @moonchor.Located[ServerState, Backup]
) -> @moonchor.Located[Response, Client] {
  let request_at_server = ctx.comm(client, server, request)
  let request_at_backup = ctx.comm(server, backup, request_at_server)
  let response_at_backup = ctx.locally(backup, fn(unwrapper) {
    let request = unwrapper.unwrap(request_at_backup)
    let state = unwrapper.unwrap(state_at_backup)
    handle_request(state, request)
  })
  let backup_response_at_server = ctx.comm(backup, server, response_at_backup)
  let response_at_server = ctx.locally(server, fn(unwrapper) {
    let request = unwrapper.unwrap(request_at_server)
    let state = unwrapper.unwrap(state_at_server)
    let response = handle_request(state, request)
    let backup_response = unwrapper.unwrap(backup_response_at_server)
    check_consistency([response, backup_response])
    response
  })
  ctx.comm(server, client, response_at_server)
}

As before, we can write a simple choreography to test it:

async fn kvstore_v2(ctx : @moonchor.ChoreoContext) -> Unit {
  let state_at_server = ctx.locally(server, _unwrapper => ServerState::new())
  let state_at_backup = ctx.locally(backup, _unwrapper => ServerState::new())
  put_v2(ctx, state_at_server, state_at_backup, "key1", 42)
  put_v2(ctx, state_at_server, state_at_backup, "key2", 41)
  let v1_at_client = get_v2(ctx, state_at_server, state_at_backup, "key1")
  let v2_at_client = get_v2(ctx, state_at_server, state_at_backup, "key2")
  ctx.locally(client, fn(unwrapper) {
    let v1 = unwrapper.unwrap(v1_at_client).unwrap()
    let v2 = unwrapper.unwrap(v2_at_client).unwrap()
    if v1 + v2 == 83 {
      println("The server is working correctly")
    } else {
      panic()
    }
  }) |> ignore
}

test "kvstore 2.0" {
  let backend = @moonchor.make_local_backend([server, client, backup])
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v2, server))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v2, client))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v2, backup))
}

Abstracting Replication Strategy with Higher-Order Functions

During the double replication implementation, we encountered coupled code where server request processing, backup requests, and consistency checking were intertwined.

Using MoonBit's higher-order functions, we can abstract the replication strategy away from the concrete processing logic. Let's analyze what constitutes a replication strategy. It should encapsulate how the server processes requests using replicas after receiving them. The key insight is that the replication strategy itself is request-agnostic and should be decoupled from the actual request handling. This makes the strategy swappable, allowing easy switching between different strategies or implementing new ones in the future.
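As a language-neutral illustration of a swappable strategy, here is a small Python sketch. Everything in it is an assumption made for the example: requests are tuples, stores are dicts, the transport between client and server is elided, and the names `apply_request`, `single_copy`, `double_replication`, and `access_server` are hypothetical and not part of moonchor.

```python
from typing import Callable, Dict, Optional, Tuple

Store = Dict[str, int]
Request = Tuple  # ("get", key) or ("put", key, value); a hypothetical encoding
Strategy = Callable[[Request], Optional[int]]


def apply_request(db: Store, request: Request) -> Optional[int]:
    """Request handling proper: the only place that knows what Get and Put mean."""
    if request[0] == "put":
        db[request[1]] = request[2]
        return None
    return db.get(request[1])


def single_copy(primary: Store) -> Strategy:
    """No replication: the server answers from its own store."""
    return lambda request: apply_request(primary, request)


def double_replication(primary: Store, backup: Store) -> Strategy:
    """Both replicas process the request; the server compares the two answers."""
    def strategy(request: Request) -> Optional[int]:
        backup_response = apply_request(backup, request)
        response = apply_request(primary, request)
        if response != backup_response:
            raise RuntimeError("replicas disagree")
        return response

    return strategy


def access_server(request: Request, strategy: Strategy) -> Optional[int]:
    """Transport between client and server is elided; only the strategy varies."""
    return strategy(request)


primary: Store = {}
backup: Store = {}
for strategy in (single_copy({}), double_replication(primary, backup)):
    access_server(("put", "key1", 42), strategy)
    assert access_server(("get", "key1"), strategy) == 42
```

The caller, `access_server`, never inspects the request or the replicas, so a different strategy can be introduced by writing one more function that returns a `Strategy`.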

Of course, real-world replication strategies are far more complicated and often resist clean separation. For this example, we simplify the problem to focus on moonchor's programming capabilities, directly defining the replication strategy as a function determining how the server processes requests after receiving them. We can define it with a type alias:

typealias async (@moonchor.ChoreoContext, @moonchor.Located[Request, Server]) -> @moonchor.Located[
  Response,
  Server,
] as ReplicationStrategy

Now we can simplify the access_server implementation by passing the strategy as a parameter:

async fn access_server_v3(
  ctx : @moonchor.ChoreoContext,
  request : @moonchor.Located[Request, Client],
  strategy : ReplicationStrategy
) -> @moonchor.Located[Response, Client] {
  let request_at_server = ctx.comm(client, server, request)
  let response = strategy(ctx, request_at_server)
  ctx.comm(server, client, response)
}

async fn put_v3(
  ctx : @moonchor.ChoreoContext,
  strategy : ReplicationStrategy,
  key : String,
  value : Int
) -> Unit {
  let request = ctx.locally(client, _unwrapper => Request::Put(key, value))
  access_server_v3(ctx, request, strategy) |> ignore
}

async fn get_v3(
  ctx : @moonchor.ChoreoContext,
  strategy : ReplicationStrategy,
  key : String
) -> @moonchor.Located[Response, Client] {
  let request = ctx.locally(client, _unwrapper => Request::Get(key))
  access_server_v3(ctx, request, strategy)
}

This successfully abstracts the replication strategy from the request handling logic. Below, we reimplement the double replication strategy:

async fn double_replication_strategy(
  state_at_server : @moonchor.Located[ServerState, Server],
  state_at_backup : @moonchor.Located[ServerState, Backup],
) -> ReplicationStrategy {
  fn(
    ctx : @moonchor.ChoreoContext,
    request_at_server : @moonchor.Located[Request, Server]
  ) {
    let request_at_backup = ctx.comm(server, backup, request_at_server)
    let response_at_backup = ctx.locally(backup, fn(unwrapper) {
      let request = unwrapper.unwrap(request_at_backup)
      let state = unwrapper.unwrap(state_at_backup)
      handle_request(state, request)
    })
    let backup_response = ctx.comm(backup, server, response_at_backup)
    ctx.locally(server, fn(unwrapper) {
      let request = unwrapper.unwrap(request_at_server)
      let state = unwrapper.unwrap(state_at_server)
      let res = handle_request(state, request)
      check_consistency([unwrapper.unwrap(backup_response), res])
      res
    })
  }
}

Note the function signature of double_replication_strategy: it returns a function of type ReplicationStrategy. Given the two located states (the server's and the backup's), it constructs a new replication strategy. This demonstrates using higher-order functions to abstract replication strategies, known as higher-order choreography in choreographic programming.
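The "function that returns a strategy" shape can also be seen without any choreography machinery. The following is a minimal single-process sketch in Python, not moonchor code: the tuple encoding of requests, the dictionary-based stores, and this version of handle_request are all assumptions made for illustration, and no network communication is modeled.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical encoding: ("get", key) or ("put", key, value).
Request = Tuple
Response = Optional[int]
Strategy = Callable[[Request], Response]


def handle_request(db: Dict[str, int], request: Request) -> Response:
    """Apply one request to one replica's store (illustrative stand-in)."""
    if request[0] == "put":
        _, key, value = request
        db[key] = value
        return None
    _, key = request
    return db.get(key)


def double_replication(server_db: Dict[str, int], backup_db: Dict[str, int]) -> Strategy:
    """Build a strategy that closes over the state of both replicas."""

    def strategy(request: Request) -> Response:
        backup_response = handle_request(backup_db, request)
        response = handle_request(server_db, request)
        # Both replicas must agree before the server answers.
        assert backup_response == response, "replicas diverged"
        return response

    return strategy


def access_server(request: Request, strategy: Strategy) -> Response:
    """The access path only needs to know that a strategy is callable."""
    return strategy(request)


strategy = double_replication({}, {})
access_server(("put", "key1", 42), strategy)
access_server(("put", "key2", 41), strategy)
total = access_server(("get", "key1"), strategy) + access_server(("get", "key2"), strategy)
print(total)  # 83
```

Because access_server never looks inside the strategy, swapping in a different replication scheme means passing a different closure and nothing else.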

We can test it with a simple choreography:

async fn kvstore_v3(ctx : @moonchor.ChoreoContext) -> Unit {
  let state_at_server = ctx.locally(server, _unwrapper => ServerState::new())
  let state_at_backup = ctx.locally(backup, _unwrapper => ServerState::new())
  let strategy = double_replication_strategy(state_at_server, state_at_backup)
  put_v3(ctx, strategy, "key1", 42)
  put_v3(ctx, strategy, "key2", 41)
  let v1_at_client = get_v3(ctx, strategy, "key1")
  let v2_at_client = get_v3(ctx, strategy, "key2")
  ctx.locally(client, fn(unwrapper) {
    let v1 = unwrapper.unwrap(v1_at_client).unwrap()
    let v2 = unwrapper.unwrap(v2_at_client).unwrap()
    if v1 + v2 == 83 {
      println("The server is working correctly")
    } else {
      panic()
    }
  })
  |> ignore
}

test "kvstore 3.0" {
  let backend = @moonchor.make_local_backend([server, client, backup])
  // Every role runs the same choreography, kvstore_v3
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v3, server))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v3, client))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v3, backup))
}

Implementing Role-Polymorphism Through Parametric Polymorphism

To implement new replication strategies like triple replication, we need to define two new Backup types for differentiation:

struct Backup1 {} derive(Hash, Show)

impl @moonchor.Location for Backup1 with name(_) {
  "backup1"
}

let backup1 : Backup1 = Backup1::{}

struct Backup2 {} derive(Hash, Show)

impl @moonchor.Location for Backup2 with name(_) {
  "backup2"
}

let backup2 : Backup2 = Backup2::{}

Next, we need to modify the core logic of the replication strategy. An immediate problem emerges: to have both Backup1 and Backup2 process the request and return responses, we'd need to repeat these statements for each backup: let request = unwrapper.unwrap(request_at_backup); let state = unwrapper.unwrap(state_at_backup); handle_request(state, request). Code duplication is a code smell that should be abstracted away. Here, moonchor's "roles as types" advantage becomes apparent: we can use MoonBit's parametric polymorphism to abstract the backup processing logic into a polymorphic function do_backup, which takes a role type parameter B representing the backup role:

async fn[B : @moonchor.Location] do_backup(
  ctx : @moonchor.ChoreoContext,
  request_at_server : @moonchor.Located[Request, Server],
  backup : B,
  state_at_backup : @moonchor.Located[ServerState, B]
) -> @moonchor.Located[Response, Server] {
  let request_at_backup = ctx.comm(server, backup, request_at_server)
  let response_at_backup = ctx.locally(backup, fn(unwrapper) {
    let request = unwrapper.unwrap(request_at_backup)
    let state = unwrapper.unwrap(state_at_backup)
    handle_request(state, request)
  })
  ctx.comm(backup, server, response_at_backup)
}

This enables us to freely implement either double or triple replication strategies. For the triple replication strategy, we simply need to call do_backup twice within the function returned by triple_replication_strategy:

async fn triple_replication_strategy(
  state_at_server : @moonchor.Located[ServerState, Server],
  state_at_backup1 : @moonchor.Located[ServerState, Backup1],
  state_at_backup2 : @moonchor.Located[ServerState, Backup2]
) -> ReplicationStrategy {
  fn(
    ctx : @moonchor.ChoreoContext,
    request_at_server : @moonchor.Located[Request, Server]
  ) {
    let backup_response1 = do_backup(
      ctx,
      request_at_server,
      backup1,
      state_at_backup1,
    )
    let backup_response2 = do_backup(
      ctx,
      request_at_server,
      backup2,
      state_at_backup2,
    )
    ctx.locally(server, fn(unwrapper) {
      let request = unwrapper.unwrap(request_at_server)
      let state = unwrapper.unwrap(state_at_server)
      let res = handle_request(state, request)
      check_consistency([
        unwrapper.unwrap(backup_response1),
        unwrapper.unwrap(backup_response2),
        res,
      ])
      res
    })
  }
}
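
The role-parametric idea behind do_backup can be approximated outside MoonBit too. The sketch below is plain Python and does not use moonchor: the Located class, the string role names, and handle_request are hypothetical stand-ins. The key difference is that MoonBit checks at compile time that the state passed to do_backup lives at role B, whereas Python can only check this at run time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

Role = TypeVar("Role")
Request = Tuple  # hypothetical encoding: ("get", key) or ("put", key, value)
Response = Optional[int]


@dataclass
class Located(Generic[Role]):
    """A store that lives at one role (loose analogue of Located[ServerState, B])."""
    role: Role
    db: Dict[str, int] = field(default_factory=dict)


def handle_request(db: Dict[str, int], request: Request) -> Response:
    """Apply one request to one replica's store (illustrative stand-in)."""
    if request[0] == "put":
        _, key, value = request
        db[key] = value
        return None
    _, key = request
    return db.get(key)


def do_backup(request: Request, backup: Role, state: Located[Role]) -> Response:
    """One backup step, written once for every backup role."""
    # Run-time substitute for the static role check.
    if state.role != backup:
        raise ValueError(f"state lives at {state.role!r}, not {backup!r}")
    return handle_request(state.db, request)


def triple_replication(
    server: Located[str], b1: Located[str], b2: Located[str]
) -> Callable[[Request], Response]:
    def strategy(request: Request) -> Response:
        r1 = do_backup(request, "backup1", b1)
        r2 = do_backup(request, "backup2", b2)
        res = handle_request(server.db, request)
        assert r1 == r2 == res, "replicas diverged"
        return res

    return strategy


strategy = triple_replication(Located("server"), Located("backup1"), Located("backup2"))
strategy(("put", "key1", 42))
print(strategy(("get", "key1")))  # 42
```

Passing a state that belongs to the wrong backup raises ValueError here; in the MoonBit version the same mistake would be rejected by the type checker before the program runs.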

Since we've successfully separated the replication strategy from the access process, the access_server, put, and get functions require no modifications. Let's test the final KVStore implementation:

async fn kvstore_v4(ctx : @moonchor.ChoreoContext) -> Unit {
  let state_at_server = ctx.locally(server, _unwrapper => ServerState::new())
  let state_at_backup1 = ctx.locally(backup1, _unwrapper => ServerState::new())
  let state_at_backup2 = ctx.locally(backup2, _unwrapper => ServerState::new())
  let strategy = triple_replication_strategy(
    state_at_server,
    state_at_backup1,
    state_at_backup2,
  )
  put_v3(ctx, strategy, "key1", 42)
  put_v3(ctx, strategy, "key2", 41)
  let v1_at_client = get_v3(ctx, strategy, "key1")
  let v2_at_client = get_v3(ctx, strategy, "key2")
  ctx.locally(client, fn(unwrapper) {
    let v1 = unwrapper.unwrap(v1_at_client).unwrap()
    let v2 = unwrapper.unwrap(v2_at_client).unwrap()
    if v1 + v2 == 83 {
      println("The server is working correctly")
    } else {
      panic()
    }
  })
  |> ignore
}

test "kvstore 4.0" {
  let backend = @moonchor.make_local_backend([server, client, backup1, backup2])
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v4, server))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v4, client))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v4, backup1))
  @toolkit.run_async(() => @moonchor.run_choreo(backend, kvstore_v4, backup2))
}

With this, we've completed the multi-replica KVStore implementation. Throughout this example, we never manually used any send or recv to express distributed node interactions. Instead, we leveraged moonchor's choreographic programming capabilities to handle all communication and synchronization processes, avoiding potential type errors, deadlocks, and explicit synchronization issues.

Conclusion

In this article, we've explored the elegance of choreographic programming through moonchor while witnessing MoonBit's powerful expressiveness. For deeper insights into choreographic programming, you may refer to Haskell's library HasChor, the Choral language, or moonchor source code. To try moonchor yourself, simply install it via the command moon add Milky2018/moonchor@0.15.0.

MoonBit Pearls Vol.03: 01 Knapsack Problem

· 13 min read

The 0/1 Knapsack Problem is a classic dynamic programming (DP) problem commonly found in algorithm competitions. This article presents five versions of the solution, starting from a basic brute-force approach and gradually evolving into a DP-based implementation.

Problem Definition

There are several items, each with a weight and a value:

struct Item {
  weight : Int
  value : Int
}

Now, given a list of items (items) and a knapsack capacity (capacity), select a subset of the items such that the total weight does not exceed the knapsack capacity, and the total value is maximized.
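
Stated as an optimization problem, this can be written as follows. The symbols are introduced here for illustration and are not part of the original text: n is the number of items, w_i and v_i are the weight and value of item i, C is the capacity, and x_i is 1 if item i is selected and 0 otherwise.

```latex
\max_{x \in \{0,1\}^n} \; \sum_{i=1}^{n} v_i x_i
\quad \text{subject to} \quad
\sum_{i=1}^{n} w_i x_i \le C
```

The constraint that each x_i is either 0 or 1 is what gives the 0/1 knapsack problem its name.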

typealias @list.T as List

let items_1 : List[Item] = @list.of([
  { weight: 7, value: 20 },
  { weight: 4, value: 10 },
  { weight: 5, value: 11 },
])

Take items_1 as an example. If the knapsack capacity is 10, the optimal solution is to select the last two items. Their combined weight is 4 + 5 = 9, and their total value is 10 + 11 = 21.

Note that items cannot be split. Therefore, greedily selecting the item with the highest value-to-weight ratio does not always yield the correct result.

For example, in the case above, if we greedily pick the first item (which has the highest value-to-weight ratio, 20/7), we get a total value of 20, but only 3 units of capacity remain, so neither of the other items can be added.
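
To make the gap concrete, here is a small Python sketch (not part of the MoonBit solution) that runs a ratio-based greedy selection and an exhaustive search on items_1. The function names greedy_by_ratio and best_by_enumeration are chosen for this illustration only.

```python
from itertools import combinations

# items_1 from the article, as (weight, value) pairs.
items = [(7, 20), (4, 10), (5, 11)]
capacity = 10


def greedy_by_ratio(items, capacity):
    """Take items in descending value-to-weight order while they still fit."""
    total_weight = 0
    total_value = 0
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if total_weight + weight <= capacity:
            total_weight += weight
            total_value += value
    return total_value


def best_by_enumeration(items, capacity):
    """Try every subset and keep the best total value that fits."""
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best


print(greedy_by_ratio(items, capacity))      # 20
print(best_by_enumeration(items, capacity))  # 21
```

The greedy choice commits to the weight-7 item and then has room for nothing else, while the exhaustive search finds the pair with weights 4 and 5.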

Problem Modeling

First, we define some basic objects and operations.

//A combination of items, referred to as "combination" throughout the article
struct Combination {
  items : List[Item]
  total_weight : Int
  total_value : Int
}

//An empty combination
let empty_combination : Combination = {
  items: @list.empty(),
  total_weight: 0,
  total_value: 0,
}

//Add an item to a combination and return a new combination
fn Combination::add(self : Combination, item : Item) -> Combination {
  {
    items: self.items.add(item),
    total_weight: self.total_weight + item.weight,
    total_value: self.total_value + item.value,
  }
}

//Two combinations are considered equal if they have the same total value
impl Eq for Combination with op_equal(self, other) {
  self.total_value == other.total_value
}

//Compare two combinations by their total value
impl Compare for Combination with compare(self, other) {
  self.total_value.compare(other.total_value)
}

Now, we can begin thinking about how to solve the problem.

1. Naive Enumeration

Enumeration is the most straightforward approach. By following the problem definition step by step, we can arrive at a solution:

  1. Enumerate all possible combinations;
  2. Keep only the valid combinations, that is, those that fit in the knapsack;
  3. Select the one with the maximum total value.

Thanks to the two functions provided by our modeling, we can translate the three steps above directly into MoonBit code. The all_combinations function, which we'll implement later, has the type (List[Item]) -> List[Combination].

fn solve_v1(items : List[Item], capacity : Int) -> Combination {
  all_combinations(items)
  .filter(fn(comb) { comb.total_weight <= capacity })
  .unsafe_maximum()
}

Note that we use unsafe_maximum instead of maximum. As long as capacity is non-negative, the empty combination always passes the filter, so the filtered list is never empty and maximum would never return None. unsafe_maximum therefore lets us skip handling the Option, but it implicitly assumes the list is non-empty and panics if it is not.
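
The same trade-off exists in other languages. As a rough Python analogue (an illustration, not the MoonBit API), max with default=None plays the role of maximum, while plain max plays the role of unsafe_maximum and raises on an empty sequence. The list ALL below enumerates the eight combinations of items_1 by hand as (total_weight, total_value) pairs; safe_best and unsafe_best are names invented for this sketch.

```python
from typing import List, Optional, Tuple

Combination = Tuple[int, int]  # (total_weight, total_value)

# The eight subsets of items_1 = [(7, 20), (4, 10), (5, 11)].
ALL: List[Combination] = [
    (0, 0), (7, 20), (4, 10), (5, 11),
    (11, 30), (12, 31), (9, 21), (16, 41),
]


def safe_best(capacity: int) -> Optional[Combination]:
    """Like maximum: returns None when nothing fits."""
    fitting = [c for c in ALL if c[0] <= capacity]
    return max(fitting, key=lambda c: c[1], default=None)


def unsafe_best(capacity: int) -> Combination:
    """Like unsafe_maximum: raises ValueError when nothing fits."""
    fitting = [c for c in ALL if c[0] <= capacity]
    return max(fitting, key=lambda c: c[1])


print(unsafe_best(10))  # (9, 21)
print(unsafe_best(0))   # (0, 0): the empty combination always fits
print(safe_best(-1))    # None: only a negative capacity empties the list
```

With a non-negative capacity the two functions always agree, which is the assumption solve_v1 relies on.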

Now let's implement the enumeration step. Suppose all_combinations takes a list of items and returns a list of combinations, each representing a possible subset of those items. If this seems unclear at first, we can begin by looking at the definition of the list structure, which looks roughly like this:

enum List[A] {
  Empty
  More(A, tail~ : List[A])
}

In other words, a list has two possible forms:

  1. An empty list, represented as Empty;
  2. A non-empty list, represented as More, which contains the first element (A) and the rest of the list (tail: List[A]), which is itself also a list.

This structure gives us a hint for how to recursively build combinations:

  • If the item list is empty, the only possible combination is the empty combination;
  • Otherwise, there must be a first item item1 and a remaining list items_tail. In this case, we can:
    1. Recursively compute all combinations of items_tail, which represent combinations that do not include item1;
    2. For each of those, add item1 to form new combinations that do include it;
    3. Merge both sets to obtain all combinations of the original items list.

For example, if the item list is a, b, c, then the combinations can be divided into two parts:

Without a   | With a (by adding a to the left side)
{ }         | { a }
{ b }       | { a, b }
{ c }       | { a, c }
{ b, c }    | { a, b, c }

fn all_combinations(items : List[Item]) -> List[Combination] {
  match items {
    Empty => @list.singleton(empty_combination)
    More(item1, tail=items_tail) => {
      let combs_without_item1 = all_combinations(items_tail)
      let combs_with_item1 = combs_without_item1.map(_.add(item1))
      combs_with_item1 + combs_without_item1
    }
  }
}

By using pattern matching (match), we've translated the five lines of logic above into MoonBit code for the first time, line by line.
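
To trace the recursion on the a, b, c example, the same structure can be mirrored in a few lines of Python. This is an illustrative sketch using plain lists rather than the article's Combination type, so it tracks only which items are chosen.

```python
def all_combinations(items):
    """Return every subset of items, following the recursive split above."""
    if not items:
        # Empty item list: the only combination is the empty one.
        return [[]]
    first, rest = items[0], items[1:]
    without_first = all_combinations(rest)
    # Add the first item to the left side of each combination that lacks it.
    with_first = [[first] + comb for comb in without_first]
    return with_first + without_first


combos = all_combinations(["a", "b", "c"])
print(len(combos))  # 8
for comb in combos:
    print(comb)
```

Each additional item doubles the number of combinations, so n items yield 2^n of them; this growth is what the later versions set out to tame.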

2. Early Filtering: Enumerate Only Valid Combinations

In the first version, generating all combinations and filtering out those that do not fit in the knapsack were two separate steps. During the enumeration process, many invalid combinations were generated: combinations that had already exceeded the knapsack capacity but were still extended with more items. If we filter them out earlier, we avoid generating many unnecessary combinations on top of invalid ones, which is clearly more efficient. Looking at the code, we see that invalid combinations are only produced during .map(_.add(item1)). So we can optimize by only adding item1 to combinations that can still fit it.

We now rename all_combinations to all_combinations_valid, which returns only combinations that can actually fit in the knapsack. The generation and filtering processes are now interleaved.

fn all_combinations_valid(
  items : List[Item],
  capacity : Int // Add capacity as a parameter because filtering requires it
) -> List[Combination] {
  match items {
    Empty => @list.singleton(empty_combination) // An empty combination is always valid
    More(item1, tail=items_tail) => {
      // Recursively obtain combinations without this item (by assumption, all valid)
      let valid_combs_without_item1 = all_combinations_valid(
        items_tail,
        capacity,
      )
      // Add item1 to valid combinations that can still hold it
      let valid_combs_with_item1 = valid_combs_without_item1
        .filter(fn(comb) { comb.total_weight + item1.weight <= capacity })
        .map(_.add(item1))
      // Both parts are valid, so we return their union
      valid_combs_with_item1 + valid_combs_without_item1
    }
  }
}

This structure naturally supports inductive reasoning, making it easy to prove the correctness of all_combinations_valid: it indeed returns only valid combinations.

Since all_combinations_valid already returns only valid combinations, we no longer need to filter in solve. We simply remove the filter from solve:

fn solve_v2(items : List[Item], capacity : Int) -> Combination {
  all_combinations_valid(items, capacity).unsafe_maximum()
}
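The interleaving of generation and filtering can also be sketched in Python. This is a transliteration, not the article's MoonBit: the `Combination` tuple, `EMPTY`, and the sample items are assumptions made for illustration.

```python
from typing import NamedTuple


class Item(NamedTuple):
    weight: int
    value: int


class Combination(NamedTuple):
    items: tuple
    total_weight: int
    total_value: int

    def add(self, item: Item) -> "Combination":
        return Combination(
            (item,) + self.items,
            self.total_weight + item.weight,
            self.total_value + item.value,
        )


EMPTY = Combination((), 0, 0)


def all_combinations_valid(items, capacity):
    # The empty combination always fits (assuming capacity >= 0).
    if not items:
        return [EMPTY]
    item1, items_tail = items[0], items[1:]
    # By the inductive hypothesis, everything in this list already fits.
    valid_without = all_combinations_valid(items_tail, capacity)
    # Extend only the combinations that still have room for item1.
    valid_with = [
        comb.add(item1)
        for comb in valid_without
        if comb.total_weight + item1.weight <= capacity
    ]
    return valid_with + valid_without


def solve_v2(items, capacity):
    # No separate filtering step is needed any more.
    return max(all_combinations_valid(items, capacity), key=lambda c: c.total_value)
```

For the hypothetical items (7, 20), (4, 10), (5, 11) with capacity 10, only 5 of the 8 subsets are ever built, and the best one is still worth 21.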

3. Maintain Order to Enable Early Termination

In the previous version, to construct new combinations, we needed to iterate over every combination in valid_combs_without_item1.

But we can observe: if item1 cannot be added to a certain combination, then it definitely cannot be added to any combination with a greater total weight than that one. In other words, if valid_combs_without_item1 is sorted in ascending order of total weight, then we don't need to traverse it entirely during filtering.

During filtering, as soon as we encounter a combination that can't fit item1, we can immediately discard the remaining combinations. Since this logic is common, the standard library provides a function called take_while, which we use to replace filter.

To make valid_combs_without_item1 sorted, we could use a sorting algorithm, but that would require traversing the entire list, which defeats the purpose. Therefore, we adopt a different approach: ensure that the list returned by all_combinations_valid is already sorted in ascending order.

This requires a leap of faith via recursion:

fn all_combinations_valid_ordered(
  items : List[Item],
  capacity : Int
) -> List[Combination] {
  match items {
    Empty => @list.singleton(empty_combination) // A single-element list is naturally in ascending order.
    More(item1, tail=items_tail) => {
      // We assume that all_combinations_valid_ordered returns a list sorted in
      // ascending order (inductive hypothesis).
      let valid_combs_without_item1 = all_combinations_valid_ordered(
        items_tail,
        capacity,
      )
      // Then valid_combs_with_item1 is also in ascending order, because taking a
      // prefix of an ascending list and adding the same weight to each element
      // still yields an ascending list by total weight.
      let valid_combs_with_item1 = valid_combs_without_item1
        .take_while(fn(comb) { comb.total_weight + item1.weight <= capacity })
        .map(_.add(item1))
      // Now, we only need to ensure that the merged result is also sorted in
      // ascending order to maintain our initial assumption.
      merge_keep_order(valid_combs_with_item1, valid_combs_without_item1)
    }
  }
}

The final task is to implement the function merge_keep_order, which merges two ascending lists into one ascending list:

fn merge_keep_order(
  a : List[Combination],
  b : List[Combination]
) -> List[Combination] {
  match (a, b) {
    (Empty, another) | (another, Empty) => another
    (More(a1, tail=a_tail), More(b1, tail=b_tail)) =>
      // If a1 is lighter than b1, and b1 is part of an ascending list, then:
      // a1 is lighter than all combinations in b
      // Since list a is also in ascending order
      // a1 is also lighter than all combinations in a_tail
      // Therefore, a1 is the smallest among all elements in a and b
      if a1.total_weight < b1.total_weight {
        // We first recursively merge the rest of the list, then prepend a1
        merge_keep_order(a_tail, b).add(a1)
      } else {
        merge_keep_order(a, b_tail).add(b1)
      }
  }
}

Although it might seem a bit verbose, I still want to point this out: by following a case-based analysis aligned with the structure of the code, it's actually easy to prove the correctness of all_combinations_valid_ordered and merge_keep_order - they do return a sorted list.

The list is now in ascending order of total weight, so its last element is the heaviest valid combination. The heaviest combination is not necessarily the most valuable one, though: with a capacity of 10, a single item of weight 10 and value 1 ends up last, while an item of weight 1 and value 100 is the better answer. So solve_v3 still selects the result with unsafe_maximum rather than unsafe_last:

fn solve_v3(items : List[Item], capacity : Int) -> Combination {
  // Sorted by weight, not by value, so we still need the maximum by value
  all_combinations_valid_ordered(items, capacity).unsafe_maximum()
}
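A Python sketch of the ordered version may help when checking the invariant by hand. It is a transliteration under stated assumptions: `Combination`, `EMPTY`, and the sample data are hypothetical, the merge is written iteratively rather than recursively, and `itertools.takewhile` stands in for the list's take_while.

```python
from itertools import takewhile
from typing import NamedTuple


class Item(NamedTuple):
    weight: int
    value: int


class Combination(NamedTuple):
    items: tuple
    total_weight: int
    total_value: int

    def add(self, item: Item) -> "Combination":
        return Combination(
            (item,) + self.items,
            self.total_weight + item.weight,
            self.total_value + item.value,
        )


EMPTY = Combination((), 0, 0)


def merge_keep_order(a, b):
    # Standard two-pointer merge of two lists sorted by total_weight.
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i].total_weight < b[j].total_weight:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    return merged + a[i:] + b[j:]


def all_combinations_valid_ordered(items, capacity):
    if not items:
        return [EMPTY]
    item1, items_tail = items[0], items[1:]
    without = all_combinations_valid_ordered(items_tail, capacity)
    # Stop at the first combination that cannot hold item1; every later
    # combination is at least as heavy, so none of them can hold it either.
    prefix = takewhile(lambda c: c.total_weight + item1.weight <= capacity, without)
    with_item1 = [comb.add(item1) for comb in prefix]
    return merge_keep_order(with_item1, without)


def solve_v3(items, capacity):
    # The list is sorted by weight, so the answer is chosen by value explicitly.
    return max(
        all_combinations_valid_ordered(items, capacity),
        key=lambda c: c.total_value,
    )
```

For the hypothetical items (7, 20), (4, 10), (5, 11) and capacity 10, the resulting total weights are 0, 4, 5, 7, 9. For items (1, 100), (10, 1), the last element of the list is worth only 1, which is why the sketch selects by value instead of taking the final element.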

Looking back, this version of the optimization might not seem like a big win - after all, we still have to traverse the entire list during the merge. That was my initial impression too, but I later discovered something unexpected.

4. Removing Redundant Combinations with Equal Weights for Optimal Time Complexity

So far, the optimizations we've made haven't addressed time complexity, but they've laid the groundwork for this next step. Now let's consider the algorithm's time complexity.

In the worst case (e.g., when the knapsack is large enough to hold everything), the combination list returned by all_combinations can contain up to $2^{\text{number of items}}$ elements. This results in an exponential time complexity, especially since all_combinations is called multiple times, each time returning a potentially large list.

To reduce the time complexity, we need to limit the length of the candidate combination list. This is based on a simple observation: if two combinations have the same total weight, the one with the higher total value is always better. Therefore, we don't need to keep both in the list.

By eliminating these redundant combinations, the list holds at most one combination per total weight from 0 to capacity, so its length never exceeds capacity + 1 (thanks to the pigeonhole principle). This optimization reduces the algorithm's time complexity to $\mathcal{O}(\text{number of items} \times \text{capacity})$. Upon reviewing the code, the only place where redundant combinations may still be introduced is the else branch of merge_keep_order. To prevent this, we just need to make a small modification to that section.

fnalias @math.maximum

fn merge_keep_order_and_dedup(
  a : List[Combination],
  b : List[Combination]
) -> List[Combination] {
  match (a, b) {
    (Empty, another) | (another, Empty) => another
    (More(a1, tail=a_tail), More(b1, tail=b_tail)) =>
      if a1.total_weight < b1.total_weight {
        merge_keep_order_and_dedup(a_tail, b).add(a1)
      } else if a1.total_weight > b1.total_weight {
        merge_keep_order_and_dedup(a, b_tail).add(b1)
      } else {
        // In this case, a1 and b1 have the same weight, creating a duplicate.
        // We keep the one with the higher total value.
        let better = maximum(a1, b1)
        merge_keep_order_and_dedup(a_tail, b_tail).add(better)
      }
  }
}

Now swap merge_keep_order_and_dedup in for merge_keep_order. This gives all_combinations_valid_ordered_nodup (arguably the longest function name I've ever written) and solve_v4.

fn all_combinations_valid_ordered_nodup(
  items : List[Item],
  capacity : Int
) -> List[Combination] {
  match items {
    Empty => @list.singleton(empty_combination)
    More(item1, tail=items_tail) => {
      let combs_without_item1 = all_combinations_valid_ordered_nodup(
        items_tail,
        capacity,
      )
      let combs_with_item1 = combs_without_item1
        .take_while(fn(comb) { comb.total_weight + item1.weight <= capacity })
        .map(_.add(item1))
      merge_keep_order_and_dedup(combs_with_item1, combs_without_item1)
    }
  }
}

fn solve_v4(items : List[Item], capacity : Int) -> Combination {
  // The list is sorted by weight, not by value, so we take the maximum by value.
  // It holds at most capacity + 1 elements, so this scan is cheap.
  all_combinations_valid_ordered_nodup(items, capacity).unsafe_maximum()
}
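The effect of deduplication on list length is easy to observe in a Python sketch. As before, this is a transliteration under assumptions (hypothetical `Combination`, `EMPTY`, iterative merge, made-up sample items), not the article's MoonBit.

```python
from itertools import takewhile
from typing import NamedTuple


class Item(NamedTuple):
    weight: int
    value: int


class Combination(NamedTuple):
    items: tuple
    total_weight: int
    total_value: int

    def add(self, item: Item) -> "Combination":
        return Combination(
            (item,) + self.items,
            self.total_weight + item.weight,
            self.total_value + item.value,
        )


EMPTY = Combination((), 0, 0)


def merge_keep_order_and_dedup(a, b):
    # Both inputs are strictly ascending by weight; so is the output.
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        wa, wb = a[i].total_weight, b[j].total_weight
        if wa < wb:
            merged.append(a[i])
            i += 1
        elif wa > wb:
            merged.append(b[j])
            j += 1
        else:
            # Same weight: keep only the more valuable combination.
            merged.append(max(a[i], b[j], key=lambda c: c.total_value))
            i += 1
            j += 1
    return merged + a[i:] + b[j:]


def all_combinations_valid_ordered_nodup(items, capacity):
    if not items:
        return [EMPTY]
    item1, items_tail = items[0], items[1:]
    without = all_combinations_valid_ordered_nodup(items_tail, capacity)
    prefix = takewhile(lambda c: c.total_weight + item1.weight <= capacity, without)
    with_item1 = [comb.add(item1) for comb in prefix]
    return merge_keep_order_and_dedup(with_item1, without)


def solve_v4(items, capacity):
    # At most capacity + 1 candidates remain, so choosing by value is cheap.
    return max(
        all_combinations_valid_ordered_nodup(items, capacity),
        key=lambda c: c.total_value,
    )
```

With twelve hypothetical items of weight 1 and values 1 through 12 and a capacity of 5, the candidate list holds only 6 combinations (weights 0 to 5) instead of thousands of subsets, and the best one is worth 12 + 11 + 10 + 9 + 8 = 50.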

At this point, we've essentially reinvented the dynamic programming solution for the 0/1 knapsack problem.

Conclusion

This article's content came from a sudden idea I had one morning while lying in bed. From the first to the fourth version, all code was written entirely on my phone without any debugging, yet correctness was easily guaranteed. Compared to conventional solutions often seen in algorithm competitions, the functional programming style used here brings the following advantages:

  1. No loops, only recursion and structural decomposition. To extract elements from a list, pattern matching (match) is required, which naturally prompts consideration of empty cases and expresses intent more clearly than initializing a DP array.
  2. Composition of higher-order functions. Standard functions like filter, take_while, map, and maximum replace boilerplate loops (for, while), making the traversal purpose clearer at a glance.
  3. Declarative code. The later versions of the solution are direct translations of the first one. Rather than just implementing a solution, the code describes the problem itself. This ensures the correctness of the first version. Each subsequent improvement was made without affecting the correctness, allowing for a safe and iterative process.

Of course, this solution doesn't apply state space compression, so a tradeoff between readability and efficiency remains. The functional style is slightly idealistic but still leaves room for many optimizations. A future direction would be to convert the list structure into a tree, exploiting the fact that functions only pass two argument groups throughout execution, possibly even a single value. This could reduce space complexity to O(capacity), though it's beyond the scope of this article. We believe the approach here offers a beginner-friendly and understandable way to write correct code.
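For comparison, the conventional array-based solution that the conclusion contrasts with can be sketched as follows. This is the standard one-dimensional 0/1 knapsack DP written in Python, not code from this article, and `knapsack_1d` is a hypothetical name.

```python
def knapsack_1d(items, capacity):
    """Return the best total value for (weight, value) pairs within capacity."""
    # best[c] is the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for weight, value in items:
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

The array plays the same role as the deduplicated candidate list: one slot per total weight. It returns only the best value, whereas the list-based version also keeps the chosen items.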

Appendix

Further Optimization Details

Given that the order of items does not affect the total value of the result, we can convert all_combinations into a tail-recursive version.

Additionally, since the list produced by take_while is immediately discarded after being passed to map, certain syntax-level techniques can be used to avoid creating this temporary list altogether.

fn all_combinations_loop(
  items : List[Item],
  capacity : Int
) -> List[Combination] {
  loop items, @list.singleton(empty_combination) {
    Empty, combs_so_far => combs_so_far
    More(item1, tail=items_tail), combs_so_far => {
      let combs_without_item1 = combs_so_far
      let combs_with_item1 = combs_without_item1
        .iter()
        .take_while(fn(comb) { comb.total_weight + item1.weight <= capacity })
        .map(_.add(item1))
        |> @list.from_iter
      continue items_tail,
        merge_keep_order_and_dedup(combs_with_item1, combs_without_item1)
    }
  }
}

fn solve_v5(items : List[Item], capacity : Int) -> Combination {
  // The list is sorted by weight, not by value, so we take the maximum by value
  all_combinations_loop(items, capacity).unsafe_maximum()
}
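The same two ideas, an accumulator instead of recursion and a lazy take_while that never materializes the prefix, carry over to a Python sketch. It is a transliteration with hypothetical names (`Combination`, `EMPTY`, the iterative merge), not the article's MoonBit.

```python
from itertools import takewhile
from typing import NamedTuple


class Item(NamedTuple):
    weight: int
    value: int


class Combination(NamedTuple):
    items: tuple
    total_weight: int
    total_value: int

    def add(self, item: Item) -> "Combination":
        return Combination(
            (item,) + self.items,
            self.total_weight + item.weight,
            self.total_value + item.value,
        )


EMPTY = Combination((), 0, 0)


def merge_keep_order_and_dedup(a, b):
    merged, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        wa, wb = a[i].total_weight, b[j].total_weight
        if wa < wb:
            merged.append(a[i])
            i += 1
        elif wa > wb:
            merged.append(b[j])
            j += 1
        else:
            merged.append(max(a[i], b[j], key=lambda c: c.total_value))
            i += 1
            j += 1
    return merged + a[i:] + b[j:]


def all_combinations_loop(items, capacity):
    # The accumulator replaces the recursive call: start from the empty
    # combination and fold the items in one at a time.
    combs_so_far = [EMPTY]
    for item1 in items:
        # takewhile is lazy, so the fitting prefix is consumed directly by
        # the comprehension without an intermediate list.
        combs_with_item1 = [
            comb.add(item1)
            for comb in takewhile(
                lambda comb: comb.total_weight + item1.weight <= capacity,
                combs_so_far,
            )
        ]
        combs_so_far = merge_keep_order_and_dedup(combs_with_item1, combs_so_far)
    return combs_so_far


def solve_v5(items, capacity):
    return max(all_combinations_loop(items, capacity), key=lambda c: c.total_value)
```

Items are processed front to back here instead of back to front, which changes the order in which combinations are built but not the best total value.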

Side Notes

  1. In the first version, the Combination generated by all_combinations(items) even contains one more node than More, making it a true master of linked list node reuse.
  2. An ascending order can't be treated as "broken" order, so the corresponding take_while must be converted to drop_while. When using an array, we can directly cut segments via binary_search on indices.
  3. If you're interested, consider how to generalize the above approach to various other knapsack problems.
  4. The original name of all_combinations_loop was: generate_all_ordered_combination_that_fit_in_backpack_list_without_duplicates_using_loop.

Test

test {
  for solve in [solve_v1, solve_v2, solve_v3, solve_v4, solve_v5] {
    assert_eq(solve(items_1, 10).total_value, 21)
  }
}