Dancing with LLVM: A Moonbit Chronicle (Part 1) - Implementing the Frontend
Introduction
Programming language design and compiler implementation have long been considered among the most challenging topics in computer science. The traditional path to learning compilers often requires students to first master a complex set of theoretical foundations:
- Automata Theory: Finite state machines and regular expressions
- Type Theory: The mathematical underpinnings of λ-calculus and type systems
- Computer Architecture: Low-level implementation from assembly language to machine code
However, Moonbit, a functional programming language designed for the modern development landscape, offers a fresh perspective. It not only features a rigorous type system and exceptional memory safety guarantees but, more importantly, its rich syntax and toolchain tailored for the AI era make it an ideal choice for learning and implementing compilers.
Series Overview
This series of articles will delve into the core concepts and best practices of modern compiler implementation by building a small programming language compiler called TinyMoonbit.
- Part 1: Focuses on the implementation of the language frontend, including lexical analysis, parsing, and type checking, ultimately generating an abstract syntax tree with complete type annotations.
- Part 2: Dives into the code generation phase, utilizing Moonbit's official llvm.mbt binding library to convert the abstract syntax tree into LLVM intermediate representation and finally generate RISC-V assembly code.
TinyMoonbit Language Design
TinyMoonbit is a systems-level programming language with an abstraction level comparable to C. Although its syntax heavily borrows from Moonbit, TinyMoonbit is not a subset of the Moonbit language. Instead, it is a simplified version designed to test the feature completeness of llvm.mbt while also serving an educational purpose.
Note: Due to space constraints, the TinyMoonbit implementation discussed in this series is simpler than the actual TinyMoonbit. For the complete version, please refer to TinyMoonbitLLVM.
Core Features
TinyMoonbit provides the fundamental features required for modern systems programming:
- ✅ Low-level Memory Operations: Direct pointer manipulation and memory management
- ✅ Control Flow Structures: Conditional branches, loops, and function calls
- ✅ Type Safety: Static type checking and explicit type declarations
- ✅ Simplified Design: To reduce implementation complexity, advanced features like type inference and closures are not supported.
Syntax Example
Let's demonstrate TinyMoonbit's syntax with a classic implementation of the Fibonacci sequence:
extern fn print_int(x : Int) -> Unit;

// Recursive implementation of the Fibonacci sequence
fn fib(n : Int) -> Int {
  if n <= 1 {
    return n;
  }
  return fib(n - 1) + fib(n - 2);
}

fn main {
  print_int(fib(10));
}
Compilation Target
After the complete compilation process, the above code will generate the following LLVM Intermediate Representation:
; ModuleID = 'tinymoonbit'
source_filename = "tinymoonbit"
define i32 @fib(i32 %0) {
entry:
%1 = alloca i32, align 4
store i32 %0, ptr %1, align 4
%2 = load i32, ptr %1, align 4
%3 = icmp sle i32 %2, 1
br i1 %3, label %4, label %6
4: ; preds = %entry
%5 = load i32, ptr %1, align 4
ret i32 %5
6: ; preds = %entry
%7 = load i32, ptr %1, align 4
%8 = sub i32 %7, 1
%9 = call i32 @fib(i32 %8)
%10 = load i32, ptr %1, align 4
%11 = sub i32 %10, 2
%12 = call i32 @fib(i32 %11)
%13 = add i32 %9, %12
ret i32 %13
}
define void @main() {
entry:
%0 = call i32 @fib(i32 10)
call void @print_int(i32 %0)
ret void
}
declare void @print_int(i32 %0)
Chapter 2: Lexical Analysis
Lexical Analysis is the first stage of the compilation process. Its core mission is to convert a continuous stream of characters into a sequence of meaningful tokens. This seemingly simple conversion process is, in fact, the cornerstone of the entire compiler pipeline.
From Characters to Symbols: Token Design and Implementation
Consider the following code snippet:
let x : Int = 5;
After being processed by the lexer, it will produce the following sequence of tokens:
(Keyword "let") → (Identifier "x") → (Symbol ":") →
(Type "Int") → (Operator "=") → (IntLiteral 5) → (Symbol ";")
This conversion process needs to handle various complex situations:
- Whitespace Filtering: Skipping spaces, tabs, and newlines.
- Keyword Recognition: Distinguishing reserved words from user-defined identifiers.
- Numeric Parsing: Correctly identifying the boundaries of integers and floating-point numbers.
- Operator Handling: Differentiating between single-character and multi-character operators.
Token Type System Design
Based on the TinyMoonbit syntax specification, we classify all possible symbols into the following token types:
pub enum Token {
  Bool(Bool)        // Boolean values: true, false
  Int(Int)          // Integers: 1, 2, 3, ...
  Double(Double)    // Floating-point numbers: 1.0, 2.5, 3.14, ...
  Keyword(String)   // Reserved words: let, if, while, fn, return
  Upper(String)     // Type identifiers: start with an uppercase letter, e.g., Int, Double, Bool
  Lower(String)     // Variable identifiers: start with a lowercase letter, e.g., x, y, result
  Symbol(String)    // Operators and punctuation: +, -, *, :, ;, ->
  Bracket(Char)     // Brackets: (, ), [, ], {, }
  EOF               // End-of-file marker
} derive(Show, Eq)
Leveraging Pattern Matching
Moonbit's powerful pattern matching capabilities allow us to implement the lexer in an unprecedentedly elegant way. Compared to the traditional finite state machine approach, this pattern-matching-based implementation is more intuitive and easier to understand.
Core Analysis Function
pub fn lex(code : String) -> Array[Token] {
  let tokens = Array::new()
  loop code[:] {
    // Skip whitespace characters
    [' ' | '\n' | '\r' | '\t', ..rest] => continue rest
    // Handle single-line comments
    [.."//", ..rest] =>
      continue loop rest {
        ['\n' | '\r', ..rest_str] => break rest_str
        [_, ..rest_str] => continue rest_str
        [] as rest_str => break rest_str
      }
    // Recognize multi-character operators (order is important!)
    [.."->", ..rest] => { tokens.push(Symbol("->")); continue rest }
    [.."==", ..rest] => { tokens.push(Symbol("==")); continue rest }
    [.."!=", ..rest] => { tokens.push(Symbol("!=")); continue rest }
    [.."<=", ..rest] => { tokens.push(Symbol("<=")); continue rest }
    [..">=", ..rest] => { tokens.push(Symbol(">=")); continue rest }
    // Recognize single-character operators and punctuation
    [':' | '.' | ',' | ';' | '+' | '-' | '*' |
     '/' | '%' | '>' | '<' | '=' as c, ..rest] => {
      tokens.push(Symbol("\{c}"))
      continue rest
    }
    // Recognize brackets
    ['(' | ')' | '[' | ']' | '{' | '}' as c, ..rest] => {
      tokens.push(Bracket(c))
      continue rest
    }
    // Recognize identifiers and literals
    ['a'..='z', ..] as code => {
      let (tok, rest) = lex_ident(code)
      tokens.push(tok)
      continue rest
    }
    ['A'..='Z', ..] => { ... }
    ['0'..='9', ..] => { ... }
    // Reached the end of the file
    [] => { tokens.push(EOF); break tokens }
  }
}
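To see the lexer end to end, here is a hedged test sketch (the test name and assertion are illustrative, not from the original article, and it relies on the elided uppercase and digit branches being implemented):

test "lex a simple declaration" {
  let tokens = lex("let x : Int = 5;")
  // Expected sequence, matching the token design above:
  // [Keyword("let"), Lower("x"), Symbol(":"), Upper("Int"),
  //  Symbol("="), Int(5), Symbol(";"), EOF]
  assert_eq(tokens.length(), 8)
}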
Keyword Recognition Strategy
Identifier parsing requires special handling for keyword recognition:
pub fn lex_ident(rest : @string.View) -> (Token, @string.View) {
  // Predefined keyword map
  let keyword_map = Map::from_array([
    ("let", Token::Keyword("let")),
    ("fn", Token::Keyword("fn")),
    ("if", Token::Keyword("if")),
    ("else", Token::Keyword("else")),
    ("while", Token::Keyword("while")),
    ("return", Token::Keyword("return")),
    ("extern", Token::Keyword("extern")),
    ("true", Token::Bool(true)),
    ("false", Token::Bool(false)),
  ])
  let identifier_chars = Array::new()
  let remaining = loop rest {
    ['a'..='z' | 'A'..='Z' | '0'..='9' | '_' as c, ..rest_str] => {
      identifier_chars.push(c)
      continue rest_str
    }
    _ as rest_str => break rest_str
  }
  let ident = String::from_array(identifier_chars)
  let token = keyword_map.get(ident).or_else(() => Token::Lower(ident))
  (token, remaining)
}
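A quick hedged usage sketch (illustrative, not from the article): keywords and ordinary identifiers come back as different token variants:

test "lex_ident distinguishes keywords from identifiers" {
  let (kw, _) = lex_ident("let x"[:])
  assert_eq(kw, Token::Keyword("let"))
  let (ident, _) = lex_ident("count + 1"[:])
  assert_eq(ident, Token::Lower("count"))
}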
💡 In-depth Analysis of Moonbit Syntax Features
The implementation of the lexer above fully demonstrates several outstanding advantages of Moonbit in compiler development:
-
Functional Loop Construct
loop initial_value {
  pattern1 => continue new_value1
  pattern2 => continue new_value2
  pattern3 => break final_value
}
loop is not a traditional loop structure but a functional loop:
- It accepts an initial parameter as the loop state.
- It handles different cases through pattern matching.
- continue passes the new state to the next iteration.
- break terminates the loop and returns the final value.
-
String Views and Pattern Matching
Moonbit's string pattern matching feature greatly simplifies text processing:
// Match a single character
['a', ..rest] =>                // Starts with the character 'a'
// Match a character range
['a'..='z' as c, ..rest] =>     // A lowercase letter, bound to the variable c
// Match a string literal
[.."hello", ..rest] =>          // Equivalent to ['h','e','l','l','o', ..rest]
// Match multiple possible characters
[' ' | '\t' | '\n', ..rest] =>  // Any whitespace character
-
The Importance of Pattern Matching Priority
⚠️ Important Reminder: The order of matching is crucial.
When writing pattern matching rules, you must place more specific patterns before more general ones. For example:
// ✅ Correct order
loop code[:] {
  [.."->", ..rest] => { ... }          // Match multi-character operators first
  ['-' | '>' as c, ..rest] => { ... }  // Then match single characters
}

// ❌ Incorrect order - "->" will never be matched
loop code[:] {
  ['-' | '>' as c, ..rest] => { ... }
  [.."->", ..rest] => { ... }          // This will never be executed
}
By using this pattern-matching-based approach, we not only avoid complex state machine implementations but also achieve a clearer and more maintainable code structure.
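As a further illustration of the functional loop construct discussed above, here is a minimal standalone sketch (not part of TinyMoonbit): it threads two pieces of state, a string view and an accumulator, and sums every decimal digit it encounters:

fn sum_digits(s : String) -> Int {
  loop s[:], 0 {
    // A digit: convert it via its code point and add it to the accumulator
    ['0'..='9' as c, ..rest], acc => continue rest, acc + (c.to_int() - '0'.to_int())
    // Any other character: keep scanning with the accumulator unchanged
    [_, ..rest], acc => continue rest, acc
    // End of input: the accumulator becomes the loop's value
    [], acc => break acc
  }
}

test "sum_digits" {
  assert_eq(sum_digits("a1b2c3"), 6)
}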
Chapter 3: Parsing and Abstract Syntax Tree Construction
Syntactic Analysis (or Parsing) is the second core stage of the compiler. Its task is to reorganize the sequence of tokens produced by lexical analysis into a hierarchical Abstract Syntax Tree (AST). This process not only verifies whether the program conforms to the language's grammatical rules but also provides a structured data representation for subsequent semantic analysis and code generation.
Abstract Syntax Tree Design: A Structured Representation of the Program
Before building the parser, we need to carefully design the structure of the AST. This design determines how the program's syntactic structure is represented and how subsequent compilation stages will process these structures.
1. Core Type System
First, we define the representation of the TinyMoonbit type system in the AST:
pub enum Type {
  Unit    // Unit type, represents no return value
  Bool    // Boolean type: true, false
  Int     // 32-bit signed integer
  Double  // 64-bit double-precision floating-point number
} derive(Show, Eq, ToJson)

pub fn parse_type(type_name : String) -> Type {
  match type_name {
    "Unit" => Type::Unit
    "Bool" => Type::Bool
    "Int" => Type::Int
    "Double" => Type::Double
    _ => abort("Unknown type: \{type_name}")
  }
}
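A quick sanity check, assuming the definitions above (the test itself is illustrative, not from the article):

test "parse_type" {
  assert_eq(parse_type("Int"), Type::Int)
  assert_eq(parse_type("Bool"), Type::Bool)
}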
2. Layered AST Node Design
We use a layered design to clearly represent the different abstraction levels of the program:
-
Atomic Expressions (AtomExpr)
Represent the most basic, indivisible expression units:
pub enum AtomExpr {
  Bool(Bool)                                  // Boolean literal
  Int(Int)                                    // Integer literal
  Double(Double)                              // Floating-point literal
  Var(String, mut ty~ : Type?)                // Variable reference
  Paren(Expr, mut ty~ : Type?)                // Parenthesized expression
  Call(String, Array[Expr], mut ty~ : Type?)  // Function call
} derive(Show, Eq, ToJson)
-
Compound Expressions (Expr)
More complex structures that can contain operators and multiple sub-expressions:
pub enum Expr {
  AtomExpr(AtomExpr, mut ty~ : Type?)          // Wrapper for atomic expressions
  Unary(String, Expr, mut ty~ : Type?)         // Unary operation: -, !
  Binary(String, Expr, Expr, mut ty~ : Type?)  // Binary operation: +, -, *, /, ==, !=, etc.
} derive(Show, Eq, ToJson)
-
Statements (Stmt)
Represent executable units in the program:
pub enum Stmt {
  Let(String, Type, Expr)             // Variable declaration: let x : Int = 5;
  Assign(String, Expr)                // Assignment statement: x = 10;
  If(Expr, Array[Stmt], Array[Stmt])  // Conditional branch: if-else
  While(Expr, Array[Stmt])            // Loop statement: while
  Return(Expr?)                       // Return statement: return expr;
  Expr(Expr)                          // Expression statement
} derive(Show, Eq, ToJson)
-
Top-Level Structures
Function definitions and the complete program:
pub struct Function {
  name : String                   // Function name
  params : Array[(String, Type)]  // Parameter list: [(param_name, type)]
  ret_ty : Type                   // Return type
  body : Array[Stmt]              // Sequence of statements in the function body
} derive(Show, Eq, ToJson)

// The program is defined as a map from function names to function definitions
pub type Program Map[String, Function]
Design Highlight: Mutability of Type Annotations
Notice that each expression node contains a mut ty~ : Type? field. This design allows us to fill in type information during the type-checking phase without having to rebuild the entire AST.
Recursive Descent Parsing: A Top-Down Construction Strategy
Recursive Descent is a top-down parsing method where the core idea is to write a corresponding parsing function for each grammar rule. In Moonbit, pattern matching makes the implementation of this method exceptionally elegant.
Parsing Atomic Expressions
pub fn parse_atom_expr(
  tokens : ArrayView[Token]
) -> (AtomExpr, ArrayView[Token]) raise {
  match tokens {
    // Parse literals
    [Bool(b), ..rest] => (AtomExpr::Bool(b), rest)
    [Int(i), ..rest] => (AtomExpr::Int(i), rest)
    [Double(d), ..rest] => (AtomExpr::Double(d), rest)
    // Parse function calls: func_name(arg1, arg2, ...)
    [Lower(func_name), Bracket('('), ..rest] => {
      let (args, rest) = parse_argument_list(rest)
      match rest {
        [Bracket(')'), ..remaining] =>
          (AtomExpr::Call(func_name, args, ty=None), remaining)
        _ => raise SyntaxError("Expected ')' after function arguments")
      }
    }
    // Parse variable references
    [Lower(var_name), ..rest] =>
      (AtomExpr::Var(var_name, ty=None), rest)
    // Parse parenthesized expressions: (expression)
    [Bracket('('), ..rest] => {
      let (expr, rest) = parse_expression(rest)
      match rest {
        [Bracket(')'), ..remaining] =>
          (AtomExpr::Paren(expr, ty=None), remaining)
        _ => raise SyntaxError("Expected ')' after expression")
      }
    }
    _ => raise SyntaxError("Expected atomic expression")
  }
}
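The parse_argument_list helper is referenced above but never shown. The following is a hedged sketch of what it might look like under the article's definitions (the actual implementation in TinyMoonbitLLVM may differ); it collects comma-separated expressions and stops before the closing ')', which the caller consumes and verifies:

fn parse_argument_list(
  tokens : ArrayView[Token]
) -> (Array[Expr], ArrayView[Token]) raise {
  let args = Array::new()
  loop tokens {
    // Stop before the closing ')' so the caller can check for it
    [Bracket(')'), ..] as rest => break (args, rest)
    rest => {
      let (arg, after_arg) = parse_expression(rest)
      args.push(arg)
      match after_arg {
        // A comma means another argument follows
        [Symbol(","), ..next] => continue next
        _ => break (args, after_arg)
      }
    }
  }
}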
Parsing Statements
Statement parsing needs to dispatch to different handler functions based on the starting keyword:
pub fn parse_stmt(tokens : ArrayView[Token]) -> (Stmt, ArrayView[Token]) raise {
  match tokens {
    // Parse let statements
    [Keyword("let"), Lower(var_name), Symbol(":"), ..] => { /* ... */ }
    // Parse if/while/return statements
    [Keyword("if"), ..rest] => parse_if_stmt(rest)
    [Keyword("while"), ..rest] => parse_while_stmt(rest)
    [Keyword("return"), ..rest] => { /* ... */ }
    // Parse assignment statements
    [Lower(_), Symbol("="), ..] => parse_assign_stmt(tokens)
    // Parse single expression statements
    [Lower(_), ..] => parse_single_expr_stmt(tokens)
    _ => { /* Error handling */ }
  }
}
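The let branch is elided above; here is a hedged sketch of the shape it might take (the helper name and error messages are assumed), reusing parse_expression and parse_type from earlier:

fn parse_let_stmt(tokens : ArrayView[Token]) -> (Stmt, ArrayView[Token]) raise {
  match tokens {
    // let x : Int = expr ;
    [Keyword("let"), Lower(var_name), Symbol(":"), Upper(type_name), Symbol("="), ..rest] => {
      let (expr, rest2) = parse_expression(rest)
      match rest2 {
        [Symbol(";"), ..remaining] =>
          (Stmt::Let(var_name, parse_type(type_name), expr), remaining)
        _ => raise SyntaxError("Expected ';' after let statement")
      }
    }
    _ => raise SyntaxError("Expected let statement")
  }
}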
Challenge: Handling Operator Precedence
The most complex part of expression parsing is handling operator precedence. We need to ensure that 1 + 2 * 3 is correctly parsed as 1 + (2 * 3) and not (1 + 2) * 3; a sketch of one common approach follows.
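One common solution is precedence climbing. The following is a hedged sketch under the definitions above (unary operators are omitted for brevity, and the precedence table is illustrative rather than TinyMoonbit's actual one): each operator gets a numeric precedence, and the parser only consumes operators at or above a minimum level, recursing with a strictly higher level for the right-hand side:

fn precedence(op : String) -> Int {
  match op {
    "==" | "!=" | "<" | "<=" | ">" | ">=" => 1
    "+" | "-" => 2
    "*" | "/" | "%" => 3
    _ => 0 // not a binary operator
  }
}

fn parse_binary_expr(
  tokens : ArrayView[Token],
  min_prec : Int
) -> (Expr, ArrayView[Token]) raise {
  let (atom, rest0) = parse_atom_expr(tokens)
  let mut lhs : Expr = Expr::AtomExpr(atom, ty=None)
  let mut rest = rest0
  while true {
    match rest {
      [Symbol(op), ..op_rest] => {
        let prec = precedence(op)
        if prec == 0 || prec < min_prec {
          break
        }
        // The right-hand side binds at a strictly higher minimum precedence,
        // so 1 + 2 * 3 parses as 1 + (2 * 3) and operators of equal
        // precedence associate to the left.
        let (rhs, next) = parse_binary_expr(op_rest, prec + 1)
        lhs = Expr::Binary(op, lhs, rhs, ty=None)
        rest = next
      }
      _ => break
    }
  }
  (lhs, rest)
}

Calling parse_binary_expr(tokens, 1) then parses a complete expression.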
💡 Application of Advanced Moonbit Features
Automatic Derivation Feature
pub enum Expr {
// ...
} derive(Show, Eq, ToJson)
Moonbit's derive feature automatically generates common implementations for types. Here we use three:
- Show: Provides debugging output functionality.
- Eq: Supports equality comparison.
- ToJson: Serializes to JSON format, which is convenient for debugging and persistence.
These automatically generated features are extremely useful in compiler development, especially during the debugging and testing phases.
Error Handling Mechanism
pub fn parse_expression(
  tokens : ArrayView[Token]
) -> (Expr, ArrayView[Token]) raise {
  // The 'raise' keyword indicates that this function may throw an exception
  ...
}
Moonbit's raise mechanism provides structured error handling, allowing syntax errors to be accurately located and reported.
Through this layered design and recursive descent parsing strategy, we have built a parser that is both flexible and efficient, laying a solid foundation for the subsequent type-checking phase.
Chapter 4: Type Checking and Semantic Analysis
Semantic Analysis is a crucial intermediate stage in compiler design. While parsing ensures the program's structure is correct, it doesn't mean the program is semantically valid. Type Checking, as the core component of semantic analysis, is responsible for verifying the type consistency of all operations in the program, ensuring type safety and runtime correctness.
Scope Management: Building the Environment Chain
The primary challenge in type checking is correctly handling variable scopes. At different levels of the program (global, function, block), the same variable name may refer to different entities. We adopt the classic design of an Environment Chain to solve this problem:
pub struct TypeEnv[K, V] {
  parent : TypeEnv[K, V]?  // Reference to the parent environment
  data : Map[K, V]         // Variable bindings in the current environment
}
The core of the environment chain is the variable lookup algorithm, which follows the rules of lexical scoping:
pub fn TypeEnv::get[K : Eq + Hash, V](self : TypeEnv[K, V], key : K) -> V? {
  match self.data.get(key) {
    Some(value) => Some(value)  // Found in the current environment
    None =>
      match self.parent {
        Some(parent_env) => parent_env.get(key)  // Recursively search the parent environment
        None => None  // Reached the top-level environment, variable not defined
      }
  }
}
Design Principle: Lexical Scoping
This design ensures that variable lookup follows lexical scoping rules:
- First, search in the current scope.
- If not found, recursively search in the parent scope.
- Continue until the variable is found or the global scope is reached.
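Creating a child scope is then just one allocation. Here is a hedged helper sketch (the name sub_env is assumed, not from the article): entering a function body or block builds a fresh environment whose parent pointer is the current one:

fn TypeEnv::sub_env[K : Eq + Hash, V](self : TypeEnv[K, V]) -> TypeEnv[K, V] {
  // The new scope starts empty; lookups fall back to `self` via `parent`
  { parent: Some(self), data: Map::new() }
}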
Type Checker Architecture
Environment management alone is not sufficient to complete the type-checking task. Some operations (like function calls) need to access global program information. Therefore, we design a comprehensive type checker:
pub struct TypeChecker {
  local_env : TypeEnv[String, Type]  // Local variable environment
  current_func : Function            // The function currently being checked
  program : Program                  // Complete program information
}
Type Checking for Expression Nodes
The core of the type checker is to apply the corresponding type rules to different AST nodes. The following is the implementation of expression type checking:
pub fn Expr::check_type(
  self : Self,
  env : TypeEnv[String, Type]
) -> Type raise {
  match self {
    // Type checking for atomic expressions
    AtomExpr(atom_expr, ..) as node => {
      let ty = atom_expr.check_type(env)
      node.ty = Some(ty) // Fill in the type information
      ty
    }
    // Type checking for unary operations
    Unary("-", expr, ..) as node => {
      let ty = expr.check_type(env)
      node.ty = Some(ty)
      ty
    }
    // Type checking for binary operations
    Binary(op, lhs, rhs, ..) as node => {
      let lhs_type = lhs.check_type(env)
      let rhs_type = rhs.check_type(env)
      // Ensure operand types are consistent
      guard lhs_type == rhs_type else {
        raise TypeCheckError(
          "Binary operation requires matching types, got \{lhs_type} and \{rhs_type}"
        )
      }
      let result_type = match op {
        // Comparison operators always return a boolean value
        "==" | "!=" | "<" | "<=" | ">" | ">=" => Type::Bool
        // Arithmetic operators, etc., maintain the operand type
        _ => lhs_type
      }
      node.ty = Some(result_type)
      result_type
    }
  }
}
💡 Moonbit Enum Modification Trick
During the type-checking process, we need to fill in type information for the AST nodes. Moonbit provides an elegant way to modify the mutable fields of enum variants:
pub enum Expr {
AtomExpr(AtomExpr, mut ty~ : Type?)
Unary(String, Expr, mut ty~ : Type?)
Binary(String, Expr, Expr, mut ty~ : Type?)
} derive(Show, Eq, ToJson)
By using the as binding in pattern matching, we can get a reference to the enum variant and modify its mutable fields:
match expr {
AtomExpr(atom_expr, ..) as node => {
    let ty = atom_expr.check_type(env)
node.ty = Some(ty) // Modify the mutable field
ty
}
// ...
}
This design avoids the overhead of rebuilding the entire AST while maintaining a functional programming style.
Complete Compilation Flow Demonstration
After the three stages of lexical analysis, parsing, and type checking, our compiler frontend is now able to convert source code into a fully typed abstract syntax tree. Let's demonstrate the complete process with a simple example:
Source Code Example
fn add(x : Int, y : Int) -> Int {
  return x + y;
}
Compilation Output: Typed AST
Using the derive(ToJson) feature, we can output the final AST in JSON format for inspection:
{
"functions": {
"add": {
"name": "add",
"params": [
["x", { "$tag": "Int" }],
["y", { "$tag": "Int" }]
],
"ret_ty": { "$tag": "Int" },
"body": [
{
"$tag": "Return",
"0": {
"$tag": "Binary",
"0": "+",
"1": {
"$tag": "AtomExpr",
"0": {
"$tag": "Var",
"0": "x",
"ty": { "$tag": "Int" }
},
"ty": { "$tag": "Int" }
},
"2": {
"$tag": "AtomExpr",
"0": {
"$tag": "Var",
"0": "y",
"ty": { "$tag": "Int" }
},
"ty": { "$tag": "Int" }
},
"ty": { "$tag": "Int" }
}
}
]
}
}
}
From this JSON output, we can clearly see:
- Complete Function Signature: Including the parameter list and return type.
- Type-Annotated AST Nodes: Each expression carries type information.
- Structured Program Representation: Provides a clear data structure for the subsequent code generation phase.
Conclusion
In this article, we have delved into the complete implementation process of a compiler frontend. From a stream of characters to a typed abstract syntax tree, we have witnessed the unique advantages of the Moonbit language in compiler construction:
Core Takeaways
- The Power of Pattern Matching: Moonbit's string pattern matching and structural pattern matching greatly simplify the implementation of lexical analysis and parsing.
- Functional Programming Paradigm: The combination of the loop construct, environment chains, and immutable data structures provides a solution that is both elegant and efficient.
- Expressive Type System: Through mutable fields in enums and trait objects, we can build data structures that are both type-safe and flexible.
- Engineering Features: Features like derive, structured error handling, and JSON serialization significantly improve development efficiency.
Looking Ahead to Part 2
Having mastered the implementation of the frontend, the next article will guide us into the more exciting code generation phase. We will:
- Delve into the design philosophy of LLVM Intermediate Representation.
- Explore how to use Moonbit's official llvm.mbt binding library.
- Implement the complete conversion from AST to LLVM IR.
- Generate executable RISC-V assembly code.
Building a compiler is a complex and challenging process, but as we have shown in this article, Moonbit provides powerful and elegant tools for this task. Let's continue this exciting compiler construction journey in the next part.
Recommended Resources