Dancing with LLVM: A Moonbit Chronicle (Part 2) - LLVM Backend Generation

August 6, 2025 · 17 min read

Introduction

In the process of programming language design, the frontend is responsible for understanding and verifying the structure and semantics of a program, while the compiler backend takes on the task of translating these abstract concepts into executable machine code. The implementation of the backend not only requires a deep understanding of the target architecture but also mastery of complex optimization techniques to generate efficient code.

LLVM (Low Level Virtual Machine), as a comprehensive modern compiler infrastructure, provides us with a powerful and flexible solution. By converting a program into LLVM Intermediate Representation (IR), we can leverage LLVM's mature toolchain to compile the code to various target architectures, including RISC-V, ARM, and x86.

Moonbit's LLVM Ecosystem

Moonbit officially provides two important LLVM-related projects:

llvm.mbt: Moonbit language bindings for the original LLVM, providing direct access to the llvm-c interface. It requires the installation of the complete LLVM toolchain, can only generate for native backends, and requires you to handle compilation and linking yourself, but it can generate IR that is fully compatible with the original LLVM.

MoonLLVM: A pure Moonbit implementation of an LLVM-like system. It can generate LLVM IR without external dependencies and supports JavaScript and WebAssembly backends.

This article chooses llvm.mbt as our tool. Its API design is inspired by the highly acclaimed inkwell library in the Rust ecosystem.

In the previous article, "Dancing with LLVM: A Moonbit Chronicle (Part 1) - Implementing the Frontend," we completed the conversion from source code to a typed abstract syntax tree. This article will build on that achievement, focusing on the core techniques and implementation details of code generation.

Chapter 1: Representing the LLVM Type System in Moonbit

Before diving into code generation, we first need to understand how llvm.mbt represents LLVM's various concepts within Moonbit's type system. LLVM's type system is quite complex, containing multiple levels such as basic types, composite types, and function types.

Trait Objects: An Abstract Representation of Types

In the API design of llvm.mbt, you will frequently encounter the core concept of &Type. This is not a concrete struct or enum, but a Trait Object—which can be understood as the functional equivalent of an abstract base class in object-oriented programming.

// &Type is a trait object representing any LLVM type
let Unit
some_type: &Type = Unit
context.() -> Unit
i32_type()

Type Identification and Conversion

To determine the specific type of a &Type, we need to perform a runtime type check using the as_type_enum interface:

pub fn fn identify_type(ty : Unit) -> String
identify_type(Unit
ty: &Type) -> String
String {
  match Unit
ty.() -> Unit
as_type_enum() {
    (Unit) -> Unit
IntType(Unit
int_ty) => "Integer type with \{Unit
int_ty.() -> Unit
get_bit_width()} bits"
    (_/0) -> Unit
FloatType(_/0
float_ty) => "Floating point type"
    (_/0) -> Unit
PointerType(_/0
ptr_ty) => "Pointer type"
    (_/0) -> Unit
FunctionType(_/0
func_ty) => "Function type"
    (_/0) -> Unit
ArrayType(_/0
array_ty) => "Array type"
    (_/0) -> Unit
StructType(_/0
struct_ty) => "Structure type"
    (_/0) -> Unit
VectorType(_/0
vec_ty) => "Vector type"
    (_/0) -> Unit
ScalableVectorType(_/0
svec_ty) => "Scalable vector type"
    (_/0) -> Unit
MetadataType(_/0
meta_ty) => "Metadata type"
  }
}

Safe Type Conversion Strategies

When we are certain that a &Type has a specific type, there are several conversion methods to choose from:

Direct Conversion (for deterministic scenarios)

let Unit
ty: &Type = Unit
context.() -> Unit
i32_type()
let ?
i32_ty = let ty : Unit
ty.() -> ?
into_int_type()  // Direct conversion, errors are handled by llvm.mbt
let ?
bit_width = let i32_ty : ?
i32_ty.() -> ?
get_bit_width()  // Call a method specific to IntType

Defensive Conversion (recommended for production environments)

let Unit
ty: &Type = () -> Unit
get_some_type()  // An unknown type obtained from somewhere

guard ty.as_type_enum() is IntType(i32_ty) else {
  raise CodeGenError("Expected integer type, got \{ty}")
}

// Now it's safe to use i32_ty
let ?
bit_width = let i32_ty : ?
i32_ty.() -> ?
get_bit_width()

Constructing Composite Types

LLVM supports various composite types, which are usually constructed through methods of basic types:

pub fn fn create_composite_types(context : ?) -> Unit
create_composite_types(?
context: @llvm.Context) -> Unit
Unit {
  let Unit
i32_ty = ?
context.() -> Unit
i32_type()
  let Unit
f64_ty = ?
context.() -> Unit
f64_type()

  // Array type: [16 x i32]
  let Unit
i32_array_ty = Unit
i32_ty.(Int) -> Unit
array_type(16)

  // Function type: i32 (i32, i32)
  let Unit
add_func_ty = Unit
i32_ty.(Array[Unit]) -> Unit
fn_type([Unit
i32_ty, Unit
i32_ty])

  // Struct type: {i32, f64}
  let Unit
struct_ty = ?
context.(Array[Unit]) -> Unit
struct_type([Unit
i32_ty, Unit
f64_ty])

  // Pointer type (all pointers are opaque in LLVM 18+)
  let Unit
ptr_ty = Unit
i32_ty.() -> Unit
ptr_type()

  // Output type information for verification
  fn[T : Show] println(input : T) -> Unit
Prints any value that implements the Show trait to the standard output,
followed by a newline.
Parameters:

value : The value to be printed. Must implement the Show trait.
Example:
  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println("Array type: \{Unit
i32_array_ty}")      // [16 x i32]
  fn[T : Show] println(input : T) -> Unit
Prints any value that implements the Show trait to the standard output,
followed by a newline.
Parameters:

value : The value to be printed. Must implement the Show trait.
Example:
  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println("Function type: \{Unit
add_func_ty}")    // i32 (i32, i32)
  fn[T : Show] println(input : T) -> Unit
Prints any value that implements the Show trait to the standard output,
followed by a newline.
Parameters:

value : The value to be printed. Must implement the Show trait.
Example:
  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println("Struct type: \{Unit
struct_ty}")        // {i32, f64}
  fn[T : Show] println(input : T) -> Unit
Prints any value that implements the Show trait to the standard output,
followed by a newline.
Parameters:

value : The value to be printed. Must implement the Show trait.
Example:
  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println("Pointer type: \{Unit
ptr_ty}")          // ptr
}

Important Reminder: Opaque Pointers

Starting with LLVM version 18, all pointer types use the opaque pointer design. This means that regardless of the type they point to, all pointers are represented as ptr in the IR, and the specific type information they point to is no longer visible in the type system.

Chapter 2: The LLVM Value System and the BasicValue Concept

Compared to the type system, LLVM's value system is more complex. llvm.mbt, consistent with inkwell, divides values into two important abstract layers: Value and BasicValue. The difference lies in distinguishing the source of value creation from the way values are used:

Value: Focuses on how a value is produced (e.g., constants, instruction results).
BasicValue: Focuses on what basic type a value has (e.g., integer, float, pointer).

Practical Application Example

pub fn fn demonstrate_value_system(context : ?, builder : ?) -> Unit
demonstrate_value_system(?
context: Context, ?
builder: Builder) -> Unit
Unit {
  let Unit
i32_ty = ?
context.() -> Unit
i32_type()

  // Create two integer constants - these are directly IntValue
  let Unit
const1 = Unit
i32_ty.(Int) -> Unit
const_int(10)  // Value: IntValue, BasicValue: IntValue
  let Unit
const2 = Unit
i32_ty.(Int) -> Unit
const_int(20)  // Value: IntValue, BasicValue: IntValue

  // Perform an addition operation - the result is an InstructionValue
  let Unit
add_result = ?
builder.(Unit, Unit) -> Unit
build_int_add(Unit
const1, Unit
const2)

  // In different contexts, we need different perspectives:

  // As an instruction to check its properties
  let Unit
instruction = Unit
add_result.() -> Unit
as_instruction()
  fn[T : Show] println(input : T) -> Unit
Prints any value that implements the Show trait to the standard output,
followed by a newline.
Parameters:

value : The value to be printed. Must implement the Show trait.
Example:
  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println("Instruction opcode: \{Unit
instruction.() -> Unit
get_opcode()}")

  // As a basic value to get its type
  let Unit
basic_value = Unit
add_result.() -> Unit
into_basic_value()
  fn[T : Show] println(input : T) -> Unit
Prints any value that implements the Show trait to the standard output,
followed by a newline.
Parameters:

value : The value to be printed. Must implement the Show trait.
Example:
  println(42)
  println("Hello, World!")
  println([1, 2, 3])
println("Result type: \{Unit
basic_value.() -> Unit
get_type()}")

  // As an integer value for subsequent calculations
  let Unit
int_value = Unit
add_result.() -> Unit
into_int_value()
  let Unit
final_result = ?
builder.(Unit, Unit) -> Unit
build_int_mul(Unit
int_value, Unit
const1)
}

Complete Classification of Value Types

ValueEnum: All possible value types

pub enum ValueEnum {
  (?) -> ValueEnum
IntValue(IntValue)              // Integer value
  (?) -> ValueEnum
FloatValue(FloatValue)          // Floating-point value
  (?) -> ValueEnum
PointerValue(PointerValue)      // Pointer value
  (?) -> ValueEnum
StructValue(StructValue)        // Struct value
  (?) -> ValueEnum
FunctionValue(FunctionValue)    // Function value
  (?) -> ValueEnum
ArrayValue(ArrayValue)          // Array value
  (?) -> ValueEnum
VectorValue(VectorValue)        // Vector value
  (?) -> ValueEnum
PhiValue(PhiValue)             // Phi node value
  (?) -> ValueEnum
ScalableVectorValue(ScalableVectorValue)  // Scalable vector value
  (?) -> ValueEnum
MetadataValue(MetadataValue)    // Metadata value
  (?) -> ValueEnum
CallSiteValue(CallSiteValue)    // Call site value
  (?) -> ValueEnum
GlobalValue(GlobalValue)        // Global value
  (?) -> ValueEnum
InstructionValue(InstructionValue)  // Instruction value
} derive(trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}
Trait for types that can be converted to String
Show)

BasicValueEnum: Values that have a basic type

pub enum BasicValueEnum {
  (?) -> BasicValueEnum
ArrayValue(ArrayValue)              // Array value
  (?) -> BasicValueEnum
IntValue(IntValue)                  // Integer value
  (?) -> BasicValueEnum
FloatValue(FloatValue)              // Floating-point value
  (?) -> BasicValueEnum
PointerValue(PointerValue)          // Pointer value
  (?) -> BasicValueEnum
StructValue(StructValue)            // Struct value
  (?) -> BasicValueEnum
VectorValue(VectorValue)            // Vector value
  (?) -> BasicValueEnum
ScalableVectorValue(ScalableVectorValue)  // Scalable vector value
} derive(trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}
Trait for types that can be converted to String
Show)

💡 Best Practices for Value Conversion

In the actual code generation process, we often need to convert between different value perspectives:

pub fn fn value_conversion_patterns(instruction_result : Unit) -> Unit
value_conversion_patterns(Unit
instruction_result: &Value) -> Unit
Unit {
  // Pattern 1: I know what type this is, convert directly
  let Unit
int_val = Unit
instruction_result.() -> Unit
into_int_value()

  // Pattern 2: I just need a basic value, I don't care about the specific type
  let Unit
basic_val = Unit
instruction_result.() -> Unit
into_basic_value()

  // Pattern 3: Defensive programming, check before converting
  match Unit
instruction_result.() -> Unit
as_value_enum() {
    // Handle integer values
    (Unit) -> Unit
IntValue(Unit
int_val) => (Unit) -> Unit
handle_integer(Unit
int_val)
    // Handle float values
    (Unit) -> Unit
FloatValue(Unit
float_val) => (Unit) -> Unit
handle_float(Unit
float_val)
    _ => raise Error
CodeGenError("Unexpected value type")
  }
}

Through this two-layer abstraction, llvm.mbt maintains the integrity of the LLVM value system while providing an intuitive and easy-to-use interface for Moonbit developers.

Chapter 3: Practical LLVM IR Generation

Now that we understand the type and value systems, let's demonstrate how to use llvm.mbt to generate LLVM IR with a complete example. This example will implement a simple muladd function, showing the entire process from initialization to instruction generation.

Infrastructure Initialization

Any LLVM program begins by establishing three core components:

pub fn fn initialize_llvm() -> (?, ?, ?)
initialize_llvm() -> (Context, Module, Builder) {
  // 1. Create an LLVM context - a container for all LLVM objects
  let ?
context = () -> ?
@llvm.Context::() -> ?
create()

  // 2. Create a module - a container for functions and global variables
  let ?
module = ?
context.(String) -> ?
create_module("demo_module")

  // 3. Create an IR builder - used to generate instructions
  let ?
builder = ?
context.() -> ?
create_builder()

  (?
context, ?
module, ?
builder)
}

A Simple Function Generation Example

Let's implement a function that calculates (a * b) + c:

pub fn fn generate_muladd_function() -> String
generate_muladd_function() -> String
String {
  // Initialize LLVM infrastructure
  let (?
context, ?
module, ?
builder) = fn initialize_llvm() -> (?, ?, ?)
initialize_llvm()

  // Define the function signature
  let Unit
i32_ty = ?
context.() -> Unit
i32_type()
  let Unit
func_type = Unit
i32_ty.(Array[Unit]) -> Unit
fn_type([Unit
i32_ty, Unit
i32_ty, Unit
i32_ty])
  let Unit
func_value = ?
module.(String, Unit) -> Unit
add_function("muladd", Unit
func_type)

  // Create the function entry basic block
  let Unit
entry_block = ?
context.(Unit, String) -> Unit
append_basic_block(Unit
func_value, "entry")
  ?
builder.(Unit) -> Unit
position_at_end(Unit
entry_block)

  // Get the function parameters
  let Unit
arg_a = Unit
func_value.(Int) -> Unit
get_nth_param(0).() -> Unit
unwrap().() -> Unit
into_int_value()
  let Unit
arg_b = Unit
func_value.(Int) -> Unit
get_nth_param(1).() -> Unit
unwrap().() -> Unit
into_int_value()
  let Unit
arg_c = Unit
func_value.(Int) -> Unit
get_nth_param(2).() -> Unit
unwrap().() -> Unit
into_int_value()

  // Generate calculation instructions
  let Unit
mul_result = ?
builder.(Unit, Unit) -> Unit
build_int_mul(Unit
arg_a, Unit
arg_b).() -> Unit
into_int_value()
  let Unit
add_result = ?
builder.(Unit, Unit) -> Unit
build_int_add(Unit
mul_result, Unit
arg_c)

  // Generate the return instruction
  let _ = ?
builder.(Unit) -> Unit
build_return(Unit
add_result)

  // Output the generated IR
  ?
module.() -> String
dump()
}

Generated LLVM IR

Running the above code will produce the following LLVM Intermediate Representation:

; ModuleID = 'demo_module'
source_filename = "demo_module"

define i32 @muladd(i32 %0, i32 %1, i32 %2) {
entry:
  %3 = mul i32 %0, %1
  %4 = add i32 %3, %2
  ret i32 %4
}

💡 Code Generation Best Practices

Naming Conventions

For instructions that return a value, the build interface has a name label argument, which can be used to add a name to the result of the instruction.

let ?
mul_result = Unit
builder.(Unit, Unit, String) -> ?
build_int_mul(Unit
lhs, Unit
rhs, String
name="temp_product")
let ?
final_result = Unit
builder.(?, Unit, String) -> ?
build_int_add(let mul_result : ?
mul_result, Unit
offset, String
name="final_sum")

Error Handling

Use raise instead of panic for error handling, and manage exceptions for situations that are not easy to determine directly.

// Check for operations that might fail
match func_value.get_nth_param(index) {
  Some(param) => param.into_int_value()
  None => raise CodeGenError("Function parameter \{index} not found")
}

Chapter 4: TinyMoonbit Compiler Implementation

Now let's turn our attention to the actual compiler implementation, converting the abstract syntax tree we built in the previous article into LLVM IR.

Type Mapping: From Parser to LLVM

First, we need to establish a mapping between the TinyMoonbit type system and the LLVM type system:

pub struct CodeGen {
  ?
parser_program : Program                    // AST representation of the source program
  ?
llvm_context : @llvm.Context               // LLVM context
  ?
llvm_module : @llvm.Module                 // LLVM module
  ?
builder : @llvm.Builder                    // IR builder
  Map[String, ?]
llvm_functions : type Map[K, V]
Mutable linked hash map that maintains the order of insertion, not thread safe.
Example
  let map = { 3: "three", 8 :  "eight", 1 :  "one"}
  assert_eq(map.get(2), None)
  assert_eq(map.get(3), Some("three"))
  map.set(3, "updated")
  assert_eq(map.get(3), Some("updated"))
Map[String
String, @llvm.FunctionValue]  // Function map
}

pub fn (?, ?) -> Unit raise
convert_type(?
self : Self, ?
parser_type : Type) -> &@llvm.Type raise {
  match ?
parser_type {
    Type::?
Unit => ?
selfUnit
.?
llvm_contextUnit
.() -> Unit
void_typeUnit
() as &@llvm.Type
    Type::?
Bool => ?
self.?
llvm_context.() -> Unit
bool_type()
    Type::?
Int => ?
self.?
llvm_context.() -> Unit
i32_type()
    Type::?
Double => ?
self.?
llvm_context.() -> Unit
f64_type()
    // Can be extended with more types as needed
  }
}

Environment Management: Mapping Variables to Values

During the code generation phase, we need to maintain a mapping from variable names to LLVM values:

pub struct Env {
  Env?
parent : struct Env {
  parent: Env?
  symbols: Map[String, Unit]
  codegen: CodeGen
  parser_function: ?
  llvm_function: ?
}
Env?                        // Reference to the parent environment
  Map[String, Unit]
symbols : type Map[K, V]
Mutable linked hash map that maintains the order of insertion, not thread safe.
Example
  let map = { 3: "three", 8 :  "eight", 1 :  "one"}
  assert_eq(map.get(2), None)
  assert_eq(map.get(3), Some("three"))
  map.set(3, "updated")
  assert_eq(map.get(3), Some("updated"))
Map[String
String, &@llvm.Value]        // Local variable map

  // Global information
  CodeGen
codegen : struct CodeGen {
  parser_program: ?
  llvm_context: ?
  llvm_module: ?
  builder: ?
  llvm_functions: Map[String, ?]
}
CodeGen                           // Reference to the code generator
  ?
parser_function : Function                  // AST of the current function
  ?
llvm_function : @llvm.FunctionValue         // LLVM representation of the current function
}

pub fn (?, String) -> Unit?
get_symbol(?
self : Self, String
name : String
String) -> &@llvm.Value? {
  match ?
self.Map[String, Unit]
symbols.fn[K : Hash + Eq, V] Map::get(self : Map[K, V], key : K) -> V?
Retrieves the value associated with a given key in the hash map.
Parameters:

self : The hash map to search in.
key : The key to look up in the map.
Returns Some(value) if the key exists in the map, None otherwise.
Example:
let map = { "key": 42 }
inspect(map.get("key"), content="Some(42)")
inspect(map.get("nonexistent"), content="None")
get(String
name) {
    (Unit) -> Unit?
Some(Unit
value) => (Unit) -> Unit?
Some(Unit
value)
    Unit?
None =>
      match ?
self.Env?
parent {
        (Env) -> Env?
Some(Env
parent_env) => Env
parent_env.(String) -> Unit?
get_symbol(String
name)
        Env?
None => Unit?
None
      }
  }
}

Variable Handling: Memory Allocation Strategy

As a systems-level language, TinyMoonbit supports variable reassignment. In LLVM IR's SSA (Static Single Assignment) form, we need to use the alloca + load/store pattern to implement mutable variables:

pub fn Stmt::(?, Env) -> Unit raise
emit(?
self : Self, Env
env : struct Env {
  parent: Env?
  symbols: Map[String, Unit]
  codegen: CodeGen
  parser_function: ?
  llvm_function: ?
}
Env) -> Unit
Unit raise {
  match ?
self {
    // Variable declaration: e.g., let x : Int = 5;
    (String, Unit, Unit) -> ?
Let(String
var_name, Unit
var_type, Unit
init_expr) => {
      // Convert the type and allocate stack space
      let Unit
llvm_type = Env
env.CodeGen
codegen.(Unit) -> Unit
convert_type(Unit
var_type)
      let Unit
alloca = Env
env.CodeGen
codegen.?
builder.(Unit, String) -> Unit
build_alloca(Unit
llvm_type, String
var_name)

      // Record the allocated pointer in the symbol table
      Env
env.Map[String, Unit]
symbols.fn[K : Hash + Eq, V] Map::set(self : Map[K, V], key : K, value : V) -> Unit
Sets a key-value pair into the hash map. If the key already exists, updates
its value. If the hash map is near full capacity, automatically
grows the internal storage to accommodate more entries.
Parameters:

map : The hash map to modify.
key : The key to insert or update. Must implement Hash and Eq traits.
value : The value to associate with the key.
Example:
let map : Map[String, Int] = Map::new()
map.set("key", 42)
inspect(map.get("key"), content="Some(42)")
map.set("key", 24) // update existing key
inspect(map.get("key"), content="Some(24)")
set(String
var_name, Unit
allocaUnit
 as &@llvm.Value)

      // Calculate the value of the initialization expression
      let Unit
init_value = Unit
init_expr.(Env) -> Unit
emit(Env
env).() -> Unit
into_basic_value()

      // Store the initial value into the allocated memory
      let _ = Env
env.CodeGen
codegen.?
builder.(Unit, Unit) -> Unit
build_store(Unit
alloca, Unit
init_value)
    }

    // Variable assignment: x = 10;
    (Unit, Unit) -> ?
Assign(Unit
var_name, Unit
rhs_expr) => {
      // Get the memory address of the variable from the symbol table
      guard let (_/0) -> Unit
Some(_/0
var_ptr) = Env
env.(Unit) -> Unit
get_symbol(Unit
var_name) else {
        raise Error
CodeGenError("Undefined variable: \{Unit
var_name}")
      }

      // Calculate the value of the right-hand side expression
      let Unit
rhs_value = Unit
rhs_expr.(Env) -> Unit
emit(Env
env).() -> Unit
into_basic_value()

      // Store the new value into the variable's memory
      let _ = Env
env.CodeGen
codegen.?
builder.(Unit, Unit) -> Unit
build_store(Unit
var_ptr, Unit
rhs_value)
    }

    // Other statement types...
    _ => { /* Handle other statements */ }
  }
}

Design Decision: Why use alloca?

In functional languages, immutable variables can be directly mapped to SSA values. However, TinyMoonbit supports variable reassignment, which conflicts with the SSA principle of "each variable is assigned only once."

The alloca + load/store pattern is the standard way to handle mutable variables:

alloca: Allocates memory space on the stack.

store: Writes a value to memory.

load: Reads a value from memory.

LLVM's optimization process will automatically convert simple allocas back to value form (the mem2reg optimization).

Expression Code Generation

Expression code generation is relatively straightforward, mainly involving calling the corresponding instruction-building methods based on the expression type:

fn Expr::(?, Env) -> Unit raise
emit(?
self: Self, Env
env: struct Env {
  parent: Env?
  symbols: Map[String, Unit]
  codegen: CodeGen
  parser_function: ?
  llvm_function: ?
}
Env) -> &@llvm.Value raise {
  match ?
self {
    (Unit) -> ?
AtomExpr(Unit
atom_expr, ..) => Unit
atom_expr.(Env) -> Unit
emit(Env
env)
    (String, Unit, _/0) -> ?
Unary("-", Unit
expr, _/0
ty = (_/0) -> _/0
Some(_/0
Int)) => {
      let Unit
value = Unit
expr.() -> Unit
emit().() -> Unit
into_int_value()
      let Unit
zero = Env
env.Unit
gen.Unit
llvm_ctx.() -> Unit
i32_type().() -> Unit
const_zero()
      Env
env.Unit
gen.?
builder.(Unit, Unit) -> Unit
build_int_sub(Unit
zero, Unit
value)
    }
    (String, Unit, _/0) -> ?
Unary("-", Unit
expr, _/0
ty = (_/0) -> _/0
Some(_/0
Double)) => {
      let Unit
value = Unit
expr.() -> Unit
emit().() -> Unit
into_float_value()
      Env
env.Unit
gen.?
builder.(Unit) -> Unit
build_float_neg(Unit
value)
    }
    (String, Unit, Unit, _/0) -> ?
Binary("+", Unit
lhs, Unit
rhs, _/0
ty=(_/0) -> _/0
Some(_/0
Int)) => {
      let Unit
lhs_val = Unit
lhs.() -> Unit
emit().() -> Unit
into_int_value()
      let Unit
rhs_val = Unit
rhs.() -> Unit
emit().() -> Unit
into_int_value()
      Env
env.Unit
gen.?
builder.(Unit, Unit) -> Unit
build_int_add(Unit
lhs_val, Unit
rhs_val)
    }
    // ... others
  }
}

Technical Detail: Floating-Point Negation

Note that when handling floating-point negation, we use build_float_neg instead of subtracting the operand from zero. This is because:

IEEE 754 Standard: Floating-point numbers have special values (like NaN, ∞), and simple subtraction might produce incorrect results.

Performance Considerations: Dedicated negation instructions are usually more efficient on modern processors.

Precision Guarantee: Avoids unnecessary rounding errors.

Chapter 5: Implementation of Control Flow Instructions

Control flow is the backbone of program logic, including conditional branches and loop structures. In LLVM IR, control flow is implemented through Basic Blocks and branch instructions. Each basic block represents a sequence of instructions with no internal jumps, and blocks are connected by branch instructions.

Conditional Branches: Implementing if-else Statements

Conditional branches require creating multiple basic blocks to represent different execution paths:

fn Stmt::(?, Env) -> Unit raise
emit(?
self: Self, Env
env: struct Env {
  parent: Env?
  symbols: Map[String, Unit]
  codegen: CodeGen
  parser_function: ?
  llvm_function: ?
}
Env) -> Unit
Unit raise {
  let Unit
ctx = Env
env.Unit
gen.Unit
llvm_ctx
  let Unit
func = Env
env.Unit
llvm_func
  let ?
builder = Env
env.Unit
gen.?
builder
  match ?
self {
    (Unit, Unit, Unit) -> ?
If(Unit
cond, Unit
then_stmts, Unit
else_stmts) => {
      let Unit
cond_val = Unit
cond.(Env) -> Unit
emit(Env
env).() -> Unit
into_int_value()

      // Create three basic blocks
      let Unit
then_block = Unit
ctx.(Unit) -> Unit
append_basic_block(Unit
llvm_func)
      let Unit
else_block = Unit
ctx.(Unit) -> Unit
append_basic_block(Unit
llvm_func)
      let Unit
merge_block = Unit
ctx.(Unit) -> Unit
append_basic_block(Unit
llvm_func)

      // Create the jump instruction
      let _ = ?
builder.(Unit, Unit, Unit) -> Unit
build_conditional_branch(
        Unit
cond_val, Unit
then_block, Unit
else_block,
      )

      // Generate code for the then_block
      ?
builder.(Unit) -> Unit
position_at_end(Unit
then_block)
      let Unit
then_env = ?
self.() -> Unit
subenv()
      Unit
then_stmts.((Unit) -> Unit) -> Unit
each(Unit
s => Unit
s.(Unit) -> Unit
emitStmt(Unit
then_env))
      let _ = ?
builder.(Unit) -> Unit
build_unconditional_branch(Unit
merge_block)

      // Generate code for the else_block
      ?
builder.(Unit) -> Unit
position_at_end(Unit
else_block)
      let Unit
else_env = ?
self.() -> Unit
subenv()
      Unit
else_stmts.((Unit) -> Unit) -> Unit
each(Unit
s => Unit
s.(Unit) -> Unit
emitStmt(Unit
else_env))
      let _ = ?
builder.(Unit) -> Unit
build_unconditional_branch(Unit
merge_block)

      // After code generation is complete, the builder's position should be on the merge_block
      ?
builder.(Unit) -> Unit
position_at_end(Unit
merge_block)

    }
    // ...
  }
}

Generated LLVM IR Example

For the following TinyMoonbit code:

if x > 0 {
  y = x + 1;
} else {
  y = x - 1;
}

It will generate LLVM IR similar to this:

  %1 = load i32, ptr %x, align 4
  %2 = icmp sgt i32 %1, 0
  br i1 %2, label %if.then, label %if.else

if.then:                                          ; preds = %0
  %3 = load i32, ptr %x, align 4
  %4 = add i32 %3, 1
  store i32 %4, ptr %y, align 4
  br label %if.end

if.else:                                          ; preds = %0
  %5 = load i32, ptr %x, align 4
  %6 = sub i32 %5, 1
  store i32 %6, ptr %y, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  ; Subsequent code...

Loop Structures: Implementing while Statements

The implementation of loops requires special attention to the correct connection of the condition check and the loop body:

fn Stmt::(?, Env) -> Unit raise
emit(?
self: Self, Env
env: struct Env {
  parent: Env?
  symbols: Map[String, Unit]
  codegen: CodeGen
  parser_function: ?
  llvm_function: ?
}
Env) -> Unit
Unit raise {
  let Unit
ctx = Env
env.Unit
gen.Unit
llvm_ctx
  let Unit
func = Env
env.Unit
llvm_func
  let ?
builder = Env
env.Unit
gen.?
builder
  match ?
self {
    (Unit, Unit) -> ?
While(Unit
cond, Unit
body) => {
      // Generate three blocks
      let Unit
cond_block = Unit
ctx.(Unit) -> Unit
append_basic_block(Unit
llvm_func)
      let Unit
body_block = Unit
ctx.(Unit) -> Unit
append_basic_block(Unit
llvm_func)
      let Unit
merge_block = Unit
ctx.(Unit) -> Unit
append_basic_block(Unit
llvm_func)

      // First, unconditionally jump to the cond block
      let _ = ?
builder.(Unit) -> Unit
build_unconditional_branch(Unit
cond_block)
      ?
builder.(Unit) -> Unit
position_at_end(Unit
cond_block)

      // Generate code within the cond block, as well as the conditional jump instruction
      let Unit
cond_val = Unit
cond.() -> Unit
emit().() -> Unit
into_int_value()
      let _ = ?
builder.(Unit, Unit, Unit) -> Unit
build_conditional_branch(
        Unit
cond_val, Unit
body_block, Unit
merge_block,
      )
      ?
builder.(Unit) -> Unit
position_at_end(Unit
body_block)

      // Generate code for the body block, with an unconditional jump to the cond block at the end
      let Unit
body_env = ?
self.() -> Unit
subenv()
      Unit
body.((Unit) -> Unit) -> Unit
each(Unit
s => Unit
s.(Unit) -> Unit
emitStmt(Unit
body_env))
      let _ = ?
builder.(Unit) -> Unit
build_unconditional_branch(Unit
cond_block)

      // After code generation is finished, jump to the merge block
      ?
builder.(Unit) -> Unit
position_at_end(Unit
merge_block)
    }
    // ...
  }
}

Generated LLVM IR Example

For the TinyMoonbit code:

while i < 10 {
  i = i + 1;
}

It will generate:

  br label %while.cond

while.cond:                                       ; preds = %while.body, %0
  %1 = load i32, ptr %i, align 4
  %2 = icmp slt i32 %1, 10
  br i1 %2, label %while.body, label %while.end

while.body:                                       ; preds = %while.cond
  %3 = load i32, ptr %i, align 4
  %4 = add i32 %3, 1
  store i32 %4, ptr %i, align 4
  br label %while.cond

while.end:                                        ; preds = %while.cond
  ; Subsequent code...

💡 Control Flow Design Points

Basic Block Naming Strategy

The append_basic_block function also has a name label argument.

// Use descriptive block names for easier debugging and understanding
let ?
then_block = Unit
context.(Unit, String) -> ?
append_basic_block(Unit
func, String
name="if.then")
let ?
else_block = Unit
context.(Unit, String) -> ?
append_basic_block(Unit
func, String
name="if.else")
let ?
merge_block = Unit
context.(Unit, String) -> ?
append_basic_block(Unit
func, String
name="if.end")

Scope Management

// Create a separate scope for each branch and loop body
let ?
branch_env = Unit
env.() -> ?
sub_env()
branch_stmts.each( stmt => stmt.emit(branch_env) }

Builder Position Management

At the end, be sure to place the instruction builder on the correct basic block.

// Always ensure the builder points to the correct basic block
builder.position_at_end(merge_block)
// Generate instructions in this block...

Chapter 6: From LLVM IR to Machine Code

After generating the complete LLVM IR, we need to convert it into assembly code for the target machine. Although llvm.mbt provides a complete target machine configuration API, for learning purposes, we can use a simpler method.

Compiling with the `llc` Toolchain

The most direct method is to output the generated LLVM IR to a file and then use the LLVM toolchain to compile it:

Call the dump function of the Module, or you can use the println function.

let CodeGen
gen : struct CodeGen {
  parser_program: ?
  llvm_context: ?
  llvm_module: ?
  builder: ?
  llvm_functions: Map[String, ?]
}
CodeGen = ...
let ?
prog = let gen : CodeGen
gen.?
llvm_prog
prog.dump() // dump is recommended as it will be slightly faster than println, with the same effect

// or println(prog)

Complete Compilation Flow Example

Let's look at a complete compilation flow from source code to assembly code:

TinyMoonbit Source Code

fn fn factorial(n : Int) -> Int
factorial(Int
n: Int
Int) -> Int
Int {
  if Int
n fn Compare::op_le(x : Int, y : Int) -> Bool
<= 1 {
    return 1;
  }
  return Int
n fn Mul::mul(self : Int, other : Int) -> Int
Multiplies two 32-bit integers. This is the implementation of the *
operator for Int.
Parameters:

self : The first integer operand.
other : The second integer operand.
Returns the product of the two integers. If the result overflows the range of
Int, it wraps around according to two's complement arithmetic.
Example:
inspect(42 * 2, content="84")
inspect(-10 * 3, content="-30")
let max = 2147483647 // Int.max_value
inspect(max * 2, content="-2") // Overflow wraps around
* fn factorial(n : Int) -> Int
factorial(Int
n fn Sub::sub(self : Int, other : Int) -> Int
Performs subtraction between two 32-bit integers, following standard two's
complement arithmetic rules. When the result overflows or underflows, it
wraps around within the 32-bit integer range.
Parameters:

self : The minuend (the number being subtracted from).
other : The subtrahend (the number to subtract).
Returns the difference between self and other.
Example:
let a = 42
let b = 10
inspect(a - b, content="32")
let max = 2147483647 // Int maximum value
inspect(max - -1, content="-2147483648") // Overflow case
- 1);
}

fn main() -> Unit {
  let Int
result: Int
Int = fn factorial(n : Int) -> Int
factorial(5);
  (Int) -> Unit
print_int(Int
result);
}

Generated LLVM IR

; ModuleID = 'tinymoonbit'
source_filename = "tinymoonbit"

define i32 @factorial(i32 %0) {
entry:
  %1 = alloca i32, align 4
  store i32 %0, ptr %1, align 4
  %2 = load i32, ptr %1, align 4
  %3 = icmp sle i32 %2, 1
  br i1 %3, label %4, label %6

4:                                                ; preds = %entry
  ret i32 1

6:                                                ; preds = %entry
  %7 = load i32, ptr %1, align 4
  %8 = load i32, ptr %1, align 4
  %9 = sub i32 %8, 1
  %10 = call i32 @factorial(i32 %9)
  %11 = mul i32 %7, %10
  ret i32 %11
}

define void @main() {
entry:
  %0 = alloca i32, align 4
  %1 = call i32 @factorial(i32 5)
  store i32 %1, ptr %0, align 4
  %2 = load i32, ptr %0, align 4
  call void @print_int(i32 %2)
  ret void
}

declare void @print_int(i32 %0)

Generating RISC-V Assembly with llc

# Generate llvm ir
moon run main --target native > fact.ll

# Generate RISC-V 64-bit assembly code
llc -march=riscv64 -mattr=+m -o fact.s fact.ll

Generated RISC-V Assembly Snippet

factorial:
.Lfunc_begin0:
	.cfi_startproc
	addi	sp, sp, -32
	.cfi_def_cfa_offset 32
	sd	ra, 24(sp)
	.cfi_offset ra, -8
	sd	s0, 16(sp)
	.cfi_offset s0, -16
	addi	s0, sp, 32
	.cfi_def_cfa s0, 0
	sw	a0, -20(s0)
	lw	a0, -20(s0)
	li	a1, 1
	blt	a1, a0, .LBB0_2
	li	a0, 1
	j	.LBB0_3
.LBB0_2:
	lw	a0, -20(s0)
	lw	a1, -20(s0)
	addi	a1, a1, -1
	sw	a0, -24(s0)
	mv	a0, a1
	call	factorial
	lw	a1, -24(s0)
	mul	a0, a1, a0
.LBB0_3:
	ld	ra, 24(sp)
	ld	s0, 16(sp)
	addi	sp, sp, 32
	ret

Conclusion

Through this two-part series, we have completed a fully functional, albeit simple, compiler implementation. From the lexical analysis of a character stream to the construction of an abstract syntax tree, and finally to the generation of LLVM IR and machine code output.

Review

Part 1:

An elegant lexer based on pattern matching
Implementation of a recursive descent parser
A complete type-checking system
Scope management with an environment chain

Part 2:

A deep dive into the LLVM type and value systems
Variable management strategies in SSA form
Correct implementation of control flow instructions
A complete code generation pipeline

Moonbit's Advantages in Compiler Development

Through this practical project, we have gained a deep appreciation for Moonbit's unique value in the field of compiler construction:

Expressive Pattern Matching: Greatly simplifies the complexity of AST processing and type analysis.
Functional Programming Paradigm: Immutable data structures and pure functions make the compiler logic clearer and more reliable.
Modern Type System: Trait objects, generics, and error handling mechanisms provide ample abstraction capabilities.
Excellent Engineering Features: Features like derive and JSON serialization significantly improve development efficiency.

Final Words

Compiler technology represents the perfect combination of computer science theory and engineering practice. With a modern tool like Moonbit, we can explore this ancient yet vibrant field in a more elegant and efficient way.

We hope this series of articles will provide readers with a powerful aid on their journey into compiler design.

Recommended Learning Resources

Moonbit Official Documentation

llvm.mbt Documentation

llvm.mbt Project

LLVM Official Tutorial

Introduction​

Chapter 1: Representing the LLVM Type System in Moonbit​

Trait Objects: An Abstract Representation of Types​

Type Identification and Conversion​

Safe Type Conversion Strategies​

Constructing Composite Types​

Chapter 2: The LLVM Value System and the BasicValue Concept​

Practical Application Example​

Complete Classification of Value Types​

💡 Best Practices for Value Conversion​

Chapter 3: Practical LLVM IR Generation​

Infrastructure Initialization​

A Simple Function Generation Example​

Generated LLVM IR​

💡 Code Generation Best Practices​

Chapter 4: TinyMoonbit Compiler Implementation​

Type Mapping: From Parser to LLVM​

Example

Environment Management: Mapping Variables to Values​

Example

Variable Handling: Memory Allocation Strategy​

Expression Code Generation​

Chapter 5: Implementation of Control Flow Instructions​

Conditional Branches: Implementing if-else Statements​

Generated LLVM IR Example​

Loop Structures: Implementing while Statements​

Generated LLVM IR Example​

Chapter 6: From LLVM IR to Machine Code​

Compiling with the llc Toolchain​

Complete Compilation Flow Example​

Conclusion​

Review​

Moonbit's Advantages in Compiler Development​

Final Words​

Introduction

Chapter 1: Representing the LLVM Type System in Moonbit

Trait Objects: An Abstract Representation of Types

Type Identification and Conversion

Safe Type Conversion Strategies

Constructing Composite Types

Chapter 2: The LLVM Value System and the BasicValue Concept

Practical Application Example

Complete Classification of Value Types

💡 Best Practices for Value Conversion

Chapter 3: Practical LLVM IR Generation

Infrastructure Initialization

A Simple Function Generation Example

Generated LLVM IR

💡 Code Generation Best Practices

Chapter 4: TinyMoonbit Compiler Implementation

Type Mapping: From Parser to LLVM

Environment Management: Mapping Variables to Values

Variable Handling: Memory Allocation Strategy

Expression Code Generation

Chapter 5: Implementation of Control Flow Instructions

Conditional Branches: Implementing if-else Statements

Generated LLVM IR Example

Loop Structures: Implementing while Statements

Generated LLVM IR Example

Chapter 6: From LLVM IR to Machine Code

Compiling with the `llc` Toolchain

Complete Compilation Flow Example

Conclusion

Review

Moonbit's Advantages in Compiler Development

Final Words