Dancing with LLVM: A Moonbit Chronicle (Part 2) - LLVM Backend Generation

Introduction
In a compiler, the frontend is responsible for understanding and verifying the structure and semantics of a program, while the backend takes on the task of translating these abstract concepts into executable machine code. Implementing a backend requires not only a deep understanding of the target architecture but also mastery of complex optimization techniques to generate efficient code.
LLVM (originally short for "Low Level Virtual Machine", though the project no longer treats the name as an acronym) is a comprehensive modern compiler infrastructure that provides a powerful and flexible solution. By converting a program into LLVM Intermediate Representation (IR), we can leverage LLVM's mature toolchain to compile the code for various target architectures, including RISC-V, ARM, and x86.
Moonbit's LLVM Ecosystem
Moonbit officially provides two important LLVM-related projects:
- llvm.mbt: Moonbit language bindings for the original LLVM, providing direct access to the llvm-c interface. It requires installing the complete LLVM toolchain, can only be used with Moonbit's native backend, and leaves compilation and linking to you, but it generates IR that is fully compatible with upstream LLVM.
- MoonLLVM: A pure Moonbit implementation of an LLVM-like system. It can generate LLVM IR without external dependencies and supports the JavaScript and WebAssembly backends.

This article uses llvm.mbt as its tool. Its API design is inspired by inkwell, the highly acclaimed LLVM library in the Rust ecosystem.
In the previous article, "Dancing with LLVM: A Moonbit Chronicle (Part 1) - Implementing the Frontend," we completed the conversion from source code to a typed abstract syntax tree. This article will build on that achievement, focusing on the core techniques and implementation details of code generation.
Chapter 1: Representing the LLVM Type System in Moonbit
Before diving into code generation, we first need to understand how llvm.mbt represents LLVM's various concepts within Moonbit's type system. LLVM's type system is quite complex, containing multiple levels such as basic types, composite types, and function types.
Trait Objects: An Abstract Representation of Types
In the API design of llvm.mbt, you will frequently encounter the core concept of &Type. This is not a concrete struct or enum but a trait object, which can be understood as the functional equivalent of an abstract base class in object-oriented programming.
  // &Type is a trait object representing any LLVM type
  let some_type: &Type = context.i32_type()
Type Identification and Conversion
To determine the specific type of a &Type, we need to perform a runtime type check using the as_type_enum interface:
  pub fn identify_type(ty: &Type) -> String {
    match ty.as_type_enum() {
      IntType(int_ty) => "Integer type with \{int_ty.get_bit_width()} bits"
      FloatType(float_ty) => "Floating point type"
      PointerType(ptr_ty) => "Pointer type"
      FunctionType(func_ty) => "Function type"
      ArrayType(array_ty) => "Array type"
      StructType(struct_ty) => "Structure type"
      VectorType(vec_ty) => "Vector type"
      ScalableVectorType(svec_ty) => "Scalable vector type"
      MetadataType(meta_ty) => "Metadata type"
    }
  }
Safe Type Conversion Strategies
Once we know, or need to verify, that a &Type has a specific type, there are two conversion strategies to choose from:
- Direct Conversion (for deterministic scenarios)

  let ty: &Type = context.i32_type()
  let i32_ty = ty.into_int_type() // Direct conversion, errors are handled by llvm.mbt
  let bit_width = i32_ty.get_bit_width() // Call a method specific to IntType

- Defensive Conversion (recommended for production environments)

  let ty: &Type = get_some_type() // An unknown type obtained from somewhere
  guard ty.as_type_enum() is IntType(i32_ty) else {
    raise CodeGenError("Expected integer type, got \{ty}")
  }
  // Now it's safe to use i32_ty
  let bit_width = i32_ty.get_bit_width()
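The difference between the two strategies can be modeled without LLVM at all. The following is a minimal, self-contained MoonBit sketch; ToyType and expect_int_width are hypothetical names invented for illustration and are not part of llvm.mbt. Instead of raising a CodeGenError, it returns a Result so that the failure case is visible in the function's type.

```moonbit
// Hypothetical toy stand-in for llvm.mbt's type enum (not the real API).
enum ToyType {
  ToyInt(Int) // integer type carrying its bit width
  ToyDouble // 64-bit floating point
  ToyPtr // opaque pointer
} derive(Show)

// Defensive conversion: inspect the variant first and return a readable
// error instead of assuming the type is an integer.
fn expect_int_width(ty : ToyType) -> Result[Int, String] {
  match ty {
    ToyInt(bits) => Ok(bits)
    other => Err("Expected integer type, got \{other}")
  }
}

fn main {
  match expect_int_width(ToyInt(32)) {
    Ok(bits) => println("bit width: \{bits}")
    Err(msg) => println(msg)
  }
  match expect_int_width(ToyPtr) {
    Ok(bits) => println("bit width: \{bits}")
    Err(msg) => println(msg)
  }
}
```

In a real code generator, the Err branch is where you would raise CodeGenError, as in the defensive version of the conversion.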
Constructing Composite Types
LLVM supports various composite types, which are usually constructed through methods of basic types:
  pub fn create_composite_types(context: @llvm.Context) -> Unit {
    let i32_ty = context.i32_type()
    let f64_ty = context.f64_type()

    // Array type: [16 x i32]
    let i32_array_ty = i32_ty.array_type(16)

    // Function type: i32 (i32, i32)
    let add_func_ty = i32_ty.fn_type([i32_ty, i32_ty])

    // Struct type: { i32, double } (LLVM IR spells the 64-bit float type "double")
    let struct_ty = context.struct_type([i32_ty, f64_ty])

    // Pointer type (all pointers are opaque in LLVM 18+)
    let ptr_ty = i32_ty.ptr_type()

    // Output type information for verification
    println("Array type: \{i32_array_ty}") // [16 x i32]
    println("Function type: \{add_func_ty}") // i32 (i32, i32)
    println("Struct type: \{struct_ty}") // { i32, double }
    println("Pointer type: \{ptr_ty}") // ptr
  }
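To see how composite types are assembled from basic ones and how each is spelled in textual IR, here is a small self-contained MoonBit model. IrType, render, and join_types are hypothetical names used only for this sketch; llvm.mbt works with real LLVM type objects rather than a hand-written enum. The rendering follows LLVM IR syntax: the 64-bit float type is written double, and literal structs are printed with spaces inside the braces.

```moonbit
// Hypothetical toy model of LLVM types (not the llvm.mbt API).
enum IrType {
  IntTy(Int) // iN
  F64Ty // 64-bit float, spelled `double` in LLVM IR
  PtrTy // opaque pointer, spelled `ptr`
  ArrayTy(IrType, Int) // [N x T]
  StructTy(Array[IrType]) // literal struct { T1, T2, ... }
  FnTy(IrType, Array[IrType]) // return type followed by parameter list
}

// Join the textual forms of several types with ", ".
fn join_types(tys : Array[IrType]) -> String {
  let mut out = ""
  for i = 0; i < tys.length(); i = i + 1 {
    if i > 0 {
      out = out + ", "
    }
    out = out + render(tys[i])
  }
  out
}

// Render a type in LLVM IR's textual syntax.
fn render(ty : IrType) -> String {
  match ty {
    IntTy(bits) => "i\{bits}"
    F64Ty => "double"
    PtrTy => "ptr"
    ArrayTy(elem, n) => "[\{n} x \{render(elem)}]"
    StructTy(fields) => "{ \{join_types(fields)} }"
    FnTy(ret, params) => "\{render(ret)} (\{join_types(params)})"
  }
}

fn main {
  let i32_ty = IntTy(32)
  println(render(ArrayTy(i32_ty, 16))) // [16 x i32]
  println(render(FnTy(i32_ty, [i32_ty, i32_ty]))) // i32 (i32, i32)
  println(render(StructTy([i32_ty, F64Ty]))) // { i32, double }
  println(render(PtrTy)) // ptr
}
```

Because IrType is recursive, nested types such as an array of structs fall out of the same render function with no extra cases.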
Important Reminder: Opaque Pointers
Opaque pointers became the default in LLVM 15, and typed pointers were removed entirely in LLVM 17, so in LLVM 18 and later all pointer types are opaque. This means that regardless of the type they point to, all pointers are represented as ptr in the IR, and the pointee type is no longer visible in the type system.
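Because a ptr no longer records what it points to, instructions that access memory state the accessed type themselves: a load is written as load i32, ptr %p instead of deriving i32 from the pointer's type. The tiny MoonBit sketch below only formats text; render_load is a hypothetical helper, not an llvm.mbt function. It prints two loads through the same pointer with different result types, which the type of %p alone could not distinguish.

```moonbit
// Hypothetical text-formatting helper (not part of llvm.mbt): with opaque
// pointers, the loaded type is an explicit part of the load instruction.
fn render_load(result : String, loaded_ty : String, ptr : String) -> String {
  "\{result} = load \{loaded_ty}, ptr \{ptr}"
}

fn main {
  // The same opaque pointer %p can be read as an i32 or as a double;
  // only the load instruction says which.
  println(render_load("%x", "i32", "%p")) // %x = load i32, ptr %p
  println(render_load("%y", "double", "%p")) // %y = load double, ptr %p
}
```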
Chapter 2: The LLVM Value System and the BasicValue Concept
Compared to the type system, LLVM's value system is more complex. Following inkwell, llvm.mbt divides values into two abstraction layers, Value and BasicValue, which separate how a value is created from how it is used:
- Value: Focuses on how a value is produced (e.g., constants, instruction results).
- BasicValue: Focuses on what basic type a value has (e.g., integer, float, pointer).
Practical Application Example
  pub fn demonstrate_value_system(context: Context, builder: Builder) -> Unit {
    let i32_ty = context.i32_type()

    // Create two integer constants - these are directly IntValue
    let const1 = i32_ty.const_int(10) // Value: IntValue, BasicValue: IntValue
    let const2 = i32_ty.const_int(20) // Value: IntValue, BasicValue: IntValue

    // Perform an addition operation - the result is normally an InstructionValue.
    // Caveat: LLVM's IRBuilder constant-folds operations whose operands are all
    // constants, so here the call may return the constant 30 rather than an add
    // instruction; with non-constant operands (e.g. function arguments) it
    // produces a real instruction.
    let add_result = builder.build_int_add(const1, const2)

    // In different contexts, we need different perspectives:
    // As an instruction to check its properties
    let instruction = add_result.as_instruction()
    println("Instruction opcode: \{instruction.get_opcode()}")

    // As a basic value to get its type
    let basic_value = add_result.into_basic_value()
    println("Result type: \{basic_value.get_type()}")

    // As an integer value for subsequent calculations
    let int_value = add_result.into_int_value()
    let final_result = builder.build_int_mul(int_value, const1)
  }
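The key idea in this example is that an instruction's result is itself a value that later instructions can consume. The self-contained MoonBit sketch below models just that with a toy builder that emits textual IR. ToyBuilder and binop are hypothetical names, not llvm.mbt's Builder API, and the operands %a and %b stand in for function arguments.

```moonbit
// Hypothetical toy builder (not llvm.mbt's Builder): each binop call appends
// one instruction and returns its fresh SSA name, which is itself a value.
struct ToyBuilder {
  mut next_id : Int
  lines : Array[String]
}

fn ToyBuilder::new() -> ToyBuilder {
  ToyBuilder::{ next_id: 0, lines: [] }
}

// Emit `%N = <op> i32 lhs, rhs` and return the new name "%N".
fn ToyBuilder::binop(self : ToyBuilder, op : String, lhs : String, rhs : String) -> String {
  let name = "%\{self.next_id}"
  self.next_id = self.next_id + 1
  self.lines.push("\{name} = \{op} i32 \{lhs}, \{rhs}")
  name
}

fn main {
  let b = ToyBuilder::new()
  let sum = b.binop("add", "%a", "%b") // "%0"
  let product = b.binop("mul", sum, "%a") // "%1", consumes the add's result
  for i = 0; i < b.lines.length(); i = i + 1 {
    println(b.lines[i])
  }
  println("final value: \{product}")
}
```

Returning the result name from binop mirrors why build_int_add hands back a value: the caller threads it directly into the next instruction.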
Complete Classification of Value Types
- ValueEnum: All possible value types

  pub enum ValueEnum {
    IntValue(IntValue)                       // Integer value
    FloatValue(FloatValue)                   // Floating-point value
    PointerValue(PointerValue)               // Pointer value
    StructValue(StructValue)                 // Struct value
    FunctionValue(FunctionValue)             // Function value
    ArrayValue(ArrayValue)                   // Array value
    VectorValue(VectorValue)                 // Vector value
    PhiValue(PhiValue)                       // Phi node value
    ScalableVectorValue(ScalableVectorValue) // Scalable vector value
    MetadataValue(MetadataValue)             // Metadata value
    CallSiteValue(CallSiteValue)             // Call site value
    GlobalValue(GlobalValue)                 // Global value
    InstructionValue(InstructionValue)       // Instruction value
  } derive(Show)

- BasicValueEnum: Values that have a basic type

  pub enum BasicValueEnum {
    ArrayValue(ArrayValue)                   // Array value
    IntValue(IntValue)                       // Integer value
    FloatValue(FloatValue)                   // Floating-point value
    PointerValue(PointerValue)               // Pointer value
    StructValue(StructValue)                 // Struct value
    VectorValue(VectorValue)                 // Vector value
    ScalableVectorValue(ScalableVectorValue) // Scalable vector value
  } derive(Show)
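The relationship between the two enums can be sketched with a simplified MoonBit model. Every variant of BasicValueEnum also appears in ValueEnum, but variants such as FunctionValue and MetadataValue have no BasicValueEnum counterpart, so converting a general value into a basic value is a partial operation. ToyValue, ToyBasicValue, and to_basic are hypothetical names with payloads reduced to strings; they are not llvm.mbt's definitions.

```moonbit
// Hypothetical toy mirrors of a few ValueEnum / BasicValueEnum variants.
enum ToyValue {
  IntV(String)
  FloatV(String)
  PointerV(String)
  FunctionV(String)
  MetadataV(String)
} derive(Show)

enum ToyBasicValue {
  BasicInt(String)
  BasicFloat(String)
  BasicPointer(String)
} derive(Show, Eq)

// Partial conversion: only variants that also exist in the basic enum map
// across; the others yield None.
fn to_basic(v : ToyValue) -> ToyBasicValue? {
  match v {
    IntV(name) => Some(BasicInt(name))
    FloatV(name) => Some(BasicFloat(name))
    PointerV(name) => Some(BasicPointer(name))
    FunctionV(_) => None
    MetadataV(_) => None
  }
}

fn main {
  let values = [IntV("%0"), FunctionV("@main"), PointerV("%p")]
  for i = 0; i < values.length(); i = i + 1 {
    match to_basic(values[i]) {
      Some(bv) => println("\{values[i]} -> \{bv}")
      None => println("\{values[i]} has no basic-value view")
    }
  }
}
```

Returning an Option makes the "not every value is basic" rule explicit at each call site, which is the same reason llvm.mbt keeps the two enums separate.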