Skip to main content

Two Approaches to Regex Engines: Derivative and Thompson VM

· 11 min read

Regular expression engines can be implemented using fundamentally different approaches, each with distinct trade-offs in performance, memory usage, and implementation complexity. This article explores two mathematically equivalent but practically different methods for regex matching: Brzozowski derivatives and Thompson's virtual machine approach.

Both methods operate on the same abstract syntax tree representation, providing a unified foundation for direct performance comparison. The key insight is how these seemingly different approaches solve identical problems through different computational strategies—one through algebraic transformation, the other through program execution.

Conventions & Definitions

To establish a common foundation, both regex engines start with a shared AST representation that captures the essential structure of regular expressions in a tree format:

enum Ast {
  
(Char) -> Ast
Chr
(
Char
Char
)
(Ast, Ast) -> Ast
Seq
(
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
,
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
)
(Ast, Int?) -> Ast
Rep
(
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
,
Int
Int
?)
(Ast) -> Ast
Opt
(
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
)
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Hash {
  hash_combine(Self, Hasher) -> Unit
  hash(Self) -> Int
}

Trait for types that can be hashed

The hash method should return a hash value for the type, which is used in hash tables and other data structures. The hash_combine method is used to combine the hash of the current value with another hash value, typically used to hash composite types.

When two values are equal according to the Eq trait, they should produce the same hash value.

The hash method does not need to be implemented if hash_combine is implemented, When implemented separately, hash does not need to produce a hash value that is consistent with hash_combine.

Hash
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
)

Additionally, we provide smart constructors to simplify regex construction:

fn 
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
::
(chr : Char) -> Ast
chr
(
Char
chr
:
Char
Char
) ->
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
{
(Char) -> Ast
Chr
(
Char
chr
)
} fn
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
::
(self : Ast, other : Ast) -> Ast
seq
(
Ast
self
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
,
Ast
other
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
) ->
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
{
(Ast, Ast) -> Ast
Seq
(
Ast
self
,
Ast
other
)
} fn
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
::
(self : Ast, n? : Int) -> Ast
rep
(
Ast
self
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
,
Int?
n
? :
Int
Int
) ->
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
{
(Ast, Int?) -> Ast
Rep
(
Ast
self
,
Int?
n
)
} fn
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
::
(self : Ast) -> Ast
opt
(
Ast
self
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
) ->
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
{
Unit
@fs.
(Ast) -> Ast
Opt
(
Ast
self
)
}

The AST defines four fundamental regex operations:

  1. Chr(Char) matches a single literal character.
  2. Seq(Ast, Ast) matches one pattern followed by another through concatenation.
  3. Rep(Ast, Int?) repeats a pattern either unlimited times when None or exactly n times when Some(n).
  4. Opt(Ast) makes a pattern optional, equivalent to pattern? in standard regex syntax.

For example, we can build the regex (ab*)?—an optional sequence of 'a' followed by zero or more 'b's—as:

Ast::chr('a').seq(Ast::chr('b').rep()).opt()

Brzozowski Derivative

The derivative-based approach transforms regular expressions algebraically using formal language theory. For each input character, it computes the "derivative" of the regex by asking: "what remains to be matched after consuming this character?" This creates a new regex representing the remaining pattern.

We extend the basic Ast type to represent derivatives and nullability explicitly:

enum Exp {
  
Exp
Nil
Exp
Eps
(Char) -> Exp
Chr
(
Char
Char
)
(Exp, Exp) -> Exp
Alt
(
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
,
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
)
(Exp, Exp) -> Exp
Seq
(
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
,
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
)
(Exp) -> Exp
Rep
(
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
)
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
,
trait Hash {
  hash_combine(Self, Hasher) -> Unit
  hash(Self) -> Int
}

Trait for types that can be hashed

The hash method should return a hash value for the type, which is used in hash tables and other data structures. The hash_combine method is used to combine the hash of the current value with another hash value, typically used to hash composite types.

When two values are equal according to the Eq trait, they should produce the same hash value.

The hash method does not need to be implemented if hash_combine is implemented, When implemented separately, hash does not need to produce a hash value that is consistent with hash_combine.

Hash
,
trait Eq {
  equal(Self, Self) -> Bool
  op_equal(Self, Self) -> Bool
}

Trait for types whose elements can test for equality

Eq
,
trait Compare {
  compare(Self, Self) -> Int
}

Trait for types whose elements are ordered

The return value of [compare] is:

  • zero, if the two arguments are equal
  • negative, if the first argument is smaller
  • positive, if the first argument is greater
Compare
)

The constructors in Exp represent:

  1. Nil represents an impossible pattern that can never match anything.
  2. Eps matches the empty string.
  3. Chr(Char) matches a single character.
  4. Alt(Exp, Exp) represents alternation, providing choice between patterns.
  5. Seq(Exp, Exp) represents concatenation of two patterns.
  6. Rep(Exp) represents repetition of a pattern.

We use the Exp::of_ast function to convert the Ast into the more expressive Exp format:

fn 
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
ast
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
) ->
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
{
match
Ast
ast
{
(Char) -> Ast
Chr
(
Char
c
) =>
(Char) -> Exp
Chr
(
Char
c
)
(Ast, Ast) -> Ast
Seq
(
Ast
a
,
Ast
b
) =>
(Exp, Exp) -> Exp
Seq
(
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
a
),
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
b
))
(Ast, Int?) -> Ast
Rep
(
Ast
a
,
Int?
None
) =>
(Exp) -> Exp
Rep
(
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
a
))
(Ast, Int?) -> Ast
Rep
(
Ast
a
,
(Int) -> Int?
Some
(
Int
n
)) => {
let
Exp
sec
=
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
a
)
let mut
Exp
exp
=
Exp
sec
for _ in
Int
1
..<
Int
n
{
Exp
exp
=
(Exp, Exp) -> Exp
Seq
(
Exp
exp
,
Exp
sec
)
}
Exp
exp
}
(Ast) -> Ast
Opt
(
Ast
a
) =>
(Exp, Exp) -> Exp
Alt
(
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
a
),
Exp
Eps
)
} }

We also provide smart constructors for Exp to simplify pattern building:

fn 
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(a : Exp, b : Exp) -> Exp
seq
(
Exp
a
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
,
Exp
b
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
) ->
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
{
match (
Exp
a
,
Exp
b
) {
(
Exp
Nil
, _) | (_,
Exp
Nil
) =>
Exp
Nil
(
Exp
Eps
,
Exp
b
) =>
Exp
b
(
Exp
a
,
Exp
Eps
) =>
Exp
a
(
Exp
a
,
Exp
b
) =>
(Exp, Exp) -> Exp
Seq
(
Exp
a
,
Exp
b
)
} }

However, the smart constructor for Alt is strictly necessary—it ensures that the constructed Exp is normalized to "similarity" as mentioned in the original paper by Brzozowski. Two regexes are similar if one can be reduced to the other by applying the following rules:

AAABBAA(BC)(AB)C \begin{align} & A \mid \emptyset &&\rightarrow A \\ & A \mid B &&\rightarrow B \mid A \\ & A \mid (B \mid C) &&\rightarrow (A \mid B) \mid C \end{align}

Therefore, we normalize the Alt construction to always use the same associativity and order of alternatives:

fn 
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(a : Exp, b : Exp) -> Exp
alt
(
Exp
a
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
,
Exp
b
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
) ->
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
{
match (
Exp
a
,
Exp
b
) {
(
Exp
Nil
,
Exp
b
) =>
Exp
b
(
Exp
a
,
Exp
Nil
) =>
Exp
a
(
(Exp, Exp) -> Exp
Alt
(
Exp
a
,
Exp
b
),
Exp
c
) =>
Exp
a
.
(a : Exp, b : Exp) -> Exp
alt
(
Exp
b
.
(a : Exp, b : Exp) -> Exp
alt
(
Exp
c
))
(
Exp
a
,
Exp
b
) => {
if
Exp
a
(Exp, Exp) -> Bool

automatically derived

==
Exp
b
{
Exp
a
} else if
Exp
a
(self_ : Exp, other : Exp) -> Bool
>
Exp
b
{
(Exp, Exp) -> Exp
Alt
(
Exp
b
,
Exp
a
)
} else {
(Exp, Exp) -> Exp
Alt
(
Exp
a
,
Exp
b
)
} } } }

The nullable function determines if a pattern can match the empty string without consuming input:

fn 
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(self : Exp) -> Bool
nullable
(
Exp
self
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
) ->
Bool
Bool
{
match
Exp
self
{
Exp
Nil
=> false
Exp
Eps
=> true
(Char) -> Exp
Chr
(_) => false
(Exp, Exp) -> Exp
Alt
(
Exp
l
,
Exp
r
) =>
Exp
l
.
(self : Exp) -> Bool
nullable
()
(Bool, Bool) -> Bool
||
Exp
r
.
(self : Exp) -> Bool
nullable
()
(Exp, Exp) -> Exp
Seq
(
Exp
l
,
Exp
r
) =>
Exp
l
.
(self : Exp) -> Bool
nullable
()
(Bool, Bool) -> Bool
&&
Exp
r
.
(self : Exp) -> Bool
nullable
()
(Exp) -> Exp
Rep
(_) => true
} }

The deriv function computes the derivative of a pattern with respect to a character, transforming the pattern based on the rules defined in the Brzozowski derivative. We have reordered the rules to match the order in the deriv function:

Da=Daϵ=Daa=ϵDab= for (ab)Da(PQ)=(DaP)(DaQ)Da(PQ)=(DaPQ)(ν(P)DaQ)Da(P)=DaPP \begin{align} D_{a} \emptyset &= \emptyset \\ D_{a} \epsilon &= \emptyset \\ D_{a} a &= \epsilon \\ D_{a} b &= \emptyset & \text{ for }(a \neq b) \\ D_{a} (P \mid Q) &= (D_{a} P) \mid (D_{a} Q) \\ D_{a} (P \cdot Q) &= (D_{a} P \cdot Q) \mid (\nu(P) \cdot D_{a} Q) \\ D_{a} (P\ast) &= D_{a} P \cdot P\ast \\ \end{align}
fn 
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(self : Exp, c : Char) -> Exp
deriv
(
Exp
self
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
,
Char
c
:
Char
Char
) ->
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
{
match
Exp
self
{
Exp
Nil
=>
Exp
self
Exp
Eps
=>
Exp
Nil
(Char) -> Exp
Chr
(
Char
d
) if
Char
d
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
Char
c
=>
Exp
Eps
(Char) -> Exp
Chr
(_) =>
Exp
Nil
(Exp, Exp) -> Exp
Alt
(
Exp
l
,
Exp
r
) =>
Exp
l
.
(self : Exp, c : Char) -> Exp
deriv
(
Char
c
).
(a : Exp, b : Exp) -> Exp
alt
(
Exp
r
.
(self : Exp, c : Char) -> Exp
deriv
(
Char
c
))
(Exp, Exp) -> Exp
Seq
(
Exp
l
,
Exp
r
) => {
let
Exp
dl
=
Exp
l
.
(self : Exp, c : Char) -> Exp
deriv
(
Char
c
)
if
Exp
l
.
(self : Exp) -> Bool
nullable
() {
Exp
dl
.
(a : Exp, b : Exp) -> Exp
seq
(
Exp
r
).
(a : Exp, b : Exp) -> Exp
alt
(
Exp
r
.
(self : Exp, c : Char) -> Exp
deriv
(
Char
c
))
} else {
Exp
dl
.
(a : Exp, b : Exp) -> Exp
seq
(
Exp
r
)
} }
(Exp) -> Exp
Rep
(
Exp
e
) =>
Exp
e
.
(self : Exp, c : Char) -> Exp
deriv
(
Char
c
).
(a : Exp, b : Exp) -> Exp
seq
(
Exp
self
)
} }

To simplify our implementation, we only perform strict matching—the pattern must match the entire input string. Therefore, we only check for nullability after the entire input has been consumed:

fn 
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(self : Exp, s : String) -> Bool
matches
(
Exp
self
:
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
,
String
s
:
String
String
) ->
Bool
Bool
{
loop (
Exp
self
,
String
s
.
(self : String, start_offset? : Int, end_offset? : Int) -> @string.View

Creates a View into a String.

Example

  let str = "Hello🤣🤣🤣"
  let view1 = str.view()
  inspect(view1, content=
   "Hello🤣🤣🤣"
  )
  let start_offset = str.offset_of_nth_char(1).unwrap()
  let end_offset = str.offset_of_nth_char(6).unwrap() // the second emoji
  let view2 = str.view(start_offset~, end_offset~)
  inspect(view2, content=
   "ello🤣"
  )
view
()) {
(
Exp
Nil
, _) => {
return false } (
Exp
e
, []) => {
return
Exp
e
.
(self : Exp) -> Bool
nullable
()
} (
Exp
e
,
@string.View
[
Char
c
@string.View
, .. s]
) => {
continue (
Exp
e
.
(self : Exp, c : Char) -> Exp
deriv
(
Char
c
),
@string.View
s
)
} } }

Virtual Machine

The VM approach compiles regular expressions into bytecode instructions for a simple virtual machine. This method transforms the pattern-matching problem into program execution, where the VM simulates all possible paths through a non-deterministic finite automaton simultaneously.

Ken Thompson's 1968 paper described a regex engine that compiled patterns into IBM 7094 machine code. The key insight was to avoid exponential backtracking by maintaining multiple execution threads that advance through input in lockstep, processing one character at a time across all possible matching paths.

Instruction Set and Program Representation

The VM operates on four fundamental instructions that correspond to NFA operations:

enum Ops {
  
Ops
Done
(Char) -> Ops
Char
(
Char
Char
)
(Int) -> Ops
Jump
(
Int
Int
)
(Int) -> Ops
Fork
(
Int
Int
)
} derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
)

Each instruction serves a specific purpose in NFA simulation. Done marks successful completion of pattern matching, equivalent to Thompson's original match. Char(c) consumes input character c and advances to the next instruction. Jump(addr) provides unconditional jump to instruction at address addr (Thompson's jmp). Fork(addr) creates two execution paths—one continues to the next instruction, another jumps to addr (Thompson's split).

The Fork instruction is crucial for handling non-determinism in patterns like alternation and repetition, where multiple execution paths must be explored simultaneously. This maps directly to NFA ε-transitions, where execution can spontaneously branch without consuming input.

We define a Prg that wraps an array of instructions with convenience methods for building and manipulating bytecode programs.

struct Prg(
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Ops {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show)
Ops
]) derive(
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
)
fn
type Prg Array[Ops] derive(Show)
Prg
::
(self : Prg, inst : Ops) -> Unit
push
(
Prg
self
:
type Prg Array[Ops] derive(Show)
Prg
,
Ops
inst
:
enum Ops {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show)
Ops
) ->
Unit
Unit
{
Prg
self
.
Array[Ops]
0
.
(self : Array[Ops], value : Ops) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
Ops
inst
)
} fn
type Prg Array[Ops] derive(Show)
Prg
::
(self : Prg) -> Int
length
(
Prg
self
:
type Prg Array[Ops] derive(Show)
Prg
) ->
Int
Int
{
Prg
self
.
Array[Ops]
0
.
(self : Array[Ops]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
()
} fn
type Prg Array[Ops] derive(Show)
Prg
::
(self : Prg, index : Int, inst : Ops) -> Unit
op_set
(
Prg
self
:
type Prg Array[Ops] derive(Show)
Prg
,
Int
index
:
Int
Int
,
Ops
inst
:
enum Ops {
  Done
  Char(Char)
  Jump(Int)
  Fork(Int)
} derive(Show)
Ops
) ->
Unit
Unit
{
Prg
self
.
Array[Ops]

Sets the element at the specified index in the array to a new value. The original value at that index is overwritten.

Parameters:

  • array : The array to modify.
  • index : The position in the array where the value will be set.
  • value : The new value to assign at the specified index.

Throws an error if index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  arr[1] = 42
  inspect(arr, content="[1, 42, 3]")
0
(Array[Ops], Int, Ops) -> Unit

Sets the element at the specified index in the array to a new value. The original value at that index is overwritten.

Parameters:

  • array : The array to modify.
  • index : The position in the array where the value will be set.
  • value : The new value to assign at the specified index.

Throws an error if index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  arr[1] = 42
  inspect(arr, content="[1, 42, 3]")
[
index] =
Ops
inst
}

AST Compilation to Bytecode

The Prg::of_ast function translates AST patterns into VM instructions using standard NFA construction techniques:

  1. Seq(a, b):

    code for a
    code for b
    
  2. Rep(a, None) (unbounded repetition):

        Fork L1, L2
    L1: code for a
        Jump L1
    L2:
    
  3. Rep(a, Some(n)) (fixed repetition):

    code for a
    code for a
    ... (n times) ...
    
  4. Opt(a) (optional):

        Fork L1, L2
    L1: code for a
    L2:
    

Note that the Fork constructor only accepts one address, because we always want to proceed to the next instruction after the Fork.

fn 
type Prg Array[Ops] derive(Show)
Prg
::
(ast : Ast) -> Prg
of_ast
(
Ast
ast
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
) ->
type Prg Array[Ops] derive(Show)
Prg
{
fn
(Prg, Ast) -> Unit
compile
(
Prg
prog
:
type Prg Array[Ops] derive(Show)
Prg
,
Ast
ast
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
) ->
Unit
Unit
{
match
Ast
ast
{
(Char) -> Ast
Chr
(
Char
chr
) =>
Prg
prog
.
(self : Prg, inst : Ops) -> Unit
push
(
(Char) -> Ops
Char
(
Char
chr
))
(Ast, Ast) -> Ast
Seq
(
Ast
l
,
Ast
r
) => {
(Prg, Ast) -> Unit
compile
(
Prg
prog
,
Ast
l
)
(Prg, Ast) -> Unit
compile
(
Prg
prog
,
Ast
r
)
}
(Ast, Int?) -> Ast
Rep
(
Ast
e
,
Int?
None
) => {
let
Int
fork
=
Prg
prog
.
(self : Prg) -> Int
length
()
Prg
prog
.
(self : Prg, inst : Ops) -> Unit
push
(
(Int) -> Ops
Fork
(0))
(Prg, Ast) -> Unit
compile
(
Prg
prog
,
Ast
e
)
Prg
prog
.
(self : Prg, inst : Ops) -> Unit
push
(
(Int) -> Ops
Jump
(
Int
fork
))
Prg
prog
(Prg, Int, Ops) -> Unit
[
fork] =
(Int) -> Ops
Fork
(
Prg
prog
.
(self : Prg) -> Int
length
())
}
(Ast, Int?) -> Ast
Rep
(
Ast
e
,
(Int) -> Int?
Some
(
Int
n
)) =>
for _ in
Int
0
..<
Int
n
{
(Prg, Ast) -> Unit
compile
(
Prg
prog
,
Ast
e
)
}
(Ast) -> Ast
Opt
(
Ast
e
) => {
let
Int
fork_inst
=
Prg
prog
.
(self : Prg) -> Int
length
()
Prg
prog
.
(self : Prg, inst : Ops) -> Unit
push
(
(Int) -> Ops
Fork
(0))
(Prg, Ast) -> Unit
compile
(
Prg
prog
,
Ast
e
)
Prg
prog
(Prg, Int, Ops) -> Unit
[
fork_inst] =
(Int) -> Ops
Fork
(
Prg
prog
.
(self : Prg) -> Int
length
())
} } } let
Prg
prog
:
type Prg Array[Ops] derive(Show)
Prg
= []
(Prg, Ast) -> Unit
compile
(
Prg
prog
,
Ast
ast
)
Prg
prog
.
(self : Prg, inst : Ops) -> Unit
push
(
Ops
Done
)
Prg
prog
}

VM Execution Loop

In Rob Pike's implementation, the VM executes one-past the end of the input string to handle the final acceptance state. To make this explicit, our matches function implements the core VM execution loop using a two-phase approach:

Phase 1 handles character processing. For each input character, it processes all active threads in the current context. Char instructions that match the current character create new threads in the next context. Jump and Fork instructions immediately spawn new threads in the current context. After processing all threads, it swaps contexts and continues with the next character.

Phase 2 handles final acceptance. After consuming all input, it processes remaining threads looking for Done instructions. It handles any final Jump/Fork instructions that don't consume input. It returns true if any thread reaches a Done instruction.

fn 
type Prg Array[Ops] derive(Show)
Prg
::
(self : Prg, data : @string.View) -> Bool
matches
(
Prg
self
:
type Prg Array[Ops] derive(Show)
Prg
,
@string.View
data
:
#builtin.valtype
type @string.View

A @string.View represents a view of a String that maintains proper Unicode character boundaries. It allows safe access to a substring while handling multi-byte characters correctly.

@string.View
) ->
Bool
Bool
{
let
(Array[Ops]) -> Prg
Prg
(
Array[Ops]
prog
) =
Prg
self
let mut
Ctx
curr
=
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(length : Int) -> Ctx
new
(
Array[Ops]
prog
.
(self : Array[Ops]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
())
let mut
Ctx
next
=
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(length : Int) -> Ctx
new
(
Array[Ops]
prog
.
(self : Array[Ops]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
())
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(0)
for
Char
c
in
@string.View
data
{
while
Ctx
curr
.
(self : Ctx) -> Int?
pop
() is
(Int) -> Int?
Some
(
Int
pc
) {
match
Array[Ops]
prog
(Array[Ops], Int) -> Ops

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
pc] {
Ops
Done
=> ()
(Char) -> Ops
Char
(
Char
char
) if
Char
char
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
Char
c
=> {
Ctx
next
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
pc
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
1)
}
(Int) -> Ops
Jump
(
Int
jump
) =>
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
jump
)
(Int) -> Ops
Fork
(
Int
fork
) => {
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
fork
)
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
pc
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
1)
} _ => () } } let
Ctx
temp
=
Ctx
curr
Ctx
curr
=
Ctx
next
Ctx
next
=
Ctx
temp
Ctx
next
.
(self : Ctx) -> Unit
reset
()
} while
Ctx
curr
.
(self : Ctx) -> Int?
pop
() is
(Int) -> Int?
Some
(
Int
pc
) {
match
Array[Ops]
prog
(Array[Ops], Int) -> Ops

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
pc] {
Ops
Done
=> return true
(Int) -> Ops
Jump
(
Int
x
) =>
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
x
)
(Int) -> Ops
Fork
(
Int
x
) => {
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
x
)
Ctx
curr
.
(self : Ctx, pc : Int) -> Unit
add
(
Int
pc
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
1)
} _ => () } } false }

In the original blog post, Rob Pike uses a recursive function to handle Fork and Jump instructions so that threads are executed according to their priorities. Instead, we use a stack-like structure to manage all threads of execution, which naturally respects thread priority:

struct Ctx {
  
@deque.Deque[Int]
deque
:
type @deque.Deque[A]
@deque.Deque
[
Int
Int
]
FixedArray[Bool]
visit
:
type FixedArray[A]
FixedArray
[
Bool
Bool
]
} fn
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(length : Int) -> Ctx
new
(
Int
length
:
Int
Int
) ->
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
{
{
@deque.Deque[Int]
deque
:
(capacity? : Int) -> @deque.Deque[Int]

Creates a new empty deque with an optional initial capacity.

Parameters:

  • capacity : The initial capacity of the deque. If not specified, defaults to 0 and will be automatically adjusted as elements are added.

Returns a new empty deque of type T[A] where A is the type of elements the deque will hold.

Example

  let dq : @deque.Deque[Int] = @deque.new()
  inspect(dq.length(), content="0")
  inspect(dq.capacity(), content="0")

  let dq : @deque.Deque[Int] = @deque.new(capacity=10)
  inspect(dq.length(), content="0")
  inspect(dq.capacity(), content="10")
@deque.new
(),
FixedArray[Bool]
visit
:
type FixedArray[A]
FixedArray
::
(len : Int, init : Bool) -> FixedArray[Bool]

Creates a new fixed-size array with the specified length, initializing all elements with the given value.

Parameters:

  • length : The length of the array to create. Must be non-negative.
  • initial_value : The value used to initialize all elements in the array.

Returns a new fixed-size array of type FixedArray[T] with length elements, where each element is initialized to initial_value.

Throws a panic if length is negative.

Example:

  let arr = FixedArray::make(3, 42)
  inspect(arr[0], content="42")
  inspect(arr.length(), content="3")

WARNING: A common pitfall is creating with the same initial value, for example:

  let two_dimension_array = FixedArray::make(10, FixedArray::make(10, 0))
  two_dimension_array[0][5] = 10
  assert_eq(two_dimension_array[5][5], 10)

This is because all the cells reference to the same object (the FixedArray[Int] in this case). One should use makei() instead which creates an object for each index.

make
(
Int
length
, false) }
} fn
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(self : Ctx, pc : Int) -> Unit
add
(
Ctx
self
:
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
,
Int
pc
:
Int
Int
) ->
Unit
Unit
{
if
Bool
!
Ctx
self
Bool
.
FixedArray[Bool]
visit
(FixedArray[Bool], Int) -> Bool

Retrieves an element at the specified index from a fixed-size array. This function implements the array indexing operator [].

Parameters:

  • array : The fixed-size array to access.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a runtime error if the index is out of bounds (negative or greater than or equal to the length of the array).

Example:

  let arr = FixedArray::make(3, 42)
  inspect(arr[1], content="42")
[
Bool
pc]
{
Ctx
self
.
@deque.Deque[Int]
deque
.
(self : @deque.Deque[Int], value : Int) -> Unit

Adds an element to the back of the deque.

If the deque is at capacity, it will be reallocated.

Example

  let dv = @deque.of([1, 2, 3, 4, 5])
  dv.push_back(6)
  assert_eq(dv.back(), Some(6))
push_back
(
Int
pc
)
Ctx
self
.
FixedArray[Bool]
visit
(FixedArray[Bool], Int, Bool) -> Unit

Sets a value at the specified index in a fixed-size array. The original value at that index is overwritten.

Parameters:

  • array : The fixed-size array to modify.
  • index : The position in the array where the value will be set.
  • value : The new value to assign at the specified index.

Throws a runtime error if the index is out of bounds (less than 0 or greater than or equal to the length of the array).

Example:

  let arr = [1, 2, 3]
  arr[1] = 42
  inspect(arr, content="[1, 42, 3]")
[
pc] = true
} } fn
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(self : Ctx) -> Int?
pop
(
Ctx
self
:
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
) ->
Int
Int
? {
match
Ctx
self
.
@deque.Deque[Int]
deque
.
(self : @deque.Deque[Int]) -> Int?

Removes a back element from a deque and returns it, or None if it is empty.

Example

  let dv = @deque.of([1, 2, 3, 4, 5])
  assert_eq(dv.pop_back(), Some(5))
pop_back
() {
(Int) -> Int?
Some
(
Int
pc
) => {
Ctx
self
.
FixedArray[Bool]
visit
(FixedArray[Bool], Int, Bool) -> Unit

Sets a value at the specified index in a fixed-size array. The original value at that index is overwritten.

Parameters:

  • array : The fixed-size array to modify.
  • index : The position in the array where the value will be set.
  • value : The new value to assign at the specified index.

Throws a runtime error if the index is out of bounds (less than 0 or greater than or equal to the length of the array).

Example:

  let arr = [1, 2, 3]
  arr[1] = 42
  inspect(arr, content="[1, 42, 3]")
[
pc] = false
(Int) -> Int?
Some
(
Int
pc
)
}
Int?
None
=>
Int?
None
} } fn
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
::
(self : Ctx) -> Unit
reset
(
Ctx
self
:
struct Ctx {
  deque: @deque.Deque[Int]
  visit: FixedArray[Bool]
}
Ctx
) ->
Unit
Unit
{
Ctx
self
.
@deque.Deque[Int]
deque
.
(self : @deque.Deque[Int]) -> Unit

Clears the deque, removing all values.

This method has no effect on the allocated capacity of the deque, only setting the length to 0.

Example

  let dv = @deque.of([1, 2, 3, 4, 5])
  dv.clear()
  inspect(dv.length(), content="0")
clear
()
Ctx
self
.
FixedArray[Bool]
visit
.
(self : FixedArray[Bool], value : Bool, start? : Int, end? : Int) -> Unit

Fill the array with a given value.

This method fills all or part of a FixedArray with the given value.

Parameters

  • value: The value to fill the array with
  • start: The starting index (inclusive, default: 0)
  • end: The ending index (exclusive, optional)

If end is not provided, fills from start to the end of the array. If start equals end, no elements are modified.

Panics

  • Panics if start is negative or greater than or equal to the array length
  • Panics if end is provided and is less than start or greater than array length
  • Does nothing if the array is empty

Example

// Fill entire array
let fa : FixedArray[Int] = [0, 0, 0, 0, 0]
fa.fill(3)
inspect(fa, content="[3, 3, 3, 3, 3]")

// Fill from index 1 to 3 (exclusive)
let fa2 : FixedArray[Int] = [0, 0, 0, 0, 0]
fa2.fill(9, start=1, end=3)
inspect(fa2, content="[0, 9, 9, 0, 0]")

// Fill from index 2 to end
let fa3 : FixedArray[String] = ["a", "b", "c", "d"]
fa3.fill("x", start=2)
inspect(fa3, content=(
  #|["a", "b", "x", "x"]
))
fill
(false)
}

The visit array is used to drop low-priority threads. When a new thread is added, we first check if it is already in the deque using the visit array. If it is, we drop it; otherwise, we add it to the deque and mark it as visited. This mechanism is necessary to avoid infinite loops or exponential blowup when the regex contains patterns that can be expanded indefinitely, such as (a?)*.

Benchmarks and Performance Analysis

The benchmark demonstrates both approaches on a pathological case that challenges many regex implementations:

test (
@bench.T
b
:
type @bench.T
@bench.T
) {
let
Int
n
= 15
let
String
txt
= "a".
(self : String, n : Int) -> String

Returns a new string with self repeated n times.

repeat
(
Int
n
)
let
Ast
chr
=
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
::
(chr : Char) -> Ast
chr
('a')
let
Ast
ast
:
enum Ast {
  Chr(Char)
  Seq(Ast, Ast)
  Rep(Ast, Int?)
  Opt(Ast)
} derive(Show, Hash, Eq)
Ast
=
Ast
chr
.
(self : Ast) -> Ast
opt
().
(self : Ast, n~ : Int) -> Ast
rep
(
Int
n
~).
(self : Ast, other : Ast) -> Ast
seq
(
Ast
chr
.
(self : Ast, n~ : Int) -> Ast
rep
(
Int
n
~))
let
Exp
exp
=
enum Exp {
  Nil
  Eps
  Chr(Char)
  Alt(Exp, Exp)
  Seq(Exp, Exp)
  Rep(Exp)
} derive(Show, Hash, Eq, Compare)
Exp
::
(ast : Ast) -> Exp
of_ast
(
Ast
ast
)
@bench.T
b
.
(self : @bench.T, name~ : String, f : () -> Unit, count? : UInt) -> Unit

Run a benchmark in batch mode

bench
(
String
name
="derive", () =>
Exp
exp
.
(self : Exp, s : String) -> Bool
matches
(
String
txt
) |>
(t : Bool) -> Unit

Evaluates an expression and discards its result. This is useful when you want to execute an expression for its side effects but don't care about its return value, or when you want to explicitly indicate that a value is intentionally unused.

Parameters:

  • value : The value to be ignored. Can be of any type.

Example:

  let x = 42
  ignore(x) // Explicitly ignore the value
  let mut sum = 0
  ignore([1, 2, 3].iter().each((x) => { sum = sum + x })) // Ignore the Unit return value of each()
ignore
())
let
Prg
tvm
=
type Prg Array[Ops] derive(Show)
Prg
::
(ast : Ast) -> Prg
of_ast
(
Ast
ast
)
@bench.T
b
.
(self : @bench.T, name~ : String, f : () -> Unit, count? : UInt) -> Unit

Run a benchmark in batch mode

bench
(
String
name
="thompson", () =>
Prg
tvm
.
(self : Prg, data : @string.View) -> Bool
matches
(
String
txt
) |>
(t : Bool) -> Unit

Evaluates an expression and discards its result. This is useful when you want to execute an expression for its side effects but don't care about its return value, or when you want to explicitly indicate that a value is intentionally unused.

Parameters:

  • value : The value to be ignored. Can be of any type.

Example:

  let x = 42
  ignore(x) // Explicitly ignore the value
  let mut sum = 0
  ignore([1, 2, 3].iter().each((x) => { sum = sum + x })) // Ignore the Unit return value of each()
ignore
())
}

This pattern (a?){n}a{n} represents a classical exponential blowup case for backtracking engines. The pattern allows n different ways to match n 'a' characters, creating exponential search spaces in naive implementations.

name     time (mean ± σ)         range (min … max)
derive     41.78 µs ±   0.14 µs    41.61 µs …  42.13 µs  in 10 ×   2359 runs
thompson   12.79 µs ±   0.04 µs    12.74 µs …  12.84 µs  in 10 ×   7815 runs

The benchmark results show that the VM approach is significantly faster than the derivative-based approach for this case. The derivative method frequently allocates intermediate regex structures, leading to higher overhead and slower performance. In contrast, the VM executes a fixed set of instructions and rarely allocates new structures once the deque grows to its full size.

However, the derivative approach is easier to reason about. We can easily prove termination of the algorithm, as the number of derivatives to be computed is bounded by the size of the AST and strictly decreases with each recursive application of the deriv function. The VM approach, on the other hand, can potentially run indefinitely if the input Prg contains infinite loops, and requires careful handling of thread priority to avoid infinite loops and exponential blowup in the number of threads.