Skip to main content

MoonBit Pearls Vol.01:Implementing a Pratt Parser in MoonBit

· 8 min read
Zhu HongCheng

Last week, the MoonBit community launched the "MoonBit Pearls" campaign, calling for high-quality documentation and code examples. After meticulous selection, we are proud to officially release the first featured article in the "MoonBit Pearls" column this week.

As a long-term knowledge repository, this column will continuously curate and showcase outstanding documentation. We encourage more developers to contribute in the future, collectively enriching the MoonBit community ecosystem.

Below is the content of the first submission. The author provides a comprehensive case study demonstrating how to implement a Pratt parser in MoonBit:

Implementing a Pratt Parser in MoonBit

During the compilation process, syntax analysis (also known as parsing) is a critical step. The primary responsibility of a parser is to convert a stream of tokens into an Abstract Syntax Tree (AST).

This article introduces an implementation algorithm for parsers: Pratt Parsing, a top-down operator precedence parsing method, and demonstrates how to implement it using MoonBit.

Why Use a Pratt Parser?

Infix expressions are familiar to almost every programmer. Even staunch Lisp/Forth programmers are aware that a majority of the world writes arithmetic expressions like this:

24 * (x + 4)

For compiler (or interpreter) writers, such infix expressions are slightly more challenging to parse compared to the prefix expressions used in Lisp or the postfix expressions used in Forth.

For example, a naive handwritten recursive descent parser would require multiple mutually recursive functions and the elimination of left recursion during syntax analysis, making the code less maintainable as the number of operators grows. Parser generator tools are also not entirely satisfactory for this problem. Consider the BNF for a simple arithmetic expression with addition and multiplication:

Expr ::=
    Factor
    | Expr '+' Factor
Factor ::=
    Atom
    | Factor '*' Atom
Atom ::=
    'number'
    | '(' Expr ')'

This does not look very intuitive and might require revisiting formal language theory from university courses.

Some languages, like Haskell, support custom infix operators, which are almost impossible to handle simply with parser generator tools.

The Pratt parser elegantly solves the problem of parsing infix expressions while also being easily extensible to support new operators (without modifying the source code!). It is recommended alongside recursive descent parsers in the renowned compilation textbook Crafting Interpreters and is used in projects like rust-analyzer.

Binding Power

In Pratt parsers, the concept used to describe associativity and precedence is called binding power. For each infix operator, the binding power is represented as a pair of integers—one for the left and one for the right. For example:

expr:   A     +     B     *     C
power:     3     3     5     5

As the name suggests, a higher number means higher priority in capturing an operand (in this example, A, B, and C are operands).

The above example demonstrates operators with different precedence levels, while the associativity of the same operator is represented by asymmetric binding powers.

expr:   A     +     B     +     C
power:     1     2     1     2

In this case, when parsing reaches B, the expression is grouped as follows due to the higher binding power on the left:

expr:   (A + B)     +     C
power:           1     2

Next, let’s see how the Pratt parser uses this concept during execution.

Overview and Preparation

The main framework of a Pratt parser looks like this:

fn parse(self : Tokens, min_bp : Int) -> SExpr ! ParseError {
    ...
    while true {
       parse(...)
    }
    ...
}

As shown above, it is implemented using a combination of recursion and loops. This corresponds to two modes:

  • The leftmost expression is always the innermost, e.g., "1 + 2 + 3" = "(1 + 2) + 3", which can be parsed using just a loop.
  • The rightmost expression is always the innermost, e.g., "1 + 2 + 3" = "1 + (2 + 3)", which can be parsed using recursion alone.

The min_bp parameter represents the binding power of the nearest unfinished operator on the left.

Our goal is to read a token stream and output a prefix expression without worrying about precedence:

enum SExpr {
  
(String) -> SExpr
Atom
(
String
String
)
(Char, Array[SExpr]) -> SExpr
Cons
(
Char
Char
,
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr
])
} impl
trait Show {
  output(Self, &Logger) -> Unit
  to_string(Self) -> String
}

Trait for types that can be converted to String

Show
for
enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr
with
(self : SExpr, logger : &Logger) -> Unit
output
(
SExpr
self
,
&Logger
logger
) {
match
SExpr
self
{
(String) -> SExpr
Atom
(
String
s
) =>
&Logger
logger
.
(&Logger, String) -> Unit
write_string
(
String
s
)
(Char, Array[SExpr]) -> SExpr
Cons
(
Char
op
,
Array[SExpr]
args
) => {
&Logger
logger
.
(&Logger, Char) -> Unit
write_char
('(')
&Logger
logger
.
(&Logger, Char) -> Unit
write_char
(
Char
op
)
for
Int
i
= 0;
Int
i
(self_ : Int, other : Int) -> Bool
<
Array[SExpr]
args
.
(self : Array[SExpr]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
();
Int
i
=
Int
i
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+
1 {
&Logger
logger
.
(&Logger, Char) -> Unit
write_char
(' ')
&Logger
logger
.
(&Logger, String) -> Unit
write_string
(
Array[SExpr]
args
(Array[SExpr], Int) -> SExpr

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i].
(self : SExpr) -> String
to_string
())
}
&Logger
logger
.
(&Logger, Char) -> Unit
write_char
(')')
} } } test {
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(Char, Array[SExpr]) -> SExpr
Cons
('+', [
(String) -> SExpr
Atom
("3"),
(String) -> SExpr
Atom
("4")]),
String
content
="(+ 3 4)")
}

Since errors may occur during this process, the return type of parseExpr is SExpr ! ParseError.

Before writing the parser, however, we need to split the input string into a simple token stream.

enum Token {
  
Token
LParen
Token
RParen
(String) -> Token
Operand
(
String
String
)
(Char) -> Token
Operator
(
Char
Char
)
Token
Eof
} derive(
(Token, &Logger) -> Unit

Trait for types that can be converted to String

Show
,
(Token, Token) -> Bool

Trait for types whose elements can test for equality

Eq
)
struct Tokens { mut
Int
position
:
Int
Int
Array[Token]
tokens
:
type Array[T]

An Array is a collection of values that supports random access and can grow in size.

Array
[
enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
}
Token
]
}

This token stream requires two methods: peek() and pop().

The peek() method retrieves the first token in the stream without modifying the state—in other words, it is side-effect-free and merely "peeks" at the upcoming content. For an empty token stream, it returns Eof.

fn 
(self : Tokens) -> Token
peek
(
Tokens
self
:
struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens
) ->
enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
}
Token
{
if
Tokens
self
.
Int
position
(self_ : Int, other : Int) -> Bool
<
Tokens
self
.
Array[Token]
tokens
.
(self : Array[Token]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
() {
Tokens
self
.
Array[Token]
tokens
.
(self : Array[Token], idx : Int) -> Token

Retrieves the element at the specified index from an array without bounds checking.

Parameters:

  • array : The array from which to retrieve the element.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Example:

  let arr = [1, 2, 3]
  inspect(arr.unsafe_get(1), content="2")
unsafe_get
(
Tokens
self
.
Int
position
)
} else {
Token
Eof
} }

The pop() method consumes a token after peeking.

fn 
(self : Tokens) -> Token
pop
(
Tokens
self
:
struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens
) ->
enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
}
Token
{
if
Tokens
self
.
Int
position
(self_ : Int, other : Int) -> Bool
<
Tokens
self
.
Array[Token]
tokens
.
(self : Array[Token]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
() {
let
Int
pos
=
Tokens
self
.
Int
position
Tokens
self
.
Int
position
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+=
1
Tokens
self
.
Array[Token]
tokens
.
(self : Array[Token], idx : Int) -> Token

Retrieves the element at the specified index from an array without bounds checking.

Parameters:

  • array : The array from which to retrieve the element.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Example:

  let arr = [1, 2, 3]
  inspect(arr.unsafe_get(1), content="2")
unsafe_get
(
Int
pos
)
} else {
Token
Eof
} }

The tokenize function is responsible for converting a string into a token stream.

fn 
(this : Char) -> Bool
isDigit
(
Char
this
:
Char
Char
) ->
Bool
Bool
{
Char
this
is '0'..='9'
} fn
(this : Char) -> Bool
isAlpha
(
Char
this
:
Char
Char
) ->
Bool
Bool
{
Char
this
is 'A'..='Z'
(Bool, Bool) -> Bool
||
Char
this
is 'a'..='z'
} fn
(this : Char) -> Bool
isWhiteSpace
(
Char
this
:
Char
Char
) ->
Bool
Bool
{
Char
this
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
' '
(Bool, Bool) -> Bool
||
Char
this
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
'\t'
(Bool, Bool) -> Bool
||
Char
this
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
'\n'
} fn
(this : Char) -> Bool
isOperator
(
Char
this
:
Char
Char
) ->
Bool
Bool
{
let
String
operators
= "+-*/"
String
operators
.
(self : String, c : Char) -> Bool

Returns true if this string contains the given character.

contains_char
(
Char
this
)
} type! LexError
Int
Int
fn
(source : String) -> Tokens raise LexError
tokenize
(
String
source
:
String
String
) ->
struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens
!
type! LexError Int
LexError
{
let
Array[Token]
tokens
= []
let
Array[Char]
source
=
String
source
.
(self : String) -> Array[Char]

Converts the String into an array of Chars.

to_array
()
let
StringBuilder
buf
=
type StringBuilder
StringBuilder
::
(size_hint~ : Int) -> StringBuilder

Creates a new string builder with an optional initial capacity hint.

Parameters:

  • size_hint : An optional initial capacity hint for the internal buffer. If less than 1, a minimum capacity of 1 is used. Defaults to 0. It is the size of bytes, not the size of characters. size_hint may be ignored on some platforms, JS for example.

Returns a new StringBuilder instance with the specified initial capacity.

new
(
Int
size_hint
= 100)
let mut
Int
i
= 0
while
Int
i
(self_ : Int, other : Int) -> Bool
<
Array[Char]
source
.
(self : Array[Char]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
() {
let
Char
ch
=
Array[Char]
source
.
(self : Array[Char], idx : Int) -> Char

Retrieves the element at the specified index from an array without bounds checking.

Parameters:

  • array : The array from which to retrieve the element.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Example:

  let arr = [1, 2, 3]
  inspect(arr.unsafe_get(1), content="2")
unsafe_get
(
Int
i
)
Int
i
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+=
1
if
Char
ch
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
'('{
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
Token
LParen
)
} else if
Char
ch
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
')' {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
Token
RParen
)
} else if
(this : Char) -> Bool
isOperator
(
Char
ch
) {
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(Char) -> Token
Operator
(
Char
ch
))
} else if
(this : Char) -> Bool
isAlpha
(
Char
ch
) {
StringBuilder
buf
.
(self : StringBuilder, ch : Char) -> Unit

Writes a character to the StringBuilder.

write_char
(
Char
ch
)
while
Int
i
(self_ : Int, other : Int) -> Bool
<
Array[Char]
source
.
(self : Array[Char]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
()
(Bool, Bool) -> Bool
&&
(
(this : Char) -> Bool
isAlpha
(
Array[Char]
source
(Array[Char], Int) -> Char

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i])
(Bool, Bool) -> Bool
||
(this : Char) -> Bool
isDigit
(
Array[Char]
source
(Array[Char], Int) -> Char

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i])
(Bool, Bool) -> Bool
||
Array[Char]
source
(Array[Char], Int) -> Char

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i]
(self : Char, other : Char) -> Bool

Compares two characters for equality.

Parameters:

  • self : The first character to compare.
  • other : The second character to compare.

Returns true if both characters represent the same Unicode code point, false otherwise.

Example:

  let a = 'A'
  let b = 'A'
  let c = 'B'
  inspect(a == b, content="true")
  inspect(a == c, content="false")
==
'_') {
StringBuilder
buf
.
(self : StringBuilder, ch : Char) -> Unit

Writes a character to the StringBuilder.

write_char
(
Array[Char]
source
(Array[Char], Int) -> Char

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i])
Int
i
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+=
1
}
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Operand
(
StringBuilder
buf
.
(self : StringBuilder) -> String

Returns the current content of the StringBuilder as a string.

to_string
()))
StringBuilder
buf
.
(self : StringBuilder) -> Unit

Resets the string builder to an empty state.

reset
()
} else if
(this : Char) -> Bool
isDigit
(
Char
ch
) {
StringBuilder
buf
.
(self : StringBuilder, ch : Char) -> Unit

Writes a character to the StringBuilder.

write_char
(
Char
ch
)
while
Int
i
(self_ : Int, other : Int) -> Bool
<
Array[Char]
source
.
(self : Array[Char]) -> Int

Returns the number of elements in the array.

Parameters:

  • array : The array whose length is to be determined.

Returns the number of elements in the array as an integer.

Example:

  let arr = [1, 2, 3]
  inspect(arr.length(), content="3")
  let empty : Array[Int] = []
  inspect(empty.length(), content="0")
length
()
(Bool, Bool) -> Bool
&&
(this : Char) -> Bool
isDigit
(
Array[Char]
source
(Array[Char], Int) -> Char

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i]) {
StringBuilder
buf
.
(self : StringBuilder, ch : Char) -> Unit

Writes a character to the StringBuilder.

write_char
(
Array[Char]
source
(Array[Char], Int) -> Char

Retrieves an element from the array at the specified index.

Parameters:

  • array : The array to get the element from.
  • index : The position in the array from which to retrieve the element.

Returns the element at the specified index.

Throws a panic if the index is negative or greater than or equal to the length of the array.

Example:

  let arr = [1, 2, 3]
  inspect(arr[1], content="2")
[
i])
Int
i
(self : Int, other : Int) -> Int

Adds two 32-bit signed integers. Performs two's complement arithmetic, which means the operation will wrap around if the result exceeds the range of a 32-bit integer.

Parameters:

  • self : The first integer operand.
  • other : The second integer operand.

Returns a new integer that is the sum of the two operands. If the mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to 2,147,483,647), the result wraps around according to two's complement rules.

Example:

  inspect(42 + 1, content="43")
  inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+=
1
}
Array[Token]
tokens
.
(self : Array[Token], value : Token) -> Unit

Adds an element to the end of the array.

If the array is at capacity, it will be reallocated.

Example

  let v = []
  v.push(3)
push
(
(String) -> Token
Operand
(
StringBuilder
buf
.
(self : StringBuilder) -> String

Returns the current content of the StringBuilder as a string.

to_string
()))
StringBuilder
buf
.
(self : StringBuilder) -> Unit

Resets the string builder to an empty state.

reset
()
} else if
(this : Char) -> Bool
isWhiteSpace
(
Char
ch
) {
continue } else { raise
(Int) -> LexError
LexError
(
Int
i
)
} } else { return
struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens
::{
Int
position
: 0,
Array[Token]
tokens
}
} } test {
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(source : String) -> Tokens raise LexError
tokenize
("(((((47)))))").
Array[Token]
tokens
,
String
content
=
#|[LParen, LParen, LParen, LParen, LParen, Operand("47"), RParen, RParen, RParen, RParen, RParen] )
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(source : String) -> Tokens raise LexError
tokenize
("13 + 6 + 5 * 3").
Array[Token]
tokens
,
String
content
=
#|[Operand("13"), Operator('+'), Operand("6"), Operator('+'), Operand("5"), Operator('*'), Operand("3")] ) }

Finally, we need a function to compute the binding power of operators, which can be implemented simply using match. In practice, a key-value container should be used for easy addition of new operators.

fn 
(op : Char) -> (Int, Int)?
infix_binding_power
(
Char
op
:
Char
Char
) -> (
Int
Int
,
Int
Int
)? {
match
Char
op
{
'+' =>
((Int, Int)) -> (Int, Int)?
Some
((1, 2))
'-' =>
((Int, Int)) -> (Int, Int)?
Some
((1, 2))
'/' =>
((Int, Int)) -> (Int, Int)?
Some
((3, 4))
'*' =>
((Int, Int)) -> (Int, Int)?
Some
((3, 4))
_ =>
(Int, Int)?
None
} }

Parser Implementation

First, retrieve the first token and assign it to the variable lhs (short for left-hand side, representing the left operand).

  • If it is an operand, store it.
  • If it is a left parenthesis, recursively parse the first expression and then consume a matching right parenthesis.
  • Any other result indicates an error, which should be raised.

Next, we peek at the first operator:

  • If the result is Eof, this is not a failure—a single operand can be a complete expression, so we break the loop.
  • If the result is an operator, proceed normally.
  • If the result is a right parenthesis, break the loop.
  • Any other result raises a ParseError.

We then need to determine which operator the lhs belongs to. This is where the min_bp parameter comes into play, representing the binding power of the nearest unfinished operator on the left. Its initial value is 0 (no operator is competing for the first operand). However, we first check if the operator is a parenthesis—if so, it means we are parsing an expression inside parentheses and should break the loop to end. This is one of the reasons for using the peek method, as we cannot determine whether to consume the operator here.

After calculating the binding power (l_bp, r_bp) of the current operator, we compare l_bp with min_bp:

  • If l_bp is less than min_bp, immediately break, returning lhs to the higher-level operator waiting for the right operand.
  • Otherwise, consume the current operator using pop, recursively call parseExpr to obtain the right operand with r_bp as the new min_bp, and assign the result to lhs before continuing the loop.
type! ParseError (
Int
Int
,
enum Token {
  LParen
  RParen
  Operand(String)
  Operator(Char)
  Eof
}
Token
) derive (
(ParseError, &Logger) -> Unit

Trait for types that can be converted to String

Show
)
fn
(self : Tokens, min_bp~ : Int = ..) -> SExpr raise ParseError
parseExpr
(
Tokens
self
:
struct Tokens {
  mut position: Int
  tokens: Array[Token]
}
Tokens
,
Int
min_bp
~ :
Int
Int
= 0) ->
enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr
!
type! ParseError (Int, Token)
ParseError
{
let mut
SExpr
lhs
= match
Tokens
self
.
(self : Tokens) -> Token
pop
() {
Token
LParen
=> {
let
SExpr
expr
=
Tokens
self
.
(self : Tokens, min_bp~ : Int = ..) -> SExpr raise ParseError
parseExpr
()
if
Tokens
self
.
(self : Tokens) -> Token
peek
() is
Token
RParen
{
(t : Token) -> Unit

Evaluates an expression and discards its result. This is useful when you want to execute an expression for its side effects but don't care about its return value, or when you want to explicitly indicate that a value is intentionally unused.

Parameters:

  • value : The value to be ignored. Can be of any type.

Example:

  let x = 42
  ignore(x) // Explicitly ignore the value
  let mut sum = 0
  ignore([1, 2, 3].iter().each(fn(x) { sum = sum + x })) // Ignore the Unit return value of each()
ignore
(
Tokens
self
.
(self : Tokens) -> Token
pop
())
SExpr
expr
} else { raise
((Int, Token)) -> ParseError
ParseError
((
Tokens
self
.
Int
position
,
Tokens
self
.
(self : Tokens) -> Token
peek
()))
} }
(String) -> Token
Operand
(
String
s
) =>
(String) -> SExpr
Atom
(
String
s
)
Token
t
=> raise
((Int, Token)) -> ParseError
ParseError
((
Tokens
self
.
Int
position
(self : Int, other : Int) -> Int

Performs subtraction between two 32-bit integers, following standard two's complement arithmetic rules. When the result overflows or underflows, it wraps around within the 32-bit integer range.

Parameters:

  • self : The minuend (the number being subtracted from).
  • other : The subtrahend (the number to subtract).

Returns the difference between self and other.

Example:

  let a = 42
  let b = 10
  inspect(a - b, content="32")
  let max = 2147483647 // Int maximum value
  inspect(max - -1, content="-2147483648") // Overflow case
-
1,
Token
t
))
} while true { let
Char
op
= match
Tokens
self
.
(self : Tokens) -> Token
peek
() {
Token
Eof
|
Token
RParen
=> break
(Char) -> Token
Operator
(
Char
op
) =>
Char
op
Token
t
=> raise
((Int, Token)) -> ParseError
ParseError
((
Tokens
self
.
Int
position
,
Token
t
))
} guard
(op : Char) -> (Int, Int)?
infix_binding_power
(
Char
op
) is
((Int, Int)) -> (Int, Int)?
Some
((
Int
l_bp
,
Int
r_bp
)) else {
raise
((Int, Token)) -> ParseError
ParseError
((
Tokens
self
.
Int
position
,
(Char) -> Token
Operator
(
Char
op
)))
} if
Int
l_bp
(self_ : Int, other : Int) -> Bool
<
Int
min_bp
{
break }
(t : Token) -> Unit

Evaluates an expression and discards its result. This is useful when you want to execute an expression for its side effects but don't care about its return value, or when you want to explicitly indicate that a value is intentionally unused.

Parameters:

  • value : The value to be ignored. Can be of any type.

Example:

  let x = 42
  ignore(x) // Explicitly ignore the value
  let mut sum = 0
  ignore([1, 2, 3].iter().each(fn(x) { sum = sum + x })) // Ignore the Unit return value of each()
ignore
(
Tokens
self
.
(self : Tokens) -> Token
pop
())
let
SExpr
rhs
=
Tokens
self
.
(self : Tokens, min_bp~ : Int) -> SExpr raise ParseError
parseExpr
(
Int
min_bp
=
Int
r_bp
)
SExpr
lhs
=
(Char, Array[SExpr]) -> SExpr
Cons
(
Char
op
, [
SExpr
lhs
,
SExpr
rhs
])
continue } return
SExpr
lhs
} fn
(s : String) -> SExpr raise Error
parse
(
String
s
:
String
String
) ->
enum SExpr {
  Atom(String)
  Cons(Char, Array[SExpr])
}
SExpr
!
type Error
Error
{
(source : String) -> Tokens raise LexError
tokenize
(
String
s
).
(self : Tokens, min_bp~ : Int = ..) -> SExpr raise ParseError
parseExpr
()
}

Now we have an extensible parser for arithmetic expressions. Additional test cases can be added to verify its correctness:

test {
  
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(s : String) -> SExpr raise Error
parse
("13 + 6 + 5 * 3"),
String
content
="(+ (+ 13 6) (* 5 3))")
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(s : String) -> SExpr raise Error
parse
("3 * 3 + 5 * 5"),
String
content
="(+ (* 3 3) (* 5 5))")
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(s : String) -> SExpr raise Error
parse
("(3 + 4) * 3 * (17 * 5)"),
String
content
="(* (* (+ 3 4) 3) (* 17 5))")
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError

Tests if the string representation of an object matches the expected content. Used primarily in test cases to verify the correctness of Show implementations and program outputs.

Parameters:

  • object : The object to be inspected. Must implement the Show trait.
  • content : The expected string representation of the object. Defaults to an empty string.
  • location : Source code location information for error reporting. Automatically provided by the compiler.
  • arguments_location : Location information for function arguments in source code. Automatically provided by the compiler.

Throws an InspectError if the actual string representation of the object does not match the expected content. The error message includes detailed information about the mismatch, including source location and both expected and actual values.

Example:

  inspect(42, content="42")
  inspect("hello", content="hello")
  inspect([1, 2, 3], content="[1, 2, 3]")
inspect
(
(s : String) -> SExpr raise Error
parse
("(((47)))"),
String
content
="47")
}

However, the capabilities of Pratt parsers extend beyond this. They can also parse prefix operators (e.g., bitwise NOT !n), array indexing operators arr[i], and even ternary operators c ? e1 : e2. For more detailed parsing techniques, refer to Simple but Powerful Pratt Parsing. The author of this blog implemented an industrial-grade Pratt parser in the renowned program analysis tool rust-analyzer.