MoonBit Pearls Vol.01ļ¼Implementing a Pratt Parser in MoonBit
Last week, the MoonBit community launched the "MoonBit Pearls" campaign, calling for high-quality documentation and code examples. After meticulous selection, we are proud to officially release the first featured article in the "MoonBit Pearls" column this week.
As a long-term knowledge repository, this column will continuously curate and showcase outstanding documentation. We encourage more developers to contribute in the future, collectively enriching the MoonBit community ecosystem.
Below is the content of the first submission. The author provides a comprehensive case study demonstrating how to implement a Pratt parser in MoonBit:
Implementing a Pratt Parser in MoonBitā
During the compilation process, syntax analysis (also known as parsing) is a critical step. The primary responsibility of a parser is to convert a stream of tokens into an Abstract Syntax Tree (AST).
This article introduces an implementation algorithm for parsers: Pratt Parsing, a top-down operator precedence parsing method, and demonstrates how to implement it using MoonBit.
Why Use a Pratt Parser?ā
Infix expressions are familiar to almost every programmer. Even staunch Lisp/Forth programmers are aware that a majority of the world writes arithmetic expressions like this:
24 * (x + 4)
For compiler (or interpreter) writers, such infix expressions are slightly more challenging to parse compared to the prefix expressions used in Lisp or the postfix expressions used in Forth.
For example, a naive handwritten recursive descent parser would require multiple mutually recursive functions and the elimination of left recursion during syntax analysis, making the code less maintainable as the number of operators grows. Parser generator tools are also not entirely satisfactory for this problem. Consider the BNF for a simple arithmetic expression with addition and multiplication:
Expr ::=
Factor
| Expr '+' Factor
Factor ::=
Atom
| Factor '*' Atom
Atom ::=
'number'
| '(' Expr ')'
This does not look very intuitive and might require revisiting formal language theory from university courses.
Some languages, like Haskell, support custom infix operators, which are almost impossible to handle simply with parser generator tools.
The Pratt parser elegantly solves the problem of parsing infix expressions while also being easily extensible to support new operators (without modifying the source code!). It is recommended alongside recursive descent parsers in the renowned compilation textbookĀ Crafting InterpretersĀ and is used in projects like rust-analyzer.
Binding Powerā
In Pratt parsers, the concept used to describe associativity and precedence is calledĀ binding power. For each infix operator, the binding power is represented as a pair of integersāone for the left and one for the right. For example:
expr: A + B * C
power: 3 3 5 5
As the name suggests, a higher number means higher priority in capturing an operand (in this example, A, B, and C are operands).
The above example demonstrates operators with different precedence levels, while the associativity of the same operator is represented by asymmetric binding powers.
expr: A + B + C
power: 1 2 1 2
In this case, when parsing reaches B, the expression is grouped as follows due to the higher binding power on the left:
expr: (A + B) + C
power: 1 2
Next, letās see how the Pratt parser uses this concept during execution.
Overview and Preparationā
The main framework of a Pratt parser looks like this:
fn parse(self : Tokens, min_bp : Int) -> SExpr ! ParseError {
...
while true {
parse(...)
}
...
}
As shown above, it is implemented using a combination of recursion and loops. This corresponds to two modes:
- The leftmost expression is always the innermost, e.g.,Ā
"1 + 2 + 3" = "(1 + 2) + 3"
, which can be parsed using just a loop. - The rightmost expression is always the innermost, e.g.,Ā
"1 + 2 + 3" = "1 + (2 + 3)"
, which can be parsed using recursion alone.
TheĀ min_bp
Ā parameter represents the binding power of the nearest unfinished operator on the left.
Our goal is to read a token stream and output a prefix expression without worrying about precedence:
enum SExpr {
(String) -> SExpr
Atom(String
String)
(Char, Array[SExpr]) -> SExpr
Cons(Char
Char, type Array[T]
An Array
is a collection of values that supports random access and can
grow in size.
Array[enum SExpr {
Atom(String)
Cons(Char, Array[SExpr])
}
SExpr])
}
impl trait Show {
output(Self, &Logger) -> Unit
to_string(Self) -> String
}
Trait for types that can be converted to String
Show for enum SExpr {
Atom(String)
Cons(Char, Array[SExpr])
}
SExpr with (self : SExpr, logger : &Logger) -> Unit
output(SExpr
self, &Logger
logger) {
match SExpr
self {
(String) -> SExpr
Atom(String
s) => &Logger
logger.(&Logger, String) -> Unit
write_string(String
s)
(Char, Array[SExpr]) -> SExpr
Cons(Char
op, Array[SExpr]
args) => {
&Logger
logger.(&Logger, Char) -> Unit
write_char('(')
&Logger
logger.(&Logger, Char) -> Unit
write_char(Char
op)
for Int
i = 0; Int
i (self_ : Int, other : Int) -> Bool
< Array[SExpr]
args.(self : Array[SExpr]) -> Int
Returns the number of elements in the array.
Parameters:
array
: The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length(); Int
i = Int
i (self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:
self
: The first integer operand.
other
: The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+ 1 {
&Logger
logger.(&Logger, Char) -> Unit
write_char(' ')
&Logger
logger.(&Logger, String) -> Unit
write_string(Array[SExpr]
args(Array[SExpr], Int) -> SExpr
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i].(self : SExpr) -> String
to_string())
}
&Logger
logger.(&Logger, Char) -> Unit
write_char(')')
}
}
}
test {
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((Char, Array[SExpr]) -> SExpr
Cons('+', [(String) -> SExpr
Atom("3"), (String) -> SExpr
Atom("4")]), String
content="(+ 3 4)")
}
Since errors may occur during this process, the return type ofĀ parseExpr
Ā isĀ SExpr ! ParseError
.
Before writing the parser, however, we need to split the input string into a simple token stream.
enum Token {
Token
LParen
Token
RParen
(String) -> Token
Operand(String
String)
(Char) -> Token
Operator(Char
Char)
Token
Eof
} derive((Token, &Logger) -> Unit
Trait for types that can be converted to String
Show, (Token, Token) -> Bool
Trait for types whose elements can test for equality
Eq)
struct Tokens {
mut Int
position : Int
Int
Array[Token]
tokens : type Array[T]
An Array
is a collection of values that supports random access and can
grow in size.
Array[enum Token {
LParen
RParen
Operand(String)
Operator(Char)
Eof
}
Token]
}
This token stream requires two methods:Ā peek()
Ā andĀ pop()
.
TheĀ peek()
Ā method retrieves the first token in the stream without modifying the stateāin other words, it is side-effect-free and merely "peeks" at the upcoming content. For an empty token stream, it returnsĀ Eof.
fn (self : Tokens) -> Token
peek(Tokens
self : struct Tokens {
mut position: Int
tokens: Array[Token]
}
Tokens) -> enum Token {
LParen
RParen
Operand(String)
Operator(Char)
Eof
}
Token {
if Tokens
self.Int
position (self_ : Int, other : Int) -> Bool
< Tokens
self.Array[Token]
tokens.(self : Array[Token]) -> Int
Returns the number of elements in the array.
Parameters:
array
: The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length() {
Tokens
self.Array[Token]
tokens.(self : Array[Token], idx : Int) -> Token
Retrieves the element at the specified index from an array without bounds
checking.
Parameters:
array
: The array from which to retrieve the element.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Example:
let arr = [1, 2, 3]
inspect(arr.unsafe_get(1), content="2")
unsafe_get(Tokens
self.Int
position)
} else {
Token
Eof
}
}
TheĀ pop()
Ā method consumes a token after peeking.
fn (self : Tokens) -> Token
pop(Tokens
self : struct Tokens {
mut position: Int
tokens: Array[Token]
}
Tokens) -> enum Token {
LParen
RParen
Operand(String)
Operator(Char)
Eof
}
Token {
if Tokens
self.Int
position (self_ : Int, other : Int) -> Bool
< Tokens
self.Array[Token]
tokens.(self : Array[Token]) -> Int
Returns the number of elements in the array.
Parameters:
array
: The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length() {
let Int
pos = Tokens
self.Int
position
Tokens
self.Int
position (self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:
self
: The first integer operand.
other
: The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+= 1
Tokens
self.Array[Token]
tokens.(self : Array[Token], idx : Int) -> Token
Retrieves the element at the specified index from an array without bounds
checking.
Parameters:
array
: The array from which to retrieve the element.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Example:
let arr = [1, 2, 3]
inspect(arr.unsafe_get(1), content="2")
unsafe_get(Int
pos)
} else {
Token
Eof
}
}
TheĀ tokenize
Ā function is responsible for converting a string into a token stream.
fn (this : Char) -> Bool
isDigit(Char
this : Char
Char) -> Bool
Bool {
Char
this is '0'..='9'
}
fn (this : Char) -> Bool
isAlpha(Char
this : Char
Char) -> Bool
Bool {
Char
this is 'A'..='Z' (Bool, Bool) -> Bool
|| Char
this is 'a'..='z'
}
fn (this : Char) -> Bool
isWhiteSpace(Char
this : Char
Char) -> Bool
Bool {
Char
this (self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:
self
: The first character to compare.
other
: The second character to compare.
Returns true
if both characters represent the same Unicode code point,
false
otherwise.
Example:
let a = 'A'
let b = 'A'
let c = 'B'
inspect(a == b, content="true")
inspect(a == c, content="false")
== ' ' (Bool, Bool) -> Bool
|| Char
this (self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:
self
: The first character to compare.
other
: The second character to compare.
Returns true
if both characters represent the same Unicode code point,
false
otherwise.
Example:
let a = 'A'
let b = 'A'
let c = 'B'
inspect(a == b, content="true")
inspect(a == c, content="false")
== '\t' (Bool, Bool) -> Bool
|| Char
this (self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:
self
: The first character to compare.
other
: The second character to compare.
Returns true
if both characters represent the same Unicode code point,
false
otherwise.
Example:
let a = 'A'
let b = 'A'
let c = 'B'
inspect(a == b, content="true")
inspect(a == c, content="false")
== '\n'
}
fn (this : Char) -> Bool
isOperator(Char
this : Char
Char) -> Bool
Bool {
let String
operators = "+-*/"
String
operators.(self : String, c : Char) -> Bool
Returns true if this string contains the given character.
contains_char(Char
this)
}
type! LexError Int
Int
fn (source : String) -> Tokens raise LexError
tokenize(String
source : String
String) -> struct Tokens {
mut position: Int
tokens: Array[Token]
}
Tokens!type! LexError Int
LexError {
let Array[Token]
tokens = []
let Array[Char]
source = String
source.(self : String) -> Array[Char]
Converts the String into an array of Chars.
to_array()
let StringBuilder
buf = type StringBuilder
StringBuilder::(size_hint~ : Int) -> StringBuilder
Creates a new string builder with an optional initial capacity hint.
Parameters:
size_hint
: An optional initial capacity hint for the internal buffer. If
less than 1, a minimum capacity of 1 is used. Defaults to 0. It is the size of bytes,
not the size of characters. size_hint
may be ignored on some platforms, JS for example.
Returns a new StringBuilder
instance with the specified initial capacity.
new(Int
size_hint = 100)
let mut Int
i = 0
while Int
i (self_ : Int, other : Int) -> Bool
< Array[Char]
source.(self : Array[Char]) -> Int
Returns the number of elements in the array.
Parameters:
array
: The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length() {
let Char
ch = Array[Char]
source.(self : Array[Char], idx : Int) -> Char
Retrieves the element at the specified index from an array without bounds
checking.
Parameters:
array
: The array from which to retrieve the element.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Example:
let arr = [1, 2, 3]
inspect(arr.unsafe_get(1), content="2")
unsafe_get(Int
i)
Int
i (self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:
self
: The first integer operand.
other
: The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+= 1
if Char
ch (self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:
self
: The first character to compare.
other
: The second character to compare.
Returns true
if both characters represent the same Unicode code point,
false
otherwise.
Example:
let a = 'A'
let b = 'A'
let c = 'B'
inspect(a == b, content="true")
inspect(a == c, content="false")
== '('{
Array[Token]
tokens.(self : Array[Token], value : Token) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
let v = []
v.push(3)
push(Token
LParen)
} else if Char
ch (self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:
self
: The first character to compare.
other
: The second character to compare.
Returns true
if both characters represent the same Unicode code point,
false
otherwise.
Example:
let a = 'A'
let b = 'A'
let c = 'B'
inspect(a == b, content="true")
inspect(a == c, content="false")
== ')' {
Array[Token]
tokens.(self : Array[Token], value : Token) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
let v = []
v.push(3)
push(Token
RParen)
} else if (this : Char) -> Bool
isOperator(Char
ch) {
Array[Token]
tokens.(self : Array[Token], value : Token) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
let v = []
v.push(3)
push((Char) -> Token
Operator(Char
ch))
} else if (this : Char) -> Bool
isAlpha(Char
ch) {
StringBuilder
buf.(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Char
ch)
while Int
i (self_ : Int, other : Int) -> Bool
< Array[Char]
source.(self : Array[Char]) -> Int
Returns the number of elements in the array.
Parameters:
array
: The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length() (Bool, Bool) -> Bool
&& ((this : Char) -> Bool
isAlpha(Array[Char]
source(Array[Char], Int) -> Char
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i]) (Bool, Bool) -> Bool
|| (this : Char) -> Bool
isDigit(Array[Char]
source(Array[Char], Int) -> Char
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i]) (Bool, Bool) -> Bool
|| Array[Char]
source(Array[Char], Int) -> Char
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i] (self : Char, other : Char) -> Bool
Compares two characters for equality.
Parameters:
self
: The first character to compare.
other
: The second character to compare.
Returns true
if both characters represent the same Unicode code point,
false
otherwise.
Example:
let a = 'A'
let b = 'A'
let c = 'B'
inspect(a == b, content="true")
inspect(a == c, content="false")
== '_') {
StringBuilder
buf.(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Array[Char]
source(Array[Char], Int) -> Char
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i])
Int
i (self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:
self
: The first integer operand.
other
: The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+= 1
}
Array[Token]
tokens.(self : Array[Token], value : Token) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
let v = []
v.push(3)
push((String) -> Token
Operand(StringBuilder
buf.(self : StringBuilder) -> String
Returns the current content of the StringBuilder as a string.
to_string()))
StringBuilder
buf.(self : StringBuilder) -> Unit
Resets the string builder to an empty state.
reset()
} else if (this : Char) -> Bool
isDigit(Char
ch) {
StringBuilder
buf.(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Char
ch)
while Int
i (self_ : Int, other : Int) -> Bool
< Array[Char]
source.(self : Array[Char]) -> Int
Returns the number of elements in the array.
Parameters:
array
: The array whose length is to be determined.
Returns the number of elements in the array as an integer.
Example:
let arr = [1, 2, 3]
inspect(arr.length(), content="3")
let empty : Array[Int] = []
inspect(empty.length(), content="0")
length() (Bool, Bool) -> Bool
&& (this : Char) -> Bool
isDigit(Array[Char]
source(Array[Char], Int) -> Char
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i]) {
StringBuilder
buf.(self : StringBuilder, ch : Char) -> Unit
Writes a character to the StringBuilder.
write_char(Array[Char]
source(Array[Char], Int) -> Char
Retrieves an element from the array at the specified index.
Parameters:
array
: The array to get the element from.
index
: The position in the array from which to retrieve the element.
Returns the element at the specified index.
Throws a panic if the index is negative or greater than or equal to the
length of the array.
Example:
let arr = [1, 2, 3]
inspect(arr[1], content="2")
[i])
Int
i (self : Int, other : Int) -> Int
Adds two 32-bit signed integers. Performs two's complement arithmetic, which
means the operation will wrap around if the result exceeds the range of a
32-bit integer.
Parameters:
self
: The first integer operand.
other
: The second integer operand.
Returns a new integer that is the sum of the two operands. If the
mathematical sum exceeds the range of a 32-bit integer (-2,147,483,648 to
2,147,483,647), the result wraps around according to two's complement rules.
Example:
inspect(42 + 1, content="43")
inspect(2147483647 + 1, content="-2147483648") // Overflow wraps around to minimum value
+= 1
}
Array[Token]
tokens.(self : Array[Token], value : Token) -> Unit
Adds an element to the end of the array.
If the array is at capacity, it will be reallocated.
Example
let v = []
v.push(3)
push((String) -> Token
Operand(StringBuilder
buf.(self : StringBuilder) -> String
Returns the current content of the StringBuilder as a string.
to_string()))
StringBuilder
buf.(self : StringBuilder) -> Unit
Resets the string builder to an empty state.
reset()
} else if (this : Char) -> Bool
isWhiteSpace(Char
ch) {
continue
} else {
raise (Int) -> LexError
LexError(Int
i)
}
} else {
return struct Tokens {
mut position: Int
tokens: Array[Token]
}
Tokens::{ Int
position : 0, Array[Token]
tokens }
}
}
test {
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((source : String) -> Tokens raise LexError
tokenize("(((((47)))))").Array[Token]
tokens, String
content=
#|[LParen, LParen, LParen, LParen, LParen, Operand("47"), RParen, RParen, RParen, RParen, RParen]
)
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((source : String) -> Tokens raise LexError
tokenize("13 + 6 + 5 * 3").Array[Token]
tokens, String
content=
#|[Operand("13"), Operator('+'), Operand("6"), Operator('+'), Operand("5"), Operator('*'), Operand("3")]
)
}
Finally, we need a function to compute the binding power of operators, which can be implemented simply usingĀ match
. In practice, a key-value container should be used for easy addition of new operators.
fn (op : Char) -> (Int, Int)?
infix_binding_power(Char
op : Char
Char) -> (Int
Int, Int
Int)? {
match Char
op {
'+' => ((Int, Int)) -> (Int, Int)?
Some((1, 2))
'-' => ((Int, Int)) -> (Int, Int)?
Some((1, 2))
'/' => ((Int, Int)) -> (Int, Int)?
Some((3, 4))
'*' => ((Int, Int)) -> (Int, Int)?
Some((3, 4))
_ => (Int, Int)?
None
}
}
Parser Implementationā
First, retrieve the first token and assign it to the variableĀ lhs
Ā (short for left-hand side, representing the left operand).
- If it is an operand, store it.
- If it is a left parenthesis, recursively parse the first expression and then consume a matching right parenthesis.
- Any other result indicates an error, which should be raised.
Next, we peek at the first operator:
- If the result isĀ
Eof
, this is not a failureāa single operand can be a complete expression, so we break the loop. - If the result is an operator, proceed normally.
- If the result is a right parenthesis, break the loop.
- Any other result raises aĀ
ParseError
.
We then need to determine which operator theĀ lhs
Ā belongs to. This is where theĀ min_bp
Ā parameter comes into play, representing the binding power of the nearest unfinished operator on the left. Its initial value is 0 (no operator is competing for the first operand). However, we first check if the operator is a parenthesisāif so, it means we are parsing an expression inside parentheses and should break the loop to end. This is one of the reasons for using theĀ peek
Ā method, as we cannot determine whether to consume the operator here.
After calculating the binding powerĀ (l_bp, r_bp)
Ā of the current operator, we compareĀ l_bp
Ā withĀ min_bp
:
- IfĀ
l_bp
Ā is less thanĀmin_bp
, immediately break, returningĀlhs
Ā to the higher-level operator waiting for the right operand. - Otherwise, consume the current operator usingĀ
pop
, recursively callĀparseExpr
Ā to obtain the right operand withĀr_bp
Ā as the newĀmin_bp
, and assign the result toĀlhs
Ā before continuing the loop.
type! ParseError (Int
Int, enum Token {
LParen
RParen
Operand(String)
Operator(Char)
Eof
}
Token) derive ((ParseError, &Logger) -> Unit
Trait for types that can be converted to String
Show)
fn (self : Tokens, min_bp~ : Int = ..) -> SExpr raise ParseError
parseExpr(Tokens
self : struct Tokens {
mut position: Int
tokens: Array[Token]
}
Tokens, Int
min_bp~ : Int
Int = 0) -> enum SExpr {
Atom(String)
Cons(Char, Array[SExpr])
}
SExpr ! type! ParseError (Int, Token)
ParseError {
let mut SExpr
lhs = match Tokens
self.(self : Tokens) -> Token
pop() {
Token
LParen => {
let SExpr
expr = Tokens
self.(self : Tokens, min_bp~ : Int = ..) -> SExpr raise ParseError
parseExpr()
if Tokens
self.(self : Tokens) -> Token
peek() is Token
RParen {
(t : Token) -> Unit
Evaluates an expression and discards its result. This is useful when you want
to execute an expression for its side effects but don't care about its return
value, or when you want to explicitly indicate that a value is intentionally
unused.
Parameters:
value
: The value to be ignored. Can be of any type.
Example:
let x = 42
ignore(x) // Explicitly ignore the value
let mut sum = 0
ignore([1, 2, 3].iter().each(fn(x) { sum = sum + x })) // Ignore the Unit return value of each()
ignore(Tokens
self.(self : Tokens) -> Token
pop())
SExpr
expr
} else {
raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position, Tokens
self.(self : Tokens) -> Token
peek()))
}
}
(String) -> Token
Operand(String
s) => (String) -> SExpr
Atom(String
s)
Token
t => raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position (self : Int, other : Int) -> Int
Performs subtraction between two 32-bit integers, following standard two's
complement arithmetic rules. When the result overflows or underflows, it
wraps around within the 32-bit integer range.
Parameters:
self
: The minuend (the number being subtracted from).
other
: The subtrahend (the number to subtract).
Returns the difference between self
and other
.
Example:
let a = 42
let b = 10
inspect(a - b, content="32")
let max = 2147483647 // Int maximum value
inspect(max - -1, content="-2147483648") // Overflow case
- 1, Token
t))
}
while true {
let Char
op = match Tokens
self.(self : Tokens) -> Token
peek() {
Token
Eof | Token
RParen => break
(Char) -> Token
Operator(Char
op) => Char
op
Token
t => raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position, Token
t))
}
guard (op : Char) -> (Int, Int)?
infix_binding_power(Char
op) is ((Int, Int)) -> (Int, Int)?
Some((Int
l_bp, Int
r_bp)) else {
raise ((Int, Token)) -> ParseError
ParseError((Tokens
self.Int
position, (Char) -> Token
Operator(Char
op)))
}
if Int
l_bp (self_ : Int, other : Int) -> Bool
< Int
min_bp {
break
}
(t : Token) -> Unit
Evaluates an expression and discards its result. This is useful when you want
to execute an expression for its side effects but don't care about its return
value, or when you want to explicitly indicate that a value is intentionally
unused.
Parameters:
value
: The value to be ignored. Can be of any type.
Example:
let x = 42
ignore(x) // Explicitly ignore the value
let mut sum = 0
ignore([1, 2, 3].iter().each(fn(x) { sum = sum + x })) // Ignore the Unit return value of each()
ignore(Tokens
self.(self : Tokens) -> Token
pop())
let SExpr
rhs = Tokens
self.(self : Tokens, min_bp~ : Int) -> SExpr raise ParseError
parseExpr(Int
min_bp = Int
r_bp)
SExpr
lhs = (Char, Array[SExpr]) -> SExpr
Cons(Char
op, [SExpr
lhs, SExpr
rhs])
continue
}
return SExpr
lhs
}
fn (s : String) -> SExpr raise Error
parse(String
s : String
String) -> enum SExpr {
Atom(String)
Cons(Char, Array[SExpr])
}
SExpr ! type Error
Error {
(source : String) -> Tokens raise LexError
tokenize(String
s).(self : Tokens, min_bp~ : Int = ..) -> SExpr raise ParseError
parseExpr()
}
Now we have an extensible parser for arithmetic expressions. Additional test cases can be added to verify its correctness:
test {
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((s : String) -> SExpr raise Error
parse("13 + 6 + 5 * 3"), String
content="(+ (+ 13 6) (* 5 3))")
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((s : String) -> SExpr raise Error
parse("3 * 3 + 5 * 5"), String
content="(+ (* 3 3) (* 5 5))")
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((s : String) -> SExpr raise Error
parse("(3 + 4) * 3 * (17 * 5)"), String
content="(* (* (+ 3 4) 3) (* 17 5))")
(obj : &Show, content~ : String, loc~ : SourceLoc = _, args_loc~ : ArgsLoc = _) -> Unit raise InspectError
Tests if the string representation of an object matches the expected content.
Used primarily in test cases to verify the correctness of Show
implementations and program outputs.
Parameters:
object
: The object to be inspected. Must implement the Show
trait.
content
: The expected string representation of the object. Defaults to
an empty string.
location
: Source code location information for error reporting.
Automatically provided by the compiler.
arguments_location
: Location information for function arguments in
source code. Automatically provided by the compiler.
Throws an InspectError
if the actual string representation of the object
does not match the expected content. The error message includes detailed
information about the mismatch, including source location and both expected
and actual values.
Example:
inspect(42, content="42")
inspect("hello", content="hello")
inspect([1, 2, 3], content="[1, 2, 3]")
inspect((s : String) -> SExpr raise Error
parse("(((47)))"), String
content="47")
}
However, the capabilities of Pratt parsers extend beyond this. They can also parse prefix operators (e.g., bitwise NOTĀ !n
), array indexing operatorsĀ arr[i]
, and even ternary operatorsĀ c ? e1 : e2
. For more detailed parsing techniques, refer toĀ Simple but Powerful Pratt Parsing. The author of this blog implemented an industrial-grade Pratt parser in the renowned program analysis tool rust-analyzer.