Module scrolls.ast
The parser implementation.
Using The Parser
Quickstart
Often, all you need to do is parse a script and get the syntax tree. To do this:
import scrolls
script = "..."
tokenizer = scrolls.Tokenizer(script)
ast = scrolls.parse_scroll(tokenizer)
The AST (Abstract Syntax Tree) is a generic structure that represents the semantic content of a script. This structure is what is actually interpreted by the Scrolls interpreter. See scrolls.interpreter for more detail on AST interpretation. See the following sections for a more detailed description of the parsing process.
Tokenizing
Parsing is done in two stages: lexical analysis (tokenizing) and syntactic analysis. First, the Tokenizer is used to break a script into a list of pieces, assigning meaning to each. These pieces are called tokens (see Token).
>>> import scrolls
>>> script = """
... !repeat(4) {
... print "Hello, world!"
... }
... """
>>> tokenizer = scrolls.Tokenizer(script)
>>> tokens = tokenizer.get_all_tokens()
>>> for tok in tokens:
... print(tok)
...
CONTROL_SIGIL:'!'
STRING_LITERAL:'repeat'
OPEN_PAREN:'('
STRING_LITERAL:'4'
CLOSE_PAREN:')'
OPEN_BLOCK:'{'
COMMAND_SEP:'\n'
STRING_LITERAL:'print'
STRING_LITERAL:'Hello, world!'
COMMAND_SEP:'\n'
CLOSE_BLOCK:'}'
EOF:''
>>>
Each token pairs a TokenType with an associated value. For instance, the second token shown above, STRING_LITERAL:'repeat', is a string literal token with the value repeat.
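A token can be modeled as a small immutable pair of a type and a value. The sketch below is a hypothetical stand-in for the scrolls Token and TokenType, limited to the types printed above; formatting the value with repr() is an assumption chosen because it reproduces the TYPE:'value' display, including the escaped newline in COMMAND_SEP:'\n'.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical subset, named after the token types printed above.
    CONTROL_SIGIL = auto()
    STRING_LITERAL = auto()
    OPEN_PAREN = auto()
    CLOSE_PAREN = auto()
    OPEN_BLOCK = auto()
    CLOSE_BLOCK = auto()
    COMMAND_SEP = auto()
    EOF = auto()


@dataclass(frozen=True)
class Token:
    """Hypothetical token: a type paired with the text it was built from."""
    type: TokenType
    value: str

    def __str__(self) -> str:
        # repr() quotes the value and escapes newlines, giving TYPE:'value'.
        return f"{self.type.name}:{self.value!r}"


tok = Token(TokenType.STRING_LITERAL, "repeat")
print(tok)            # STRING_LITERAL:'repeat'
print(tok.type.name)  # STRING_LITERAL
print(tok.value)      # repeat
```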
Note
Typically, you won't need to pull tokens from the Tokenizer yourself; you only need to configure it. It is still helpful to understand what it does.
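To make lexical analysis concrete, here is a toy tokenizer. It is a hypothetical sketch, not scrolls.Tokenizer: the token set is limited to the types in the example output, and the newline rules (skip leading newlines, collapse repeated ones, drop a final newline before EOF) are assumptions chosen so that the example script yields the same token stream as the doctest above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    # Hypothetical subset, named after the token types printed above.
    CONTROL_SIGIL = auto()
    STRING_LITERAL = auto()
    OPEN_PAREN = auto()
    CLOSE_PAREN = auto()
    OPEN_BLOCK = auto()
    CLOSE_BLOCK = auto()
    COMMAND_SEP = auto()
    EOF = auto()


@dataclass(frozen=True)
class Token:
    type: TokenType
    value: str

    def __str__(self) -> str:
        return f"{self.type.name}:{self.value!r}"


# Single-character tokens (an assumed set, based on the example above).
PUNCTUATION = {
    "!": TokenType.CONTROL_SIGIL,
    "(": TokenType.OPEN_PAREN,
    ")": TokenType.CLOSE_PAREN,
    "{": TokenType.OPEN_BLOCK,
    "}": TokenType.CLOSE_BLOCK,
}
WORD_STOP = set(' \t\n"') | set(PUNCTUATION)


def toy_tokenize(script: str) -> list[Token]:
    """Split a script into tokens, always ending with an EOF token."""
    tokens: list[Token] = []
    i = 0
    while i < len(script):
        ch = script[i]
        if ch in " \t":
            i += 1  # spaces and tabs only separate words
        elif ch == "\n":
            # Skip leading newlines and collapse runs into one separator.
            if tokens and tokens[-1].type is not TokenType.COMMAND_SEP:
                tokens.append(Token(TokenType.COMMAND_SEP, ch))
            i += 1
        elif ch in PUNCTUATION:
            tokens.append(Token(PUNCTUATION[ch], ch))
            i += 1
        elif ch == '"':
            end = script.find('"', i + 1)
            if end == -1:
                raise ValueError(f"unterminated string starting at index {i}")
            # The quoted text becomes a single literal, without the quotes.
            tokens.append(Token(TokenType.STRING_LITERAL, script[i + 1:end]))
            i = end + 1
        else:
            # A bare word runs until whitespace, a quote, or punctuation.
            start = i
            while i < len(script) and script[i] not in WORD_STOP:
                i += 1
            tokens.append(Token(TokenType.STRING_LITERAL, script[start:i]))
    # Drop a trailing separator so the stream ends directly in EOF.
    if tokens and tokens[-1].type is TokenType.COMMAND_SEP:
        tokens.pop()
    tokens.append(Token(TokenType.EOF, ""))
    return tokens


script = """
!repeat(4) {
    print "Hello, world!"
}
"""
for tok in toy_tokenize(script):
    print(tok)
```

Quoted text becomes one STRING_LITERAL with the quotes stripped, which is why Hello, world! arrives as a single token even though it contains a space.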
Syntactic Analysis
The tokens are analyzed for their syntactic structure, and a data structure is built based on it. The analysis starts at parse_scroll(). This function automatically pulls tokens from a Tokenizer object and generates the corresponding AST.
>>> import scrolls
>>> script = """
... !repeat(4) {
... print "Hello, world!"
... }
... """
>>> tokenizer = scrolls.Tokenizer(script)
>>> ast = scrolls.parse_scroll(tokenizer)
>>> print(ast.prettify())
{
"_tok": "None",
"_type": "ROOT",
"children": [
{
"_tok": "CONTROL_SIGIL:'!'",
"_type": "CONTROL_CALL",
"children": [
{
"_tok": "STRING_LITERAL:'repeat'",
"_type": "STRING",
"children": []
},
{
"_tok": "OPEN_PAREN:'('",
"_type": "CONTROL_ARGUMENTS",
"children": [
{
"_tok": "STRING_LITERAL:'4'",
"_type": "STRING",
"children": []
}
]
},
{
"_tok": "OPEN_BLOCK:'{'",
"_type": "BLOCK",
"children": [
{
"_tok": "STRING_LITERAL:'print'",
"_type": "COMMAND_CALL",
"children": [
{
"_tok": "STRING_LITERAL:'print'",
"_type": "STRING",
"children": []
},
{
"_tok": "STRING_LITERAL:'Hello, world!'",
"_type": "COMMAND_ARGUMENTS",
"children": [
{
"_tok": "STRING_LITERAL:'Hello, world!'",
"_type": "STRING",
"children": []
}
]
}
]
}
]
}
]
}
]
}
AST instances consist of a tree of ASTNode objects. Each node keeps track of the token that triggered its generation. This is used primarily for informative display of errors during interpreter runtime.
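The node structure can be sketched as a small dataclass that holds a type, the triggering token, and its children. Everything here is hypothetical: the class names mirror the prettify() output above, the line/col fields on the token are assumptions added only to show how a stored token lets an error point back at the source, and the JSON layout is produced with json.dumps(sort_keys=True), which happens to order the keys as "_tok", "_type", "children".

```python
import json
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class ASTNodeType(Enum):
    # Hypothetical subset, named after the node types in the prettify() output.
    ROOT = auto()
    CONTROL_CALL = auto()
    CONTROL_ARGUMENTS = auto()
    COMMAND_CALL = auto()
    COMMAND_ARGUMENTS = auto()
    BLOCK = auto()
    STRING = auto()


@dataclass(frozen=True)
class Token:
    # Hypothetical token; the line/col fields are assumptions for this sketch.
    type_name: str
    value: str
    line: int
    col: int

    def __str__(self) -> str:
        return f"{self.type_name}:{self.value!r}"


@dataclass
class ASTNode:
    type: ASTNodeType
    tok: Optional[Token]  # token that triggered this node; None for ROOT
    children: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "_tok": str(self.tok),  # str(None) gives "None" for the root
            "_type": self.type.name,
            "children": [child.to_dict() for child in self.children],
        }

    def prettify(self) -> str:
        # Sorted keys put "_tok", "_type", "children" in that order.
        return json.dumps(self.to_dict(), sort_keys=True, indent=4)


def node_error(node: ASTNode, message: str) -> str:
    """Format an error message that points back at the node's token."""
    if node.tok is None:
        return f"error: {message}"
    return f"line {node.tok.line}, col {node.tok.col}: {message} (near {node.tok})"


# A misspelled control name, as a runtime error would report it.
name = ASTNode(ASTNodeType.STRING, Token("STRING_LITERAL", "repaet", 1, 2))
call = ASTNode(ASTNodeType.CONTROL_CALL, Token("CONTROL_SIGIL", "!", 1, 1), [name])
root = ASTNode(ASTNodeType.ROOT, None, [call])
print(node_error(name, "unknown control call"))
# line 1, col 2: unknown control call (near STRING_LITERAL:'repaet')
```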
Scrolls uses a recursive descent approach, implemented with parser combinators.
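The combination of recursive descent and parser combinators can be illustrated with a small sketch: each parser is a function from (tokens, position) to either a (value, next position) pair or None, and combinators such as seq, alt and many build larger parsers out of smaller ones. The combinator names, the (type, value) token tuples, the (type, payload) node tuples and the toy grammar are all hypothetical, not the scrolls.ast.syntax API; the grammar is only rich enough to turn the token stream from the tokenizing example into a tree with the same shape as the prettify() output above.

```python
from typing import Any, Callable, Optional

Tok = tuple[str, str]               # (token type, value)
Result = Optional[tuple[Any, int]]  # (parsed value, next position), or None on failure
Parser = Callable[[list[Tok], int], Result]


def tok(kind: str) -> Parser:
    """Match a single token of the given type and return its value."""
    def parse(toks, pos):
        if pos < len(toks) and toks[pos][0] == kind:
            return toks[pos][1], pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(toks, pos):
        values = []
        for p in parsers:
            r = p(toks, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Try each parser at the same position; the first success wins."""
    def parse(toks, pos):
        for p in parsers:
            r = p(toks, pos)
            if r is not None:
                return r
        return None
    return parse


def many(p: Parser) -> Parser:
    """Apply p zero or more times, stopping if it fails or makes no progress."""
    def parse(toks, pos):
        values = []
        while (r := p(toks, pos)) is not None and r[1] > pos:
            values.append(r[0])
            pos = r[1]
        return values, pos
    return parse


def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the value produced by p with f."""
    def parse(toks, pos):
        r = p(toks, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse


# Toy grammar; every identifier is parsed as a plain STRING node.
string = fmap(tok("STRING_LITERAL"), lambda v: ("STRING", v))
sep = fmap(tok("COMMAND_SEP"), lambda _: None)


def statement(toks, pos):
    # A plain function, so the statement -> block -> statement recursion
    # is resolved at call time rather than at definition time.
    return alt(control_call, command_call)(toks, pos)


body = fmap(many(alt(statement, sep)), lambda xs: [x for x in xs if x is not None])
block = fmap(seq(tok("OPEN_BLOCK"), body, tok("CLOSE_BLOCK")), lambda r: ("BLOCK", r[1]))
control_call = fmap(
    seq(tok("CONTROL_SIGIL"), string, tok("OPEN_PAREN"), many(string), tok("CLOSE_PAREN"), block),
    lambda r: ("CONTROL_CALL", [r[1], ("CONTROL_ARGUMENTS", r[3]), r[5]]),
)
command_call = fmap(
    seq(string, many(string)),
    lambda r: ("COMMAND_CALL", [r[0], ("COMMAND_ARGUMENTS", r[1])]),
)
root = fmap(seq(body, tok("EOF")), lambda r: ("ROOT", r[0]))


def parse_tokens(toks: list[Tok]):
    result = root(toks, 0)
    if result is None:
        raise SyntaxError("token stream does not match the toy grammar")
    return result[0]


example = [
    ("CONTROL_SIGIL", "!"), ("STRING_LITERAL", "repeat"), ("OPEN_PAREN", "("),
    ("STRING_LITERAL", "4"), ("CLOSE_PAREN", ")"), ("OPEN_BLOCK", "{"),
    ("COMMAND_SEP", "\n"), ("STRING_LITERAL", "print"),
    ("STRING_LITERAL", "Hello, world!"), ("COMMAND_SEP", "\n"),
    ("CLOSE_BLOCK", "}"), ("EOF", ""),
]
tree = parse_tokens(example)
print(tree[0], tree[1][0][0])  # ROOT CONTROL_CALL
```

The recursion in "recursive descent" shows up in the grammar itself: a control call parses a block, a block parses statements, and a statement may be another control call.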
The parsing scheme of Scrolls is intentionally barebones and does not include any control structures at all. Instead, all identifiers are parsed as ASTNodeType.STRING nodes, which are interpreted at runtime based on their location in the syntax tree.
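The consequence is that a word like repeat has no special meaning to the parser. A minimal sketch of the runtime side, under stated assumptions: nodes are hypothetical (type, payload) tuples, control handlers live in a hypothetical CONTROL_HANDLERS registry, and print is the only command. This is not how scrolls.interpreter is implemented; it only shows a control name being resolved at runtime from the position of a STRING node in the tree.

```python
from typing import Callable

# Hypothetical node form: (type, payload). STRING nodes carry their text;
# every other node type carries a list of child nodes.
CONTROL_HANDLERS: dict[str, Callable] = {}


def control(name: str):
    """Register a handler for a control call name; the parser is untouched."""
    def register(fn):
        CONTROL_HANDLERS[name] = fn
        return fn
    return register


def run(node: tuple, out: list) -> None:
    """Walk the tree; a STRING's meaning depends on the node that holds it."""
    kind, payload = node
    if kind in ("ROOT", "BLOCK"):
        for child in payload:
            run(child, out)
    elif kind == "COMMAND_CALL":
        (_, name), (_, arg_nodes) = payload
        if name != "print":
            raise NameError(f"unknown command {name!r}")
        out.append(" ".join(text for _, text in arg_nodes))
    elif kind == "CONTROL_CALL":
        # The first STRING child is a control name only because of where it sits.
        (_, name), (_, arg_nodes), block = payload
        handler = CONTROL_HANDLERS.get(name)
        if handler is None:
            raise NameError(f"unknown control call {name!r}")
        handler([text for _, text in arg_nodes], block, out)
    else:
        raise ValueError(f"unexpected node type {kind!r}")


@control("repeat")
def repeat(args, block, out):
    for _ in range(int(args[0])):
        run(block, out)


# Adding a new "control structure" is just another runtime registration.
@control("twice")
def twice(args, block, out):
    run(block, out)
    run(block, out)


def make_tree(control_name: str, arg: str) -> tuple:
    """Build the tree for: !<control_name>(<arg>) { print "Hello, world!" }"""
    return ("ROOT", [
        ("CONTROL_CALL", [
            ("STRING", control_name),
            ("CONTROL_ARGUMENTS", [("STRING", arg)] if arg else []),
            ("BLOCK", [
                ("COMMAND_CALL", [
                    ("STRING", "print"),
                    ("COMMAND_ARGUMENTS", [("STRING", "Hello, world!")]),
                ]),
            ]),
        ]),
    ])


out: list = []
run(make_tree("repeat", "4"), out)
print(len(out))  # 4
```

Registering twice required no change to any grammar: a parser would produce the same CONTROL_CALL shape for it as for repeat, and the name is only looked up when the tree is run.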
Source code
"""
The parser implementation.
.. include:: ./pdoc/ast.md
"""
from .ast_constants import *
from .ast_errors import *
from .streams import *
from .syntax import *
from .tokenizer import *
Sub-modules
scrolls.ast.ast_constants - Numeric and string constants for the language.
scrolls.ast.ast_errors - Errors related to language parsing.
scrolls.ast.streams - Character streams for feeding Tokenizer objects.
scrolls.ast.syntax - Syntactic analysis …
scrolls.ast.tokenizer - The tokenizer implementation …