Module `scrolls.ast`

The parser implementation.

Using The Parser

Quickstart

Often, all you need to do is parse a script and get the syntax tree. To do this:

import scrolls

script = "..."
tokenizer = scrolls.Tokenizer(script)
ast = scrolls.parse_scroll(tokenizer)

The AST (Abstract Syntax Tree) is a generic structure that represents the semantic content of a script. This structure is what is actually interpreted by the Scrolls interpreter. See scrolls.interpreter for more detail on AST interpretation. See the following sections for a more detailed description of the parsing process.

Tokenizing

Parsing is done in two stages: lexical analysis (tokenizing) and syntactic analysis. First, the Tokenizer breaks a script into a list of pieces and assigns a type to each. These pieces are called tokens (see Token).

>>> import scrolls
>>> script = """
... !repeat(4) {
...     print "Hello, world!"
... }
... """
>>> tokenizer = scrolls.Tokenizer(script)
>>> tokens = tokenizer.get_all_tokens()
>>> for tok in tokens:
...     print(tok)
...
CONTROL_SIGIL:'!'
STRING_LITERAL:'repeat'
OPEN_PAREN:'('
STRING_LITERAL:'4'
CLOSE_PAREN:')'
OPEN_BLOCK:'{'
COMMAND_SEP:'\n'
STRING_LITERAL:'print'
STRING_LITERAL:'Hello, world!'
COMMAND_SEP:'\n'
CLOSE_BLOCK:'}'
EOF:''
>>>

Each token pairs a TokenType with an associated value. For instance, the second token shown above, STRING_LITERAL:'repeat', is a string literal token with the value repeat.
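The type-plus-value pairing can be modelled in a few lines. The sketch below is a self-contained stand-in and not the library's own classes: ToyTokenType and ToyToken are hypothetical names, and only the TYPE:'value' display format is taken from the session above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ToyTokenType(Enum):
    """Hypothetical stand-in for a few of the token types printed above."""
    CONTROL_SIGIL = auto()
    STRING_LITERAL = auto()
    OPEN_PAREN = auto()
    CLOSE_PAREN = auto()
    EOF = auto()


@dataclass(frozen=True)
class ToyToken:
    """Hypothetical stand-in for a token: a type plus the matched text."""
    type: ToyTokenType
    value: str

    def __str__(self) -> str:
        # Mirror the TYPE:'value' display format used in the session above.
        return f"{self.type.name}:{self.value!r}"


# The first tokens of `!repeat(4)`, written out by hand.
tokens = [
    ToyToken(ToyTokenType.CONTROL_SIGIL, "!"),
    ToyToken(ToyTokenType.STRING_LITERAL, "repeat"),
    ToyToken(ToyTokenType.OPEN_PAREN, "("),
    ToyToken(ToyTokenType.STRING_LITERAL, "4"),
    ToyToken(ToyTokenType.CLOSE_PAREN, ")"),
    ToyToken(ToyTokenType.EOF, ""),
]

# Filtering by type is the usual way to consume such a list.
string_values = [t.value for t in tokens if t.type is ToyTokenType.STRING_LITERAL]

print(tokens[1])       # STRING_LITERAL:'repeat'
print(string_values)   # ['repeat', '4']
```

Keeping the type as an enum member rather than a bare string lets later stages compare token types by identity.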

Note

Typically, you won't need to pull tokens from the Tokenizer yourself; you only configure it and hand it to the parser. Still, it helps to understand what it does.

Syntactic Analysis

The tokens are then analyzed for their syntactic structure, and an AST is built from it. The analysis starts at parse_scroll(), which automatically pulls tokens from a Tokenizer object and generates the corresponding AST.

>>> import scrolls
>>> script = """
... !repeat(4) {
...     print "Hello, world!"
... }
... """
>>> tokenizer = scrolls.Tokenizer(script)
>>> ast = scrolls.parse_scroll(tokenizer)
>>> print(ast.prettify())
{
    "_tok": "None",
    "_type": "ROOT",
    "children": [
        {
            "_tok": "CONTROL_SIGIL:'!'",
            "_type": "CONTROL_CALL",
            "children": [
                {
                    "_tok": "STRING_LITERAL:'repeat'",
                    "_type": "STRING",
                    "children": []
                },
                {
                    "_tok": "OPEN_PAREN:'('",
                    "_type": "CONTROL_ARGUMENTS",
                    "children": [
                        {
                            "_tok": "STRING_LITERAL:'4'",
                            "_type": "STRING",
                            "children": []
                        }
                    ]
                },
                {
                    "_tok": "OPEN_BLOCK:'{'",
                    "_type": "BLOCK",
                    "children": [
                        {
                            "_tok": "STRING_LITERAL:'print'",
                            "_type": "COMMAND_CALL",
                            "children": [
                                {
                                    "_tok": "STRING_LITERAL:'print'",
                                    "_type": "STRING",
                                    "children": []
                                },
                                {
                                    "_tok": "STRING_LITERAL:'Hello, world!'",
                                    "_type": "COMMAND_ARGUMENTS",
                                    "children": [
                                        {
                                            "_tok": "STRING_LITERAL:'Hello, world!'",
                                            "_type": "STRING",
                                            "children": []
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

AST instances consist of a tree of ASTNode objects. Each node keeps track of the token that triggered its generation. This is used primarily for informative display of errors during interpreter runtime.
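To illustrate why each node keeps its originating token, here is a minimal sketch of a node tree with token back-references. ToyToken, ToyNode, describe_error and walk are hypothetical names, and the line field is an assumption about what a token might record; none of this is the actual ASTNode API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class ToyToken:
    type: str
    value: str
    line: int  # assumption: a token remembers where it was found


@dataclass
class ToyNode:
    type: str
    tok: Optional[ToyToken]  # the token that triggered this node, if any
    children: List["ToyNode"] = field(default_factory=list)


def describe_error(node: ToyNode, message: str) -> str:
    """Build an error message that points back at the node's source token."""
    if node.tok is None:
        return f"error: {message}"
    return f"error at line {node.tok.line} near {node.tok.value!r}: {message}"


def walk(node: ToyNode, depth: int = 0) -> Iterator[Tuple[int, ToyNode]]:
    """Yield (depth, node) pairs in depth-first, pre-order sequence."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# A hand-built fragment shaped like the tree printed above.
# The line numbers are made up for the example.
name = ToyNode("STRING", ToyToken("STRING_LITERAL", "repeat", 2))
count = ToyNode("STRING", ToyToken("STRING_LITERAL", "4", 2))
args = ToyNode("CONTROL_ARGUMENTS", ToyToken("OPEN_PAREN", "(", 2), [count])
call = ToyNode("CONTROL_CALL", ToyToken("CONTROL_SIGIL", "!", 2), [name, args])
root = ToyNode("ROOT", None, [call])

for depth, n in walk(root):
    print("    " * depth + n.type)

print(describe_error(name, "unknown control"))
# error at line 2 near 'repeat': unknown control
```

Because the token travels with the node, an interpreter that fails deep inside the tree can still report a position in the original script.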

Scrolls uses a recursive descent approach, implemented with parser combinators. The parsing scheme is intentionally barebones and does not include any control structures at all. Instead, all identifiers are parsed as ASTNodeType.STRING nodes, which are interpreted at runtime based on their location in the syntax tree.
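The general shape of recursive descent can be shown with a toy parser over the token sequence printed earlier. This is a plain hand-written sketch, not the combinator-based implementation the library uses. ToyParser and node are hypothetical names, the grammar covers only command calls and control calls followed by a block, and nodes are plain dicts that omit the `_tok` back-reference. Note that the parser never checks whether repeat or print is a known name: both are stored as STRING nodes and left for a later stage to interpret.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (token type name, value)


def node(type_name: str, children: Optional[list] = None,
         value: Optional[str] = None) -> Dict:
    """Build a simplified node as a plain dict."""
    return {"type": type_name, "value": value, "children": children or []}


class ToyParser:
    """Recursive descent over a flat token list that must end with EOF."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos][0]

    def expect(self, type_name: str) -> Token:
        tok = self.tokens[self.pos]
        if tok[0] != type_name:
            raise SyntaxError(f"expected {type_name}, got {tok[0]}")
        self.pos += 1
        return tok

    def string(self) -> Dict:
        return node("STRING", value=self.expect("STRING_LITERAL")[1])

    def strings(self) -> List[Dict]:
        # Zero or more consecutive string literals.
        found = []
        while self.peek() == "STRING_LITERAL":
            found.append(self.string())
        return found

    def statements(self, end: str) -> List[Dict]:
        found = []
        while self.peek() != end:
            if self.peek() == "COMMAND_SEP":
                self.pos += 1  # separators carry no structure of their own
            elif self.peek() == "CONTROL_SIGIL":
                found.append(self.control_call())
            else:
                found.append(self.command_call())
        return found

    def command_call(self) -> Dict:
        name = self.string()
        return node("COMMAND_CALL", [name, node("COMMAND_ARGUMENTS", self.strings())])

    def control_call(self) -> Dict:
        self.expect("CONTROL_SIGIL")
        name = self.string()
        self.expect("OPEN_PAREN")
        args = node("CONTROL_ARGUMENTS", self.strings())
        self.expect("CLOSE_PAREN")
        return node("CONTROL_CALL", [name, args, self.block()])

    def block(self) -> Dict:
        self.expect("OPEN_BLOCK")
        body = self.statements(end="CLOSE_BLOCK")  # recursion: blocks nest
        self.expect("CLOSE_BLOCK")
        return node("BLOCK", body)

    def parse(self) -> Dict:
        return node("ROOT", self.statements(end="EOF"))


# The token sequence from the tokenizer session above, written as tuples.
tokens = [
    ("CONTROL_SIGIL", "!"), ("STRING_LITERAL", "repeat"),
    ("OPEN_PAREN", "("), ("STRING_LITERAL", "4"), ("CLOSE_PAREN", ")"),
    ("OPEN_BLOCK", "{"), ("COMMAND_SEP", "\n"),
    ("STRING_LITERAL", "print"), ("STRING_LITERAL", "Hello, world!"),
    ("COMMAND_SEP", "\n"), ("CLOSE_BLOCK", "}"), ("EOF", ""),
]

ast = ToyParser(tokens).parse()
control = ast["children"][0]
print(control["type"], [c["type"] for c in control["children"]])
# CONTROL_CALL ['STRING', 'CONTROL_ARGUMENTS', 'BLOCK']
```

Each grammar rule maps to one method, and nesting in the script becomes nesting in the call stack; a combinator library builds the same kind of rule functions by composing smaller ones instead of writing each by hand.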

Source code

"""
The parser implementation.

.. include:: ./pdoc/ast.md
"""

from .ast_constants import *
from .ast_errors import *
from .streams import *
from .syntax import *
from .tokenizer import *

Sub-modules

scrolls.ast.ast_constants: Numeric and string constants for the language.
scrolls.ast.ast_errors: Errors related to language parsing.
scrolls.ast.streams: Character streams for feeding Tokenizer objects.
scrolls.ast.syntax: Syntactic analysis …
scrolls.ast.tokenizer: The tokenizer implementation …