REBOL Enhancement Proposal for the PARSE dialect

Author: Gabriele Santilli
Date: 9-May-2006


1. Abstract
2. THROW
3. LITERAL
4. TO and THRU
5. NOT
6. FAIL
7. IF or CHECK
8. DO-RULE
9. REMOVE, REPLACE and REPLACE-ONLY
10. INTO-STRING
11. RULE! datatype
12. PARSE handling FUNCTION! values
13. DO
13.1 Why raise an ERROR! ?

1. Abstract

This REP is a summary of various proposals regarding the PARSE dialect that have been made in the past few years. I wrote a REP for PARSE in January 2003, and Ladislav Mecir wrote other proposals a little while later; new ideas have been proposed lately by Brian Hawley on the REBOL3 AltME world.

I originally proposed adding the DO and THROW commands. Ladislav proposed the LITERAL command, making TO and THRU work with subrules, the NOT command, the FAIL command and the IF command; he also proposes a DO-RULE command that should not be confused with my DO proposal. Brian proposes CHECK (similar to Ladislav's IF), REMOVE, REPLACE, REPLACE-ONLY and INTO-STRING; he's also proposing a RULE! datatype. I'm now proposing that PARSE should handle FUNCTION! values specially, which would allow adding most of the commands proposed here as mezzanine code (although it may still make sense to have them as keywords to avoid clashes with normal REBOL functions).


2. THROW

The THROW command modifies the behavior of PARSE when a rule fails. It could be used while parsing both blocks and strings. If the rule that follows the THROW command fails, PARSE should raise an ERROR!, reporting the position of the PARSE cursor just before matching that rule in the NEAR field of the error. THROW should accept a string argument as the error message.

    rule: [pair! | 2 number!]

    parse ["something else"] rule
    ; PARSE returns FALSE

    parse ["something else"] [throw "Size expected" rule]
    ; ** Parse Error: Size expected
    ; ** Near: "something else"

    digit: charset "0123456789"
    parse "1234abc" [throw "Digit expected" [some digit end]]
    ; ** Parse Error: Digit expected
    ; ** Near: abc

With this command available, just by adding the THROW keyword in the rules you would get better error reporting for your dialect.
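
In the current PARSE, part of this behavior can be approximated by hand. The following is only a sketch, not the proposed THROW: it reports the failure position with PRINT instead of raising an ERROR!, and the names SIZE, CHECKED-SIZE and POS are made up for the example.

    size: [pair! | 2 number!]

    ; try SIZE; if it fails, record the cursor, report it, then force a failure
    checked-size: [size | pos: (print ["Size expected near:" mold pos]) end skip]

    parse ["something else"] checked-size
    ; prints the message with the unmatched input; PARSE returns FALSE

Writing this alternative around every rule is tedious, which is the gap the THROW keyword is meant to fill.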


3. LITERAL

Suppose we want to use PARSE to check whether a block contains the numbers 1 2 3 in this order. The rule can be:

    parse block [1 1 1 1 1 2 1 1 3]

which looks awful. For this reason Ladislav proposes something like:

    parse block [literal [1 2 3]]
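
Until something like LITERAL exists, the awkward rule can at least be generated. A minimal sketch, assuming the values are all integers and relying on the same "1 1 value" idiom shown above; LITERAL-RULE is a hypothetical helper, not part of Ladislav's proposal:

    literal-rule: func [
        "Build a rule matching the given integers literally, in order."
        values [block!] /local rule
    ][
        rule: copy []
        ; each integer N becomes the three-item sequence 1 1 N
        foreach value values [repend rule [1 1 value]]
        rule
    ]

    parse [1 2 3] literal-rule [1 2 3]
    ; the generated rule is [1 1 1 1 1 2 1 1 3]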

4. TO and THRU

The community has asked many times for TO and THRU to accept a subrule as well as a literal value to search for. Basically,

    thru rule

should be equivalent to

    (cont: [end skip]) some [rule (cont: none) break | skip] cont

(with maybe some optimizations where possible).
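
For illustration, here is the expansion above applied to a concrete subrule in the current PARSE. This is only a sketch; DIGIT, THRU-DIGIT and REST are names chosen for the example, and CONT is the same helper word used in the expansion.

    digit: charset "0123456789"
    cont: none

    ; hand-written equivalent of: thru digit
    thru-digit: [(cont: [end skip]) some [digit (cont: none) break | skip] cont]

    parse/all "abc1def" [thru-digit copy rest to end]
    ; REST is meant to receive whatever follows the first digit

If the input contains no digit, CONT is still [end skip] when the loop ends, so the whole rule fails, as THRU would.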

5. NOT

The rule

    not rule

would succeed if rule fails and vice versa. It would be equivalent to:

    [rule (cont: [end skip]) | none (cont: none)] cont
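
As a sketch of how this reads with a concrete rule in the current PARSE (KEYWORD and NOT-KEYWORD are names invented for the example):

    keyword: ['if | 'else]
    cont: none

    ; hand-written equivalent of: not keyword
    not-keyword: [[keyword (cont: [end skip]) | none (cont: none)] cont]

    parse [foo] [not-keyword word!]
    ; FOO is not a keyword, so WORD! gets to match it

    parse [if] [not-keyword word!]
    ; IF is a keyword, so NOT-KEYWORD fails and so does PARSE

Note that when NOT-KEYWORD succeeds it consumes no input: the cursor is still in front of the value that was tested.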


6. FAIL

Even in the examples above we often use the [end skip] idiom to force a rule to fail. PARSE should probably have a FAIL keyword that always fails (the opposite of NONE), or maybe we should have:

    fail: [end skip]

by default in REBOL.

7. IF or CHECK

Ladislav proposes an IF command, so that:

    if [condition]

is equivalent to:

    (cont: unless condition [[end skip]]) cont

Brian proposes to call it CHECK and express it as:

    check (condition)
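
Either spelling can be emulated today with the expansion given above. A small sketch, assuming we want to accept only even integers; EVEN-INT and N are names made up for the example:

    cont: none

    ; hand-written equivalent of: set n integer! if [even? n]
    even-int: [set n integer! (cont: unless even? n [[end skip]]) cont]

    parse [2 4 6] [some even-int]
    ; every value passes the condition

    parse [2 3 6] [some even-int]
    ; 3 is an integer! but fails the condition, so the match stops there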


8. DO-RULE

Ladislav proposes a way to apply a dynamically built rule: PARSE would use the result of evaluating some REBOL code as the rule to match.

    do-rule [append ['a] ['b]]

would match the rule:

    'a 'b

I'd prefer to use a paren instead of a block for the code though.
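
The current PARSE can already do this with a helper word, at the cost of one extra step. A sketch, where DYN is a name chosen for the example and COPY is added so the literal block is not modified on every evaluation:

    ; build the rule in a paren, then match it through the word
    parse [a b] [(dyn: append copy ['a] ['b]) dyn]
    ; DYN is ['a 'b] when PARSE reaches it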


9. REMOVE, REPLACE and REPLACE-ONLY

Brian proposes the following commands and the equivalent rules for the current PARSE:

    remove rule              ==> tmp1: rule tmp2: :tmp1 (remove/part :tmp1 :tmp2)
    replace rule (code)      ==> tmp1: rule tmp2: :tmp1 (tmp1: change/part :tmp1 (code) :tmp2) :tmp1
    replace-only rule (code) ==> tmp1: rule tmp2: :tmp1 (tmp1: change/part/only :tmp1 (code) :tmp2) :tmp1

Note that if parse operations are changed to take refinements, replace-only could be expressed as replace/only. This would be slower in a native implementation but it would look more REBOL-like if that matters to you.
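
To make the REMOVE expansion concrete, here it is written out for one rule in the current PARSE. This is a sketch; DIGIT and STR are names chosen for the example.

    digit: charset "0123456789"
    str: copy "id=42;"

    parse/all str [
        thru "="
        ; hand-written equivalent of: remove some digit
        tmp1: some digit tmp2: :tmp1 (remove/part :tmp1 :tmp2)
        ";" end
    ]
    ; STR is now "id=;"

Because the cursor is reset to TMP1 before the removal, matching continues at the first character after the removed part.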


10. INTO-STRING

Brian proposes a way to parse a substring while doing block parsing.

    into-string rule ==> set tmp1 string! (tmp1: unless parse tmp1 rule [fail]) tmp1

As above, this could maybe be expressed as INTO/STRING.
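
Written out for the current PARSE, the expansion might look like the sketch below. It assumes FAIL is defined as [end skip], as suggested earlier in this document; DIGIT, DIGITS and DIGIT-STRING are names made up for the example.

    fail: [end skip]
    digit: charset "0123456789"
    digits: [some digit]

    ; hand-written equivalent of: into-string digits
    digit-string: [set tmp1 string! (tmp1: unless parse/all tmp1 digits [fail]) tmp1]

    parse ["123" "456"] [some digit-string]
    ; both strings consist of digits only

    parse ["123" "abc"] [some digit-string]
    ; "abc" is a string! but does not match DIGITS, so the match stops there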

11. RULE! datatype

Using Brian's words:

Here's my first attempt at a pattern for recursion-safe temporaries:

    use [var ...] [rule ...] ==> (tmp1: use [var ...] copy/deep [[rule ...]]) tmp1

It would only work with a directly specified variable and rule block, and you should only use the temporaries directly in the rule block or they won't get rebound. Now, using REBOL 3's closure (probably better):

    use [var ...] [rule ...] ==> (tmp1: do closure [/local var ...] [[rule ...]]) tmp1

REBOL's existing function recursion support wouldn't work because the function returns before the rule is run.

I would prefer a native implementation of this operation if possible.

The use operation above would be a good semantic model for parse rule closures with recursion-safe temporaries. Imagine a new datatype called rule!, a parse rule block bundled with a recursion-safe context for local variables. You would create one with a mezzanine like this:

    parse-rule: func [locals [block!] rule [block!]] [make rule! reduce [locals rule]]

It would be the equivalent of a function made by the HAS mezzanine - local variables, no parameters. The rule would be prebound to the context and the context would be fixed up on recursion just like function contexts are. Any time parse would accept a rule block! it would also accept a rule! value.

Now, that USE operation above was just giving a semantic model - it is too slow to use as-is. To be practical it would have to be implemented as the rule! type (whatever you want to call it) for efficiency. If you allowed a rule! to take parameters (that would take some significant changes to parse) you could do some really interesting guru stuff that even Perl 6 couldn't match, but that may be a feature for another day. For now, all I am suggesting is a bundled context that would be fixed up to be recursion-safe, just like functions are.
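
The problem that recursion-safe temporaries are meant to solve can be seen in the current PARSE with a small recursive rule. This sketch is not from Brian's text; ITEM and VAL are names chosen for the example.

    ; an integer, optionally followed by nested blocks of the same shape
    item: [set val integer! any [into item] (print val)]

    parse [1 [2]] item
    ; the inner match overwrites VAL, so the outer PRINT no longer sees 1

With a rule-local VAL, each level of recursion would keep its own value, just as function locals do.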

12. PARSE handling FUNCTION! values

To avoid adding a new datatype as Brian proposes, I would make PARSE handle FUNCTION! values, since a function is already a block with a context, and functions already handle recursion correctly. Basically, the following code:

    rule: does ['a 'b]

    parse block [rule]

would be the same as:

    rule: ['a 'b]

    parse block [rule]

The advantage is that you can then write:

    rule: has [val] [set val number! (print val)]

    parse [1 2 3] [some rule]

which would even be recursion safe (as Brian asks for RULE!). If we even allow arguments, we can then implement most of the commands suggested above as functions, for example:

    remove: func [rule /local tmp1 tmp2] [tmp1: rule tmp2: :tmp1 (remove/part :tmp1 :tmp2)]

and so on. This will probably need more discussion.

13. DO

I'm leaving my DO proposal for last because it is debatable, and although I'd use it a lot I don't want to push too much for it (as no one else seems to have asked for something like this). The reasoning behind DO is that it would make dialect parsers easier to write; for example, dialects like VID that mix REBOL expressions with the dialect itself would greatly benefit from the DO command. I also think that this kind of mixing is very useful, even if it looks like it could create confusion for the user: if a dialect does not allow the user to use REBOL expressions directly, the user will need to use COMPOSE (or other means) to construct the dialect code on the fly, which consumes more memory and slows down the process. Also, if the dialect uses PAREN!s (e.g. like the PARSE dialect), the user has to use workarounds to keep COMPOSE from evaluating the wrong PAREN!, and so on.

The DO command should work similarly to the SET command. (Like the SET command, the DO command makes sense only when parsing a BLOCK!.) But while SET sets a word to the value under the "PARSE cursor" if it matches the subsequent PARSE rule, the DO command should set a word to the value obtained by evaluating the REBOL expression under the PARSE cursor. If the result value does not match the rule immediately following DO, PARSE should raise an ERROR! (something like: "Expecting [pair! | number!], not 10-Jan-2003"); the rule should be matched by PARSE as if the expression were replaced by a block containing the resulting value and the rule were the argument of the INTO command. (I think this is the behavior that makes the most sense, but I'm very open to suggestions.)

    parse [1 + 1] [do result integer!]
    ; should evaluate the expression "1 + 1" and set the word 'RESULT to
    ; the resulting value (2); the PARSE cursor should be advanced after
    ; the expression (i.e. to the tail of the block, in this case).

    circle-rule: [
        'circle
        do center pair!
        do radius number! (radius: to-integer radius + 0.5)
    ]

    parse [circle as-pair x y d / 2] circle-rule
    ; should evaluate "as-pair x y" and set 'CENTER to the result,
    ; then should evaluate "d / 2" and set 'RADIUS to the result, etc.

    parse [circle 10 20] circle-rule
    ; should raise an ERROR! like:
    ; ** Parse Error: Expecting pair!, not 10
    ; ** Near: 10 20
    ; (i.e. with the NEAR field reporting the position of the PARSE cursor
    ; before the evaluation)

    parse [circle 5x5 7 / 0] circle-rule
    ; ** Math Error: Attempt to divide by zero
    ; ** Near: 7 / 0
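
For comparison, the evaluation step can be approximated in the current PARSE. The sketch below assumes REBOL 2, where DO/NEXT returns a block holding the value and the position after the expression; it fails the match on a type mismatch instead of raising an ERROR!, and DO-INTEGER, POS and CONT are names made up for the example.

    pos: result: cont: none

    ; hand-written approximation of: do result integer!
    do-integer: [
        ; evaluate one expression at the cursor, then move the cursor past it
        pos: (set [result pos] do/next pos) :pos
        ; fail unless the value has the expected type
        (cont: unless integer? result [[end skip]]) cont
    ]

    parse [1 + 1] do-integer
    ; RESULT is 2 and the cursor is at the tail of the block

Having to spell this out for every evaluated argument is what the DO keyword would remove.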

13.1 Why raise an ERROR! ?

The choice of raising an error instead of just failing the match when the result does not match the rule following DO was made for a reason. Since evaluating an expression can have side effects, it would not be a good idea to just fail the match (i.e. like SET does).

While it makes perfect sense to write a rule like:

    circle-parameters: [any [set radius number! | set center pair!]]

it would not be a good idea to write:

    circle-parameters: [any [do radius number! | do center pair!]]

because code such as:

    c: 5x5
    ; ...
    parse [c: c * 2 10] circle-parameters

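would actually set CENTER (and C) to 20x20, while the user would likely expect 10x10: the first alternative evaluates the expression and fails to match number!, then the second alternative evaluates it again.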

A rule like that could be written as:

    circle-parameters: [any [do value [number! (radius: value) | pair! (center: value)]]]

and should raise an error if the result is neither a number! nor a pair!, thus preventing PARSE from continuing with other rules and evaluating the expression multiple times. If using a temporary word doesn't look like a good idea, it would be possible to make DO accept NONE as the word to set, with the meaning of "do not set any word, i.e. discard the result", and then write it as:

    circle-parameters: [any [do none [set radius number! | set center pair!]]]

taking advantage of the way I have defined the behavior of the rule following DO.

MakeDoc3 by REBOL - 9-May-2006