Structure matching and rewriting engine Purpose: { Implements a structure matching and rewriting engine using PARSE. } Author: "Gabriele Santilli" EMail: giesse@rebol.it File: %rewrite.r License: { Copyright (c) 2006, Gabriele Santilli All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * The name of Gabriele Santilli may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. } Date: 17-May-2006 Version: 1.2.1 ; majorv.minorv.status ; status: 0: unfinished; 1: testing; 2: stable History: [ 17-May-2006 1.1.0 "History start" 17-May-2006 1.2.1 "First version" ] ===Introduction Structure matching is the ability to match a pattern over a data structure. By using block parsing, it is possible to easily implement structure matching with parse rules. It also becomes easy to implement a rewriting engine that replaces a substructure matched by a pattern with another. Such an engine can be very useful in some special (but very interesting) cases, such as tree rewriting (which is a technique often used by optimizing compilers). ===Overview We provide two functions, |match| and |rewrite|. (The latter is based on the former.) The former takes a block or string as input, and tries to find a match for the |parse| rule given as second argument; it returns |true| if a match is found, otherwise returns false. (Since the pattern is a |parse| rule, you can use parens to apply any code to the matches.) |rewrite| takes a block or string as input, and modifies it according to the rules given as second argument; these consist of a list of pairs of blocks. The first block of each pair is a |parse| rule to use as a pattern, while the second block is |compose|'ed and replaces the content matched by the pattern whenever it matches. This rewrite process is iterative, and is repeated until no pattern in the rules matches. (So, be careful as it's easy to get infinite loops, e.g. if your pattern always matches the second block.) -main-: match: func [ "Match a pattern over data" data [block! string!] "Data to match the pattern to" rule [block!] "PARSE rule to use as pattern" /local -m-locals- ] [ -match-code- ] rewrite: func [ "Apply a list of rewrite rules to data" data [block! string!] "Data to change" rules [block!] "List of rewrite rules" /trace "Trace rewriting process (for debugging)" /local -r-locals- ] [ -rewrite-code- ] ===Match the |parse| rule |rule| over |data| We use |result| to hold the result of the match, which is |false| by default and becomes |true| as soon as we find one occurrence of the pattern. If |data| is a block, our |parse| rule needs to recurse into subblocks too; if it's a string instead, the |into| command is invalid; so we need two separate rules for strings and blocks. As you see, |rule| is applied repeatedly over |data|. Note that it does not stop when a match is found, but it will look for other matches too; this is useful if you have code in the parse rule (for example you could collect all matches). -match-code-: result: false recurse: either block? data [ [ some [ rule (result: true) | into recurse | skip ] ] ] [ [ some [ rule (result: true) | skip ] ] ] parse data recurse result ---|match|'s locals -m-locals-: result recurse ---Example An interesting example use of |match| is a |collect| function that collects all matches of a giver rule: -example1-: collect: func [output data rule /local x] [ match data [copy x rule (append/only output x)] output ] ===Apply the rewrite rules to |data| We need to apply the given rewrite rules (list of pairs of blocks in |rules|) to |data|. If |rules| is empty, we have nothing to do, so we just return |data|. Otherwise, we create a |parse| rules containing all the patterns in the list, in order. The pair |[rule] [production]| is added to the |parse| rule |rules*| as |[rule] (prod: compose/deep [production]) \||; this means |rules*| becomes a list of alternatives in the form |[... \| ... \| ... \|]|, and we need to remove the last |\||. Then, we use |match| with a pattern that tries to match |rules*|, and replaces any match with |prod| (that has been computed in |rules*|). After replacing the |parse| cursor is set back to the beginning of the change, so as to apply the rules again on it. The call to |match| is repeated until it returns |false| (nothing more to rewrite). If the |/trace| refinement has been used, each iteration of rewriting is printed on screen and a step-by-step tracing is performed. -rewrite-code-: if empty? rules [return data] rules*: make block! 16 foreach [pattern production] rules [ insert insert/only insert/only tail rules* pattern make paren! compose/only [ prod: compose/deep (production) ] '| ] remove back tail rules* until [ if trace [probe data ask "? "] not match data [mk1: rules* mk2: (change/part mk1 prod mk2) :mk1] ] data ---|rewrite|'s locals -r-locals-: rules* prod mk1 mk2 ---Examples -example2-: digits: charset "1234567890" reverse rewrite reverse "1234567890" [[x: 4 digits] [(copy/part x 3) "," (pick x 4)]] ; rewrites "1234567890" to "1,234,567,890" rules: [ ['lit set x number!] [push (x)] ['plus into ['lit 1 1 0] set x block!] [(x)] ['plus set x block! into ['lit 1 1 0]] [(x)] ['mult into ['lit 1 1 1] set x block!] [(x)] ['mult set x block! into ['lit 1 1 1]] [(x)] ['plus set x block! set y block!] [(x) (y) add] ['mult set x block! set y block!] [(x) (y) mul] ] test: [ plus [plus [mult [lit 3] [lit 1]] [lit 2]] [lit 0] ] rewrite test rules ; rewrites test into [push 3 push 2 add]