REBOL [ Title: "Pattern matching" Purpose: { Defines a function to perform pattern matching on strings. } Author: "Gabriele Santilli" EMail: giesse@rebol.it File: %pattern-matching.r License: { Copyright (c) 2003, Gabriele Santilli All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * The name of Gabriele Santilli may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. } Date: 25-Oct-2004 Version: 1.3.0 ; majorv.minorv.status ; status: 0: unfinished; 1: testing; 2: stable History: [ 25-Oct-2004 1.1.0 "History start" 25-Oct-2004 1.2.0 "First version" 25-Oct-2004 1.3.0 "Added documentation" ] ] ; comment the following line if you are using autodoc.r ;#do [document: func [text] [none]] #do [document { ===Pattern matching (pattern-matching.r) This script provides the MATCH function, that can be used to perform pattern matching on strings. The pattern is a block of strings and words. The words 'NEWLINE, 'SKIP and 'IGNORE are special, all the other words are treated as placeholders. The strings are matched directly to the string passed as argument to MATCH. 'SKIP and 'IGNORE are synonyms; they allow ignoring a part of the string being matched. 'NEWLINE indicates the newline character. Words are treated as "named wildcards". They match any content, and can be reused to refer to the content they matched. Examples: ; an empty block is not a valid pattern! >> match/all "My name is Gabriele. Gabriele is my name." [] ** User Error: Invalid pattern ** Near: match/all "My name is Gabriele. Gabriele is my name." [] ; "String" does not match the input string >> match/all "My name is Gabriele. Gabriele is my name." ["String"] == none ; This time we have a match, but no "variable" has been matched. >> match/all "My name is Gabriele. Gabriele is my name." ["My name is Gabriele. Gabriele is my name."] == [] ; Again, we have a match. To be able to match the string, the word ; 'name has to be given the value "Gabriele". >> match/all "My name is Gabriele. Gabriele is my name." ["My name is " name ". Gabriele is my name."] == [name "Gabriele"] ; As above, to match the string 'name has to refer to "Gabriele". >> match/all "My name is Gabriele. Gabriele is my name." ["My name is " name ". " name " is my name."] == [name "Gabriele"] ; There is no match, because there is no way to give a value to 'name ; that will match the whole string. >> match/all "My name is Gabriele. Luca is my name." ["My name is " name ". " name " is my name."] == none ; You can match the above by using two different words. >> match/all "My name is Gabriele. Luca is my name." ["My name is " name ". " name2 " is my name."] == [name "Gabriele" name2 "Luca"] Note that the implementation is currently basic. It doesn't support backtracking, so patterns are matched "linearly". This means that some patterns might not match even if there is a way to assign values to words so that you can reproduce the input string. This will be fixed if there is enough request. }] context [ literal: [ [ set item string! | 'newline (item: newline) | set item seen-word (item: select/skip words item 2) ] (add-literal item add-literal: :add-literal-norm) ] word: [['skip | 'ignore] (add-literal: :add-literal-to) | set item word! (add-word item)] element-after-word: [literal (element: element-after-literal)] element-after-literal: [literal | word (element: element-after-word)] item: none element: none rule: [ ] words: [ ] seen-word: ['newline] result: none last-word: last-sym: none add-literal-to: func [item] [insert insert insert tail rule 'to item item] add-literal-to-app: func [item] [ insert insert/only insert insert tail rule 'to item to paren! compose [insert insert tail result (to lit-word! last-word) (last-sym)] item ] add-literal-norm: add-literal: func [item] [insert tail rule item] add-word: func [item /local sym] [ sym: use [symbol] ['symbol] insert insert tail rule 'copy sym insert insert tail seen-word '| to lit-word! item insert insert tail words item sym add-literal: :add-literal-to-app last-word: item last-sym: sym ] set 'match func [ "Tries to match the pattern to the string" [catch] string [any-string!] pattern [block!] /all "Parses all chars including spaces" /case "Use case-sensitive comparison" /local path ] [ clear rule clear words seen-word: copy ['newline] result: make block! 8 add-literal: :add-literal-norm element: [word (element: element-after-word) | literal (element: element-after-literal)] if not parse pattern [some [element]] [throw make error! "Invalid pattern"] add-literal 'end path: to path! 'parse if all [insert tail path 'all] if case [insert tail path 'case] if do reduce [path string rule] [result] ] ]