REBOL [
    Title: "Pattern matching"
    Purpose: {
        Defines a function to perform pattern matching on strings.
    }
    Author: "Gabriele Santilli"
    EMail: giesse@rebol.it
    File: %pattern-matching.r
    License: {
Copyright (c) 2003, Gabriele Santilli
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

* Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer. 
  
* Redistributions in binary form must reproduce the above
  copyright notice, this list of conditions and the following
  disclaimer in the documentation and/or other materials provided
  with the distribution. 

* The name of Gabriele Santilli may not be used to endorse or
  promote products derived from this software without specific
  prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
    }
    Date: 25-Oct-2004
    Version: 1.3.0 ; majorv.minorv.status
                   ; status: 0: unfinished; 1: testing; 2: stable
    History: [
        25-Oct-2004 1.1.0 "History start"
        25-Oct-2004 1.2.0 "First version"
        25-Oct-2004 1.3.0 "Added documentation"
    ]
]

; comment the following line if you are using autodoc.r
;#do [document: func [text] [none]]

#do [document {
===Pattern matching (pattern-matching.r)

This script provides the MATCH function, that can be used to perform
pattern matching on strings.

The pattern is a block of strings and words. The words 'NEWLINE,
'SKIP and 'IGNORE are special, all the other words are treated as
placeholders. The strings are matched directly to the string passed
as argument to MATCH.

'SKIP and 'IGNORE are synonyms; they allow ignoring a part of the
string being matched. 'NEWLINE indicates the newline character.

Words are treated as "named wildcards". They match any content,
and can be reused to refer to the content they matched.

Examples:
    ; an empty block is not a valid pattern!
    >> match/all "My name is Gabriele. Gabriele is my name." []
    ** User Error: Invalid pattern
    ** Near: match/all "My name is Gabriele. Gabriele is my name." []

    ; "String" does not match the input string
    >> match/all "My name is Gabriele. Gabriele is my name." ["String"]
    == none

    ; This time we have a match, but no "variable" has been matched.
    >> match/all "My name is Gabriele. Gabriele is my name." ["My name is Gabriele. Gabriele is my name."]
    == []

    ; Again, we have a match. To be able to match the string, the word
    ; 'name has to be given the value "Gabriele".
    >> match/all "My name is Gabriele. Gabriele is my name." ["My name is " name ". Gabriele is my name."]
    == [name "Gabriele"]

    ; As above, to match the string 'name has to refer to "Gabriele".
    >> match/all "My name is Gabriele. Gabriele is my name." ["My name is " name ". " name " is my name."]
    == [name "Gabriele"]

    ; There is no match, because there is no way to give a value to 'name
    ; that will match the whole string.
    >> match/all "My name is Gabriele. Luca is my name." ["My name is " name ". " name " is my name."]
    == none

    ; You can match the above by using two different words.
    >> match/all "My name is Gabriele. Luca is my name." ["My name is " name ". " name2 " is my name."]
    == [name "Gabriele" name2 "Luca"]

Note that the implementation is currently basic. It doesn't support
backtracking, so patterns are matched "linearly". This means that
some patterns might not match even if there is a way to assign values
to words so that you can reproduce the input string. This will be
fixed if there is enough request.
}]

context [
    literal: [
        [   set item string! 
          | 'newline (item: newline) 
          | set item seen-word (item: select/skip words item 2)
        ] 
        (add-literal item add-literal: :add-literal-norm)
    ]
    word: [['skip | 'ignore] (add-literal: :add-literal-to) | set item word! (add-word item)]
    element-after-word: [literal (element: element-after-literal)]
    element-after-literal: [literal | word (element: element-after-word)]
    item: none
    element: none
    rule: [ ]
    words: [ ]
    seen-word: ['newline]
    result: none
    last-word: last-sym: none
    
    add-literal-to:
        func [item] [insert insert insert tail rule 'to item item]
    add-literal-to-app:
        func [item] [
            insert insert/only insert insert tail rule 
                'to item 
                to paren! compose [insert insert tail result (to lit-word! last-word) (last-sym)] 
                item
        ]
    add-literal-norm:
    add-literal:
        func [item] [insert tail rule item]

    add-word:
        func [item /local sym] [
            sym: use [symbol] ['symbol]
            insert insert tail rule 'copy sym
            insert insert tail seen-word '| to lit-word! item
            insert insert tail words item sym
            add-literal: :add-literal-to-app
            last-word: item last-sym: sym
        ]
    
    set 'match
        func [
            "Tries to match the pattern to the string" [catch]
            string [any-string!]
            pattern [block!]
            /all "Parses all chars including spaces"
            /case "Use case-sensitive comparison"

            /local path
        ] [
            clear rule
            clear words
            seen-word: copy ['newline]
            result: make block! 8
            add-literal: :add-literal-norm
            element: [word (element: element-after-word) | literal (element: element-after-literal)]
            if not parse pattern [some [element]] [throw make error! "Invalid pattern"]
            add-literal 'end
            path: to path! 'parse
            if all [insert tail path 'all]
            if case [insert tail path 'case]
            if do reduce [path string rule] [result]
        ]
]