MakeDoc 3 - WeTan, the literate programming backend (XHTML + REBOL)

Date	Version	Description	Author
2-Mar-2006	1.1.0	History start
19-Mar-2006	1.2.0	Converting to RLP format
20-Mar-2006	1.3.0	Converting to RLP format, adding code formatting
21-Mar-2006	1.4.0	Converting to RLP format, finished code formatting
22-Mar-2006	1.5.0	Converting to RLP format, adding document emitter
22-Mar-2006	1.6.0	Converting to RLP format, added build-doc
22-Mar-2006	1.7.0	Converting to RLP format, added temporary code to allow bootstrap
30-Mar-2006	1.8.0	Added support for #[...] (Brian's suggestion), changed code parsing rules (also fixes bugs)
30-Mar-2006	1.9.0	Added formatting for path! and keywords
12-May-2006	1.10.0	Added formatting for #lit and #include
12-May-2006	1.11.0	Added header formatting

3. Generate REBOL code starting from a given section

Given the name of a code section, start, we want to generate a REBOL block, starting from the section referred to by this name and resolving all references to other sections. Special directives also need to be processed here; formatting directives should be removed.

The parse-code-block function is used to parse a code block (and, recursively, any sub-block) and do all the processing; this function returns a new block that is the result of this processing.

Note that we return none if the given section name is not found.

〈Generate REBOL code starting from a given section〉 ≡

if section-data: select code-sections start [
    parse-code-block section-data/code
]

3.1 generate-code's locals

Since this is the code for the generate-code function, we want to add the words we're using to the list of locals for that function.

〈generate-code's locals〉 ≡

section-data

3.2 Parse code and build result

Our parse rule analyzes each value in the block. The following cases are handled:

the value is a word, and it is the name of a code section; in this case, the block for that code section is parsed and its values are processed;
the value is the issue #lit or #literal; the next value is appended to the result without any processing;
the value is the issue #include followed by a file! or url!; the file or url is loaded and the contents are appended to the result (notice that if the file cannot be loaded, an error is produced);
the value is a sub-block or paren; parse-code-block is called recursively and its return value is appended to the result as a sub-block;
otherwise, the value is appended to the result.

〈Parse code and build result〉 ≡

parse :code rule: [
    some [
        set word word! (either code: select code-sections word [parse code/code rule] [append result word])
        |
        [#lit | #literal] set value skip (append/only result :value)
        |
        #include set value [file! | url!] (append result load value)
        |
        set code [block! | paren!] (append/only result parse-code-block code)
        |
        set value skip (append/only result :value)
    ]
]
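
As a rough illustration of the five cases listed above, here is a minimal Python sketch. It is not the REBOL implementation: it models a code block as a nested Python list, and the Word, Issue and File classes, the code_sections dictionary and the load_file callback are hypothetical stand-ins for word!, issue!, file!, the code-sections block and load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    name: str


@dataclass(frozen=True)
class Issue:
    name: str


@dataclass(frozen=True)
class File:
    path: str


def parse_code_block(code, code_sections, load_file):
    """Return a new list with section references and directives resolved."""
    result = []

    def walk(values):
        i = 0
        while i < len(values):
            value = values[i]
            has_next = i + 1 < len(values)
            if isinstance(value, Word) and value.name in code_sections:
                # Section reference: process the referenced section in place.
                walk(code_sections[value.name])
            elif (isinstance(value, Issue) and value.name in ("lit", "literal")
                  and has_next):
                # #lit / #literal: copy the next value without processing it.
                i += 1
                result.append(values[i])
            elif (isinstance(value, Issue) and value.name == "include"
                  and has_next and isinstance(values[i + 1], File)):
                # #include: append the loaded contents of the file.
                i += 1
                result.extend(load_file(values[i].path))
            elif isinstance(value, list):
                # Sub-block: recurse and append the result as a sub-block.
                result.append(parse_code_block(value, code_sections, load_file))
            else:
                result.append(value)
            i += 1

    walk(code)
    return result
```

Note that a referenced section is spliced into the current result, while a sub-block stays nested; this mirrors the difference between re-running the rule on the section and calling parse-code-block recursively.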

3.2.1 parse-code-block's locals

We want to add the words we're using to the list of locals for the parse-code-block function.

〈parse-code-block's locals〉 ≡

rule word value

3.3 Definition of parse-code-block

〈Definition of parse-code-block〉 ≡

parse-code-block: func [code [any-block!] /local result 〈parse-code-block's locals〉] [
    result: make :code length? :code
    〈Parse code and build result〉
    result
]

4. Format a code section into XHTML

We have seen how to generate REBOL code from a code section; now we'll look at how to generate XHTML from it so that it can be rendered in a pleasant way by the browser, provided some CSS. We will use a technique similar to that used by Carl Sassenrath's %color-code.r.

If the first value of the code is a set-word!, it is assumed to be the section name, and it is treated specially. Then the code is detab-ed and parsed as a sequence of one or more lines of REBOL code. Note that if the code is not valid REBOL code, an error will eventually occur; the caller is expected to recover from this case.

〈Format a code section into XHTML〉 ≡

trim/auto code
emit <div class="code">
〈Emit the code section name, if present〉
emit <p>
parse/all detab code [
    some [any tabs rebol-code]
]
emit </p>
emit </div>

4.1 Emit the code section name, if present

If the first value in the code is a set-word!, then it is the name of the section we are defining; we want to emit the section title in angle brackets, followed by an "is defined as" sign (HTML entity &#8801;, which we can approximate in ASCII with "="). If we are appending to this section (i.e., this is not the first time we emit the section), then we want the section name to be followed by "+=" instead of just "=". For this reason we keep the visited flag in section-data.

The section name is usually (and should always be) followed by a newline, so we also skip it here. (Notice that we are not using parse/all so trailing spaces will be skipped as well.)

〈Emit the code section name, if present〉 ≡

set [name code] load/next code
either all [set-word? :name section-data: select code-sections to word! name] [
    emit [
        <p class="sectdef">
        <span class="bra"> "〈" </span>
        section-data/title
        <span class="bra"> "〉" </span> " "
        either section-data/visited ["+"] [""]
        "≡"
        </p>
    ]
    section-data/visited: yes
    ; skip any space and the newline char
    parse code [opt newline code:]
] [
    code: head code
]
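
The role of the visited flag can be shown with a small Python sketch. It is only an illustration under assumptions: the section is modelled as a dictionary with title and visited keys, and the section_heading name is hypothetical.

```python
def section_heading(section):
    """Return the XHTML heading for a section and mark the section as visited."""
    # The first definition uses the plain sign; later ones are appends.
    marker = "+≡" if section["visited"] else "≡"
    section["visited"] = True
    return (
        '<p class="sectdef">'
        '<span class="bra">〈</span>'
        + section["title"]
        + '<span class="bra">〉</span> '
        + marker
        + "</p>"
    )
```

Calling it twice on the same section yields "≡" the first time and "+≡" the second, which is the behaviour we want when a section is defined in several pieces.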

4.1.1 format-code-section's locals

〈format-code-section's locals〉 ≡

name section-data

4.2 Code formatting rules

This section defines the parse rules that are used to format REBOL code (see 〈Format a code section into XHTML〉). The tabs rule consumes the spaces at the beginning of each line and emits visual tabs (the CSS can then be used to control the size of the tab and so on; for example, the default CSS renders a left border for tabs, producing a light dotted line that visually connects matching opening and closing brackets). We assume that the standard REBOL style of exactly four spaces per indentation level has been used (note that we detab the code before parsing it, so if you just use tabs for indenting you are fine).

The rebol-code rule parses one line of REBOL code (after the indentation). Any remaining spaces at the beginning are ignored; the line is then parsed as a sequence of zero or more REBOL values or opening/closing brackets. (Note that we don't actually need to recurse in case of brackets. It's not our job to enforce balancing; if the brackets are not balanced, the code will not load and thus the "tangle" process will fail; so, we can assume that the code we are parsing is valid.) Note that we treat a serialized value like a sub-block. (This change was suggested by Brian Hawley.) The line can end with a comment introduced by ";". rebol-code also consumes the newline at the end of the line and emits a <br /> tag for it (earlier versions didn't do this, but that was trickier for no real benefit, so we decided to change it).

The rebol-value rule handles the #literal and #include commands, removing the former and formatting the latter specially; otherwise it just uses load/next to parse REBOL values.

〈Code formatting rules〉 ≡

tabs: [4 " " (emit {<span class="tab"> </span>})]
val: none
rebol-code: [
    any " " some [
        newline (emit <br />) break
        |
        copy val [";" [to newline | to end]]
        (if val [emit [<span class="comment"> escape-html val </span>]])
        opt [newline (emit <br />)] break
        |
        copy val ["[" | "#[" | "(" | ")" | "]"] (emit val) opt [some " " (emit " ")]
        |
        rebol-value opt [some " " (emit " ")]
    ]
]
here: none
rebol-value: [
    ["#lit " | "#literal "] here: skip (set [val here] load/next here emit-value :val) :here
    |
    "#include " here: skip (set [val here] load/next here emit-include :val) :here
    |
    here: skip (set [val here] load/next here emit-value :val) :here
]
emit-value: func [value /local 〈emit-value's locals〉] [
    〈Emit a REBOL value〉
]
emit-include: func [dest /local 〈emit-include's locals〉] [
    〈Emit the #include directive〉
]
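
The indentation handling described above (detab, one visual tab per group of four spaces, remaining spaces ignored) can be sketched in Python as follows. This is an illustration only; the format_indentation name and the tab_size parameter are assumptions, not part of the program.

```python
TAB_SPAN = '<span class="tab"> </span>'


def format_indentation(line, tab_size=4):
    """Split a line into (visual tab spans, rest of the line)."""
    # Expand hard tabs first, like detab does before parsing.
    line = line.expandtabs(tab_size)
    rest = line.lstrip(" ")
    # One visual tab per complete group of tab_size leading spaces;
    # any leftover spaces are simply dropped.
    levels = (len(line) - len(rest)) // tab_size
    return TAB_SPAN * levels, rest
```

A line indented with six spaces therefore gets a single visual tab, which is why the four-spaces-per-level convention matters.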

4.3 Emit a REBOL value

When emitting a value, we check if we have a specialized emitter for its type in the type-emitters object; in this case we use this emitter function. Otherwise, we just emit the molded value, giving it its datatype as the HTML class name. (This makes it easy to style each datatype with the CSS.)

〈Emit a REBOL value〉 ≡

either special: in type-emitters type?/word :value [
    do get special :value
] [
    emit [{<span class="} form type? :value {">} escape-html mold :value </span>]
]
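
The same dispatch idea, a specialized emitter if one exists and a generic span named after the type otherwise, looks like this in a Python sketch. The type_emitters dictionary keyed by type name is a hypothetical analogue of the type-emitters object, and repr stands in for mold.

```python
from html import escape


def emit_value(value, type_emitters):
    """Format one value as XHTML, preferring a type-specific emitter."""
    type_name = type(value).__name__
    emitter = type_emitters.get(type_name)
    if emitter is not None:
        return emitter(value)
    # Generic case: the type name becomes the CSS class.
    return f'<span class="{type_name}">{escape(repr(value))}</span>'
```

With this arrangement, supporting special rendering for another type only requires adding one entry to the dictionary.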

4.3.1 emit-value's locals

〈emit-value's locals〉 ≡

special

4.3.2 Functions for emitting values depending on type

The type-emitters object contains a function for each datatype that we must treat specially when emitting.

For words, we must check whether the word is a section name. In this case, we have a section reference, and we emit it as the section title in angle brackets, with a link to the section itself. Otherwise the word is emitted with two HTML class values and an HTML title; the first class is "word", and the second is taken from the keywords block, which defines a special class for words that need special rendering (such as datatypes, none and so on, or functions). The title comes from the keywords block too, and is usually used by browsers to show a tool-tip; so we put useful info there, for example the help for a function.

Paths are emitted by emitting all the components separately; this allows rendering each value in the path correctly.

〈Code formatting rules〉 +≡

type-emitters: context [
    word!: func [value /local section-data subclass title] [
        if section-data: select code-sections value [
            emit [
                <span class="ref">
                <span class="bra"> "〈" </span>
                {<a href="#} section-data/id {">}
                section-data/title
                </a>
                <span class="bra"> "〉" </span>
                </span>
            ]
            exit
        ]
        either set [subclass title] select keywords value [
            subclass: join " " subclass
            ; the leading space separates the title attribute from the class attribute
            title: rejoin [{ title="} escape-html title {"}]
        ] [
            subclass: title: ""
        ]
        emit [{<span class="word} subclass {"} title ">" escape-html mold value </span>]
    ]
    path!: func [value] [
        emit <span class="path">
        ; should always be a word, but we don't assume it
        emit-value first value
        foreach element next value [
            emit "/"
            emit-value element
        ]
        emit </span>
    ]
]
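
To make the intended output concrete, here is a Python sketch of the word and path emitters (section references are left out). It assumes a hypothetical keywords dictionary mapping a word to a (subclass, title) pair, and it treats path components as plain words for brevity.

```python
from html import escape


def emit_word(word, keywords):
    """Format a word, adding a subclass and a tool-tip title for keywords."""
    entry = keywords.get(word)
    if entry:
        subclass, title = entry
        attrs = f'class="word {subclass}" title="{escape(title)}"'
    else:
        attrs = 'class="word"'
    return f"<span {attrs}>{escape(word)}</span>"


def emit_path(parts, keywords):
    """Format each component of a path separately, joined by slashes."""
    inner = "/".join(emit_word(part, keywords) for part in parts)
    return f'<span class="path">{inner}</span>'
```

For example, a word listed with subclass "key" is rendered with class="word key", so the CSS can style all keywords at once while the title carries the help text.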

4.3.3 The keywords block

The keywords block has the format some [word! into [string! string!]], with the first string being the subclass (e.g., "key", "type", and so on), and the second being the title (e.g., help text).

〈Code formatting rules〉 +≡

keywords: [
    #include %keyword-list.r
]

4.4 Emit the #include directive

At this point, dest should contain the file! or url! of the file that will be included here. (If dest is of a different type, we emit an error message.) We'll emit a link to the file documentation (if it exists - we assume it has the same name as the included file, with the suffix replaced with %.html), or to the file itself. (We avoid checking for the existence of an HTML file in case dest is a url!. You may want to change this if you include from urls a lot and you have no problems with exists? being called on them.)

〈Emit the #include directive〉 ≡

either any [file? :dest url? :dest] [
    target: copy dest
    if file? target [
        either target: find/last target %. [
            target: head change/part next target %html tail target
        ] [
            ; no suffix found: target is none here, so build the name from dest
            target: join dest %.html
        ]
        if not exists? target [target: dest]
    ]
    emit [
        <span class="directive"> "#include "
        {<a href="} escape-html target {">}
    ]
    emit-value dest
    emit "</a></span>"
] [
    emit [
        <span class="include-error"> "Cannot use "
        <span class="directive"> "#include" </span>
        " with the value "
    ]
    emit-value mold :dest
    emit "!</span>"
]
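
The choice of the link target can be sketched in Python as below. This is an illustration under assumptions: include_link_target is a hypothetical name, the exists callback stands in for exists?, and the is_url flag replaces the url? test.

```python
from pathlib import PurePosixPath


def include_link_target(dest, exists, is_url=False):
    """Return the href for an #include: the HTML docs if present, else dest."""
    if is_url:
        # We do not probe remote locations for documentation.
        return dest
    # Replace the suffix with .html (or add it if there is no suffix).
    candidate = str(PurePosixPath(dest).with_suffix(".html"))
    return candidate if exists(candidate) else dest
```

Passing the existence check as a callback keeps the sketch free of file system access; with a real file system one could pass os.path.exists instead.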

4.4.1 emit-include's locals

〈emit-include's locals〉 ≡

target

4.5 Definition of the code formatting functions

To format code, we need two functions; one is used to format code sections; the other is used to format code embedded in text. In the latter case the code is assumed to be only one line. (It is not being trimmed because it could contain a string; we don't think it makes sense to have a multiline string in this case, but the code sort-of handles the case.)

〈Definition of the code formatting functions〉 ≡

〈Code formatting rules〉
format-code-section: func [code [string!] /local 〈format-code-section's locals〉] [
    〈Format a code section into XHTML〉
]
format-code: func [code [string!]] [
    emit <span class="code">
    parse/all detab code rebol-code
    emit </span>
]

5. The code emitter

The code emitter consists of the generate-code function and its helper function parse-code-block.

〈The code emitter〉 ≡

〈Definition of parse-code-block〉
generate-code: func [start [word!] /local 〈generate-code's locals〉] [
    〈Generate REBOL code starting from a given section〉
]

6. The documentation emitter

The XHTML emitter for documentation is not much different from other MD3 emitters. For this reason, this is the least interesting part of this program. (It still does not come last, since we don't really know what we need to do in the first pass until we know what we need to have in the second pass.)

Here we define the initial state of the state machine for the emitter, which just skips the RLP header, emits it and moves to the normal state. (The header has already been collected during the first pass.) We also define the inline processing rules; we use format-code to format embedded REBOL code in the text, and we handle strong and emphasized text too. The embedded dialect (like "=[example]") is not yet implemented.

〈The documentation emitter〉 ≡

; initial state for the FSM
initial: [
    ; skip title and header
    title: sect1: (emit-header) discard-header
    code: (emit-header) normal
    options: ( )
    default: (emit-header) continue normal
]
discard-header: [
    code: normal
    options: ( )
    default: continue normal
]
; inline processing
inline: [
    normal: (emit escape-html data)
    word: (format-code data)
    strong: (emit [<strong> escape-html data </strong>])
    emph: (emit [<em> escape-html data </em>])
    rebol: (process-rebol data)
    code: (format-code data)
]
process-rebol: func [block] [
    emit "[Not yet implemented]"
]
〈Definition of the code formatting functions〉
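
The inline rules amount to a table from token kind to handler. A Python sketch of that table follows; it is only an analogy, with hypothetical make_inline_handlers and render_inline names, and with format_code passed in as a parameter rather than defined here.

```python
from html import escape


def make_inline_handlers(format_code):
    """Build the kind-to-handler table used for inline text."""
    return {
        "normal": escape,
        "word": format_code,
        "strong": lambda data: f"<strong>{escape(data)}</strong>",
        "emph": lambda data: f"<em>{escape(data)}</em>",
        # The embedded dialect is not implemented yet.
        "rebol": lambda data: "[Not yet implemented]",
        "code": format_code,
    }


def render_inline(kind, data, handlers):
    """Format one inline token by looking up its handler."""
    return handlers[kind](data)
```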

6.1 The emit-header function

The emit-header function formats the REBOL header into XHTML. If there's a history block, it is formatted as a table; the license should be formatted by using MakeDoc recursively, but this is not implemented yet. This function also emits the table of contents.