
1. Introduction

This program is a MakeDoc3 emitter that serves as a literate programming tool. It treats MD3 documents specially (we call such documents REBOL Literate Programs, or RLPs), and it can emit both the REBOL code for a program and its documentation (thus combining the functionality of the WEAVE and TANGLE tools of WEB).

The basic idea is that an RLP is just like any other MakeDoc document; this emitter, however, treats code sections (also known as "example" or "indented" sections) specially. They are always assumed to be valid REBOL (i.e. load-able), and they are parsed according to special rules (so you can call this a REBOL dialect) to rearrange them into a complete REBOL program and to do other processing (for example, handling localization automatically). Code is also rendered in an easy-to-read way in the XHTML output file (the style is fully controlled by CSS; WeTan just makes sure to provide the correct markup for the code).

2. Overview

This is a top view of the emitter. (See MakeDoc3's documentation for more details on make-emitter.)

Overview

wetan-emitter: make-emitter [
 Initialization and first pass
 The documentation emitter
 The code emitter
 The build-doc function
]

We'll proceed with the most interesting parts first (so that readers can get to the nice stuff immediately, and worry about the details only if they need to).

3. Generate REBOL code starting from a given section

Given the name of a code section, start, we want to generate a REBOL block, starting from the section referred to by this name and resolving all references to other sections. Special directives also need to be processed here; formatting directives should be removed.

The parse-code-block function is used to parse a code block (and, recursively, any sub-block) and do all the processing; this function returns a new block that is the result of this processing.

Note that we return none if the given section name is not found.

Generate REBOL code starting from a given section

if section-data: select code-sections start [
 parse-code-block section-data/code
]

3.1 generate-code's locals

Since this is the code for the generate-code function, we want to add the words we're using to the list of locals for that function.

generate-code's locals

section-data

3.2 Parse code and build result

Our parse rule analyzes each value in the block. The following cases are handled:

  1. the value is a word, and it is the name of a code section; in this case, the block for that code section is parsed and its values are processed;
  2. the value is the issue #lit or #literal; the next value is appended to the result without any processing;
  3. the value is the issue #include followed by a file! or url!; the file or url is loaded and the contents are appended to the result (notice that if the file cannot be loaded, an error is produced);
  4. the value is a sub-block or paren; parse-code-block is called recursively and its return value is appended to the result as a sub-block;
  5. otherwise, the value is appended to the result.

Parse code and build result

parse :code rule: [
 some [
  set word word! (either code: select code-sections word [parse code/code rule] [append result word])
  |
  [#lit | #literal] set value skip (append/only result :value)
  |
  #include set value [file! | url!] (append result load value)
  |
  set code [block! | paren!] (append/only result parse-code-block code)
  |
  set value skip (append/only result :value)
 ]
]
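To see the rule at work outside the emitter, here is a minimal standalone sketch. It is not part of the tangled program. It assumes a hypothetical demo-sections block that maps section names directly to code blocks, whereas the real code-sections stores objects whose code field holds the block. It implements cases 1, 2, 4 and 5 only; #include is left out so that the example needs no files.

```rebol
REBOL [Title: "Sketch: expanding section references"]

; Hypothetical stand-in for code-sections: name followed by a code block
demo-sections: [
    greeting [print message]
    -main- [message: "Hello" greeting #lit greeting]
]

expand: func [code [any-block!] /local result rule word value] [
    result: make type? code length? code
    parse code rule: [
        any [
            ; case 1: a section reference is replaced by that section's code
            set word word! (
                either value: select demo-sections word [
                    parse value rule
                ] [
                    append result word
                ]
            )
            ; case 2: #lit/#literal keeps the next value as-is
            | [#lit | #literal] set value skip (append/only result :value)
            ; case 4: sub-blocks and parens are expanded recursively
            | set value [block! | paren!] (append/only result expand value)
            ; case 5: anything else is copied unchanged
            | set value skip (append/only result :value)
        ]
    ]
    result
]

probe expand select demo-sections '-main-
; prints [message: "Hello" print message greeting]
```

The first greeting is a reference, so it is replaced by the code of that section. The second greeting follows #lit, so it survives as a plain word.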

3.2.1 parse-code-block's locals

We want to add the words we're using to the list of locals for the parse-code-block function.

parse-code-block's locals

rule word value

3.3 Definition of parse-code-block

Definition of parse-code-block

parse-code-block: func [code [any-block!] /local result parse-code-block's locals] [
 result: make :code length? :code
 Parse code and build result
 result
]

4. Format a code section into XHTML

We have seen how to generate REBOL code from a code section. Now we'll look at how to generate XHTML from it, so that a browser can render it in a pleasant way given suitable CSS. We use a technique similar to the one used in Carl Sassenrath's %color-code.r.

If the first value of the code is a set-word!, it is assumed to be the section name and is treated specially. The code is then detabbed and parsed as a sequence of one or more lines of REBOL code. Note that if the code is not valid REBOL, an error will eventually occur; the caller is expected to recover from this case.

Format a code section into XHTML

trim/auto code
emit <div class="code">
Emit the code section name, if present
emit <p>
parse/all detab code [
 some [any tabs rebol-code]
]
emit </p>
emit </div>

4.1 Emit the code section name, if present

If the first value in the code is a set-word!, then it is the name of the section we are defining. In that case we emit the section title in angle brackets, followed by an "is defined as" sign (HTML entity &#8801;, which can be approximated in ASCII with "="). If we are appending to this section (i.e., this is not the first time we emit it), the section name is followed by "+=" instead of just "=". This is why section-data has the visited flag.

The section name is usually (and should always be) followed by a newline, which we also skip here. (Since we are not using parse/all, any spaces before the newline are skipped as well.)

Emit the code section name, if present

set [name code] load/next code
either all [set-word? :name section-data: select code-sections to word! name] [
 emit [
  <p class="sectdef">
  <span class="bra"> "&#9001;" </span>
  section-data/title
  <span class="bra"> "&#9002;" </span> " "
  either section-data/visited ["+"] [""]
  "&#8801;"
  </p>
 ]
 section-data/visited: yes
 ; skip any space and the newline char
 parse code [opt newline code:]
] [
 code: head code
]
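The detection of the section name and the choice between the two signs can be tried in isolation. The sketch below is a standalone script with a simplified, hypothetical helper (sign-for). It uses a plain visited block instead of the flag stored in section-data, and it returns only the HTML entity that would follow the section title.

```rebol
REBOL [Title: "Sketch: choosing between the two definition signs"]

; Hypothetical stand-in for the visited flag kept in section-data
visited: copy []

sign-for: func [text [string!] /local name rest] [
    set [name rest] load/next text
    ; code that does not start with a set-word has no section name
    if not set-word? :name [return none]
    name: to word! :name
    either find visited name [
        "+&#8801;"                     ; section already seen: appending
    ] [
        append visited name
        "&#8801;"                      ; first definition
    ]
]

print sign-for {greeting:^/    print "hi"}    ; &#8801;
print sign-for {greeting:^/    print "bye"}   ; +&#8801;
probe sign-for {print "no name"}              ; none
```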

4.1.1 format-code-section's locals

format-code-section's locals

name section-data

4.2 Code formatting rules

This section defines the parse rules used to format REBOL code (see Format a code section into XHTML). The tabs rule consumes one level of indentation (four spaces) at the beginning of a line and emits a visual tab; the caller applies it repeatedly with any tabs. The CSS can then control the size and look of each tab. For example, the default CSS gives tabs a left border, so that a light dotted line visually connects matching opening and closing brackets. We assume the standard REBOL style of exactly four spaces per indentation level. Because the code is detabbed before parsing, indenting with tabs works too.

The rebol-code rule parses one line of REBOL code (after the indentation). Any remaining spaces at the beginning are ignored; the line is then parsed as a sequence of zero or more REBOL values or opening/closing brackets. (We don't actually need to recurse on brackets, since it is not our job to enforce balancing. If the brackets are not balanced, the code will not load and the "tangle" process will fail, so we can assume that the code we are parsing is valid.) Note that we treat the opening "#[" of a serialized value like the opening bracket of a sub-block (this change was suggested by Brian Hawley). The line can end with a comment introduced by ";". rebol-code also consumes the newline at the end of the line and emits a <br /> tag for it. Earlier versions didn't do this, but that was trickier for no real benefit, so we changed it.

The rebol-value rule handles the #literal and #include commands, removing the former and formatting the latter specially; otherwise it just uses load/next to parse REBOL values.

Code formatting rules

tabs: [4 " " (emit {<span class="tab">&nbsp;</span>})]
val: none
rebol-code: [
 any " " some [
  newline (emit <br />) break
  |
  copy val [";" [to newline | to end]]
  (if val [emit [<span class="comment"> escape-html val </span>]])
  opt [newline (emit <br />)] break
  |
  copy val ["[" | "#[" | "(" | ")" | "]"] (emit val) opt [some " " (emit " ")]
  |
  rebol-value opt [some " " (emit " ")]
 ]
]
here: none
rebol-value: [
 ["#lit " | "#literal "] here: skip (set [val here] load/next here emit-value :val) :here
 |
 "#include " here: skip (set [val here] load/next here emit-include :val) :here
 |
 here: skip (set [val here] load/next here emit-value :val) :here
]
emit-value: func [value /local emit-value's locals] [
 Emit a REBOL value
]
emit-include: func [dest /local emit-include's locals] [
 Emit the #include directive
]
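You can check the effect of the tabs rule with a variant that counts indentation levels instead of emitting markup. The following is a standalone sketch. count-levels is a hypothetical helper. Like the document's code, it relies on detab using REBOL's default tab size of four.

```rebol
REBOL [Title: "Sketch: counting indentation levels"]

count-levels: func [line [string!] /local levels tabs] [
    levels: 0
    ; same shape as the tabs rule above, but counting instead of emitting
    tabs: [4 " " (levels: levels + 1)]
    parse/all detab copy line [any tabs to end]
    levels
]

print count-levels "        print x"   ; 8 spaces: 2
print count-levels "^-^-print x"       ; 2 tabs, detabbed to 8 spaces: 2
print count-levels "      print x"     ; 6 spaces: 1 (the other 2 are left over)
```

The leftover spaces in the last case are the ones that rebol-code skips with any " ".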

4.3 Emit a REBOL value

When emitting a value, we check whether the type-emitters object has a specialized emitter for its type; if so, we call that function. Otherwise, we just emit the molded value, using its datatype name as the HTML class. (This makes it easy to style each datatype with CSS.)

Emit a REBOL value

either special: in type-emitters type?/word :value [
 do get special :value
] [
 emit [{<span class="} form type? :value {">} escape-html mold :value </span>]
]
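The dispatch relies on IN returning none when the object has no field named after the value's type. The sketch below is a reduced, standalone version with a hypothetical emitter object that has a single specialized function. It returns strings instead of emitting HTML, and it shows both branches.

```rebol
REBOL [Title: "Sketch: dispatching on a value's type with IN"]

; Hypothetical emitter object with a specialized function for word! only
emitters: context [
    word!: func [value] [join "special word: " mold value]
]

describe: func [value /local special] [
    either special: in emitters type?/word :value [
        do get special :value        ; call the specialized function
    ] [
        ; generic case: the datatype name doubles as a CSS class name
        rejoin ["generic " form type? :value ": " mold :value]
    ]
]

print describe 'print   ; special word: print
print describe 42       ; generic integer: 42
```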

4.3.1 emit-value's locals

emit-value's locals

special

4.3.2 Functions for emitting values depending on type

The type-emitters object contains a function for each datatype that we must treat specially when emitting.

For words, we must check whether the word is a section name. If it is, we have a section reference, and we emit it as the section title in angle brackets, with a link to the section itself. Otherwise the word is emitted with two HTML class values and an HTML title attribute. The first class is "word"; the second is taken from the keywords block, which defines a special class for words that need special rendering (such as datatypes, none and so on, or functions). The title also comes from the keywords block. Browsers usually show it as a tool-tip, so we put useful information there, for example the help text for a function.

Paths are emitted by emitting all the components separately; this allows rendering each value in the path correctly.

Code formatting rules +≡

type-emitters: context [
 word!: func [value /local section-data subclass title] [
  if section-data: select code-sections value [
   emit [
    <span class="ref">
    <span class="bra"> "&#9001;" </span>
    {<a href="#} section-data/id {">}
    section-data/title
    </a>
    <span class="bra"> "&#9002;" </span>
    </span>
   ]
   exit
  ]
  either set [subclass title] select keywords value [
   subclass: join " " subclass
   ; leading space separates the attribute; copy because escape-html works in place
   title: rejoin [{ title="} escape-html copy title {"}]
  ] [
   subclass: title: ""
  ]
  emit [{<span class="word} subclass {"} title ">" escape-html mold value </span>]
 ]
 path!: func [value] [
  emit <span class="path">
  ; should always be a word, but we don't assume it
  emit-value first value
  foreach element next value [
   emit "/"
   emit-value element
  ]
  emit </span>
 ]
]

4.3.3 The keywords block

The keywords block has the format some [word! into [string! string!]], with the first string being the subclass (e.g., "key", "type", and so on), and the second being the title (e.g., help text).

Code formatting rules +≡

keywords: [
 #include %keyword-list.r
]
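The contents of %keyword-list.r are not shown in this document. The sketch below is a standalone script that assumes entries in the documented shape. It uses two made-up entries and a hypothetical word-span helper to show how the subclass and title become attributes. HTML escaping is omitted for brevity.

```rebol
REBOL [Title: "Sketch: turning keyword entries into attributes"]

; Made-up entries in the documented shape: word followed by [subclass title]
keywords: [
    print ["function" "Prints a value."]
    none ["value" "The none value."]
]

word-span: func [value [word!] /local subclass title] [
    either set [subclass title] select keywords value [
        subclass: join " " subclass                 ; second class value
        title: rejoin [{ title="} title {"}]        ; note the leading space
    ] [
        subclass: title: ""
    ]
    rejoin [{<span class="word} subclass {"} title ">" mold value "</span>"]
]

print word-span 'print   ; <span class="word function" title="Prints a value.">print</span>
print word-span 'foo     ; <span class="word">foo</span>
```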

4.4 Emit the #include directive

At this point, dest should contain the file! or url! of the file to be included here. (If dest is of a different type, we emit an error message.) We emit a link to the documentation of the file if it exists, or to the file itself otherwise. We assume the documentation has the same name as the included file, with the suffix replaced by %.html (or with %.html appended if the file has no suffix). We don't check for the existence of an HTML file when dest is a url!. You may want to change this if you often include from URLs and have no problem with exists? being called on them.

Emit the #include directive

either any [file? :dest url? :dest] [
 target: copy dest
 if file? target [
  either find/last target %. [
   ; replace the suffix after the last dot with "html" (target stays at its head)
   change/part next find/last target %. %html tail target
  ] [
   append target %.html
  ]
  if not exists? target [target: dest]
 ]
 emit [
  <span class="directive"> "#include "
  {<a href="} escape-html target {">}
 ]
 emit-value dest
 emit "</a></span>"
] [
 emit [
  <span class="include-error"> "Cannot use "
  <span class="directive"> "#include" </span>
  " with the value "
 ]
 emit-value mold :dest
 emit "!</span>"
]
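You can try the derivation of the documentation file name on its own. The sketch below is a standalone script. doc-name is a hypothetical helper that performs the same suffix replacement for file! values, without the exists? check.

```rebol
REBOL [Title: "Sketch: deriving a documentation file name"]

; Hypothetical helper doing the same suffix replacement as the code above
doc-name: func [file [file!] /local target] [
    target: copy file
    either find/last target %. [
        change/part next find/last target %. %html tail target
    ] [
        append target %.html
    ]
    target
]

probe doc-name %lib/utils.r    ; %lib/utils.html
probe doc-name %README         ; %README.html
```

find/last looks for the last dot anywhere in the path. For a suffix-less file under a dotted relative directory, such as %../tools/build, it would therefore produce %..html. A more careful version would search only the part after the last slash.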

4.4.1 emit-include's locals

emit-include's locals

target

4.5 Definition of the code formatting functions

To format code, we need two functions: one formats code sections, and the other formats code embedded in text. In the latter case, the code is assumed to be a single line. (It is not trimmed, because it could contain a string. We don't think a multiline string makes sense in this case, but the code sort of handles it.)

Definition of the code formatting functions

Code formatting rules
format-code-section: func [code [string!] /local format-code-section's locals] [
 Format a code section into XHTML
]
format-code: func [code [string!]] [
 emit <span class="code">
 parse/all detab code rebol-code
 emit </span>
]

5. The code emitter

The code emitter consists of the generate-code function and its helper function parse-code-block.

The code emitter

Definition of parse-code-block
generate-code: func [start [word!] /local generate-code's locals] [
 Generate REBOL code starting from a given section
]

6. The documentation emitter

The XHTML emitter for documentation is not much different from other MD3 emitters, which makes it the least interesting part of this program. (It does not come last, though, because we cannot know what the first pass must do until we know what the second pass needs.)

Here we define the initial state of the emitter's state machine. It emits the header (already collected during the first pass), skips the RLP header code section, and moves to the normal state. We also define the inline processing rules: we use format-code to format REBOL code embedded in the text, and we handle strong and emphasized text too. The embedded dialect (like "=[example]") is not yet implemented.

The documentation emitter

; initial state for the FSM
initial: [
 ; skip title and header
 title: sect1: (emit-header) discard-header
 code: (emit-header) normal
 options: ( )
 default: (emit-header) continue normal
]
discard-header: [
 code: normal
 options: ( )
 default: continue normal
]
; inline processing
inline: [
 normal: (emit escape-html data)
 word: (format-code data)
 strong: (emit [<strong> escape-html data </strong>])
 emph: (emit [<em> escape-html data </em>])
 rebol: (process-rebol data)
 code: (format-code data)
]
process-rebol: func [block] [
 emit "[Not yet implemented]"
]
Definition of the code formatting functions

6.1 The emit-header function

The emit-header function formats the REBOL header into XHTML. If there is a history block, it is formatted as a table. The license should be formatted by using MakeDoc recursively, but this is not implemented yet. This function also emits the table of contents.

The documentation emitter +≡

emit-header: has [tmp] [
 emit <div id="header">
 if in header 'title [
  ; header/title is the result of emit-inline so we don't need escape-html here
  emit [<h1 id="title"> header/title </h1>]
 ]
 if in header 'author [
  emit [<h2 id="author"> escape-html copy header/author]
  if in header 'email [
   emit [
    " &lt;"
    {<a href="mailto:} tmp: escape-html copy header/email {">} tmp </a>
    "&gt;"
   ]
  ]
  emit </h2>
 ]
 if any [in header 'date in header 'version] [
  emit <h2 id="dateversion">
  if in header 'date [
   emit escape-html form header/date
  ]
  if all [in header 'date in header 'version] [
   emit ", "
  ]
  if in header 'version [
   emit escape-html form header/version
  ]
  emit </h2>
 ]
 if in header 'purpose [
  emit [<p id="purpose"> escape-html copy header/purpose </p>]
 ]
 if in header 'license [
  emit [<div id="license"> escape-html copy header/license </div>]
 ]
 if all [in header 'history block? header/history] [
  emit [
   <table id="history">
   <thead>
    <tr><th> "Date" </th><th> "Version" </th>
    <th> "Description" </th><th> "Author" </th></tr>
   </thead>
   <tbody>
  ]
  parse header/history [
   some [
    set tmp date! (emit [<tr><td class="date"> escape-html form tmp </td>])
    set tmp tuple! (emit [<td class="version"> escape-html form tmp </td>])
    set tmp string! (emit [<td class="desc"> escape-html copy tmp </td>])
    opt [set tmp word! (emit [<td class="name"> escape-html form tmp </td>])]
    (emit </tr>)
   ]
  ]
  emit [</tbody></table>]
 ]
 emit </div>
 emit toc
]
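The History field is expected to be a flat block of entries. Each entry is a date!, a tuple! version, a string! description and an optional word! author. The standalone sketch below uses made-up entries and the same parse pattern, collecting rows instead of emitting table cells.

```rebol
REBOL [Title: "Sketch: walking a header History block"]

; Made-up entries: date! tuple! string!, plus an optional word! author
history: [
    1-Jan-2005 0.1.0 "First version" jdoe
    15-Feb-2005 0.2.0 "Added #include support"
]

rows: copy []
ok: parse history [
    some [
        set d date! set v tuple! set s string!
        (who: none)                        ; reset, since the author is optional
        opt [set who word!]
        (append/only rows reduce [d v s who])
    ]
]
print ok              ; true
print length? rows    ; 2
```

Because the author is optional, an entry without one is simply followed by the next date!.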

6.2 Normal state

Many MD3 commands are currently unsupported; they are less common in RLPs and will be added later.

The documentation emitter +≡

normal: [
 para: (emit <p> emit-inline data emit </p>)
 sect1: (emit <div class="section"> emit-sect 1 data) in-sect (emit </div>)
 sect2: (emit-sect 2 data)
 sect3: (emit-sect 3 data)
 sect4: (emit-sect 4 data)
 bullet: bullet2: bullet3: (emit <ul>) continue in-bul (emit </ul>)
 enum: enum2: enum3: (emit <ol>) continue in-enum (emit </ol>)
 code: (format-code-section data)
 output: (emit data) ; to output html directly
 define: (emit <dl>) continue in-define (emit </dl>)
 image: (
  emit [
   either data/2 = 'center [<div class="image center">][<div class="image">]
   {<img src="} data/1 {" alt="" />}
   </div>
  ]
 )
 center-in:
  (emit <div class="center">)
  in-center
  (emit </div>)
 center-out: (error "Unbalanced center-out")
 note-in:
  (emit [<div class="note"><h2>] emit-inline data emit </h2>)
  in-note
  (emit </div>)
 note-out: (error "Unbalanced note-out")
 indent-in:
  (emit <blockquote>)
  in-indent
  (emit </blockquote>)
 indent-out: (error "Unbalanced indent-out")
]

6.3 Sections

The documentation emitter +≡

in-sect: inherit normal [
 sect1: continue return
]

6.4 Bullets

The documentation emitter +≡

in-bul: [
 bullet: (emit <li> emit-inline data emit </li>)
 bullet2: bullet3: (emit <ul>) continue in-bul2 (emit </ul>)
 enum2: enum3: (emit <ol>) continue in-enum2 (emit </ol>)
 default: continue return
]
in-bul2: [
 bullet2: (emit <li> emit-inline data emit </li>)
 bullet3: (emit <ul>) continue in-bul3 (emit </ul>)
 enum3: (emit <ol>) continue in-enum3 (emit </ol>)
 default: continue return
]
in-bul3: [
 bullet3: (emit <li> emit-inline data emit </li>)
 default: continue return
]

6.5 Enumerations

The documentation emitter +≡

in-enum: [
 enum: (emit <li> emit-inline data emit </li>)
 bullet2: bullet3: (emit <ul>) continue in-bul2 (emit </ul>)
 enum2: enum3: (emit <ol>) continue in-enum2 (emit </ol>)
 default: continue return
]
in-enum2: [
 enum2: (emit <li> emit-inline data emit </li>)
 bullet3: (emit <ul>) continue in-bul3 (emit </ul>)
 enum3: (emit <ol>) continue in-enum3 (emit </ol>)
 default: continue return
]
in-enum3: [
 enum3: (emit <li> emit-inline data emit </li>)
 default: continue return
]

6.6 Definition lists

The documentation emitter +≡

in-define: [
 define: (emit-define data)
 default: continue return
]
emit-define: func [data [block!]] [
 if data/1 [
  emit <dt>
  emit-inline data/1
  emit </dt>
 ]
 if data/2 [
  emit <dd>
  emit-inline data/2
  emit </dd>
 ]
]

6.7 Centered sections

The documentation emitter +≡

in-center: inherit normal [
 center-out: return
]

6.8 Notes

The documentation emitter +≡

in-note: inherit normal [
 note-out: return
]

6.9 Indented sections

The documentation emitter +≡

in-indent: inherit normal [
 indent-out: return
]

6.10 Misc

The documentation emitter +≡

escape-html: func [text][
 ; Convert to avoid special HTML chars:
 foreach [from to] html-codes [replace/all text from to]
 text
]
html-codes: ["&" "&amp;" "<" "&lt;" ">" "&gt;"]

sects: 0.0.0.0
clear-sects: does [sects: 0.0.0.0]

next-section: func [level /local bump mask] [
 ; Return next section number. Clear sub numbers.
 set [bump mask] pick [
  [1.0.0.0 1.0.0.0]
  [0.1.0.0 1.1.0.0]
  [0.0.1.0 1.1.1.0]
  [0.0.0.1 1.1.1.1]
 ] level
 level: form sects: sects + bump * mask
 clear find level ".0"
 level
]
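A single step of next-section can be traced by hand. The standalone sketch below starts from section 1.2.3 and applies the bump/mask pair that PICK selects for a level-2 heading.

```rebol
REBOL [Title: "Sketch: one next-section step by hand"]

sects: 1.2.3.0                  ; we are in section 1.2.3
bump: 0.1.0.0                   ; the pair that PICK selects for level 2
mask: 1.1.0.0
; operators evaluate left to right, so this is (sects + bump) * mask
sects: sects + bump * mask
probe sects                     ; 1.3.0.0
label: form sects               ; "1.3.0.0"
clear find label ".0"           ; cut at the first ".0"
probe label                     ; "1.3"
```

Because the label is cut at the first ".0", a level-3 heading that directly follows a level-1 heading (1.0.0.0 becoming 1.0.1.0) would be labelled just "1".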

7. Initialization and first pass

At this point, we know the data we need to collect during the first pass. First of all, we need a code-sections block holding all the data for code sections, so that select code-sections section-name will return the section data for a given section-name; this section data should be an object, with the fields title, visited, code and id.

We also need to collect the header and store it in the header object. Then, we need to collect the section headers to generate the table of contents.

The init-emitter function is an emitter function that can be defined by users to do any initialization before the state machine starts. Please refer to the MakeDoc3 documentation for more details.

Initialization and first pass

code-sections: [ ]
code-section!: context [
 visited: no
 code: title: id: none
]

init-emitter: func [doc [block!]] [
 Initialize various values
 Do first pass
]

7.1 Initialize various values

One of the values that we need to initialize is code-sections itself.

Initialize various values

clear code-sections

7.2 Do first pass

We just use the FSM for the first pass as well. The fsm-do function is called with first-pass as the initial state. The generated table of contents is stored in toc.

Do first pass

clear-sects
toc: capture [
 emit [<div id="toc"><h2> "Contents:" </h2><ul>]
 fsm-do doc first-pass
 emit [</ul></div>]
]
clear-sects

7.2.1 Definition of the states for the first pass

Our state machine for the first pass needs to process the title and the header of the RLP, and then the code paragraphs, passing the code text to the preprocess-code-section function. Section headers need to be collected both for the table of contents and for use as code section titles.

first-pass moves to the in-header state after processing the title; in-header then moves to fp-normal after processing the header. In this state we generate the table of contents (sect3 and sect4 are not displayed in the TOC) using the auxiliary state toc2.

The title text is also stored in title.

Initialization and first pass +≡

toc: none
title: none
last-section-title: last-section-id: none
first-pass: [
 title: sect1: (title: capture [emit-inline data]) in-header
 options: ( ) ; ignore
 default: (title: "Untitled") continue in-header
]
Header handling for the first pass
fp-normal: [
 code: (preprocess-code-section data)
 Section headers handling
]

preprocess-code-section: func [code [string!] /local preprocess-code-section's locals] [
 Preprocess a code section
]

7.2.2 Section headers handling

Section headers handling

sect1: (set-last-section/emit 1 data)
sect2: (emit <ul>) continue toc2 (emit </ul>)
sect3: (set-last-section 3 data)
sect4: (set-last-section 4 data)

7.2.3 Building the table of contents

Initialization and first pass +≡

toc2: inherit fp-normal [
 sect1: continue return
 sect2: (set-last-section/emit 2 data)
]

set-last-section: func [level data /emit] [
 last-section-title: capture [emit-inline copy/deep data]
 last-section-id: either emit [emit-toc-item level last-section-title] [join "section-" next-section level]
]
emit-toc-item: func [level title /local num] [
 num: next-section level
 emit [<li> {<a href="#section-} num {">} num pick [". " " "] level = 1 title </a></li>]
 join "section-" num
]
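You can preview the entries produced by emit-toc-item without the emitter. In the standalone sketch below, toc-item is a hypothetical helper that builds the same string with rejoin instead of calling emit. The section numbers are passed in directly instead of coming from next-section.

```rebol
REBOL [Title: "Sketch: the shape of a TOC entry"]

; Hypothetical helper mirroring the string built by emit-toc-item
toc-item: func [level [integer!] num [string!] title [string!]] [
    rejoin [
        {<li><a href="#section-} num {">}
        num pick [". " " "] level = 1   ; ". " after a level-1 number, " " otherwise
        title "</a></li>"
    ]
]

print toc-item 1 "2" "Overview"
print toc-item 2 "3.1" "generate-code's locals"
```

The same "section-" plus number string is what set-last-section stores in last-section-id. It then becomes the id of the code sections defined under that heading.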

7.2.4 Header handling for the first pass

The in-header state considers the code section immediately following the document title to be the RLP header. The header is stored in header; if no header is present, header is set to the header template (unsurprisingly named header-template).

Header handling for the first pass

header: none
header-template: context [
 Title: "Untitled"
 File: %output.r
]
in-header: [
 code: (preprocess-header data) fp-normal
 options: ( ) ; ignore
 default: (header: make header-template []) continue fp-normal
]

preprocess-header: func [text [string!]] [
 Preprocess the header
]

7.2.5 Preprocess the header

The text is loaded and a header object is constructed from header-template. If there is any problem loading the text, header is set to just a copy of header-template. header/title is set to the previously collected document title.

Preprocess the header

header: attempt [to block! text]
header: construct/with any [header []] header-template
header/title: title
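The fallback behaviour of construct/with can be checked with a small standalone sketch. make-header is a hypothetical wrapper around the same two lines, and the header texts are made up. The Version field is not in the template, and it shows that extra fields are added to the object.

```rebol
REBOL [Title: "Sketch: building the header object"]

header-template: context [Title: "Untitled" File: %output.r]

make-header: func [text [string!] /local spec] [
    ; any syntax error in text makes ATTEMPT return none
    spec: attempt [to block! text]
    construct/with any [spec []] header-template
]

header: make-header {Title: "Demo" File: %demo.r Version: 0.1.0}
print [header/file header/version]     ; demo.r 0.1.0
bad: make-header {Title: "unterminated}
print bad/title                        ; Untitled
```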

7.3 Preprocess a code section

The text for each code section needs to be loaded and added to the code-sections block with its name. If no section name is given, the name is assumed to be '-main-. If the section has already been defined, then new code is appended to it; this allows building sections incrementally in the document.

Preprocess a code section

code: attempt [load/all code]
if code [
 parse code [
  [set name set-word! | (name: '-main-)] code:
 ]
 name: to word! name
 either section-data: select code-sections name [
  append section-data/code code
 ] [
  insert/only insert tail code-sections name section-data: make code-section! [
   title: last-section-title
   id: last-section-id
  ]
  section-data/code: code
 ]
]
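Building sections incrementally can be tried in a standalone sketch. section!, sections and add-code are hypothetical, simplified stand-ins for code-section!, code-sections and preprocess-code-section. The sketch has no titles or ids. The code field is set after make, as in the original. Inside the make spec, code would refer to the new object's own field.

```rebol
REBOL [Title: "Sketch: building code sections incrementally"]

; Hypothetical, simplified versions of code-section! and code-sections
section!: context [visited: no code: title: id: none]
sections: copy []

add-code: func [text [string!] /local code name data] [
    code: load/all text
    ; a leading set-word names the section; otherwise it belongs to -main-
    either all [not empty? code set-word? first code] [
        name: to word! first code
        code: next code
    ] [
        name: '-main-
    ]
    either data: select sections name [
        append data/code code            ; later fragments are appended
    ] [
        append sections name
        append sections data: make section! []
        data/code: copy code
    ]
]

add-code {greeting: print "hi"}
add-code {greeting: print "bye"}
add-code {print "unnamed"}
data: select sections 'greeting
probe data/code                         ; [print "hi" print "bye"]
```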

7.3.1 preprocess-code-section's locals

preprocess-code-section's locals

name section-data

8. The build-doc function

The build-doc function

template: attempt [read %wetan-template.html] ; none if missing, so build-doc falls back to the bare text
build-doc: func [text /local tmp] [
 save/header header/file generate-code '-main- header
 ;foreach [file start] output-files [save file generate-code start]
 either template [
  ; Template variables all begin with $
  tmp: copy template ; in case it gets reused
  replace/all tmp "$title" title
  replace/all tmp "$date" now/date
  replace tmp "$content" text
  tmp
 ] [
  copy text
 ]
]
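The substitution order in build-doc can be illustrated with a made-up template, since the real %wetan-template.html is not shown here. The standalone sketch below uses a hypothetical fill function. It forms a fixed date instead of calling now/date so that the result is predictable.

```rebol
REBOL [Title: "Sketch: filling in the template variables"]

; Made-up template; the real %wetan-template.html is not shown here
template: {<title>$title</title><body>$content<p>$date</p></body>}

fill: func [title [string!] text [string!] /local tmp] [
    tmp: copy template                        ; keep the template reusable
    replace/all tmp "$title" title
    replace/all tmp "$date" form 1-Jan-2005   ; build-doc uses now/date
    replace tmp "$content" text               ; content goes in last
    tmp
]

print fill "Demo" "<p>Use $title in templates.</p>"
```

Because $content is substituted last, a "$title" or "$date" that happens to occur in the document body is left alone.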