Defines an emitter for MD3 that is actually a literate programming backend, combining the functionality of WEAVE and TANGLE. Emits XHTML documentation and REBOL code.
Date | Version | Description | Author |
---|---|---|---|
2-Mar-2006 | 1.1.0 | History start | |
19-Mar-2006 | 1.2.0 | Converting to RLP format | |
20-Mar-2006 | 1.3.0 | Converting to RLP format, adding code formatting | |
21-Mar-2006 | 1.4.0 | Converting to RLP format, finished code formatting | |
22-Mar-2006 | 1.5.0 | Converting to RLP format, adding document emitter | |
22-Mar-2006 | 1.6.0 | Converting to RLP format, added build-doc | |
22-Mar-2006 | 1.7.0 | Converting to RLP format, added temporary code to allow bootstrap | |
30-Mar-2006 | 1.8.0 | Added support for #[...] (Brian's suggestion), changed code parsing rules (also fixes bugs) | |
30-Mar-2006 | 1.9.0 | Added formatting for path! and keywords | |
12-May-2006 | 1.10.0 | Added formatting for #lit and #include | |
12-May-2006 | 1.11.0 | Added header formatting | |
This program is an emitter for MakeDoc3 that serves as a literate programming tool. It treats MD3 documents specially (we'll call such MD3 documents REBOL Literate Programs or RLPs), and it is able to emit both the REBOL code for a program and its documentation (so it merges the functionality of the WEAVE and TANGLE tools available in WEB).
The basic idea behind this is that an RLP is just like any other MakeDoc document; this emitter, however, treats code sections (also known as "example" or "indented" sections) specially. They are always assumed to be valid REBOL (i.e., loadable), and they are parsed according to special rules (so you can call this a REBOL dialect) to rearrange them into a complete REBOL program and to do other processing (for example, handling localization automatically). Code is also rendered in an easy-to-read way in the XHTML output file (the style is fully controlled by CSS; WeTan just makes sure to provide the correct markup for the code).
This is a top view of the emitter. (See MakeDoc3's documentation for more details on make-emitter.)
〈Overview〉 ≡
wetan-emitter: make-emitter [
〈Initialization and first pass〉
〈The documentation emitter〉
〈The code emitter〉
〈The build-doc function〉
]
We'll proceed with the most interesting parts first (so that readers can get to the nice stuff immediately, and worry about the details only if they need to).
Given the name of a code section, start, we want to generate a REBOL block, starting from the section referred to by this name and resolving all references to other sections. Special directives also need to be processed here; formatting directives should be removed.
The parse-code-block function is used to parse a code block (and, recursively, any sub-block) and do all the processing; this function returns a new block that is the result of this processing.
Note that we return none if the given section name is not found.
〈Generate REBOL code starting from a given section〉 ≡
if section-data: select code-sections start [
parse-code-block section-data/code
]
Since this is the code for the generate-code function, we want to add the words we're using to the list of locals for that function.
〈generate-code's locals〉 ≡
section-data
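Our parse rule analyzes each value in the block. The following cases are handled: a word that names a code section is replaced by that section's code, which is parsed recursively with the same rule (any other word is copied as-is); #lit or #literal causes the following value to be copied literally, so even a section name is not expanded; #include followed by a file! or url! is replaced by the loaded contents of that file; a block! or paren! is processed recursively with parse-code-block; any other value is copied unchanged.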
〈Parse code and build result〉 ≡
parse :code rule: [
some [
set word word! (either code: select code-sections word [parse code/code rule] [append result word])
|
[#lit | #literal] set value skip (append/only result :value)
|
#include set value [file! | url!] (append result load value)
|
set code [block! | paren!] (append/only result parse-code-block code)
|
set value skip (append/only result :value)
]
]
We want to add the words we're using to the list of locals for the parse-code-block function.
〈parse-code-block's locals〉 ≡
rule word value
〈Definition of parse-code-block〉 ≡
parse-code-block: func [code [any-block!] /local result 〈parse-code-block's locals〉] [
result: make :code length? :code
〈Parse code and build result〉
result
]
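To make the expansion rules above concrete, here is a standalone REBOL 2 sketch (not one of WeTan's code sections). It applies the same kind of parse rule to a tiny, hypothetical section table. The names `sections` and `result` are illustrative. The table holds plain blocks instead of code-section! objects, and #include and nested blocks are left out.

```rebol
REBOL [Title: "Sketch: expanding section references"]

; Hypothetical section table: each section name is followed by its code.
; (WeTan stores code-section! objects; plain blocks keep this short.)
sections: [
    greeting ["Hello"]
]

result: copy []
rule: [
    some [
        ; a word naming a section is replaced by that section's code
        set word word! (
            either code: select sections word [parse code rule] [append result word]
        )
        |
        ; #lit (or #literal) keeps the next value as-is, even a section name
        [#lit | #literal] set value skip (append/only result :value)
        |
        ; any other value is copied unchanged
        set value skip (append/only result :value)
    ]
]

parse [print greeting print #lit greeting] rule
probe result   ; [print "Hello" print greeting]
```

The first `greeting` is expanded. The second one survives as a plain word because of #lit.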
We have seen how to generate REBOL code from a code section; now we'll look at how to generate XHTML from it so that it can be rendered in a pleasant way by the browser, provided some CSS. We will use a technique similar to that used by Carl Sassenrath's %color-code.r.
If the first value of the code is a set-word!, it is assumed to be the section name, and it is treated specially. Then the code is detabbed and parsed as a sequence of one or more lines of REBOL code. Note that if the code is not valid REBOL, an error will eventually occur; the caller is expected to recover from this case.
〈Format a code section into XHTML〉 ≡
trim/auto code
emit <div class="code">
〈Emit the code section name, if present〉
emit <p>
parse/all detab code [
some [any tabs rebol-code]
]
emit </p>
emit </div>
If the first value in the code is a set-word!, then it is the name of the section we are defining; we want to emit the section title in angle brackets, followed by an "is defined as" sign (the HTML entity &equiv;, ≡, which we can approximate in ASCII with "="). If we are appending to this section (i.e., this is not the first time we emit the section), then we want the section name to be followed by "+=" instead of just "=". For this reason we keep the visited flag in section-data.
The section name is usually (and should always be) followed by a newline, so we also skip it here. (Notice that we are not using parse/all so trailing spaces will be skipped as well.)
〈Emit the code section name, if present〉 ≡
set [name code] load/next code
either all [set-word? :name section-data: select code-sections to word! name] [
emit [
<p class="sectdef">
<span class="bra"> "〈" </span>
section-data/title
<span class="bra"> "〉" </span> " "
either section-data/visited ["+"] [""]
"≡"
</p>
]
section-data/visited: yes
; skip any space and the newline char
parse code [opt newline code:]
] [
code: head code
]
〈format-code-section's locals〉 ≡
name section-data
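The visited flag can be seen in isolation with the following standalone REBOL 2 sketch. It reuses the code-section! prototype from the first pass. It uses the ASCII approximations "=" and "+=" instead of ≡ and +≡. The function name `def-mark` is hypothetical.

```rebol
REBOL [Title: "Sketch: the visited flag"]

code-section!: context [
    visited: no
    code: title: id: none
]

; Hypothetical helper: returns the definition sign for a section and
; marks the section as visited, as the code above does after emitting it.
def-mark: func [section [object!] /local mark] [
    mark: either section/visited ["+="] ["="]
    section/visited: yes
    mark
]

greeting: make code-section! [title: "Greeting"]
print def-mark greeting   ; =
print def-mark greeting   ; +=
```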
This section defines the parse rules that are used to format REBOL code (see 〈Format a code section into XHTML〉). The tabs rule consumes the spaces at the beginning of each line, four at a time, and emits visual tabs (CSS can then control the width of each tab and so on; for example, the default CSS gives tabs a left border, so that a light dotted line visually connects matching opening and closing brackets). We assume that the standard REBOL style of exactly four spaces per indentation level has been used (note that we detab the code before parsing it, so if you just use tabs for indenting you are fine).
The rebol-code rule parses one line of REBOL code (after the indentation). Any remaining spaces at the beginning are ignored; the line is then parsed as a sequence of zero or more REBOL values or opening/closing brackets. (Note that we don't actually need to recurse in case of brackets. It's not our job to enforce balancing: if the brackets are not balanced, the code will not load and thus the "tangle" process will fail, so we can assume that the code we are parsing is valid.) Note that we treat the "#[" that opens a serialized value like an opening bracket (a change suggested by Brian Hawley). The line can end with a comment introduced by ";". rebol-code also consumes the newline at the end of the line and emits a <br /> tag for it (earlier versions didn't do this, but that was trickier for no real benefit, so we decided to change it).
The rebol-value rule handles the #literal and #include commands, removing the former and formatting the latter specially; otherwise it just uses load/next to parse REBOL values.
〈Code formatting rules〉 ≡
tabs: [4 " " (emit {<span class="tab"> </span>})]
val: none
rebol-code: [
any " " some [
newline (emit <br />) break
|
copy val [";" [to newline | to end]]
(if val [emit [<span class="comment"> escape-html val </span>]])
opt [newline (emit <br />)] break
|
copy val ["[" | "#[" | "(" | ")" | "]"] (emit val) opt [some " " (emit " ")]
|
rebol-value opt [some " " (emit " ")]
]
]
here: none
rebol-value: [
["#lit " | "#literal "] here: skip (set [val here] load/next here emit-value :val) :here
|
"#include " here: skip (set [val here] load/next here emit-include :val) :here
|
here: skip (set [val here] load/next here emit-value :val) :here
]
emit-value: func [value /local 〈emit-value's locals〉] [
〈Emit a REBOL value〉
]
emit-include: func [dest /local 〈emit-include's locals〉] [
〈Emit the #include directive〉
]
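The following standalone REBOL 2 sketch runs the tabs rule on its own. It shows that a leading tab, once detabbed, produces the same markup as four spaces. The `emit` defined here is a hypothetical stand-in that just collects text, not MakeDoc3's own emit. The names `out`, `src` and `body-text` are illustrative.

```rebol
REBOL [Title: "Sketch: the tabs rule"]

out: copy ""
; Hypothetical stand-in for the emitter's emit: just collect output text
emit: func [value] [append out value]

tabs: [4 " " (emit {<span class="tab"> </span>})]

; detab turns the leading tab into four spaces (default tab size 4),
; so this line has two levels of indentation
src: detab "^-    print 1"
parse/all src [any tabs copy body-text to end (emit body-text)]
print out
; <span class="tab"> </span><span class="tab"> </span>print 1
```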
When emitting a value, we check if we have a specialized emitter for its type in the type-emitters object; in this case we use this emitter function. Otherwise, we just emit the molded value, giving it its datatype as the HTML class name. (This makes it easy to style each datatype with the CSS.)
〈Emit a REBOL value〉 ≡
either special: in type-emitters type?/word :value [
do get special :value
] [
emit [{<span class="} form type? :value {">} escape-html mold :value </span>]
]
〈emit-value's locals〉 ≡
special
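The fallback branch relies on `form type?` yielding the datatype name without its trailing "!". A standalone REBOL 2 sketch of that branch follows. The helper `value-span` is hypothetical. It skips escape-html and the type-emitters lookup, and it returns a string instead of calling emit.

```rebol
REBOL [Title: "Sketch: datatype as CSS class"]

; Hypothetical simplified version of the fallback branch of emit-value
value-span: func [value] [
    rejoin [{<span class="} form type? :value {">} mold :value "</span>"]
]

print value-span 42       ; <span class="integer">42</span>
print value-span "hi"     ; <span class="string">"hi"</span>
print value-span 1.2.3    ; <span class="tuple">1.2.3</span>
```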
The type-emitters object contains a function for each datatype that we must treat specially when emitting.
For words, we must check if it is a section name. In this case, we have a section reference, and we emit it as the section title in angle brackets, with a link to the section itself. Otherwise the word is emitted with two HTML class values and an HTML title; the first class is "word", and the second is taken from the keywords block, which defines a special class for words that need special rendering (such as datatypes, none and so on, or functions). The title also comes from the keywords block, and browsers usually show it as a tool-tip; so we put useful info there, for example the help for a function.
Paths are emitted by emitting all the components separately; this allows rendering each value in the path correctly.
〈Code formatting rules〉 +≡
type-emitters: context [
word!: func [value /local section-data subclass title] [
if section-data: select code-sections value [
emit [
<span class="ref">
<span class="bra"> "〈" </span>
{<a href="#} section-data/id {">}
section-data/title
</a>
<span class="bra"> "〉" </span>
</span>
]
exit
]
either set [subclass title] select keywords value [
subclass: join " " subclass
title: rejoin [{ title="} escape-html copy title {"}]
] [
subclass: title: ""
]
emit [{<span class="word} subclass {"} title ">" escape-html mold value </span>]
]
path!: func [value] [
emit <span class="path">
; should always be a word, but we don't assume it
emit-value first value
foreach element next value [
emit "/"
emit-value element
]
emit </span>
]
]
The keywords block has the format some [word! into [string! string!]], with the first string being the subclass (e.g., "key", "type", and so on), and the second being the title (e.g., help text).
〈Code formatting rules〉 +≡
keywords: [
#include %keyword-list.r
]
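The standalone REBOL 2 sketch below shows the expected shape of the keywords block. It checks that shape with the same parse expression given above, then does the lookup used by type-emitters/word!. Both entries are hypothetical examples. The real list lives in %keyword-list.r and is not reproduced here.

```rebol
REBOL [Title: "Sketch: keywords block format"]

; Hypothetical entries: word, then [subclass title]
keywords: [
    print ["key" "Outputs a value followed by a line break."]
    none ["type" "The none value."]
]

; The documented shape: some [word! into [string! string!]]
print parse keywords [some [word! into [string! string!]]]   ; true

; Lookup as done in type-emitters/word!
set [subclass title] select keywords 'print
print subclass   ; key
print title      ; Outputs a value followed by a line break.
```

The words in the block are never evaluated, so a literal `none` there is just the word none, which is what the lookup needs.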
At this point, dest should contain the file! or url! of the file that will be included here. (If dest is of a different type, we emit an error message.) We'll emit a link to the file's documentation (if it exists; we assume it has the same name as the included file, with the suffix replaced by %.html), or to the file itself. (We avoid checking for the existence of an HTML file when dest is a url!. You may want to change this if you include from URLs a lot and have no problems with exists? being called on them.)
〈Emit the #include directive〉 ≡
either any [file? :dest url? :dest] [
target: copy dest
if file? target [
either target: find/last target %. [
target: head change/part next target %html tail target
] [
target: join dest %.html
]
if not exists? target [target: dest]
]
emit [
<span class="directive"> "#include "
{<a href="} escape-html target {">}
]
emit-value dest
emit "</a></span>"
] [
emit [
<span class="include-error"> "Cannot use "
<span class="directive"> "#include" </span>
" with the value "
]
emit-value mold :dest
emit "!</span>"
]
〈emit-include's locals〉 ≡
target
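The suffix rewrite can be checked on its own with the standalone REBOL 2 sketch below. `doc-target` is a hypothetical helper and omits the exists? check. In the branch without a dot it joins the original name, because there the result of find/last is none.

```rebol
REBOL [Title: "Sketch: documentation link for #include"]

; Hypothetical helper mirroring the suffix rewrite in the #include emitter
doc-target: func [dest [file!] /local target dot] [
    target: copy dest
    either dot: find/last target %. [
        ; replace everything after the last dot with "html"
        head change/part next dot %html tail dot
    ] [
        ; no suffix at all: just append one
        join target %.html
    ]
]

probe doc-target %lib/parser.r   ; %lib/parser.html
probe doc-target %README         ; %README.html
```

As in the original, a dot inside a directory name of a suffix-less file would be taken as the suffix separator.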
To format code, we need two functions: one formats code sections, and the other formats code embedded in text. In the latter case, the code is assumed to be a single line. (It is not trimmed because it could contain a string; we don't think a multiline string makes sense here, but the code sort of handles that case.)
〈Definition of the code formatting functions〉 ≡
〈Code formatting rules〉
format-code-section: func [code [string!] /local 〈format-code-section's locals〉] [
〈Format a code section into XHTML〉
]
format-code: func [code [string!]] [
emit <span class="code">
parse/all detab code rebol-code
emit </span>
]
The code emitter consists of the generate-code function and its helper function parse-code-block.
〈The code emitter〉 ≡
〈Definition of parse-code-block〉
generate-code: func [start [word!] /local 〈generate-code's locals〉] [
〈Generate REBOL code starting from a given section〉
]
The XHTML emitter for documentation is not much different from other MD3 emitters. For this reason, this is the least interesting part of this program. (It still does not come last, since we don't really know what we need to do in the first pass until we know what we need to have in the second pass.)
Here we define the initial state of the state machine for the emitter, which skips the RLP title and header in the input, emits the header (already collected during the first pass) and moves to the normal state. We also define the inline processing rules: we use format-code to format REBOL code embedded in the text, and we handle strong and emphasized text too. The embedded dialect (like "=[example]") is not yet implemented.
〈The documentation emitter〉 ≡
; initial state for the FSM
initial: [
; skip title and header
title: sect1: (emit-header) discard-header
code: (emit-header) normal
options: ( )
default: (emit-header) continue normal
]
discard-header: [
code: normal
options: ( )
default: continue normal
]
; inline processing
inline: [
normal: (emit escape-html data)
word: (format-code data)
strong: (emit [<strong> escape-html data </strong>])
emph: (emit [<em> escape-html data </em>])
rebol: (process-rebol data)
code: (format-code data)
]
process-rebol: func [block] [
emit "[Not yet implemented]"
]
〈Definition of the code formatting functions〉
The emit-header function formats the REBOL header into XHTML. If there is a history block, it is formatted as a table; the license should be formatted by using MakeDoc recursively, but this is not implemented yet (for now it is emitted as escaped plain text). This function also emits the table of contents.
〈The documentation emitter〉 +≡
emit-header: has [tmp] [
emit <div id="header">
if in header 'title [
; header/title is the result of emit-inline so we don't need escape-html here
emit [<h1 id="title"> header/title </h1>]
]
if in header 'author [
emit [<h2 id="author"> escape-html copy header/author]
if in header 'email [
emit [
" &lt;"
{<a href="mailto:} tmp: escape-html copy header/email {">} tmp </a>
"&gt;"
]
]
emit </h2>
]
if any [in header 'date in header 'version] [
emit <h2 id="dateversion">
if in header 'date [
emit escape-html form header/date
]
if all [in header 'date in header 'version] [
emit ", "
]
if in header 'version [
emit escape-html form header/version
]
emit </h2>
]
if in header 'purpose [
emit [<p id="purpose"> escape-html header/purpose </p>]
]
if in header 'license [
emit [<div id="license"> escape-html header/license </div>]
]
if all [in header 'history block? header/history] [
emit [
<table id="history">
<thead>
<tr><th> "Date" </th><th> "Version" </th>
<th> "Description" </th><th> "Author" </th></tr>
</thead>
<tbody>
]
parse header/history [
some [
set tmp date! (emit [<tr><td class="date"> escape-html form tmp </td>])
set tmp tuple! (emit [<td class="version"> escape-html form tmp </td>])
set tmp string! (emit [<td class="desc"> escape-html copy tmp </td>])
opt [set tmp word! (emit [<td class="name"> escape-html form tmp </td>])]
(emit </tr>)
]
]
emit [</tbody></table>]
]
emit </div>
emit toc
]
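emit-header expects a particular shape for the history block: a date!, a tuple! version, a string! description and an optional author word!. The standalone REBOL 2 sketch below shows that shape with the same kind of parse rule, collecting rows instead of emitting table cells. The author word `some-author` is a placeholder, and `rows`, `d`, `v`, `desc` and `who` are illustrative names.

```rebol
REBOL [Title: "Sketch: reading a history block"]

; date!, tuple!, string!, and an optional author word! per entry
history: [
    2-Mar-2006 1.1.0 "History start"
    12-May-2006 1.11.0 "Added header formatting" some-author
]

rows: copy []
parse history [
    some [
        set d date! set v tuple! set desc string!
        (who: none)                    ; reset, since the author is optional
        opt [set who word!]
        (append/only rows reduce [d v desc who])
    ]
]
foreach row rows [print mold row]
```

Resetting `who` before the optional match matters. Without it, an entry that has no author would inherit the author of the previous entry.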
Many MD3 commands are currently unsupported. They will be added later, as they are less common in RLPs.
〈The documentation emitter〉 +≡
normal: [
para: (emit <p> emit-inline data emit </p>)
sect1: (emit <div class="section"> emit-sect 1 data) in-sect (emit </div>)
sect2: (emit-sect 2 data)
sect3: (emit-sect 3 data)
sect4: (emit-sect 4 data)
bullet: bullet2: bullet3: (emit <ul>) continue in-bul (emit </ul>)
enum: enum2: enum3: (emit <ol>) continue in-enum (emit </ol>)
code: (format-code-section data)
output: (emit data) ; to output html directly
define: (emit <dl>) continue in-define (emit </dl>)
image: (
emit [
either data/2 = 'center [<div class="image center">][<div class="image">]
{<img src="} data/1 {">}
</div>
]
)
center-in:
(emit <div class="center">)
in-center
(emit </div>)
center-out: (error "Unbalanced center-out")
note-in:
(emit [<div class="note"><h2>] emit-inline data emit </h2>)
in-note
(emit </div>)
note-out: (error "Unbalanced note-out")
indent-in:
(emit <blockquote>)
in-indent
(emit </blockquote>)
indent-out: (error "Unbalanced indent-out")
]
〈The documentation emitter〉 +≡
in-sect: inherit normal [
sect1: continue return
]
〈The documentation emitter〉 +≡
in-bul: [
bullet: (emit <li> emit-inline data emit </li>)
bullet2: bullet3: (emit <ul>) continue in-bul2 (emit </ul>)
enum2: enum3: (emit <ol>) continue in-enum2 (emit </ol>)
default: continue return
]
in-bul2: [
bullet2: (emit <li> emit-inline data emit </li>)
bullet3: (emit <ul>) continue in-bul3 (emit </ul>)
enum3: (emit <ol>) continue in-enum3 (emit </ol>)
default: continue return
]
in-bul3: [
bullet3: (emit <li> emit-inline data emit </li>)
default: continue return
]
〈The documentation emitter〉 +≡
in-enum: [
enum: (emit <li> emit-inline data emit </li>)
bullet2: bullet3: (emit <ul>) continue in-bul2 (emit </ul>)
enum2: enum3: (emit <ol>) continue in-enum2 (emit </ol>)
default: continue return
]
in-enum2: [
enum2: (emit <li> emit-inline data emit </li>)
bullet3: (emit <ul>) continue in-bul3 (emit </ul>)
enum3: (emit <ol>) continue in-enum3 (emit </ol>)
default: continue return
]
in-enum3: [
enum3: (emit <li> emit-inline data emit </li>)
default: continue return
]
〈The documentation emitter〉 +≡
in-define: [
define: (emit-define data)
default: continue return
]
emit-define: func [data [block!]] [
if data/1 [
emit <dt>
emit-inline data/1
emit </dt>
]
if data/2 [
emit <dd>
emit-inline data/2
emit </dd>
]
]
〈The documentation emitter〉 +≡
in-center: inherit normal [
center-out: return
]
〈The documentation emitter〉 +≡
in-note: inherit normal [
note-out: return
]
〈The documentation emitter〉 +≡
in-indent: inherit normal [
indent-out: return
]
〈The documentation emitter〉 +≡
escape-html: func [text][
; Convert to avoid special HTML chars:
foreach [from to] html-codes [replace/all text from to]
text
]
html-codes: ["&" "&amp;" "<" "&lt;" ">" "&gt;"]
sects: 0.0.0.0
clear-sects: does [sects: 0.0.0.0]
next-section: func [level /local bump mask] [
; Return next section number. Clear sub numbers.
set [bump mask] pick [
[1.0.0.0 1.0.0.0]
[0.1.0.0 1.1.0.0]
[0.0.1.0 1.1.1.0]
[0.0.0.1 1.1.1.1]
] level
level: form sects: sects + bump * mask
clear find level ".0"
level
]
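The two helpers above can be tried outside the emitter. The standalone REBOL 2 sketch below copies them and shows two things. First, escape-html changes its argument in place, which is why callers pass a copy. Second, next-section uses tuple arithmetic: it adds the bump, then multiplies by the mask to zero out the deeper levels. The loop variable `lvl` is illustrative.

```rebol
REBOL [Title: "Sketch: escape-html and next-section"]

html-codes: ["&" "&amp;" "<" "&lt;" ">" "&gt;"]
escape-html: func [text] [
    ; "&" must come first, or the "&" in "&lt;" would be escaped again
    foreach [from to] html-codes [replace/all text from to]
    text
]
; escape-html modifies its argument, hence the copy
print escape-html copy "a < b && c"   ; a &lt; b &amp;&amp; c

sects: 0.0.0.0
next-section: func [level /local bump mask] [
    set [bump mask] pick [
        [1.0.0.0 1.0.0.0]
        [0.1.0.0 1.1.0.0]
        [0.0.1.0 1.1.1.0]
        [0.0.0.1 1.1.1.1]
    ] level
    ; add the bump, then clear every deeper level with the mask
    level: form sects: sects + bump * mask
    ; drop the trailing ".0" parts
    clear find level ".0"
    level
]
foreach lvl [1 2 2 1 2] [print next-section lvl]   ; 1, 1.1, 1.2, 2, 2.1
```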
At this point, we know the data we need to collect during the first pass. First of all, we need a code-sections block holding all the data for code sections, so that select code-sections section-name will return the section data for a given section-name; this section data should be an object, with the fields title, visited, code and id.
We also need to collect the header and store it in the header object. Then, we need to collect the section headers to generate the table of contents.
The init-emitter function is an emitter function that can be defined by users to do any initialization before the state machine starts. Please refer to the MakeDoc3 documentation for more details.
〈Initialization and first pass〉 ≡
code-sections: [ ]
code-section!: context [
visited: no
code: title: id: none
]
init-emitter: func [doc [block!]] [
〈Initialize various values〉
〈Do first pass〉
]
One of the values that we need to initialize is code-sections itself.
〈Initialize various values〉 ≡
clear code-sections
We just use the FSM for the first pass as well. The fsm-do function is called with first-pass as the initial state. The generated table of contents is stored in toc.
〈Do first pass〉 ≡
clear-sects
toc: capture [
emit [<div id="toc"><h2> "Contents:" </h2><ul>]
fsm-do doc first-pass
emit [</ul></div>]
]
clear-sects
Our state machine for the first pass needs to process the title and the header of the RLP, then the code paragraphs, handing their text to the preprocess-code-section function; section headers need to be collected both for the table of contents and for use as code section titles.
first-pass moves to the in-header state after processing the title; in-header then moves to fp-normal after processing the header. In this state we generate the table of contents (sect3 and sect4 are not displayed in the TOC) using the auxiliary state toc2.
The title text is also stored in title.
〈Initialization and first pass〉 +≡
toc: none
title: none
last-section-title: last-section-id: none
first-pass: [
title: sect1: (title: capture [emit-inline data]) in-header
options: ( ) ; ignore
default: (title: "Untitled") continue in-header
]
〈Header handling for the first pass〉
fp-normal: [
code: (preprocess-code-section data)
〈Section headers handling〉
]
preprocess-code-section: func [code [string!] /local 〈preprocess-code-section's locals〉] [
〈Preprocess a code section〉
]
〈Section headers handling〉 ≡
sect1: (set-last-section/emit 1 data)
sect2: (emit <ul>) continue toc2 (emit </ul>)
sect3: (set-last-section 3 data)
sect4: (set-last-section 4 data)
〈Initialization and first pass〉 +≡
toc2: inherit fp-normal [
sect1: continue return
sect2: (set-last-section/emit 2 data)
]
set-last-section: func [level data /emit] [
last-section-title: capture [emit-inline copy/deep data]
last-section-id: either emit [emit-toc-item level last-section-title] [join "section-" next-section level]
]
emit-toc-item: func [level title /local num] [
num: next-section level
emit [<li> {<a href="#section-} num {">} num pick [". " " "] level = 1 title </a></li>]
join "section-" num
]
The in-header state considers the code section immediately following the document title to be the RLP header. The header is stored in header; if no header is present, header is set to the header template (unsurprisingly named header-template).
〈Header handling for the first pass〉 ≡
header: none
header-template: context [
Title: "Untitled"
File: %output.r
]
in-header: [
code: (preprocess-header data) fp-normal
options: ( ) ; ignore
default: (header: make header-template []) continue fp-normal
]
preprocess-header: func [text [string!]] [
〈Preprocess the header〉
]
The text is loaded and a header object is constructed from header-template. If there is any problem loading text, header will be set to just a copy of header-template. header/title is set to the previously collected document title.
〈Preprocess the header〉 ≡
header: attempt [to block! text]
header: construct/with any [header []] header-template
header/title: title
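The standalone REBOL 2 sketch below shows how construct/with fills in the header. Fields present in the RLP header override the template, missing fields keep the template's values, and extra fields are added. The header text is a hypothetical example.

```rebol
REBOL [Title: "Sketch: building the RLP header"]

header-template: context [
    Title: "Untitled"
    File: %output.r
]

; Hypothetical header text, as found in the first code section of an RLP
text: {File: %wetan.r Version: 1.11.0}

header: attempt [to block! text]
; construct does not evaluate the block; missing fields come from the template
header: construct/with any [header []] header-template
print header/file      ; wetan.r
print header/title     ; Untitled
print header/version   ; 1.11.0
```

Because construct does not evaluate its block, a header cannot run code while it is being read. That makes it a safer choice than make for data loaded from a document.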
The text for each code section needs to be loaded and added to the code-sections block with its name. If no section name is given, the name is assumed to be '-main-. If the section has already been defined, then new code is appended to it; this allows building sections incrementally in the document.
〈Preprocess a code section〉 ≡
code: attempt [load/all code]
if code [
parse code [
[set name set-word! | (name: '-main-)] code:
]
name: to word! name
either section-data: select code-sections name [
append section-data/code code
] [
insert/only insert tail code-sections name section-data: make code-section! [
title: last-section-title
id: last-section-id
]
section-data/code: code
]
]
〈preprocess-code-section's locals〉 ≡
name section-data
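The standalone REBOL 2 sketch below shows the same logic with a simplified table. The optional set-word names the section, unnamed code goes to '-main-, and a repeated name appends to the existing code. `sections` and `add-section` are hypothetical. The table holds plain blocks instead of code-section! objects, so titles and ids are not tracked.

```rebol
REBOL [Title: "Sketch: collecting code sections"]

; Hypothetical simplified table: name followed by a code block
sections: copy []

add-section: func [text [string!] /local code name existing] [
    if not code: attempt [load/all text] [exit]   ; skip unloadable text
    parse code [[set name set-word! | (name: '-main-)] code:]
    name: to word! name
    either existing: select sections name [
        append existing code           ; later chunks extend the section
    ] [
        repend sections [name copy code]
    ]
]

add-section "greeting: print {Hello}"
add-section "do-work"
add-section "greeting: print {again}"
probe sections
; [greeting [print "Hello" print "again"] -main- [do-work]]
```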
〈The build-doc function〉 ≡
template: attempt [read %wetan-template.html]
build-doc: func [text /local tmp] [
save/header header/file generate-code '-main- header
;foreach [file start] output-files [save file generate-code start]
either template [
; Template variables all begin with $
tmp: copy template ; in case it gets reused
replace/all tmp "$title" title
replace/all tmp "$date" now/date
replace tmp "$content" text
tmp
] [
copy text
]
]
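The template step of build-doc is plain string substitution on $-prefixed variables. The standalone REBOL 2 sketch below shows it. The inline template and `fill-template` are hypothetical; WeTan reads its template from %wetan-template.html, and $date is left out here.

```rebol
REBOL [Title: "Sketch: filling the page template"]

; Hypothetical template; WeTan reads %wetan-template.html instead
template: {<html><head><title>$title</title></head><body>$content</body></html>}

fill-template: func [template [string!] title [string!] text [string!] /local page] [
    page: copy template                ; keep the original template reusable
    replace/all page "$title" title
    replace page "$content" text       ; done last, so the content is never rescanned
    page
]

print fill-template template "WeTan" "<p>Hi</p>"
; <html><head><title>WeTan</title></head><body><p>Hi</p></body></html>
```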