Motive
The Structured Text specification is an attempt at describing commonly used email formatting rules for turning text into HTML. The design principles behind it are:
Stick with existing email ‘de facto’ formatting standards as much as possible.
Do not invent new markup just because ‘it would be nice to be able to do this HTML construct’.
Block Level Elements
Title (h1)
Titles are formatted as follows:
====================
Title of my document
====================
Should become:
<h1>Title of my document</h1>
Sections (h2)
Sections are formatted as follows:
This is a section
=================
Should become:
<h2>This is a section</h2>
Sub-sections (h3)
Sub-sections are formatted as follows:
This is a sub-section
---------------------
Should become:
<h3>This is a sub-section</h3>
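The three heading rules above (title, section, sub-section) can be sketched in Python roughly as follows. This is only an illustration, assuming the input has already been split into blocks on blank lines; the function name convert_heading is hypothetical and not part of this specification.

```python
import re
from typing import Optional


def convert_heading(block: str) -> Optional[str]:
    """Return the HTML for a heading block, or None if it is not a heading."""
    lines = [line.strip() for line in block.strip("\n").split("\n")]
    # Title: one line of text framed by a row of '=' above and below.
    if (len(lines) == 3
            and re.fullmatch(r"=+", lines[0])
            and re.fullmatch(r"=+", lines[2])):
        return f"<h1>{lines[1]}</h1>"
    # Section and sub-section: one line of text underlined with '=' or '-'.
    if len(lines) == 2:
        if re.fullmatch(r"=+", lines[1]):
            return f"<h2>{lines[0]}</h2>"
        if re.fullmatch(r"-+", lines[1]):
            return f"<h3>{lines[0]}</h3>"
    return None
```

A block that matches none of the three shapes returns None, so the caller can fall through to the other block-level rules.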
Paragraphs (p)
Paragraphs are simply a collection of lines which are not indented. A double carriage return separates paragraphs.
Note that a single carriage return should not be turned into a <br /> as it disrupts the flow of text.
Paragraphs are formatted as follows:
Paragraphs are simply a collection of lines
which are not indented.

A double carriage return separates paragraphs.
Should become:
<p>Paragraphs are simply a collection of lines which are not indented.</p>
<p>A double carriage return separates paragraphs.</p>
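A minimal Python sketch of the paragraph rule, assuming the text contains only plain paragraphs (no headings, lists or indented lines). The name convert_paragraphs is hypothetical.

```python
import re


def convert_paragraphs(text: str) -> str:
    # A "double carriage return" (one or more blank lines) separates paragraphs.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # A single line break is folded into a space, never into <br />.
        joined = " ".join(line.strip() for line in block.split("\n"))
        paragraphs.append(f"<p>{joined}</p>")
    return "\n".join(paragraphs)
```

Folding single line breaks into spaces is what keeps hard-wrapped email text flowing as one paragraph.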
Quoted Text (blockquote)
Quoted text is text which starts with a ‘>’ sign.
> this is a piece of quoted text.
> it's nice.
Should become:
<blockquote><p>this is a piece of quoted text. it's nice.</p></blockquote>
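The quoting rule could be sketched in Python as below. This is a single-level illustration only: it assumes every line of the block starts with ‘>’ and that the quoted text is one paragraph. The name convert_blockquote is hypothetical.

```python
import re


def convert_blockquote(block: str) -> str:
    # Remove one leading '>' and one optional space from each line.
    inner = [re.sub(r"^> ?", "", line) for line in block.split("\n")]
    # The remaining lines are treated as a single paragraph.
    text = " ".join(line.strip() for line in inner if line.strip())
    return f"<blockquote><p>{text}</p></blockquote>"
```

A full implementation would feed the stripped lines back into the block-level parser instead of assuming a single paragraph, which is what makes nested quotes possible.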
Pre-formatted text (pre)
Pre-formatted text is a collection of indented lines. For example:
This is a paragraph.

    And some pre-formatted text.
    -^.^-
Should become:
<p>This is a paragraph.</p>
<pre>And some pre-formatted text.
-^.^-</pre>
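One way to tell a pre-formatted block from a paragraph, sketched in Python. It assumes blocks have already been split on blank lines, and that a block counts as pre-formatted only when every line is indented; the name convert_pre_or_paragraph is hypothetical.

```python
import html


def convert_pre_or_paragraph(block: str) -> str:
    lines = [line for line in block.split("\n") if line.strip()]
    if not lines:
        return ""
    # Every line starts with a space or a tab: treat the block as <pre>.
    if all(line[0] in " \t" for line in lines):
        # Strip the common indentation but keep the line breaks.
        indent = min(len(line) - len(line.lstrip()) for line in lines)
        body = "\n".join(line[indent:].rstrip() for line in lines)
        return "<pre>" + html.escape(body, quote=False) + "</pre>"
    # Otherwise it is an ordinary paragraph.
    return "<p>" + " ".join(line.strip() for line in lines) + "</p>"
```

The pre-formatted body is HTML-escaped and otherwise left alone; no inline markup is applied to it.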
Unordered lists (ul)
Unordered lists can be started using * or -.
* This is a list
* It contains stuff
Should become:
<ul><li><p>This is a list</p></li>
<li><p>It contains stuff</p></li></ul>
Ordered lists (ol)
Ordered lists are marked up using any sequence of digits, typically:
1. Step 1
2. Step 2
3. Step 3
Should become:
<ol><li><p>Step 1</p></li>
<li><p>Step 2</p></li>
<li><p>Step 3</p></li></ol>
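Both list rules can be sketched with one Python function. Two assumptions are made here that the text above does not state: the digits of an ordered item are followed by a period (as in the example), and a line without a marker continues the previous item. The name convert_list is hypothetical, and nesting is not handled.

```python
import re

BULLET = re.compile(r"^[*-]\s+(.*)$")    # "* item" or "- item"
NUMBER = re.compile(r"^\d+\.\s+(.*)$")   # "1. item" (the period is an assumption)


def convert_list(block: str) -> str:
    lines = block.split("\n")
    # The first line decides whether this is an unordered or an ordered list.
    pattern, tag = (BULLET, "ul") if BULLET.match(lines[0]) else (NUMBER, "ol")
    items = []
    for line in lines:
        match = pattern.match(line)
        if match:
            items.append(match.group(1).strip())
        elif items:
            # No marker: fold the line into the current item.
            items[-1] += " " + line.strip()
    body = "\n".join(f"<li><p>{item}</p></li>" for item in items)
    return f"<{tag}>{body}</{tag}>"
```

Because the bullet pattern requires whitespace after the marker, a sub-section underline such as "-----" is not mistaken for a list item.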
Inline Elements
Emphasized text (em)
Emphasized text is any text for which:
The first letter is immediately preceded by an underscore, which itself is preceded by whitespace, a carriage return, or the beginning of the file.
The last letter or punctuation character is immediately followed by an underscore, which itself is followed by whitespace, a carriage return, or the end of the file.
_this is emphasized._
Should become:
<em>this is emphasized.</em>
Bold text (strong)
Bold text is any text for which:
The first letter is immediately preceded by a star (*), which itself is preceded by whitespace, a carriage return, or the beginning of the file.
The last letter or punctuation character is immediately followed by a star (*), which itself is followed by whitespace, a carriage return, or the end of the file.
*this is strong.*
Should become:
<strong>this is strong.</strong>
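The boundary conditions for emphasized and bold text map fairly directly onto regular-expression lookarounds. The Python sketch below is one possible reading of the rules, restricted to markup that opens and closes on the same line; the names EM, STRONG and convert_inline are hypothetical.

```python
import re

# (?<!\S)  the opening marker sits at the start of the text or after whitespace
# (?=\S)   the first character follows the opening marker immediately
# (?<=\S)  the last character precedes the closing marker immediately
# (?!\S)   the closing marker sits at the end of the text or before whitespace
EM = re.compile(r"(?<!\S)_(?=\S)(.+?)(?<=\S)_(?!\S)")
STRONG = re.compile(r"(?<!\S)\*(?=\S)(.+?)(?<=\S)\*(?!\S)")


def convert_inline(text: str) -> str:
    text = STRONG.sub(r"<strong>\1</strong>", text)
    return EM.sub(r"<em>\1</em>", text)
```

With these boundaries, identifiers such as snake_case_name and list bullets such as "* item" are left untouched, because their markers are not flanked the way the rules require.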
Automagic Niceties
This section describes niceties which the implementation may provide.
Smart Quotes
This is lifted from here:
http://www.textism.com/tools/textile/help.html?item=what
Replace single and double primes (' and ") used as quotation marks with HTML entities for opening and closing quotation marks in readable text, while leaving untouched the primes required within HTML tags.
Opening smart single quote: ‘
Closing smart single quote: ’
Opening smart double quote: “
Closing smart double quote: ”
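A rough Python sketch of the smart-quote rule. It relies on a simple heuristic that is an assumption of this example, not something stated above: a prime at the start of a text run, or after whitespace or an opening bracket, is an opening quote, and any other prime is a closing quote or an apostrophe. Primes inside HTML tags are skipped. The name smart_quotes is hypothetical.

```python
import re

TAG = re.compile(r"(<[^>]*>)")


def smart_quotes(html_text: str) -> str:
    out = []
    # Splitting on a capturing group puts the tags at the odd indices.
    for index, part in enumerate(TAG.split(html_text)):
        if index % 2 == 1:
            out.append(part)  # an HTML tag: leave its primes untouched
            continue
        # Opening quotes: at the start, or after whitespace or ( [ {
        part = re.sub(r'(?<![^\s(\[{])"', "&#8220;", part)
        part = part.replace('"', "&#8221;")   # remaining doubles are closing
        part = re.sub(r"(?<![^\s(\[{])'", "&#8216;", part)
        part = part.replace("'", "&#8217;")   # closing quote or apostrophe
        out.append(part)
    return "".join(out)
```

The heuristic gets some cases wrong, for instance an apostrophe that directly follows a closing tag, so a production implementation would need more context than this.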
Hyphens
Replace double hyphens (--) with an em-dash (—) entity.
em-dash: —
Replace single hyphens surrounded by spaces with an en-dash (–) entity.
en-dash: –
Ellipsis
Replace triplets of periods (...) with an ellipsis (…) entity.
ellipsis: …
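The hyphen and ellipsis replacements could look like this in Python. It is a sketch meant for inline text only: run over a whole document it would also mangle sub-section underlines made of hyphens. The name smart_punctuation is hypothetical, and the numeric entities are one possible choice of output.

```python
import re


def smart_punctuation(text: str) -> str:
    text = text.replace("...", "&#8230;")   # three periods: ellipsis
    text = text.replace("--", "&#8212;")    # double hyphen: em-dash
    # A single hyphen with a space on each side: en-dash.
    text = re.sub(r"(?<= )-(?= )", "&#8211;", text)
    return text
```

The double hyphen is replaced first so that the single-hyphen rule never sees half of a "--" pair.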
Explicit Abbreviations
Wrap an <abbr> tag around runs of two or more capital letters which are immediately followed, with no space in between, by some text within parentheses. The ‘title’ attribute of the <abbr> tag should be extracted from the parentheses, and then the parentheses and the text should be removed.
ACLU(American Civil Liberties Union)
may become:
<abbr title="American Civil Liberties Union">ACLU</abbr>
Implicit Abbreviations
Wrap an <abbr> tag around runs of two or more capital letters which are followed by a space and then some text within parentheses. The ‘title’ attribute of the <abbr> tag should be extracted from the parentheses; the parentheses and their text are left in place.
ACLU (American Civil Liberties Union)
may become:
<abbr title="American Civil Liberties Union">ACLU</abbr> (American Civil Liberties Union)
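The explicit and implicit forms differ only in the space before the parenthesis, so a single pattern can cover both. A Python sketch, with the hypothetical names ABBR and convert_abbreviations:

```python
import html
import re

# Two or more capitals, an optional single space, then text in parentheses.
ABBR = re.compile(r"\b([A-Z]{2,})( ?)\(([^)]+)\)")


def convert_abbreviations(text: str) -> str:
    def replace(match):
        letters, space, title = match.groups()
        tag = f'<abbr title="{html.escape(title)}">{letters}</abbr>'
        # Implicit form (with a space): keep the parenthesised text.
        # Explicit form (no space): drop it.
        return f"{tag} ({title})" if space else tag
    return ABBR.sub(replace, text)
```

Handling both forms in one pass avoids re-scanning the markup produced by the first rule when applying the second.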
Copyright / trademarks / etc
Convert the following character sequences:
(TM), (R), and (C)
to ™, ®, and © respectively.
(tm): ™
(r): ®
(c): ©
Dimension
Convert the letter x between two numbers to a dimension sign (×): 2x4 to 2×4 and 8x10 to 8×10
10x2: 10×2
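The symbol and dimension conversions are plain substitutions. A Python sketch follows; it assumes matching is case-insensitive (the text above shows both (TM) and (tm)) and that the x must sit directly between two digits. The names SYMBOLS and convert_symbols are hypothetical.

```python
import re

SYMBOLS = {"tm": "&#8482;", "r": "&#174;", "c": "&#169;"}


def convert_symbols(text: str) -> str:
    # (TM), (R), (C) in either case become the matching entity.
    text = re.sub(r"\((tm|r|c)\)",
                  lambda m: SYMBOLS[m.group(1).lower()],
                  text, flags=re.IGNORECASE)
    # An x directly between two digits becomes the dimension sign.
    return re.sub(r"(?<=\d)x(?=\d)", "&#215;", text)
```

Note that this also rewrites an enumeration label such as "(c)" and hexadecimal literals such as 0x1F, so an implementation may want to be more conservative.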
Hyperlinks
Text which looks like a hyperlink (i.e. http://www.example.com) should be turned into a hyperlink automagically.
Furthermore, the implementation may try to resolve the address, extract a title and a description from the resource, and replace the hyperlink text with the proper information.
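A Python sketch of the basic auto-linking step. It only recognises http and https addresses, treats trailing punctuation as part of the sentence rather than the address, and does not attempt the optional title lookup. All of these choices are assumptions of this example; the names URL and autolink are hypothetical.

```python
import html
import re

URL = re.compile(r'https?://[^\s<>"]+')


def autolink(text: str) -> str:
    def replace(match):
        url = match.group(0)
        trailing = ""
        # Punctuation at the very end belongs to the sentence, not the link.
        while url and url[-1] in ".,;:!?)":
            trailing = url[-1] + trailing
            url = url[:-1]
        escaped = html.escape(url)
        return f'<a href="{escaped}">{escaped}</a>{trailing}'
    return URL.sub(replace, text)
```

Stripping the trailing characters is what lets a sentence such as "(i.e. http://www.example.com)" produce a link without the closing parenthesis.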
Notes
Nesting
Lists (ul, ol) and blockquote elements should be nestable, so that you can write:
* This is a section
  =================

  > > this is some double-quoted text
  > It's nice!
  >
  > * We can have lists
  >   in there too
  >
  >     and even some pre text.
Would become:
<ul><li><h2>This is a section</h2>
<blockquote><blockquote><p>this is some double-quoted text</p></blockquote>
<p>It's nice!</p>
<ul><li><p>We can have lists in there too</p></li></ul>
<pre>and even some pre text.</pre></blockquote></li></ul>
Preformatted text
Pre-formatted text should be displayed exactly as it is, without converting any inline markup or applying any automagic niceties.
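One way to honour this rule is to run the inline transformations only on the parts of the output that lie outside <pre> elements. A Python sketch, assuming the block-level pass has already produced HTML and that <pre> elements are not nested; the names PRE and apply_outside_pre are hypothetical.

```python
import re
from typing import Callable

PRE = re.compile(r"(<pre>.*?</pre>)", re.DOTALL)


def apply_outside_pre(html_text: str, transform: Callable[[str], str]) -> str:
    parts = PRE.split(html_text)
    # Splitting on a capturing group puts the <pre> blocks at the odd indices;
    # those are passed through unchanged.
    return "".join(part if index % 2 else transform(part)
                   for index, part in enumerate(parts))
```

For example, passing a function that replaces "--" with an em-dash entity would change "<p>a -- b</p>" but leave "<pre>a -- b</pre>" exactly as written.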