Structured Text Version 1.0

Motive

The Structured Text specification is an attempt at describing commonly used email formatting rules for turning text into HTML. The design principles behind it are:

Block Level Elements

Title (h1)

Titles are formatted as follows:

====================
Title of my document
====================

Should become:

<h1>Title of my document</h1>

Sections (h2)

Sections are formatted as follows:

This is a section
=================

Should become:

<h2>This is a section</h2>

Sub-sections (h3)

Sub-sections are formatted as follows:

This is a sub-section
---------------------

Should become:

<h3>This is a sub-section</h3>
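
The three heading forms above could be recognised roughly as follows. This is a minimal sketch, not the reference implementation: the function name heading_to_html is hypothetical, and the requirement that a rule line has at least three characters is an assumption which this specification does not state.

```python
import html
import re

# Assumption: a rule line is three or more '=' or '-' characters.
RULE_EQ = re.compile(r"^={3,}$")
RULE_DASH = re.compile(r"^-{3,}$")


def heading_to_html(block):
    """Return heading HTML for a block of lines, or None if it is not a heading."""
    lines = [line.strip() for line in block.strip().split("\n")]
    # Title: text between two '=' rules.
    if len(lines) == 3 and RULE_EQ.match(lines[0]) and RULE_EQ.match(lines[2]):
        return "<h1>%s</h1>" % html.escape(lines[1], quote=False)
    # Section: text underlined with '='.
    if len(lines) == 2 and RULE_EQ.match(lines[1]):
        return "<h2>%s</h2>" % html.escape(lines[0], quote=False)
    # Sub-section: text underlined with '-'.
    if len(lines) == 2 and RULE_DASH.match(lines[1]):
        return "<h3>%s</h3>" % html.escape(lines[0], quote=False)
    return None
```

A block which matches none of the three forms returns None, so the caller can fall through to the other block-level rules.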

Paragraphs (p)

Paragraphs are simply a collection of lines which are not indented. A double carriage return (a blank line) separates paragraphs.

Note that a single carriage return should not be turned into a <br /> as it disrupts the flow of text.

Paragraphs are formatted as follows:

Paragraphs are simply a collection of lines which are not indented.

A double carriage return separates paragraphs.

Should become:

<p>Paragraphs are simply a collection of lines which are not indented.</p>
<p>A double carriage return separates paragraphs.</p>
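
The paragraph rule can be sketched as a split on blank lines. The function name paragraphs_to_html is hypothetical, and keeping single line breaks inside the <p> element as plain newlines is an assumption based on the note above that they should not become <br />.

```python
import html
import re


def paragraphs_to_html(text):
    """Wrap each run of non-blank lines in <p>; single newlines are kept as-is."""
    text = text.replace("\r\n", "\n").strip()
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = [b for b in re.split(r"\n(?:[ \t]*\n)+", text) if b.strip()]
    return "\n".join("<p>%s</p>" % html.escape(b, quote=False) for b in blocks)
```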

Quoted Text (blockquote)

Quoted text is text which starts with a ‘>’ sign.

> this is a piece of quoted text.
> it's nice.

Should become:

<blockquote><p>this is a piece of quoted text.
it's nice.</p></blockquote>
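
A single level of quoting might be handled as in the sketch below. The name blockquote_to_html is hypothetical, and removing one optional space after the '>' sign is an assumption. The sketch leaves the text itself unchanged and does not handle nested quotes.

```python
import html


def blockquote_to_html(block):
    """Strip one level of '>' from every line; None if a line is not quoted."""
    inner = []
    for line in block.strip("\n").split("\n"):
        if not line.startswith(">"):
            return None
        rest = line[1:]
        # Assumption: one space after '>' belongs to the marker, not the text.
        inner.append(rest[1:] if rest.startswith(" ") else rest)
    body = html.escape("\n".join(inner), quote=False)
    return "<blockquote><p>%s</p></blockquote>" % body
```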

Pre-formatted text (pre)

Pre-formatted text is a collection of indented lines. For example:

This is a paragraph.
  And some pre-formatted text.
     -^.^-

Should become:

<p>This is a paragraph.</p>
<pre>And some pre-formatted text.
   -^.^-</pre>
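
One possible way to detect and render an indented block is shown below. The names is_preformatted and pre_to_html are hypothetical. Removing only the indentation common to all lines, so that any extra indentation survives, is an assumption about the intended behaviour.

```python
import html
import textwrap


def is_preformatted(block):
    """True when every non-blank line starts with a space or a tab."""
    lines = [line for line in block.split("\n") if line.strip()]
    return bool(lines) and all(line[0] in " \t" for line in lines)


def pre_to_html(block):
    """Remove the shared indentation and wrap the rest in <pre>."""
    body = textwrap.dedent(block).strip("\n")
    return "<pre>%s</pre>" % html.escape(body, quote=False)
```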

Unordered lists (ul)

Unordered lists can be started using * or -.

* This is a list
* It contains stuff

Should become:

<ul><li><p>This is a list</p></li>
<li><p>It contains stuff</p></li></ul>

Ordered lists (ol)

Ordered lists are marked up using any sequence of digits, typically:

1. Step 1
2. Step 2
3. Step 3

Should become:

<ol><li><p>Step 1</p></li>
<li><p>Step 2</p></li>
<li><p>Step 3</p></li></ol>
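
Both list forms can be recognised by the same routine, as in this sketch. The name list_to_html is hypothetical. Two assumptions are made which the text above does not state: a marker must be followed by whitespace, and an ordered marker is a run of digits followed by a period. Indented lines are treated as a continuation of the previous item.

```python
import html
import re

BULLET = re.compile(r"^[*-][ \t]+(.*)$")    # "* item" or "- item"
NUMBER = re.compile(r"^\d+\.[ \t]+(.*)$")   # "12. item" (assumed form)


def list_to_html(block):
    """Render a flat list block as <ul> or <ol>, or return None."""
    lines = block.strip("\n").split("\n")
    if BULLET.match(lines[0]):
        marker, tag = BULLET, "ul"
    elif NUMBER.match(lines[0]):
        marker, tag = NUMBER, "ol"
    else:
        return None
    items = []
    for line in lines:
        m = marker.match(line)
        if m:
            items.append(m.group(1))
        elif line[:1] in (" ", "\t"):
            # Indented line: continuation of the current item.
            items[-1] += "\n" + line.strip()
        else:
            return None
    body = "\n".join(
        "<li><p>%s</p></li>" % html.escape(item, quote=False) for item in items
    )
    return "<%s>%s</%s>" % (tag, body, tag)
```

Requiring whitespace after the marker keeps a line such as *this is strong.* from being mistaken for a list item.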

Inline Elements

Emphasized text (em)

Emphasized text is any text enclosed in underscores:

_this is emphasized._

Should become:

<em>this is emphasized.</em>

Bold text (strong)

Bold text is any text enclosed in asterisks:

*this is strong.*

Should become:

<strong>this is strong.</strong>
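
The two inline rules might be implemented with a pair of regular expressions, as sketched here. The name inline_to_html is hypothetical. The boundary conditions are assumptions chosen to avoid false matches: a marker must not touch a word character on its outer side and must touch non-space text on its inner side. The wrapped text is left unchanged.

```python
import re

# Assumed boundaries: no word character outside the markers, no space inside.
STRONG = re.compile(r"(?<![\w*])\*(?=\S)(.+?)(?<=\S)\*(?![\w*])")
EM = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def inline_to_html(text):
    """Convert *strong* and _emphasized_ spans to HTML."""
    text = STRONG.sub(r"<strong>\1</strong>", text)
    text = EM.sub(r"<em>\1</em>", text)
    return text
```

With these boundaries, identifiers such as snake_case_name and arithmetic such as 2 * 3 * 4 are left alone.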

Automagic Niceties

This section describes niceties which the implementation may provide.

Smart Quotes

This is lifted from here:

http://www.textism.com/tools/textile/help.html?item=what

Replace single and double primes (' and ") used as quotation marks with HTML entities for opening and closing quotation marks in readable text, while leaving untouched the primes required within HTML tags.

Opening smart single quote: ‘
Closing smart single quote: ’
Opening smart double quote: “
Closing smart double quote: ”
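
A rough sketch of the quote replacement follows. The names smart_quotes and _curl are hypothetical, and the heuristic used (a quote at the start of a text run, or after whitespace or an opening bracket, is an opening quote; every other quote is a closing one) is an assumption, not something this specification prescribes. Text inside HTML tags is skipped, as required above.

```python
import re

TAG = re.compile(r"(<[^>]*>)")
# Assumed heuristic: a quote is "opening" at the start of a text run
# or after whitespace or an opening bracket.
OPEN_CONTEXT = r"(^|[\s(\[{])"


def _curl(segment):
    segment = re.sub(OPEN_CONTEXT + '"', r"\1&#8220;", segment)
    segment = segment.replace('"', "&#8221;")
    segment = re.sub(OPEN_CONTEXT + "'", r"\1&#8216;", segment)
    segment = segment.replace("'", "&#8217;")
    return segment


def smart_quotes(text):
    """Replace straight quotes with entities, leaving HTML tags untouched."""
    parts = TAG.split(text)
    # Odd-numbered parts are the tags captured by the split.
    return "".join(p if i % 2 else _curl(p) for i, p in enumerate(parts))
```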

Hyphens

Replace double hyphens (--) with an em-dash (—) entity.

em-dash: —

Replace single hyphens surrounded by spaces with an en-dash (–) entity.

en-dash: –

Ellipsis

Replace triplets of periods (...) with an ellipsis (…) entity.

ellipsis: …
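
The dash and ellipsis replacements could look like the sketch below. The name punctuate is hypothetical. Two assumptions are made: longer runs of hyphens or periods (such as heading rules) are left alone, and numeric character references are used for the entities.

```python
import re


def punctuate(text):
    """Replace '--', ' - ' and '...' with typographic entities."""
    # Exactly two hyphens: em-dash. Longer runs are left untouched.
    text = re.sub(r"(?<!-)--(?!-)", "&#8212;", text)
    # A single hyphen with a space on each side: en-dash.
    text = re.sub(r"(?<= )-(?= )", "&#8211;", text)
    # Exactly three periods: ellipsis.
    text = re.sub(r"(?<!\.)\.\.\.(?!\.)", "&#8230;", text)
    return text
```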

Explicit Abbreviations

Wrap an <abbr> tag around runs of two or more capital letters which are immediately followed, with no space in between, by some text within parentheses. The ‘title’ attribute of the abbreviation should be extracted from the parentheses, and then the parentheses and the text should be removed.

ACLU(American Civil Liberties Union)

may become:

<abbr title="American Civil Liberties Union">ACLU</abbr>

Implicit Abbreviations

Wrap an <abbr> tag around runs of two or more capital letters which are followed by a space and then some text within parentheses. The ‘title’ attribute of the abbreviation should be extracted from the parentheses; the parentheses and the text are kept.

ACLU (American Civil Liberties Union)

may become:

<abbr title="American Civil Liberties Union">ACLU</abbr> (American Civil Liberties Union)
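
Both abbreviation forms can be covered by a single pattern, as in this sketch. The name abbreviate is hypothetical, and limiting the capital letters to A-Z is an assumption.

```python
import html
import re

# Two or more capitals, an optional single space, then "(expansion)".
ABBR = re.compile(r"\b([A-Z]{2,})( ?)\(([^()]+)\)")


def abbreviate(text):
    """Wrap abbreviations in <abbr>, taking the title from the parentheses."""
    def repl(match):
        short, space, title = match.groups()
        tag = '<abbr title="%s">%s</abbr>' % (html.escape(title, quote=True), short)
        if space:
            # Implicit form: the parenthesised text stays in the output.
            return "%s (%s)" % (tag, title)
        # Explicit form: the parenthesised text is removed.
        return tag
    return ABBR.sub(repl, text)
```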

Copyright / trademarks / etc

Convert the following characters:

(TM), (R), and (C)

to ™, ®, and © respectively.

(tm): ™
(r): ®
(c): ©
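
A minimal sketch of the symbol conversion is given below. The name symbols is hypothetical. Matching without regard to case is an assumption, made because both the upper-case and lower-case forms appear above.

```python
import re

SYMBOLS = {"tm": "&#8482;", "r": "&#174;", "c": "&#169;"}


def symbols(text):
    """Replace (tm), (r) and (c), in either case, with their entities."""
    return re.sub(
        r"\((tm|r|c)\)",
        lambda m: SYMBOLS[m.group(1).lower()],
        text,
        flags=re.IGNORECASE,
    )
```

Note that a rule this simple would also convert an enumeration marker such as (c) in running text; an implementation may want a stricter context check.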

Dimension

Convert the letter x to a dimension sign: 2x4 to 2×4 and 8x10 to 8×10

10x2: 10×2
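
The dimension rule might be sketched as a single substitution. The name dimensions is hypothetical, and converting only an x which sits directly between two digits is an assumption meant to leave ordinary words untouched.

```python
import re


def dimensions(text):
    """Replace an 'x' between two digits with the multiplication sign entity."""
    return re.sub(r"(?<=\d)x(?=\d)", "&#215;", text)
```

A hexadecimal literal such as 0x1F would also match this pattern, so an implementation may need to exclude such cases.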

Hyperlinks

Text which looks like a hyperlink (i.e. http://www.example.com) should be turned into a hyperlink automagically.

Furthermore, the implementation may try to resolve the address, extract a title and a description from the resource, and replace the hyperlink text with the proper information.
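
The basic automatic linking could be sketched as follows. The name autolink is hypothetical. Recognising only http and https addresses and treating trailing punctuation as not being part of the address are both assumptions. The optional step of resolving the address to fetch a title is not attempted here.

```python
import html
import re

URL = re.compile(r'https?://[^\s<>"]+')
TRAILING = ".,;:!?)"


def autolink(text):
    """Turn bare http(s) addresses in plain text into <a> elements."""
    def repl(match):
        url = match.group(0)
        trail = ""
        # Assumption: closing punctuation belongs to the sentence, not the URL.
        while url and url[-1] in TRAILING:
            trail = url[-1] + trail
            url = url[:-1]
        href = html.escape(url, quote=True)
        label = html.escape(url, quote=False)
        return '<a href="%s">%s</a>%s' % (href, label, trail)
    return URL.sub(repl, text)
```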

Notes

Nesting

Lists (ul, ol) and blockquote elements should be nestable, so that you can write:

* This is a section
  =================
  > > this is some double-quoted text
  > It's nice!
  >
  > * We can have lists
  >   in there too
  >
  >  and even some pre text.

Would become:

<ul><li><h2>This is a section</h2>
<blockquote><blockquote><p>this is some double-quoted text</p></blockquote>
<p>It's nice!</p>
<ul><li><p>We can have lists
in there too</p></li></ul>
<pre>and even some pre text.</pre></blockquote></li></ul>
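
Nesting lends itself to a recursive approach: strip one level of markup, then run the same block rules on what remains. The sketch below shows this for blockquotes and paragraphs only; lists, headings and pre-formatted text are deliberately left out to keep it short. The name render_blocks is hypothetical.

```python
import html


def render_blocks(lines):
    """Render paragraphs and (possibly nested) blockquotes from a list of lines."""
    out = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # blank lines only separate blocks
            continue
        quoted = lines[i].startswith(">")
        j = i
        # Collect the run of lines of the same kind (quoted or not).
        while j < len(lines) and lines[j].strip() and lines[j].startswith(">") == quoted:
            j += 1
        chunk = lines[i:j]
        if quoted:
            # Strip one quoting level, then recurse on the remainder.
            inner = [l[2:] if l.startswith("> ") else l[1:] for l in chunk]
            out.append("<blockquote>%s</blockquote>" % render_blocks(inner))
        else:
            out.append("<p>%s</p>" % html.escape("\n".join(chunk), quote=False))
        i = j
    return "\n".join(out)
```

A bare '>' line becomes a blank line after stripping, so it separates paragraphs inside the quote.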

Pre-formatted text

Pre-formatted text should display the text as it is, without converting any inline markup or applying any automagic niceties.
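
One way to honour this rule is to run the inline transformations only on the parts of the output which lie outside <pre> elements. The sketch below assumes the block-level pass has already produced HTML without nested or attribute-bearing <pre> tags; the name apply_outside_pre is hypothetical.

```python
import re

PRE = re.compile(r"(<pre>.*?</pre>)", re.DOTALL)


def apply_outside_pre(html_text, transform):
    """Apply transform to everything except the contents of <pre> elements."""
    parts = PRE.split(html_text)
    # Odd-numbered parts are the <pre> elements captured by the split.
    return "".join(p if i % 2 else transform(p) for i, p in enumerate(parts))
```

For example, passing a function which replaces double hyphens would change the paragraphs but leave any double hyphens inside <pre> exactly as written.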

This document was created by Jean-Michel Hiver on 2004-07-22 12:18:14.
This document was last modified by Bruno Postle on 2005-02-04 09:22:53.
MKDoc Ltd., 31 Psalter Lane, Sheffield, S11 8YL, UK.
Copyright © 2001-2005 MKDoc Ltd.