# Writing Extensions for Python-Markdown
-Python-Markdown includes an API for extension writers to plug their own
-custom functionality and/or syntax into the parser. There are Preprocessors
-which allow you to alter the source before it is passed to the parser,
-inline patterns which allow you to add, remove or override the syntax of
-any inline elements, and Postprocessors which allow munging of the
-output of the parser before it is returned. If you really want to dive in,
-there are also Blockprocessors which are part of the core BlockParser.
-
-As the parser builds an [ElementTree][ElementTree] object which is later rendered
-as Unicode text, there are also some helpers provided to ease manipulation of
-the tree. Each part of the API is discussed in its respective section below.
-Additionally, reading the source of some [Available Extensions][] may be
-helpful. For example, the [Footnotes][] extension uses most of the features
-documented here.
-
-## Preprocessors {: #preprocessors }
-
-Preprocessors munge the source text before it is passed into the Markdown
-core. This is an excellent place to clean up bad syntax, extract things the
-parser may otherwise choke on and perhaps even store it for later retrieval.
-
-Preprocessors should inherit from `markdown.preprocessors.Preprocessor` and
-implement a `run` method with one argument `lines`. The `run` method of
-each Preprocessor will be passed the entire source text as a list of Unicode
-strings. Each string will contain one line of text. The `run` method should
-return either that list, or an altered list of Unicode strings.
+Python-Markdown includes an API for extension writers to plug their own custom functionality and syntax into the
+parser. An extension will patch into one or more stages of the parser:
-A pseudo example:
+* [*Preprocessors*](#preprocessors) alter the source before it is passed to the parser.
+* [*Block Processors*](#blockprocessors) work with blocks of text separated by blank lines.
+* [*Tree Processors*](#treeprocessors) modify the constructed ElementTree
+* [*Inline Processors*](#inlineprocessors) are common tree processors for inline elements, such as `*strong*`.
+* [*Postprocessors*](#postprocessors) munge of the output of the parser just before it is returned.
+
+The parser loads text, applies the preprocessors, creates and builds an [ElementTree][ElementTree] object from the
+block processors and inline processors, renders the ElementTree object as Unicode text, and then then applies the
+postprocessors.
+
+There are classes and helpers provided to ease writing your extension. Each part of the API is discussed in its
+respective section below. Additionally, you can walk through the [Tutorial on Writing Extensions][tutorial]; look at
+some of the [Available Extensions][] and their [source code][extension source]. As always, you may report bugs, ask
+for help, and discuss various other issues on the [bug tracker].
+
+## Phases of processing {: #stages }
+
+### Preprocessors {: #preprocessors }
+
+Preprocessors munge the source text before it is passed to the Markdown parser. This is an excellent place to clean up
+bad characters or to extract portions for later processing that the parser may otherwise choke on.
+
+Preprocessors inherit from `markdown.preprocessors.Preprocessor` and implement a `run` method, which takes a single
+parameter `lines`. This parameter is the entire source text stored as a list of Unicode strings, one per line. `run`
+should return its processed list of Unicode strings, one per line.
+
+#### Example
+
+This simple example removes any lines with 'NO RENDER' before processing:
```python
from markdown.preprocessors import Preprocessor
+import re
-class MyPreprocessor(Preprocessor):
+class NoRender(Preprocessor):
+ """ Skip any line with words 'NO RENDER' in it. """
def run(self, lines):
new_lines = []
for line in lines:
- m = MYREGEX.match(line)
- if m:
- # do stuff
- else:
- new_lines.append(line)
+ m = re.search("NO RENDER", line)
+ if not m:
+ # any line without NO RENDER is passed through
+ new_lines.append(line)
return new_lines
```
-## Inline Patterns {: #inlinepatterns }
+#### Usages
-### Legacy
+Some preprocessors in the Markdown source tree include:
-Inline Patterns implement the inline HTML element syntax for Markdown such as
-`*emphasis*` or `[links](http://example.com)`. Pattern objects should be
-instances of classes that inherit from `markdown.inlinepatterns.Pattern` or
-one of its children. Each pattern object uses a single regular expression and
-must have the following methods:
+| Class | Kind | Description |
+| ------------------------------|-----------|------------------------------------------------- |
+| [`NormalizeWhiteSpace`][c1] | built-in | Normalizes whitespace by expanding tabs, fixing `\r` line endings, etc. |
+| [`HtmlBlockPreprocessor`][c2] | built-in | Removes html blocks from the text and stores them for later processing |
+| [`ReferencePreprocessor`][c3] | built-in | Removes reference definitions from text and stores for later processing |
+| [`MetaPreprocessor`][c4] | extension | Strips and records meta data at top of documents |
+| [`FootnotesPreprocessor`][c5] | extension | Removes footnote blocks from the text and stores them for later processing |
-* **`getCompiledRegExp()`**:
+[c1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/preprocessors.py
+[c2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/preprocessors.py
+[c3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/preprocessors.py
+[c4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/meta.py
+[c5]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py
- Returns a compiled regular expression.
+### Block Processors {: #blockprocessors }
-* **`handleMatch(m)`**:
+A block processor parses blocks of text and adds new elements to the `ElementTree`. Blocks of text, separated from
+other text by blank lines, may have a different syntax and produce a differently structured tree than other Markdown.
+Block processors excel at code formatting, equation layouts, and tables.
- Accepts a match object and returns an ElementTree element of a plain
- Unicode string.
+Block processors inherit from `markdown.blockprocessors.BlockProcessor`, are passed `md.parser` on initialization, and
+implement both the `test` and `run` methods:
-Also, Inline Patterns can define the property `ANCESTOR_EXCLUDES` with either
-a list or tuple of undesirable ancestors. The pattern should not match if it
-would cause the content to be a descendant of one of the defined tag names.
+* `test(self, parent, block)` takes two parameters: `parent` is the parent `ElementTree` element and `block` is a
+ single, multi-line, Unicode string of the current block. `test`, often a regular expression match, returns a true
+ value if the block processor's `run` method should be called to process starting at that block.
+* `run(self, parent, blocks)` has the same `parent` parameter as `test`; and `blocks` is the list of all remaining
+ blocks in the document, starting with the `block` passed to `test`. `run` may return `False` (not `None`) to signal
+ failure, meaning that it did not process the blocks after all. On success, `run` is expected to `pop` one or more
+ blocks from the front of `blocks` and attach new nodes to `parent`.
-Note that any regular expression returned by `getCompiledRegExp` must capture
-the whole block. Therefore, they should all start with `r'^(.*?)'` and end
-with `r'(.*?)!'`. When using the default `getCompiledRegExp()` method
-provided in the `Pattern` you can pass in a regular expression without that
-and `getCompiledRegExp` will wrap your expression for you and set the
-`re.DOTALL` and `re.UNICODE` flags. This means that the first group of your
-match will be `m.group(2)` as `m.group(1)` will match everything before the
-pattern.
+Crafting block processors is more involved and flexible than the other processors, involving controlling recursive
+parsing of the block's contents and managing state across invocations. For example, a blank line is allowed in
+indented code, so the second invocation of the inline code processor appends to the element tree generated by the
+previous call. Other block processors may insert new text into the `blocks` list, signal to future calls of itself,
+and more.
-For an example, consider this simplified emphasis pattern:
+To make writing these complex beasts more tractable, three convenience functions have been provided by the
+`BlockProcessor` parent class:
-```python
-from markdown.inlinepatterns import Pattern
-import xml.etree.ElementTree as etree
+* `lastChild(parent)` returns the last child of the given element or `None` if it has no children.
+* `detab(text)` removes one level of indent (four spaces by default) from the front of each line of the given
+ multi-line, text string, until a non-blank line is indented less.
+* `looseDetab(text, level)` removes multiple levels
+ of indent from the front of each line of `text` but does not affect lines indented less.
-class EmphasisPattern(Pattern):
- def handleMatch(self, m):
- el = etree.Element('em')
- el.text = m.group(2)
- return el
-```
+Also, `BlockProcessor` provides the fields `self.tab_length`, the tab length (default 4), and `self.parser`, the
+current `BlockParser` instance.
+
+#### BlockParser
+
+`BlockParser`, not to be confused with `BlockProcessor`, is the class used by Markdown to cycle through all the
+registered block processors. You should never need to create your own instance; use `self.parser` instead.
+
+The `BlockParser` instance provides a stack of strings for its current state, which your processor can push with
+`self.parser.set(state)`, pop with `self.parser.reset()`, or check the the top state with
+`self.parser.isstate(state)`. Be sure your code pops the states it pushes.
+
+The `BlockParser` instance can also be called recursively, that is, to process blocks from within your block
+processor. There are three methods:
+
+* `parseDocument(lines)` parses a list of lines, each a single-line Unicode string, returning a complete
+ `ElementTree`.
+* `parseChunk(parent, text)` parses a single, multi-line, possibly multi-block, Unicode string `text` and attaches the
+ resulting tree to `parent`.
+* `parseBlocks(parent, blocks)` takes a list of `blocks`, each a multi-line Unicode string without blank lines, and
+ attaches the resulting tree to `parent`.
-As discussed in [Integrating Your Code Into Markdown][], an instance of this
-class will need to be provided to Markdown. That instance would be created
-like so:
+For perspective, Markdown calls `parseDocument` which calls `parseChunk` which calls `parseBlocks` which calls your
+block processor, which, in turn, might call one of these routines.
+
+#### Example
+
+This example calls out important paragraphs by giving them a border. It looks for a fence line of exclamation points
+before and after and renders the fenced blocks into a new, styled `div`. If it does not find the ending fence line,
+it does nothing.
+
+Our code, like most block processors, is longer than other examples:
```python
-# an oversimplified regex
-MYPATTERN = r'\*([^*]+)\*'
-# pass in pattern and create instance
-emphasis = EmphasisPattern(MYPATTERN)
+def test_block_processor():
+ class BoxBlockProcessor(BlockProcessor):
+ RE_FENCE_START = r'^ *!{3,} *\n' # start line, e.g., ` !!!! `
+ RE_FENCE_END = r'\n *!{3,}\s*$' # last non-blank line, e.g, '!!!\n \n\n'
+
+ def test(self, parent, block):
+ return re.match(self.RE_FENCE_START, block)
+
+ def run(self, parent, blocks):
+ original_block = blocks[0]
+ blocks[0] = re.sub(self.RE_FENCE_START, '', blocks[0])
+
+ # Find block with ending fence
+ for block_num, block in enumerate(blocks):
+ if re.search(self.RE_FENCE_END, block):
+ # remove fence
+ blocks[block_num] = re.sub(self.RE_FENCE_END, '', block)
+ # render fenced area inside a new div
+ e = etree.SubElement(parent, 'div')
+ e.set('style', 'display: inline-block; border: 1px solid red;')
+ self.parser.parseBlocks(e, blocks[0:block_num + 1])
+ # remove used blocks
+ for i in range(0, block_num + 1):
+ blocks.pop(0)
+ return True # or could have had no return statement
+ # No closing marker! Restore and do nothing
+ blocks[0] = original_block
+ return False # equivalent to our test() routine returning False
+
+ class BoxExtension(Extension):
+ def extendMarkdown(self, md):
+ md.parser.blockprocessors.register(BoxBlockProcessor(md.parser), 'box', 175)
```
-Actually it would not be necessary to create that pattern (and not just because
-a more sophisticated emphasis pattern already exists in Markdown). The fact is,
-that example pattern is not very DRY. A pattern for `**strong**` text would
-be almost identical, with the exception that it would create a 'strong' element.
-Therefore, Markdown provides a number of generic pattern classes that can
-provide some common functionality. For example, both emphasis and strong are
-implemented with separate instances of the `SimpleTagPattern` listed below.
-Feel free to use or extend any of the Pattern classes found at
-`markdown.inlinepatterns`.
-
-### Future
-
-While users can still create plugins with the existing
-`markdown.inlinepatterns.Pattern`, a new, more flexible inline processor has
-been added which users are encouraged to migrate to. The new inline processor
-is found at `markdown.inlinepatterns.InlineProcessor`.
-
-The new processor is very similar to legacy with two major distinctions.
-
-1. Patterns no longer need to match the entire block, so patterns no longer
- start with `r'^(.*?)'` and end with `r'(.*?)!'`. This was a huge
- performance sink and this requirement has been removed. The returned match
- object will only contain what is explicitly matched in the pattern, and
- extension pattern groups now start with `m.group(1)`.
-
-2. The `handleMatch` method now takes an additional input called `data`,
- which is the entire block under analysis, not just what is matched with
- the specified pattern. The method also returns the element *and* the index
- boundaries relative to `data` that the return element is replacing
- (usually `m.start(0)` and `m.end(0)`). If the boundaries are returned as
- `None`, it is assumed that the match did not take place, and nothing will
- be altered in `data`.
-
-If all you need is the same functionality as the legacy processor, you can do
-as shown below. Most of the time, simple regular expression processing is all
-you'll need.
+Start with this example input:
-```python
-from markdown.inlinepatterns import InlineProcessor
-import xml.etree.ElementTree as etree
+``` text
+A regular paragraph of text.
-# an oversimplified regex
-MYPATTERN = r'\*([^*]+)\*'
+!!!!!
+First paragraph of wrapped text.
-class EmphasisPattern(InlineProcessor):
- def handleMatch(self, m, data):
- el = etree.Element('em')
- el.text = m.group(1)
- return el, m.start(0), m.end(0)
+Second Paragraph of **wrapped** text.
+!!!!!
-# pass in pattern and create instance
-emphasis = EmphasisPattern(MYPATTERN)
+Another regular paragraph of text.
```
-But, the new processor allows you handle much more complex patterns that are
-too much for Python's Re to handle. For instance, to handle nested brackets in
-link patterns, the built-in link inline processor uses the following pattern to
-find where a link *might* start:
+The fenced text adds one node with two children to the tree:
-```python
-LINK_RE = NOIMG + r'\['
-link = LinkInlineProcessor(LINK_RE, md_instance)
-```
+* `div`, with a `style` attribute. It renders as
+ `<div style="display: inline-block; border: 1px solid red;">...</div>`
+ * `p` with text `First paragraph of wrapped text.`
+ * `p` with text `Second Paragraph of **wrapped** text`. The conversion to a `<strong>` tag will happen when
+ running the inline processors, which will happen after all of the block processors have completed.
-It then uses programmed logic to actually walk the string (`data`), starting at
-where the match started (`m.start(0)`). If for whatever reason, the text
-does not appear to be a link, it returns `None` for the start and end boundary
-in order to communicate to the parser that no match was found.
+The example output might display as follows:
-```python
- # Just a snippet of the link's handleMatch
- # method to illustrate new logic
- def handleMatch(self, m, data):
- text, index, handled = self.getText(data, m.end(0))
+!!! note ""
+ <p>A regular paragraph of text.</p>
+ <div style="display: inline-block; border: 1px solid red;">
+ <p>First paragraph of wrapped text.</p>
+ <p>Second Paragraph of **wrapped** text.</p>
+ </div>
+ <p>Another regular paragraph of text.</p>
- if not handled:
- return None, None, None
+#### Usages
- href, title, index, handled = self.getLink(data, index)
- if not handled:
- return None, None, None
+Some block processors in the Markdown source tree include:
- el = etree.Element("a")
- el.text = text
+| Class | Kind | Description |
+| ----------------------------|-----------|---------------------------------------------|
+| [`HashHeaderProcessor`][b1] | built-in | Title hashes (`#`), which may split blocks |
+| [`HRProcessor`][b2] | built-in | Horizontal lines, e.g., `---` |
+| [`OListProcessor`][b3] | built-in | Ordered lists; complex and using `state` |
+| [`Admonition`][b4] | extension | Render each [Admonition][] in a new `div` |
- el.set("href", href)
+[b1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/blockprocessors.py
+[b2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/blockprocessors.py
+[b3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/blockprocessors.py
+[Admonition]: https://python-markdown.github.io/extensions/admonition/
+[b4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/admonition.py
- if title is not None:
- el.set("title", title)
+### Tree processors {: #treeprocessors }
- return el, m.start(0), index
-```
+Tree processors manipulate the tree created by block processors. They can even create an entirely new ElementTree
+object. This is an excellent place for creating summaries, adding collected references, or last minute adjustments.
+
+A tree processor must inherit from `markdown.treeprocessors.Treeprocessor` (note the capitalization). A tree processor
+must implement a `run` method which takes a single argument `root`. In most cases `root` would be an
+`xml.etree.ElementTree.Element` instance; however, in rare cases it could be some other type of ElementTree object.
+The `run` method may return `None`, in which case the (possibly modified) original `root` object is used, or it may
+return an entirely new `Element` object, which will replace the existing `root` object and all of its children. It is
+generally preferred to modify `root` in place and return `None`, which avoids creating multiple copies of the entire
+document tree in memory.
-### Generic Pattern Classes
+For specifics on manipulating the ElementTree, see [Working with the ElementTree][workingwithetree] below.
-Some example processors that are available.
+#### Example
+
+A pseudo example:
-* **`SimpleTextInlineProcessor(pattern)`**:
+```python
+from markdown.treeprocessors import Treeprocessor
- Returns simple text of `group(2)` of a `pattern` and the start and end
- position of the match.
+class MyTreeprocessor(Treeprocessor):
+ def run(self, root):
+ root.text = 'modified content'
+ # No return statement is same as `return None`
+```
-* **`SimpleTagInlineProcessor(pattern, tag)`**:
+#### Usages
- Returns an element of type "`tag`" with a text attribute of `group(3)`
- of a `pattern`. `tag` should be a string of a HTML element (i.e.: 'em').
- It also returns the start and end position of the match.
+The core `InlineProcessor` class is a tree processor. It walks the tree, matches patterns, and splits and creates
+nodes on matches.
-* **`SubstituteTagInlineProcessor(pattern, tag)`**:
+Additional tree processors in the Markdown source tree include:
- Returns an element of type "`tag`" with no children or text (i.e.: `br`)
- and the start and end position of the match.
+| Class | Kind | Description |
+| ----------------------------------|-----------|---------------------------------------------------------------|
+| [`PrettifyTreeprocessor`][e1] | built-in | Add line breaks to the html document |
+| [`TocTreeprocessor`][e2] | extension | Builds a [table of contents][] from the finished tree |
+| [`FootnoteTreeprocessor`][e3] | extension | Create [footnote][] div at end of document |
+| [`FootnotePostTreeprocessor`][e4] | extension | Amend div created by `FootnoteTreeprocessor` with duplicates |
-A very small number of the basic legacy processors are still available to
-prevent breakage of 3rd party extensions during the transition period to the
-new processors. Three of the available processors are listed below.
+[e1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/treeprocessors.py
+[e2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/toc.py
+[e3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py
+[e4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py
+[table of contents]: https://python-markdown.github.io/extensions/toc/
+[footnote]: https://python-markdown.github.io/extensions/footnotes/
-* **`SimpleTextPattern(pattern)`**:
+### Inline Processors {: #inlineprocessors }
- Returns simple text of `group(2)` of a `pattern`.
+Inline processors, previously called inline patterns, are used to add formatting, such as `**emphasis**`, by replacing
+a matched pattern with a new element tree node. It is an excellent for adding new syntax for inline tags. Inline
+processor code is often quite short.
-* **`SimpleTagPattern(pattern, tag)`**:
+Inline processors inherit from `InlineProcessor`, are initialized, and implement `handleMatch`:
- Returns an element of type "`tag`" with a text attribute of `group(3)`
- of a `pattern`. `tag` should be a string of a HTML element (i.e.: 'em').
+* `__init__(self, pattern, md=None)` is the inherited constructor. You do not need to implement your own.
+ * `pattern` is the regular expression string that must match the code block in order for the `handleMatch` method
+ to be called.
+ * `md`, an optional parameter, is a pointer to the instance of `markdown.Markdown` and is available as `self.md`
+ on the `InlineProcessor` instance.
-* **`SubstituteTagPattern(pattern, tag)`**:
+* `handleMatch(self, m, data)` must be implemented in all `InlineProcessor` subclasses.
+ * `m` is the regular expression [match object][] found by the `pattern` passed to `__init__`.
+ * `data` is a single, multi-line, Unicode string containing the entire block of text around the pattern. A block
+ is text set apart by blank lines.
+ * Returns either `(None, None, None)`, indicating the provided match was rejected or `(el, start, end)`, if the
+ match was successfully processed. On success, `el` is the element being added the tree, `start` and `end` are
+ indexes in `data` that were "consumed" by the pattern. The "consumed" span will be replaced by a placeholder.
+ The same inline processor may be called several times on the same block.
- Returns an element of type "`tag`" with no children or text (i.e.: `br`).
+Inline Processors can define the property `ANCESTOR_EXCLUDES` which is either a list or tuple of undesirable ancestors.
+The processor will be skipped if it would cause the content to be a descendant of one of the listed tag names.
-There may be other Pattern classes in the Markdown source that you could extend
-or use as well. Read through the source and see if there is anything you can
-use. You might even get a few ideas for different approaches to your specific
-situation.
+##### Convenience Classes
-## Treeprocessors {: #treeprocessors }
+Convenience subclasses of `InlineProcessor` are provide for common operations:
-Treeprocessors manipulate an ElementTree object after it has passed through the
-core BlockParser. This is where additional manipulation of the tree takes
-place. Additionally, the InlineProcessor is a Treeprocessor which steps through
-the tree and runs the Inline Patterns on the text of each Element in the tree.
+* [`SimpleTextInlineProcessor`][i1] returns the text of `group(1)` of the match.
+* [`SubstituteTagInlineProcessor`][i4] is initialized as `SubstituteTagInlineProcessor(pattern, tag)`. It returns a
+ new element `tag` whenever `pattern` is matched.
+* [`SimpleTagInlineProcessor`][i3] is initialized as `SimpleTagInlineProcessor(pattern, tag)`. It returns an element
+ `tag` with a text field of `group(2)` of the match.
-A Treeprocessor should inherit from `markdown.treeprocessors.Treeprocessor`,
-over-ride the `run` method which takes one argument `root` (an ElementTree
-object) and either modifies that root element and returns `None` or returns a
-new ElementTree object.
+##### Example
-A pseudo example:
+This example changes `--strike--` to `<del>strike</del>`.
```python
-from markdown.treeprocessors import Treeprocessor
+from markdown.inlinepatterns import InlineProcessor
+from markdown.extensions import Extension
+import xml.etree.ElementTree as etree
-class MyTreeprocessor(Treeprocessor):
- def run(self, root):
- root.text = 'modified content'
+
+class DelInlineProcessor(InlineProcessor):
+ def handleMatch(self, m, data):
+ el = etree.Element('del')
+ el.text = m.group(1)
+ return el, m.start(0), m.end(0)
+
+class DelExtension(Extension):
+ def extendMarkdown(self, md):
+ DEL_PATTERN = r'--(.*?)--' # like --del--
+ md.inlinePatterns.register(DelInlineProcessor(DEL_PATTERN, md), 'del', 175)
```
-Note that Python class methods return `None` by default when no `return`
-statement is defined. Additionally all Python variables refer to objects by
-reference. Therefore, the above `run` method modifies the `root` element
-in place and returns `None`. The changes made to the `root` element and its
-children are retained.
+Use this input example:
+
+``` text
+First line of the block.
+This is --strike one--.
+This is --strike two--.
+End of the block.
+```
-Some may be inclined to return the modified `root` element. While that would
-work, it would cause a copy of the entire ElementTree to be generated each
-time the Treeprocessor is run. Therefore, it is generally expected that
-the `run` method would only return `None` or a new ElementTree object.
+The example output might display as follows:
-For specifics on manipulating the ElementTree, see
-[Working with the ElementTree][workingwithetree] below.
+!!! note ""
+ <p>First line of the block.
+ This is <del>strike one</del>.
+ This is <del>strike two</del>.
+ End of the block.</p>
-## Postprocessors {: #postprocessors }
+* On the first call to `handleMatch`
+ * `m` will be the match for `--strike one--`
+ * `data` will be the string:
+ `First line of the block.\nThis is --strike one--.\nThis is --strike two--.\nEnd of the block.`
-Postprocessors manipulate the document after the ElementTree has been
-serialized into a string. Postprocessors should be used to work with the
-text just before output.
+ Because the match was successful, the region between the returned `start` and `end` are replaced with a
+ placeholder token and the new element is added to the tree.
-A Postprocessor should inherit from `markdown.postprocessors.Postprocessor`
-and over-ride the `run` method which takes one argument `text` and returns
-a Unicode string.
+* On the second call to `handleMatch`
+ * `m` will be the match for `--strike two--`
+ * `data` will be the string
+ `First line of the block.\nThis is \ 2klzzwxh:0000\ 3.\nThis is --strike two--.\nEnd of the block.`
-Postprocessors are run after the ElementTree has been serialized back into
-Unicode text. For example, this may be an appropriate place to add a table of
-contents to a document:
+Note the placeholder token `\ 2klzzwxh:0000\ 3`. This allows the regular expression to be run against the entire block,
+not just the the text contained in an individual element. The placeholders will later be swapped back out for the
+actual elements by the parser.
+
+Actually it would not be necessary to create the above inline processor. The fact is, that example is not very DRY
+(Don't Repeat Yourself). A pattern for `**strong**` text would be almost identical, with the exception that it would
+create a `strong` element. Therefore, Markdown provides a number of generic `InlineProcessor` subclasses that can
+provide some common functionality. For example, strike could be implemented with an instance of the
+`SimpleTagInlineProcessor` class as demonstrated below. Feel free to use or extend any of the `InlineProcessor`
+subclasses found at `markdown.inlinepatterns`.
```python
-from markdown.postprocessors import Postprocessor
+from markdown.inlinepatterns import SimpleTagInlineProcessor
+from markdown.extensions import Extension
-class TocPostprocessor(Postprocessor):
- def run(self, text):
- return MYMARKERRE.sub(MyToc, text)
+class DelExtension(Extension):
+ def extendMarkdown(self, md):
+ md.inlinePatterns.register(SimpleTagInlineProcessor(r'()--(.*?)--', 'del'), 'del', 175)
```
-## BlockParser {: #blockparser }
-Sometimes, Preprocessors, Treeprocessors, Postprocessors, and Inline Patterns
-are not going to do what you need. Perhaps you want a new type of block type
-that needs to be integrated into the core parsing. In such a situation, you can
-add/change/remove functionality of the core `BlockParser`. The BlockParser is
-composed of a number of Blockprocessors. The BlockParser steps through each
-block of text (split by blank lines) and passes each block to the appropriate
-Blockprocessor. That Blockprocessor parses the block and adds it to the
-ElementTree. The
-[Definition Lists][] extension would be a good example of an extension that
-adds/modifies Blockprocessors.
+##### Usages
-A Blockprocessor should inherit from `markdown.blockprocessors.BlockProcessor`
-and implement both the `test` and `run` methods.
+Here are some convenience functions and other examples:
-The `test` method is used by BlockParser to identify the type of block.
-Therefore the `test` method must return a Boolean value. If the test returns
-`True`, then the BlockParser will call that Blockprocessor's `run` method.
-If it returns `False`, the BlockParser will move on to the next
-Blockprocessor.
+| Class | Kind | Description |
+| ---------------------------------|-----------|---------------------------------------------------------------|
+| [`AsteriskProcessor`][i5] | built-in | Emphasis processor for handling strong and em matches inside asterisks |
+| [`AbbrInlineProcessor`][i6] | extension | Apply tag to abbreviation registered by preprocessor |
+| [`WikiLinksInlineProcessor`][i7] | extension | Link `[[article names]]` to wiki given in metadata |
+| [`FootnoteInlineProcessor`][i8] | extension | Replaces footnote in text with link to footnote div at bottom |
-The **`test`** method takes two arguments:
+[i1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py
+[i2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py
+[i3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py
+[i4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py
+[i5]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py
+[i6]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/abbr.py
+[i7]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/wikilinks.py
+[i8]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py
-* **`parent`**: The parent ElementTree Element of the block. This can be useful
- as the block may need to be treated differently if it is inside a list, for
- example.
+### Patterns
-* **`block`**: A string of the current block of text. The test may be a
- simple string method (such as `block.startswith(some_text)`) or a complex
- regular expression.
+In version 3.0, a new, more flexible inline processor was added, `markdown.inlinepatterns.InlineProcessor`. The
+original inline patterns, which inherit from `markdown.inlinepatterns.Pattern` or one of its children are still
+supported, though users are encouraged to migrate.
-The **`run`** method takes two arguments:
+#### Comparison with new `InlineProcessor`
-* **`parent`**: A pointer to the parent ElementTree Element of the block. The run
- method will most likely attach additional nodes to this parent. Note that
- nothing is returned by the method. The ElementTree object is altered in place.
+The new `InlineProcessor` provides two major enhancements to `Patterns`:
-* **`blocks`**: A list of all remaining blocks of the document. Your run
- method must remove (pop) the first block from the list (which it altered in
- place - not returned) and parse that block. You may find that a block of text
- legitimately contains multiple block types. Therefore, after processing the
- first type, your processor can insert the remaining text into the beginning
- of the `blocks` list for future parsing.
+1. Inline Processors no longer need to match the entire block, so regular expressions no longer need to start with
+ `r'^(.*?)'` and end with `r'(.*?)%'`. This runs faster. The returned [match object][] will only contain what is
+ explicitly matched in the pattern, and extension pattern groups now start with `m.group(1)`.
-Please be aware that a single block can span multiple text blocks. For example,
-The official Markdown syntax rules state that a blank line does not end a
-Code Block. If the next block of text is also indented, then it is part of
-the previous block. Therefore, the BlockParser was specifically designed to
-address these types of situations. If you notice the `CodeBlockProcessor`,
-in the core, you will note that it checks the last child of the `parent`.
-If the last child is a code block (`<pre><code>...</code></pre>`), then it
-appends that block to the previous code block rather than creating a new
-code block.
+2. The `handleMatch` method now takes an additional input called `data`, which is the entire block under analysis,
+ not just what is matched with the specified pattern. The method now returns the element *and* the indexes relative
+ to `data` that the return element is replacing (usually `m.start(0)` and `m.end(0)`). If the boundaries are
+ returned as `None`, it is assumed that the match did not take place, and nothing will be altered in `data`.
-Each Blockprocessor has the following utility methods available:
+ This allows handling of more complex constructs than regular expressions can handle, e.g., matching nested
+ brackets, and explicit control of the span "consumed" by the processor.
+
+#### Inline Patterns
-* **`lastChild(parent)`**:
-
- Returns the last child of the given ElementTree Element or `None` if it
- had no children.
+Inline Patterns can implement inline HTML element syntax for Markdown such as `*emphasis*` or
+`[links](http://example.com)`. Pattern objects should be instances of classes that inherit from
+`markdown.inlinepatterns.Pattern` or one of its children. Each pattern object uses a single regular expression and
+must have the following methods:
-* **`detab(text)`**:
+* **`getCompiledRegExp()`**:
- Removes one level of indent (four spaces by default) from the front of each
- line of the given text string.
+ Returns a compiled regular expression.
-* **`looseDetab(text, level)`**:
+* **`handleMatch(m)`**:
- Removes "level" levels of indent (defaults to 1) from the front of each line
- of the given text string. However, this methods allows secondary lines to
- not be indented as does some parts of the Markdown syntax.
+ Accepts a match object and returns an ElementTree element of a plain Unicode string.
-Each Blockprocessor also has a pointer to the containing BlockParser instance at
-`self.parser`, which can be used to check or alter the state of the parser.
-The BlockParser tracks it's state in a stack at `parser.state`. The state
-stack is an instance of the `State` class.
+Inline Patterns can define the property `ANCESTOR_EXCLUDES` with is either a list or tuple of undesirable ancestors.
+The pattern will be skipped if it would cause the content to be a descendant of one of the listed tag names.
-**`State`** is a subclass of `list` and has the additional methods:
+Note that any regular expression returned by `getCompiledRegExp` must capture the whole block. Therefore, they should
+all start with `r'^(.*?)'` and end with `r'(.*?)!'`. When using the default `getCompiledRegExp()` method provided in
+the `Pattern` you can pass in a regular expression without that and `getCompiledRegExp` will wrap your expression for
+you and set the `re.DOTALL` and `re.UNICODE` flags. This means that the first group of your match will be `m.group(2)`
+as `m.group(1)` will match everything before the pattern.
-* **`set(state)`**:
+For an example, consider this simplified emphasis pattern:
- Set a new state to string `state`. The new state is appended to the end
- of the stack.
+```python
+from markdown.inlinepatterns import Pattern
+import xml.etree.ElementTree as etree
-* **`reset()`**:
+class EmphasisPattern(Pattern):
+ def handleMatch(self, m):
+ el = etree.Element('em')
+ el.text = m.group(2)
+ return el
+```
- Step back one step in the stack. The last state at the end is removed from
- the stack.
+As discussed in [Integrating Your Code Into Markdown][], an instance of this class will need to be provided to
+Markdown. That instance would be created like so:
-* **`isstate(state)`**:
+```python
+# an oversimplified regex
+MYPATTERN = r'\*([^*]+)\*'
+# pass in pattern and create instance
+emphasis = EmphasisPattern(MYPATTERN)
+```
- Test that the top (current) level of the stack is of the given string
- `state`.
+### Postprocessors {: #postprocessors }
-Note that to ensure that the state stack does not become corrupted, each time a
-state is set for a block, that state *must* be reset when the parser finishes
-parsing that block.
+Postprocessors munge the document after the ElementTree has been serialized into a string. Postprocessors should be
+used to work with the text just before output. Usually, they are used add back sections that were extracted in a
+preprocessor, fix up outgoing encodings, or wrap the whole document.
-An instance of the **`BlockParser`** is found at `Markdown.parser`.
-`BlockParser` has the following methods:
+Postprocessors inherit from `markdown.postprocessors.Postprocessor` and implement a `run` method which takes a single
+parameter `text`, the entire HTML document as a single Unicode string. `run` should return a single Unicode string
+ready for output. Note that preprocessors use a list of lines while postprocessors use a single multi-line string.
-* **`parseDocument(lines)`**:
+#### Example
- Given a list of lines, an ElementTree object is returned. This should be
- passed an entire document and is the only method the `Markdown` class
- calls directly.
+Here is a simple example that changes the output to one big page showing the raw html.
-* **`parseChunk(parent, text)`**:
+```python
+from markdown.postprocessors import Postprocessor
+import re
- Parses a chunk of markdown text composed of multiple blocks and attaches
- those blocks to the `parent` Element. The `parent` is altered in place
- and nothing is returned. Extensions would most likely use this method for
- block parsing.
+class ShowActualHtmlPostprocesor(Postprocessor):
+ """ Wrap entire output in <pre> tags as a diagnostic. """
+ def run(self, text):
+ return '<pre>\n' + re.sub('<', '<', text) + '</pre>\n'
+```
-* **`parseBlocks(parent, blocks)`**:
+#### Usages
- Parses a list of blocks of text and attaches those blocks to the `parent`
- Element. The `parent` is altered in place and nothing is returned. This
- method will generally only be used internally to recursively parse nested
- blocks of text.
+Some postprocessors in the Markdown source tree include:
-While it is not recommended, an extension could subclass or completely replace
-the `BlockParser`. The new class would have to provide the same public API.
-However, be aware that other extensions may expect the core parser provided
-and will not work with such a drastically different parser.
+| Class | Kind | Description |
+| ------------------------------|-----------|----------------------------------------------------|
+| [`raw_html`][p1] | built-in | Restore raw html from `htmlStash`, stored by `HTMLBlockPreprocessor`, and code highlighters |
+| [`amp_substitute`][p2] | built-in | Convert ampersand substitutes to `&`; used in links |
+| [`unescape`][p3] | built-in | Convert some escaped characters back from integers; used in links |
+| [`FootnotePostProcessor`][p4] | extension | Replace footnote placeholders with html entities; as set by other stages |
+
+ [p1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/postprocessors.py
+ [p2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/postprocessors.py
+ [p3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/postprocessors.py
+ [p4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py
+
## Working with the ElementTree {: #working_with_et }
-As mentioned, the Markdown parser converts a source document to an
-[ElementTree][ElementTree] object before serializing that back to Unicode text.
-Markdown has provided some helpers to ease that manipulation within the context
+As mentioned, the Markdown parser converts a source document to an [ElementTree][ElementTree] object before
+serializing that back to Unicode text. Markdown has provided some helpers to ease that manipulation within the context
of the Markdown module.
First, import the ElementTree module:
```python
import xml.etree.ElementTree as etree
```
-Sometimes you may want text inserted into an element to be parsed by
-[Inline Patterns][]. In such a situation, simply insert the text as you normally
-would and the text will be automatically run through the Inline Patterns.
-However, if you do *not* want some text to be parsed by Inline Patterns,
-then insert the text as an `AtomicString`.
+Sometimes you may want text inserted into an element to be parsed by [Inline Patterns][]. In such a situation, simply
+insert the text as you normally would and the text will be automatically run through the Inline Patterns. However, if
+you do *not* want some text to be parsed by Inline Patterns, then insert the text as an `AtomicString`.
```python
from markdown.util import AtomicString
some_element.text = AtomicString(some_text)
```
-Here's a basic example which creates an HTML table (note that the contents of
-the second cell (`td2`) will be run through Inline Patterns latter):
+Here's a basic example which creates an HTML table (note that the contents of the second cell (`td2`) will be run
+through Inline Patterns latter):
```python
table = etree.Element("table")
table.tail = "Text after table" # Add text after table
```
-You can also manipulate an existing tree. Consider the following example which
-adds a `class` attribute to `<a>` elements:
+You can also manipulate an existing tree. Consider the following example which adds a `class` attribute to `<a>`
+elements:
```python
def set_link_class(self, element):
for child in element:
if child.tag == "a":
child.set("class", "myclass") #set the class attribute
- set_link_class(child) # run recursively on children
+ set_link_class(child) # run recursively on children
```
For more information about working with ElementTree see the ElementTree
-[Documentation](https://effbot.org/zone/element-index.htm)
-([Python Docs](https://docs.python.org/3/library/xml.etree.elementtree.html)).
+[Documentation](https://effbot.org/zone/element-index.htm) ([Python
+Docs](https://docs.python.org/3/library/xml.etree.elementtree.html)).
## Integrating Your Code Into Markdown {: #integrating_into_markdown }
-Once you have the various pieces of your extension built, you need to tell
-Markdown about them and ensure that they are run in the proper sequence.
-Markdown accepts an `Extension` instance for each extension. Therefore, you
-will need to define a class that extends `markdown.extensions.Extension` and
-over-rides the `extendMarkdown` method. Within this class you will manage
-configuration options for your extension and attach the various processors and
-patterns to the Markdown instance.
-
-It is important to note that the order of the various processors and patterns
-matters. For example, if we replace `http://...` links with `<a>` elements, and
-*then* try to deal with inline HTML, we will end up with a mess. Therefore, the
-various types of processors and patterns are stored within an instance of the
-Markdown class in a [Registry][]. Your `Extension` class will need to manipulate
-those registries appropriately. You may `register` instances of your processors
-and patterns with an appropriate priority, `deregister` built-in instances, or
-replace a built-in instance with your own.
+Once you have the various pieces of your extension built, you need to tell Markdown about them and ensure that they
+are run in the proper sequence. Markdown accepts an `Extension` instance for each extension. Therefore, you will need
+to define a class that extends `markdown.extensions.Extension` and over-rides the `extendMarkdown` method. Within this
+class you will manage configuration options for your extension and attach the various processors and patterns to the
+Markdown instance.
+
+It is important to note that the order of the various processors and patterns matters. For example, if we replace
+`http://...` links with `<a>` elements, and *then* try to deal with inline HTML, we will end up with a mess.
+Therefore, the various types of processors and patterns are stored within an instance of the `markdown.Markdown` class
+in a [Registry][]. Your `Extension` class will need to manipulate those registries appropriately. You may `register`
+instances of your processors and patterns with an appropriate priority, `deregister` built-in instances, or replace a
+built-in instance with your own.
### `extendMarkdown` {: #extendmarkdown }
-The `extendMarkdown` method of a `markdown.extensions.Extension` class
-accepts one argument:
+The `extendMarkdown` method of a `markdown.extensions.Extension` class accepts one argument:
* **`md`**:
- A pointer to the instance of the Markdown class. You should use this to
- access the [Registries][Registry] of processors and patterns. They are
- found under the following attributes:
+ A pointer to the instance of the `markdown.Markdown` class. You should use this to access the
+ [Registries][Registry] of processors and patterns. They are found under the following attributes:
* `md.preprocessors`
* `md.inlinePatterns`
* `md.treeprocessors`
* `md.postprocessors`
- Some other things you may want to access in the markdown instance are:
+ Some other things you may want to access on the `markdown.Markdown` instance are:
* `md.htmlStash`
* `md.output_formats`
* `md.isBlockLevel()`
!!! Warning
- With access to the above items, theoretically you have the option to
- change anything through various [monkey_patching][] techniques. However,
- you should be aware that the various undocumented parts of markdown may
- change without notice and your monkey_patches may break with a new release.
- Therefore, what you really should be doing is inserting processors and
- patterns into the markdown pipeline. Consider yourself warned!
+ With access to the above items, theoretically you have the option to change anything through various
+ [monkey_patching][] techniques. However, you should be aware that the various undocumented parts of Markdown may
+ change without notice and your monkey_patches may break with a new release. Therefore, what you really should be
+ doing is inserting processors and patterns into the Markdown pipeline. Consider yourself warned!
[monkey_patching]: https://en.wikipedia.org/wiki/Monkey_patch
md.inlinePatterns.register(MyPattern(md), 'mypattern', 175)
```
-### Registry
-
-The `markdown.util.Registry` class is a priority sorted registry which Markdown
-uses internally to determine the processing order of its various processors and
-patterns.
-
-A `Registry` instance provides two public methods to alter the data of the
-registry: `register` and `deregister`. Use `register` to add items and
-`deregister` to remove items. See each method for specifics.
-
-When registering an item, a "name" and a "priority" must be provided. All
-items are automatically sorted by the value of the "priority" parameter such
-that the item with the highest value will be processed first. The "name" is
-used to remove (`deregister`) and get items.
-
-A `Registry` instance is like a list (which maintains order) when reading
-data. You may iterate over the items, get an item and get a count (length)
-of all items. You may also check that the registry contains an item.
-
-When getting an item you may use either the index of the item or the
-string-based "name". For example:
-
- registry = Registry()
- registry.register(SomeItem(), 'itemname', 20)
- # Get the item by index
- item = registry[0]
- # Get the item by name
- item = registry['itemname']
-
-When checking that the registry contains an item, you may use either the
-string-based "name", or a reference to the actual item. For example:
-
- someitem = SomeItem()
- registry.register(someitem, 'itemname', 20)
- # Contains the name
- assert 'itemname' in registry
- # Contains the item instance
- assert someitem in registry
-
-`markdown.util.Registry` has the following methods:
-
-#### `Registry.register(self, item, name, priority)` {: #registry.register }
-
-: Add an item to the registry with the given name and priority.
-
- Parameters:
-
- * `item`: The item being registered.
- * `name`: A string used to reference the item.
- * `priority`: An integer or float used to sort against all items.
-
- If an item is registered with a "name" which already exists, the existing
- item is replaced with the new item. Tread carefully as the old item is lost
- with no way to recover it. The new item will be sorted according to its
- priority and will **not** retain the position of the old item.
-
-#### `Registry.deregister(self, name, strict=True)` {: #registry.deregister }
-
-: Remove an item from the registry.
-
- Set `strict=False` to fail silently.
-
-#### `Registry.get_index_for_name(self, name)` {: #registry.get_index_for_name }
-
-: Return the index of the given `name`.
-
### registerExtension {: #registerextension }
-Some extensions may need to have their state reset between multiple runs of the
-Markdown class. For example, consider the following use of the [Footnotes][]
-extension:
+Some extensions may need to have their state reset between multiple runs of the `markdown.Markdown` class. For
+example, consider the following use of the [Footnotes][] extension:
```python
md = markdown.Markdown(extensions=['footnotes'])
html2 = md.convert(text_without_footnote)
```
-Without calling `reset`, the footnote definitions from the first document will
-be inserted into the second document as they are still stored within the class
-instance. Therefore the `Extension` class needs to define a `reset` method
-that will reset the state of the extension (i.e.: `self.footnotes = {}`).
-However, as many extensions do not have a need for `reset`, `reset` is only
-called on extensions that are registered.
+Without calling `reset`, the footnote definitions from the first document will be inserted into the second document as
+they are still stored within the class instance. Therefore the `Extension` class needs to define a `reset` method that
+will reset the state of the extension (i.e.: `self.footnotes = {}`). However, as many extensions do not have a need
+for `reset`, `reset` is only called on extensions that are registered.
-To register an extension, call `md.registerExtension` from within your
-`extendMarkdown` method:
+To register an extension, call `md.registerExtension` from within your `extendMarkdown` method:
```python
def extendMarkdown(self, md):
# insert processors and patterns here
```
-Then, each time `reset` is called on the Markdown instance, the `reset`
-method of each registered extension will be called as well. You should also
-note that `reset` will be called on each registered extension after it is
-initialized the first time. Keep that in mind when over-riding the extension's
-`reset` method.
+Then, each time `reset` is called on the `markdown.Markdown` instance, the `reset` method of each registered extension
+will be called as well. You should also note that `reset` will be called on each registered extension after it is
+initialized the first time. Keep that in mind when over-riding the extension's `reset` method.
### Configuration Settings {: #configsettings }
-If an extension uses any parameters that the user may want to change,
-those parameters should be stored in `self.config` of your
-`markdown.extensions.Extension` class in the following format:
+If an extension uses any parameters that the user may want to change, those parameters should be stored in
+`self.config` of your `markdown.extensions.Extension` class in the following format:
```python
class MyExtension(markdown.extensions.Extension):
def __init__(self, **kwargs):
- self.config = {'option1' : ['value1', 'description1'],
- 'option2' : ['value2', 'description2'] }
+ self.config = {
+ 'option1' : ['value1', 'description1'],
+ 'option2' : ['value2', 'description2']
+ }
super(MyExtension, self).__init__(**kwargs)
```
-When implemented this way the configuration parameters can be over-ridden at
-run time (thus the call to `super`). For example:
+When implemented this way the configuration parameters can be over-ridden at run time (thus the call to `super`). For
+example:
```python
markdown.Markdown(extensions=[MyExtension(option1='other value')])
```
-Note that if a keyword is passed in that is not already defined in
-`self.config`, then a `KeyError` is raised.
+Note that if a keyword is passed in that is not already defined in `self.config`, then a `KeyError` is raised.
-The `markdown.extensions.Extension` class and its subclasses have the
-following methods available to assist in working with configuration settings:
+The `markdown.extensions.Extension` class and its subclasses have the following methods available to assist in working
+with configuration settings:
* **`getConfig(key [, default])`**:
- Returns the stored value for the given `key` or `default` if the `key`
- does not exist. If not set, `default` returns an empty string.
+ Returns the stored value for the given `key` or `default` if the `key` does not exist. If not set, `default`
+ returns an empty string.
* **`getConfigs()`**:
* **`setConfig(key, value)`**:
- Sets a configuration setting for `key` with the given `value`. If `key` is
- unknown, a `KeyError` is raised. If the previous value of `key` was
- a Boolean value, then `value` is converted to a Boolean value. If
- the previous value of `key` is `None`, then `value` is converted to
- a Boolean value except when it is `None`. No conversion takes place
- when the previous value of `key` is a string.
+ Sets a configuration setting for `key` with the given `value`. If `key` is unknown, a `KeyError` is raised. If the
+ previous value of `key` was a Boolean value, then `value` is converted to a Boolean value. If the previous value
+ of `key` is `None`, then `value` is converted to a Boolean value except when it is `None`. No conversion takes
+ place when the previous value of `key` is a string.
* **`setConfigs(items)`**:
### Naming an Extension { #naming_an_extension }
-As noted in the [library reference] an instance of an extension can be passed
-directly to Markdown. In fact, this is the preferred way to use third-party
-extensions.
+As noted in the [library reference] an instance of an extension can be passed directly to `markdown.Markdown`. In
+fact, this is the preferred way to use third-party extensions.
For example:
md = markdown.Markdown(extensions=[MyExtension(option='value')])
```
-However, Markdown also accepts "named" third party extensions for those
-occasions when it is impractical to import an extension directly (from the
-command line or from within templates). A "name" can either be a registered
-[entry point](#entry_point) or a string using Python's [dot
-notation](#dot_notation).
+However, Markdown also accepts "named" third party extensions for those occasions when it is impractical to import an
+extension directly (from the command line or from within templates). A "name" can either be a registered [entry
+point](#entry_point) or a string using Python's [dot notation](#dot_notation).
#### Entry Point { #entry_point }
-[Entry points] are defined in a Python package's `setup.py` script. The script
-must use [setuptools] to support entry points. Python-Markdown extensions must
-be assigned to the `markdown.extensions` group. An entry point definition might
-look like this:
+[Entry points] are defined in a Python package's `setup.py` script. The script must use [setuptools] to support entry
+points. Python-Markdown extensions must be assigned to the `markdown.extensions` group. An entry point definition
+might look like this:
```python
from setuptools import setup
)
```
-After a user installs your extension using the above script, they could then
-call the extension using the `myextension` string name like this:
+After a user installs your extension using the above script, they could then call the extension using the
+`myextension` string name like this:
```python
markdown.markdown(text, extensions=['myextension'])
```
-Note that if two or more entry points within the same group are assigned the
-same name, Python-Markdown will only ever use the first one found and ignore all
-others. Therefore, be sure to give your extension a unique name.
+Note that if two or more entry points within the same group are assigned the same name, Python-Markdown will only ever
+use the first one found and ignore all others. Therefore, be sure to give your extension a unique name.
-For more information on writing `setup.py` scripts, see the Python documentation
-on [Packaging and Distributing Projects].
+For more information on writing `setup.py` scripts, see the Python documentation on [Packaging and Distributing
+Projects].
#### Dot Notation { #dot_notation }
-If an extension does not have a registered entry point, Python's dot notation
-may be used instead. The extension must be installed as a Python module on your
-PYTHONPATH. Generally, a class should be specified in the name. The class must
+If an extension does not have a registered entry point, Python's dot notation may be used instead. The extension must
+be installed as a Python module on your PYTHONPATH. Generally, a class should be specified in the name. The class must
be at the end of the name and be separated by a colon from the module.
Therefore, if you were to import the class like this:
markdown.markdown(text, extensions=['path.to.module:MyExtension'])
```
-You do not need to do anything special to support this feature. As long as your
-extension class is able to be imported, a user can include it with the above
-syntax.
+You do not need to do anything special to support this feature. As long as your extension class is able to be
+imported, a user can include it with the above syntax.
-The above two methods are especially useful if you need to implement a large
-number of extensions with more than one residing in a module. However, if you do
-not want to require that your users include the class name in their string, you
-must define only one extension per module and that module must contain a
-module-level function called `makeExtension` that accepts `**kwargs` and returns
-an extension instance.
+The above two methods are especially useful if you need to implement a large number of extensions with more than one
+residing in a module. However, if you do not want to require that your users include the class name in their string,
+you must define only one extension per module and that module must contain a module-level function called
+`makeExtension` that accepts `**kwargs` and returns an extension instance.
For example:
return MyExtension(**kwargs)
```
-When Markdown is passed the "name" of your extension as a dot notation string
-that does not include a class (for example `path.to.module`), it will import the
-module and call the `makeExtension` function to initiate your extension.
+When `markdown.Markdown` is passed the "name" of your extension as a dot notation string that does not include a class
+(for example `path.to.module`), it will import the module and call the `makeExtension` function to initiate your
+extension.
+
+## Registries
+
+The `markdown.util.Registry` class is a priority sorted registry which Markdown uses internally to determine the
+processing order of its various processors and patterns.
+
+A `Registry` instance provides two public methods to alter the data of the registry: `register` and `deregister`. Use
+`register` to add items and `deregister` to remove items. See each method for specifics.
+
+When registering an item, a "name" and a "priority" must be provided. All items are automatically sorted by the value
+of the "priority" parameter such that the item with the highest value will be processed first. The "name" is used to
+remove (`deregister`) and get items.
+
+A `Registry` instance is like a list (which maintains order) when reading data. You may iterate over the items, get an
+item and get a count (length) of all items. You may also check that the registry contains an item.
+
+When getting an item you may use either the index of the item or the string-based "name". For example:
+
+```python
+registry = Registry()
+registry.register(SomeItem(), 'itemname', 20)
+# Get the item by index
+item = registry[0]
+# Get the item by name
+item = registry['itemname']
+```
+
+When checking that the registry contains an item, you may use either the string-based "name", or a reference to the
+actual item. For example:
+
+```python
+someitem = SomeItem()
+registry.register(someitem, 'itemname', 20)
+# Contains the name
+assert 'itemname' in registry
+# Contains the item instance
+assert someitem in registry
+```
+
+`markdown.util.Registry` has the following methods:
+
+### `Registry.register(self, item, name, priority)` {: #registry.register data-toc-label='Registry.register'}
+
+: Add an item to the registry with the given name and priority.
+
+ Parameters:
+
+ * `item`: The item being registered.
+ * `name`: A string used to reference the item.
+ * `priority`: An integer or float used to sort against all items.
+
+ If an item is registered with a "name" which already exists, the existing item is replaced with the new item.
+ Tread carefully as the old item is lost with no way to recover it. The new item will be sorted according to its
+ priority and will **not** retain the position of the old item.
+
+### `Registry.deregister(self, name, strict=True)` {: #registry.deregister data-toc-label='Registry.deregister'}
+
+: Remove an item from the registry.
+
+ Set `strict=False` to fail silently.
+
+### `Registry.get_index_for_name(self, name)` {: #registry.get_index_for_name data-toc-label='Registry.get_index_for_name'}
+
+: Return the index of the given `name`.
-[Preprocessors]: #preprocessors
-[Inline Patterns]: #inlinepatterns
-[Treeprocessors]: #treeprocessors
-[Postprocessors]: #postprocessors
-[BlockParser]: #blockparser
+[match object]: https://docs.python.org/3/library/re.html#match-objects
+[bug tracker]: https://github.com/Python-Markdown/markdown/issues
+[extension source]: https://github.com/Python-Markdown/markdown/tree/master/markdown/extensions
+[tutorial]: https://github.com/Python-Markdown/markdown/wiki/Tutorial:-Writing-Extensions-for-Python-Markdown
[workingwithetree]: #working_with_et
[Integrating your code into Markdown]: #integrating_into_markdown
[extendMarkdown]: #extendmarkdown
[makeExtension]: #makeextension
[ElementTree]: https://effbot.org/zone/element-index.htm
[Available Extensions]: index.md
-[Footnotes]: https://github.com/Python-Markdown/mdx_footnotes
-[Definition Lists]: https://github.com/Python-Markdown/mdx_definition_lists
+[Footnotes]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py
+[Definition Lists]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/definition_lists
[library reference]: ../reference.md
[setuptools]: https://packaging.python.org/key_projects/#setuptools
[Entry points]: https://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins