6 Things Markdown Got Wrong

swyx 2020-03-22

John Gruber's Markdown is almost a perfect content authoring format. Here are 6 things it got wrong (in my opinion, of course).

This post is an expanded tweet and you may enjoy the related HN discussion.

1. Lazy List Numbering

In Gruber's original spec, this was the way he explained it:

If you instead wrote the list in Markdown like this:
1.  Bird
1.  McHale
1.  Parish
or even:
3. Bird
1. McHale
8. Parish
you’d get the exact same HTML output. The point is, if you want to, you can use ordinal numbers in your ordered Markdown lists, so that the numbers in your source match the numbers in your published HTML. But if you want to be lazy, you don’t have to.

This laziness is good for, well, the lazy. I know some people love this feature. But we read Markdown more than we write it - even as the people writing it! Readability is a core design philosophy of Markdown. In practice, many people end up renumbering their lists anyway just to make them look right in the Markdown source (either manually, or with tooling).

I can understand why Gruber went for it: HTML itself is pretty lazy. Normal li elements don't care about what order you leave them in:

<ol>
    <li>Bird</li>
    <li>McHale</li>
    <li>Parish</li>
</ol>

But given that in Markdown we're already typing numbers, having numbers be order independent kind of loses the meaning of using numbers at all.

Well, ok. Let's say order matters in Markdown. Does that give us any value in HTML?

Actually, yes:

<ol>
    <li value="1">Bird</li>
    <li value="2">McHale</li>
    <li value="3">Parish</li>
</ol>

And if the author were to need out-of-order numbers for whatever reason, that would be supported out of the box as well. Pretty sweet!
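To make the idea concrete, here is a minimal sketch of a renderer that keeps the source numbers instead of throwing them away. It is only an illustration, not how any real Markdown processor works: the function name `render_ordered_list` and the single-line item format are my own assumptions.

```python
import re
from html import escape

# Matches an ordered-list line such as "3. Bird" (number, dot, spaces, text).
ITEM = re.compile(r"^(\d+)\.\s+(.*)$")

def render_ordered_list(markdown: str) -> str:
    """Render numbered lines as an <ol>, keeping each source number as value=."""
    rows = []
    for line in markdown.splitlines():
        match = ITEM.match(line.strip())
        if match:  # lines that are not list items are skipped in this sketch
            number, text = match.groups()
            rows.append(f'    <li value="{int(number)}">{escape(text)}</li>')
    return "<ol>\n" + "\n".join(rows) + "\n</ol>"

print(render_ordered_list("3. Bird\n1. McHale\n8. Parish"))
```

With this approach the out-of-order source `3.`, `1.`, `8.` would come out as `value="3"`, `value="1"`, `value="8"`, so the published numbers match what the author typed.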

As a further thought exercise, we could support lazy numbering with some other character than a 0-9 number:

300. Bird
100. McHale
*. Parish - this would show as "101"

This would be even easier to remember had Gruber not chosen to allow * for unordered lists alongside the pretty-much-universally-used - (and the never-used +). Still, it's common for lexers to tell a two-character token (*.) apart from a one-character one (*).
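The `*.` thought exercise could be resolved with a small pass over the list markers. This is a sketch of a hypothetical syntax, so everything here is an assumption: the name `resolve_numbers`, the rule that `*.` means "previous number plus one", and the choice that a leading `*.` starts at 1.

```python
import re
from typing import List, Tuple

# Matches "300. Bird" as well as the hypothetical lazy marker "*. Parish".
ITEM = re.compile(r"^(\d+|\*)\.\s+(.*)$")

def resolve_numbers(markdown: str) -> List[Tuple[int, str]]:
    """Turn list lines into (number, text) pairs, filling in lazy '*.' markers."""
    resolved = []
    previous = 0  # assumption: a leading "*." starts the list at 1
    for line in markdown.splitlines():
        match = ITEM.match(line.strip())
        if not match:
            continue  # not a list item; ignored in this sketch
        marker, text = match.groups()
        number = previous + 1 if marker == "*" else int(marker)
        resolved.append((number, text))
        previous = number
    return resolved

print(resolve_numbers("300. Bird\n100. McHale\n*. Parish"))
```

Explicit numbers are kept as written, and only the `*.` items are derived from whatever came before them.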

2. Code Blocks

Code Blocks probably aren't necessary, and overlap with 2 important commonly used (but not headline) Markdown syntax features.

Just to be clear what we're talking about - Code Blocks are how you indicate code (with 4 spaces), whereas Code Fences are the thing more people probably use today (with triple backticks).

Here's how Gruber explained them in the spec:

Pre-formatted code blocks are used for writing about programming or markup source code. Rather than forming normal paragraphs, the lines of a code block are interpreted literally. Markdown wraps a code block in both <pre> and <code> tags.

To produce a code block in Markdown, simply indent every line of the block by at least 4 spaces or 1 tab. For example, given this input:
This is a normal paragraph:

    This is a code block.
Markdown will generate:
<p>This is a normal paragraph:</p>

<pre><code>This is a code block.
</code></pre>

It's maaaybe possible to see why Gruber picked 4 spaces/1 tab (let's not have THAT debate) to indicate a codeblock. It's convenient to toggle what is or is not a code block just by tabbing content in and out.

But of course, code is code, and words are words - we don't just switch them back and forth willy-nilly, they're drastically different types of content.

The problem with Code Blocks is twofold. First, they weren't designed with developer tooling in mind. Code fences have room to let you indicate the language of the code content, and have even been extended to offer other metadata:

```js
alert('this is definitely javascript!');
```

Adding code titles and line highlighting was probably a step too far for Gruber's goals, but the core idea is that code fences are more extensible than code blocks - and, it turns out - useful for readers of the unprocessed markdown too! This blogpost you just read would be unreadable - and therefore unwritable - without code fences.
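Part of why fences are extensible is that the opening line has room for an "info string" after the backticks. Here is a rough sketch of how a tool might split that line into a language and any trailing metadata. The function name `parse_fence_opener` and the `title="demo.js"` metadata format are assumptions for illustration, since each tool defines its own metadata conventions.

```python
import re

# An opening fence: three or more backticks, then an optional info string.
FENCE = re.compile(r"^(`{3,})\s*(.*)$")

def parse_fence_opener(line: str):
    """Return the language and extra metadata of a fence opener, or None."""
    match = FENCE.match(line)
    if not match:
        return None  # not a fence line at all
    parts = match.group(2).strip().split(None, 1)
    language = parts[0] if parts else None       # first word names the language
    meta = parts[1] if len(parts) > 1 else ""    # the rest is tool-specific
    return {"language": language, "meta": meta}

print(parse_fence_opener("```js"))
print(parse_fence_opener('```js title="demo.js"'))
```

An indented code block has no equivalent place to put this information, which is the tooling gap described above.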

Code Blocks have another related, smaller problem - they coincide with hanging list indents. Here's how Gruber described list indents in the spec:

To make lists look nice, you can wrap items with hanging indents:

*   Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
    Aliquam hendrerit mi posuere lectus. Vestibulum enim wisi,
    viverra nec, fringilla in, laoreet vitae, risus.
*   Donec sit amet nisl. Aliquam semper ipsum sit amet velit.
    Suspendisse id sem consectetuer libero luctus adipiscing.

and

List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab:

1.  This is a list item with two paragraphs. Lorem ipsum dolor
    sit amet, consectetuer adipiscing elit. Aliquam hendrerit
    mi posuere lectus.

    Vestibulum enim wisi, viverra nec, fringilla in, laoreet
    vitae, risus. Donec sit amet nisl. Aliquam semper ipsum
    sit amet velit.

2.  Suspendisse id sem consectetuer libero luctus adipiscing.

Nice - and extremely handy - except it is now pretty confusing whether or not a code block is at the top level or inside a list item.

If you reserve indents for cosmetic and list item membership purposes then you have no such ambiguity. Code fences (or maybe more accurately, the backtick `) tell you what is or isn't code. That's it.

Of course, today Code Fences with language tags for syntax highlighting are in very widespread use. I'm not sure who invented them, but I do know they are widely used because GitHub Flavored Markdown made it so, and I am grateful for that.

3. Markdown in HTML in Markdown

A genius decision by Gruber was to have Markdown be a superset of HTML. He called it Inline HTML - but you can remember it as HTML-in-Markdown:

For example, to add an HTML table to a Markdown article:

This is a regular paragraph.
<table>
    <tr>
       <td>Foo</td>
    </tr>
</table>
This is another regular paragraph.

This is pretty handy for the many cases where Markdown just doesn't give you enough control (more on that later). But sometimes you just want to define some wrapper components, and then use Markdown for the rest:

<div class="myContainerClass">

    ## My Request
    
    I want to use *Markdown* here, not HTML!

</div>

In other words, we want Markdown in HTML in Markdown.

Once again, GitHub Flavored Markdown to the rescue. This is EXTREMELY useful for the one stateful UI element offered to you by GitHub - the details/summary tag:


## Docs for my new startup

This is my startup! Short and sweet pitch here.

<details>
    <summary>If you'd like to read our in-depth vision, click here</summary>

    ## Our vision

    Alpha ecosystem user experience. Hackathon incubator business-to-consumer assets focus. Termsheet stealth first mover advantage scrum project client long tail business-to-business user experience entrepreneur backing product management rockstar venture. Business-to-business analytics market disruptive crowdsource creative paradigm shift infographic metrics network effects lean startup accelerator stealth customer. Vesting period growth hacking partnership scrum project validation interaction design handshake sales assets business-to-consumer. Ownership early adopters graphical user interface funding influencer A/B testing user experience. Return on investment infographic investor partnership growth hacking business plan user experience deployment churn rate assets first mover advantage buyer startup. Non-disclosure agreement investor sales angel investor. Mass market creative angel investor freemium network effects investor business-to-consumer supply chain bootstrapping twitter hypotheses early adopters. Traction MVP paradigm shift series A financing virality market alpha.

</details>

John Gruber actually saw this criticism and replied:

markdown=“1” as a tag attribute is the way to solve #4 properly. But there’s a question of how you should handle block vs span elements.

So this is a solution I wasn't aware of, but it seems to be a nonstandard attribute, so it is not in wide use.

The next 2 points are common use cases that necessitate "de-opting" from Markdown, causing the need for Inline HTML in the first place - if we just had these 2 things we would be able to stay in Markdown a lot longer.

4. No Syntax for Adding Classes

There are only 2 mentions of classes in the entire Markdown spec. That's probably not a surprise - as it says:

The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

Ultimately, I think, Markdown cannot be fully orthogonal to HTML, and it doesn't really try to be. It isn't a design goal of Markdown to let you do everything you can do with HTML, but still it is essentially a language that compiles down to HTML.

Additionally, normally everything Markdown does is visible to the end user - it doesn't even have comments! Yet, classes are a weird feature of HTML. Are they visible to the end user? By default, no. But for all practical purposes they are the primary hooks CSS uses to modify how things appear to users.

Using wrapper divs is often a great way to manage layout and spacing and other important design considerations. As discussed above, because there is no Markdown-in-HTML-in-Markdown, any child content of a wrapper div then has to become HTML too, and you lose the benefit of using Markdown at all.

Unfortunately my case is diminished greatly by the lack of constructive suggestions to do this well. Axel Rauschmayer chimed in to offer Pandoc's format, but for a Markdown flavor I'd wish for something that degrades gracefully in common Markdown viewers, like code fences do. If Gruber had designed this into Markdown from the get-go, whatever the syntax, we would not have this problem or this objection.
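For reference, Pandoc's format puts attributes in curly braces after a heading, e.g. `## My Request {#request .callout}`. Below is a rough sketch of how a processor might read that trailing block and turn it into `id` and `class` attributes. It is not Pandoc's implementation: the function name `render_heading` is made up, and only the `#id` and `.class` tokens are handled.

```python
import re
from html import escape

# A heading line with an optional trailing "{...}" attribute block.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s*\{([^}]*)\})?\s*$")

def render_heading(line: str):
    """Render a heading, mapping {#id .class} tokens to HTML attributes."""
    match = HEADING.match(line)
    if not match:
        return None  # not a heading line
    hashes, text, attrs = match.groups()
    classes, element_id = [], None
    for token in (attrs or "").split():
        if token.startswith("."):
            classes.append(token[1:])
        elif token.startswith("#"):
            element_id = token[1:]
    attr_text = ""
    if element_id:
        attr_text += ' id="' + escape(element_id) + '"'
    if classes:
        attr_text += ' class="' + escape(" ".join(classes)) + '"'
    level = len(hashes)
    return f"<h{level}{attr_text}>{escape(text)}</h{level}>"

print(render_heading("## My Request {#request .callout}"))
```

The graceful-degradation worry is visible here too: a viewer that doesn't know this syntax would simply print the `{#request .callout}` text as part of the heading.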

5. No ID's in Headers

This is important for URL accessibility. Here we have a normal markdown header:

# My Title

and this compiles to:

<h1>My Title</h1>

but do the same in GitHub, and this results:

<h1><a id="user-content-my-title" class="anchor" aria-hidden="true" href="#my-title"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>My Title</h1>

This lets GitHub offer its nice hover effect to remind you you can copy anchor links, but you don't really need an <a> tag for an anchor link - you could just compile to:

<h1 id="my-title">My Title</h1>

And that would work.

There's nothing I hate more than reading a good article with headers, wanting to link someone directly to the section relevant to them, and then not having an id to use to send them straight there.

💁‍♂️ Side note: I LOVE the Display #Anchors chrome extension for toggling this on and off!

Of course, offering ID's for headers would also mean taking a stance on how to slug-case content. How would you slug case this?

# Something: with Punctuation? (Part II - Dealing with Emojis 🤯)

There are also quibbles about case sensitivity and hyphens vs underscores. But I don't really care - pick one, let me link directly to content!!
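To show that "pick one" really is all it takes, here is one possible slugging policy as a sketch. It is my own simplification, not GitHub's algorithm or any spec: lowercase everything, drop anything that isn't an ASCII letter, digit, space, or hyphen (so emoji and accented letters disappear), and join the words with single hyphens. The names `slugify` and `heading_with_id` are hypothetical.

```python
import re
from html import escape

def slugify(title: str) -> str:
    """One possible slug policy: lowercase ASCII words joined by hyphens."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # drop punctuation, emoji, non-ASCII
    slug = re.sub(r"[\s-]+", "-", slug)       # collapse whitespace/hyphen runs
    return slug.strip("-")

def heading_with_id(line: str) -> str:
    """Render '# My Title' style headings with an id derived from the text."""
    hashes, _, text = line.partition(" ")
    level = len(hashes)
    text = text.strip()
    return f'<h{level} id="{slugify(text)}">{escape(text)}</h{level}>'

print(heading_with_id("# My Title"))
print(slugify("Something: with Punctuation? (Part II - Dealing with Emojis 🤯)"))
```

Under this policy the awkward header above becomes `something-with-punctuation-part-ii-dealing-with-emojis`. A different policy would give a different slug, which is fine as long as it is predictable.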

Of course, GitHub Flavored Markdown handles this in GitHub's postprocessing, and you can add this feature to widely used Markdown processors. But it would have been nice to have it in the spec.

6. No Option for Metadata

This is related to the "no way to specify non-visible things" complaint above, but on a whole-document level instead of a per-element level. Sometimes you want to offer Markdown processors some extra data about how this specific Markdown document should be handled differently than others, or just specify other non-HTML data for use by other toolchains like a blogging engine (e.g. specify what layout format to use, or what categories the blogpost belongs to, and so on).

Instead of having to specify this data in a separate file, it'd be nice to colocate this metadata alongside the content.

There's a reason this probably wasn't offered by Gruber - it involves another language that isn't Markdown and isn't HTML. I don't know this for a fact but it seems like Jekyll was the first to introduce Front Matter as the widely accepted Markdown metadata format - and it uses YAML, which predates Markdown, but only by 3 years.

YAML itself has plenty of vocal detractors, so it isn't perfect. But something for Metadata would have been really, really nice.
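For anyone who hasn't seen Front Matter, the idea is a block at the very top of the file, delimited by `---` lines, that tools strip off before rendering. Here is a minimal sketch of that split. It is deliberately not a YAML parser: it only understands flat `key: value` lines, and the function name `split_front_matter` and the sample keys are assumptions for illustration.

```python
def split_front_matter(document: str):
    """Split a document into (metadata dict, Markdown body)."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document  # no front matter block at the top
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":  # closing delimiter found
            metadata = {}
            for raw in lines[1:index]:
                key, sep, value = raw.partition(":")
                if sep:  # keep only simple "key: value" lines
                    metadata[key.strip()] = value.strip()
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return metadata, body
    return {}, document  # unclosed block: treat everything as content

post = "---\nlayout: post\ncategories: markdown\n---\n\n# My Title\n"
print(split_front_matter(post))
```

A real toolchain would hand the block between the delimiters to a proper YAML library, but the colocation benefit is the same: one file carries both the metadata and the content.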

Other small gripes

I wish Markdown had a distinction between an <aside> and a <blockquote> as I use both often.

Conclusion

Of course, Gruber got everything else right, which is why we're even talking about it today. I don't even want to know all the competing alternatives present at the time and have to pick between them. Sometimes, Worse is Better.

And in the end, Markdown was successful enough. I am writing this blogpost in Markdown, and since I write every day, that means I write Markdown every day. I also build tools that consume Markdown, and live primarily on the web, and thus I feel the pain points more intimately than others might.
