Why format HTML?
Raw HTML from a CMS, a scraper, a template engine, or a minified production bundle can be one long line that's impossible to read or debug. Formatting it gives you the indented tree structure your eyes expect, making it easy to see nesting errors, find specific elements, and understand the document structure.
Minification does the opposite: it removes formatting (insignificant whitespace, newlines, and comments) to produce the smallest practical file. For production serving, a minified HTML file combined with gzip compression can reduce transfer size by 60–80%.
Void elements: no closing tag
HTML has a set of void elements that cannot have content and therefore never take a closing tag: <br>, <hr>, <img>, <input>, <link>, <meta>, <area>, <base>, <col>, <embed>, <param>, <source>, <track>, and <wbr>. The formatter knows this list and does not increase the indent depth after these tags.
In XHTML (which follows XML rules), void elements must be self-closed: <br />. In HTML5, the slash is optional and ignored. The formatter preserves whatever form the input uses.
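The void-element rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the formatter's actual code: the names format_html, TOKEN_RE, and TAG_NAME_RE are hypothetical, and the tokenizer is a deliberately naive regex that would mis-split comments or attribute values containing a > character.

```python
import re

# Void elements as listed in the text above; they never open a nesting level.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Hypothetical tokenizer: a token is either a tag (<...>) or a run of text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")
TAG_NAME_RE = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9-]*)")


def format_html(html: str, indent: str = "  ") -> str:
    """Print one token per line, skipping the depth increase for void tags."""
    lines = []
    depth = 0
    for token in TOKEN_RE.findall(html):
        text = token.strip()
        if not text:
            continue  # whitespace-only text between tags
        if text.startswith("</"):
            depth = max(depth - 1, 0)  # closing tag: dedent before printing
            lines.append(indent * depth + text)
        elif text.startswith("<") and not text.startswith("<!"):
            lines.append(indent * depth + text)  # tag text is emitted unchanged
            name = TAG_NAME_RE.match(text)
            is_void = name is not None and name.group(1).lower() in VOID_ELEMENTS
            self_closed = text.endswith("/>")
            if not is_void and not self_closed:
                depth += 1  # only non-void opening tags deepen the tree
        else:
            lines.append(indent * depth + text)  # text, comment, or doctype
    return "\n".join(lines)


print(format_html('<div><p>Hi<br>there</p><img src="a.png"><hr /></div>'))
```

In this sketch <br>, <img>, and <hr /> are printed at their parent's depth, and each tag is emitted exactly as written, so both the <br> and <hr /> forms survive.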
Preserve tags: pre, code, script, style, textarea
The formatter does not reformat content inside <pre>, <code>, <textarea>, <script>, or <style> elements. Whitespace inside these elements is significant:
- <pre> displays whitespace literally, and <code> is commonly nested in <pre> or styled to do the same. Reformatting would visibly change the rendered output.
- <script> contains JavaScript, where indentation inside string literals matters.
- <style> contains CSS, which the formatter shouldn't interfere with.
- <textarea> shows its content as user-editable text; reformatting would add unwanted leading/trailing whitespace.
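One common way to leave such elements untouched is to swap them for placeholders before reformatting and restore them afterwards. The sketch below assumes that approach; the source does not say how the formatter implements it, and the names protect, restore, and the NUL-delimited placeholder format are hypothetical.

```python
import re

# Elements whose content is left untouched, per the list above.
PRESERVE_RE = re.compile(
    r"<(pre|code|textarea|script|style)\b[^>]*>.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def protect(html: str) -> tuple[str, list[str]]:
    """Swap each preserved element for a placeholder and return the originals."""
    saved: list[str] = []

    def stash(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"  # hypothetical placeholder format

    return PRESERVE_RE.sub(stash, html), saved


def restore(html: str, saved: list[str]) -> str:
    """Put the untouched originals back in place of their placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: saved[int(m.group(1))], html)


source = "<div>  <pre>  a\n   b </pre>   <p>x</p> </div>"
masked, saved = protect(source)
masked = re.sub(r"\s+", " ", masked)  # stand-in for any whitespace rewrite
print(restore(masked, saved))
```

The whitespace rewrite in the middle only ever sees the placeholder, so the content of <pre> comes back byte for byte.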
What minification removes
HTML minification strips:
- All standard comments (<!-- ... -->). IE conditional comments (<!--[if IE]...) are preserved.
- Leading and trailing whitespace in text nodes.
- Consecutive whitespace sequences, which are collapsed to a single space.
- Whitespace between tags (> < → ><).
What minification does not do (to stay safe):
- Remove attribute quotes (some parsers require them).
- Collapse boolean attributes (disabled="disabled" is left as-is).
- Remove optional closing tags (</li>, </p>), which would require a full HTML5 parser.
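The stripping rules above map onto a handful of regex substitutions. The following is a rough sketch under assumptions rather than the tool's real implementation: minify_html and COMMENT_RE are hypothetical names, and the sketch does not protect <pre>, <script>, and similar elements, and it also collapses whitespace inside attribute values.

```python
import re

# Standard comments only: the lookahead skips IE conditional comments.
COMMENT_RE = re.compile(r"<!--(?!\[if\b).*?-->", re.DOTALL)


def minify_html(html: str) -> str:
    """Apply the stripping rules; quotes, booleans, and end tags are untouched."""
    html = COMMENT_RE.sub("", html)       # drop <!-- ... --> comments
    html = re.sub(r"\s+", " ", html)      # collapse whitespace runs to one space
    html = re.sub(r">\s+<", "><", html)   # remove whitespace between tags
    html = re.sub(r">\s+", ">", html)     # leading whitespace in text nodes
    html = re.sub(r"\s+<", "<", html)     # trailing whitespace in text nodes
    return html.strip()


page = """<div>
  <!-- note -->
  <p>  Hello   world  </p>
  <input disabled="disabled">
  <!--[if IE]><p>old</p><![endif]-->
</div>"""
print(minify_html(page))
```

Note that trimming text-node edges also removes the space in markup like Hello <b>world</b>, which can join words in the rendered page, so a cautious minifier may want to skip that rule around inline elements.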
HTML formatting vs. a real HTML parser
The formatter uses a regex tokenizer and is deliberately simple. It handles the vast majority of real-world HTML correctly, but it is not a full HTML5 parser. Edge cases include:
- Deeply nested structures where indentation gets out of sync due to optional closing tags.
- <svg> and <math> elements (which have their own namespace rules).
- Template syntax like {{mustache}} or {% block %} inside attributes.
For production-grade formatting, tools like Prettier (which has built-in HTML support) build a full parse tree and aim to produce output whose semantics are identical to the input. For inspection, debugging, and quick cleanup, the formatter here handles the common case well.
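The optional-closing-tag edge case is easy to reproduce with a naive depth counter. The sketch below is an illustration only: final_depth and TAG_RE are hypothetical names, and the counter stands in for whatever bookkeeping a regex-based formatter does.

```python
import re

VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^>]*>")


def final_depth(html: str) -> int:
    """Depth a naive open/close counter ends at; 0 means it stayed in sync."""
    depth = 0
    for closing, name in TAG_RE.findall(html):
        if name.lower() in VOID_ELEMENTS:
            continue  # void tags never change the depth
        depth += -1 if closing else 1
    return depth


print(final_depth("<ul><li>a</li><li>b</li></ul>"))  # 0
print(final_depth("<ul><li>a<li>b</ul>"))            # 2
```

Both inputs are valid HTML5, but in the second one the counter never sees a </li>, so every later line would be indented two levels too deep. A full parser closes the open <li> implicitly and avoids the drift.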
Minification and HTTP compression
A minified 20 KB HTML file compresses to roughly 5–6 KB with gzip or Brotli. An equivalent formatted 30 KB file compresses to 6–7 KB. HTTP compression therefore shrinks the absolute savings from minification, from about 10 KB uncompressed to about 1 KB over the wire, because repeated indentation compresses very well. Minification still reduces baseline transfer size for clients without compression support (rare) and reduces memory usage during HTML parsing on the client.
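To check figures like these against your own markup, the comparison can be sketched with Python's standard gzip module. Everything here is an assumption for illustration: transfer_sizes is a hypothetical helper, the one-line regex is a stand-in minifier, the sample document is synthetic, and Brotli is left out because it is not in the standard library. Actual sizes depend heavily on the content.

```python
import gzip
import re


def transfer_sizes(formatted: str) -> dict[str, int]:
    """Raw and gzip-compressed byte counts for a document and a minified copy."""
    # Stand-in minifier (an assumption): drop whitespace between tags only.
    minified = re.sub(r">\s+<", "><", formatted).strip()
    raw_formatted = formatted.encode("utf-8")
    raw_minified = minified.encode("utf-8")
    return {
        "formatted_raw": len(raw_formatted),
        "minified_raw": len(raw_minified),
        "formatted_gzip": len(gzip.compress(raw_formatted)),
        "minified_gzip": len(gzip.compress(raw_minified)),
    }


# Synthetic sample: a deeply indented list with 200 similar rows.
rows = "".join(f'        <li class="item">Item {i}</li>\n' for i in range(200))
sample = f"<html>\n  <body>\n    <ul>\n{rows}    </ul>\n  </body>\n</html>\n"

sizes = transfer_sizes(sample)
for name, size in sizes.items():
    print(f"{name:>15}: {size} bytes")
```

Comparing the two raw figures with the two gzip figures shows how much of the minification saving survives compression for a given document.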