Why HTML entities exist
HTML uses < and > as tag delimiters, & as the start of entity references, and " (or ') to delimit attribute values. If your content contains these characters unescaped, the parser can misinterpret them as markup.
The solution is to replace them with their entity equivalents before inserting into HTML:
| Character | Named entity | Numeric entity | Why |
|---|---|---|---|
| & | `&amp;` | `&#38;` | Starts an entity reference |
| < | `&lt;` | `&#60;` | Opens a tag |
| > | `&gt;` | `&#62;` | Closes a tag |
| " | `&quot;` | `&#34;` | Ends an attribute value |
| ' | `&apos;` | `&#39;` | Single-quote in attributes |
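As a minimal sketch of the replacement step, Python's standard-library `html.escape` performs these five substitutions. The helper name `escape_for_html` and the sample string are hypothetical. Note that `html.escape` emits the hex numeric form `&#x27;` for the single quote rather than `&apos;`.

```python
import html

def escape_for_html(text: str) -> str:
    """Replace the five markup-significant characters with entities."""
    # quote=True (the default) also escapes " and ' for attribute values.
    return html.escape(text, quote=True)

sample = "Tom & Jerry <b>\"quoted\"</b> it's"
escaped = escape_for_html(sample)
print(escaped)
# Tom &amp; Jerry &lt;b&gt;&quot;quoted&quot;&lt;/b&gt; it&#x27;s

# Decoding the entities recovers the original text.
print(html.unescape(escaped) == sample)  # True
```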
Encoding and XSS prevention
Failing to encode user-supplied content before inserting it into HTML is the root cause of Cross-Site Scripting (XSS) — one of the most prevalent web vulnerabilities. If a user submits <script>alert(1)</script> and you render it verbatim, the browser executes it.
Encoding < to `&lt;` and > to `&gt;` turns executable markup into inert text that displays correctly but never runs.
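A small illustration of that effect, assuming a hypothetical `render_comment` helper that builds markup by string formatting (real applications would normally rely on a template engine's auto-escaping instead):

```python
import html

def render_comment(user_text: str) -> str:
    # Hypothetical template: escape untrusted text before embedding it.
    return f'<p class="comment">{html.escape(user_text)}</p>'

payload = "<script>alert(1)</script>"
rendered = render_comment(payload)
print(rendered)
# <p class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser shows the payload as literal text because no `<script>` element is ever created.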
Context matters. HTML entity encoding protects you in HTML context (between tags and in quoted attribute values). It does not protect you in:
- JavaScript context (`<script>` tags): requires JavaScript string escaping
- CSS context (`<style>` tags): requires CSS escaping
- URL attribute context (`href`, `src`): requires URL encoding
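The contexts listed above each need a different encoder. The sketch below uses only the Python standard library and hypothetical sample values. It covers the HTML, JavaScript string, and URL component cases, and leaves out CSS escaping, which has no standard-library helper. Replacing `<` with `\u003c` after JSON encoding is one common way to stop a value from closing the surrounding `<script>` element, not the only one.

```python
import html
import json
from urllib.parse import quote

untrusted = '</script><script>alert("x")</script>'

# HTML text or quoted attribute context: entity-encode.
html_safe = html.escape(untrusted)

# JavaScript string context: JSON-encode, then neutralise "<" so the
# value cannot terminate the enclosing <script> element.
js_literal = json.dumps(untrusted).replace("<", "\\u003c")

# URL component context: percent-encode the component, then entity-encode
# the whole attribute value because it still sits inside HTML.
query = quote("a&b=<c>", safe="")
href = html.escape(f"/search?q={query}")

print(html_safe)
print(js_literal)
print(href)  # /search?q=a%26b%3D%3Cc%3E
```

Percent-encoding does not make a whole URL trustworthy. A user-supplied `href` should also have its scheme checked so that values such as `javascript:` are rejected.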
Named vs numeric entities
Named entities like `&copy;` or `&mdash;` are human-readable and defined by the HTML5 specification. There are over 2,000 named entities.
Numeric entities come in two forms:
- Decimal: `&#169;` for ©
- Hex: `&#xA9;` for © (note the `x` prefix)
Numeric entities can represent any Unicode character. They are also more portable: they work in XHTML and XML without an entity declaration, whereas named entities outside the five predefined in XML (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) require a DOCTYPE that declares them or an explicit XML entity declaration.
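Both numeric forms follow directly from a character's code point. As a rough sketch (the function names are hypothetical), they can be generated and decoded like this in Python:

```python
import html

def to_decimal_entity(ch: str) -> str:
    """Decimal numeric entity for a single character."""
    return f"&#{ord(ch)};"

def to_hex_entity(ch: str) -> str:
    """Hexadecimal numeric entity for a single character."""
    return f"&#x{ord(ch):X};"

print(to_decimal_entity("©"))  # &#169;
print(to_hex_entity("©"))      # &#xA9;

# Named, decimal, and hex references all decode to the same character.
print(html.unescape("&copy; &#169; &#xA9;"))  # © © ©
```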
Non-breaking space (`&nbsp;`)
`&nbsp;` is the most misunderstood entity. It looks like a space but has two special behaviours:
- The browser will not wrap a line at an `&nbsp;` position.
- It does not collapse: consecutive `&nbsp;` entities create multiple spaces, unlike regular spaces, which the browser collapses to one when rendering.
Good uses: `25&nbsp;°C`, `Dr.&nbsp;Smith`, phone number groups. Bad use: as an indent or paragraph spacer (use CSS for that).
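One way to apply the good uses consistently is a small helper that joins pieces of text with `&nbsp;`. This is only a sketch: the name `glue` is hypothetical, and it assumes the parts are plain text that still needs escaping.

```python
import html

NBSP = "\u00a0"  # the character that &nbsp; decodes to (U+00A0)

def glue(*parts: str) -> str:
    """Join escaped parts with &nbsp; so they stay on one line."""
    return "&nbsp;".join(html.escape(p) for p in parts)

print(glue("25", "°C"))      # 25&nbsp;°C
print(glue("Dr.", "Smith"))  # Dr.&nbsp;Smith

# The decoded character is distinct from an ordinary space (U+0020).
print(html.unescape("&nbsp;") == NBSP, NBSP == " ")  # True False
```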
Typography entities
HTML ships rich typographic entities that save you from inserting literal Unicode characters:
- `&mdash;` (—) em dash: for parenthetical interruptions — like this.
- `&ndash;` (–) en dash: for ranges, such as 2020–2024.
- `&hellip;` (…) horizontal ellipsis: better than three separate dots.
- `&ldquo;` `&rdquo;` (“ ”) curly double quotes.
- `&lsquo;` `&rsquo;` (‘ ’) curly single quotes.
Many authoring tools, such as Markdown processors with a smart-punctuation option, produce these characters automatically, but when hand-authoring HTML it is worth knowing the entity names.
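For reference, Python's `html.entities.codepoint2name` table maps code points to their HTML 4 entity names, which include the typographic ones above. The sketch below uses it to turn literal typographic characters into named entities. The function name is hypothetical, and it deliberately leaves ASCII characters alone, on the assumption that `&`, `<`, `>` and quotes are escaped in a separate step.

```python
from html.entities import codepoint2name

def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters that have a known name with &name;."""
    out = []
    for ch in text:
        # Skip ASCII so markup-significant characters are left untouched.
        name = codepoint2name.get(ord(ch)) if ord(ch) > 127 else None
        out.append(f"&{name};" if name else ch)
    return "".join(out)

print(to_named_entities("2020\u20132024 \u2014 \u201cdone\u201d\u2026"))
# 2020&ndash;2024 &mdash; &ldquo;done&rdquo;&hellip;
```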