HTML Character Entity References and Unicode

Using Unicode, it is theoretically possible to display more than 65,000 different characters on a web page from the "Basic" range alone, which is enough to cover nearly all written languages in modern use. But there are two challenges for the webmaster: (1) What is the correct "character reference" or "entity reference" to use to display the character? (2) Will the user's computer be able to display that character?

This page will help you select the proper character reference as either a decimal (base 10) integer or in hexadecimal (base 16) format. In some cases a character will also have an entity reference (for example, "&nbsp;" for a non-breaking space). The output below will also display what the character will look like in two of the font sets that are common on Windows computers (Microsoft Sans Serif and Tahoma) and in the HTML default value of "sans-serif." You can also see all the characters for various Unicode named character sets (such as Bengali, Cyrillic or Tamil). *
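As a rough illustration of the three forms described above, the following Python sketch builds the decimal reference, the hexadecimal reference, and (where one exists) the named entity reference for a single character. The function name char_references is hypothetical and is not part of this page; the entity lookup relies on the standard library's html.entities.codepoint2name table, which covers the HTML 4 entity names only.

```python
from html.entities import codepoint2name


def char_references(ch):
    """Return the decimal, hexadecimal, and named references for one character."""
    cp = ord(ch)  # the character's Unicode code point as an integer
    name = codepoint2name.get(cp)  # HTML 4 entity name, or None if there is none
    return {
        "decimal": f"&#{cp};",
        "hexadecimal": f"&#x{cp:X};",
        "entity": f"&{name};" if name else None,
    }


if __name__ == "__main__":
    # Non-breaking space, e with acute accent, Cyrillic capital A
    for ch in ("\u00a0", "\u00e9", "\u0410"):
        print(f"U+{ord(ch):04X}", char_references(ch))
```

For the non-breaking space this gives &#160;, &#xA0; and &nbsp;. Most characters, such as the Cyrillic letters, have no named entity in HTML 4 and can only be written as numeric references.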

Use the forms on the right to select a range of Unicode characters to see in the table below. You can select by decimal (base 10) or hexadecimal (base 16) code, or by the name of the character set.
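A range lookup of this kind can be approximated offline. The sketch below is only an assumption about how such a table might be produced, not the code behind this page: parse_code and table_rows are hypothetical names, and the descriptions come from whatever version of the Unicode database ships with the local Python installation. Codes are accepted either as decimal text ("1040") or as hexadecimal text with a 0x prefix ("0x410").

```python
import unicodedata
from html.entities import codepoint2name


def parse_code(text):
    """Accept a decimal string such as "1040" or a hex string such as "0x410"."""
    return int(text, 0)  # base 0 lets the 0x prefix select hexadecimal


def table_rows(start, end):
    """Yield (decimal, hexadecimal, entity, description) for each code point."""
    for cp in range(parse_code(start), parse_code(end) + 1):
        name = codepoint2name.get(cp)  # HTML 4 entity name, if any
        entity = f"&{name};" if name else ""
        # Unassigned or unnamed code points get a placeholder description.
        description = unicodedata.name(chr(cp), "(no name)")
        yield f"&#{cp};", f"&#x{cp:X};", entity, description


if __name__ == "__main__":
    # Mixed input: start given in hexadecimal, end given in decimal.
    for row in table_rows("0xA9", "171"):
        print("\t".join(row))
```

The font columns are left out because whether a glyph appears depends on the fonts installed on the reader's machine, which a script like this cannot know.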

You can also enter a character in the "Find Character Code" form and see what the code will be.

Character Reference		Entity Reference	Display Fonts †			Unicode Character Description
Decimal (Base 10)	Hexadecimal (Base 16)		M	T	D

* The following character sets are not included because they are so large: CJK (Chinese, Japanese, Korean) Unified Ideographs (more than 20,000 characters); CJK Unified Ideographs Extension A (more than 6,500 characters); and Hangul (Korean) Syllables (more than 11,000 characters). Also omitted are the "Private Use Area" (space for 6,400 characters), because it contains no standard characters, and the High and Low "Surrogates", which are not displayable characters. Finally, only character sets in the "Basic" range of Unicode characters, 0 through 65,535 (FFFF), are listed. Others, such as "Linear B Ideograms" or "Byzantine Musical Symbols", in the "Extended" range from 65,536 (10000) to 1,114,111 (10FFFF), are not listed.

† M: Microsoft Sans Serif; T: Tahoma; D: Default Sans-Serif. As you will notice, not all font sets have the same characters in them. Also, if you do not have the listed font sets on your computer, your results will not be reliable. Firefox seems to display a broader range of fonts than does Internet Explorer.
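The ranges named in the first footnote can be checked with a little arithmetic. The Python sketch below uses hypothetical helper names (classify and surrogate_pair) and borrows this page's "basic" and "extended" labels. It sorts a code point into the surrogate block (D800 to DFFF), the Private Use Area (E000 to F8FF), the rest of the basic range (up to FFFF), or the extended range (10000 to 10FFFF), and shows how UTF-16 splits an extended-range code point into a high and a low surrogate. Note that an HTML numeric reference always uses the code point itself (for example &#65536; or &#x10000;), never the surrogate values.

```python
def classify(cp):
    """Name the part of the Unicode code space that a code point falls in."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"  # not a displayable character
    if 0xE000 <= cp <= 0xF8FF:
        return "private use"  # Private Use Area within the basic range only
    if cp <= 0xFFFF:
        return "basic"
    return "extended"


def surrogate_pair(cp):
    """Split an extended-range code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above FFFF have surrogate pairs")
    offset = cp - 0x10000  # a 20-bit value
    high = 0xD800 + (offset >> 10)  # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


if __name__ == "__main__":
    for cp in (0x0410, 0xD800, 0xE000, 0xFFFF, 0x10000, 0x1D000):
        print(f"{cp} ({cp:X}): {classify(cp)}")
    high, low = surrogate_pair(0x1D000)  # first Byzantine Musical Symbol
    print(f"1D000 -> {high:X} {low:X}")
```

The Private Use Area bounds also confirm the figure in the footnote: F8FF minus E000, plus one, is 6,400 code points.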

For more information:

World Wide Web Consortium (W3C): The group that sets the standards for the web.
W3C Character Entity References: The official standard for character entity references in HTML 4.
Evolt.org: A simple character entity chart with references to the most commonly used (western) characters. The website also has lots of helpful information for web designers.
Unicode.org: More than you could ever want to know about the surprisingly complex subject of displaying characters on a computer screen.