ASCII

Computers can only store information as numbers. So, in order to store a text file, there must be an encoding scheme that represents each character as a number. ASCII was the first widely used text encoding scheme for this purpose. The ASCII code for each character is listed below.

ASCII Code		Character	Escape
0	0x00	NUL (Null)	\0
1	0x01	SOH (Start of Header)
2	0x02	STX (Start of Text)
3	0x03	ETX (End of Text)
4	0x04	EOT (End of Transmission)
5	0x05	ENQ (Enquiry)
6	0x06	ACK (Acknowledgment)
7	0x07	BEL (Bell)	\a
8	0x08	BS (Backspace)	\b
9	0x09	HT (Horizontal Tab)	\t
10	0x0A	LF (Line Feed)	\n
11	0x0B	VT (Vertical Tab)	\v
12	0x0C	FF (Form Feed)	\f
13	0x0D	CR (Carriage Return)	\r
14	0x0E	SO (Shift Out)
15	0x0F	SI (Shift In)
16	0x10	DLE (Data Link Escape)
17	0x11	DC1 (Device Control 1)
18	0x12	DC2 (Device Control 2)
19	0x13	DC3 (Device Control 3)
20	0x14	DC4 (Device Control 4)
21	0x15	NAK (Negative Acknowledgment)
22	0x16	SYN (Synchronous Idle)
23	0x17	ETB (End of Transmission Block)
24	0x18	CAN (Cancel)
25	0x19	EM (End of Medium)
26	0x1A	SUB (Substitute)
27	0x1B	ESC (Escape)	\e
28	0x1C	FS (File Separator)
29	0x1D	GS (Group Separator)
30	0x1E	RS (Record Separator)
31	0x1F	US (Unit Separator)

ASCII Code		Character	Escape
32	0x20	(Space)
33	0x21	!
34	0x22	"	\"
35	0x23	#
36	0x24	$
37	0x25	%
38	0x26	&
39	0x27	'	\'
40	0x28	(
41	0x29	)
42	0x2A	*
43	0x2B	+
44	0x2C	,
45	0x2D	-
46	0x2E	.
47	0x2F	/
48	0x30	0
49	0x31	1
50	0x32	2
51	0x33	3
52	0x34	4
53	0x35	5
54	0x36	6
55	0x37	7
56	0x38	8
57	0x39	9
58	0x3A	:
59	0x3B	;
60	0x3C	<
61	0x3D	=
62	0x3E	>
63	0x3F	?

ASCII Code		Character	Escape
64	0x40	@
65	0x41	A
66	0x42	B
67	0x43	C
68	0x44	D
69	0x45	E
70	0x46	F
71	0x47	G
72	0x48	H
73	0x49	I
74	0x4A	J
75	0x4B	K
76	0x4C	L
77	0x4D	M
78	0x4E	N
79	0x4F	O
80	0x50	P
81	0x51	Q
82	0x52	R
83	0x53	S
84	0x54	T
85	0x55	U
86	0x56	V
87	0x57	W
88	0x58	X
89	0x59	Y
90	0x5A	Z
91	0x5B	[
92	0x5C	\	\\
93	0x5D	]
94	0x5E	^
95	0x5F	_

ASCII Code		Character
96	0x60	`
97	0x61	a
98	0x62	b
99	0x63	c
100	0x64	d
101	0x65	e
102	0x66	f
103	0x67	g
104	0x68	h
105	0x69	i
106	0x6A	j
107	0x6B	k
108	0x6C	l
109	0x6D	m
110	0x6E	n
111	0x6F	o
112	0x70	p
113	0x71	q
114	0x72	r
115	0x73	s
116	0x74	t
117	0x75	u
118	0x76	v
119	0x77	w
120	0x78	x
121	0x79	y
122	0x7A	z
123	0x7B	{
124	0x7C	|
125	0x7D	}
126	0x7E	~
127	0x7F	DEL (Delete)

The grayed-out characters above are of historical significance, but are rarely used today.

Escape Characters

Escape characters are not part of ASCII, but they are included on this page because they are commonly used in source code to represent ASCII characters that do not have a printable symbol. For example, '\n' represents the newline character (ASCII code 10) and '\t' represents the horizontal tab character (ASCII code 9). Any character can also be written using an escape sequence of the form "\x00", where 00 is replaced by the hexadecimal ASCII code of the desired character. For example, '\x68' is equivalent to 'h'.
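As a quick check of the claims above, the following sketch compares a few escape sequences against their codes from the table. The function name escapesMatchTable is hypothetical, and the code assumes a compiler whose character set is ASCII-compatible (true of virtually all modern systems, but not guaranteed by the C standard).

```c
/* Returns 1 if each escape sequence has the ASCII code listed in the table.
   Assumes an ASCII-compatible execution character set. */
int escapesMatchTable(void) {
	return '\0' == 0       /* NUL */
	    && '\t' == 9       /* horizontal tab */
	    && '\n' == 10      /* line feed */
	    && '\r' == 13      /* carriage return */
	    && '\\' == 92      /* backslash */
	    && '\x68' == 'h';  /* hexadecimal form of 'h' (0x68 = 104) */
}
```

On such a system the function returns 1, since every comparison holds.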

Note: The escape characters in the table above are mostly universal, but may differ slightly among programming languages.

Note: The escape characters described here are unrelated to the ANSI escape sequences described in the Escape \e section below.

Design Notes

Capitalization

The ASCII codes assigned to the upper and lower case letters were deliberately designed to make changing the capitalization of letters efficient, using only an if statement and XORing a single bit.


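char toUppercase(char c) {
	// 97..122 are 'a'..'z'; XORing with 0x20 clears bit 5, giving 'A'..'Z'
	if (97 <= c && c <= 122)
		return c ^ 0x20;
	else
		return c;
}


char toLowercase(char c) {
	// 65..90 are 'A'..'Z'; XORing with 0x20 sets bit 5, giving 'a'..'z'
	if (65 <= c && c <= 90)
		return c ^ 0x20;
	else
		return c;
}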
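
To see why a single XOR is enough, compare 'A' (0x41, binary 0100 0001) with 'a' (0x61, binary 0110 0001): the two codes differ only in bit 5, which has the value 0x20. The same holds for every other letter pair. The sketch below uses this to flip the case of a letter in either direction; toggleCase is a hypothetical name, not part of any standard library.

```c
/* Flips the case of an ASCII letter by toggling bit 5 (0x20).
   Any character that is not a letter is returned unchanged. */
char toggleCase(char c) {
	if ((65 <= c && c <= 90) || (97 <= c && c <= 122))
		return c ^ 0x20;
	return c;
}
```

For example, toggleCase('a') gives 'A' and toggleCase('Z') gives 'z', while toggleCase('5') leaves the digit as it is.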

Numeric Conversions

The in-order placement of the digits allows for easy conversions between their ASCII codes and numeric values.


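#include <string.h>  // for strlen()

int asciiToInt(char c) {
	return c - 48;  // 48 is the ASCII code for '0'
}

char intToAscii(int i) {
	return i + 48;
}


// Assumes str contains only the digits '0' to '9'
int stringToInt(const char * str) {
	int num = 0;
	int length = strlen(str);
	for (int i = 0; i < length; i++)
		num = num * 10 + asciiToInt(str[i]);
	return num;
}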

An intToString() function can be written in a similar manner, but requires slightly more complexity to manage the memory for the created string.
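
One possible shape for such a function is sketched below. It sidesteps memory allocation by having the caller supply the buffer, which is only one of several reasonable designs. The signature, the -1 error value, and the restriction to non-negative numbers are assumptions made for this sketch.

```c
#include <stddef.h>

/* Writes the decimal digits of a non-negative value into buf, which has
   room for size bytes including the terminating null character.
   Returns the number of digits written, or -1 on error. */
int intToString(int value, char *buf, size_t size) {
	char reversed[24];  /* more than enough digits for a 64-bit int */
	int count = 0;

	if (value < 0)
		return -1;  /* negative numbers are out of scope for this sketch */

	/* Peel off digits from least significant to most significant. */
	do {
		reversed[count++] = (char)(value % 10 + 48);  /* 48 is '0' */
		value /= 10;
	} while (value > 0);

	if ((size_t)count + 1 > size)
		return -1;  /* buffer too small for the digits plus the null */

	/* Copy the digits back in most-significant-first order. */
	for (int i = 0; i < count; i++)
		buf[i] = reversed[count - 1 - i];
	buf[count] = '\0';
	return count;
}
```

Called with a 12-byte buffer and the value 1234, the function stores "1234" and returns 4.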

Other Languages

ASCII works great when you are working with English text. However, it doesn't contain the symbols required by most other languages (e.g. ä ñ ø Ж ش 한 ऊ). Since ASCII contains only 128 characters and a byte can represent 256 different values, one solution is to let the values from 128 to 255 represent the missing characters. This method is called Extended ASCII and it was the initial solution used in many countries. However, it does not work for languages with a large number of characters (e.g. Chinese), nor can it represent all languages simultaneously. This problem was solved with the creation of Unicode and UTF-8, which are now the worldwide standard for text encoding.
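
Because ASCII only defines the codes 0 to 127, a program can tell whether a string is plain ASCII by checking that no byte is 128 or higher. UTF-8 stores ASCII characters as the same single bytes and uses bytes of 128 and above for everything else, so the same check works on UTF-8 text. The following is a minimal sketch; isPureAscii is a hypothetical helper name.

```c
/* Returns 1 if every byte of the null-terminated string is in the
   ASCII range 0..127, and 0 otherwise. */
int isPureAscii(const char *str) {
	/* Read the bytes as unsigned so values above 127 are not negative. */
	for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++)
		if (*p > 127)
			return 0;
	return 1;
}
```

For instance, "Hello" passes the check, while the two-byte UTF-8 encoding of ñ (0xC3 0xB1) does not.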

Special Characters

Carriage Returns \r & Newlines \n

When the enter key is pressed in a text document, the cursor moves to the far left of the screen and down to the next line. This behavior is the same on both Windows and Unix-based (e.g. macOS and Linux) operating systems. However, this behavior is implemented in different ways.

On Windows, when the enter key is pressed, a carriage return (ASCII code 13, escape sequence '\r') and a line feed (ASCII code 10, escape sequence '\n') are inserted into the document. These two characters move the cursor to the left of the screen and move it down to the next line, respectively.

On Unix-based operating systems, when the enter key is pressed, only a line feed is inserted into the document. However, when the line feed is interpreted, the cursor is moved both all the way to the left and down one line. Carriage returns are rarely used.

This creates compatibility issues between Windows and other operating systems. Some text editors allow users to explicitly choose between using \r\n and \n for newlines.
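
A program that has to accept text from both kinds of systems can normalize the line endings before processing. The sketch below converts \r\n pairs to \n in place. crlfToLf is a hypothetical helper name, and leaving a lone '\r' untouched is a choice made for this sketch rather than a rule.

```c
/* Converts Windows-style "\r\n" line endings to "\n" in place.
   A '\r' that is not followed by '\n' is left untouched. */
void crlfToLf(char *text) {
	char *out = text;  /* next position to write; never ahead of in */
	for (const char *in = text; *in != '\0'; in++) {
		if (in[0] == '\r' && in[1] == '\n')
			continue;  /* drop the carriage return; the '\n' is copied next */
		*out++ = *in;
	}
	*out = '\0';
}
```

Applied to a buffer holding "a\r\nb\r\n", the function leaves "a\nb\n" behind. The result is never longer than the input, so no extra memory is needed.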

Note: Web browsers may handle carriage returns and line feeds differently than the underlying operating system, so the newline behavior you see in a browser may differ from that of your OS.

Backspace \b

The backspace key on a keyboard generates the backspace character (ASCII code 8, escape sequence '\b'). Most programs immediately interpret this character by removing the previous character and moving the cursor one position to the left.

Tab \t

The tab key on a keyboard generates the tab character (ASCII code 9, escape sequence '\t'). The width of the tab is not part of ASCII, but is instead defined by the program displaying the text.

Escape \e

The escape character (ASCII code 27, escape sequence '\e') is commonly used as the first character in an ANSI escape sequence to affect the behavior of a terminal. This is how some programs change the color and boldness of their terminal output or move the cursor. ANSI escape sequences are not part of ASCII, but they are described here since they are the most common use of ASCII code 27.

Try running the following command in a terminal:

echo -e "Hello \e[1m\e[32mworld\e[0m!"

The ANSI escape sequences described here are unrelated to the escape characters described in the Escape Characters section above. Both use the concept of escaping, but they use different escape characters (ASCII code 27 vs ASCII code 92), their behavior is described in different standards (ANSI vs the definition of each programming language), and they are interpreted in different locations (e.g. your terminal driver vs your compiler).
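
The same text as in the echo command can be built in C, which shows both kinds of escaping side by side. In C, '\e' is a compiler extension rather than standard syntax, so this sketch spells ASCII code 27 as "\x1b". The macro and function names are hypothetical.

```c
/* Each sequence starts with ASCII code 27, written as the C escape "\x1b". */
#define ANSI_BOLD  "\x1b[1m"
#define ANSI_GREEN "\x1b[32m"
#define ANSI_RESET "\x1b[0m"

/* Returns "Hello world!" with the word "world" in bold green.
   Adjacent string literals are joined by the compiler. */
const char *boldGreenGreeting(void) {
	return "Hello " ANSI_BOLD ANSI_GREEN "world" ANSI_RESET "!\n";
}
```

Writing the returned string to a terminal that understands ANSI escape sequences, for example with fputs(boldGreenGreeting(), stdout), should show the same result as the echo command. The compiler turns "\x1b" into a single byte, and the terminal then interprets that byte and the characters following it.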

Null \0

See The Null Character under C/C++ Notes below.

Other

The other special characters (shown in gray in the table) are rarely used. They were designed for controlling teletypes in the 1960s, but are no longer necessary.

C/C++ Notes

The Null Character

In C and many other languages, the null character (ASCII code 0, escape sequence '\0') is used to denote the end of a string. For example, the string "Hi" would be stored using three bytes in memory as 0x48 0x69 0x00. Often compilers, runtime environments, and standard library functions will automatically insert the null character at the end of a string. However, since this is not always the case in C or assembly languages, it is a common source of bugs.
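
Hand-written copy loops are a typical place where the null character gets forgotten. The following sketch copies a string into a caller-supplied buffer and always writes the terminator itself. copyString and its parameter order are hypothetical and not a standard library function.

```c
#include <stddef.h>

/* Copies src into dest, which has room for size bytes.
   At most size - 1 characters are copied so that there is always
   space for the terminating null character. */
void copyString(char *dest, size_t size, const char *src) {
	size_t i = 0;
	if (size == 0)
		return;  /* no room even for the null character */
	while (i + 1 < size && src[i] != '\0') {
		dest[i] = src[i];
		i++;
	}
	dest[i] = '\0';  /* the step that is easy to forget */
}
```

Copying "Hi" into a three-byte buffer fills it with 0x48 0x69 0x00, matching the layout described above. With a two-byte buffer, only "H" and the null character fit.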

Chars as Numbers

In C and many other languages, when a character is written in single quotes, the compiler automatically replaces it with its ASCII code and treats the value as a number. For example, the return value of the following function is 201:


int add_characters() {
	return 'd' + 'e';
}
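
The same property makes it possible to write character literals instead of raw ASCII codes, which often reads more clearly. As a small sketch, the hypothetical function below maps a lowercase letter to its position in the alphabet by subtracting 'a' (ASCII code 97).

```c
/* Returns 0 for 'a', 1 for 'b', ..., 25 for 'z',
   or -1 for any character that is not a lowercase letter. */
int letterIndex(char c) {
	if ('a' <= c && c <= 'z')
		return c - 'a';
	return -1;
}
```

Here letterIndex('c') is 99 - 97, which is 2.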