ASCII

Computers can only store information as numbers. So, in order to store a text file, there must be an encoding scheme that represents each character as a number. ASCII was the first widely used text encoding scheme for this purpose. The ASCII code for each character is listed below.

Dec  Hex   Escape  Character
0    0x00  \0      NUL (Null)
1    0x01          SOH (Start of Heading)
2    0x02          STX (Start of Text)
3    0x03          ETX (End of Text)
4    0x04          EOT (End of Transmission)
5    0x05          ENQ (Enquiry)
6    0x06          ACK (Acknowledgment)
7    0x07  \a      BEL (Bell)
8    0x08  \b      BS (Backspace)
9    0x09  \t      HT (Horizontal Tab)
10   0x0A  \n      LF (Line Feed)
11   0x0B  \v      VT (Vertical Tab)
12   0x0C  \f      FF (Form Feed)
13   0x0D  \r      CR (Carriage Return)
14   0x0E          SO (Shift Out)
15   0x0F          SI (Shift In)
16   0x10          DLE (Data Link Escape)
17   0x11          DC1 (Device Control 1)
18   0x12          DC2 (Device Control 2)
19   0x13          DC3 (Device Control 3)
20   0x14          DC4 (Device Control 4)
21   0x15          NAK (Negative Acknowledgment)
22   0x16          SYN (Synchronous Idle)
23   0x17          ETB (End of Transmission Block)
24   0x18          CAN (Cancel)
25   0x19          EM (End of Medium)
26   0x1A          SUB (Substitute)
27   0x1B  \e      ESC (Escape)
28   0x1C          FS (File Separator)
29   0x1D          GS (Group Separator)
30   0x1E          RS (Record Separator)
31   0x1F          US (Unit Separator)
32   0x20          (Space)
33   0x21          !
34   0x22  \"      "
35   0x23          #
36   0x24          $
37   0x25          %
38   0x26          &
39   0x27  \'      '
40   0x28          (
41   0x29          )
42   0x2A          *
43   0x2B          +
44   0x2C          ,
45   0x2D          -
46   0x2E          .
47   0x2F          /
48   0x30          0
49   0x31          1
50   0x32          2
51   0x33          3
52   0x34          4
53   0x35          5
54   0x36          6
55   0x37          7
56   0x38          8
57   0x39          9
58   0x3A          :
59   0x3B          ;
60   0x3C          <
61   0x3D          =
62   0x3E          >
63   0x3F          ?
64   0x40          @
65   0x41          A
66   0x42          B
67   0x43          C
68   0x44          D
69   0x45          E
70   0x46          F
71   0x47          G
72   0x48          H
73   0x49          I
74   0x4A          J
75   0x4B          K
76   0x4C          L
77   0x4D          M
78   0x4E          N
79   0x4F          O
80   0x50          P
81   0x51          Q
82   0x52          R
83   0x53          S
84   0x54          T
85   0x55          U
86   0x56          V
87   0x57          W
88   0x58          X
89   0x59          Y
90   0x5A          Z
91   0x5B          [
92   0x5C  \\      \
93   0x5D          ]
94   0x5E          ^
95   0x5F          _
96   0x60          `
97   0x61          a
98   0x62          b
99   0x63          c
100  0x64          d
101  0x65          e
102  0x66          f
103  0x67          g
104  0x68          h
105  0x69          i
106  0x6A          j
107  0x6B          k
108  0x6C          l
109  0x6D          m
110  0x6E          n
111  0x6F          o
112  0x70          p
113  0x71          q
114  0x72          r
115  0x73          s
116  0x74          t
117  0x75          u
118  0x76          v
119  0x77          w
120  0x78          x
121  0x79          y
122  0x7A          z
123  0x7B          {
124  0x7C          |
125  0x7D          }
126  0x7E          ~
127  0x7F          DEL (Delete)

Apart from the few that have escape sequences, the control characters above (codes 0 to 31 and 127) are of historical significance but are rarely used today.

Escape Characters

Escape characters are not part of ASCII, but they are included on this page because they are commonly used in source code to represent ASCII characters that have no printable symbol, or that would otherwise be misread inside a literal (\', \" and \\). For example, '\n' represents the newline character (ASCII code 10) and '\t' represents the horizontal tab character (ASCII code 9). Any character can also be written with a hexadecimal escape sequence of the form \xHH, where HH is the character's ASCII code in hexadecimal. For example, '\x68' is equivalent to 'h' (ASCII code 104).

Note: The escape characters in the table above are mostly universal, but may differ slightly among programming languages. For example, \e is accepted by GCC and Clang but is not part of standard C.

Note: The escape characters described here are unrelated to the ANSI escape sequences described in the Escape \e section below.
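To show how the escape column of the table maps onto C source code, here is a minimal sketch of a hypothetical helper, escape_char, that writes the escaped form of a character. It uses the named escapes from the table where one exists and falls back to the \xHH form for other control codes. It deliberately leaves out \e, since that spelling is a compiler extension rather than standard C, so ESC comes out as \x1B.

```c
#include <stdio.h>

/* Named C escape sequences from the table (the non-standard \e is omitted). */
static const char *named_escape(unsigned char c) {
    switch (c) {
    case '\0': return "\\0";
    case '\a': return "\\a";
    case '\b': return "\\b";
    case '\t': return "\\t";
    case '\n': return "\\n";
    case '\v': return "\\v";
    case '\f': return "\\f";
    case '\r': return "\\r";
    case '"':  return "\\\"";
    case '\'': return "\\'";
    case '\\': return "\\\\";
    default:   return NULL;
    }
}

/* Hypothetical helper: writes a printable, escaped form of c into buf.
   buf must hold at least 5 bytes, enough for "\x1F" plus the terminator. */
void escape_char(unsigned char c, char buf[5]) {
    const char *named = named_escape(c);
    if (named != NULL) {
        snprintf(buf, 5, "%s", named);
    } else if (c < 32 || c >= 127) {      /* other control code or non-ASCII byte */
        snprintf(buf, 5, "\\x%02X", (unsigned)c);
    } else {                              /* printable: the character itself */
        buf[0] = (char)c;
        buf[1] = '\0';
    }
}
```

For example, escape_char('\t', buf) leaves the two characters \t in buf, while escape_char(27, buf) leaves \x1B.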

Design Notes

Capitalization

The ASCII codes for the upper and lower case letters were deliberately chosen so that each pair differs by exactly 32 (0x20), which is a single bit. This makes changing the capitalization of a letter efficient: it takes only an if statement and an XOR of that one bit.

char toUppercase(char c) {
    if (97 <= c && c <= 122)  // 'a' to 'z'
        return c ^ 0x20;      // clear bit 5
    else
        return c;
}

char toLowercase(char c) {
    if (65 <= c && c <= 90)   // 'A' to 'Z'
        return c ^ 0x20;      // set bit 5
    else
        return c;
}

Numeric Conversions

Because the digits '0' through '9' occupy consecutive codes (48 to 57), converting between a digit's ASCII code and its numeric value takes a single subtraction or addition.

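#include <string.h>  // for strlen

int asciiToInt(char c) { return c - 48; }   // 48 is the ASCII code of '0'
char intToAscii(int i) { return i + 48; }   // expects 0 <= i <= 9

int stringToInt(const char *str) {
    int num = 0;
    size_t length = strlen(str);
    for (size_t i = 0; i < length; i++)
        num = num * 10 + asciiToInt(str[i]);  // shift one decimal place, add the next digit
    return num;
}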

An intToString() function can be written in a similar manner, but requires slightly more complexity to manage the memory for the created string.
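A minimal sketch of one possible intToString follows. It assumes that the caller is responsible for calling free() on the returned string, and it handles negative numbers; both of these are design choices for this sketch rather than a standard API. Like intToAscii, it adds 48 to each digit.

```c
#include <stdlib.h>

/* Sketch: returns a newly allocated, null-terminated string holding the
   decimal digits of num, or NULL if allocation fails. Caller must free(). */
char *intToString(int num) {
    long long value = num;           /* wider type, so negating INT_MIN is safe */
    int negative = value < 0;
    if (negative)
        value = -value;

    /* Count the digits first so we know how much memory to allocate. */
    int digits = 1;
    for (long long v = value; v >= 10; v /= 10)
        digits++;

    int length = digits + negative;
    char *str = malloc(length + 1);  /* +1 for the null character */
    if (str == NULL)
        return NULL;

    str[length] = '\0';
    /* Fill in the digits from the last position backwards. */
    for (int i = length - 1; i >= negative; i--) {
        str[i] = (char)(value % 10 + 48);  /* 48 is the ASCII code of '0' */
        value /= 10;
    }
    if (negative)
        str[0] = '-';
    return str;
}
```

Counting the digits first is what lets the function allocate exactly enough memory; the alternative is to write into a fixed-size buffer supplied by the caller.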

Other Languages

ASCII works well for English text. However, it doesn't contain the symbols required by most other languages (e.g. ä ñ ø Ж ش 한 ऊ). Since ASCII contains only 128 characters and a byte can represent 256 different values, one solution is to let the values from 128 to 255 represent the missing characters. This method is called Extended ASCII, and it was the initial solution used in many countries. However, it does not work for languages with a large number of characters (e.g. Chinese), or for all languages simultaneously. This problem was solved with the creation of Unicode and UTF-8, which are now the worldwide standard for text encoding.
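UTF-8 encodes the 128 ASCII characters as the same single bytes, and every byte of a multi-byte UTF-8 character is 128 or greater. So a string is pure ASCII exactly when all of its bytes are below 128. A minimal sketch of that check in C (is_ascii is a hypothetical name):

```c
#include <stdbool.h>

/* Returns true if every byte of the null-terminated string s is an ASCII
   code (0 to 127). Such a string is also valid UTF-8. */
bool is_ascii(const char *s) {
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 127)  /* cast: plain char may be signed */
            return false;
    }
    return true;
}
```

For example, is_ascii("Hello") is true, while is_ascii("\xC3\xA4"), the two UTF-8 bytes for ä, is false.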

Special Characters

Carriage Returns \r & Newlines \n

When the enter key is pressed in a text document, the cursor moves to the far left of the screen and down to the next line. This behavior is the same on Windows and on Unix-based operating systems (e.g. macOS and Linux), but it is implemented in different ways.

On Windows, when the enter key is pressed, a carriage return (ASCII code 13, escape sequence '\r') and a line feed (ASCII code 10, escape sequence '\n') are inserted into the document. These two characters move the cursor to the left edge of the screen and down to the next line, respectively.

On Unix-based operating systems, when the enter key is pressed, only a line feed is inserted into the document. When the line feed is interpreted, the cursor is moved both all the way to the left and down one line, so carriage returns are rarely used.

This creates compatibility issues between Windows and other operating systems. Some text editors let users explicitly choose between \r\n and \n for newlines.
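Converting text from Windows to Unix line endings amounts to dropping each carriage return that immediately precedes a line feed. A minimal in-place sketch in C, assuming the whole text is already in a null-terminated buffer (crlf_to_lf is a hypothetical name):

```c
#include <stddef.h>

/* Converts "\r\n" line endings to "\n" in place. A '\r' that is not
   followed by '\n' is kept. Returns the new length of the text. */
size_t crlf_to_lf(char *text) {
    size_t in = 0, out = 0;
    while (text[in] != '\0') {
        if (text[in] == '\r' && text[in + 1] == '\n')
            in++;                        /* skip the carriage return */
        text[out++] = text[in++];
    }
    text[out] = '\0';
    return out;
}
```

The output never grows, so writing back into the same buffer is safe. Going the other way (\n to \r\n) needs a larger buffer.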

Note: Web browsers may handle carriage returns and line feeds differently than the underlying operating system, so newline behavior shown in a browser may differ from the behavior on your OS.

Backspace \b

The backspace key on a keyboard generates the backspace character (ASCII code 8, escape sequence '\b'). Most programs interpret this character immediately by removing the previous character and moving the cursor one position to the left.
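The same rule can be applied after the fact to a string that contains literal '\b' characters. A minimal sketch, assuming this editor-style interpretation where each backspace erases the character before it (apply_backspaces is a hypothetical name):

```c
/* Applies each '\b' in a null-terminated string in place: the backspace
   and the character before it are both removed. A '\b' with nothing
   before it is simply dropped. */
void apply_backspaces(char *text) {
    char *out = text;
    for (const char *in = text; *in != '\0'; in++) {
        if (*in == '\b') {
            if (out > text)
                out--;        /* erase the previous character */
        } else {
            *out++ = *in;     /* keep an ordinary character */
        }
    }
    *out = '\0';
}
```

For example, "cay\bt" becomes "cat".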

Tab \t

The tab key on a keyboard generates the tab character (ASCII code 9, escape sequence '\t'). The width of the tab is not part of ASCII, but is instead defined by the program displaying the text.
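Because the tab width is chosen by the displaying program, a program that needs a fixed layout can convert tabs to spaces, advancing to the next multiple of its chosen width. A minimal sketch (expand_tabs and its tab_width parameter are hypothetical):

```c
#include <stddef.h>

/* Copies src into dst, replacing each '\t' with spaces up to the next tab
   stop. tab_width (must be > 0) is the caller's choice, since ASCII does
   not define it. Output is truncated to fit dst_size. Returns its length. */
size_t expand_tabs(const char *src, char *dst, size_t dst_size, int tab_width) {
    size_t out = 0;
    int column = 0;  /* column within the current line */
    for (; *src != '\0' && out + 1 < dst_size; src++) {
        if (*src == '\t') {
            int spaces = tab_width - column % tab_width;
            for (int i = 0; i < spaces && out + 1 < dst_size; i++) {
                dst[out++] = ' ';
                column++;
            }
        } else {
            dst[out++] = *src;
            column = (*src == '\n') ? 0 : column + 1;
        }
    }
    if (dst_size > 0)
        dst[out] = '\0';
    return out;
}
```

With a width of 4, "a\tb" becomes "a" followed by three spaces and "b"; with a width of 8, the same text gets seven spaces.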

Escape \e

The escape character (ASCII code 27, escape sequence '\e') is commonly used as the first character of an ANSI escape sequence, which changes the behavior of a terminal. This is how some programs change the color and boldness of their terminal output or move the cursor. ANSI escape sequences are not part of ASCII, but they are described here because they are the most common use of ASCII code 27.

Try running the following command in a terminal:

echo -e "Hello \e[1m\e[32mworld\e[0m!"
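The same output can be produced from C. This sketch spells ESC as \x1b because \e is not standard C; the macro names and print_greeting are hypothetical, and the text only appears bold and green if the terminal understands ANSI escape sequences.

```c
#include <stdio.h>

/* Each sequence starts with ESC (ASCII code 27), written portably as \x1b. */
#define ANSI_BOLD  "\x1b[1m"
#define ANSI_GREEN "\x1b[32m"
#define ANSI_RESET "\x1b[0m"

/* Writes the same text as the echo command above to out. */
void print_greeting(FILE *out) {
    fprintf(out, "Hello " ANSI_BOLD ANSI_GREEN "world" ANSI_RESET "!\n");
}
```

Calling print_greeting(stdout) prints "Hello world!" with "world" in bold green, then resets the style.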

The ANSI escape sequences described in this section are unrelated to the escape characters described in the Escape Characters section above. Both use the concept of escaping, but they use different escape characters (ASCII code 27 vs. ASCII code 92), their behavior is defined by different standards (each programming language's definition vs. ANSI), and they are interpreted in different places (e.g. your compiler vs. your terminal).

Null \0

See The Null Character under C/C++ Notes below.

Other

The remaining control characters in the table are rarely used. They were designed for controlling teletypes in the 1960s, but are no longer necessary.

C/C++ Notes

The Null Character

In C and many other languages, the null character (ASCII code 0, escape sequence '\0') is used to mark the end of a string. For example, the string "Hi" is stored in memory as three bytes: 0x48 0x69 0x00. Compilers, runtime environments, and standard library functions often insert the null character at the end of a string automatically. However, since this is not always the case in C or assembly language, a missing terminator is a common source of bugs.
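String functions rely on the terminator to find where a string ends. A minimal sketch of a strlen-style function (string_length is a hypothetical name) shows why a missing '\0' is dangerous: the loop keeps reading until it meets a zero byte, so without one it runs past the end of the array, which is undefined behavior.

```c
#include <stddef.h>

/* Counts the characters before the terminating null character,
   the same job the standard strlen() does. */
size_t string_length(const char *str) {
    size_t n = 0;
    while (str[n] != '\0')  /* stops only at a 0 byte */
        n++;
    return n;
}
```

For "Hi", sizeof("Hi") is 3 because the array includes the null character, while string_length("Hi") is 2.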

Chars as Numbers

In C and many other languages, a character written in single quotes is replaced by its ASCII code and treated as a number. For example, the following function returns 201, since 'd' is 100 and 'e' is 101:

int add_characters() { return 'd' + 'e'; }
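Because character constants are just numbers, they can also be used in arithmetic and as array indices. As a hypothetical example, the sketch below counts each lowercase letter in a string, using c - 'a' to map 'a' to 'z' onto 0 to 25. This relies on the lowercase letters having consecutive codes, which ASCII guarantees.

```c
/* Counts each lowercase ASCII letter in str; counts[0] is 'a' and
   counts[25] is 'z'. Other characters are ignored. */
void count_letters(const char *str, int counts[26]) {
    for (int i = 0; i < 26; i++)
        counts[i] = 0;
    for (; *str != '\0'; str++) {
        if ('a' <= *str && *str <= 'z')
            counts[*str - 'a']++;  /* e.g. 'c' - 'a' == 99 - 97 == 2 */
    }
}
```

For example, counting "hello" leaves counts['l' - 'a'] equal to 2.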