Computers can only store information as numbers, so storing a text file requires an encoding scheme that represents each character as a number. ASCII was the first widely used text encoding scheme for this purpose. The ASCII code for each character is listed below in decimal and hexadecimal.
| Decimal | Hex | Character | Escape |
|---|---|---|---|
| 0 | 0x00 | NUL (Null) | \0 |
| 1 | 0x01 | SOH (Start of Header) | |
| 2 | 0x02 | STX (Start of Text) | |
| 3 | 0x03 | ETX (End of Text) | |
| 4 | 0x04 | EOT (End of Transmission) | |
| 5 | 0x05 | ENQ (Enquiry) | |
| 6 | 0x06 | ACK (Acknowledgment) | |
| 7 | 0x07 | BEL (Bell) | \a |
| 8 | 0x08 | BS (Backspace) | \b |
| 9 | 0x09 | HT (Horizontal Tab) | \t |
| 10 | 0x0A | LF (Line Feed) | \n |
| 11 | 0x0B | VT (Vertical Tab) | \v |
| 12 | 0x0C | FF (Form Feed) | \f |
| 13 | 0x0D | CR (Carriage Return) | \r |
| 14 | 0x0E | SO (Shift Out) | |
| 15 | 0x0F | SI (Shift In) | |
| 16 | 0x10 | DLE (Data Link Escape) | |
| 17 | 0x11 | DC1 (Device Control 1) | |
| 18 | 0x12 | DC2 (Device Control 2) | |
| 19 | 0x13 | DC3 (Device Control 3) | |
| 20 | 0x14 | DC4 (Device Control 4) | |
| 21 | 0x15 | NAK (Negative Acknowledgment) | |
| 22 | 0x16 | SYN (Synchronous Idle) | |
| 23 | 0x17 | ETB (End of Transmission Block) | |
| 24 | 0x18 | CAN (Cancel) | |
| 25 | 0x19 | EM (End of Medium) | |
| 26 | 0x1A | SUB (Substitute) | |
| 27 | 0x1B | ESC (Escape) | \e |
| 28 | 0x1C | FS (File Separator) | |
| 29 | 0x1D | GS (Group Separator) | |
| 30 | 0x1E | RS (Record Separator) | |
| 31 | 0x1F | US (Unit Separator) | |
| Decimal | Hex | Character | Escape |
|---|---|---|---|
| 32 | 0x20 | (Space) | |
| 33 | 0x21 | ! | |
| 34 | 0x22 | " | \" |
| 35 | 0x23 | # | |
| 36 | 0x24 | $ | |
| 37 | 0x25 | % | |
| 38 | 0x26 | & | |
| 39 | 0x27 | ' | \' |
| 40 | 0x28 | ( | |
| 41 | 0x29 | ) | |
| 42 | 0x2A | * | |
| 43 | 0x2B | + | |
| 44 | 0x2C | , | |
| 45 | 0x2D | - | |
| 46 | 0x2E | . | |
| 47 | 0x2F | / | |
| 48 | 0x30 | 0 | |
| 49 | 0x31 | 1 | |
| 50 | 0x32 | 2 | |
| 51 | 0x33 | 3 | |
| 52 | 0x34 | 4 | |
| 53 | 0x35 | 5 | |
| 54 | 0x36 | 6 | |
| 55 | 0x37 | 7 | |
| 56 | 0x38 | 8 | |
| 57 | 0x39 | 9 | |
| 58 | 0x3A | : | |
| 59 | 0x3B | ; | |
| 60 | 0x3C | < | |
| 61 | 0x3D | = | |
| 62 | 0x3E | > | |
| 63 | 0x3F | ? | |
| Decimal | Hex | Character | Escape |
|---|---|---|---|
| 64 | 0x40 | @ | |
| 65 | 0x41 | A | |
| 66 | 0x42 | B | |
| 67 | 0x43 | C | |
| 68 | 0x44 | D | |
| 69 | 0x45 | E | |
| 70 | 0x46 | F | |
| 71 | 0x47 | G | |
| 72 | 0x48 | H | |
| 73 | 0x49 | I | |
| 74 | 0x4A | J | |
| 75 | 0x4B | K | |
| 76 | 0x4C | L | |
| 77 | 0x4D | M | |
| 78 | 0x4E | N | |
| 79 | 0x4F | O | |
| 80 | 0x50 | P | |
| 81 | 0x51 | Q | |
| 82 | 0x52 | R | |
| 83 | 0x53 | S | |
| 84 | 0x54 | T | |
| 85 | 0x55 | U | |
| 86 | 0x56 | V | |
| 87 | 0x57 | W | |
| 88 | 0x58 | X | |
| 89 | 0x59 | Y | |
| 90 | 0x5A | Z | |
| 91 | 0x5B | [ | |
| 92 | 0x5C | \ | \\ |
| 93 | 0x5D | ] | |
| 94 | 0x5E | ^ | |
| 95 | 0x5F | _ | |
| Decimal | Hex | Character |
|---|---|---|
| 96 | 0x60 | ` |
| 97 | 0x61 | a |
| 98 | 0x62 | b |
| 99 | 0x63 | c |
| 100 | 0x64 | d |
| 101 | 0x65 | e |
| 102 | 0x66 | f |
| 103 | 0x67 | g |
| 104 | 0x68 | h |
| 105 | 0x69 | i |
| 106 | 0x6A | j |
| 107 | 0x6B | k |
| 108 | 0x6C | l |
| 109 | 0x6D | m |
| 110 | 0x6E | n |
| 111 | 0x6F | o |
| 112 | 0x70 | p |
| 113 | 0x71 | q |
| 114 | 0x72 | r |
| 115 | 0x73 | s |
| 116 | 0x74 | t |
| 117 | 0x75 | u |
| 118 | 0x76 | v |
| 119 | 0x77 | w |
| 120 | 0x78 | x |
| 121 | 0x79 | y |
| 122 | 0x7A | z |
| 123 | 0x7B | { |
| 124 | 0x7C | \| |
| 125 | 0x7D | } |
| 126 | 0x7E | ~ |
| 127 | 0x7F | DEL (Delete) |
Most of the control characters above (shown grayed out in the table) are of historical significance, but are rarely used today.
Escape sequences are not part of ASCII, but they are included on this page because they are commonly used in source code to represent ASCII characters that do not have a printable symbol. For example, '\n' represents the newline character (ASCII code 10) and '\t' represents the horizontal tab character (ASCII code 9). Any character can also be written using an escape sequence of the form '\xNN', where NN is the hexadecimal ASCII code of the desired character. For example, '\x68' is equivalent to 'h'.
Note: The escape sequences in the table above are mostly universal, but may differ slightly among programming languages.
Note: The backslash escape sequences described here are unrelated to the ANSI escape sequences described later on this page.
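As a quick check of the escape forms described above, the following C sketch compares a few escape sequences with their numeric ASCII codes. The function name `escapesMatchAsciiCodes` is made up for this illustration, and the sketch assumes a compiler that uses ASCII for character constants, which is the case on nearly all current systems.

```c
#include <stdbool.h>

// Hypothetical helper: returns true if each escape sequence has the
// ASCII code listed in the table (assumes an ASCII character set).
bool escapesMatchAsciiCodes(void) {
    return '\n' == 10       // line feed
        && '\t' == 9        // horizontal tab
        && '\\' == 92       // backslash
        && '\x68' == 'h'    // hexadecimal escape for code 0x68 (104)
        && '\x41' == 65;    // hexadecimal escape for 'A'
}
```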
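The ASCII codes assigned to the uppercase and lowercase letters were deliberately chosen so that each lowercase letter is exactly 32 (0x20) higher than its uppercase counterpart. This makes changing the capitalization of a letter efficient: it takes only an if statement and an XOR of a single bit.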
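// Convert a lowercase letter to uppercase; other characters are returned unchanged.
char toUppercase(char c) {
    if (97 <= c && c <= 122)  // 'a' to 'z'
        return c ^ 0x20;      // flip the 0x20 bit
    else
        return c;
}
// Convert an uppercase letter to lowercase; other characters are returned unchanged.
char toLowercase(char c) {
    if (65 <= c && c <= 90)   // 'A' to 'Z'
        return c ^ 0x20;      // flip the 0x20 bit
    else
        return c;
}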
The in-order placement of the digits allows for easy conversions between their ASCII codes and numeric values.
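#include <string.h>  // for strlen()

// Convert a digit character ('0' to '9') to its numeric value (0 to 9).
int asciiToInt(char c) {
    return c - 48;  // 48 is the ASCII code of '0'
}
// Convert a numeric value (0 to 9) to its digit character.
char intToAscii(int i) {
    return i + 48;
}
// Convert a string of digits (e.g. "123") to its integer value.
int stringToInt(const char *str) {
    int num = 0;
    int length = strlen(str);
    for (int i = 0; i < length; i++)
        num = num * 10 + asciiToInt(str[i]);
    return num;
}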
An intToString() function can be written in a similar manner, but requires slightly more complexity to manage the memory for the created string.
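One possible shape for such a function is sketched below. To sidestep memory allocation, this version assumes the caller supplies the output buffer and its size. That signature is a design choice made for this illustration, not the only way to write it, and negative numbers are rejected to keep the sketch short.

```c
#include <stddef.h>

// Hypothetical sketch: writes the decimal form of a non-negative integer
// into buf, a caller-owned buffer of `size` bytes (including the terminator).
// Returns the number of characters written, or -1 if value is negative
// or the buffer is missing or too small.
int intToString(int value, char *buf, size_t size) {
    if (value < 0 || buf == NULL || size == 0)
        return -1;

    // Produce the digits in reverse order, least significant first.
    char tmp[24];  // large enough for the digits of a 64-bit integer
    int count = 0;
    do {
        tmp[count++] = (char)(value % 10 + 48);  // same idea as intToAscii()
        value /= 10;
    } while (value > 0);

    if ((size_t)count + 1 > size)
        return -1;

    // Copy the digits back in the correct order and terminate the string.
    for (int i = 0; i < count; i++)
        buf[i] = tmp[count - 1 - i];
    buf[count] = '\0';
    return count;
}
```

For example, calling `intToString(1234, buf, sizeof buf)` with a 16-byte `buf` stores the string "1234" and returns 4.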
ASCII works well for English text. However, it doesn't contain the symbols required by most other languages (e.g. ä ñ ø Ж ش 한 ऊ). Since ASCII contains only 128 characters and a byte can represent 256 different values, one solution is to let the values from 128 to 255 represent the missing characters. This method is called Extended ASCII, and it was the initial solution used in many countries. However, it does not work for languages with a large number of characters (e.g. Chinese), and it cannot represent all languages simultaneously. This problem was solved with the creation of Unicode and UTF-8, which are now the worldwide standard for text encoding.
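Because ASCII only defines the codes 0 to 127, a program can detect data that goes beyond ASCII by looking for bytes with a value of 128 or more. The following is a minimal sketch of that check; `isAllAscii` is a made-up name for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: returns true if every byte in data
// is a 7-bit ASCII code (0 to 127).
bool isAllAscii(const unsigned char *data, size_t length) {
    for (size_t i = 0; i < length; i++) {
        if (data[i] > 127)
            return false;  // byte belongs to an extended or multi-byte encoding
    }
    return true;
}
```

For example, the bytes of "Hi!" pass the check, while a buffer containing the byte 0xF1 (ñ in the Latin-1 variant of Extended ASCII) does not.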
When the enter key is pressed in a text document, the cursor moves to the far left of the screen and down to the next line. This behavior is the same on both Windows and Unix-based operating systems (e.g. macOS and Linux). However, it is implemented in different ways.
On Windows, when the enter key is pressed, a carriage return (ASCII code 13, escape sequence '\r') and a line feed (ASCII code 10, escape sequence '\n') are inserted into the document. These two characters move the cursor to the left of the screen and move it down to the next line, respectively.
On Unix-based operating systems, when the enter key is pressed, only a line feed is inserted into the document. However, when the line feed is interpreted, the cursor is moved both all the way to the left and down one line. Carriage returns are rarely used.
This creates compatibility issues between Windows and other operating systems. Some text editors allow users to explicitly choose between using \r\n and \n for newlines.
Note: Web browsers may handle carriage returns and line feeds differently than the underlying operating system, so the newline behavior of the visualization above may differ from that of your OS.
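Programs that exchange text files between Windows and Unix-based systems often normalize the newlines. The following is a minimal sketch of one such conversion, from \r\n to \n, done in place. The function name `crlfToLf` is hypothetical, and the choice to leave a lone carriage return untouched is an assumption of this sketch.

```c
#include <stddef.h>

// Hypothetical helper: converts Windows newlines (\r\n) to Unix newlines (\n)
// in place and returns the new length of the string.
// A carriage return that is not followed by a line feed is kept.
size_t crlfToLf(char *text) {
    size_t read = 0, write = 0;
    while (text[read] != '\0') {
        if (text[read] == '\r' && text[read + 1] == '\n')
            read++;  // skip the carriage return, keep the line feed
        text[write++] = text[read++];
    }
    text[write] = '\0';
    return write;
}
```

Converting in place works because the output is never longer than the input, so the write position never overtakes the read position.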
The backspace key on a keyboard generates the backspace character (ASCII code 8, escape sequence '\b'). Most programs immediately interpret this character by removing the previous character and moving the cursor one position to the left.
The tab key on a keyboard generates the tab character (ASCII code 9, escape sequence '\t'). The width of the tab is not part of ASCII, but is instead defined by the program displaying the text.
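One common convention, assumed here, is for a program to display a tab by advancing the cursor to the next tab stop, where tab stops are placed every fixed number of columns. The sketch below computes that column; `nextTabStop` and its parameters are made-up names for this illustration.

```c
// Hypothetical helper: returns the column the cursor moves to when a tab is
// displayed at the given zero-based column, with a tab stop every tabWidth
// columns. Assumes column >= 0 and tabWidth > 0.
int nextTabStop(int column, int tabWidth) {
    return column + (tabWidth - column % tabWidth);
}
```

With a tab width of 8, a tab at column 3 moves the cursor to column 8, and a tab at column 8 moves it to column 16.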
The escape character (ASCII code 27, escape sequence '\e') is commonly used as the first character of an ANSI escape sequence to affect the behavior of a terminal. This is how some programs change the color and boldness of their terminal output or move the location of the cursor. ANSI escape sequences are not part of ASCII, but they are described here since they are the most common use of ASCII code 27.
Try running the following command in a terminal:
echo -e "Hello \e[1m\e[32mworld\e[0m!"
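The same sequences can be produced from a C program. The sketch below builds the bold green text from the command above into a buffer. It writes ASCII code 27 as "\x1b" because '\e' is a compiler extension rather than standard C. The function name `formatBoldGreen` is made up for this illustration.

```c
#include <stdio.h>

// Hypothetical helper: wraps text in the bold (1) and green (32) sequences,
// then resets the formatting (0). "\x1b" is ASCII code 27.
// Returns what snprintf returns: the length of the full formatted string,
// or a negative value on error.
int formatBoldGreen(char *out, size_t size, const char *text) {
    return snprintf(out, size, "\x1b[1m\x1b[32m%s\x1b[0m", text);
}
```

After calling `formatBoldGreen(out, sizeof out, "world")`, printing the buffer with `printf("Hello %s!\n", out)` should show the word in bold green on terminals that support ANSI escape sequences.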
The ANSI escape sequences described here are unrelated to the backslash escape sequences described earlier on this page. Both use the concept of escaping, but they use different escape characters (ASCII code 27 vs ASCII code 92), their behavior is described in different standards (the definition of each programming language vs ANSI), and they are interpreted in different places (e.g. your compiler vs your terminal driver).
The other special characters (shown in gray in the table) are rarely used. They were designed for controlling teletypes in the 1960s, but are no longer necessary.
In C and many other languages, the null character (ASCII code 0, escape sequence '\0') is used to denote the end of a string. For example, the string "Hi" would be stored using three bytes in memory as 0x48 0x69 0x00. Compilers, runtime environments, and standard library functions often insert the null character at the end of a string automatically. However, since this is not always the case in C or assembly languages, it is a common source of bugs.
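To make the role of the terminator concrete, the sketch below counts the characters of a string by scanning for the null character, in the same way the standard strlen() function does. The name `countUntilNull` is made up for this illustration.

```c
#include <stddef.h>

// Hypothetical helper: counts the characters that come before
// the terminating null character (ASCII code 0).
size_t countUntilNull(const char *str) {
    size_t length = 0;
    while (str[length] != '\0')
        length++;
    return length;
}
```

For an array declared as `const char hi[] = "Hi";`, this function returns 2, while `sizeof hi` is 3 because the compiler appends the null character. If the terminator were missing, the loop would keep reading past the end of the array.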
In C and many other languages, when a character is written in single quotes, the compiler automatically replaces it with its ASCII code and treats the value as a number. For example, the following function returns 201, because 'd' has ASCII code 100 and 'e' has ASCII code 101:
int add_characters() {
return 'd' + 'e';
}