Table of Contents

Formal Description of ANSI Character Streams

Below is a formal description of standard-compliant ANSI streams using EBNF notation. The original ANSI X3.64 and ECMA-48 standards contemplated only 7-bit and 8-bit codes, whereas this version has been extended to encompass UTF-16.

EBNF Description

<ansi-stream>               ::= <ansi-element>*

<ansi-element>              ::= <printable-string> | <control-element>

<pritable-string>           ::= <printable-char>+
<printable-char>            ::= 0x20...0x7f | 0xa0...0xffff

<control-element>           ::= <solitary-control-code> | <introduced-sequence>

<solitary-control-code>     ::= <solitary-C0-code> | <solitary-C1-code>
<solitary-C0-code>          ::= 0x00...0x1a | 0x1c...0x1f
<solitary-C1-code>          ::= 0x80...0x8f | 0x91...0x97 | 0x99...0x9a

<introduced-sequence>       ::= <escape-sequence> | <control-sequence> | <control-string>

<escape-sequence>           ::= <nf-sequence> | <fp-sequence> | <fe-sequence> | <fs-sequence>
<nf-sequence>               ::= <ESC> (0x20...0x2f)+ 0x30...0x7e
<fp-sequence>               ::= <ESC> 0x30...0x3f
<fe-sequence>               ::= <ESC> (0x40...0x4f | 0x51...0x57 | 0x59...0x5a)
<fs-sequence>               ::= <ESC> 0x60...0x7e

<control-sequence>          ::= <standard-control-sequence> | <private-control-sequence>

<standard-control-sequence> ::= <cs-initiator> <parameter-string> <control-function>
<parameter-string>          ::= <parameter-substring> (";" <parameter-substring>)*
<parameter-substring>       ::= <part> (":" <part>)*
<part>                      ::= <digit>*
<digit>                     ::= '0'...'9'

<private-control-sequence>  ::= <cs-initiator> <private-parameter-string> <control-function>
<private-parameter-string>  ::= <ppi> <parameter-byte>*
<ppi>                       ::= 0x3c...0x3f
<parameter-byte>            ::= 0x30...0x3f

<cs-initiator>              ::= <CSI> | <ESC> <FE-CSI>

<control-function>          ::= <intermediate-byte>* <final-byte>
<intermediate-byte>         ::= 0x20...0x2f
<final-byte>                ::= 0x40...0x7e

<control-string>            ::= <character-string> | <command-string>

<character-string>          ::= <string-initiator> <string-char>* <string-terminator>
<string-initiator>          ::= <SOS> | <ESC> <FE-SOS>
<string-char>               ::= 0x00...0x17 | 0x19 | 0x1c...0x7f | 0xa0...0xffff

<command-string>            ::= <command-initiator> | <command-char>* <string-terminator>
<command-initiator>         ::= <dcs-initiator> | <osc-initiator> | <pm-initiator> | <apc-initiator>
<dcs-initiator>             ::= <DCS> | <ESC> <FE-DCS>
<osc-initiator>             ::= <OSC> | <ESC> <FE-OSC>
<pm-initiator>              ::= <PM>  | <ESC> <FE-PM>
<apc-initiator>             ::= <APC> | <ESC> <FE-APC>
<command-char>              ::= 0x08...0x0d | 0x20...0x7e | 0xa0...0xffff

<string-terminator>         ::= <ST> | <ESC> <FE-ST>

// The following terminals have corresponding symbols defined in
// the API that can be use to refer to them when writing code.

<ESC>    ::= 0x1b  // API symbol: ControlCode.ESC
<FE-DCS> ::= 0x50  // API symbol: EscapeCode.DCS ('P')
<FE-SOS> ::= 0x58  // API symbol: EscapeCode.SOS ('X')
<FE-CSI> ::= 0x5b  // API symbol: EscapeCode.CSI ('[')
<FE-ST>  ::= 0x5c  // API symbol: EscapeCode.ST  ('\')
<FE-OSC> ::= 0x5d  // API symbol: EscapeCode.OSC (']')
<FE-PM>  ::= 0x5e  // API symbol: EscapeCode.PM  ('^')
<FE-APC> ::= 0x5f  // API symbol: EscapeCode.APC ('_')
<DCS>    ::= 0x90  // API symbol: ControlCode.DCS
<SOS>    ::= 0x98  // API symbol: ControlCode.SOS
<CSI>    ::= 0x9b  // API symbol: ControlCode.CSI
<ST>     ::= 0x9c  // API symbol: ControlCode.ST
<OSC>    ::= 0x9d  // API symbol: ControlCode.OSC
<PM>     ::= 0x9e  // API symbol: ControlCode.PM
<APC>    ::= 0x9f  // API symbol: ControlCode.APC
Note

As a convention, the names of non-terminals have been written in lower case and the names of terminals have been written in upper case.

Note

Control codes of type <solitary-control-code> appear independently in the input stream without any associated preceding or succeeding characters.

Note

The API ControlCode class defines symbolic constants for all control codes and the API EscapeCode class defines symbolic constants for all characters associated with escape sequences of type <fe-sequence>.

Note

See ECMA-35 for detailed discussion of nF, Fp, Fe, and Fs escape sequences.

Important

There are many possible illegal sequences that do not conform to the EBNF grammar, such as <ESC> followed immediately by another control code, or <ST> not preceded by a corresponding string initiator, etc. The parser's state machine implementation guarantees a deterministic response to all possible sequences, although it cannot guarantee a return to its ground state. For example, an <SOS> code followed by nothing but valid string content characters will cause the parser to remain in its string-processing state indefinitely.