Processing Instructions

<?target processing-instruction data?>

Processing instructions provide an escape mechanism that lets a document carry instructions for an application; the XML processor passes them along without validating them. The processing instruction target can be any legal XML name, except xml in any combination of upper- and lowercase (see Chapter 2). A common use of this mechanism is linking a document to a stylesheet that provides its formatting instructions. According to the principles of XML, formatting instructions should remain separate from the actual content of a document, but some mechanism must associate the two. Processing instructions are significant only to applications that recognize them.
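As an illustration of how an application might pick out the processing instructions it recognizes, here is a minimal Python sketch using the standard library's xml.dom.minidom. The sample document, the style.css filename, and the helper name processing_instructions are assumptions made for this example; the sketch looks only at processing instructions that are direct children of the document node (for instance, those in the prolog), not those nested inside elements.

```python
from xml.dom import minidom

# Hypothetical sample document; style.css is a made-up filename.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<doc>Hello</doc>"""


def processing_instructions(xml_text):
    """Return (target, data) pairs for top-level processing instructions.

    The parser does not interpret the data portion; it simply hands the
    target and the raw data string to the application.
    """
    dom = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


if __name__ == "__main__":
    for target, data in processing_instructions(DOC):
        # An application would act only on targets it recognizes
        # and ignore the rest.
        print(target, "->", data)
```

Note that the XML declaration itself is not reported as a processing instruction, and that the data portion arrives as one uninterpreted string; any attribute-like structure inside it is up to the application to parse.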

The notation facility can indicate exactly what type of processing instruction is included, but each individual XML application must decide what to do with the additional data. No action is required by an XML parser when it recognizes that a particular processing instruction matches a declared notation. When this facility is used, applications that do not recognize the public or system identifiers of a given processing instruction target should recognize that they cannot properly interpret its data portion.

Character Encoding Autodetection

The XML declaration (possibly preceded by a Unicode byte-order mark) must be the very first item in a document so that the XML parser can determine which character encoding was used to store the document. A chicken-and-egg problem exists, involving the XML declaration's encoding="..." clause: the parser can't parse the clause if it doesn't know what character encoding the document uses. However, since the first five characters of the document must be the string <?xml (if it includes an XML declaration), the parser can read the first few bytes of a document and, in most cases, determine the character encoding before it has read the encoding declaration.
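The byte-sniffing step described above can be sketched roughly as follows in Python. The byte patterns follow the non-normative autodetection appendix of the XML 1.0 Recommendation; the function name guess_encoding_family and the returned labels are assumptions chosen for this example, and the result is only a starting point: a real parser would go on to read the encoding declaration to pin down the exact encoding within an ASCII-compatible or EBCDIC family.

```python
def guess_encoding_family(data: bytes) -> str:
    """Guess an encoding family from the first four bytes of a document.

    A simplified sketch: it covers the common byte-order marks and the
    byte patterns that the characters "<?xm" produce in each family.
    """
    head = data[:4]

    # Byte-order marks. The four-byte marks are tested first because
    # the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark.
    if head == b"\x00\x00\xfe\xff":
        return "UTF-32BE"
    if head == b"\xff\xfe\x00\x00":
        return "UTF-32LE"
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8"
    if head[:2] == b"\xfe\xff":
        return "UTF-16BE"
    if head[:2] == b"\xff\xfe":
        return "UTF-16LE"

    # No byte-order mark: see how "<" and "?" were laid out in bytes.
    if head == b"\x00\x00\x00<":
        return "UTF-32BE"
    if head == b"<\x00\x00\x00":
        return "UTF-32LE"
    if head == b"\x00<\x00?":
        return "UTF-16BE"
    if head == b"<\x00?\x00":
        return "UTF-16LE"
    if head == b"<?xm":
        # UTF-8, ASCII, ISO 8859-x, and similar: the encoding
        # declaration can now be read as plain ASCII.
        return "ASCII-compatible"
    if head == b"\x4c\x6f\xa7\x94":
        # "<?xm" in EBCDIC; the declaration names the exact code page.
        return "EBCDIC"

    # No mark and no XML declaration: XML defaults to UTF-8.
    return "UTF-8 (assumed)"


if __name__ == "__main__":
    sample = '<?xml version="1.0" encoding="UTF-16"?><doc/>'
    print(guess_encoding_family(sample.encode("utf-16-le")))
    print(guess_encoding_family(b'<?xml version="1.0"?><doc/>'))
```

Returning a family rather than a final encoding name reflects the two-stage nature of the process: the first bytes tell the parser only enough to read the declaration, and the declaration's encoding clause then supplies the precise encoding.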