From 27979ecf09d98c7a081b2e1adfe94b7b8bd5937b Mon Sep 17 00:00:00 2001
From: Daniel Veillard
1 Introduction
1.1 Origin and Goals
1.2 Terminology
2 Documents
2.1 Well-Formed XML Documents
2.2 Characters
2.3 Common Syntactic Constructs
2.4 Character Data and Markup
2.5 Comments
2.6 Processing Instructions
2.7 CDATA Sections
2.8 Prolog and Document Type Declaration
2.9 Standalone Document Declaration
2.10 White Space Handling
2.11 End-of-Line Handling
2.12 Language Identification
3 Logical Structures
3.1 Start-Tags, End-Tags, and Empty-Element Tags
3.2 Element Type Declarations
3.2.1 Element Content
3.2.2 Mixed Content
3.3 Attribute-List Declarations
3.3.1 Attribute Types
3.3.2 Attribute Defaults
3.3.3 Attribute-Value Normalization
3.4 Conditional Sections
4 Physical Structures
4.1 Character and Entity References
4.2 Entity Declarations
4.2.1 Internal Entities
4.2.2 External Entities
4.3 Parsed Entities
4.3.1 The Text Declaration
4.3.2 Well-Formed Parsed Entities
4.3.3 Character Encoding in Entities
4.4 XML Processor Treatment of Entities and References
4.4.1 Not Recognized
4.4.2 Included
4.4.3 Included If Validating
4.4.4 Forbidden
4.4.5 Included in Literal
4.4.6 Notify
4.4.7 Bypassed
4.4.8 Included as PE
4.5 Construction of Internal Entity Replacement Text
4.6 Predefined Entities
4.7 Notation Declarations
4.8 Document Entity
5 Conformance
5.1 Validating and Non-Validating Processors
5.2 Using XML Processors
6 Notation
A References
A.1 Normative References
A.2 Other References
B Character Classes
C XML and SGML (Non-Normative)
D Expansion of Entity and Character References (Non-Normative)
E Deterministic Content Models (Non-Normative)
F Autodetection
-of Character Encodings (Non-Normative)
F.1 Detection Without External Encoding Information
F.2 Priorities in the Presence of External Encoding Information
G W3C XML Working Group (Non-Normative)
H W3C XML Core Group (Non-Normative)
I Production Notes (Non-Normative)
a
is declared NMTOKENS and to those of the right columns if a
is declared CDATA.
-#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF". The notation ## is used to denote any byte value except that two consecutive ##s cannot be both 00.
With a Byte Order Mark:
00 00 FE
-FF | UCS-4, big-endian machine (1234 order) |
FF
-FE 00 00 | UCS-4, little-endian machine (4321 order) |
00 00 FF FE | UCS-4, unusual octet order (2143) | -
FE FF 00 00 | UCS-4, unusual octet order (3412) | -
FE FF ## ## | UTF-16, big-endian |
FF FE ## ## | UTF-16, little-endian |
EF BB BF | UTF-8 |
00 00 FE
+FF | UCS-4, big-endian machine (1234 order) |
FF
+FE 00 00 | UCS-4, little-endian machine (4321 order) |
00 00 FF FE | UCS-4, unusual octet order (2143) | +
FE FF 00 00 | UCS-4, unusual octet order (3412) | +
FE FF ## ## | UTF-16, big-endian |
FF FE ## ## | UTF-16, little-endian |
EF BB BF | UTF-8 |
Without a Byte Order Mark:
00 00 00 3C |
-UCS-4 or other encoding with a 32-bit code unit and ASCII
+
Note:
@@ -2574,7 +2574,7 @@ Contact)
- I Production Notes (Non-Normative)
+I Production Notes (Non-Normative)
This Second Edition was encoded in the XMLspec DTD (which has documentation available). The HTML versions were produced with a combination of the xmlspec.xsl, diffspec.xsl,
diff --git a/tests/xmlspec/REC-xml-20001006.html b/tests/xmlspec/REC-xml-20001006.html
index ead304b..341dbdc 100644
--- a/tests/xmlspec/REC-xml-20001006.html
+++ b/tests/xmlspec/REC-xml-20001006.html
@@ -65,7 +65,7 @@ be contacted at cmsmcq@w3.org.
Table of Contents
1 Introduction
Appendices
A References
+of Character Encodings (Non-Normative)
F.1 Detection Without External Encoding Information
F.2 Priorities in the Presence of External Encoding Information
G W3C XML Working Group (Non-Normative)
H W3C XML Core Group (Non-Normative)
I Production Notes (Non-Normative)
1 Introduction
@@ -1151,15 +1151,15 @@ declarations:
to the character sequences of the middle column if the attribute a
is declared NMTOKENS and to those of the right columns if a
is declared CDATA.
-
Note that the last example is invalid (but well-formed) if Entity
-Type | Character | Parameter | Internal General | External Parsed
-General | Unparsed | Reference
-in Content | Not recognized |
-Included | Included
-if validating | Forbidden |
-Included | Reference in Attribute Value | Not recognized | Included
-in literal | Forbidden |
-Forbidden | Included |
-Occurs as Attribute
-Value | Not recognized |
-Forbidden | Forbidden | Notify |
-Not recognized | Reference in EntityValue | Included in literal | Bypassed |
-Bypassed | Forbidden |
-Included | Reference in DTD | Included
-as PE | Forbidden |
-Forbidden | Forbidden |
-Forbidden | Entity
+Type | Character | Parameter | Internal General | External Parsed
+General | Unparsed | Reference
+in Content | Not recognized |
+Included | Included
+if validating | Forbidden |
+Included | Reference in Attribute Value | Not recognized | Included
+in literal | Forbidden |
+Forbidden | Included |
+Occurs as Attribute
+Value | Not recognized |
+Forbidden | Forbidden | Notify |
+Not recognized | Reference in EntityValue | Included in literal | Bypassed |
+Bypassed | Forbidden |
+Included | Reference in DTD | Included
+as PE | Forbidden |
+Forbidden | Forbidden |
+Forbidden |
#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF". The notation ## is used to denote any byte value except that two consecutive ##s cannot be both 00.
With a Byte Order Mark:
-00 00 FE
-FF | UCS-4, big-endian machine (1234 order) |
FF
-FE 00 00 | UCS-4, little-endian machine (4321 order) |
00 00 FF FE | UCS-4, unusual octet order (2143) | -
FE FF 00 00 | UCS-4, unusual octet order (3412) | -
FE FF ## ## | UTF-16, big-endian |
FF FE ## ## | UTF-16, little-endian |
EF BB BF | UTF-8 |
00 00 FE
+FF | UCS-4, big-endian machine (1234 order) |
FF
+FE 00 00 | UCS-4, little-endian machine (4321 order) |
00 00 FF FE | UCS-4, unusual octet order (2143) | +
FE FF 00 00 | UCS-4, unusual octet order (3412) | +
FE FF ## ## | UTF-16, big-endian |
FF FE ## ## | UTF-16, little-endian |
EF BB BF | UTF-8 |
Without a Byte Order Mark:
-00 00 00 3C |
-UCS-4 or other encoding with a 32-bit code unit and ASCII
+
Note:
@@ -2484,7 +2484,7 @@ Contact)
- I Production Notes (Non-Normative)
+I Production Notes (Non-Normative)
This Second Edition was encoded in the XMLspec DTD (which has documentation available). The HTML versions were produced with a combination of the xmlspec.xsl, diffspec.xsl,
-- 
2.7.4