  1. #1
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,293

    Using SGML spec/rules to parse DTDs

    A very obscure, technical SGML/DTD question, I'm sure, but there's no harm in asking.

    In the SGML specification (ISO 8879), the first production is:

    Code:
    1 SGML document =
        2 SGML document entity,
        (    3 SGML subdocument entity |
            4 SGML text entity |
            5.1 character data entity |
            5.2 specific data entity |
            6 non-SGML data entity    )*
    That says "2 SGML document entity" must occur, followed by zero or more of the bracketed alternatives. So 2 has to come first, and it is mandatory. 2 is:

    Code:
    2 SGML document entity =
        5 s*,
        171 SGML declaration,
        7 prolog,
        10 document instance set,
        Ee
    5 is a separator (a space, or one of three other character classes: RE, RS or SEPCHAR, I think); anyway, the * makes it optional. Then 171 has to happen. 171 is:

    Code:
    171 SGML declaration =
        mdo,
        "SGML",
        65 ps+,
        76 minimum literal,
        65 ps+,
        172 document character set,
        65 ps+,
        180 capacity set,
        65 ps+,
        181 concrete syntax scope,
        65 ps+,
        182 concrete syntax,
        65 ps+,
        195 feature use,
        65 ps+,
        199 application-specific information,
        65 ps*,
        mdc
    In the reference concrete syntax mdo is <! and mdc is >, so the start of that would be: <!SGML
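
    As a rough illustration of that opening, here is a minimal Python sketch that checks whether some text begins with an SGML declaration. It assumes the reference concrete syntax and treats both s and ps as plain whitespace, which is a simplification (ps can also be a comment or a parameter entity reference). The function name is made up for this example.

```python
import re

# s* (assumed to be ordinary whitespace), then mdo "<!" and the keyword "SGML",
# then at least one whitespace character standing in for "ps+".
SGML_DECL_START = re.compile(r"\s*<!SGML\s")


def starts_with_sgml_declaration(text):
    """Return True if text opens with something that looks like <!SGML ..."""
    return SGML_DECL_START.match(text) is not None
```

    Run against the start of a DTD or a web page, this returns False, which is exactly the puzzle: the declaration is not in those files.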

    Then there's "7 prolog". 7 prolog is:

    Code:
    7 prolog =
        8 other prolog*,
        9 base document type declaration,
        (    110 document type declaration |
            8 other prolog    )*,
        (    154 link type declaration |
            8 other prolog    )*
    8 other prolog is "91 comment declaration", "44 processing instruction" or some whitespace, and it's optional anyway. Then 9 is:

    Code:
    9 base document type declaration =
        110 document type declaration
    Which, finally (this is what I was waiting for), is the document type declaration, which is recognisable from the start of any well-formed web page.
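
    To make the "8 other prolog*, then 9" part concrete, here is a minimal Python sketch that skips any leading "other prolog" items and then requires a DOCTYPE. The patterns are deliberately simplified assumptions (for example, a real comment declaration can hold several "-- ... --" comments), and the function name is hypothetical.

```python
import re

# "other prolog": whitespace, a comment declaration, or a processing instruction.
# Every alternative consumes at least one character, so the loop below terminates.
OTHER_PROLOG = re.compile(r"\s+|<!--.*?-->|<\?.*?>", re.DOTALL)
DOCTYPE_START = re.compile(r"<!DOCTYPE\s")


def prolog_starts_with_doctype(text):
    """Skip 'other prolog*', then check that a document type declaration follows."""
    pos = 0
    while True:
        m = OTHER_PROLOG.match(text, pos)
        if m is None:
            break
        pos = m.end()
    return DOCTYPE_START.match(text, pos) is not None
```

    This only covers the first two lines of production 7; it does not look at further document type or link type declarations.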

    I understand that you take the doc type line from the start of a web page, like

    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    and that gets prepended to the rest of the DTD when parsing it. So the DTD you parse is:

    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    
    <!--
       Extensible HTML version 1.0 Strict DTD
    
       This is the same as HTML 4 Strict except for
       changes due to the differences between XML and SGML.
    
       Namespace = http://www.w3.org/1999/xhtml
    
       For further information, see: http://www.w3.org/TR/xhtml1
    
       Copyright (c) 1998-2002 W3C (MIT, INRIA, Keio),
       All Rights Reserved. 
    
       This DTD module is identified by the PUBLIC and SYSTEM identifiers:
    
       PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
    
       $Revision: 1.1 $
       $Date: 2002/08/01 13:56:03 $
    -->
    
    <!--================ Character mnemonic entities =========================-->
    
    <!ENTITY % HTMLlat1 PUBLIC
       "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    etc.
    etc.
    So what the SGML spec is saying is that you must first have a "171 SGML declaration", which starts with <!SGML
    Then you have the doctype line. Then, presumably, what's in the DTD.

    There is no <!SGML … line in a DTD, though. Where is the <!SGML... bit? Is it implied in some way? Without it, it seems a DTD isn't proper SGML.
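
    For reference, the doctype line quoted above carries a document type name, a public identifier and a system identifier. A minimal Python sketch for pulling those out, assuming the PUBLIC form with double-quoted literals only (SYSTEM-only doctypes, single quotes and internal subsets are not handled); the names are made up for illustration:

```python
import re

# <!DOCTYPE name PUBLIC "public id" "system id">, with the system id optional.
DOCTYPE = re.compile(
    r'<!DOCTYPE\s+(?P<name>\S+)\s+PUBLIC\s+"(?P<public>[^"]*)"'
    r'(?:\s+"(?P<system>[^"]*)")?\s*>'
)


def parse_doctype(line):
    """Return a dict with name, public and system, or None if nothing matches."""
    m = DOCTYPE.search(line)
    if m is None:
        return None
    return m.groupdict()
```

    The system identifier is the URL of the DTD itself; nothing in the line points at an SGML declaration.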

  2. #2
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,293
    Ah, I've kind of answered this myself now. The missing <!SGML ....> bits I've found on w3.org's site.

    XHTML 1's SGML declaration: http://www.w3.org/TR/xhtml1/DTD/xhtml1.dcl
    SGML Declaration of HTML 4: http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html

    So those need to come before the doctype to make the DTDs proper SGML. It's odd that the existence and location of those declarations aren't technically part of it in some way, though; I mean in a machine-linked, readable and parsable way.
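
    In code terms, "come before the doctype" could look something like this minimal Python sketch, which assumes the declaration has already been fetched as text. The function and parameter names are hypothetical, and it only checks for a literal <!SGML at the start.

```python
def with_sgml_declaration(declaration_text, document_text):
    """Prepend the SGML declaration unless the document already begins with one."""
    if document_text.lstrip().startswith("<!SGML"):
        # Already has its own declaration, so leave it untouched.
        return document_text
    return declaration_text.rstrip() + "\n" + document_text
```

    A parser would then see the declaration, the doctype and the rest in the order production 2 asks for.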

  3. #3
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,293
    I thought it might be deducible from the DTD URLs.

    This works:

    http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd << URL in doctype
    http://www.w3.org/TR/xhtml1/DTD/xhtml1.dcl << the SGML declaration

    These don't:

    http://www.w3.org/TR/html4/loose.dtd << URL in doctype
    http://www.w3.org/TR/html4/loose.dcl << not found, doesn't exist

    http://www.w3.org/TR/html4/strict.dtd << URL in doctype
    http://www.w3.org/TR/html4/strict.dcl << not found, doesn't exist


    Ah, HTML4's machine readable one is http://www.w3.org/TR/html4/HTML4.decl
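
    Since the declaration's file name doesn't follow from the DTD URL by any fixed rule (even the XHTML one is xhtml1.dcl, not xhtml1-strict.dcl), one option is a small hand-maintained lookup table. A minimal Python sketch using only the URLs from this thread; the names are made up, and pairing both HTML 4 DTDs with HTML4.decl is my assumption:

```python
# DTD system identifier -> URL of the matching SGML declaration.
# Only the pairs mentioned in this thread; anything else is unknown.
DECLARATION_FOR_DTD = {
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd":
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1.dcl",
    "http://www.w3.org/TR/html4/loose.dtd":
        "http://www.w3.org/TR/html4/HTML4.decl",
    "http://www.w3.org/TR/html4/strict.dtd":
        "http://www.w3.org/TR/html4/HTML4.decl",
}


def declaration_url(dtd_url):
    """Return the known SGML declaration URL for a DTD URL, or None if unknown."""
    return DECLARATION_FOR_DTD.get(dtd_url)
```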

    Hmm, maybe I'll just ignore the <!SGML... bit and start with the doctype line.

  4. #4
    bronze trophy
    Join Date
    Dec 2004
    Location
    Sweden
    Posts
    2,670
    What are you trying to achieve? Learning SGML is a waste of time.

    If you're trying to write correct HTML and are targeting browsers, forget about SGML. Browsers never used SGML parsers and never will. Historically browsers used different rules to parse HTML, but today they have all rewritten their HTML parsers according to the HTML spec. (Previous HTML specs maintained the fiction that HTML was SGML, but browsers didn't comply and Web content didn't, either.)

    The HTML spec for writing HTML: http://www.whatwg.org/specs/web-apps...x.html#writing
    Simon Pieters

