I'm writing some PHP to parse DTDs (which are written in SGML), specifically the DTDs pointed to by the doctype at the start of HTML and XHTML documents.

I've just found out that HTML and XHTML use different versions of SGML. I think XHTML's SGML is the same as the SGML that XML uses.

The point is that it's not just the rules in XHTML DTDs that differ from the rules in HTML DTDs; the language the rules are written in differs too, because a different version of SGML is used.

So it seems that HTML uses an HTML version of SGML, and XHTML uses an XML version of SGML. The code I'm writing takes a doctype as its starting point: what should it base its answer to "parse using XML-SGML or HTML-SGML?" on?

For example, these are the two kinds of doctype:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

Also, there doesn't seem to be anything in the DTDs themselves that states which SGML to use.
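Whichever criterion ends up being used, the first step is getting from the doctype to its public identifier (FPI) and the DTD URL. A minimal PHP sketch, assuming only the `PUBLIC "fpi" "url"` form with double-quoted literals (a real parser would also need `SYSTEM` doctypes and single quotes); `parseDoctype` is a hypothetical helper, not part of any library:

```php
<?php
// Hypothetical helper (not from any library): extracts the root element name,
// the public identifier (FPI) and the system identifier (DTD URL) from a
// PUBLIC doctype declaration. Only the double-quoted form is handled.
function parseDoctype(string $doctype): ?array
{
    $pattern = '/<!DOCTYPE\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>/i';
    if (preg_match($pattern, $doctype, $m) !== 1) {
        return null; // not a PUBLIC doctype in the expected form
    }
    return [
        'root'   => $m[1],
        'public' => $m[2],
        // group 3 is missing from $m when no system identifier was given
        'system' => $m[3] ?? null,
    ];
}

// Usage with the XHTML doctype from the question:
$xhtml = parseDoctype('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">');
echo $xhtml['public'], "\n"; // prints -//W3C//DTD XHTML 1.0 Strict//EN
```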

I suppose I could use the "HTML"/"XHTML" part of the doctype, which belongs to its Formal Public Identifier (FPI), in particular the "label" or "public text description" part of the FPI (according to http://www.eskimo.com/~bloo/indexdot.../d/doctype.htm ).
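To make those FPI parts concrete, here is a minimal PHP sketch that splits an FPI on `//`, assuming the simple `-//Owner//Class Description//Language` shape the W3C identifiers use. It ignores the optional unavailable-text indicator and display version; `splitFpi` and its array keys are hypothetical names:

```php
<?php
// Hypothetical helper: splits an FPI such as
// "-//W3C//DTD HTML 4.01 Transitional//EN" into its named parts.
// Assumes the simple W3C shape: prefix//owner//class description//language.
function splitFpi(string $fpi): ?array
{
    $parts = explode('//', $fpi);
    if (count($parts) < 4) {
        return null; // not in the expected shape
    }
    // The public text class ("DTD") is the first word of the third part;
    // the rest of that part is the public text description.
    [$class, $description] = array_pad(explode(' ', $parts[2], 2), 2, '');
    return [
        'registration' => $parts[0], // "-" = unregistered owner, "+" = registered
        'owner'        => $parts[1],
        'class'        => $class,
        'description'  => $description,
        'language'     => $parts[3],
    ];
}

$parts = splitFpi('-//W3C//DTD XHTML 1.0 Strict//EN');
echo $parts['description'], "\n"; // prints XHTML 1.0 Strict
```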

But that doesn't seem a very reliable way to tell whether the XML SGML or the HTML SGML should be used: it would come down to just checking whether the "HTML 4.01 Transitional" / "XHTML 1.0 Strict" part of the doctype starts with an X. That seems especially shaky given that one name for that part of the doctype is "public text description", which doesn't sound like something a programmatic decision should be based on.
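For comparison, the "starts with an X" check amounts to something like the following PHP sketch. `guessDtdSyntax` is a hypothetical name; it only recognises W3C-style identifiers whose public text class is `DTD`, and it has exactly the weakness described above, because it trusts a human-readable description:

```php
<?php
// Sketch of the description-based heuristic (not a recommendation).
// Returns 'xml' if the public text description starts with "XHTML",
// 'html' if it starts with "HTML" (both checked case-insensitively),
// and null if the FPI doesn't look like a W3C-style DTD identifier.
function guessDtdSyntax(string $fpi): ?string
{
    if (preg_match('#^[-+]//[^/]+//DTD\s+([^/]+)//#', $fpi, $m) !== 1) {
        return null;
    }
    $description = $m[1];
    if (stripos($description, 'XHTML') === 0) {
        return 'xml';
    }
    if (stripos($description, 'HTML') === 0) {
        return 'html';
    }
    return null;
}

echo guessDtdSyntax('-//W3C//DTD HTML 4.01 Transitional//EN'), "\n"; // prints html
echo guessDtdSyntax('-//W3C//DTD XHTML 1.0 Strict//EN'), "\n";       // prints xml
```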

Or I could base the decision on the SGML itself. For example, in HTML's SGML the element declarations contain a pair of hyphens and/or O's indicating whether the start and end tags can be omitted:

<!ELEMENT UL - - (LI)+>

whereas those flags never occur in XHTML's SGML:

<!ELEMENT ul (li)+>
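A minimal PHP sketch of this second approach, assuming the DTD text has already been fetched into a string and that no `>` appears inside an `<!ELEMENT ...>` declaration (including its `-- comments --`). `hasOmissionFlags` and `dtdUsesOmissionFlags` are hypothetical names; the DTD is treated as HTML-style if any element declaration carries the two tag-omission flags:

```php
<?php
// Hypothetical helper: true if an <!ELEMENT ...> declaration has SGML
// tag-omission flags ("- -", "- O", "O O", ...) right after the element
// name or name group such as (SUB|SUP).
function hasOmissionFlags(string $decl): bool
{
    $pattern = '/^<!ELEMENT\s+(?:\([^)]*\)|\S+)\s+[-O]\s+[-O]\s/i';
    return preg_match($pattern, trim($decl)) === 1;
}

// Hypothetical helper: scans a whole DTD and reports whether any element
// declaration uses omission flags. Assumes no ">" inside a declaration.
function dtdUsesOmissionFlags(string $dtdText): bool
{
    preg_match_all('/<!ELEMENT\b[^>]*>/i', $dtdText, $matches);
    foreach ($matches[0] as $decl) {
        if (hasOmissionFlags($decl)) {
            return true;
        }
    }
    return false;
}

var_dump(hasOmissionFlags('<!ELEMENT UL - - (LI)+>')); // bool(true)
var_dump(hasOmissionFlags('<!ELEMENT ul (li)+>'));     // bool(false)
```

This works as a signal because an XML element declaration goes straight from the name to `EMPTY`, `ANY`, a parenthesised content model, or a parameter-entity reference such as `%Flow;`, so a lone `-` or `O` followed by whitespace after the name should not appear there.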

Does anyone know what the proper thing to base the XHTML-SGML vs. HTML-SGML decision on is? Thanks.

(I'm not actually sure at the moment how much XHTML's SGML and HTML's SGML differ, meaning the language rather than the rules, but they do differ at least a little.)