Hi, Mittineague.
I didn't include any of the <front> portions simply because, whether it's a "<front>" element or an "<item>" element, all I believe I need is a way to extract the content between an opening tag and its matching closing tag. BUT, if you really need some code from those sections, let me know and I'll post some of it here.
Below is the header portion of the XML code:
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<TEI.2>
<teiHeader status="new" type="text">
<fileDesc>
<titleStmt>
<title>Lorem Ipsum</title>
<author>Lorem Ipsum</author>
<sponsor>Lorem Ipsum</sponsor>
<principal>Lorem Ipsum</principal>
<respStmt>
<resp>Lorem Ipsum</resp>
<name>Lorem Ipsum</name>
<name>Lorem Ipsum</name>
<name>Lorem Ipsum</name>
<name>Lorem Ipsum</name>
</respStmt>
<funder n="org:BLAH">Lorem Ipsum</funder>
</titleStmt>
<extent />
<publicationStmt>
<publisher>Lorem Ipsum</publisher>
<pubPlace>Lorem Ipsum</pubPlace>
<authority>Lorem Ipsum</authority>
<availability status="free">
<p>
Lorem Ipsum, Lorem Ipsum, Lorem Ipsum...
</p>
<list>
<item>
Lorem Ipsum
<quote>Lorem Ipsum, Lorem Ipsum, Lorem Ipsum, Lorem Ipsum, Lorem Ipsum, Lorem Ipsum, Lorem Ipsum.</quote>
</item>
<item>Lorem Ipsum</item>
<item>Lorem Ipsum</item>
<item>Lorem Ipsum</item>
</list>
</availability>
</publicationStmt>
<sourceDesc default="NO">
<biblStruct default="NO">
<monogr>
<title>Lorem Ipsum</title>
<author>Lorem Ipsum</author>
<imprint>
<pubPlace>Lorem Ipsum</pubPlace>
<publisher>Lorem Ipsum</publisher>
<date>1870</date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<encodingDesc>
<editorialDecl default="NO">
<correction status="medium" method="silent" default="NO">
<p>Lorem Ipsum</p>
</correction>
</editorialDecl>
<refsDecl doctype="TEI.2" n="front">
<state unit="section" n="chunk" />
</refsDecl>
<refsDecl doctype="TEI.2" n="body">
<state unit="section" />
<state unit="subsection" />
<state unit="paragraph" n="chunk" />
</refsDecl>
<refsDecl doctype="TEI.2">
<state unit="section" />
<state unit="subsection" />
<state unit="paragraph" n="chunk" />
</refsDecl>
</encodingDesc>
<profileDesc>
<langUsage default="NO">
<language id="en">
English
</language>
<language id="greek">
Greek
</language>
<language id="la">
Latin
</language>
<language id="de">
German
</language>
<language id="fr">
French
</language>
<language id="it">
Italian
</language>
</langUsage>
</profileDesc>
<revisionDesc>
<change>
<date>June 25, 1819</date>
<respStmt>
<name>Lorem Ipsum</name>
<resp>Lorem Ipsum</resp>
</respStmt>
<item>
Etiam in consequat est. Ut at mattis magna. Praesent quis metus in nibh lobortis egestas condimentum eu tortor. Nullam ut mi justo, nec scelerisque ante. Integer risus mauris, pretium eu laoreet eget, adipiscing sed ligula. In vitae congue lacus. Nulla a felis velit, et ultricies lacus. Vivamus volutpat imperdiet mauris, vitae aliquet augue tempor nec. Vivamus eget hendrerit dui. Donec id enim ut enim dignissim luctus. Pellentesque vel orci arcu, quis venenatis libero.
</item>
</change>
</revisionDesc>
</teiHeader>
So for this particular example, let's say I'm trying to extract everything between "<profileDesc>" and "</profileDesc>". I only need the content, not necessarily the tags themselves, though an option to keep the tags would be nice in case I need to build conditions around specific opening and closing tags later on.
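To make that concrete, here is a rough sketch of the behavior I have in mind. Python and its standard xml.etree.ElementTree module are only an assumption for illustration, and the names extract, SAMPLE and keep_tags are made up. It also assumes the whole document is well-formed, so the real file would need its closing </TEI.2> for the parser to accept it.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the header above (hypothetical sample, not the full file).
SAMPLE = """<teiHeader>
<profileDesc>
<langUsage default="NO">
<language id="en">
English
</language>
<language id="la">
Latin
</language>
</langUsage>
</profileDesc>
</teiHeader>"""


def extract(xml_text, tag, keep_tags=False):
    """Return the content of the first <tag> element, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # The root itself may be the element we want; otherwise search descendants.
    node = root if root.tag == tag else root.find(f".//{tag}")
    if node is None:
        return None
    if keep_tags:
        # Serialize the element itself, opening and closing tags included.
        return ET.tostring(node, encoding="unicode").strip()
    # Text only: join every non-empty text chunk found inside the element.
    return " ".join(chunk.strip() for chunk in node.itertext() if chunk.strip())


print(extract(SAMPLE, "profileDesc"))                  # text content only
print(extract(SAMPLE, "profileDesc", keep_tags=True))  # markup preserved
```

The keep_tags switch is the "option" I mentioned: off gives just the text, on gives the element with its markup so I can test for particular tags later.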
What do you think?