Hi all,
I’ll try to explain this concisely; hopefully it will suffice for now.
Basically, I am writing a parser. It already works and produces a valid RSS 2.0 file.
The next step is to go beyond the custom tags I already have in my PHP web page and add more. I can then optionally use these, for example, to expand (append to) the <description> </description> XML data.
All potential RSS items in my PHP web page have <rss_content_item> </rss_content_item> tags around them, so with a simple preg_match_all() I can quickly find out how many there are (e.g. 9, 11, 25) and then focus later searches on those blocks instead of the whole file (to boost speed).
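To make the counting step concrete, here is a minimal sketch of the idea. The sample page text and the variable names are made up for illustration; my real page is larger.

```php
<?php
// Made-up sample page content; the real file will differ.
$page = "<rss_content_item>\nfirst\n</rss_content_item>\n"
      . "<p>unrelated markup</p>\n"
      . "<rss_content_item>\nsecond\n</rss_content_item>\n";

// "s" lets "." match newlines; ".*?" keeps each match inside a single item.
$pattern = '/<rss_content_item>(.*?)<\/rss_content_item>/s';

// preg_match_all() returns the number of full matches, i.e. the item count.
$item_count = preg_match_all($pattern, $page, $matches);

// $matches[1] holds the inner text of each item, in document order.
$items = array_map('trim', $matches[1]);

echo $item_count, " items found\n";
```

Later searches can then run against $items instead of the whole page.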
Now… as mentioned above, I want to add some other tags. The problem is that not all <rss_content_item></rss_content_item> blocks contain these tags; it depends on the given RSS item.
My parser reads through the entire file, finds all instances of all custom tags, and writes them to a structure I have made (a class). Once that is fully populated, it writes the result to a physical .xml file.
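To illustrate that last step (writing collected items out as XML), here is a minimal sketch using PHP's DOMDocument. It is not my actual code: the $items array and the example.com links are made up, and the channel-level title/link/description that a real feed needs are left out for brevity.

```php
<?php
// Made-up parsed data; in the real parser this would come from the source page.
$items = [
    ['title' => 'First',  'link' => 'http://example.com/1', 'description' => 'Tom & Jerry'],
    ['title' => 'Second', 'link' => 'http://example.com/2', 'description' => 'Plain text'],
];

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

// Build the <rss version="2.0"><channel> skeleton.
$rss = $doc->createElement('rss');
$rss->setAttribute('version', '2.0');
$doc->appendChild($rss);
$channel = $doc->createElement('channel');
$rss->appendChild($channel);

foreach ($items as $item) {
    $node = $doc->createElement('item');
    foreach ($item as $name => $value) {
        $child = $doc->createElement($name);
        // Text nodes are escaped on output, so "&" becomes "&amp;".
        $child->appendChild($doc->createTextNode($value));
        $node->appendChild($child);
    }
    $channel->appendChild($node);
}

$xml = $doc->saveXML();
echo $xml;
```

The resulting string could then be saved with file_put_contents() or fwrite() to whatever destination file is in use.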
It works wonderfully if you assume each RSS item has all the tags (title, description, link, etc.) and that each tag occurs only once per item. This is indeed true for all standard tags (those needed to make a valid RSS XML file), but not for the additional tags I’m adding (those that provide extra data that may or may not find its way into the RSS XML file).
Finding and copying the data between these additional tags is fine. The problem is knowing which rss_item contains the additional tags, and how many instances of them it contains.
My dilemma stems from the fact that preg_match_all() just returns one array of all matches from across all rss_items (see $content_items), with no indication of where each match came from (i.e. which rss_item contains it).
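One direction that might work, though I’m not sure it is the best one, is to run the tag searches once per item instead of over everything at once, so every match is tied to the item it came from. A rough sketch under assumptions: extract_tag_contents() is a hypothetical helper (not my get_all_content_between()), rss_content_extra is a stand-in name for one of the optional tags, and the sample input is made up.

```php
<?php
// Hypothetical helper: trimmed inner text of every <$tag>...</$tag> found in $text.
function extract_tag_contents(string $text, string $tag): array
{
    $quoted  = preg_quote($tag, '/');
    $pattern = '/<' . $quoted . '>(.*?)<\/' . $quoted . '>/s';
    preg_match_all($pattern, $text, $matches);
    return array_map('trim', $matches[1]);
}

// Made-up sample input; rss_content_extra is an assumed optional tag name.
$page = <<<HTML
<rss_content_item>
<rss_content_title>First</rss_content_title>
<rss_content_extra>note A</rss_content_extra>
<rss_content_extra>note B</rss_content_extra>
</rss_content_item>
<rss_content_item>
<rss_content_title>Second</rss_content_title>
</rss_content_item>
HTML;

$parsed = [];
foreach (extract_tag_contents($page, 'rss_content_item') as $index => $item_text) {
    // Searching only $item_text means every match belongs to item $index.
    $titles = extract_tag_contents($item_text, 'rss_content_title');
    $extras = extract_tag_contents($item_text, 'rss_content_extra');
    $parsed[$index] = [
        'title'       => $titles[0] ?? '',
        'has_extra'   => count($extras) > 0,
        'extra_count' => count($extras),
        'extras'      => $extras,
    ];
}

print_r($parsed);
```

With this layout the item index tells me which rss_item the extra data belongs to, and count() tells me how many instances it has.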
Any ideas how to solve this puzzle? Without knowing what goes where I can’t populate $rss_content and ultimately write it to an XML file. Thanks!
[B]
class RSSContent
{
public $rssTitle;
public $rssDescription;
public $rssHasExtra;
public $rssDescriptionExtra;
public $rssPubDate;
public $rssLink;
public $rssGUID;
public $rssAuthor;
public $rssCategory;
}
Here’s some code to give you an idea how it’s working so far:
//holds all RSS data from source file
$rss_content = new RSSContent();
//open source and destination files
$rss_source_file = fopen("$rss_from_file", "r") or die("can't open file [SOURCE]");
$rss_write_file = fopen("$rss_to_file", "a") or die("can't open file [DESTINATION]");
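//read the entire contents of source file into buffer
$source_file_contents = "";
while (!feof($rss_source_file))
{
//append each line; a plain assignment would keep only the last line read
$source_file_contents .= fgets($rss_source_file);
}
//the buffer is a single string, so there is exactly one chunk to scan
$source_file_array_count = 1;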
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// RSS_CONTENT_ITEM
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//U makes the quantifiers ungreedy; s lets . match newlines inside an item
$match_pattern = '/[\r\n]*<rss_content_item>[\r\n]*(.*)[\r\n]*<\/rss_content_item>/Us';
//the whole file is one string, so a single call finds every item
$content_items = get_all_content_between($source_file_contents, $match_pattern);
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// ---> RSS_CONTENT_TITLE
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
$match_pattern = '/[\r\n]*<rss_content_title>[\r\n]*(.*)[\r\n]*<\/rss_content_title>/Us';
//all items are joined and searched in one call, so the result does not
//record which item each title came from
$content_title = get_all_content_between(implode(" ", $content_items), $match_pattern);
$array_count = sizeof($content_title);
for ($i = 0; $i < $array_count; $i++)
{
//RSSContent is an object, so use property access rather than an array key
$rss_content->rssTitle[$i] = $content_title[$i];
}
etc…
[/B]