  1. #1
    SitePoint Member
    Join Date: Jul 2003
    Location: United Kingdom
    Posts: 5

    Database storage and [img] tag parsing

    I have been designing a new MySQL-based website, but ran into trouble when planning how news items will be stored in the database. I am a big fan of web standards (namely strict XHTML and CSS-based layouts), so my news will be displayed as the following HTML:

    Code:
    <p>My first paragraph.</p>
    <p>My second paragraph.</p>
    And so on. As is customary, I plan to store the news text (probably entered from an admin panel) in this form:

    Code:
    My first paragraph.\r\nMy second paragraph.
    But I want it to be cross-OS and flexible, so the data could also be stored as:

    Code:
    My first paragraph.\r\n\r\nMy second paragraph.
    To handle both, I wrote a regular expression that replaces any run of consecutive \r or \n characters (any number, in any order) with </p>\r\n<p>, which gives me the output I described at the start of this post:

    PHP Code:
    <?php
    // Wrap the text in <p> tags, turning every run of \r/\n into a paragraph break.
    echo '<p>' . preg_replace("/(\r|\n)+/", "</p>\r\n<p>", $item['text']) . '</p>';
    ?>
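    A minimal sketch that wraps the replacement above in a function, to check that both storage forms produce the same markup. The function name paragraphs() is hypothetical, and the trim() call is an addition not in the original snippet: it keeps leading or trailing line breaks from producing an empty <p></p>.

```php
<?php
// Hypothetical helper: wraps text in <p> tags, turning each run of \r/\n
// into a paragraph break (same pattern as the snippet above).
function paragraphs(string $text): string
{
    // Assumption: strip leading/trailing line breaks so no empty <p></p> appears.
    $text = trim($text, "\r\n");
    return '<p>' . preg_replace("/(\r|\n)+/", "</p>\r\n<p>", $text) . '</p>';
}

// Both storage forms give identical output.
echo paragraphs("My first paragraph.\r\nMy second paragraph."), "\n";
echo paragraphs("My first paragraph.\r\n\r\nMy second paragraph."), "\n";
```

    Both calls return <p>My first paragraph.</p> and <p>My second paragraph.</p> separated by \r\n.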
    However, a problem arose when I realized that I sometimes need to embed images in my news posts, like this:

    Code:
    <p>My first paragraph.</p>
    <div class="right"><img src="image.gif" /><p>My caption.</p></div>
    <p>My second paragraph.</p>
    The regular expression above would wrap that image block in a paragraph:

    Code:
    <p><div class="right"><img src="image.gif" /><p>My caption.</p></div></p>
    That is invalid XHTML, because a <div> (or a nested <p>) is not allowed inside a <p>. I figured the way to solve this would be to store the image as:

    Code:
    [img src="image.gif" align="right" caption="My caption."]
    A regular expression could then turn that into:

    Code:
    <div class="right"><img src="image.gif" /><p>My caption.</p></div>
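    One way to do that conversion, sketched under assumptions: preg_replace_callback() with an optional caption group, so [img] tags without a caption still work. The attribute order (src, then align, then an optional caption), the left/right restriction on align, reusing the caption as alt text, and the helper name render_img_tags() are all assumptions for illustration. htmlspecialchars() escapes the values so a stray quote cannot break the markup.

```php
<?php
// Hypothetical helper. Assumed tag syntax:
//   [img src="..." align="left|right"]  or  [img src="..." align="..." caption="..."]
function render_img_tags(string $text): string
{
    $pattern = '/\[img\s+src="([^"]+)"\s+align="(left|right)"(?:\s+caption="([^"]*)")?\s*\]/i';

    return preg_replace_callback($pattern, function (array $m): string {
        $src = htmlspecialchars($m[1], ENT_QUOTES);
        // The optional caption group may be missing from $m when it did not match.
        $caption = isset($m[3]) ? htmlspecialchars($m[3], ENT_QUOTES) : '';
        $html = '<div class="' . strtolower($m[2]) . '"><img src="' . $src
              . '" alt="' . $caption . '" />';
        if ($caption !== '') {
            $html .= '<p>' . $caption . '</p>';   // caption paragraph only when present
        }
        return $html . '</div>';
    }, $text);
}

echo render_img_tags('[img src="image.gif" align="right" caption="My caption."]'), "\n";
echo render_img_tags('[img src="image.gif" align="left"]'), "\n";
```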
    But should I store the [img] tag directly in my database, or convert the entire post into HTML when it is submitted and store that HTML instead? The latter sounds like a bad idea, because it takes up extra database space and would make it hard to move the content to a new design later.

    Is it worth running the regular expressions on every news item I retrieve from the database? Am I overlooking something obvious? How should I store the news so that it can be formatted the way I described at the very beginning of the post? Where can I find regular expressions for parsing [img]-like tags?

    Thanks in advance,

    DocUK

  2. #2
    SitePoint Enthusiast
    Join Date: Jun 2003
    Location: Chicago
    Posts: 73
    I'd store it with the
    Code:
    [img src="image.gif" align="right" caption="My caption."]
    in the database. That way, when you edit the news article later, it looks exactly as you typed it, rather than full of HTML you didn't write.

    To completely minimize the preg overhead, you could buffer all your news output, capture the buffer into a variable, and then run the regular expression just once, instead of once per post.
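    A rough sketch of that buffering idea using PHP's ob_start() and ob_get_clean(). The $news rows and the simplified [img src="..."] pattern are hypothetical stand-ins for real database results and tags.

```php
<?php
// Hypothetical sample rows standing in for a database result.
$news = [
    ['title' => 'First',  'text' => 'Intro [img src="a.gif"] more text.'],
    ['title' => 'Second', 'text' => 'No images here.'],
];

ob_start();                      // start capturing everything that is echoed
foreach ($news as $item) {
    echo '<h2>', $item['title'], "</h2>\n", $item['text'], "\n";
}
$page = ob_get_clean();          // return the captured output and stop buffering

// One preg_replace over the whole page instead of one call per item.
$page = preg_replace('/\[img src="([^"]+)"\]/', '<img src="$1" alt="" />', $page);
echo $page;
```

    PHP's PCRE extension caches compiled patterns, so the saving is mostly per-call overhead and is worth measuring first. Also note that everything echoed into the buffer, including markup you generated yourself, is exposed to the replacement.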

  3. #3
    SitePoint Member
    Join Date: Jul 2003
    Location: United Kingdom
    Posts: 5
    Quote Originally Posted by Aesthetic-Theory
    To completely minimize the preg over head, you could buffer all your news, get the buffer into a variable, and then do just one regular expression, as opposed to doing it every post.
    Thanks for your reply!

    How do I create a regular expression that replaces the [img] tag with the appropriate <div> tags and so on, and then adds the <p></p> around the text blocks only, and not the <div></div>? Is it also possible to make the caption part of the [img] tag optional?

    Regex is my worst enemy...

  4. #4
    SitePoint Enthusiast
    Join Date: Jun 2003
    Location: Chicago
    Posts: 73
    Quote Originally Posted by DocUK
    Regex is my worst enemy...
    Join the club.

    My second one works fine, but the first one does nothing at all: no errors, it just doesn't replace anything.

    Code:
    <?php
    // Note: this pattern has two spaces before "source=", but $text has only one,
    // so the pattern never matches and $text comes back unchanged.
    $img = '/\[img align=\'(.+)\' caption=\'(.+)\'  source=\'(.+)\'\]/i';
    $text = "[img align='right' caption='TF Logo' source='http://tutorialforums.com/images/styles/Dimitrix/top_logo.jpg']";
    print $text . '<br />';
    $text = preg_replace($img, '<div class="\\1"><img src="\\3" alt="\\2" /><p>\\2</p></div>', $text);
    print $text . '<br /><br />';

    $img = '/(\[img\])(.+)(\[\/img\])/i';
    $text = "";
    print $text . '<br />';
    $text = preg_replace($img, '<img src="\\2" alt="\\2" />', $text);
    print $text;
    ?>
    Edit:
    <b>bold?</b>
    It seems HTML works in code tags... that's rather interesting.

  5. #5
    SitePoint Member
    Join Date: Jul 2003
    Location: United Kingdom
    Posts: 5
    I'm sorry, I must be stupid, but what does the second piece of code do? $text seems to be the same before and after, as if nothing was replaced, just like in the first one.

    However, the main problem is really how to avoid wrapping <p></p> around the <div></div> blocks, as I described in my original post. I am sure I can find some BBCode regexes in a tutorial later on.

  6. #6
    SitePoint Enthusiast
    Join Date: Jun 2003
    Location: Chicago
    Posts: 73
    You could search for the <p> tags surrounding the generated image <div> and strip them to solve your problem.
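    A minimal sketch of that stripping step, assuming the generated blocks are <div> elements with no nested <div> inside them (the non-greedy match stops at the first </div>). The helper name unwrap_divs() is hypothetical.

```php
<?php
// Hypothetical helper: removes a <p>...</p> pair that directly wraps a <div> block.
// Assumes no nested <div> elements; /s lets . match across line breaks.
function unwrap_divs(string $html): string
{
    return preg_replace('#<p>\s*(<div\b.*?</div>)\s*</p>#s', '$1', $html);
}

$in = "<p>My first paragraph.</p>\r\n"
    . "<p><div class=\"right\"><img src=\"image.gif\" /><p>My caption.</p></div></p>\r\n"
    . "<p>My second paragraph.</p>";
echo unwrap_divs($in);
```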

  7. #7
    SitePoint Member
    Join Date: Jul 2003
    Location: United Kingdom
    Posts: 5
    That seems so clumsy though :/

  8. #8
    SitePoint Member
    Join Date: Jul 2003
    Location: United Kingdom
    Posts: 5
    I've been working on this for a while now, and this is what I came up with to parse some BB-like code in my text items:

    PHP Code:
    // Non-greedy groups stop each tag at its own closing tag when one line holds several tags.
    $item['text'] = preg_replace('/\[abbr="([^"]+)"\](.+?)\[\/abbr\]/', '<abbr title="\1">\2</abbr>', $item['text']);
    $item['text'] = preg_replace('/\[img="([^"]+)"\](.*?)\[\/img\]/', '<img src="\1" alt="\2" />', $item['text']);
    $item['text'] = preg_replace('/\[img="([^"]+)"\]/', '<img src="\1" alt="" />', $item['text']);
    $item['text'] = preg_replace('/\[link="([^"]+)"\](.+?)\[\/link\]/', '<a href="\1">\2</a>', $item['text']);
    $item['text'] = preg_replace('/\[link="([^"]+)"\]/', '<a href="\1">\1</a>', $item['text']);
    $item['text'] = preg_replace('/\[(left|right)\](.+?)\[\/\1\]/', '<p class="\1">\2</p>', $item['text']);

    // Wrap the text between line breaks in <p>...</p>. The [left]/[right] blocks are
    // already <p> elements, so remove the doubled tags this creates around them.
    $item['text'] = '<p>' . preg_replace("/(\r|\n)+/", "</p>\r\n<p>", $item['text']) . '</p>';
    $item['text'] = str_replace('<p><p', '<p', $item['text']);
    $item['text'] = str_replace('</p></p>', '</p>', $item['text']);
    The last two str_replace calls, which remove the doubled tags, seem rather clumsy, and I'd appreciate your opinion. Also, that is a lot of regular expression overhead for every item. Perhaps I should store the parsed code in the database after all...
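    An alternative sketch that avoids the clean-up str_replace calls: split the text into blocks first, then add <p> only to blocks that are not already block-level HTML. The helper name wrap_paragraphs() and the list of tags treated as block-level are assumptions; extend the list to match whatever your tags produce.

```php
<?php
// Hypothetical helper: wraps each line-separated block in <p>, except blocks
// that already start with a block-level tag (the tag list is an assumption).
function wrap_paragraphs(string $text): string
{
    $blocks = preg_split('/[\r\n]+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($blocks as $block) {
        if (preg_match('/^<(p|div|ul|ol|h[1-6]|blockquote)\b/i', $block)) {
            $out[] = $block;                  // already block-level: leave as-is
        } else {
            $out[] = '<p>' . $block . '</p>'; // plain text: wrap it
        }
    }
    return implode("\r\n", $out);
}

echo wrap_paragraphs("One.\r\n\r\n<p class=\"right\">Two.</p>\nThree.");
```

    Because [left] and [right] already become <p class="..."> blocks, they pass through untouched, and the same goes for a <div> block on its own line.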

