Thread: Html to Text

  1. #1
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    318

    Html to Text

    I am writing a script in which I want to strip out all the HTML and store the data as plain text only.

    For example, the database has:
    Code:
    Posted by Dave Merten<p><img src="http://www.macsimumnews.com/images/uploads/fix11-1.jpg" border="0" alt="image" name="image" width="69" height="71" />The System Management Controller—or SMC for short—is an integrated circuit (computer chip) that is on the logic board of the computer. As the name implies, it is responsible for power management of the computer. It controls backlighting, hard disk spin down, sleep and wake, some charging aspects, trackpad control, and... </p><br clear="both"/>
    <br clear="both"/>
    <a href="http://ads.pheedo.com/click.phdo?p=1"><img alt="" border="0" src="http://ads.pheedo.com/img.phdo?p=1" /></a>
    This should be stored as:
    Code:
    The System Management Controller—or SMC for short—is an integrated circuit (computer chip) that is on the logic board of the computer. As the name implies, it is responsible for power management of the computer. It controls backlighting, hard disk spin down, sleep and wake, some charging aspects, trackpad control, and...

    How can I achieve this?
    http://kkonline.org - Inspiring Life...

  2. #2
    SitePoint Enthusiast
    Join Date
    Jul 2009
    Location
    Austria
    Posts
    43
    How would you transform tables into plain text?
    Or lists?

    I think you will have to write a parser.

    PHP Code:
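    // Split on tags; with PREG_SPLIT_DELIM_CAPTURE the tag contents land at odd indexes, the text at even ones.
    $arrayToken = preg_split('/<(.*?)>/s', $htmlText, -1, PREG_SPLIT_DELIM_CAPTURE);
    var_dump($arrayToken);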
    Then, for the tokens that are tags, use a switch on the tag name. From there you can use regular expressions to extract whatever information you need.
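A rough sketch of the tokenize-then-switch idea, in case it helps. The function name htmlToText() and the choice of which tags become newlines, bullets or tabs are my own assumptions, not anything built into PHP. The regex split is also naive: a ">" inside an attribute value would confuse it.

```php
<?php
// Sketch only: htmlToText() is a hypothetical helper, not a PHP built-in.
function htmlToText(string $htmlText): string
{
    // Even indexes hold plain text, odd indexes hold the contents of each <...> tag.
    $tokens = preg_split('/<(.*?)>/s', $htmlText, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 0) {
            $out .= $token; // plain text passes through unchanged
            continue;
        }
        // Pull out the tag name, ignoring a leading slash and any attributes.
        if (!preg_match('/^\/?\s*([a-z][a-z0-9]*)/i', $token, $m)) {
            continue; // comments, doctypes and empty tags are dropped
        }
        $isClosing = ($token[0] === '/');
        switch (strtolower($m[1])) {
            case 'br':
            case 'p':
            case 'tr':
                $out .= "\n"; // line-level tags become a newline
                break;
            case 'li':
                if (!$isClosing) {
                    $out .= "\n- "; // each list item starts a bulleted line
                }
                break;
            case 'td':
            case 'th':
                if ($isClosing) {
                    $out .= "\t"; // table cells are separated by tabs
                }
                break;
            default:
                break; // every other tag is simply discarded
        }
    }
    $out = html_entity_decode($out, ENT_QUOTES, 'UTF-8');
    $out = preg_replace('/[ \t]+\n/', "\n", $out); // drop trailing blanks on each line
    return trim(preg_replace('/\n{2,}/', "\n", $out)); // collapse runs of newlines
}

echo htmlToText('<ul><li>One</li><li>Two</li></ul><p>Done<br />now</p>');
```

Lists come out as "- item" lines and table rows as tab-separated lines, which is about as far as plain text can go without deciding on a real layout.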

    Or maybe strip_tags() is already enough for you: http://us2.php.net/manual/en/function.strip-tags.php

  3. #3
    rajug.replace('Raju Gautam');
    Join Date
    Oct 2006
    Location
    Kathmandu, Nepal
    Posts
    4,013
    Don't you need 'Posted by Dave Merten'? If you just want to get rid of the HTML tags, the easy solution is to use strip_tags(). Otherwise you will have to write your own parser, or check whether blubb's parser works for you.
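To make the difference concrete, here is a small sketch. The $html string is a shortened stand-in for the feed item in the first post, and firstParagraphText() is a made-up helper that assumes the wanted text is always the body of the first <p> element, which may not hold for every row in the database.

```php
<?php
// Shortened stand-in for the stored feed item (an assumption, not the full original).
$html = 'Posted by Dave Merten<p><img src="http://www.macsimumnews.com/images/uploads/fix11-1.jpg" alt="image" />'
      . 'As the name implies, it is responsible for power management of the computer. </p>'
      . '<br clear="both"/>'
      . '<a href="http://ads.pheedo.com/click.phdo?p=1"><img alt="" src="http://ads.pheedo.com/img.phdo?p=1" /></a>';

// strip_tags() alone keeps every piece of text, so the byline stays in
// (and runs straight into the paragraph, because the <p> tag just vanishes).
$allText = trim(strip_tags($html));

// Hypothetical helper: keep only the text of the first <p>...</p>, if there is one.
function firstParagraphText(string $html): string
{
    if (preg_match('/<p\b[^>]*>(.*?)<\/p>/is', $html, $m)) {
        $html = $m[1];
    }
    // Remove the remaining tags, decode entities, and collapse whitespace.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/[ \t\r\n]+/', ' ', $text));
}

echo $allText, "\n";
echo firstParagraphText($html), "\n";
```

If the byline should be kept after all, strip_tags() on the whole string is probably all that is needed.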
    Mistakes are proof that you are trying.....
    ------------------------------------------------------------------------
    PSD to HTML - SlicingArt.com | Personal Blog | ZCE - PHP 5

