  1. #1
    SitePoint Addict jamus's Avatar
    Join Date
    Jul 2004
    Location
    Devon, UK
    Posts
    301
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Removing hashtags (#example) from a string

    I'm pulling tweets into my webpage using the XML feed from Twitter.

    Is it possible to remove the hashtags from the tweet string when you don't know in advance what the hashtags will be?

    Removing everything from the hash symbol to the next bit of whitespace?

    E.g.

    "Isn't this amazing! #HTML5"

    becomes

    "Isn't this amazing!"

  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,095
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    So you only need part of the string? Then you need substr

    You want to find the last occurrence of a # and take all the text up to that point, so you need the position of the last # in the string. For that you need strrpos.

    Using those two functions I'm pretty sure you can figure it out. If you have any questions, let me know.
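
    For reference, the substr/strrpos idea above could be sketched roughly as follows. The function name stripTrailingHashtag is made up for illustration and does not come from the thread, and the sketch assumes there is at most one hashtag and that it sits at the very end of the tweet.

```php
<?php
// Hypothetical helper (name not from the thread): cut the string at the
// last '#' and trim the whitespace left behind. Assumes at most one
// hashtag, located at the end of the tweet.
function stripTrailingHashtag(string $tweet): string
{
    $pos = strrpos($tweet, '#');      // position of the last '#', or false
    if ($pos === false) {
        return $tweet;                // no hashtag, nothing to remove
    }
    // Keep everything before the '#', then drop the trailing space.
    return rtrim(substr($tweet, 0, $pos));
}

echo stripTrailingHashtag("Isn't this amazing! #HTML5"); // Isn't this amazing!
```

    Note the strict comparison with false: strrpos returns 0 when the '#' is the first character, and a loose check would treat that as "not found".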
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  3. #3
    SitePoint Addict jamus's Avatar
    Thanks ScallioXTX.

    Hmmm. There might be more than one, so I was thinking you'd run the same function again to remove the next one, and so on?

  4. #4
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Hm, I was under the impression there would be only one, at the end. In that case I'd use strpos rather than strrpos (strrpos would work as well, but working from left to right is a bit more natural than from right to left).

    And you're right, you'd work your way through the string until all the tags are gone.

    A general outline:
    PHP Code:
    $tweet = 'tweet #tweet diddly tweet #html';
    while (($pos = strpos($tweet, '#')) !== false) {
      // $spacepos = position of first space after $pos -- find this using strpos
      // now that you know where the tag starts ($pos) and ends ($spacepos), use
      // substr_replace to get it out of there
    }


  5. #5
    SitePoint Addict chestertondevelopment's Avatar
    Join Date
    Dec 2005
    Location
    Essex, UK
    Posts
    241
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Twitter themselves have released code which they recommend using for parsing usernames and hash tags. I don't know if it will do exactly as you want but it might be useful as a starting point at least. http://github.com/mzsanford/twitter-text-php

  6. #6
    SitePoint Addict jamus's Avatar
    Excellent ScallioXTX. Thank you for your guidance!

  7. #7
    SitePoint Addict jamus's Avatar
    chestertondevelopment I shall take a look. Thank you.

    Though I want to try the 'manual' way of doing things, not least because it will come in handy for other parts of this project and not just the hashtags (e.g. removing links). I think so, anyway!

  8. #8
    SitePoint Addict jamus's Avatar
    I may need some additional help - sorry.

    PHP Code:
    $tweet='tweet #tweet diddly tweet #html';

    while (($pos = strpos($tweet, '#')) !== false) {

        $spacepos=strpos($tweet, ' ')

        $tweet=substr_replace($var, '', $pos, $spacepos)

    Does this look right to you? How do I get out of the loop?

  9. #9
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    No, $var should be $tweet

    Also, you should start looking for a space after the position of the #

    So, $spacepos=strpos($tweet, ' ', $pos);

    Otherwise in a string 'hello #hi bye' it will find the space after 'hello', while that's not what you want, since you want the space before 'bye'. The code above will give you exactly that
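
    To make the difference concrete, here is a small sketch of strpos with and without the offset argument on that same string. The variable names $firstSpace and $spaceAfterTag are only for illustration.

```php
<?php
$text = 'hello #hi bye';
$pos  = strpos($text, '#');                 // 6, where the tag starts

// Without an offset, strpos reports the first space in the whole string.
$firstSpace = strpos($text, ' ');           // 5, the space after 'hello'

// With $pos as the offset, the search only starts at the '#'.
$spaceAfterTag = strpos($text, ' ', $pos);  // 9, the space before 'bye'

var_dump($pos, $firstSpace, $spaceAfterTag);
```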

    Also, you need to take into account that there doesn't necessarily have to be a space: if #something is at the end of the string, there is no space after it.

    So you need something like this:
    PHP Code:
    while (($pos = strpos($tweet, '#')) !== false) {
      if ($spacepos = strpos($tweet, ' ', $pos)) {
        $tweet=substr_replace($tweet, '', // fill this in //, // fill this in //);
      } else {
        $tweet=substr_replace($tweet, '', // fill this in //);
      }
    }
    $tweet=trim($tweet);
    I'll leave the //fill this in// as an exercise to you

  10. #10
    SitePoint Wizard gRoberts's Avatar
    Join Date
    Oct 2004
    Location
    Birtley, UK
    Posts
    2,439
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You'd need to use preg_replace...

    The problem is you'd need to decide on where to stop.

    Looking at your code I'd use:

    Code:
    $tweet = 'tweet #tweet diddly tweet #html';
    $tweet = preg_replace('/#([^ \r\n\t]+)/', '', $tweet);
    echo $tweet;
    hth


  11. #11
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Originally posted by gRoberts:
    Code:
    $tweet = 'tweet #tweet diddly tweet #html';
    $tweet = preg_replace('/#([^ \r\n\t]+)/', '', $tweet);
    echo $tweet;
    Yup, you could also do that, although I'd replace [^ \r\n\t] with [^\s], since \s covers [ \r\t\n], and I'd remove the capturing group since we aren't actually interested in what it matches and don't want to capture it for later use; we just want to remove it.
    Also, if you remove everything up until the next space you run into the chance that you'll end up with two consecutive spaces in the string, so I'd add an \s to the end as well, but make it optional since it doesn't have to be there (i.e., at the end of the string).
    Lastly, if there is a tag at the end of the string we could end up with a trailing space, but that's nothing rtrim can't handle.

    PHP Code:
    $tweet = 'tweet #tweet diddly tweet #html';
    $tweet = rtrim( preg_replace('/#[^\s]+\s?/', '', $tweet) );
    echo $tweet;


    BTW. The solution using strpos is
    PHP Code:
    $tweet = 'tweet #tweet diddly tweet #html';
    while (($pos = strpos($tweet, '#')) !== false) {
      if ($spacepos = strpos($tweet, ' ', $pos)) {
        $tweet=substr_replace($tweet, '', $pos, $spacepos-$pos+1);
      } else {
        $tweet=substr_replace($tweet, '', $pos);
      }
    }
    $tweet=rtrim($tweet);
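
    As a rough check, the two approaches can be wrapped in functions and run side by side on a few sample strings. The function names stripHashtagsRegex and stripHashtagsStrpos and the sample list are invented for this sketch; the bodies are the two snippets from this post.

```php
<?php
// Hypothetical wrappers (names not from the thread) around the two snippets.
function stripHashtagsRegex(string $tweet): string
{
    // Remove '#', the non-whitespace run after it, and one optional
    // whitespace character, then trim what is left at the end.
    return rtrim(preg_replace('/#[^\s]+\s?/', '', $tweet));
}

function stripHashtagsStrpos(string $tweet): string
{
    while (($pos = strpos($tweet, '#')) !== false) {
        if ($spacepos = strpos($tweet, ' ', $pos)) {
            // Tag followed by a space: remove the tag and that space.
            $tweet = substr_replace($tweet, '', $pos, $spacepos - $pos + 1);
        } else {
            // No space after the tag: remove everything up to the end.
            $tweet = substr_replace($tweet, '', $pos);
        }
    }
    return rtrim($tweet);
}

// Illustrative inputs only.
$samples = [
    'tweet #tweet diddly tweet #html',
    "Isn't this amazing! #HTML5",
    '#first word',
    'no tags here',
];

foreach ($samples as $sample) {
    echo stripHashtagsRegex($sample), ' | ', stripHashtagsStrpos($sample), "\n";
}
```

    The two versions agree on these samples but are not equivalent in general: the strpos version only looks for a literal space, so a tag followed by a tab or newline is treated as running to the end of the string, while the regex stops at any whitespace.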

  12. #12
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    65 Post(s)
    Tagged
    0 Thread(s)
    Following chestertondevelopment's suggestion, I'd recommend looking at existing solutions for inspiration.
    Salathe
    Software Developer and PHP Manual Author.

  13. #13
    SitePoint Addict jamus's Avatar
    Thank you ScallioXTX! Your code worked perfectly to do what I requested. I also learnt A LOT.

    However, I still need to look into the existing solutions as I've run into some other issues.

  14. #14
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Originally posted by jamus:
    I also learnt A LOT.
    Good, I'm glad to hear that.

