  1. #1
    SitePoint Enthusiast Cory R's Avatar
    Join Date
    Mar 2009
    Posts
    94
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    jQuery Word Counter

    Code being used:

    Code:
    <script type="text/javascript">
    $('td.c_post').each(function () {
    var words = $(this).html().replace(/<blockquote>(.*?)<\/blockquote>/g, '').replace(/<div class="spoiler_toggle">(.*?)<\/div>/g, '').replace(/<div class="spoiler">(.*?)<\/div>/g, '').replace(/<br \/>/g, '').split(' ').length;
    $(this).append('<br /><br /><span class="word_count"><big><strong>' + words + '</strong> Words</big></span>');
    });
    </script>
    Here: http://s1.zetaboards.com/Cory/topic/4616513/

    See how it only counts each line when there are multiple lines, as the example in post #6 shows? How do I prevent this and make it count all the words? I'm using replace() so that words inside specific HTML elements aren't counted and to remove the line breaks, and split() on spaces so that every word is counted, but it doesn't seem to be working correctly.

  2. #2
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Join Date
    Jan 2007
    Location
    Christchurch, New Zealand
    Posts
    14,526
    Mentioned
    83 Post(s)
    Tagged
    3 Thread(s)
    I see that you're replacing <br /> but from what I see on the page, there is <br> too.
    You can deal with that issue by instead using:

    Code:
    .replace(/<br[ \/]*>/g, ' ')
    If you now look at the resulting text after it's split by a space, you'll find that you have:

    Code:
    ["
    Alpha", "Bravo
    Charlie
    Delta
    Echo", "Foxtrot", "Gamma
    "]
    The reason for this is that split(' ') only splits on the space character, so words separated by newline characters stay joined together. You can fix that by splitting on /\s/ instead, which is a regular expression that matches any single white-space character.

    Code:
    .split(/\s/)
    When you do that you'll then have:

    Code:
    ["", "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Gamma", ""]
    So you need to trim things off first, before splitting it.

    Here's a way of doing it, where you get the text first, and then rework it for the words.

    Code javascript:
    var text = $(this).html().replace(/<blockquote>(.*?)<\/blockquote>/g, '').replace(/<div class="spoiler_toggle">(.*?)<\/div>/g, '').replace(/<div class="spoiler">(.*?)<\/div>/g, '').replace(/<br[ \/]*>/g, ' '),
        words = $.trim(text).split(/\s/).length;

    That will give you a result of 7 words, which is correct for the example.

    But when there are multiple spaces between words, your word count will be off again, because each extra space produces an empty string in the array.
    You can deal with that by using the + symbol so that the split consumes one or more white-space characters at a time:

    Code:
    .split(/\s+/)
    Which leaves us with:

    Code javascript:
    var text = $(this).html().replace(/<blockquote>(.*?)<\/blockquote>/g, '').replace(/<div class="spoiler_toggle">(.*?)<\/div>/g, '').replace(/<div class="spoiler">(.*?)<\/div>/g, '').replace(/<br[ \/]*>/g, ' '),
        words = $.trim(text).split(/\s+/).length;
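
To see the three splitting strategies side by side, here is a minimal sketch in plain JavaScript (no jQuery needed). The sample string mirrors the array shown above. `countWords` is a hypothetical helper name, and its empty-string guard is an addition that is not in the code above: `''.split(/\s+/)` returns `['']`, which would otherwise report one word for an empty post.

```javascript
// Sample text: words separated by a mix of spaces and newlines.
var sample = '\nAlpha Bravo\nCharlie\nDelta\nEcho Foxtrot Gamma\n';

// Splitting on a literal space leaves newline-separated words stuck together.
var bySpace = sample.split(' ').length;

// Splitting on any single whitespace character separates every word,
// but the leading and trailing newlines produce empty strings.
var byWhitespace = sample.split(/\s/).length;

// Hypothetical helper: trim first, then split on runs of whitespace.
function countWords(text) {
    var trimmed = text.replace(/^\s+|\s+$/g, '');
    // ''.split(/\s+/) is [''], so an empty string needs its own case.
    return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}
```

With this sample, `bySpace` is 4, `byWhitespace` is 9, and `countWords(sample)` is 7.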
    Last edited by paul_wilkins; Mar 6, 2012 at 00:35.
    Programming Group Advisor
    Reference: JavaScript, Quirksmode Validate: HTML Validation, JSLint
    Car is to Carpet as Java is to JavaScript

  3. #3
    SitePoint Enthusiast Cory R's Avatar
    That sounds understandable, but I made the corresponding changes and it still gives me the same result.

  4. #4
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    That sounds understandable, but I made the corresponding changes and it still gives me the same result.
    Shall we get more detailed about things then?

    You want a way to get the number of words from the following, right?

    Code:
    <img src="http://z3.ifrm.com/63/1/0/e661949/e661949.png" alt=":cblush:">
    <img src="http://z3.ifrm.com/63/1/0/e661950/e661950.png" alt=":cdrat:">
    <img src="http://z3.ifrm.com/63/1/0/e661951/e661951.png" alt=":facepalm:">
    <img src="http://z3.ifrm.com/63/1/0/e661952/e661952.png" alt=":cglare:">
    <img src="http://z3.ifrm.com/63/1/0/e661953/e661953.png" alt=":cmeh:">
    <img src="http://z3.ifrm.com/63/1/0/e661954/e661954.png" alt=":cP:">
    Number of words in the above: 0


    and from:

    Code:
    <strong>Test</strong><br><br>
    <em>Test</em><br><br>
    <u>Test</u><br><br>
    <del>Test</del><br><br>
    <big>Test</big><br><br>
    <small>Test</small><br><br>
    <blockquote><dl><dt>Code: </dt><dd>&nbsp;</dd></dl><code style="width: 1079px; display: block; ">Test</code></blockquote><br><br>
    <blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test</div></blockquote><br><br>
    <a href="http://test.com/" target="_blank" rel="nofollow">Test</a><br><br>
    <div class="spoiler_toggle">Spoiler: click to toggle</div>
    <div class="spoiler" style="display:none;">Test</div>
    <ul><li style="display:none"><br></li><li>Test<br></li><li>Test<br></li></ul>
    Number of words in the above: 11

    Is that right?

  5. #5
    doRighteousDeeds++
    Join Date
    Aug 2006
    Location
    تركيا Turkey Türkiye
    Posts
    266
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If I understand correctly, you could try this code:
    Code:
      
    <html>
    <head>
    <meta charset="utf-8">
    <script type="text/javascript" src="http://code.jquery.com/jquery-latest.pack.js"></script> 
    <script type="text/javascript">
    // http://www.eburhan.com/jquery-dunyasina-adim-atiyoruz/
    // http://www.w3schools.com/jsref/jsref_match.asp
    
    $(document).ready(function(){
    
    var text = $('td.c_post').html();
    var c = text.match(/(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g).join(' ').replace(/(<[^<]+>)/g,'').match(/\w+\:*\s*/g).length;
    // var c = text.match(/(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g).join(' ').replace(/(<[^<]+>)/g,'').match(/ /g).length;
    // alert('word numbers = '+c); // 18
    $('td.c_post').append('<br /><br /><span class="word_count"><big><strong>' + c + '</strong> Words</big></span>');
    
    });
    </script>
    </head>
    <body>
    <table><tr>
    <td class="c_post">
    <strong>Test</strong><br><br>
    <em>Test</em><br><br>
    <u>Test</u><br><br>
    <del>Test</del><br><br>
    <big>Test</big><br><br>
    <small>Test</small><br><br>
    <blockquote><dl><dt>Code: </dt><dd>&nbsp;</dd></dl><code style="width: 1079px; display: block; ">Test</code></blockquote><br><br>
    <blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test</div></blockquote><br><br>
    <a href="http://test.com/" target="_blank" rel="nofollow">Test</a><br><br>
    <div class="spoiler_toggle">Spoiler: click to toggle</div>
    <div class="spoiler" style="display:none;">Test</div>
    <ul><li style="display:none"><br></li><li>Test<br></li><li>Test<br></li></ul>
    </td>
    </tr></table>
    </body>
    </html>
    The above code works in Firefox 4.0b9 and Konqueror 4.5.5.

    Code:
      
    $(document).ready(function(){
    
    var text = $('td.c_post').html();
    
    alert(text);
    
    var re = /(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g;
    
    var t = text.match(re);
    alert('t =   '+t);
    alert('t.length =   '+t.length);
    var tt = t.join(' ').replace(/(<[^<]+>)/g,'');
    alert('tt =   '+tt);
    var c = tt.match(/\w+\:*\s*/g).length;
    alert(c);

    });
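
One thing to watch with this approach: `String.prototype.match` with the `g` flag returns `null`, not an empty array, when nothing matches, so calling `.join()` or `.length` on the result throws. That is a general property of `match`, not a diagnosis of any particular error on the live page. A minimal sketch of a guard, using the same two patterns, with `countTaggedWords` as a hypothetical helper name:

```javascript
// Hypothetical helper using the same two patterns as the code above,
// with a guard for the case where match() finds nothing and returns null.
function countTaggedWords(html) {
    var tagged = html.match(/(<[^<]+>)(\w+\:*\s*)+(<[^<]+>)/g);
    if (tagged === null) {
        return 0; // no tag-wrapped words found
    }
    var stripped = tagged.join(' ').replace(/(<[^<]+>)/g, '');
    var words = stripped.match(/\w+\:*\s*/g);
    return words === null ? 0 : words.length;
}
```

Note that a post of plain text with no tags yields 0 here, because the first pattern only picks up words that sit between two tags.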
    The Time Through Ages. In the Name of Allah, Most Gracious, Most Merciful.
    1. By the Time, 2. Verily Man is in loss,
    3. Except such as have Faith, and do righteous deeds, and (join together) in the mutual enjoining of Truth, and of Patience and Constancy.

  6. #6
    SitePoint Enthusiast Cory R's Avatar
    paul_wilkins: Yes, I basically need to count the number of words in each post and display that number at the bottom of each post. I'm getting 18 words in the two posts with the emoticons, and I edited the third post to put more text inside the blockquote, and now it is giving me 25 words instead of 11. I don't want words to be counted that are inside the HTML that is replaced in the string. Yesterday, post #6 was reported as having only 1 word even though there were 10. It appears to be counting the correct number of words on each new line now, so the only issue at the moment is that it still appears to be counting words in the replaced HTML.

    I get this error in Firebug, muazzez: http://prntscr.com/6s97q

  7. #7
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    paul_wilkins: Yes, I basically need to count the number of words in each post and display that number at the bottom of each post. I'm getting 18 words in the two posts with the emoticons, and I edited the third post to put more text inside the blockquote, and now it is giving me 25 words instead of 11. I don't want words to be counted that are inside the HTML that is replaced in the string.
    Because you are making so many changes to the posts on that site, please post some examples here of the HTML code for the posts that you want to count, and help clarify the situation by showing how many words you expect to find in each example.

  8. #8
    SitePoint Enthusiast Cory R's Avatar
    Here's the post I am concerned about:

    HTML Code:
    <td class="c_post">
    						<strong>Test</strong><br><br><em>Test</em><br><br><u>Test</u><br><br><del>Test</del><br><br><big>Test</big><br><br><small>Test</small><br><br><blockquote><dl><dt>Code: </dt><dd>&nbsp;</dd></dl><code style="width: 686px; display: block;">Test</code></blockquote><br><br><blockquote><dl><dt>Quote:</dt><dd>&nbsp;</dd></dl><div>Test Test Test Test Test Test Test Test Test Test Test</div></blockquote><br><br><a rel="nofollow" target="_blank" href="http://test.com/">Test</a><br><br><div class="spoiler_toggle">Spoiler: click to toggle</div><div style="display:none;" class="spoiler">Test</div><ul><li style="display:none"><br></li><li>Test<br></li><li>Test<br></li></ul>
    						
    						
    						<div class="editby">Edited by <strong><a href="http://s1.zetaboards.com/Cory/profile/62973/">Cory</a></strong>, 59 minutes ago.</div>
    					<br><br><span class="word_count"><big><strong>25</strong> Words</big></span><span><br><div style="display: none;" class="likebg" id="like4616513.671140"></div></span></td>
    It should only count 9 words, but it is counting 25. I don't want it to count what's between the blockquotes and DIVs. I can strip the editby DIV with the replace method I originally used, but I just need it to work correctly. When I added more text inside the blockquote, the total word count went up. Every other post seems to be fine. I don't really need it to count images like it's doing in the first two posts, but I don't mind that as much as it counting the replaced HTML.
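
A possible reason the replaced HTML is still being counted (an assumption based on the markup shown in this thread, not something verified on the live page): the original patterns only match a tag written exactly as `<div class="spoiler">`, while the markup above also carries a `style` attribute. In JavaScript, `.` also does not match a newline, so a quote that spans several lines would slip through as well. A small sketch in plain JavaScript, with made-up sample strings:

```javascript
// Pattern from the original script.
var spoilerPattern = /<div class="spoiler">(.*?)<\/div>/g;

// Markup as it appears in the post: the extra style attribute
// means the literal text '<div class="spoiler">' never occurs.
var spoilerHtml = '<div style="display:none;" class="spoiler">Test</div>';
var spoilerUnchanged = spoilerHtml.replace(spoilerPattern, '') === spoilerHtml;

// The dot does not match newlines, so a multi-line quote is not removed either.
var quotePattern = /<blockquote>(.*?)<\/blockquote>/g;
var multiLineQuote = '<blockquote>Line one\nLine two</blockquote>';
var quoteUnchanged = multiLineQuote.replace(quotePattern, '') === multiLineQuote;

// A single-line quote with a bare <blockquote> tag is removed as intended.
var singleLineRemoved = '<blockquote>Quoted</blockquote> Reply'.replace(quotePattern, '');
```

Here `spoilerUnchanged` and `quoteUnchanged` are both `true`, which is why matching on exact tag text is fragile compared with removing the elements from the DOM.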

  9. #9
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Righto - after working through that, the following seems to do the job nicely.

    Code javascript:
    var $html,
        html,
        text,
        words;
    $html = $('.c_post');
    $('blockquote', $html).remove();
    $('div', $html).remove();
    html = $html.html().replace(/<br[ \/]*>/gm, ' ');
    text = $(html).text();
    words = $.trim(text).split(/\s+/).length;

    The only hard-to-understand part in there is the ".replace(/<br[ \/]*>/gm, ' ')" piece.

    The /<br[ \/]*>/ part matches <br>, <br /> or even <br/>.
    The g flag means global, so every match in the string is replaced rather than just the first one. The m (multiline) flag only changes how the ^ and $ anchors behave; this pattern has neither, so m has no effect here and could be left out.
    The reason you replace the break with a space is that you don't want "text<br>text" to end up as "texttext", which is what happens if the break is simply removed.
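
As a quick check of the points above, the following sketch (plain JavaScript, with made-up sample strings) runs the pattern against the three spellings of the tag and compares replacing with a space against removing the tag outright. It uses only the `g` flag, since the pattern contains no `^` or `$` anchors for `m` to act on.

```javascript
var breakPattern = /<br[ \/]*>/g;

// All three spellings of the tag are matched and replaced.
var variants = 'a<br>b<br />c<br/>d'.replace(breakPattern, ' ');

// Replacing with a space keeps neighbouring words apart...
var spaced = 'text<br>text'.replace(breakPattern, ' ');

// ...whereas removing the tag outright glues them together.
var glued = 'text<br>text'.replace(breakPattern, '');
```

The results are `'a b c d'`, `'text text'` and `'texttext'` respectively.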

    If you find that other pieces of your HTML code aren't behaving as you expect with the word count, it should be possible to update the script to handle them as well.

  10. #10
    SitePoint Enthusiast Cory R's Avatar
    Is it supposed to look like this?

    Code:
    <script type="text/javascript">
    $('td.c_post').each(function () {
    var $html,
        html,
        text,
        words;
    $html = $(this);
    $('blockquote', $html).remove();
    $('div', $html).remove();
    html = $html.html().replace(/<br[ \/]*>/gm, ' ');
    text = $(html).text();
    words = $.trim(text).split(/\s+/).length;
    $(this).append('<br /><br /><span class="word_count"><big><strong>' + words + '</strong> Words</big></span>');
    });
    </script>
    If so, the count is only correct in post #3. For the three posts below that it says there is 1 word, although there are 10 words, and post #7 has 7 words. I haven't made any edits since the last ones I mentioned. The other thing that appears to be happening is that blockquotes and DIVs are actually being removed from the posts. I don't want them removed; I just want the text within them left out of the total word count. Sorry for making this so confusing; I suppose I should have explained myself more clearly.

  11. #11
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    If so, the count is only correct in post #3
    That's because it has been tested only on the one example that was provided earlier.

    Which other examples are different enough from that previous example to require further development?

  12. #12
    SitePoint Enthusiast Cory R's Avatar
    Other examples:

    Code:
    <td class="c_post">
    						One Two Three Four Five Six Seven Eight Nine Ten
    						
    						
    						
    					<br><br><span class="word_count"><big><strong>1</strong> Words</big></span></td>
    Shows 1 word, should be 10 words.

    Code:
    <td class="c_post">
    						One<br><br>Two<br><br>Three<br><br>Four<br><br>Five<br><br>Six<br><br>Seven<br><br>Eight<br><br>Nine<br><br>Ten
    						
    						
    						
    					<br><br><span class="word_count"><big><strong>1</strong> Words</big></span></td>
    Shows 1 word, should be 10 words.

    Code:
    <td class="c_post">
    						Test Test<br><br>Test<br><br>Test<br><br>Test Test Test
    						
    						
    						
    					<br><br><span class="word_count"><big><strong>1</strong> Words</big></span></td>
    Shows 1 word, should be 7 words.

  13. #13
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    It seems that the following line has trouble if no tags exist at that stage in the HTML string.

    Code JavaScript:
    text = $(html).text();

    So all that's needed there is to check whether the HTML contains any tags. If it doesn't, just assign that tag-free HTML content straight to the text variable.
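
A minimal sketch of that check, under some assumptions: `containsTag` and `htmlToText` are hypothetical helpers, the regular expression is only a rough test for something tag-shaped, and `toText` stands in for whatever converts markup to text (in this thread that would be `function (html) { return $(html).text(); }`). Passing the converter in as a parameter is only so that the sketch runs without jQuery.

```javascript
// Hypothetical helper: rough test for anything that looks like a tag.
function containsTag(html) {
    return /<\/?[a-z][^>]*>/i.test(html);
}

// Only hand the string to the markup-to-text converter when it has tags;
// tag-free content is already plain text.
function htmlToText(html, toText) {
    if (containsTag(html)) {
        return toText(html);
    }
    return html;
}
```

For a tag-free string such as `'One Two Three'`, `htmlToText` returns the string unchanged and never calls the converter.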

  14. #14
    SitePoint Enthusiast Cory R's Avatar
    Sorry, but how would I do that exactly?

  15. #15
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    Sorry, but how would I do that exactly?
    How would you do an if statement?

  16. #16
    SitePoint Enthusiast Cory R's Avatar
    Code:
    if ($('td.c_post:contains(blockquote), td.c_post:contains(div)').length) {
    //Parse Code
    }
    Like that?

  17. #17
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    Like that?
    No, those have already been removed by earlier code, remember?

    Perhaps it would be easier for you to just wrap the html inside of a <div>, so that the .text() call is guaranteed to have an element to work with.

  18. #18
    SitePoint Enthusiast Cory R's Avatar
    Sorry man, but I think I give up. I evidently wasn't meant to create a script like this, and it's not even for me, I wanted to create it for someone else. Sorry to waste your time.

  19. #19
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    Sorry man, but I think I give up. I evidently wasn't meant to create a script like this, and it's not even for me, I wanted to create it for someone else. Sorry to waste your time.
    What I meant when I said "to just wrap the html inside of a <div>" is this:

    Code javascript:
    text = $('<div>' + html + '</div>').text();

  20. #20
    SitePoint Enthusiast Cory R's Avatar
    OK, that seemed to work, now how do I make it so it doesn't actually remove blockquotes and DIV's, but just doesn't count the text within them?

  21. #21
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    OK, that seemed to work, now how do I make it so it doesn't actually remove blockquotes and DIV's, but just doesn't count the text within them?
    That would be this part:

    Code javascript:
    $html = $(this);

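    What you need is to change that from being a reference to the original element into a copy of it, by using .clone(). Because .clone() is a jQuery method and this is a plain DOM element inside .each(), wrap this with $() first.

    For example:

    Code javascript:
    $html = $(this).clone();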
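
Pulling the pieces from this thread together, the whole script might look something like the sketch below. It is an assembly of the steps discussed here, not a tested drop-in: `countWords` and `addWordCounts` are hypothetical names, the zero-word case for empty posts is an added assumption, and jQuery is assumed to be loaded and passed in as `$`. Inside `.each()`, `this` is a plain DOM element, so the copy is taken with `$(this).clone()`.

```javascript
// Pure helper: count whitespace-separated words, treating an empty string as 0.
function countWords(text) {
    var trimmed = text.replace(/^\s+|\s+$/g, '');
    return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Assumes jQuery is loaded and passed in as $; call once the DOM is ready.
function addWordCounts($) {
    $('td.c_post').each(function () {
        // Work on a copy so the visible post keeps its quotes and spoilers.
        var $copy = $(this).clone();
        $copy.find('blockquote, div').remove();
        // Turn line breaks into spaces so "text<br>text" stays two words.
        var html = $copy.html().replace(/<br[ \/]*>/g, ' ');
        // Wrap in a div so plain text without tags is still parsed as markup.
        var text = $('<div>' + html + '</div>').text();
        $(this).append('<br /><br /><span class="word_count"><big><strong>' +
            countWords(text) + '</strong> Words</big></span>');
    });
}
```

On the page it could be started with `jQuery(function ($) { addWordCounts($); });`, which runs it once the document is ready.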

  22. #22
    SitePoint Enthusiast Cory R's Avatar
    Thank you so much for your time and patience, it works just as intended now!

  23. #23
    Unobtrusively zen silver trophybronze trophy
    paul_wilkins's Avatar
    Quote Originally Posted by Cory R View Post
    Thank you so much for your time and patience, it works just as intended now!
    You have just experienced the process of development. It's a step-by-step process where steadily, piece by piece, issue by issue, things are resolved until you end up with something that has been especially crafted to do the job well.