  1. #1
    SitePoint Member
    Join Date: Jun 2008
    Posts: 6

    How to automatically detect total pages

    Sometimes, when I'm reading a multi-page product review on certain websites, I want to jump straight from the first page to the last one to read the conclusion.

    I'm trying to build a personal Firefox tool that retrieves the article's other pages and returns the conclusion text from the last one. I've already written code that gets the innerHTML of an element by its class, using getElementsByStyleClass(classname), to grab the conclusion text, but I want to be able to crawl through the pages to the last one and retrieve the conclusion text without leaving the first page.

    Is this possible? Was I clear?
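    The getElementsByStyleClass(classname) helper isn't shown in the thread, so here is a hypothetical sketch of what such a helper might look like; the real one may differ. The extra `root` parameter is an assumption added so the same function can search a page fetched in the background, not just the current document. It matches `classname` as a whole, whitespace-separated token, so an element with class "conclusion final" matches but one with class "conclusions" does not.

```javascript
// Hypothetical re-creation of a getElementsByStyleClass(classname) helper.
// Walks every node under `root` and collects the elements whose class
// attribute contains `classname` as one whole, whitespace-separated token.
function getElementsByStyleClass(classname, root) {
  var matches = [];
  var needle = ' ' + classname + ' ';

  function walk(node) {
    if (node.nodeType === 1) { // 1 = ELEMENT_NODE
      // Pad and normalise whitespace so tabs/newlines between classes still match
      var cls = (' ' + (node.getAttribute('class') || '') + ' ').replace(/\s+/g, ' ');
      if (cls.indexOf(needle) !== -1) matches.push(node);
    }
    for (var i = 0; i < node.childNodes.length; i++) {
      walk(node.childNodes[i]);
    }
  }

  walk(root);
  return matches;
}
```

    In the browser, something like getElementsByStyleClass('conclusion', document)[0].innerHTML would then give the text, where 'conclusion' stands in for whatever class the site actually uses.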

  2. #2
    We're from teh basements.
    Join Date: Apr 2007
    Posts: 1,205
    If I understand you correctly, you want to crawl the pagination links of a multi-page article? That relies on the pagination links having some distinguishing characteristic, such as a class or id attribute, that lets your script identify them. You would then use AJAX (an XMLHttpRequest) to load the last page in the background. Preferably the pages would be valid XHTML, so that you could load the response into a DOM document (for example via the request's responseXML) and use DOM methods to retrieve the desired fragment. (It sounds like you've already implemented that last part, retrieving the fragment.)
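    A minimal sketch of the "load the last page with AJAX" step, using a plain asynchronous XMLHttpRequest so the current page never changes. It assumes the target page is on the same site (the same-origin policy applies to normal page scripts) or that the script runs with extension privileges. The name loadPage and its parameters are hypothetical; the optional XHR argument exists only so the function can be exercised without a browser.

```javascript
// Sketch: fetch another page of the article in the background, without
// navigating away, and hand its HTML to a callback.
// Assumes same-origin access (or extension privileges).
// The optional XHR argument lets tests supply a stand-in constructor.
function loadPage(uri, onLoad, onError, XHR) {
  var Ctor = XHR || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open('GET', uri, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = request finished
    if (xhr.status === 200) {
      onLoad(xhr.responseText);
    } else if (onError) {
      onError(xhr.status);
    }
  };
  xhr.send(null);
}
```

    Inside onLoad, current browsers can turn the text into a document with new DOMParser().parseFromString(text, 'text/html') and then run the existing class lookup against that document instead of the live page.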

  3. #3
    SitePoint Member
    Join Date: Jun 2008
    Posts: 6
    Thanks for your reply. I think the best solution would be to look for a specific link pattern on the page, starting at a maximum page number and counting down until I find it.

    e.g.:
    <a class="page" href="article.php?article_id=15&page=2">
    <a class="page" href="article.php?article_id=15&page=3">
    <a class="page" href="article.php?article_id=15&page=4">

    I would start from a maximum of 5 and count down. The first page number that matches would be the last page.
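    A sketch of that count-down search, run against the current page's HTML as a string. The name findLastPage and the maxPages bound are assumptions; articles longer than maxPages would be under-reported. The pattern requires ?, & or ; right before "page=" (the ; covers XHTML's &amp;) and forbids a trailing digit, so looking for page 1 can't accidentally match page=12.

```javascript
// Sketch of the count-down idea: starting at an assumed maximum, look for a
// link to each page number in the HTML; the first number found is taken as
// the last page.
function findLastPage(html, maxPages) {
  for (var n = maxPages; n >= 2; n--) {
    // [?&;] anchors "page=" to a query string; (?!\d) rejects page=12 when n is 1
    var pattern = new RegExp('[?&;]page=' + n + '(?!\\d)');
    if (pattern.test(html)) return n;
  }
  return 1; // no pagination links found: treat it as a single-page article
}
```

    In the browser, the HTML string could be document.body.innerHTML. Collecting all page numbers and taking the largest avoids the guessed upper bound altogether.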

    After finding the match, how could I retrieve whatever value I want from that page without leaving the current page?

    Can you help or give me some tips on how to achieve this with javascript?

    Thanks

  4. #4
    We're from teh basements.
    Join Date: Apr 2007
    Posts: 1,205
    If the links have a class attribute of "page", it's pretty straightforward, assuming no links other than the pagination links have that class. Just collect all the page links and use the last one. This would be very easy if getElementsByClassName were implemented in all browsers, but as far as I know it isn't yet, so we need a little extra code to get the results we want.

    Code:
    // Cache the live NodeList instead of re-querying it on every pass
    var links = document.getElementsByTagName('a'), pageLinks = [];
    for (var i = 0; i < links.length; i++)
        if (links[i].getAttribute('class') == 'page') pageLinks.push(links[i]);
    // Guard against a page with no pagination links (pop() would return undefined)
    var lastPageUri = pageLinks.length ? pageLinks.pop().getAttribute('href') : null;
    Then you would pass lastPageUri to an XMLHttpRequest object (via its open() method) to load the page.
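    Taking the last link in document order assumes the final class="page" link really points at the last page; if a site also gives a "next" link that class, comparing page numbers is safer. A sketch of that variant, assuming the page number lives in a page query parameter as in the article.php example URLs earlier in the thread. highestPageUrl is a hypothetical name; URL and URLSearchParams are standard in current browsers and Node.js.

```javascript
// Sketch: given the href values of the pagination links, return the
// absolute URL of the highest-numbered page. Assumes the page number is
// carried in a "page" query parameter, e.g. article.php?...&page=4.
function highestPageUrl(hrefs, baseUrl) {
  var best = null, bestNum = 0;
  for (var i = 0; i < hrefs.length; i++) {
    // Resolve relative links against the current page's address
    var url = new URL(hrefs[i], baseUrl);
    var num = parseInt(url.searchParams.get('page'), 10);
    if (num > bestNum) { // NaN (no page parameter) never wins this comparison
      bestNum = num;
      best = url.href;
    }
  }
  return best; // null when no link carried a page number
}
```

    In the browser, hrefs would come from the same class="page" links and baseUrl from document.location.href; the returned absolute URL can go straight to xhr.open('GET', ...).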

  5. #5
    SitePoint Member
    Join Date: Jun 2008
    Posts: 6

    I think this is exactly what I need.

    Thank you very much for the help!

