  1. #1
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Determining outgoing HTTP requests

    How would you go about determining the outgoing HTTP requests of a page? Maybe with something like the PEAR HTTP_Request package (http://pear.php.net/manual/en/packag...-request.php)?

    EDIT: Just to clarify, ideally I would want a function that takes a URL as a parameter and returns all the outgoing HTTP requests of said URL.

  2. #2
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Please clarify your definition of outgoing http request.

    Are you talking about the HTTP requests that your target script makes to other servers? If so, there's no way you're going to do that from outside.

    If by any chance you're talking about outgoing urls in a page, use DOM.
    Saul
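
    For the "outgoing URLs in a page" case, a minimal sketch with PHP's DOM extension could look like the following. It assumes PHP 7 or later and that the page source has already been fetched into a string; the function name find_outgoing_urls is made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): list the href targets of all
// <a> elements in an HTML string using PHP's DOM extension.
function find_outgoing_urls(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $urls[] = $href;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}
```

    Relative links such as /b come back exactly as written in the page, so they would still need to be resolved against the page's own URL before being requested.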

  3. #3
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm trying to figure a way to capture all the content that is being sent to the browser. If PHP can't be used to accomplish it, what other way is there to get it done?

  4. #4
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    So you mean the HTTP response? You can use fsockopen.
    Saul
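
    A rough sketch of the fsockopen approach, assuming plain HTTP on port 80 and PHP 7 or later. The function names fetch_raw_response and split_response are invented for this example. Note that it returns the response for one URL only, not for anything that page links to.

```php
<?php
// Hypothetical helper: fetch the raw HTTP response (status line, headers
// and body) for a single path on a host over a plain socket.
// Returns the response as a string, or false if the connection fails.
function fetch_raw_response(string $host, string $path = '/', int $port = 80, int $timeout = 10)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false; // $errno / $errstr describe the failure
    }
    stream_set_timeout($fp, $timeout);

    // HTTP/1.0 keeps the sketch simple: the server closes the connection
    // when it has sent everything, and no chunked encoding is involved.
    $request  = "GET {$path} HTTP/1.0\r\n";
    $request .= "Host: {$host}\r\n";
    $request .= "Connection: close\r\n\r\n";
    fwrite($fp, $request);

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false || $chunk === '') {
            break; // read error or timeout
        }
        $response .= $chunk;
    }
    fclose($fp);

    return $response;
}

// Split a raw response into its header lines and its body.
function split_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);

    return [
        'headers' => explode("\r\n", $parts[0]),
        'body'    => $parts[1] ?? '',
    ];
}
```

    Usage would be along the lines of split_response(fetch_raw_response('www.example.com', '/')), after checking that the fetch did not return false.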

  5. #5
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    And how would I use fsockopen to accomplish what I want? I need everything that would be sent to the browser, including videos and such.

  6. #6
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    365
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    When loading a webpage your browser makes many requests.

    I can't think of a way to capture everything from a PHP/server-side point of view, as requests for things like images and scripts might not even touch the server your script is running on.

    If you plan to capture everything it is probably better to do it from a client side point of view, and do it from the computer that is making a request.

    One possibility would be to write a PHP script and use it as a proxy that sends the HTTP requests itself. All the responses would then go to the script, which would process them and send them back to the user. But writing a script that emulates a browser would be very difficult, and you would still have trouble with Flash or Java.

  7. #7
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    There are examples on the fsockopen manual page, but checking all requests is not that simple, as taliesinnz said. You'd have to mimic a browser.

    If all you need to do is sniff the requests, use Firebug.
    Saul

  8. #8
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I see.

    Basically what I would need is a PHP script that replicates the 'NET' Inspector functionality of FireBug. All it would need to do is return a list.

    Could the PEAR package I mentioned in the first post be of any use?

  9. #9
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    You can use it, yes. But that will not solve the main problem: simulating the browser to find out all the requests it would make. Perhaps you can use Snoopy, although I'm not too sure about that.
    Saul

  10. #10
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    365
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Even with Snoopy, you would have great trouble getting Java/Flash content and any content that changes through AJAX or JavaScript.

    May I ask why you are trying to achieve this? I might be able to suggest an alternative.

  11. #11
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I need to determine the validity of all page elements. So if there are any images, video files, etc, being displayed on the page, I would need to get a list of them all so that I can check them. All of this will be done on external sites and has to be performed by script.

  12. #12
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    365
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, you don't need to get all the request data if you just want to validate URLs and content.

    You can use the function file_get_contents to open a URL and get the data.

    So something along the lines of

    PHP Code:
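    // $errors is passed by reference so the caller sees the collected messages
    function check($url, &$errors, $processsublinks = 0)
    {
        // file_get_contents() returns FALSE when the URL cannot be read
        if (($file = @file_get_contents($url)) === FALSE) {
            $errors[] = 'Error in ' . $url;
            return FALSE;
        }

        if (!$processsublinks) return TRUE;

        // insert function to find all links here
        $links = find_links($file);

        // check each linked URL once, without following its own links
        foreach ($links as $link)
            check($link, $errors);

        return TRUE;
    }

    $errors = array();
    check('http://google.com/', $errors, 1);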
    should do the trick
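
    Since check() above downloads each file in full, large items such as videos may be cheaper to test by status code alone. Here is a sketch of that variation using get_headers(), assuming PHP 7 or later; both function names are made up, and treating only 2xx as "valid" is an assumption you may want to change.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the header
// lines returned by get_headers(). When redirects are followed there can
// be several status lines; the last one describes the final response.
function final_status_code(array $headerLines): int
{
    $code = 0; // 0 means "no status line found"
    foreach ($headerLines as $line) {
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }

    return $code;
}

// Hypothetical helper: treat a URL as valid when the final status is 2xx.
function url_is_valid(string $url): bool
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return false; // DNS failure, refused connection, and so on
    }

    $code = final_status_code($headers);

    return $code >= 200 && $code < 300;
}
```

    Inside check(), a call like url_is_valid($link) could then replace the nested check($link, $errors) for links that only need to exist rather than be parsed.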

  13. #13
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The thing is that I need to check content that's not mentioned in the page source.

  14. #14
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    365
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What content exactly?

  15. #15
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Things such as video and music files.

  16. #16
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    365
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The only way to do it is to have your script "fake" being a browser. And that includes handling all Flash, Java and JavaScript content.

    Or have a program running on the client's computer.

    None of the music or video will even touch your server in normal operation.

  17. #17
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, then that's what I need. Is this the snoopy that was mentioned earlier?
    http://sourceforge.net/projects/snoopy/

    If so, I've looked at the code and it didn't really help me get a better idea of exactly what I need to do.

  18. #18
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    How about this...

    Is it possible to send request headers that ask for the return of something other than what it was sent to? For example, sending request headers to www.google.com and expecting the response headers from the google logo?

  19. #19
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    365
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    When you make a request, you make it for the object that you want returned. It could be an image, a web page, an application, a zip file, etc.

    If you requested the Google home page and received the SitePoint site, things would not work right.

    You could do something similar to what you are suggesting using an HTTP proxy.

    eg
    The client requests the google home page from the proxy server (could be a php script)
    The proxy server requests the page from google.com
    The proxy server sends the page back to the client.

    The proxy server then can modify files, links etc in the middle.
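
    As an illustration of "modifying links in the middle", here is a sketch of a function the proxy script could run on a page before sending it back, so that followed links also pass through the proxy. The function name and the "url" query parameter are assumptions for this example, and it only touches absolute http/https links in quoted href attributes. PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: rewrite absolute links so they point back at the
// proxy script, which receives the original address in a "url" parameter.
function rewrite_links_for_proxy(string $html, string $proxyUrl): string
{
    $rewritten = preg_replace_callback(
        // href="http://..." or href='https://...'
        '#\bhref=(["\'])(https?://.*?)\1#i',
        function (array $m) use ($proxyUrl) {
            // Undo HTML escaping (e.g. &amp;) before URL-encoding the target.
            $target  = html_entity_decode($m[2], ENT_QUOTES);
            $proxied = $proxyUrl . '?url=' . rawurlencode($target);

            return 'href=' . $m[1] . htmlspecialchars($proxied, ENT_QUOTES) . $m[1];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back
    // to the unmodified page in that case.
    return $rewritten ?? $html;
}
```

    The proxy script itself would read the requested address from $_GET['url'], fetch that page, pass it through this function and echo the result. Relative links are left alone here and would need resolving separately.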

  20. #20
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What do you mean by : ...The proxy server then can modify files, links etc in the middle...?

    How could I get all the response headers of every page element? Like I said, not everything would be contained within the page source.
    Could this be of any use?
    http://php.mirrors.ebizlab.hit.bme.h...onseHeader.php

  21. #21
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    In order to get a response, you have to send a request. When you send a request for a page, you get only the response for that page, not for what it links to in the form of HTML, JavaScript, Flash, etc. To get a response for every element the page refers to, you have to make a separate request for each and every one of them (mimic a browser). And to do that, you have to parse the document.

    It is rather easy to parse HTML to find the requests for images, JS and CSS includes, etc. However, when it comes to JavaScript (AJAX), Flash, Java applets and similar, it ranges from very difficult to completely impossible.

    You can implement a parser for AJAX requests (well who knows, maybe), but I think that's about your limit. Flash runs on its own engine on the client machine; you will not mimic Flash as it's not even open source. I won't even start about Java applets; and the list goes on.

    The conclusion here is that html is the only realistic level you can work on. Let's face it, you're doing nothing but a search engine spider. Did you hear of a search engine that indexed ajax or flash? Neither did I.
    Saul
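
    To make the "rather easy" HTML part concrete, here is a sketch that lists the resource URLs declared directly in the markup. It assumes PHP 7 or later with the DOM extension; the function name and the choice of tags are illustrative, not exhaustive. Nothing requested at run time by JavaScript, Flash or applets will show up in the result.

```php
<?php
// Hypothetical helper: list the URLs a browser would request for the
// statically declared resources of a page.
function find_resource_urls(string $html): array
{
    // Tag name => attribute that holds the resource address.
    $sources = [
        'img'    => 'src',
        'script' => 'src',
        'link'   => 'href',
        'embed'  => 'src',
        'object' => 'data',
        'iframe' => 'src',
    ];

    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Keep parser warnings about invalid markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($sources as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $element) {
            $value = trim($element->getAttribute($attribute));
            if ($value !== '') {
                $urls[] = $value;
            }
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($urls));
}
```

    Each URL returned still needs its own request (after resolving relative paths against the page URL) to find out whether the resource actually exists.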

  22. #22
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Alright, that method won't work, but what about using some sort of packet sniffer?

  23. #23
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    I don't see how that's possible. You would have to sniff ALL packets on a port of the target server. That means intercepting data of ALL server visitors. That's a criminal activity, as far as I know.
    Saul

  24. #24
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    15
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Couldn't the packet sniffer just analyze the packets between the client (the server running the script, in this case) and the target server?

  25. #25
    php_daemon
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Actually that makes little sense. You don't need to sniff packets coming your way, you receive them anyway. And for the target server to send them, you need to send a request.
    Saul

