-
Oct 25, 2007, 09:24 #1
- Join Date
- Apr 2006
- Posts
- 15
Determining outgoing HTTP requests
How would you go about determining the outgoing HTTP requests of a page? Maybe with something like the PEAR HTTP_Request package (http://pear.php.net/manual/en/packag...-request.php)?
EDIT: Just to clarify, ideally I would want a function that takes a URL as a parameter and returns all the outgoing HTTP requests of said URL.
-
Oct 25, 2007, 10:52 #2
Please clarify your definition of outgoing http request.
Are you talking about HTTP requests that your target script makes to other servers? If so, there's no way you're gonna do that.
If by any chance you're talking about outgoing URLs in a page, use DOM.
Saul
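For the "outgoing URLs in a page" case, a DOM-based approach might look roughly like this. It's a minimal sketch: extract_resource_urls() is a hypothetical name, and the tag/attribute map is an assumption that covers common cases, not every way a page can reference a file.

```php
<?php
// Hypothetical helper: list the URLs a page references in its HTML source.
// It only sees static markup; anything requested later by JavaScript,
// Flash or Java will not show up here.
function extract_resource_urls($html)
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }
    // Tag name => attribute holding the URL (assumed, non-exhaustive map).
    $attributes = array(
        'a' => 'href', 'link' => 'href', 'img' => 'src', 'script' => 'src',
        'iframe' => 'src', 'embed' => 'src', 'source' => 'src',
        'video' => 'src', 'audio' => 'src', 'object' => 'data',
    );

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    // '*' walks every element in document order.
    foreach ($doc->getElementsByTagName('*') as $element) {
        $tag = strtolower($element->nodeName);
        if (isset($attributes[$tag]) && $element->hasAttribute($attributes[$tag])) {
            $url = trim($element->getAttribute($attributes[$tag]));
            if ($url !== '' && !in_array($url, $urls, true)) {
                $urls[] = $url; // keep each URL once
            }
        }
    }
    return $urls;
}
```

The URLs come back exactly as written in the markup, so relative ones still need resolving against the page URL before you request them.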
-
Oct 25, 2007, 11:37 #3
- Join Date
- Apr 2006
- Posts
- 15
I'm trying to figure a way to capture all the content that is being sent to the browser. If PHP can't be used to accomplish it, what other way is there to get it done?
-
Oct 25, 2007, 11:43 #4
So you mean the HTTP response? You can use fsockopen.
Saul
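A rough sketch of the fsockopen route, assuming plain HTTP on port 80 (no HTTPS, no redirect handling); http_get_raw() and split_http_response() are made-up helper names. Note that it only gets you the response for the page itself, not for the images or videos it references.

```php
<?php
// Fetch a page over a raw socket. HTTP/1.0 is used so the server closes
// the connection when done and does not send a chunked body.
function http_get_raw($host, $path = '/', $port = 80, $timeout = 10)
{
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false; // $errno / $errstr describe the failure
    }
    fwrite($socket, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $raw = stream_get_contents($socket);
    fclose($socket);
    return $raw;
}

// Split a raw response into status line, headers (lower-cased names) and body.
function split_http_response($raw)
{
    $parts = explode("\r\n\r\n", $raw, 2);
    $lines = explode("\r\n", $parts[0]);
    $status = array_shift($lines);
    $headers = array();
    foreach ($lines as $line) {
        $pos = strpos($line, ':');
        if ($pos !== false) {
            $headers[strtolower(trim(substr($line, 0, $pos)))] = trim(substr($line, $pos + 1));
        }
    }
    $body = isset($parts[1]) ? $parts[1] : '';
    return array($status, $headers, $body);
}
```

Typical use would be `$raw = http_get_raw('www.example.com');`, checking for false before passing it to split_http_response().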
-
Oct 25, 2007, 11:55 #5
- Join Date
- Apr 2006
- Posts
- 15
And how would I use fsockopen to accomplish what I want? I need everything that would be sent to the browser, including videos and such.
-
Oct 25, 2007, 12:20 #6
- Join Date
- Aug 2007
- Posts
- 365
When loading a webpage your browser makes many requests.
I can't think of a way to capture everything from a PHP/server-side point of view, as things like images/scripts/etc. might not even touch the server your script is running on.
If you plan to capture everything, it is probably better to do it from the client side, on the computer that is making the requests.
One possibility would be to write a PHP script, use it as a proxy and send the HTTP REQUESTS through it. Then all the RESPONSES will go to the script, which you would then process and send back to the user. But it would be very difficult to write a script that emulates a browser, and you would still have trouble with Flash or Java.
-
Oct 25, 2007, 12:48 #7
There are examples on the fsockopen manual page, but checking all requests is not that simple, as taliesinnz said. You'd have to mimic a browser.
If all you need to do is sniff the requests, use Firebug.
Saul
-
Oct 25, 2007, 12:58 #8
- Join Date
- Apr 2006
- Posts
- 15
I see.
Basically what I would need is a PHP script that replicates the 'NET' Inspector functionality of FireBug. All it would need to do is return a list.
Could the PEAR package I mentioned in the first post be of any use?
-
Oct 25, 2007, 13:12 #9
You can use it, yes. But that will not solve the main problem: simulating the browser to find out all the requests necessary. Perhaps you can use Snoopy, although I'm not too sure about that.
Saul
-
Oct 25, 2007, 14:11 #10
- Join Date
- Aug 2007
- Posts
- 365
Even with Snoopy, you would have great trouble getting Java/Flash content and any content that changes through AJAX or JavaScript.
May I ask why you are trying to achieve this? I might be able to suggest an alternative.
-
Oct 25, 2007, 14:56 #11
- Join Date
- Apr 2006
- Posts
- 15
I need to determine the validity of all page elements. So if there are any images, video files, etc, being displayed on the page, I would need to get a list of them all so that I can check them. All of this will be done on external sites and has to be performed by script.
-
Oct 25, 2007, 16:14 #12
- Join Date
- Aug 2007
- Posts
- 365
Well, you don't need to get all the request data if you just want to validate URLs and content.
You can use the function file_get_contents to open a URL and get the data.
So something along the lines of:
PHP Code:
function check($url, array &$errors, $processSubLinks = false)
{
    // file_get_contents() returns FALSE if the URL cannot be fetched
    $file = @file_get_contents($url);
    if ($file === false) {
        $errors[] = 'Error in ' . $url;
        return false;
    }
    if (!$processSubLinks) {
        return true;
    }
    // find_links() is left for you to write (e.g. with DOMDocument);
    // it should return absolute URLs
    foreach (find_links($file) as $link) {
        check($link, $errors); // sub-links are checked one level deep
    }
    return true;
}
$errors = array();
check('http://www.google.com/', $errors, true);
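One gap in the function above: links found in a page are often relative (logo.png, /css/site.css), so whatever find_links() returns has to be turned into absolute URLs before check() can fetch them. A simplified sketch, with resolve_url() as a hypothetical helper; it ignores "." and ".." segments, <base> tags and query-only or fragment-only links, so treat it as a starting point rather than a full resolver.

```php
<?php
// Hypothetical helper: make a link found on the page at $base absolute.
// $base must be an absolute http(s) URL.
function resolve_url($base, $link)
{
    if (parse_url($link, PHP_URL_SCHEME) !== null) {
        return $link; // already absolute, e.g. http://example.com/a.png
    }
    $b = parse_url($base);
    $scheme = isset($b['scheme']) ? $b['scheme'] : 'http';
    $host = $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (substr($link, 0, 2) === '//') {
        return $scheme . ':' . $link; // protocol-relative
    }
    if (substr($link, 0, 1) === '/') {
        return "$scheme://$host$link"; // root-relative
    }
    // Path-relative: keep the base path up to and including its last "/".
    $path = isset($b['path']) ? $b['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return "$scheme://$host$dir$link";
}
```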
-
Oct 25, 2007, 16:23 #13
- Join Date
- Apr 2006
- Posts
- 15
The thing is that I need to check content that's not mentioned in the page source.
-
Oct 25, 2007, 18:24 #14
- Join Date
- Aug 2007
- Posts
- 365
What content exactly?
-
Oct 25, 2007, 18:57 #15
- Join Date
- Apr 2006
- Posts
- 15
Things such as video and music files.
-
Oct 26, 2007, 02:02 #16
- Join Date
- Aug 2007
- Posts
- 365
The only way to do it is to have your script "fake" being a browser, and that includes handling all Flash, Java and JavaScript content.
Or have a program running on the client's computer.
None of the music or video will even touch your server in normal operation.
-
Oct 26, 2007, 08:58 #17
- Join Date
- Apr 2006
- Posts
- 15
OK, then that's what I need. Is this the snoopy that was mentioned earlier?
http://sourceforge.net/projects/snoopy/
If so, I've looked at the code and it didn't really help me get a better idea of exactly what I need to do.
-
Oct 26, 2007, 12:46 #18
- Join Date
- Apr 2006
- Posts
- 15
How about this...
Is it possible to send request headers that ask for the return of something other than what they were sent to? For example, sending request headers to www.google.com and expecting the response headers from the Google logo?
-
Oct 26, 2007, 21:45 #19
- Join Date
- Aug 2007
- Posts
- 365
When you make a request, you make it for the object that you want returned; it could be an image, a web page, an application, a zip file, etc.
If you requested the google home page, and received the sitepoint site, things would not work right.
You could do something similar to what you are suggesting using an HTTP proxy.
e.g.
The client requests the google home page from the proxy server (could be a php script)
The proxy server requests the page from google.com
The proxy server sends the page back to the client.
The proxy server then can modify files, links etc in the middle.
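A bare-bones sketch of that proxy in PHP. Assumptions: the script is saved as proxy.php, the target is passed as ?url=..., the cURL extension is available, and requested URLs are logged to requests.log. It only rewrites absolute http(s) links, so relative links and anything requested by JavaScript or Flash will still bypass it, which is the limitation discussed above. It is an open proxy, so keep it on a test machine.

```php
<?php
// proxy.php (hypothetical): fetch ?url=..., log it, rewrite links, return it.

// Point absolute http(s) src/href attributes back at the proxy so the
// browser's follow-up requests (images, scripts, ...) also pass through it.
function rewrite_links($html, $proxy)
{
    return preg_replace_callback(
        '/\b(src|href)\s*=\s*(["\'])(https?:\/\/[^"\']*)\2/i',
        function ($m) use ($proxy) {
            $target = $proxy . '?url=' . urlencode(html_entity_decode($m[3]));
            return $m[1] . '=' . $m[2] . htmlspecialchars($target) . $m[2];
        },
        $html
    );
}

if (isset($_GET['url'])) {
    $url = $_GET['url'];
    if (!preg_match('#^https?://#i', $url)) {
        header('HTTP/1.0 400 Bad Request'); // refuse file:// and friends
        exit;
    }
    // Record every URL the client asked for: this is the request list.
    file_put_contents('requests.log', $url . "\n", FILE_APPEND);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE); // null if not sent

    if ($body === false) {
        header('HTTP/1.0 502 Bad Gateway');
        exit;
    }
    if (is_string($type)) {
        header('Content-Type: ' . $type);
    }
    // Only HTML is rewritten; images, video etc. are passed through as-is.
    echo (is_string($type) && stripos($type, 'html') !== false)
        ? rewrite_links($body, 'proxy.php')
        : $body;
}
```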
-
Oct 26, 2007, 22:46 #20
- Join Date
- Apr 2006
- Posts
- 15
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
What do you mean by "...The proxy server then can modify files, links etc in the middle..."?
How could I get all the response headers of every page element? Like I said, not everything would be contained within the page source.
Could this be of any use?
http://php.mirrors.ebizlab.hit.bme.h...onseHeader.php
-
Oct 27, 2007, 03:37 #21
In order to get a response, you have to send a request. When you send a request for a page, you get only the response for that page, not what it links to in the form of HTML, JavaScript, Flash, etc. To get a response for every page element, you have to make a separate request for each and every one of them (mimic a browser). And to do that, you have to parse the document.
It is rather easy to parse HTML to find the requests for images, JS, CSS includes, etc. However, when it comes to JavaScript (AJAX), Flash, Java applets and similar, it ranges from very difficult to completely impossible.
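Once the HTML has been parsed into a list of absolute URLs, the "one request per element" part could be sketched like this. find_broken_resources() and default_status_line() are hypothetical names. By default it uses get_headers(), which sends a GET request and follows redirects; the checker is passed in as a callable so the logic can be tried without a network.

```php
<?php
// Default checker: status line of the first response, e.g. "HTTP/1.1 200 OK"
// (a redirect shows up here as a 3xx line), or false if nothing came back.
function default_status_line($url)
{
    $headers = @get_headers($url);
    return $headers === false ? false : $headers[0];
}

// Request each URL and report those that fail or answer with 4xx/5xx.
function find_broken_resources(array $urls, $fetchStatusLine = 'default_status_line')
{
    $broken = array();
    foreach ($urls as $url) {
        $status = call_user_func($fetchStatusLine, $url);
        // Pull the 3-digit code out of "HTTP/x.y NNN Reason".
        if ($status === false || !preg_match('#^HTTP/\S+\s+(\d{3})#', $status, $m)
            || (int) $m[1] >= 400) {
            $broken[$url] = $status === false ? 'no response' : $status;
        }
    }
    return $broken;
}
```

In real use you would call `find_broken_resources($urls)` with the URLs pulled out of the page; as said above, this still only covers what is visible in the HTML.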
You could implement a parser for AJAX requests (well, who knows, maybe), but I think that's about your limit. Flash runs on its own engine on the client machine; you will not mimic Flash, as it's not even open source. I won't even start on Java applets, and the list goes on.
The conclusion here is that HTML is the only realistic level you can work on. Let's face it, what you're building is nothing but a search engine spider. Have you heard of a search engine that indexed AJAX or Flash? Neither have I.
Saul
-
Oct 27, 2007, 04:48 #22
- Join Date
- Apr 2006
- Posts
- 15
Alright, that method won't work, but what about using some sort of packet sniffer?
-
Oct 27, 2007, 07:05 #23
I don't see how that's possible. You would have to sniff ALL packets on a port of the target server. That means intercepting data of ALL server visitors. That's a criminal activity, as far as I know.
Saul
-
Oct 27, 2007, 14:00 #24
- Join Date
- Apr 2006
- Posts
- 15
Couldn't the packet sniffer just analyze the packets between the client (the server running the script, in this case) and the server?
-
Oct 27, 2007, 14:16 #25
Actually that makes little sense. You don't need to sniff packets coming your way, you receive them anyway. And for the target server to send them, you need to send a request.
Saul