PHP login script to remote server

Hi all

This is a first for me: I need to have a script running on my server access a remote server that is protected with HTTP access security.

So my script has to connect to a remote server and supply a username and password; once in there, I've got to map the directory structure and produce a database reference of files and their positions.

Anyone done this, or have any pointers?

TIA

Terry

If you must do this via HTTP, then how do you plan to get a list of files and directories? Does this remote web server show the list of files/dirs when you view a directory? If so, you can parse the HTML with regex.
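Or, if the listing pages are reasonably well-formed, PHP's DOM extension will pull the links out without a regex. A minimal sketch, assuming $html already holds the fetched page source:

<?php
// Sketch: extract every link href/text pair from a fetched listing page
$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress warnings from imperfect HTML
foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->getAttribute('href') . ' => ' . $a->nodeValue . "\n";
}
?>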

I need to have a script running on my server access a remote server that is protected with HTTP access security.

So my script has to connect to a remote server and supply a username and password; once in there, I've got to map the directory structure and produce a database reference of files and their positions.

Can you explain what you're wanting to do with this?

That way we can probably show you a better way of accomplishing this, especially if you have access to that server.

If you don't, well, I'd need to know what this is for, for ethical reasons.

Hi

The remote server only returns HTML; it serves a webpage starting at the web root directory, somewhat like fighting your way through a DOS machine. It doesn't run Apache or IIS, just a proprietary application that serves its shared files.

So yes, I will parse the initial webpage, save the info to the database, then access each subdirectory and do the same, and so on until all directories have been parsed and all files read.

It's not ideal, but my hands are somewhat tied: I can't get direct access to the remote machine, and I need to create an updatable database of its content.

My first concern was actually gaining access to the machine, given that it's protected by username and password (yes, I have the credentials).

@arkinstall

Yes, I have the access credentials. The machine is located in the States; I'm in Europe. The client doesn't want to change the machine in any way, shape or form, but wants me to provide a file search facility for its content. Since it isn't running Apache, IIS, MySQL or PHP, and is of some age, I need to do this all remotely. I'll then host the database on one of my servers so clients can search for files and get their locations, plus a hot link to each file.

Hope that makes it clearer

OK guys, I've got the webpage loading into a variable, so now I'm on the directory-parsing bit. If anyone knows a better way to go about it, I'm all ears.
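For reference, this is roughly how I'm fetching the page into the variable (the URL and credentials are placeholders):

<?php
// Rough sketch of the fetch: HTTP auth against the remote box, page into a string
$cURL = curl_init();
curl_setopt($cURL, CURLOPT_URL, 'http://website.net/'); // placeholder URL
curl_setopt($cURL, CURLOPT_RETURNTRANSFER, 1);          // return the page as a string
curl_setopt($cURL, CURLOPT_HTTPAUTH, CURLAUTH_ALL);     // let cURL negotiate the auth scheme
curl_setopt($cURL, CURLOPT_USERPWD, 'user:pass');       // placeholder credentials
$strPage = curl_exec($cURL);
curl_close($cURL);
?>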

Terry

If you post the HTML, we can help you parse it.

Is this thing just a folder with the top directory shared? I was trying to use this function that I found, but haven’t played with it enough to be happy.


<?php
function ListFolder($path)
{
    // Open the directory (suppress the warning, bail out on failure)
    $dir_handle = @opendir($path) or die("Unable to open $path");

    // Keep only the last folder name from the path
    // (end() needs a real variable, not the return value of explode())
    $parts   = explode("/", $path);
    $dirname = end($parts);

    // Display the target folder
    $dirname = str_replace('_', ' ', $dirname);
    echo "<li><a href=\"$path\">$dirname</a>\n";
    echo "<ul>\n";
    while (false !== ($file = readdir($dir_handle)))
    {
        if ($file != "." && $file != "..")
        {
            if (is_dir($path . "/" . $file))
            {
                // Recurse into subfolders
                ListFolder($path . "/" . $file);
            }
            else
            {
                // Display a file entry
                $fileName = str_replace('_', ' ', $file);
                echo "<li><a href=\"$path/$file\">$fileName</a>";
                echo "</li>";
            }
        }
    }
    echo "</ul>\n";
    echo "</li>\n";

    // Close the directory handle
    closedir($dir_handle);
}
?>
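To try it you'd wrap the call in a list and point it at the top of the share, something like:

<?php
// Hypothetical local test: start at the share root (placeholder path)
echo "<ul>\n";
ListFolder('/path/to/share');
echo "</ul>\n";
?>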

No, the top directory has about 50 subdirectories, and each of those has between 20 and 48; some of those have subdirectories too. There is no guarantee that the client won't extend this further to many more levels, so I need to allow for that.

I have parsed the page, leaving me an array with just the directory and file links. Directories can be identified by the structure

/directoryname/

while files have an extension, .xxx or .xxxx.
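So in code the split might look something like this (a rough sketch; the helper name is just illustrative):

<?php
// Classify a parsed link: directories end in '/', files carry an extension
function isDirectoryLink($link)
{
    return substr($link, -1) === '/';
}
?>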

I'll have a look at your code and see if I can glean anything from it. Thanks.

Hi guys

I've got my script up and running, and it works fine except for one issue: the old remote machine serving the pages is somewhat slow, partly because of its hardware and partly because the pages are huge and take a long while to compile. So I can sit there waiting 50 seconds before the server decides to spit out the page.

This is causing me timeout issues. I've set the cURL timeout to 300 seconds and the PHP execution time (local) to 300 seconds, but am still getting issues. I think I need a way to feed something to the browser whilst cURL is waiting. When processing the smaller directories, the script runs fine for several minutes without problems.

Here's the script:


<?php
if (!ini_set("max_execution_time", "300")) {
    echo 'new execution time not set';
}
for ($i = 0; $i < 40000; $i++) {
    echo ' '; // extra spaces: some browsers won't render until enough data arrives
}
// give the browser something to start with
flush();

// connect to database
// code here

$baseurl = 'http://website.net'; // Base URL
$keeplooping = 1; // flag
// Check if we finished last time
$sql = mysql_query("SELECT Directory
                    FROM lenadir
                    WHERE Filesread = '0'");
if (mysql_num_rows($sql) == 0) { // If all directories were processed, start again by clearing out
    // delete all file entries to start again
    $bin = mysql_query("DELETE FROM lenafile
                        WHERE '1' = '1'");
}
// get first directory
$row = mysql_fetch_array($sql); // get first directory not yet read from the list
$directory = $row['Directory']; // set search directory for the next pass

while ($keeplooping == 1) { // Keep going round till told to stop

    // Read the HTML response
    set_time_limit(300);
    $cURL = curl_init();
    curl_setopt($cURL, CURLOPT_URL, $baseurl . $directory);
    curl_setopt($cURL, CURLOPT_HEADER, 0);
    curl_setopt($cURL, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($cURL, CURLOPT_CONNECTTIMEOUT, 300);
    curl_setopt($cURL, CURLOPT_TIMEOUT, 300);
    curl_setopt($cURL, CURLOPT_HTTPAUTH, CURLAUTH_ALL);
    curl_setopt($cURL, CURLOPT_USERPWD, "user:pass");
    $strPage = curl_exec($cURL);
    curl_close($cURL);
    echo $strPage;

    // Parse the HTML page to pull out the links
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    if (preg_match_all("/$regexp/siU", $strPage, $matches, PREG_SET_ORDER)) {
        // $matches[n][2] = link address with path
        // $matches[n][3] = link text
    }
    array_splice($matches, 0, 9);  // remove first 9 links as they are part of the FSP program
    array_splice($matches, -1, 1); // remove last link as it's part of the FSP program

    // Step through each link; only take notice of files, add them to the database
    foreach ($matches as $element) {
        $link = $element[2];
        $searchtext = preg_replace('/[^a-zA-Z0-9\s]/', ' ', $element[3]); // strip non-alphanumeric chars
        $searchtext = preg_replace('/\s\s+/', ' ', $searchtext); // collapse repeated whitespace
        if ($link[strlen($link) - 1] != '/') { // only process if the link isn't a /directory/
            echo '<a href="http://website.net' . $link . '">' . $searchtext . '</a><br />';
            $sql = @mysql_query("INSERT INTO lenafile
                                 SET Path = '$link',
                                     Search = '$searchtext'");
            if ($sql) { echo 'OK'; } else { echo 'NO'; }
        } // end if
    } // end foreach

    // Flag this directory as read (if not root, as root isn't in the database)
    $sql = mysql_query("UPDATE lenadir
                        SET Filesread = '1'
                        WHERE Directory = '$directory'");
    // Read directories not yet searched
    $sql = mysql_query("SELECT Directory
                        FROM lenadir
                        WHERE Filesread = '0'");
    if (mysql_num_rows($sql) == 0) { // if all directories have been read, we are finished
        $keeplooping = 0; // cancel loop
    } else { // if not, search the first unsearched directory
        $row = mysql_fetch_array($sql); // get first directory not yet read from the list
        $directory = $row['Directory']; // set search directory for the next pass
    } // end if/else

} // end while keeplooping == 1
echo 'All Files In Directories Stored';
?>


Any ideas?

Yeah.

Make a cron job store a database of all the files and folders, etc., then make the PHP file search that; an example entry is below.

That way you'll be delivering faster results, and you'd also be saving a lot of processing power.
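For example, something like this in the crontab (the paths are hypothetical) would rebuild the index nightly:

# hypothetical crontab entry: rebuild the file index at 3am every night
0 3 * * * php /path/to/crawler.php >> /var/log/crawler.log 2>&1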

Why is the client so adamant about not changing the existing server? Make sure they realise that the speed issues are a direct consequence of that, and not any fault in your code.

But I still have to get the data to the script in the first place through cURL; would starting the script from cron get rid of the timeouts?

I already have three scripts. The first scans all directories using cURL and stores each directory with its path in a database table.

The second examines the files within each directory listed in the database and puts the file name and search words into a second table (posted above).

The third is simply a web interface to search the database and supply results.

Solved by adding the command

 ignore_user_abort(1); 

and removing all output to the browser.

This keeps the script going even when the browser times out or loses connection.
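So the top of the script now looks roughly like this:

<?php
// keep the script running even if the browser times out or disconnects
ignore_user_abort(1);
// the max_execution_time and cURL timeouts from the earlier post stay as they were
// ...rest of the crawler, with all echo/flush output to the browser removed...
?>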