  1. #1
    SitePoint Zealot 2ndmouse's Avatar
    Join Date
    Jan 2007
    Location
    West London
    Posts
    196
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Detecting orphan files

    Does anyone know of a PHP script that will find orphan files on a web site, i.e. files that are not referenced in any other file or in a database?

    I've just inherited a site that has not been maintained very well. There appear to be hundreds of files scattered about which are not referenced anywhere - I need to be sure that they are true orphans before I remove them.

    Regards
    Detect file changes remotely. SimpleSiteAudit is an early
    warning anti-hacker system which sends an alert on detection.

    PHP Find Orphan Files - Finds all the unreferenced files on your site.

  2. #2
    SitePoint Zealot 2ndmouse's Avatar
    Ok - I guess such a script doesn't exist. Looks like I'll have to write one myself.

    If anyone is interested in the end result, let me know.

  3. #3
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Searching "PHP find orphaned files" turns up various solutions; some involve a good IDE, IDE add-ons and so on.

    If it's HTML/JS/image files then I think it would be easy enough, but when it comes to include files outside of the document root (which is what I originally imagined you meant), then I think you'd be pushed.

    For that you'd maybe want something which runs for a while and "touches" each included file somehow. I'm not even sure that would be doable, unless all included classes were brought in via an autoloader.
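
    One way to sketch the "touch the included file" idea is to record, at the end of each request, which files PHP actually loaded. get_included_files() and register_shutdown_function() are standard PHP functions; the helper name logIncludedFiles and the log location are assumptions made purely for illustration.

```php
<?php
// Hypothetical helper: append the files loaded during this request to a log,
// skipping any path that has already been recorded.
function logIncludedFiles(string $logFile): void
{
    $seen = is_file($logFile)
        ? (file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: array())
        : array();
    $new = array_diff(get_included_files(), $seen);
    if ($new) {
        file_put_contents(
            $logFile,
            implode(PHP_EOL, $new) . PHP_EOL,
            FILE_APPEND | LOCK_EX
        );
    }
}

// Assumed log location; the logger runs once the request has finished.
$logFile = sys_get_temp_dir() . '/included_files.log';
register_shutdown_function('logIncludedFiles', $logFile);
```

    Left in place for a while (for example from a common bootstrap file), the log would list every include that was really used; any include file that never appears in it is only a candidate orphan, since rarely visited pages may not have been requested yet.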

  4. #4
    SitePoint Zealot 2ndmouse's Avatar
    Thanks cups

    I have scoured the net for a script but failed miserably.

    I'm looking at attempting to code the following:

    1. Build a repository file containing the contents of every text-based file on the site - I already have a function that will perform this, and it doesn't take too long, e.g. a site with 2000+ files takes around 10 seconds, creating a file of around 24 MB. It depends on how big the files are, of course.

    2. Do a recursive scan of the site, listing all file names on the site and their paths.

    3. Then scan the repository file for each file name (not path) found in step 2. This obviously wouldn't be 100% accurate, as files with the same file name might be in various locations. However, it would be a reasonably good indication. I don't think I would be able to search for the path, as links could contain relative or absolute paths and might be referenced from various locations. I stand to be corrected here.

    4. Produce a list of all the files on the site, indicating whether or not a reference to each file was found.

    5. If no reference is found in step 4, provide the ability to search an entire database for reference to that file name, if the site is db driven that is.

    The above would only cover internal references to the files and possibly include some false positives, but that would be enough for my purposes.
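
    Steps 1 to 4 above could be sketched roughly as follows. This is a minimal illustration, not the finished script: the function name findOrphanCandidates and its parameters are hypothetical, and instead of one big repository file it keeps each file's text in memory so that a file mentioning its own name is not counted as a reference.

```php
<?php
// Hypothetical sketch of steps 1 to 4: list files, pool the text, search by name.
function findOrphanCandidates(string $root, array $textExtensions): array
{
    // Step 2: recursive scan, collecting the path of every file under $root.
    $paths = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            $paths[] = $file->getPathname();
        }
    }

    // Step 1: pool the contents of every text-based file, keyed by path.
    $texts = array();
    foreach ($paths as $path) {
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $textExtensions, true)) {
            $texts[$path] = (string) file_get_contents($path);
        }
    }

    // Steps 3 and 4: a file is a candidate orphan when no other text file
    // mentions its file name (name only, not path).
    $orphans = array();
    foreach ($paths as $path) {
        $name = basename($path);
        $referenced = false;
        foreach ($texts as $source => $text) {
            if ($source !== $path && strpos($text, $name) !== false) {
                $referenced = true;
                break;
            }
        }
        if (!$referenced) {
            $orphans[] = $path;
        }
    }
    sort($orphans);
    return $orphans;
}
```

    Calling findOrphanCandidates('.', array('php', 'html', 'css', 'js')) would return the paths that nothing else names. Entry pages such as index.php will normally show up too, because they are reached by URL rather than by a reference in another file, so the list still needs checking by hand.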

    If you or anyone else can throw in some more ideas, it would be appreciated.

    Regards to all

  5. #5
    SitePoint Wizard lorenw's Avatar
    Join Date
    Feb 2005
    Location
    was rainy Oregon now sunny Florida
    Posts
    1,104
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    It may be time-consuming, but try opening PuTTY and running this from the command line.
    Code:
    find /var/www/ -type f -name "*.php" -print0 | xargs -0 grep -lF "oddfile.php" | sort -u
    This searches all of the .php files for a reference to oddfile.php, whether in an include or a require.
    You may have some files that end in .inc, so check those as well.
    If nothing is found, you can mark the file as unused or delete it.
    What I lack in acuracy I make up for in misteaks

  6. #6
    SitePoint Zealot 2ndmouse's Avatar
    Thanks for the suggestion lorenw

    Unfortunately, PuTTY is not an option. I have FTP access and a control panel - that's it. As you say, it would be kinda time-consuming too.

    Regards

  7. #7
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    5,077
    Mentioned
    103 Post(s)
    Tagged
    0 Thread(s)
    Is there any sort of CMS in use or is the site currently a collection of static pages?

    How many pages are there atm?

    Do you expect many pages to be added to the site in the future?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  8. #8
    SitePoint Zealot 2ndmouse's Avatar
    Hi SpacePhoenix

    The site belongs to a glazing company and it's running on Joomla. There are hundreds, possibly thousands, of image files in various directories. I'm not sure how they want to proceed yet, but my first task is to tidy up the damn thing.

  9. #9
    SitePoint Zealot 2ndmouse's Avatar
    Happy new year by the way

  10. #10
    From space with love silver trophy
    SpacePhoenix's Avatar
    Before you proceed any further, you should make sure that the Joomla installation is up to date; the same goes for any plugins that may be in use. Also, before attempting to identify which files are in use and which are "orphan" files, make a complete backup of all the files and of the complete database in case anything goes wrong.

  11. #11
    SitePoint Zealot 2ndmouse's Avatar
    I have limited experience with Joomla, but it appears to be up to date, and everything is already backed up.

  12. #12
    SitePoint Zealot 2ndmouse's Avatar
    At last, I have a working prototype, with front-end up and running and it seems to do the trick.

    Anyone care to test it for me?

    find_orphans.zip

    Best put it in a password-protected directory - form validation is minimal.
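
    Since the form validation is minimal, one extra hedge is to confine any directory entered in a form to the site root before scanning it. The sketch below is not taken from find_orphans.zip; resolveInsideRoot and its parameters are hypothetical names used only to illustrate the check.

```php
<?php
// Hypothetical guard: resolve a user-supplied sub-directory and reject
// anything that escapes the allowed root (e.g. "../../etc").
function resolveInsideRoot(string $root, string $userPath): ?string
{
    $rootReal = realpath($root);
    $target = realpath($root . DIRECTORY_SEPARATOR . $userPath);
    if ($rootReal === false || $target === false || !is_dir($target)) {
        return null; // root or target does not exist, or is not a directory
    }
    // Accept the root itself, or any path that starts with "<root>/".
    $prefix = rtrim($rootReal, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if ($target === $rootReal || strpos($target, $prefix) === 0) {
        return $target;
    }
    return null;
}
```

    A scan would then only start when resolveInsideRoot() returns a path rather than null. This does not replace the password protection; it just narrows what a stray form value can reach.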

  13. #13
    SitePoint Member
    Join Date
    Sep 2013
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I used this code of my own:

    Code PHP:
    <?php
    /*
    Author: Fernando Gámbaro fgambaro - hotmail . com
    Date: 2013/09/20
     
    The main objective of this program is to create a list of the files that are not being
    referenced anywhere within a website.
    To do this we build an array of all the site's files (including subfolders), select which
    files to look for and, for each file of type php, html, css or js, search it for references to them.
    I obtained some of the functions from the internet.
     
    */
    function listdir($dir='.') { 
        if (!is_dir($dir)) { 
            return false; 
        } 
     
        $files = array(); 
        listdiraux($dir, $files); 
     
        return $files; 
    } 
     
    function listdiraux($dir, &$files) { 
        $handle = opendir($dir); 
        if ($handle === false) { 
            return; // unreadable directory: skip it 
        } 
        while (($file = readdir($handle)) !== false) { 
            if ($file == '.' || $file == '..') { 
                continue; 
            } 
            $filepath = $dir == '.' ? $file : $dir . '/' . $file; 
            // Symbolic links are skipped so the scan cannot loop or leave the site. 
            if (is_link($filepath)) 
                continue; 
            if (is_file($filepath)) 
                $files[] = $filepath; 
            else if (is_dir($filepath)) 
                listdiraux($filepath, $files); 
        } 
        closedir($handle); 
    } 
     
    $files = listdir('.'); 
    /*
    From here on, the code is my own.
    */
     
    $files = array_unique($files, SORT_REGULAR);
    sort($files, SORT_LOCALE_STRING); 
     
    /* Candidate files; any still left at the end were never referenced. */
    $tipo = array();
     
    /*
    From all the files found in the directories, collect the candidates:
    files of type png, jpg, php and html. References to them are then
    searched for inside the php, html, css and js files.
    */
     
    foreach ($files as $f) { 
     
    	/* Compare the real extension; this also skips editor backups such as ".php~". */
    	$ext = strtolower(pathinfo($f, PATHINFO_EXTENSION));
    	if ( in_array($ext, array("html", "png", "jpg", "php"), true) ) {
    		$tipo[] = $f;
    	}
    }
     
    $tipo = array_unique($tipo, SORT_REGULAR);
    sort($tipo, SORT_LOCALE_STRING); 
     
     
    	foreach ($files as $ff) {
    		/* Only text files that can contain references are searched. */
    		$ext = strtolower(pathinfo($ff, PATHINFO_EXTENSION));
    		if ( !in_array($ext, array("html", "php", "css", "js"), true) ) {
    			continue;
    		}
    		/* Show which file is being searched and which candidates are found in it. */
    		echo "Scanning file -> ".$ff."<br>";
    		$content = file_get_contents($ff);
    		if ( $content === false ) {
    			continue;
    		}
    		foreach ($tipo as $key => $t) {
    			/* A file that mentions its own name does not count as a reference. */
    			if ( $t === $ff ) {
    				continue;
    			}
    			/* Search for the file name only, since links may use relative or absolute paths.
    			   "!== false" is needed because strpos() returns 0 for a match at the very start. */
    			if ( strpos($content, basename($t)) !== false ) {
    				echo "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;".$t."  <- found<br>";
    				unset($tipo[$key]);
    			}
    		}
    	}
     
    echo "Result<p>";
    	/* List the files for which no reference was found. */
    	foreach ($tipo as $key => $tip) {
    		echo $key." => ".$tip."<br>";
    	}
    ?>
    Last edited by Mittineague; Sep 21, 2013 at 23:55. Reason: reformatting bbcode tags

  14. #14
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,242
    Mentioned
    155 Post(s)
    Tagged
    0 Thread(s)
    This is an old thread that I don't see the need to reopen, so I'm closing it.

