Thread: Detecting orphan files

  1. #1
    SitePoint Zealot 2ndmouse
    Join Date
    Jan 2007
    Location
    West London
    Posts
    186
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Detecting orphan files

    Does anyone know of a PHP script that will find orphan files on a website, i.e. files that are not referenced in any other file or in a database?

    I've just inherited a site that has not been maintained very well. There appear to be hundreds of files scattered about that are not referenced anywhere, and I need to be sure they are true orphans before I remove them.

    Regards
    Detect file changes remotely. SimpleSiteAudit is an early
    warning anti-hacker system which sends an alert on detection.

    PHP Find Orphan Files - Finds all the unreferenced files on your site.

  2. #2
    SitePoint Zealot 2ndmouse
    OK, I guess such a script doesn't exist. Looks like I'll have to write one myself.

    If anyone is interested in the end result, let me know.

  3. #3
    SitePoint Wizard Cups
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,849
    Mentioned
    16 Post(s)
    Tagged
    1 Thread(s)
    Searching "PHP find orphaned files" turns up various solutions; some involve a good IDE, IDE add-ons and so on.

    If it's HTML/JS/image files, then I think it would be easy enough, but when it comes to include files outside the document root (which is what I originally imagined you meant), I think you'd be hard pushed.

    For that you'd maybe want something that runs for a while and "touches" each included file somehow. I'm not even sure that would be doable unless all included classes were brought in via an autoloader.
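As a rough sketch of the "touch" idea, PHP's built-in `get_included_files()` can stand in for an autoloader: a small file loaded at the start of every request (for example through the `auto_prepend_file` ini setting, assuming the host lets you set it) appends the list of PHP files actually loaded to a log. The names `INCLUDE_LOG`, `log_included_files()` and `read_included_files()` and the log path are hypothetical. It only sees PHP includes on pages that were actually requested, so a file missing from the log after a few weeks is a candidate orphan, not a proven one.

```php
<?php
// Hypothetical include logger: records every PHP file loaded during a request.
// Meant to run on every request, e.g. via auto_prepend_file (an assumption about your host).

const INCLUDE_LOG = '/tmp/included_files.log'; // hypothetical log location

/** Append the absolute paths of all files included so far to the log. */
function log_included_files(string $logFile = INCLUDE_LOG): void
{
    // get_included_files() also lists the main script itself.
    $paths = get_included_files();
    if ($paths === []) {
        return;
    }
    // LOCK_EX stops concurrent requests from interleaving their writes.
    file_put_contents($logFile, implode(PHP_EOL, $paths) . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// Run at the end of the request, after every include has happened.
register_shutdown_function('log_included_files');

/** Read the log back as a sorted list of unique paths. */
function read_included_files(string $logFile = INCLUDE_LOG): array
{
    if (!is_file($logFile)) {
        return [];
    }
    $paths = file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $paths = array_values(array_unique($paths));
    sort($paths);
    return $paths;
}
```

Comparing the output of `read_included_files()` with a listing of the `.php` files on disk then shows which ones were never loaded during the logging period.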

  4. #4
    SitePoint Zealot 2ndmouse
    Thanks Cups

    I have scoured the net for a script but failed miserably.

    I'm looking at attempting to code the following:

    1. Build a repository file containing the contents of every text-based file on the site. I already have a function that does this, and it doesn't take too long: a site with 2,000+ files takes around 10 seconds and produces a file of around 24 MB. That depends on how big the files are, of course.

    2. Do a recursive scan of the site, listing all file names on the site and their paths.

    3. Then scan the repository file for each file name (not path) found in step 2. This obviously wouldn't be 100% accurate, as files with the same name might exist in various locations. However, it would be a reasonably good indication. I don't think I could search for the path, as links could contain relative or absolute paths and might be referenced from various locations. I stand to be corrected here.

    4. Produce a list of all the files on the site, indicating whether or not a reference to each file was found.

    5. If no reference is found in step 4, provide the ability to search the entire database for a reference to that file name, if the site is database-driven, that is.
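Steps 1 to 4 might be sketched as below. This is a minimal illustration under assumptions, not the prototype attached later in the thread: the function names, the `TEXT_EXTENSIONS` list and keeping the repository in memory instead of writing it to a file are all hypothetical. Like step 3, it matches only the bare file name, so a file that shares its name with a referenced file elsewhere will not be flagged.

```php
<?php
// Hypothetical sketch of steps 1-4: concatenate every text file, then
// look for each file's base name in that "repository".

const TEXT_EXTENSIONS = ['php', 'html', 'htm', 'js', 'css', 'inc', 'xml', 'txt']; // assumption

/** Step 2: recursively list every regular file under $root. */
function list_files(string $root): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

/** Step 1: build the repository from the contents of text-based files only. */
function build_repository(array $files): string
{
    $repository = '';
    foreach ($files as $path) {
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, TEXT_EXTENSIONS, true)) {
            $repository .= file_get_contents($path) . "\n";
        }
    }
    return $repository;
}

/** Steps 3 and 4: map each path to true if its base name appears in the repository. */
function find_references(array $files, string $repository): array
{
    $report = [];
    foreach ($files as $path) {
        // Case-insensitive match errs on the side of treating a file as referenced.
        $report[$path] = stripos($repository, basename($path)) !== false;
    }
    return $report;
}
```

With `$files = list_files('/path/to/site')` (a placeholder path), `find_references($files, build_repository($files))` returns an array of path => true/false; the `false` entries are the candidates to check by hand, and step 5 still applies before anything is deleted.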

    The above would only cover internal references to the files and possibly include some false positives, but that would be enough for my purposes.
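For step 5 on a MySQL-backed site such as Joomla, one option is to ask `information_schema.COLUMNS` for every text-type column in the current database and run a `LIKE` search on each. The sketch below assumes PDO with the MySQL driver; `escape_like()` and `search_database()` are hypothetical names. The file name is escaped first because `LIKE` treats `%` and `_` as wildcards, and, as in step 3, only the base name is searched.

```php
<?php
// Hypothetical sketch of step 5 for a MySQL database, using PDO.

/** Escape LIKE wildcards so the file name is matched literally. */
function escape_like(string $value): string
{
    // Escape the escape character (backslash) first, then the two wildcards.
    return str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $value);
}

/**
 * Return "table.column" for every text-type column in the current
 * database that contains $fileName somewhere in its value.
 */
function search_database(PDO $pdo, string $fileName): array
{
    $columns = $pdo->query(
        "SELECT TABLE_NAME, COLUMN_NAME
           FROM information_schema.COLUMNS
          WHERE TABLE_SCHEMA = DATABASE()
            AND DATA_TYPE IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext')"
    )->fetchAll(PDO::FETCH_ASSOC);

    $pattern = '%' . escape_like($fileName) . '%';
    $hits = [];
    foreach ($columns as $col) {
        // Identifiers cannot be bound as parameters, so quote them with backticks.
        $table = str_replace('`', '``', $col['TABLE_NAME']);
        $column = str_replace('`', '``', $col['COLUMN_NAME']);
        $stmt = $pdo->prepare("SELECT 1 FROM `$table` WHERE `$column` LIKE ? LIMIT 1");
        $stmt->execute([$pattern]);
        if ($stmt->fetchColumn() !== false) {
            $hits[] = $col['TABLE_NAME'] . '.' . $col['COLUMN_NAME'];
        }
    }
    return $hits;
}
```

This issues one query per text column for every file name searched, so on a large database it is probably best kept for the files that step 4 could not match. The PDO connection details (DSN, user, password) are whatever the site's own configuration uses.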

    If you or anyone else can throw in some more ideas, it would be appreciated.

    Regards to all

  5. #5
    SitePoint Wizard lorenw
    Join Date
    Feb 2005
    Location
    was rainy Oregon now sunny Florida
    Posts
    1,022
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It may be time-consuming, but try opening PuTTY and running this from the command line.
    Code:
    find /var/www/ -type f -name '*.php' -print0 | xargs -0 grep -lF 'oddfile.php'
    This lists every .php file that contains a reference to oddfile.php, be it an include or a require.
    You may have some files that end in .inc, so check them too.
    If nothing is found, you can mark the file as unused or delete it.
    What I lack in acuracy I make up for in misteaks

  6. #6
    SitePoint Zealot 2ndmouse
    Thanks for the suggestion, lorenw.

    Unfortunately, PuTTY is not an option: I have FTP access and a control panel, and that's it. As you say, it would be kinda time-consuming too.

    Regards
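Without shell access, a PHP page uploaded over FTP can do roughly what the `find ... | xargs grep` command above does. This is a minimal sketch: `files_referencing()` is a hypothetical name, and the default extensions (`php` and `inc`) simply follow the advice about `.inc` files. It returns the files whose contents mention the given name; an empty array means no match in those file types.

```php
<?php
// Hypothetical PHP stand-in for: find ROOT -name '*.php' ... | xargs grep -lF NAME

/** Return the sorted paths of files under $root whose contents contain $needle. */
function files_referencing(string $root, string $needle, array $extensions = ['php', 'inc']): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Only look inside the chosen file types.
        if (!in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $contents = file_get_contents($file->getPathname());
        // Literal substring match, like grep -F.
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches);
    return $matches;
}
```

For example, `files_referencing($_SERVER['DOCUMENT_ROOT'], 'oddfile.php')` would list the PHP and .inc files under the document root that mention `oddfile.php`; files outside the document root would need their own root path.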

  7. #7
    From space with love SitePoint Award Recipient SpacePhoenix
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    4,268
    Mentioned
    54 Post(s)
    Tagged
    0 Thread(s)
    Is there any sort of CMS in use or is the site currently a collection of static pages?

    How many pages are there atm?

    Do you expect many pages to be added to the site in the future?
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  8. #8
    SitePoint Zealot 2ndmouse
    Hi SpacePhoenix

    The site belongs to a glazing company and it's running on Joomla. There are hundreds, possibly thousands, of image files in various directories. I'm not sure how they want to proceed yet, but my first task is to tidy up the damn thing.

  9. #9
    SitePoint Zealot 2ndmouse
    Happy New Year, by the way.

  10. #10
    From space with love SitePoint Award Recipient SpacePhoenix
    Before you proceed any further, you should make sure that the Joomla installation is up to date; the same goes for any plugins that may be in use. Also, before attempting to identify which files are in use and which are "orphan" files, make a complete backup of all the files and of the complete database in case anything goes wrong.
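For the file half of that backup, a site with only FTP and a control panel could run a PHP sketch like the one below, assuming the `zip` extension (`ZipArchive`) is installed on the host. `backup_site()` is a hypothetical name; it writes every file under the site root into one archive, using paths relative to the root. It does not touch the database, which still needs a separate export, and the archive should be written outside the directory being zipped.

```php
<?php
// Hypothetical file backup: zip the whole site tree before removing anything.
// Requires the zip extension (ZipArchive). This does NOT back up the database.

/** Zip every file under $root into $zipPath; returns the number of files added. */
function backup_site(string $root, string $zipPath): int
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot create $zipPath");
    }
    $root = rtrim($root, '/');
    $count = 0;
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            // Store paths relative to the site root inside the archive.
            $relative = substr($file->getPathname(), strlen($root) + 1);
            $zip->addFile($file->getPathname(), $relative);
            $count++;
        }
    }
    $zip->close(); // files are actually read and compressed here
    return $count;
}
```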

  11. #11
    SitePoint Zealot 2ndmouse
    I have limited experience with Joomla, but it appears to be up to date, and everything is already backed up.

  12. #12
    SitePoint Zealot 2ndmouse
    At last I have a working prototype, with the front end up and running, and it seems to do the trick.

    Anyone care to test it for me?

    find_orphans.zip

    Best put it in a password-protected directory; form validation is minimal.
